Patents by Inventor Oran Lang
Oran Lang is a named inventor on the following patent filings. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240428816
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
Type: Application
Filed: August 7, 2024
Publication date: December 26, 2024
Inventors: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
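The claimed device-side flow above can be sketched as follows. This is a minimal, illustrative control-flow sketch only: every name here (`UserDevice`, `ListeningDevice`, `isolate_speech`) is an assumption for illustration and does not come from the patent, and the real separation model is replaced by a placeholder.

```python
# Hypothetical sketch of the claimed flow: the user device receives an
# indication of which speakers are visible in the camera view, generates an
# isolated speech signal for each, and sends the signals to a coupled
# listening device. All class and function names are illustrative.

class ListeningDevice:
    """Stands in for a paired listening device; records what it receives."""
    def __init__(self):
        self.received = []

    def play(self, speaker_id, signal):
        self.received.append((speaker_id, signal))


def isolate_speech(mixture, speaker_id):
    # Placeholder for the actual separation model; tags the mixture with the
    # speaker it was isolated for.
    return f"{mixture}:isolated[{speaker_id}]"


class UserDevice:
    def __init__(self, listener):
        self.listener = listener

    def on_speakers_indicated(self, mixture, speaker_ids):
        # For each speaker indicated as visible in the current view, generate
        # an isolated speech signal and forward it to the listening device.
        for sid in speaker_ids:
            self.listener.play(sid, isolate_speech(mixture, sid))


listener = ListeningDevice()
device = UserDevice(listener)
device.on_speakers_indicated("mix", ["A", "B"])  # first indication
device.on_speakers_indicated("mix", ["C"])       # second indication
print(listener.received)
```

The second indication reuses the same path as the first, which mirrors the claim's two-round structure.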
-
Publication number: 20240355017
Abstract: Methods and systems for editing an image are disclosed herein. The method includes receiving an input image and a target text, the target text indicating a desired edit for the input image and obtaining, by the computing system, a target text embedding based on the target text. The method also includes obtaining, by the computing system, an optimized text embedding based on the target text embedding and the input image and fine-tuning, by the computing system, a diffusion model based on the optimized text embedding. The method can further include interpolating, by the computing system, the target text embedding and the optimized text embedding to obtain an interpolated embedding and generating, by the computing system, an edited image including the desired edit using the diffusion model based on the input image and the interpolated embedding.
Type: Application
Filed: April 18, 2023
Publication date: October 24, 2024
Inventors: Shiran Elyahu Zada, Bahjat Kawar, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri
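The interpolation step in the abstract above admits a one-line sketch: the edited image is generated from a linear blend of the target text embedding and the embedding optimized to reconstruct the input image. The variable names, the toy 2-d embeddings, and the fixed blend weight `eta` below are assumptions for illustration, not values from the patent.

```python
import numpy as np

# Illustrative sketch of the embedding interpolation described above.
# eta = 0 stays close to the optimized embedding (reconstructing the input
# image); eta = 1 applies the full target-text edit.

def interpolate_embeddings(target_emb, optimized_emb, eta):
    return eta * target_emb + (1.0 - eta) * optimized_emb

target_emb = np.array([1.0, 0.0])     # toy stand-in for the target text embedding
optimized_emb = np.array([0.0, 1.0])  # toy stand-in for the optimized embedding
blend = interpolate_embeddings(target_emb, optimized_emb, 0.7)
print(blend)  # [0.7 0.3]
```

The blend is then fed, together with the input image, to the fine-tuned diffusion model to produce the edited image.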
-
Patent number: 12073844
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
Type: Grant
Filed: October 1, 2020
Date of Patent: August 27, 2024
Assignee: Google LLC
Inventors: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
-
Patent number: 11996116
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
Type: Grant
Filed: August 24, 2020
Date of Patent: May 28, 2024
Assignee: Google LLC
Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
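The workflow in the abstract above has a fixed ordering that can be sketched with stubs: learn a non-semantic speech representation, benchmark it on speech-domain tasks, fine-tune it on downstream tasks, then export a model that runs locally on a mobile device. Every function name, task name, and return value below is a hypothetical placeholder, not from the patent.

```python
# Hypothetical sketch of the described workflow; the log records the ordering
# of the four stages, which is the substance of the claim.

log = []

def train_representation(audio):
    log.append("train")
    return "embedding"  # placeholder for the learned non-semantic representation

def benchmark(rep, tasks):
    log.append("benchmark")
    return {t: 0.0 for t in tasks}  # placeholder benchmark scores

def fine_tune(rep, downstream_tasks):
    log.append("fine_tune")
    return rep + "+tuned"

def export_for_device(rep):
    log.append("export")
    return {"model": rep, "runs_locally": True}

rep = train_representation("speech_audio")
benchmark(rep, ["emotion", "speaker_id"])       # illustrative task names
rep = fine_tune(rep, ["keyword_spotting"])      # illustrative downstream task
model = export_for_device(rep)
print(log)
```

The final dictionary stands in for the model provided to the mobile device, which the claim requires to operate locally.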
-
Patent number: 11894014
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
Type: Grant
Filed: September 22, 2022
Date of Patent: February 6, 2024
Assignee: Google LLC
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
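The final two steps of the abstract above (per-speaker spectrogram masks, then per-speaker isolated spectrograms) reduce to an elementwise product, which the toy numpy sketch below illustrates. The shapes, mask values, and the assumption that masks are applied multiplicatively per time-frequency bin are illustrative; the patent does not prescribe these specifics.

```python
import numpy as np

# Minimal sketch of the masking step: each speaker's mask, with entries in
# [0, 1] giving that speaker's share of each time-frequency bin, is applied
# elementwise to the mixture spectrogram to yield an isolated spectrogram.

def apply_masks(mixture_spec, masks):
    return {speaker: m * mixture_spec for speaker, m in masks.items()}

mixture = np.array([[2.0, 4.0],
                    [6.0, 8.0]])  # toy (frequency x time) magnitude spectrogram
masks = {
    "speaker_1": np.array([[1.0, 0.5], [0.0, 0.25]]),
    "speaker_2": np.array([[0.0, 0.5], [1.0, 0.75]]),
}
isolated = apply_masks(mixture, masks)
print(isolated["speaker_1"])  # [[2. 2.] [0. 2.]]
```

Because the two masks here sum to one in every bin, the isolated spectrograms add back up to the mixture, which is a common sanity check for soft masks.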
-
Publication number: 20230267942
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device, in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of the first speaker in the current view and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device, receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device, and in response generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
Type: Application
Filed: October 1, 2020
Publication date: August 24, 2023
Inventors: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
-
Publication number: 20230122905
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
Type: Application
Filed: September 22, 2022
Publication date: April 20, 2023
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
-
Patent number: 11456005
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
Type: Grant
Filed: November 21, 2018
Date of Patent: September 27, 2022
Assignee: Google LLC
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
-
Publication number: 20220059117
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
Type: Application
Filed: August 24, 2020
Publication date: February 24, 2022
Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
-
Publication number: 20200335121
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
Type: Application
Filed: November 21, 2018
Publication date: October 22, 2020
Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim