Patents by Inventor Efthymios Tzinis

Efthymios Tzinis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Audio-Visual Separation of On-Screen Sounds based on Machine Learning Models

Publication number: 20230386502

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Application

Filed: July 26, 2023

Publication date: November 30, 2023

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

Method and System for Target Source Separation

Publication number: 20230326478

Abstract: Embodiments of the present disclosure disclose a system and method for extraction of a target sound signal. The system collects a mixture of sound signals. The system selects a query identifying the target sound signal to be extracted from the mixture of sound signals, the query comprising one or more identifiers. Each identifier is present in a predetermined set of one or more identifiers and defines at least one of mutually inclusive and mutually exclusive characteristics of the mixture of sound signals. The system determines one or more logical operators connecting the extracted one or more identifiers. The system transforms the one or more identifiers and the extracted logical operators into a digital representation. The system executes a neural network trained to extract the target sound signal by mixing the digital representation with intermediate outputs of intermediate layers of the neural network.

Type: Application

Filed: October 9, 2022

Publication date: October 12, 2023

Inventors: Gordon Wichern, Efthymios Tzinis, Aswin Shanmugam Subramanian, Jonathan Le Roux

Audio-visual separation of on-screen sounds based on machine learning models

Patent number: 11756570

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Grant

Filed: March 26, 2021

Date of Patent: September 12, 2023

Assignee: Google LLC

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models

Publication number: 20220310113

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Application

Filed: March 26, 2021

Publication date: September 29, 2022

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
