Patents by Inventor Aren Jansen
Aren Jansen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230419989
Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding. An audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform, and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform, or an audio sample of the target audio source; separate audio corresponding to the target audio source from the input audio waveform; and output the separated audio corresponding to the target audio source in response to the receiving of the input.
Type: Application
Filed: June 24, 2022
Publication date: December 28, 2023
Inventors: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
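The conditioning mechanism described above can be pictured as feature-wise modulation: a conditioning vector derived from the joint audio-text embedding scales and shifts the separation network's intermediate features. The function and weight names below are illustrative assumptions, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_condition(features, cond_vec, w_scale, w_shift):
    """Hypothetical feature-wise modulation: the conditioning vector
    (from the joint audio-text embedding) produces a per-channel scale
    and shift applied to the separator's intermediate features."""
    scale = cond_vec @ w_scale   # (channels,)
    shift = cond_vec @ w_shift   # (channels,)
    return features * scale + shift

# Toy shapes: 8-dim conditioning vector, 16 feature channels, 100 frames.
cond = rng.normal(size=8)
feats = rng.normal(size=(100, 16))
w_scale = rng.normal(size=(8, 16))
w_shift = rng.normal(size=(8, 16))
out = film_condition(feats, cond, w_scale, w_shift)
print(out.shape)  # (100, 16)
```

The same conditioning vector can come from either a text description or an audio sample of the target, which is what lets one trained separator serve both query types.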
-
Publication number: 20230386502
Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
Type: Application
Filed: July 26, 2023
Publication date: November 30, 2023
Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
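A minimal sketch of the matching step the abstract describes: each estimated source's audio embedding is compared with the video embedding, and only sources judged to correspond to on-screen objects are kept in the predicted waveform. The cosine test and threshold are illustrative assumptions, not the patent's actual mechanism.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def onscreen_remix(est_sources, audio_embs, video_emb, thresh=0.0):
    """Sum only the estimated sources whose audio embedding aligns with
    the video embedding; discard likely off-screen sources."""
    keep = [s for s, e in zip(est_sources, audio_embs)
            if cosine(e, video_emb) > thresh]
    return np.sum(keep, axis=0) if keep else np.zeros_like(est_sources[0])

# Toy example: source 1's embedding matches the video, source 2's does not.
s1, s2 = np.ones(4), np.full(4, 2.0)
embs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
video = np.array([1.0, 0.0])
remix = onscreen_remix([s1, s2], embs, video)
print(remix)  # [1. 1. 1. 1.]
```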
-
Patent number: 11823439
Abstract: Generally, the present disclosure is directed to systems and methods that train machine-learned models (e.g., artificial neural networks) to perform perceptual or cognitive task(s) based on biometric data (e.g., brain wave recordings) collected from living organism(s) while the living organism(s) are performing the perceptual or cognitive task(s). In particular, aspects of the present disclosure are directed to a new supervision paradigm, by which machine-learned feature extraction models are trained using example stimuli paired with companion biometric data such as neural activity recordings (e.g., electroencephalogram data, electrocorticography data, functional near-infrared spectroscopy data, and/or magnetoencephalography data) collected from a living organism (e.g., a human being) while the organism perceived those examples (e.g., viewing the image, listening to the speech, etc.).
Type: Grant
Filed: January 16, 2020
Date of Patent: November 21, 2023
Assignee: GOOGLE LLC
Inventors: Aren Jansen, Malcolm Slaney
-
Publication number: 20230308823
Abstract: A computer-implemented method for upmixing audiovisual data can include obtaining audiovisual data including input audio data and video data accompanying the input audio data. Each frame of the video data can depict only a portion of a larger scene. The input audio data can have a first number of audio channels. The computer-implemented method can include providing the audiovisual data as input to a machine-learned audiovisual upmixing model. The audiovisual upmixing model can include a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data. The computer-implemented method can include receiving upmixed audio data from the audiovisual upmixing model. The upmixed audio data can have a second number of audio channels. The second number of audio channels can be greater than the first number of audio channels.
Type: Application
Filed: August 26, 2020
Publication date: September 28, 2023
Inventors: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Richard Channing Moore, III
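As a toy version of the upmix, a predicted per-sample source location can drive constant-power panning from one channel to two. The azimuth trajectory here simply stands in for the sequence-to-sequence model's predicted source location; the mapping is an assumption for illustration.

```python
import numpy as np

def pan_to_stereo(mono, azimuth):
    """Constant-power pan of a mono signal using a per-sample azimuth in
    [-1, 1] (a stand-in for the model's predicted source location).
    Output has 2 channels where the input had 1: a minimal 'upmix'."""
    theta = (azimuth + 1.0) * np.pi / 4.0   # map [-1, 1] -> [0, pi/2]
    return np.stack([mono * np.cos(theta), mono * np.sin(theta)])

# A source that moves from hard left to hard right over four samples.
mono = np.ones(4)
stereo = pan_to_stereo(mono, np.array([-1.0, -1.0, 1.0, 1.0]))
print(stereo.shape)  # (2, 4)
```

Constant-power panning keeps left² + right² equal to the mono energy at every sample, which is why the cosine/sine pair is the standard choice.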
-
Patent number: 11756570
Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
Type: Grant
Filed: March 26, 2021
Date of Patent: September 12, 2023
Assignee: Google LLC
Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
-
Patent number: 11475236
Abstract: A computing system can include an embedding model and a clustering model. The computing system can input each of a plurality of inputs into the embedding model and receive respective embeddings for the plurality of inputs as outputs of the embedding model. The computing system can input the respective embeddings for the plurality of inputs into the clustering model and receive respective cluster assignments for the plurality of inputs as outputs of the clustering model. The computing system can evaluate a clustering loss function that evaluates a first average, across the plurality of inputs, of a respective first entropy of each respective probability distribution, and a second entropy of a second average of the probability distributions for the plurality of inputs. The computing system can modify parameter(s) of one or both of the clustering model and the embedding model based on the clustering loss function.
Type: Grant
Filed: May 21, 2020
Date of Patent: October 18, 2022
Assignee: GOOGLE LLC
Inventors: Aren Jansen, Ryan Michael Rifkin, Daniel Ellis
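The two entropy terms in the abstract have a natural reading: drive the average per-example entropy down (confident cluster assignments) while driving the entropy of the batch-average distribution up (balanced use of clusters). A sketch of one common loss with that shape, with the sign convention assumed:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy along the last axis, in nats."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def clustering_loss(probs):
    """probs: (batch, clusters) soft cluster assignments.
    First term: average of per-example entropies (low = confident).
    Second term: entropy of the batch-average distribution (high =
    balanced cluster usage); subtracted so minimizing favors balance."""
    per_example = entropy(probs).mean()
    batch_avg = entropy(probs.mean(axis=0))
    return per_example - batch_avg

# Confident and perfectly balanced assignments give the minimal value.
confident_balanced = np.array([[1.0, 0.0], [0.0, 1.0]])
print(clustering_loss(confident_balanced))  # ~ -log(2) ~ -0.693
```

A batch that collapses to a single cluster scores worse: the first term is still zero, but the second term loses its log(2) bonus.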
-
Publication number: 20220310113
Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
Type: Application
Filed: March 26, 2021
Publication date: September 29, 2022
Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
-
Patent number: 11335328
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represents a respective heuristic about the semantic structure of audio recordings.
Type: Grant
Filed: October 26, 2018
Date of Patent: May 17, 2022
Assignee: Google LLC
Inventors: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
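The triplet machinery itself is standard: an anchor, a positive chosen by one of the sampling heuristics (for example, a temporal neighbor from the same recording), and a negative from elsewhere, combined in a hinge loss on embedding distances. A minimal sketch:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: require the positive to sit at least
    `margin` closer to the anchor (in squared distance) than the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # e.g. a temporally adjacent clip (heuristic positive)
n = np.array([2.0, 0.0])   # a clip drawn from a different recording
print(triplet_loss(a, p, n))  # 0.0 -- negative is already margin-far
```

When the triplet is violated (positive farther than negative), the loss is positive and its gradient pulls the embedding space toward satisfying the heuristic.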
-
Publication number: 20220130134
Abstract: Generally, the present disclosure is directed to systems and methods that train machine-learned models (e.g., artificial neural networks) to perform perceptual or cognitive task(s) based on biometric data (e.g., brain wave recordings) collected from living organism(s) while the living organism(s) are performing the perceptual or cognitive task(s). In particular, aspects of the present disclosure are directed to a new supervision paradigm, by which machine-learned feature extraction models are trained using example stimuli paired with companion biometric data such as neural activity recordings (e.g., electroencephalogram data, electrocorticography data, functional near-infrared spectroscopy data, and/or magnetoencephalography data) collected from a living organism (e.g., a human being) while the organism perceived those examples (e.g., viewing the image, listening to the speech, etc.).
Type: Application
Filed: January 16, 2020
Publication date: April 28, 2022
Inventors: Aren Jansen, Malcolm Slaney
-
Publication number: 20220059117
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
Type: Application
Filed: August 24, 2020
Publication date: February 24, 2022
Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
-
Publication number: 20210361227
Abstract: The present disclosure provides systems and methods that generate health diagnostic information from an audio recording. A computing system can include a machine-learned health model that includes a sound model trained to receive data descriptive of a patient audio recording and output sound description data. The computing system can include a diagnostic model trained to receive the sound description data and output a diagnostic score. The computing system can include at least one tangible, non-transitory computer-readable medium that stores instructions that, when executed, cause the processor to perform operations. The operations can include obtaining the patient audio recording; inputting data descriptive of the patient audio recording into the sound model; receiving, as an output of the sound model, the sound description data; inputting the sound description data into the diagnostic model; and receiving, as an output of the diagnostic model, the diagnostic score.
Type: Application
Filed: May 4, 2018
Publication date: November 25, 2021
Inventors: Katherine Chou, Michael Dwight Howell, Kasumi Widner, Ryan Rifkin, Henry George Wei, Daniel Ellis, Alvin Rajkomar, Aren Jansen, David Michael Parish, Michael Philip Brenner
-
Publication number: 20200372295
Abstract: A computing system can include an embedding model and a clustering model. The computing system can input each of a plurality of inputs into the embedding model and receive respective embeddings for the plurality of inputs as outputs of the embedding model. The computing system can input the respective embeddings for the plurality of inputs into the clustering model and receive respective cluster assignments for the plurality of inputs as outputs of the clustering model. The computing system can evaluate a clustering loss function that evaluates a first average, across the plurality of inputs, of a respective first entropy of each respective probability distribution, and a second entropy of a second average of the probability distributions for the plurality of inputs. The computing system can modify parameter(s) of one or both of the clustering model and the embedding model based on the clustering loss function.
Type: Application
Filed: May 21, 2020
Publication date: November 26, 2020
Inventors: Aren Jansen, Ryan Michael Rifkin, Daniel Ellis
-
Publication number: 20200349921
Abstract: Methods are provided for generating training triplets that can be used to train multidimensional embeddings to represent the semantic content of non-speech sounds present in a corpus of audio recordings. These training triplets can be used with a triplet loss function to train the multidimensional embeddings such that the embeddings can be used to cluster the contents of a corpus of audio recordings, to facilitate a query-by-example lookup from the corpus, to allow a small number of manually-labeled audio recordings to be generalized, or to facilitate some other audio classification task. The triplet sampling methods may be used individually or collectively, and each represents a respective heuristic about the semantic structure of audio recordings.
Type: Application
Filed: October 26, 2018
Publication date: November 5, 2020
Inventors: Aren Jansen, Manoj Plakal, Richard Channing Moore, Shawn Hershey, Ratheet Pandya, Ryan Rifkin, Jiayang Liu, Daniel Ellis
-
Patent number: 10356469
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Grant
Filed: November 29, 2017
Date of Patent: July 16, 2019
Assignee: Google LLC
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
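The selection step can be pictured as a simple intensity-to-operation mapping: the measured SNR of the wind artifact decides how aggressive the replacement must be. The thresholds and operation names below are invented for illustration; the patent does not publish specific values.

```python
def select_replacement(snr_db):
    """Map wind-noise intensity (signal-to-noise ratio in dB) to a
    replacement operation. All thresholds and names are hypothetical."""
    if snr_db >= 15.0:
        return "spectral-subtraction"    # mild wind: denoise in place
    if snr_db >= 5.0:
        return "inpaint-from-neighbors"  # moderate: resynthesize the band
    return "replace-with-ambient"        # severe: swap in matched ambience

print(select_replacement(20.0))  # spectral-subtraction
```

The point of tiering the operations is that heavy-handed replacement audibly degrades segments where light denoising would have sufficed.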
-
Publication number: 20180084301
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying an intensity of the wind noise artifact, wherein the intensity is based on a signal-to-noise ratio of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Application
Filed: November 29, 2017
Publication date: March 22, 2018
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
-
Patent number: 9838737
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying duration of the wind noise artifact and intensity of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified duration and intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Grant
Filed: May 5, 2016
Date of Patent: December 5, 2017
Assignee: Google Inc.
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
-
Publication number: 20170324990
Abstract: Implementations disclose filtering wind noises in video content. A method includes receiving video content comprising an audio component and a video component, detecting, by a processing device, occurrence of a wind noise artifact in a segment of the audio component, identifying duration of the wind noise artifact and intensity of the wind noise artifact, selecting, by the processing device, a wind noise replacement operation based on the identified duration and intensity of the wind noise artifact, and applying, by the processing device, the selected wind noise replacement operation to the segment of the audio component to remove the wind noise artifact from the segment.
Type: Application
Filed: May 5, 2016
Publication date: November 9, 2017
Inventors: Elad Eban, Aren Jansen, Sourish Chaudhuri
-
Patent number: 9799333
Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech, and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.
Type: Grant
Filed: August 31, 2015
Date of Patent: October 24, 2017
Assignee: The Johns Hopkins University
Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
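The detection step reduces to sparse convolution: phonetic events are collapsed to impulses, the impulse train is convolved with a keyword's filter, and response peaks over a threshold flag candidate hits. A one-channel toy version (a real system would keep one impulse train per phone and an ensemble of filters per keyword):

```python
import numpy as np

def detect_keyword(impulses, keyword_filter, threshold):
    """Convolve a sparse phonetic impulse train with a keyword filter
    and return the frame indices where the response crosses threshold."""
    response = np.convolve(impulses, keyword_filter, mode="full")
    return np.flatnonzero(response >= threshold)

# Toy example: phonetic impulses at frames 3 and 20; a 3-tap filter
# whose peak aligns one frame after each impulse.
impulses = np.zeros(30)
impulses[[3, 20]] = 1.0
filt = np.array([0.5, 1.0, 0.5])
hits = detect_keyword(impulses, filt, threshold=1.0)
print(hits)  # [ 4 21]
```

Because the impulse train is sparse, the convolution touches only a handful of nonzero samples, which is what makes this style of keyword search fast.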
-
Publication number: 20150371635
Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech, and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.
Type: Application
Filed: August 31, 2015
Publication date: December 24, 2015
Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Patent number: 9177547
Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech, and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.
Type: Grant
Filed: June 25, 2013
Date of Patent: November 3, 2015
Assignee: THE JOHNS HOPKINS UNIVERSITY
Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church