Patents by Inventor John R. Hershey

John R. Hershey has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models

Publication number: 20250149058

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Application

Filed: January 9, 2025

Publication date: May 8, 2025

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

Audio-visual separation of on-screen sounds based on machine learning models

Patent number: 12217768

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Grant

Filed: July 26, 2023

Date of Patent: February 4, 2025

Assignee: Google LLC

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

Audio-Visual Separation of On-Screen Sounds based on Machine Learning Models

Publication number: 20230386502

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Application

Filed: July 26, 2023

Publication date: November 30, 2023

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

Audio-visual separation of on-screen sounds based on machine learning models

Patent number: 11756570

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Grant

Filed: March 26, 2021

Date of Patent: September 12, 2023

Assignee: Google LLC

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R Hershey

Audio-Visual Separation of On-Screen Sounds Based on Machine Learning Models

Publication number: 20220310113

Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.

Type: Application

Filed: March 26, 2021

Publication date: September 29, 2022

Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey

System and method for multichannel end-to-end speech recognition

Patent number: 11133011

Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.

Type: Grant

Filed: October 3, 2017

Date of Patent: September 28, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R Hershey

Methods and systems for end-to-end speech separation with unfolded iterative phase reconstruction

Patent number: 10529349

Abstract: Systems and methods for an audio signal processing system for transforming an input audio signal. A processor implements steps of a module by inputting an input audio signal into a spectrogram estimator to extract an audio feature sequence, and process the audio feature sequence to output a set of estimated spectrograms. Processing the set of estimated spectrograms and the audio feature sequence using a spectrogram refinement module, to output a set of refined spectrograms. Wherein the processing of the spectrogram refinement module is based on an iterative reconstruction algorithm. Processing the set of refined spectrograms for the one or more target audio signals using a signal refinement module, to obtain the target audio signal estimates. An output interface to output the optimized target audio signal estimates. Wherein the module is optimized by minimizing an error using an optimizer stored in the memory.

Type: Grant

Filed: May 18, 2018

Date of Patent: January 7, 2020

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, John R Hershey, Zhongqiu Wang, Gordon P Wichern

Methods and Systems for End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

Publication number: 20190318754

Abstract: Systems and methods for an audio signal processing system for transforming an input audio signal. A processor implements steps of a module by inputting an input audio signal into a spectrogram estimator to extract an audio feature sequence, and process the audio feature sequence to output a set of estimated spectrograms. Processing the set of estimated spectrograms and the audio feature sequence using a spectrogram refinement module, to output a set of refined spectrograms. Wherein the processing of the spectrogram refinement module is based on an iterative reconstruction algorithm. Processing the set of refined spectrograms for the one or more target audio signals using a signal refinement module, to obtain the target audio signal estimates. An output interface to output the optimized target audio signal estimates. Wherein the module is optimized by minimizing an error using an optimizer stored in the memory.

Type: Application

Filed: May 18, 2018

Publication date: October 17, 2019

Inventors: Jonathan Le Roux, John R Hershey, Zhongqiu Wang, Gordon P Wichern

System and Method for Multichannel End-to-End Speech Recognition

Publication number: 20180261225

Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.

Type: Application

Filed: October 3, 2017

Publication date: September 13, 2018

Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R. Hershey

Statistical voice dialog system and method

Patent number: 9837075

Abstract: A method for processing a voice command using a statistical dialog model determines a belief state as a probability distribution over states organized in a hierarchy with a parent-child relationship of nodes representing the states. The belief state includes the hierarchy of state variables defining probabilities of each state to correspond to the voice command and a probability of a state of a child node in the hierarchy is conditioned on a probability of a state of a corresponding parent node. A system action is selected based on the belief state.

Type: Grant

Filed: February 10, 2014

Date of Patent: December 5, 2017

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shinji Watanabe, John R. Hershey

Source signal separation by discriminatively-trained non-negative matrix factorization

Patent number: 9679559

Abstract: A method estimates source signals from a mixture of source signals by first training an analysis model and a reconstruction model using training data. The analysis model is applied to the mixture of source signals to obtain an analysis representation of the mixture of source signals, and the reconstruction model is applied to the analysis representation to obtain an estimate of the source signals, wherein the analysis model utilizes an analysis linear basis representation, and the reconstruction model utilizes a reconstruction linear basis representation.

Type: Grant

Filed: May 29, 2014

Date of Patent: June 13, 2017

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, John R. Hershey, Felix Weninger, Shinji Watanabe

Flat-panel acoustic apparatus

Patent number: 9661414

Abstract: In an acoustic apparatus, an acoustic transducer is arranged in a substrate. Multiple acoustic pathways in the substrate have predetermined lengths, wherein a proximal end of each pathway forms an opening in a front surface of the substrate, and a distal end terminates at the acoustic transducer. The predetermined lengths of the acoustic pathways are designed to form an acoustic spatial filter that selectively passes acoustic signals from or to different locations. The transducer can convert electric energy to acoustic energy when the apparatus operates as a speaker, or the transducer can convert acoustic energy to electric energy and operate as a microphone.

Type: Grant

Filed: June 10, 2015

Date of Patent: May 23, 2017

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, John R Hershey, William S. Yerazunis, Petros T Boufounos, Laurent Daudet

Method for processing speech signals using an ensemble of speech enhancement procedures

Patent number: 9601130

Abstract: A method processes an acoustic signal that is a mixture of a target signal and interfering signals by first enhancing the acoustic signal by a set of enhancement procedures to produce a set of initial enhanced signals. Then, an ensemble learning procedure is applied to the acoustic signal and the set of initial enhancement signals to produce features of the acoustic signal.

Type: Grant

Filed: July 18, 2013

Date of Patent: March 21, 2017

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, Shinji Watanabe, John R Hershey

Flat-Panel Acoustic Apparatus

Publication number: 20160366511

Abstract: In an acoustic apparatus, an acoustic transducer is arranged in a substrate. Multiple acoustic pathways in the substrate have predetermined lengths, wherein a proximal end of each pathway forms an opening in a front surface of the substrate, and a distal end terminates at the acoustic transducer. The predetermined lengths of the acoustic pathways are designed to form an acoustic spatial filter that selectively passes acoustic signals from or to different locations. The transducer can convert electric energy to acoustic energy when the apparatus operates as a speaker, or the transducer can convert acoustic energy to electric energy and operate as a microphone.

Type: Application

Filed: June 10, 2015

Publication date: December 15, 2016

Inventors: Jonathan Le Roux, John R. Hershey, William S. Yerazunis, Petros T. Boufounos, Laurent Daudet

Method and system for detecting events in an acoustic signal subject to cyclo-stationary noise

Patent number: 9477895

Abstract: A method detects events in an acoustic signal subject to cyclostationary background noise by first segmenting the signal into cycles. Features with a fixed dimension are derived from the cycles, such that the timing of the features is relative to a cycle time. The features are normalized using an estimate of the cyclostationary background noise. Then, after the normalizing, a classifier is applied to the features to detect the events.

Type: Grant

Filed: March 31, 2014

Date of Patent: October 25, 2016

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: John R. Hershey, Vamsi K. Potluru, Jonathan Le Roux

Actions prediction for hypothetical driving conditions

Patent number: 9434389

Abstract: An information system includes a prediction engine for predicting an action based on a set of driving state parameters, and a driving history, and a simulation engine for generating a hypothetical scenario by simulating one or a combination of at least one driving state parameter and at least part of the driving history, such that the prediction engine predicts the action for the hypothetical scenario.

Type: Grant

Filed: March 6, 2014

Date of Patent: September 6, 2016

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Bret Harsham, John R. Hershey, Jonathan Le Roux, Daniel Nikolaev Nikovski, Alan W. Esenther

Denoising noisy speech signals using probabilistic model

Patent number: 9324338

Abstract: A method determines from an input noisy signal sequences of hidden variables including at least one sequence of hidden variables representing an excitation component of the clean speech signal, at least one sequence of hidden variables representing a filter component of the clean speech signal, and at least one sequence of hidden variables representing the noise signal. The sequences of hidden variables include hidden variables determined as a non-negative linear combination of non-negative basis functions. The determination uses the model of the clean speech signal that includes a non-negative source-filter dynamical system (NSFDS) constraining the hidden variables representing the excitation and the filter components to be statistically dependent over time. The method generates an output signal using a product of corresponding hidden variables representing the excitation and the filter components.

Type: Grant

Filed: March 26, 2014

Date of Patent: April 26, 2016

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, John R. Hershey, Umut Simsekli

Method and apparatus for processing text with variations in vocabulary usage

Patent number: 9251250

Abstract: Text is processed to construct a model of the text. The text has a shared vocabulary. The text is partitioned into sets and subsets of texts. The usage of the shared vocabulary in two or more sets is different, and the topics of two or more subsets are different. A probabilistic model is defined for the text. The probabilistic model considers each word in the text to be a token having a position and a word value, and the usage of the shared vocabulary, topics, subtopics, and word values for each token in the text are represented using distributions of random variables in the probabilistic model, wherein the random variables are discrete. Parameters are estimated for the model corresponding to the vocabulary usages, the word values, the topics, and the subtopics associated with the words.

Type: Grant

Filed: March 28, 2012

Date of Patent: February 2, 2016

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: John R. Hershey, Jonathan Le Roux, Creighton K Heakulani

Method for localizing sources of signals in reverberant environments using sparse optimization

Patent number: 9251436

Abstract: Source signals emitted in a reverberant environment from different locations are processed by first receiving input signals corresponding to the source signals by a set of sensors. Then, a sparsity-based support estimation is applied to the input signals according to a reverberation model to produce estimates of the source signals and locations of a set of sources emitting the source signals.

Type: Grant

Filed: February 26, 2013

Date of Patent: February 2, 2016

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Petros T Boufounos, Jonathan Le Roux, Kang Kang, John R Hershey

Source Signal Separation by Discriminatively-Trained Non-Negative Matrix Factorization

Publication number: 20150348537

Abstract: A method estimates source signals from a mixture of source signals by first training an analysis model and a reconstruction model using training data. The analysis model is applied to the mixture of source signals to obtain an analysis representation of the mixture of source signals, and the reconstruction model is applied to the analysis representation to obtain an estimate of the source signals, wherein the analysis model utilizes an analysis linear basis representation, and the reconstruction model utilizes a reconstruction linear basis representation.

Type: Application

Filed: May 29, 2014

Publication date: December 3, 2015

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Jonathan Le Roux, John R. Hershey, Felix Weninger, Shinji Watanabe
