Patents by Inventor John R. Hershey

John R. Hershey has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230386502
    Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
    Type: Application
    Filed: July 26, 2023
    Publication date: November 30, 2023
    Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
  • Patent number: 11756570
    Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: September 12, 2023
    Assignee: Google LLC
    Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R Hershey
  • Publication number: 20220310113
    Abstract: Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Inventors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, John R. Hershey
  • Patent number: 11133011
    Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, and one or more storages to store a multichannel speech recognition network. The multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs; a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and to generate an enhanced speech dataset based on the reference channel input; and an encoder-decoder network trained to transform the enhanced speech dataset into text. The system further includes one or more processors that use the multichannel speech recognition network, in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.
    Type: Grant
    Filed: October 3, 2017
    Date of Patent: September 28, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R Hershey
  • Patent number: 10529349
    Abstract: Systems and methods are provided for an audio signal processing system that transforms an input audio signal. A processor implements the steps of a module by inputting the input audio signal into a spectrogram estimator, which extracts an audio feature sequence and processes it to output a set of estimated spectrograms. A spectrogram refinement module processes the set of estimated spectrograms and the audio feature sequence to output a set of refined spectrograms; this refinement is based on an iterative reconstruction algorithm. A signal refinement module processes the set of refined spectrograms for the one or more target audio signals to obtain the target audio signal estimates. An output interface outputs the optimized target audio signal estimates. The module is optimized by minimizing an error using an optimizer stored in a memory.
    Type: Grant
    Filed: May 18, 2018
    Date of Patent: January 7, 2020
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, John R Hershey, Zhongqiu Wang, Gordon P Wichern
  • Publication number: 20190318754
    Abstract: Systems and methods are provided for an audio signal processing system that transforms an input audio signal. A processor implements the steps of a module by inputting the input audio signal into a spectrogram estimator, which extracts an audio feature sequence and processes it to output a set of estimated spectrograms. A spectrogram refinement module processes the set of estimated spectrograms and the audio feature sequence to output a set of refined spectrograms; this refinement is based on an iterative reconstruction algorithm. A signal refinement module processes the set of refined spectrograms for the one or more target audio signals to obtain the target audio signal estimates. An output interface outputs the optimized target audio signal estimates. The module is optimized by minimizing an error using an optimizer stored in a memory.
    Type: Application
    Filed: May 18, 2018
    Publication date: October 17, 2019
    Inventors: Jonathan Le Roux, John R Hershey, Zhongqiu Wang, Gordon P Wichern
  • Publication number: 20180261225
    Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, and one or more storages to store a multichannel speech recognition network. The multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs; a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and to generate an enhanced speech dataset based on the reference channel input; and an encoder-decoder network trained to transform the enhanced speech dataset into text. The system further includes one or more processors that use the multichannel speech recognition network, in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.
    Type: Application
    Filed: October 3, 2017
    Publication date: September 13, 2018
    Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R. Hershey
  • Patent number: 9837075
    Abstract: A method for processing a voice command using a statistical dialog model determines a belief state as a probability distribution over states organized in a hierarchy with a parent-child relationship of nodes representing the states. The belief state includes the hierarchy of state variables defining probabilities of each state to correspond to the voice command and a probability of a state of a child node in the hierarchy is conditioned on a probability of a state of a corresponding parent node. A system action is selected based on the belief state.
    Type: Grant
    Filed: February 10, 2014
    Date of Patent: December 5, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shinji Watanabe, John R. Hershey
  • Patent number: 9679559
    Abstract: A method estimates source signals from a mixture of source signals by first training an analysis model and a reconstruction model using training data. The analysis model is applied to the mixture of source signals to obtain an analysis representation of the mixture of source signals, and the reconstruction model is applied to the analysis representation to obtain an estimate of the source signals, wherein the analysis model utilizes an analysis linear basis representation, and the reconstruction model utilizes a reconstruction linear basis representation.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: June 13, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, John R. Hershey, Felix Weninger, Shinji Watanabe
  • Patent number: 9661414
    Abstract: In an acoustic apparatus, an acoustic transducer is arranged in a substrate. Multiple acoustic pathways in the substrate have predetermined lengths, wherein a proximal end of each pathway forms an opening in a front surface of the substrate, and a distal end terminates at the acoustic transducer. The predetermined lengths of the acoustic pathways are designed to form an acoustic spatial filter that selectively passes acoustic signals from or to different locations. The transducer can convert electric energy to acoustic energy when the apparatus operates as a speaker, or the transducer can convert acoustic energy to electric energy and operate as a microphone.
    Type: Grant
    Filed: June 10, 2015
    Date of Patent: May 23, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, John R Hershey, William S. Yerazunis, Petros T Boufounos, Laurent Daudet
  • Patent number: 9601130
    Abstract: A method processes an acoustic signal that is a mixture of a target signal and interfering signals by first enhancing the acoustic signal with a set of enhancement procedures to produce a set of initial enhanced signals. Then, an ensemble learning procedure is applied to the acoustic signal and the set of initial enhanced signals to produce features of the acoustic signal.
    Type: Grant
    Filed: July 18, 2013
    Date of Patent: March 21, 2017
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, Shinji Watanabe, John R Hershey
  • Publication number: 20160366511
    Abstract: In an acoustic apparatus, an acoustic transducer is arranged in a substrate. Multiple acoustic pathways in the substrate have predetermined lengths, wherein a proximal end of each pathway forms an opening in a front surface of the substrate, and a distal end terminates at the acoustic transducer. The predetermined lengths of the acoustic pathways are designed to form an acoustic spatial filter that selectively passes acoustic signals from or to different locations. The transducer can convert electric energy to acoustic energy when the apparatus operates as a speaker, or the transducer can convert acoustic energy to electric energy and operate as a microphone.
    Type: Application
    Filed: June 10, 2015
    Publication date: December 15, 2016
    Inventors: Jonathan Le Roux, John R. Hershey, William S. Yerazunis, Petros T. Boufounos, Laurent Daudet
  • Patent number: 9477895
    Abstract: A method detects events in an acoustic signal subject to cyclostationary background noise by first segmenting the signal into cycles. Features with a fixed dimension are derived from the cycles, such that the timing of the features is relative to a cycle time. The features are normalized using an estimate of the cyclostationary background noise. Then, after the normalizing, a classifier is applied to the features to detect the events.
    Type: Grant
    Filed: March 31, 2014
    Date of Patent: October 25, 2016
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: John R. Hershey, Vamsi K. Potluru, Jonathan Le Roux
  • Patent number: 9434389
    Abstract: An information system includes a prediction engine for predicting an action based on a set of driving state parameters and a driving history, and a simulation engine for generating a hypothetical scenario by simulating at least one driving state parameter, at least part of the driving history, or a combination of the two, such that the prediction engine predicts the action for the hypothetical scenario.
    Type: Grant
    Filed: March 6, 2014
    Date of Patent: September 6, 2016
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Bret Harsham, John R. Hershey, Jonathan Le Roux, Daniel Nikolaev Nikovski, Alan W. Esenther
  • Patent number: 9324338
    Abstract: A method determines, from an input noisy signal, sequences of hidden variables, including at least one sequence of hidden variables representing an excitation component of the clean speech signal, at least one sequence of hidden variables representing a filter component of the clean speech signal, and at least one sequence of hidden variables representing the noise signal. The sequences of hidden variables include hidden variables determined as a non-negative linear combination of non-negative basis functions. The determination uses a model of the clean speech signal that includes a non-negative source-filter dynamical system (NSFDS), which constrains the hidden variables representing the excitation and filter components to be statistically dependent over time. The method generates an output signal using a product of the corresponding hidden variables representing the excitation and filter components.
    Type: Grant
    Filed: March 26, 2014
    Date of Patent: April 26, 2016
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, John R. Hershey, Umut Simsekli
  • Patent number: 9251436
    Abstract: Source signals emitted in a reverberant environment from different locations are processed by first receiving input signals corresponding to the source signals by a set of sensors. Then, a sparsity-based support estimation is applied to the input signals according to a reverberation model to produce estimates of the source signals and locations of a set of sources emitting the source signals.
    Type: Grant
    Filed: February 26, 2013
    Date of Patent: February 2, 2016
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Petros T Boufounos, Jonathan Le Roux, Kang Kang, John R Hershey
  • Patent number: 9251250
    Abstract: Text is processed to construct a model of the text. The text has a shared vocabulary. The text is partitioned into sets and subsets of texts. The usage of the shared vocabulary in two or more sets is different, and the topics of two or more subsets are different. A probabilistic model is defined for the text. The probabilistic model considers each word in the text to be a token having a position and a word value, and the usage of the shared vocabulary, topics, subtopics, and word values for each token in the text are represented using distributions of random variables in the probabilistic model, wherein the random variables are discrete. Parameters are estimated for the model corresponding to the vocabulary usages, the word values, the topics, and the subtopics associated with the words.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: February 2, 2016
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: John R. Hershey, Jonathan Le Roux, Creighton K Heakulani
  • Publication number: 20150348537
    Abstract: A method estimates source signals from a mixture of source signals by first training an analysis model and a reconstruction model using training data. The analysis model is applied to the mixture of source signals to obtain an analysis representation of the mixture of source signals, and the reconstruction model is applied to the analysis representation to obtain an estimate of the source signals, wherein the analysis model utilizes an analysis linear basis representation, and the reconstruction model utilizes a reconstruction linear basis representation.
    Type: Application
    Filed: May 29, 2014
    Publication date: December 3, 2015
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, John R. Hershey, Felix Weninger, Shinji Watanabe
  • Patent number: 9170119
    Abstract: A method adapts a user interface of a vehicle navigation system. Based on an input vector representing a current state related to the vehicle, probabilities of actions are predicted to achieve a next state using a predictive model representing previous states. Then, a subset of the actions with highest probabilities that minimize a complexity of interacting with the vehicle navigation system are displayed in the vehicle.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: October 27, 2015
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Daniel Nikolaev Nikovski, John R Hershey, Bret Harsham, Jonathan Le Roux
  • Patent number: 9159317
    Abstract: A system and a method recognize speech including a sequence of words. A set of interpretations of the speech is generated using an acoustic model and a language model, and, for each interpretation, a score representing correctness of an interpretation in representing the sequence of words is determined to produce a set of scores. Next, the set of scores is updated based on a consistency of each interpretation with a constraint determined in response to receiving a word sequence constraint.
    Type: Grant
    Filed: June 14, 2013
    Date of Patent: October 13, 2015
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Bret Harsham, John R. Hershey
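For patent 11756570 (and publications 20220310113 and 20230386502), the on-screen decision compares each estimated source's audio embedding with a video embedding. The following is a minimal sketch, assuming that the comparison is a cosine similarity against a fixed threshold and that the output waveform is simply the sum of the accepted sources. In the patent both steps are carried out by a trained neural network; the function name and the `threshold` value here are hypothetical.

```python
import numpy as np


def select_on_screen_sources(source_waveforms, audio_embeddings, video_embedding, threshold=0.5):
    """Hypothetical stand-in for the learned audio-visual matching step.

    source_waveforms: (num_sources, num_samples) separated source estimates
    audio_embeddings: (num_sources, dim), one embedding per estimated source
    video_embedding:  (dim,), one embedding for the video frames
    """
    # Cosine similarity between every audio embedding and the video embedding.
    a = audio_embeddings / np.linalg.norm(audio_embeddings, axis=1, keepdims=True)
    v = video_embedding / np.linalg.norm(video_embedding)
    similarity = a @ v
    # Sources similar enough to the video are treated as on-screen.
    on_screen = similarity > threshold
    # Remix only the on-screen sources into a new waveform.
    return source_waveforms[on_screen].sum(axis=0), on_screen


sources = np.array([[1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [0.5, 0.0, -0.5]])
audio_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
mix, mask = select_on_screen_sources(sources, audio_emb, np.array([1.0, 0.0]))
# mask -> [True, False, True]; mix -> [1.5, 1.0, 0.5]
```

A hard threshold keeps the sketch readable; a trained model would typically output a soft on-screen probability per source instead.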
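Patent 11133011 (publication 20180261225) places a beamformer between trained mask-estimation networks and an encoder-decoder recognizer. A common way to turn speech and noise masks into a beamformer is a mask-based MVDR filter. The sketch below assumes that formulation (the Souden MVDR solution) and replaces the patent's trained reference-channel selector with a simple heuristic that keeps the reference channel giving the highest expected output SNR. The masks are taken as given, and all function names are hypothetical.

```python
import numpy as np


def spatial_covariance(X, mask):
    """Mask-weighted spatial covariance per frequency.

    X: (channels, frames, freqs) complex STFT; mask: (frames, freqs) in [0, 1].
    Returns an array of shape (freqs, channels, channels).
    """
    Xf = X.transpose(2, 0, 1)                  # (F, C, T)
    m = mask.T[:, None, :]                     # (F, 1, T)
    phi = np.einsum("fct,fdt->fcd", m * Xf, Xf.conj())
    return phi / np.maximum(mask.sum(axis=0), 1e-8)[:, None, None]


def mvdr_weights(phi_s, phi_n, ref):
    """Souden-style MVDR: w = (Phi_N^-1 Phi_S) u_ref / trace(Phi_N^-1 Phi_S)."""
    ratio = np.linalg.solve(phi_n, phi_s)      # batched over frequency
    return ratio[:, :, ref] / np.trace(ratio, axis1=1, axis2=2)[:, None]


def beamform(X, speech_mask, noise_mask, eps=1e-6):
    """Beamform with every candidate reference channel and keep the best one."""
    C = X.shape[0]
    phi_s = spatial_covariance(X, speech_mask)
    phi_n = spatial_covariance(X, noise_mask) + eps * np.eye(C)  # regularize
    best_ref, best_snr, best_w = 0, -np.inf, None
    for ref in range(C):
        w = mvdr_weights(phi_s, phi_n, ref)    # (F, C)
        # Expected speech and noise power at the beamformer output.
        s = np.einsum("fc,fcd,fd->", w.conj(), phi_s, w).real
        n = np.einsum("fc,fcd,fd->", w.conj(), phi_n, w).real
        if s / n > best_snr:
            best_ref, best_snr, best_w = ref, s / n, w
    # Apply w^H x at every time-frequency point.
    Y = np.einsum("fc,ctf->tf", best_w.conj(), X)   # (frames, freqs)
    return Y, best_ref


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 40, 5)) + 1j * rng.standard_normal((3, 40, 5))
speech_mask = rng.random((40, 5))
Y, ref = beamform(X, speech_mask, 1.0 - speech_mask)   # Y has shape (40, 5)
```

The SNR-based choice makes reference selection explicit and deterministic; in the patent this choice is learned jointly with the rest of the network.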
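Patent 10529349 (publication 20190318754) describes a spectrogram refinement module based on an iterative reconstruction algorithm, but the abstract does not name the algorithm. As an illustration only, the sketch below uses classic Griffin-Lim phase reconstruction, which alternates between imposing a target magnitude and re-analyzing a time-domain signal to obtain a consistent STFT. It assumes SciPy's `scipy.signal.stft` and `istft` with their default Hann window and 50% overlap, and it handles a single source rather than the set of target signals in the patent.

```python
import numpy as np
from scipy.signal import istft, stft


def _fit_length(x, n_samples):
    """Trim or zero-pad a signal to exactly n_samples."""
    if len(x) >= n_samples:
        return x[:n_samples]
    return np.pad(x, (0, n_samples - len(x)))


def griffin_lim(magnitude, n_samples, nperseg=256, n_iter=32, seed=0):
    """Estimate a time signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase; a separation network could supply a better initial phase.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        # Impose the target magnitude and go back to the time domain...
        _, x = istft(magnitude * phase, nperseg=nperseg)
        x = _fit_length(x, n_samples)
        # ...then re-analyze and keep only the phase of the consistent STFT.
        _, _, spec = stft(x, nperseg=nperseg)
        phase = np.exp(1j * np.angle(spec))
    _, x = istft(magnitude * phase, nperseg=nperseg)
    return _fit_length(x, n_samples)


t = np.arange(4000) / 8000.0
signal = np.sin(2 * np.pi * 440.0 * t)
_, _, target = stft(signal, nperseg=256)
estimate = griffin_lim(np.abs(target), len(signal))
```

Trimming each intermediate signal to the original length keeps the re-analyzed STFT the same shape as the target magnitude.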
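Patent 9837075 conditions the probability of each child state on its parent in a hierarchy of dialog states. The sketch below shows that factorization, assuming the conditional probabilities are already known. It uses a hypothetical confirm-or-execute rule in place of the patent's action-selection policy, and the state names are invented for illustration.

```python
def leaf_beliefs(tree, prior=1.0, path=()):
    """Marginal probability of every leaf state in a hierarchy.

    tree maps a state name to (P(state | parent), subtree), where subtree is a
    dict of the same form or None for a leaf.
    """
    beliefs = {}
    for state, (p_given_parent, subtree) in tree.items():
        # P(child) = P(child | parent) * P(parent)
        p = prior * p_given_parent
        if subtree:
            beliefs.update(leaf_beliefs(subtree, p, path + (state,)))
        else:
            beliefs[path + (state,)] = p
    return beliefs


def select_action(tree, execute_threshold=0.5):
    """Hypothetical policy: act on the best leaf if confident, otherwise confirm it."""
    beliefs = leaf_beliefs(tree)
    best = max(beliefs, key=beliefs.get)
    return ("execute" if beliefs[best] >= execute_threshold else "confirm", best)


belief_tree = {
    "navigate": (0.7, {"home": (0.6, None), "work": (0.4, None)}),
    "music": (0.3, {"play": (0.9, None), "stop": (0.1, None)}),
}
print(select_action(belief_tree))  # ('confirm', ('navigate', 'home'))
```

The best leaf has probability 0.7 × 0.6 = 0.42, below the 0.5 threshold, so the sketch asks for confirmation rather than acting.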
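Patent 9679559 (publication 20150348537) separates the roles of an analysis basis, used to compute a representation of the mixture, and a reconstruction basis, used to rebuild the sources. The sketch below assumes a non-negative matrix factorization setting with a Euclidean cost, multiplicative updates for the activations, and a Wiener-style mask for reconstruction. The bases are random placeholders rather than trained models, and the helper names are hypothetical.

```python
import numpy as np


def infer_activations(V, W_analysis, n_iter=200, eps=1e-12):
    """Non-negative activations H minimizing ||V - W_analysis @ H||_F^2 (W fixed)."""
    H = np.ones((W_analysis.shape[1], V.shape[1]))
    for _ in range(n_iter):
        # Multiplicative update keeps H non-negative.
        H *= (W_analysis.T @ V) / (W_analysis.T @ W_analysis @ H + eps)
    return H


def separate(V, W_analysis, W_recon, source_of_basis, eps=1e-12):
    """Analysis basis -> activations; reconstruction basis -> per-source estimates.

    source_of_basis[k] is the index of the source that basis column k models.
    """
    H = infer_activations(V, W_analysis)
    model = W_recon @ H + eps
    estimates = []
    for s in np.unique(source_of_basis):
        cols = source_of_basis == s
        # Soft mask: share of the reconstructed mixture explained by source s.
        mask = (W_recon[:, cols] @ H[cols]) / model
        estimates.append(mask * V)
    return np.stack(estimates)


rng = np.random.default_rng(0)
V = rng.random((64, 100))            # magnitude spectrogram of the mixture
W_a = rng.random((64, 20))           # placeholder analysis basis
W_r = rng.random((64, 20))           # placeholder reconstruction basis
owner = np.repeat([0, 1], 10)        # columns 0-9 model source 0, 10-19 source 1
S = separate(V, W_a, W_r, owner)     # shape (2, 64, 100)
```

Because the soft masks sum to one, the source estimates add back up to the mixture; training would tune `W_a` and `W_r` so that each share matches its true source.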
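Patent 9661414 (publication 20160366511) forms a spatial filter purely from the lengths of acoustic pathways that lead from openings in the substrate to a single transducer. The sketch below models this as a delay-and-sum arrangement: a plane wave from a given angle reaches each opening with a position-dependent delay, and each pathway adds a further delay proportional to its length. It assumes a straight row of openings, plane-wave arrival, lossless pathways, and a 343 m/s speed of sound; the patent's actual geometry may differ. The function names and the `base_length` value are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly room temperature)


def pathway_lengths(openings, steer_deg, base_length=0.05):
    """Choose pathway lengths so sound from steer_deg reaches the transducer in phase.

    openings: x-positions (m) of the pathway openings on the front surface.
    Openings the wave reaches later get correspondingly shorter pathways.
    """
    extra = openings * np.sin(np.radians(steer_deg))  # path difference in front of the surface
    return base_length - extra


def response(openings, lengths, angle_deg, freq_hz):
    """Magnitude of the summed signal at the transducer (1.0 = fully coherent)."""
    delays = (openings * np.sin(np.radians(angle_deg)) + lengths) / SPEED_OF_SOUND
    phasors = np.exp(-2j * np.pi * freq_hz * delays)
    return np.abs(phasors.mean())


openings = np.arange(8) * 0.01                        # 8 openings, 1 cm apart
lengths = pathway_lengths(openings, steer_deg=30.0)
on_target = response(openings, lengths, 30.0, 4000.0)   # 1.0 up to rounding
off_target = response(openings, lengths, -30.0, 4000.0)  # roughly 0.07 at 4 kHz
```

By reciprocity, the same pathway lengths shape the radiation pattern when the transducer is driven as a speaker.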
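Patent 9477895 makes features cycle-relative so that a classifier sees every cycle of the periodic background noise in the same frame of reference. The sketch below assumes that the cycle length is known in samples. It uses log energies in equal phase bins as the fixed-dimension features, the per-bin median across cycles as the background estimate, and a simple threshold standing in for the trained classifier. All names and the threshold value are hypothetical.

```python
import numpy as np


def cycle_features(x, cycle_len, n_bins=32):
    """Log energy in n_bins equal phase bins of each cycle (requires cycle_len >= n_bins)."""
    n_cycles = len(x) // cycle_len
    cycles = x[: n_cycles * cycle_len].reshape(n_cycles, cycle_len)
    edges = np.linspace(0, cycle_len, n_bins + 1).astype(int)
    # Bin timing is relative to the start of each cycle, so every row is aligned by phase.
    return np.stack(
        [np.log(np.mean(cycles[:, a:b] ** 2, axis=1) + 1e-12)
         for a, b in zip(edges[:-1], edges[1:])],
        axis=1,
    )  # (n_cycles, n_bins)


def detect_events(x, cycle_len, n_bins=32, threshold=3.0):
    feats = cycle_features(x, cycle_len, n_bins)
    background = np.median(feats, axis=0)       # per-phase cyclostationary noise estimate
    normalized = feats - background             # normalization in the log domain
    return normalized.max(axis=1) > threshold   # stand-in for a trained classifier


rng = np.random.default_rng(0)
cycle_len, n_cycles = 320, 10
x = 0.1 * rng.standard_normal(cycle_len * n_cycles)          # low-level noise
for c in range(n_cycles):                                      # loud burst at the same phase every cycle
    x[c * cycle_len + 50 : c * cycle_len + 80] += rng.standard_normal(30)
x[6 * cycle_len + 200 : 6 * cycle_len + 230] += 2.0 * rng.standard_normal(30)  # one-off event
events = detect_events(x, cycle_len)
```

The loud periodic burst is absorbed into the per-phase background, so only cycle 6, which contains the one-off event, should be flagged.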
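Patent 9251436 applies sparsity-based support estimation under a reverberation model. One generic way to estimate a sparse support is orthogonal matching pursuit (OMP) over a dictionary whose columns model the signal each candidate location would produce at the sensors. The sketch below uses OMP as a swapped-in solver, not necessarily the patent's algorithm, with a random orthonormal dictionary as a hypothetical placeholder for one built from a reverberation model.

```python
import numpy as np


def omp(A, y, n_sources):
    """Orthogonal matching pursuit.

    A: (num_measurements, num_candidate_locations) dictionary
    y: (num_measurements,) observed sensor data
    Returns the selected column indices and their least-squares amplitudes.
    """
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_sources):
        # Candidate location most correlated with what is still unexplained.
        k = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(k)
        # Re-fit all selected amplitudes jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    return support, coef


rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((30, 20)))   # placeholder dictionary
y = 1.0 * A[:, 3] - 0.8 * A[:, 11]                   # two active "locations"
support, amplitudes = omp(A, y, n_sources=2)         # support -> [3, 11]
```

With an orthonormal dictionary, recovery is exact; dictionaries derived from room models are usually overcomplete and highly correlated, and there greedy recovery can fail.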
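Patent 9159317 updates the scores of competing recognition hypotheses once the user supplies a word-sequence constraint. The sketch below assumes that hypotheses are word lists with log-domain scores and that the constraint is a set of words that must appear. The fixed penalty for inconsistent hypotheses is a hypothetical choice, not the patent's scoring rule.

```python
def rescore(nbest, required_words, penalty=5.0):
    """Lower the score of every interpretation that violates the constraint.

    nbest: list of (words, log_score) pairs from the acoustic and language models.
    """
    updated = []
    for words, score in nbest:
        consistent = all(w in words for w in required_words)
        updated.append((words, score if consistent else score - penalty))
    # Best-scoring interpretation first.
    return sorted(updated, key=lambda item: item[1], reverse=True)


nbest = [(["call", "john", "smith"], -10.0), (["call", "jon", "smyth"], -10.5)]
best_words, best_score = rescore(nbest, {"smyth"})[0]
# best_words -> ['call', 'jon', 'smyth']; best_score -> -10.5
```

The constraint reverses the original ranking: the hypothesis that was second-best acoustically becomes the top interpretation because it is the only one consistent with the user's input.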