Patents by Inventor Ron J. Weiss

Ron J. Weiss has filed for patents to protect the following inventions. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190332919
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence that includes a respective output at each of multiple output time steps from encoded representations of the inputs in an input sequence. For each output time step, the method starts from the position, in the input order, of the encoded representation that was selected as the context vector at the preceding output time step, and traverses the encoded representations until one is selected as the current context vector for the output time step. A decoder recurrent neural network processes the current context vector and the output from the preceding time step to generate a respective score for each possible output and to update its hidden state. An output for the time step is then selected using these scores.
    Type: Application
    Filed: July 8, 2019
    Publication date: October 31, 2019
    Inventors: Ron J. Weiss, Thang Minh Luong, Peter J. Liu, Colin Abraham Raffel, Douglas Eck
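The hard monotonic selection step described in this abstract — resume scanning from the previously attended encoder position and stop at the first step that is selected — can be sketched as follows. This is a minimal illustration, not the patented method: the function name and the fixed-threshold selection rule are assumptions (the patent describes selection driven by learned scores).

```python
def monotonic_select(scores, prev_pos, threshold=0.5):
    """Scan forward from the previously selected encoder position and
    return the index of the first step whose selection score passes the
    threshold; fall back to the final step if none does."""
    for i in range(prev_pos, len(scores)):
        if scores[i] >= threshold:
            return i
    return len(scores) - 1


# Because scanning always resumes at prev_pos, attention can never move
# backwards -- the property that makes this decoding order monotonic.
```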
  • Publication number: 20190311708
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Application
    Filed: June 20, 2019
    Publication date: October 10, 2019
    Inventors: Samy Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
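The subsystem wiring described here — accept a character sequence, pass it to a sequence-to-sequence network, and receive a spectrogram back — can be sketched with a stub in place of the trained network. Everything below is illustrative: `FakeSeq2Seq` simply emits one all-zero 80-bin frame per input character, whereas the patented system uses a trained recurrent network.

```python
def chars_to_ids(text, vocab):
    """Map each character to an integer id using the given vocabulary."""
    return [vocab[c] for c in text]


class FakeSeq2Seq:
    """Stand-in for the sequence-to-sequence network: returns a
    spectrogram-shaped output (frames x mel bins) of zeros."""
    def __init__(self, n_mels=80):
        self.n_mels = n_mels

    def __call__(self, ids):
        return [[0.0] * self.n_mels for _ in ids]


def synthesize(text):
    """Subsystem sketch: characters in, spectrogram of the utterance out."""
    vocab = {c: i for i, c in enumerate(sorted(set(text)))}
    ids = chars_to_ids(text, vocab)
    return FakeSeq2Seq()(ids)
```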
  • Patent number: 10403269
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: September 3, 2019
    Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
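The frequency convolution layer named in this abstract slides a small kernel along the frequency axis of each time-frequency frame. A minimal sketch of that single stage (the memory layer and hidden layers are omitted, and the function name is illustrative):

```python
def freq_convolution(frame, kernel):
    """Valid 1-D convolution of a kernel along the frequency axis of one
    spectral frame (a list of per-frequency-bin values)."""
    k = len(kernel)
    return [sum(frame[i + j] * kernel[j] for j in range(k))
            for i in range(len(frame) - k + 1)]
```

In the patented acoustic model the kernel weights are learned during training; here they are supplied by the caller purely for illustration.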
  • Publication number: 20190259409
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Application
    Filed: February 19, 2019
    Publication date: August 22, 2019
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
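The spatial filtering layer described above applies a filter to each raw channel and sums the results into one signal, which the spectral filtering layer then processes in the frequency domain. A minimal sketch of the spatial (filter-and-sum) stage, with illustrative names and caller-supplied taps in place of learned parameters:

```python
def fir_filter(signal, taps):
    """Causal FIR filter: time-domain convolution of a signal with taps."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def spatial_filter(ch1, ch2, taps1, taps2):
    """Filter each raw channel with its own taps and sum the outputs --
    one 'look direction' of a spatial filtering layer."""
    f1 = fir_filter(ch1, taps1)
    f2 = fir_filter(ch2, taps2)
    return [a + b for a, b in zip(f1, f2)]
```

The spectral filtering stage, omitted here, would operate on a frequency-domain representation of this summed output.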
  • Patent number: 10339921
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: July 2, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson
  • Patent number: 10224058
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: March 5, 2019
    Assignee: Google LLC
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
  • Publication number: 20180357312
    Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
    Type: Application
    Filed: August 20, 2018
    Publication date: December 13, 2018
    Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
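The playlist-generation step described above — keep tracks whose constructs fall within a range of the seed track's construct — can be sketched with feature vectors and a distance radius. The vector representation and Euclidean distance are assumptions for illustration; the patent derives its constructs from frequency representations of the tracks.

```python
import math


def playlist_from_seed(library, seed_id, radius):
    """Return track ids whose feature vector lies within `radius` of the
    seed track's vector (the seed itself is excluded)."""
    seed = library[seed_id]
    return [tid for tid, vec in library.items()
            if tid != seed_id and math.dist(vec, seed) <= radius]
```

Usage: with `{"seed": [0, 0], "near": [1, 0], "far": [5, 5]}` and radius 2, only `"near"` survives the range test.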
  • Patent number: 10055493
    Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
    Type: Grant
    Filed: May 9, 2011
    Date of Patent: August 21, 2018
    Assignee: Google LLC
    Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
  • Publication number: 20180197534
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: December 20, 2017
    Publication date: July 12, 2018
    Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
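The adaptive-beamforming flow above generates filter parameters from the audio itself, applies the filters to each channel, and sums the results into a single combined channel. A toy sketch: relative-energy gains stand in for the filter-prediction network (the patented system generates multi-tap filter parameters with a neural network), so only the wiring is illustrated.

```python
def adaptive_beamform(ch1, ch2):
    """Predict per-channel gains from the input audio, apply them, and
    sum the two channels into one combined channel."""
    # Stand-in filter predictor: weight each channel by its relative
    # energy (a real system predicts full filter taps with an LSTM).
    e1 = sum(x * x for x in ch1)
    e2 = sum(x * x for x in ch2)
    total = (e1 + e2) or 1.0
    g1, g2 = e1 / total, e2 / total
    # Apply the predicted single-tap filters and sum into one channel.
    return [g1 * a + g2 * b for a, b in zip(ch1, ch2)]
```

The combined channel would then be fed to the recognition network, which produces the transcription.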
  • Publication number: 20180068675
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Application
    Filed: November 14, 2016
    Publication date: March 8, 2018
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
  • Patent number: 9886949
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: February 6, 2018
    Assignee: Google Inc.
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Publication number: 20170278513
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: December 28, 2016
    Publication date: September 28, 2017
    Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on the output of the deep neural network.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
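The per-filter convolve-then-combine step described above can be sketched as follows. The function name is illustrative, the filter taps are caller-supplied rather than jointly learned, and the combination is a plain sum across channels:

```python
def multichannel_conv(channels, filters):
    """For each filter, convolve every channel in the time domain and sum
    the per-channel outputs, yielding one feature sequence per filter."""
    def conv(sig, taps):
        k = len(taps)
        return [sum(sig[i + j] * taps[j] for j in range(k))
                for i in range(len(sig) - k + 1)]

    outputs = []
    for taps in filters:
        per_channel = [conv(ch, taps) for ch in channels]
        outputs.append([sum(vals) for vals in zip(*per_channel)])
    return outputs
```

In the patented method the taps are trained jointly with the deep neural network acoustic model that consumes these combined outputs.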
  • Publication number: 20170092265
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Application
    Filed: January 4, 2016
    Publication date: March 30, 2017
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson
  • Publication number: 20160322055
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on the output of the deep neural network.
    Type: Application
    Filed: July 8, 2016
    Publication date: November 3, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A.U. Bacchiani
  • Publication number: 20160284347
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
    Type: Application
    Filed: March 25, 2016
    Publication date: September 29, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
  • Publication number: 20120290621
    Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
    Type: Application
    Filed: May 9, 2011
    Publication date: November 15, 2012
    Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
  • Patent number: 8131543
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal, determining an energy-independent component of a portion of the audio signal associated with a spectral shape of the portion, and determining an energy-dependent component of the portion associated with a gain level of the portion. The method also comprises comparing the energy-independent and energy-dependent components to a speech model, comparing the energy-independent and energy-dependent components to a noise model, and outputting an indication whether the portion of the audio signal more closely corresponds to the speech model or to the noise model based on the comparisons.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: March 6, 2012
    Assignee: Google Inc.
    Inventors: Ron J. Weiss, Trausti Kristjansson
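The last entry's split into an energy-dependent gain and an energy-independent spectral shape, each compared against speech and noise models, can be sketched as below. The normalization, the distance-based scoring, and the (gain, shape) model templates are all assumptions for illustration; the patent leaves the model form to trained speech and noise models.

```python
def split_components(spectrum):
    """Separate a spectral frame into an energy-dependent gain (total
    energy) and an energy-independent shape (spectrum normalized to unit
    sum)."""
    gain = sum(spectrum)
    shape = [s / gain for s in spectrum] if gain else spectrum
    return gain, shape


def classify(spectrum, speech_model, noise_model):
    """Score the (gain, shape) pair against each model's template and
    report whichever model matches more closely (smaller distance)."""
    gain, shape = split_components(spectrum)

    def score(model):
        g, sh = model
        return abs(gain - g) + sum(abs(a - b) for a, b in zip(shape, sh))

    return "speech" if score(speech_model) <= score(noise_model) else "noise"
```

With a speech template of `(10.0, [0.7, 0.3])` and a noise template of `(1.0, [0.5, 0.5])`, a loud frame with a tilted spectrum scores closer to the speech model, while a quiet flat frame scores closer to the noise model.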