Patents by Inventor Ron J. Weiss
Ron J. Weiss has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190332919
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence including a respective output at each of multiple output time steps from respective encoded representations of inputs in an input sequence. The method includes, for each output time step, starting from the position, in the input order, of the encoded representation that was selected as a preceding context vector at a preceding output time step, traversing the encoded representations until an encoded representation is selected as a current context vector at the output time step. A decoder recurrent neural network processes the current context vector and a preceding output at the preceding output time step to generate a respective output score for each possible output and to update its hidden state. An output is selected for the output time step using the output scores.
Type: Application
Filed: July 8, 2019
Publication date: October 31, 2019
Inventors: Ron J. Weiss, Thang Minh Luong, Peter J. Liu, Colin Abraham Raffel, Douglas Eck
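The monotonic traversal this abstract describes can be sketched as follows. This is an illustrative toy, not the patented implementation: `monotonic_attend` and the `select_prob` helper are assumed names, and a fixed 0.5 threshold stands in for the learned selection mechanism.

```python
import numpy as np

def monotonic_attend(encoded, prev_pos, select_prob):
    """Scan forward from the previously attended position until a
    step is selected; return (context_vector, position).
    select_prob maps an encoded state to a probability in [0, 1]
    (hypothetical stand-in for a learned scorer)."""
    for i in range(prev_pos, len(encoded)):
        if select_prob(encoded[i]) >= 0.5:
            return encoded[i], i
    # If nothing is selected, fall back to the final state.
    return encoded[-1], len(encoded) - 1

# Toy example: "select" whenever the first feature is positive.
enc = np.array([[-1.0, 0.2], [-0.3, 0.5], [0.8, 0.1], [0.9, 0.4]])
ctx, pos = monotonic_attend(enc, 1, lambda h: 1.0 if h[0] > 0 else 0.0)
```

Because each step resumes from the previous position, attention can only move forward through the input, which is the property the claim turns on.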
-
Publication number: 20190311708
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
Type: Application
Filed: June 20, 2019
Publication date: October 10, 2019
Inventors: Samy Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
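A minimal shape-level sketch of the character-to-spectrogram mapping described above, assuming a toy recurrent cell with random weights; the function name, dimensions, and weights are all illustrative, and a real system would use a trained sequence-to-sequence model with attention.

```python
import numpy as np

rng = np.random.default_rng(0)

def char_to_spectrogram(char_ids, vocab=30, emb=8, hidden=16, mel_bins=80):
    """Toy sketch: embed each character, run a simple recurrent pass,
    and project each hidden state to one mel-spectrogram frame.
    Random weights; the shapes are the point, not the output quality."""
    E = rng.normal(size=(vocab, emb))        # character embedding table
    Wx = rng.normal(size=(emb, hidden))
    Wh = rng.normal(size=(hidden, hidden))
    Wo = rng.normal(size=(hidden, mel_bins))
    h = np.zeros(hidden)
    frames = []
    for c in char_ids:
        h = np.tanh(E[c] @ Wx + h @ Wh)      # recurrent state update
        frames.append(h @ Wo)                # project to a mel frame
    return np.stack(frames)                  # (time, mel_bins)

spec = char_to_spectrogram([3, 7, 1, 4])
```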
-
Patent number: 10403269
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
Type: Grant
Filed: March 25, 2016
Date of Patent: September 3, 2019
Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
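The layer stack named in the abstract (frequency convolution, memory layer, hidden layers) can be sketched as below. All weights are random stand-ins for trained parameters, and `cldnn_forward` is an assumed name; this only illustrates the data flow, not the patented model.

```python
import numpy as np

rng = np.random.default_rng(1)

def cldnn_forward(tf_feats, kernel=9, hidden=32, outputs=41):
    """Sketch of the described stack: convolve over the frequency
    axis of each frame, run a recurrent 'memory' layer over time,
    then project to per-frame output scores."""
    T, F = tf_feats.shape
    k = rng.normal(size=kernel)
    # Frequency convolution applied independently to each time frame.
    conv = np.stack([np.convolve(frame, k, mode="valid")
                     for frame in tf_feats])
    Wx = rng.normal(size=(conv.shape[1], hidden))
    Wh = rng.normal(size=(hidden, hidden))
    Wo = rng.normal(size=(hidden, outputs))
    h = np.zeros(hidden)
    scores = []
    for t in range(T):
        h = np.tanh(conv[t] @ Wx + h @ Wh)   # memory layer over time
        scores.append(h @ Wo)                # hidden -> output scores
    return np.stack(scores)                  # (time, outputs)

out = cldnn_forward(rng.normal(size=(20, 40)))
```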
-
Publication number: 20190259409
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Type: Application
Filed: February 19, 2019
Publication date: August 22, 2019
Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
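The two-stage structure above (a spatial filtering layer over both raw channels, then a spectral filtering layer over its output) can be sketched as follows. Random filters stand in for learned parameters, filter lengths are arbitrary, and `spatial_then_spectral` is an assumed name; the real method also operates on frequency-domain data, which this time-domain toy omits.

```python
import numpy as np

rng = np.random.default_rng(2)

def spatial_then_spectral(sig1, sig2, n_spatial=4, spec_len=16):
    """Sketch: a spatial filtering stage combines the two channels
    with a short filter pair per look direction, then a spectral
    filtering stage convolves each spatial output with a longer
    filter. Filters are random stand-ins for learned parameters."""
    f1 = rng.normal(size=(n_spatial, 5))
    f2 = rng.normal(size=(n_spatial, 5))
    spatial = [np.convolve(sig1, f1[p], mode="same") +
               np.convolve(sig2, f2[p], mode="same")
               for p in range(n_spatial)]
    g = rng.normal(size=(n_spatial, spec_len))
    spectral = [np.convolve(s, gp, mode="same")
                for s, gp in zip(spatial, g)]
    return np.stack(spectral)  # (n_spatial, time)

out = spatial_then_spectral(rng.normal(size=100), rng.normal(size=100))
```

Factoring the front end this way lets the short spatial filters handle inter-channel timing while the longer spectral filters model frequency content.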
-
Patent number: 10339921
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Type: Grant
Filed: January 4, 2016
Date of Patent: July 2, 2019
Assignee: Google LLC
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson
-
Patent number: 10224058
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Type: Grant
Filed: November 14, 2016
Date of Patent: March 5, 2019
Assignee: Google LLC
Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
-
Publication number: 20180357312
Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
Type: Application
Filed: August 20, 2018
Publication date: December 13, 2018
Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
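The seed-track selection described above can be sketched as a nearest-neighbor filter over per-track feature vectors. Everything here is an assumption for illustration: `generate_playlist`, the Euclidean distance, and the `max_dist` threshold stand in for whatever "constructs" and range test the patent actually uses.

```python
import numpy as np

def generate_playlist(library, seed_id, max_dist=1.0, length=5):
    """library maps track id -> feature vector derived from the
    track's frequency representation (e.g. a spectral summary).
    Tracks whose distance to the seed's vector falls within
    max_dist are candidates; the closest ones form the playlist."""
    seed = library[seed_id]
    dists = {tid: float(np.linalg.norm(vec - seed))
             for tid, vec in library.items() if tid != seed_id}
    within = [tid for tid, d in sorted(dists.items(), key=lambda kv: kv[1])
              if d <= max_dist]
    return within[:length]

# Toy 2-D "constructs": a and b are near the seed, c is far away.
lib = {"seed": np.array([0.0, 0.0]),
       "a": np.array([0.1, 0.1]),
       "b": np.array([0.5, 0.0]),
       "c": np.array([3.0, 3.0])}
playlist = generate_playlist(lib, "seed")
```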
-
Patent number: 10055493
Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
Type: Grant
Filed: May 9, 2011
Date of Patent: August 21, 2018
Assignee: Google LLC
Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
-
Publication number: 20180197534
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
Type: Application
Filed: December 20, 2017
Publication date: July 12, 2018
Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
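The adaptive step above, predicting a fresh set of filter parameters for each channel from both channels and then collapsing to one combined channel, can be sketched as follows. The statistics-based "predictor" and random projection weights are loud assumptions standing in for the filter-prediction network; `adaptive_beamform` is an assumed name.

```python
import numpy as np

rng = np.random.default_rng(3)

def adaptive_beamform(ch1, ch2, filt_len=8):
    """Sketch: derive per-utterance filter taps for each channel
    from simple statistics of both channels (a stand-in for the
    learned filter-prediction network), filter each channel, and
    sum into one combined channel for the recognizer."""
    stats = np.array([ch1.mean(), ch1.std(), ch2.mean(), ch2.std()])
    W1 = rng.normal(size=(4, filt_len))
    W2 = rng.normal(size=(4, filt_len))
    h1 = np.tanh(stats @ W1)   # predicted taps for channel 1
    h2 = np.tanh(stats @ W2)   # predicted taps for channel 2
    return (np.convolve(ch1, h1, mode="same") +
            np.convolve(ch2, h2, mode="same"))

combined = adaptive_beamform(rng.normal(size=200), rng.normal(size=200))
```

The key design point is that the filter taps are recomputed per utterance rather than fixed, so the beamformer adapts to the current acoustic conditions.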
-
Publication number: 20180068675
Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Type: Application
Filed: November 14, 2016
Publication date: March 8, 2018
Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
-
Patent number: 9886949
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
Type: Grant
Filed: December 28, 2016
Date of Patent: February 6, 2018
Assignee: Google Inc.
Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
-
Publication number: 20170278513
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
Type: Application
Filed: December 28, 2016
Publication date: September 28, 2017
Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
-
Patent number: 9697826
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription that is determined for the utterance.
Type: Grant
Filed: July 8, 2016
Date of Patent: July 4, 2017
Assignee: Google Inc.
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
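The convolve-then-combine front end described above can be sketched as follows: every filter is convolved with every channel in the time domain, and each filter's outputs are summed across channels. Random filters stand in for the jointly trained parameters, and `multichannel_conv` is an assumed name; the downstream deep neural network is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)

def multichannel_conv(channels, n_filters=3, filt_len=10):
    """Sketch: convolve each (jointly trained) filter with each
    channel of raw waveform data, then combine each filter's
    outputs across channels by summation, yielding one feature
    sequence per filter for the acoustic model."""
    filters = rng.normal(size=(n_filters, filt_len))
    out = []
    for f in filters:
        per_ch = [np.convolve(ch, f, mode="same") for ch in channels]
        out.append(np.sum(per_ch, axis=0))  # combine across channels
    return np.stack(out)                    # (n_filters, time)

feats = multichannel_conv([rng.normal(size=50), rng.normal(size=50)])
```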
-
Publication number: 20170092265
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
Type: Application
Filed: January 4, 2016
Publication date: March 30, 2017
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson
-
Publication number: 20160322055
Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription that is determined for the utterance.
Type: Application
Filed: July 8, 2016
Publication date: November 3, 2016
Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A.U. Bacchiani
-
Publication number: 20160284347
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
Type: Application
Filed: March 25, 2016
Publication date: September 29, 2016
Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
-
Publication number: 20120290621
Abstract: Generating a playlist may include designating a seed track in an audio library; identifying audio tracks in the audio library having constructs that are within a range of a corresponding construct of the seed track, where the constructs for the audio tracks are derived from frequency representations of the audio tracks, and the corresponding construct for the seed track is derived from a frequency representation of the seed track; and generating the playlist using at least some of the audio tracks that were identified.
Type: Application
Filed: May 9, 2011
Publication date: November 15, 2012
Inventors: Geremy A. Heitz, III, Adam Berenzweig, Jason E. Weston, Ron J. Weiss, Sally A. Goldman, Thomas Walters, Samy Bengio, Douglas Eck, Jay M. Ponte, Ryan M. Rifkin
-
Patent number: 8131543
Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal, determining an energy-independent component of a portion of the audio signal associated with a spectral shape of the portion, and determining an energy-dependent component of the portion associated with a gain level of the portion. The method also comprises comparing the energy-independent and energy-dependent components to a speech model, comparing the energy-independent and energy-dependent components to a noise model, and outputting an indication whether the portion of the audio signal more closely corresponds to the speech model or to the noise model based on the comparisons.
Type: Grant
Filed: April 14, 2008
Date of Patent: March 6, 2012
Assignee: Google Inc.
Inventors: Ron J. Weiss, Trausti Kristjansson
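The decomposition described above can be sketched as follows: split a frame's log-spectrum into a gain (energy-dependent) component and a gain-normalized shape (energy-independent) component, then score both against each model. The template models, distance-based scoring, and `classify_frame` name are illustrative assumptions; the patent's actual speech and noise models are more sophisticated.

```python
import numpy as np

def classify_frame(frame_spectrum, speech_model, noise_model):
    """Sketch: decompose a log-spectral frame into a gain level
    (energy-dependent) and a normalized spectral shape
    (energy-independent), score both against per-model templates,
    and report which model fits better. Each model here is a
    (shape_template, gain_template) pair."""
    gain = frame_spectrum.mean()          # energy-dependent component
    shape = frame_spectrum - gain         # energy-independent component
    def score(model):
        m_shape, m_gain = model
        return np.linalg.norm(shape - m_shape) + abs(gain - m_gain)
    return "speech" if score(speech_model) < score(noise_model) else "noise"

# Toy models: speech has a tilted shape and high gain, noise is flat.
speech = (np.array([1.0, -1.0, 0.0]), 2.0)
noise = (np.array([0.0, 0.0, 0.0]), -1.0)
label = classify_frame(np.array([3.0, 1.0, 2.0]), speech, noise)
```

Separating shape from gain lets the detector match a frame's spectral character independently of how loud it happens to be.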