Patents by Inventor Tara N. Sainath

Tara N. Sainath has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Linear transformation for speech recognition modeling

Patent number: 10714078

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.

Type: Grant

Filed: October 26, 2018

Date of Patent: July 14, 2020

Assignee: Google LLC

Inventors: Samuel Bengio, Mirkó Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran

MULTI-DIALECT AND MULTILINGUAL SPEECH RECOGNITION

Publication number: 20200160836

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

Type: Application

Filed: November 14, 2019

Publication date: May 21, 2020

Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen

COMPRESSED RECURRENT NEURAL NETWORK MODELS

Publication number: 20200134470

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.

Type: Application

Filed: December 23, 2019

Publication date: April 30, 2020

Inventors: Tara N. Sainath, Vikas Sindhwani

CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS

Publication number: 20200135227

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.

Type: Application

Filed: December 31, 2019

Publication date: April 30, 2020

Inventors: Tara N. Sainath, Andrew W. Senior, Oriol Vinyals, Hasim Sak

ADAPTIVE AUDIO ENHANCEMENT FOR MULTICHANNEL SPEECH RECOGNITION

Publication number: 20200118553

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

Type: Application

Filed: December 10, 2019

Publication date: April 16, 2020

Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson

LEARNING FRONT-END SPEECH RECOGNITION PARAMETERS WITHIN NEURAL NETWORK TRAINING

Publication number: 20200058296

Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.

Type: Application

Filed: July 23, 2019

Publication date: February 20, 2020

Applicant: Nuance Communications, Inc.

Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran

CONVOLUTIONAL NEURAL NETWORKS

Publication number: 20200051551

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for keyword spotting. One of the methods includes training, by a keyword detection system, a convolutional neural network for keyword detection by providing a two-dimensional set of input values to the convolutional neural network, the input values including a first dimension in time and a second dimension in frequency, and performing convolutional multiplication on the two-dimensional set of input values for a filter using a frequency stride greater than one to generate a feature map.

Type: Application

Filed: October 16, 2019

Publication date: February 13, 2020

Applicant: Google LLC

Inventors: Tara N. Sainath, Maria Carolina Parada San Martin

MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20200043483

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.

Type: Application

Filed: August 1, 2019

Publication date: February 6, 2020

Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan

SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20200027444

Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Application

Filed: July 19, 2019

Publication date: January 23, 2020

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A.U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen

Adaptive audio enhancement for multichannel speech recognition

Patent number: 10515626

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.

Type: Grant

Filed: December 20, 2017

Date of Patent: December 24, 2019

Assignee: Google LLC

Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson

Compressed recurrent neural network models

Patent number: 10515307

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long-short term memory layers with compressed gating functions. One of the systems includes a first long short-term memory (LSTM) layer, wherein the first LSTM layer is configured to, for each of the plurality of time steps, generate a new layer state and a new layer output by applying a plurality of gates to a current layer input, a current layer state, and a current layer output, each of the plurality of gates being configured to, for each of the plurality of time steps, generate a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the plurality of gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix.

Type: Grant

Filed: June 3, 2016

Date of Patent: December 24, 2019

Assignee: Google LLC

Inventors: Tara N. Sainath, Vikas Sindhwani

PROCESSING AUDIO WAVEFORMS

Publication number: 20190378498

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

Type: Application

Filed: August 15, 2019

Publication date: December 12, 2019

Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson

Processing audio waveforms

Patent number: 10403269

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.

Type: Grant

Filed: March 25, 2016

Date of Patent: September 3, 2019

Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson

ENHANCED MULTI-CHANNEL ACOUSTIC MODELS

Publication number: 20190259409

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

Type: Application

Filed: February 19, 2019

Publication date: August 22, 2019

Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

Learning front-end speech recognition parameters within neural network training

Patent number: 10360901

Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.

Type: Grant

Filed: December 5, 2014

Date of Patent: July 23, 2019

Assignee: Nuance Communications, Inc.

Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran

Multichannel raw-waveform neural networks

Patent number: 10339921

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

Type: Grant

Filed: January 4, 2016

Date of Patent: July 2, 2019

Assignee: Google LLC

Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson

COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING

Publication number: 20190115013

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.

Type: Application

Filed: October 26, 2018

Publication date: April 18, 2019

Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A.U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran

Voice activity detection

Patent number: 10229700

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform, processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech, and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.

Type: Grant

Filed: January 4, 2016

Date of Patent: March 12, 2019

Assignee: Google LLC

Inventors: Tara N. Sainath, Gabor Simko, Maria Carolina Parada San Martin, Ruben Zazo Candil

Enhanced multi-channel acoustic models

Patent number: 10224058

Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.

Type: Grant

Filed: November 14, 2016

Date of Patent: March 5, 2019

Assignee: Google LLC

Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan

System and method for generating content corresponding to an event

Patent number: 10180974

Abstract: Systems and methods for generating content corresponding to an event are provided. A method for generating content corresponding to an event, comprises defining a plurality of sub-events of the event, classifying one or more actual occurrences in the event into one or more of the sub-events, monitoring behavior of one or more users to determine areas of the event of interest to the one or more users, linking the one or more users to the one or more classified actual occurrences based on the areas of the event of interest, and generating content for the one or more classified actual occurrences.

Type: Grant

Filed: September 16, 2014

Date of Patent: January 15, 2019

Assignee: International Business Machines Corporation

Inventors: Aleksandr Y. Aravkin, Carlos H. Cardonha, Sasha P. Caskey, Dimitri Kanevsky, Tara N. Sainath
