Patents by Inventor Andrew W. Senior

Andrew W. Senior has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9818409
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
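The per-step scoring and decoding described in this abstract can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the recurrent layers are omitted entirely, and the logits and the three context-dependent phoneme labels are invented. Only the softmax scoring and the per-step selection of the highest-scoring phoneme are shown.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def decode_phonemes(logits_per_step, phoneme_inventory):
    """Pick the highest-scoring context-dependent phoneme at each time step."""
    representation = []
    for logits in logits_per_step:
        scores = softmax(logits)
        best = max(range(len(scores)), key=scores.__getitem__)
        representation.append(phoneme_inventory[best])
    return representation

# Toy inventory of three context-dependent phonemes and two time steps;
# in the patent these logits would come from the recurrent layers.
inventory = ["k-ae+t", "ae-t+#", "t-#+#"]
logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.4]]
print(decode_phonemes(logits, inventory))  # best-scoring phoneme per step
```

The softmax here is only the output layer; the claim's substance is in the recurrent layers that produce the logits, which this sketch stubs out.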
  • Patent number: 9786270
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: October 10, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
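The teacher-student idea in this abstract, using a first network's output distributions as soft targets for a second, can be reduced to a toy gradient computation. Both networks are collapsed here to bare logit vectors, the CTC training of the teacher is assumed to have already happened, and all values and names are illustrative:

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [v / z for v in e]

def cross_entropy(student_logits, teacher_probs):
    """Soft-target cross-entropy between teacher distribution and student."""
    p = softmax(student_logits)
    return -sum(t * math.log(pi) for t, pi in zip(teacher_probs, p))

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on the student's logits.
    For softmax + cross-entropy, d(CE)/d(logit_i) = p_student_i - p_teacher_i."""
    p = softmax(student_logits)
    return [l - lr * (pi - ti)
            for l, pi, ti in zip(student_logits, p, teacher_probs)]

teacher = [0.7, 0.2, 0.1]   # teacher's output distribution for one frame
logits = [0.0, 0.0, 0.0]    # student starts uniform
before = cross_entropy(logits, teacher)
for _ in range(100):
    logits = distill_step(logits, teacher)
print(cross_entropy(logits, teacher) < before)  # student moved toward teacher
```

The real method trains a full second acoustic model over many utterances; this only demonstrates why matching the teacher's distributions is a well-defined training objective.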
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence representing an utterance and comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
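The recurrence in this claim, where each step's input is modified by the previous step's output, can be shown schematically. The acoustic modeling network is stubbed out as a trivial averaging function, and the frame values are toy data; the point is only the feedback wiring:

```python
def toy_network(inputs):
    """Stand-in for the acoustic modeling neural network (not the real model)."""
    return sum(inputs) / len(inputs)

def run_feedback_loop(feature_frames):
    outputs = []
    prev_output = None
    for t, features in enumerate(feature_frames):
        if t == 0:
            modified_input = features
        else:
            # Modified input: the previous step's output combined with
            # this step's acoustic features, as the claim describes.
            modified_input = [prev_output] + features
        prev_output = toy_network(modified_input)
        outputs.append(prev_output)
    return outputs

frames = [[1.0, 3.0], [2.0, 2.0], [0.0, 6.0]]
print(run_feedback_loop(frames))
```

A phoneme representation would then be derived from `outputs`; that decoding step is omitted here.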
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on output of the deep neural network.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
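The front end this abstract describes, convolving each filter with each channel in the time domain and then combining per filter across channels, can be sketched with toy data. The filters here are fixed example values rather than jointly learned parameters, combination is an elementwise sum (one plausible choice; the patent covers combining generally), and the downstream deep network is omitted:

```python
def convolve(signal, kernel):
    """Valid-mode 1-D convolution (kernel reversed, as in true convolution)."""
    k = list(reversed(kernel))
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * k[j] for j in range(len(k)))
            for i in range(n)]

def multichannel_front_end(channels, filters):
    combined = []
    for filt in filters:
        per_channel = [convolve(ch, filt) for ch in channels]
        # Combine the filter's outputs across channels by elementwise sum.
        combined.append([sum(vals) for vals in zip(*per_channel)])
    return combined

# Two toy waveform channels and two toy time-domain filters.
channels = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
filters = [[1.0, -1.0], [0.5, 0.5]]
print(multichannel_front_end(channels, filters))
```

In the patented system the filter parameters are learned jointly with the acoustic model during training, which is what distinguishes it from a fixed filterbank front end.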
  • Publication number: 20170186420
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Application
    Filed: March 9, 2017
    Publication date: June 29, 2017
    Inventors: Hasim Sak, Andrew W. Senior
  • Publication number: 20170103752
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for acoustic modeling of audio data. One method includes receiving audio data representing a portion of an utterance; providing the audio data to a trained recurrent neural network that has been trained to indicate the occurrence of a phone at any of multiple time frames within a predetermined maximum delay of receiving audio data corresponding to the phone; receiving, within the predetermined maximum delay of providing the audio data to the trained recurrent neural network, output of the trained neural network indicating a phone corresponding to the provided audio data; using the output of the trained neural network to determine a transcription for the utterance; and providing the transcription for the utterance.
    Type: Application
    Filed: October 9, 2015
    Publication date: April 13, 2017
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Patent number: 9620108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Grant
    Filed: December 2, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
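The "recurrent projected output" in this abstract refers to a projected LSTM architecture: the cell's hidden output is linearly projected to a lower-dimensional vector, and it is that projected vector which recurs and feeds the output layer. The sketch below reduces the LSTM gating to a single tanh cell and uses made-up toy weights; it only illustrates the dimensionality of the recurrence:

```python
import math

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def projected_recurrent_step(x, r_prev, w_in, w_rec, w_proj):
    """One step: hidden = tanh(W_in x + W_rec r_prev); r = W_proj hidden.
    The recurrence carries r (projected, 2-dim), not hidden (4-dim)."""
    pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, r_prev))]
    hidden = [math.tanh(p) for p in pre]   # 4-dim cell output
    return matvec(w_proj, hidden)          # projected down to 2 dims

# Toy weights: 4-unit cell, 2-dim inputs, 2-dim projection.
w_in = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.2], [0.1, 0.1]]
w_rec = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0]]
w_proj = [[0.25, 0.25, 0.25, 0.25], [0.5, -0.5, 0.5, -0.5]]

r = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:
    r = projected_recurrent_step(x, r, w_in, w_rec, w_proj)
print(len(r))  # recurrent state stays at the projected size, 2
```

The practical benefit of the projection is shrinking the recurrent weight matrices when the cell is large; a real LSTMP layer also carries the cell state and gates, which this sketch drops for brevity.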
  • Patent number: 9570094
    Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: February 14, 2017
    Assignee: Google Inc.
    Inventors: Dave Burke, Michael J. LeBeau, Konrad Gianno, Trausti T. Kristjansson, John Nicholas Jitkoff, Andrew W. Senior
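The claim's flow, orientation determines an operating mode, and the mode selects speech-detection (endpointing) parameters, is straightforward to diagram in code. The mode names, pitch thresholds, and parameter values below are all invented for illustration; the patent does not specify them:

```python
def operating_mode(pitch_degrees):
    """Classify device orientation into an operating mode (toy thresholds)."""
    if pitch_degrees > 60:
        return "phone_to_ear"
    if pitch_degrees > 20:
        return "walkie_talkie"
    return "on_table"

# Per-mode speech detection parameters: when detection starts and
# how much trailing silence ends it. Values are hypothetical.
DETECTION_PARAMS = {
    "phone_to_ear":  {"start": "on_raise", "end_silence_ms": 500},
    "walkie_talkie": {"start": "on_button", "end_silence_ms": 700},
    "on_table":      {"start": "on_hotword", "end_silence_ms": 900},
}

def detection_params_for(pitch_degrees):
    return DETECTION_PARAMS[operating_mode(pitch_degrees)]

print(detection_params_for(75)["start"])  # device raised to the ear
```

The "multisensory" part is that orientation sensors, not just audio, gate when the recognizer listens.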
  • Publication number: 20170032802
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.
    Type: Application
    Filed: July 27, 2016
    Publication date: February 2, 2017
    Inventor: Andrew W. Senior
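The frequency-scaling step of this application (a vocal-tract-length-normalization-style warp) can be rendered as toy code. Here each frame is a list of component frequencies and the warping factors are fixed constants; in the described system a warping neural network would predict one or more factors per frame:

```python
def warp_frame(frequencies_hz, warp_factor):
    """Scale each frequency in the frame by the warping factor."""
    return [f * warp_factor for f in frequencies_hz]

frames = [[200.0, 400.0], [220.0, 440.0]]
warp_factors = [1.1, 0.9]   # stand-ins for warping-network outputs
modified = [warp_frame(fr, w) for fr, w in zip(frames, warp_factors)]
print(modified)
```

The modified frames would then go to the decoding network, which outputs the word sequence; that part is omitted.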
  • Publication number: 20170011738
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Application
    Filed: July 8, 2016
    Publication date: January 12, 2017
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Publication number: 20160372118
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Application
    Filed: October 7, 2015
    Publication date: December 22, 2016
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
  • Publication number: 20160372119
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final connectionist temporal classification (CTC) output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Application
    Filed: December 29, 2015
    Publication date: December 22, 2016
    Inventors: Hasim Sak, Andrew W. Senior
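The stacking-and-subsampling preprocessing named in this abstract is simple to sketch: consecutive frames are concatenated into modified frames, and then only every k-th modified frame is kept, so the recurrent network runs at a lower frame rate. The stack size and stride below are illustrative choices, and the RNN/CTC stages are omitted:

```python
def stack_frames(frames, stack=2):
    """Concatenate each frame with its successor to form modified frames."""
    return [frames[i] + frames[i + 1] for i in range(len(frames) - 1)]

def subsample(frames, stride=2):
    """Keep every `stride`-th modified frame."""
    return frames[::stride]

frames = [[1], [2], [3], [4], [5]]
stacked = stack_frames(frames)   # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(subsample(stacked))        # [[1, 2], [3, 4]]
```

Subsampling after stacking keeps the information from the dropped frames (their features survive inside the stacked frames) while cutting the number of recurrent steps roughly in half.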
  • Publication number: 20160322055
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on output of the deep neural network.
    Type: Application
    Filed: July 8, 2016
    Publication date: November 3, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
  • Patent number: 9460704
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for synthesizing speech using acoustic features predicted by a neural network. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
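The selection step of this patent, comparing network-predicted target acoustic features against stored samples by distance and picking the closest, fits in a few lines. The feature vectors, the Euclidean distance choice, and the sample inventory are all invented for the example; the patent claims distance-based selection generally:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_acoustic_sample(target_features, stored_samples):
    """Pick the stored sample whose features are closest to the target."""
    return min(stored_samples,
               key=lambda s: euclidean(target_features, s["features"]))

# Toy inventory of stored acoustic samples (e.g. speech units).
samples = [
    {"id": "unit_a", "features": [1.0, 0.0]},
    {"id": "unit_b", "features": [0.2, 0.9]},
]
target = [0.1, 1.0]   # as if predicted by the trained network
print(select_acoustic_sample(target, samples)["id"])
```

Synthesis then concatenates or renders the selected samples; the neural network's role is supplying the targets that drive the selection.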
  • Publication number: 20160284347
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
    Type: Application
    Filed: March 25, 2016
    Publication date: September 29, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
  • Patent number: 9401143
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, where each cluster includes a plurality of vectors, and where each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.
    Type: Grant
    Filed: March 20, 2015
    Date of Patent: July 26, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Ignacio Lopez Moreno
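The selection step of this patent, matching a vector of the user's voice characteristics to the nearest cluster and using that cluster's speech model, can be shown with toy data. The centroids, model names, and two-dimensional vectors are placeholders; real systems would use higher-dimensional speaker representations:

```python
import math

def nearest_cluster(voice_vector, clusters):
    """Return the cluster whose centroid is closest to the user's vector."""
    return min(clusters,
               key=lambda c: math.dist(voice_vector, c["centroid"]))

# Each cluster pairs a centroid with the speech model trained on
# the vectors assigned to that cluster (names are invented).
clusters = [
    {"centroid": [0.0, 0.0], "model": "model_low_pitch"},
    {"centroid": [1.0, 1.0], "model": "model_high_pitch"},
]
print(nearest_cluster([0.9, 0.8], clusters)["model"])
```

The chosen cluster's model is then the one provided for transcribing the user's subsequent utterances.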
  • Publication number: 20160099010
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transcribing spoken utterances. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
    Type: Application
    Filed: September 8, 2015
    Publication date: April 7, 2016
    Inventors: Tara N. Sainath, Andrew W. Senior, Oriol Vinyals, Hasim Sak
  • Publication number: 20160071512
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Application
    Filed: November 16, 2015
    Publication date: March 10, 2016
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Publication number: 20150371631
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.
    Type: Application
    Filed: June 23, 2014
    Publication date: December 24, 2015
    Inventors: Eugene Weinstein, Sanjiv Kumar, Ignacio L. Moreno, Andrew W. Senior, Nikhil Prasad Bhat
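The caching scheme in this application, deriving an index value from incoming values and looking up a precomputed acoustic model score instead of recomputing it, can be sketched as a quantize-then-memoize pattern. The quantization granularity and the stand-in scoring function are invented for the example:

```python
def index_value(features, quantum=0.5):
    """Quantize features so that nearby inputs share a cache index."""
    return tuple(round(f / quantum) for f in features)

def expensive_score(features):
    """Stand-in for running the acoustic model on the features."""
    return sum(features)

cache = {}

def cached_score(features):
    key = index_value(features)
    if key not in cache:
        cache[key] = expensive_score(features)
    return cache[key]

print(cached_score([1.0, 2.0]))  # computed and stored
print(cached_score([1.1, 2.1]))  # close enough to hit the same index
print(len(cache))                # one cache entry served both requests
```

The trade-off is the usual one for score caching: a coarser quantum raises the hit rate but returns scores computed for slightly different inputs.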
  • Patent number: 9195656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: November 24, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
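The conditioning this patent describes, one prosody network shared across languages, with the language supplied as an extra input alongside the linguistic features, can be caricatured as a shared weight plus a per-language term. Everything below (the weight, the bias table, the "duration-like" output) is invented to show the shape of the idea, not the trained model:

```python
# A single shared parameter stands in for the network weights that are
# trained on speech from multiple languages.
SHARED_WEIGHT = 0.8

# Per-language conditioning term (values are purely illustrative).
LANGUAGE_BIAS = {"en": 1.0, "ja": 1.4}

def predict_prosody(linguistic_features, language):
    """Return a toy prosody value conditioned on the language input."""
    base = SHARED_WEIGHT * sum(linguistic_features)
    return base * LANGUAGE_BIAS[language]

features = [0.5, 0.25]   # e.g. stress and phrase-position features
print(predict_prosody(features, "en"))
print(predict_prosody(features, "ja"))  # same features, different language
```

The point of the shared network is that languages with little training data borrow prosodic regularities learned from languages with more, while the language input keeps their outputs distinct.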