Patents by Inventor Andrew W. Senior

Andrew W. Senior has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9818409
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Grant
    Filed: October 7, 2015
    Date of Patent: November 14, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
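The per-step scoring and decoding described in this abstract can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the recurrent layers are omitted entirely, and the logits and the three context-dependent phoneme labels are invented. Only the softmax scoring and the per-step selection of the highest-scoring phoneme are shown.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def decode_phonemes(logits_per_step, phoneme_inventory):
    """Pick the highest-scoring context-dependent phoneme at each time step."""
    representation = []
    for logits in logits_per_step:
        scores = softmax(logits)
        best = max(range(len(scores)), key=scores.__getitem__)
        representation.append(phoneme_inventory[best])
    return representation

# Toy inventory of three context-dependent phonemes and two time steps;
# in the patent these logits would come from the recurrent layers.
inventory = ["k-ae+t", "ae-t+#", "t-#+#"]
logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.4]]
print(decode_phonemes(logits, inventory))  # best-scoring phoneme per step
```

The softmax here is only the output layer; the claim's substance is in the recurrent layers that produce the logits, which this sketch stubs out.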
  • Patent number: 9786270
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: October 10, 2017
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
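The teacher-student idea in this abstract, using a first network's output distributions as soft targets for a second, can be reduced to a toy gradient computation. Both networks are collapsed here to bare logit vectors, the CTC training of the teacher is assumed to have already happened, and all values and names are illustrative:

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    z = sum(e)
    return [v / z for v in e]

def cross_entropy(student_logits, teacher_probs):
    """Soft-target cross-entropy between teacher distribution and student."""
    p = softmax(student_logits)
    return -sum(t * math.log(pi) for t, pi in zip(teacher_probs, p))

def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient step on the student's logits.
    For softmax + cross-entropy, d(CE)/d(logit_i) = p_student_i - p_teacher_i."""
    p = softmax(student_logits)
    return [l - lr * (pi - ti)
            for l, pi, ti in zip(student_logits, p, teacher_probs)]

teacher = [0.7, 0.2, 0.1]   # teacher's output distribution for one frame
logits = [0.0, 0.0, 0.0]    # student starts uniform
before = cross_entropy(logits, teacher)
for _ in range(100):
    logits = distill_step(logits, teacher)
print(cross_entropy(logits, teacher) < before)  # student moved toward teacher
```

The real method trains a full second acoustic model over many utterances; this only demonstrates why matching the teacher's distributions is a well-defined training objective.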
  • Patent number: 9721562
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence representing an utterance and comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 1, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
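The recurrence in this claim, where each step's input is modified by the previous step's output, can be shown schematically. The acoustic modeling network is stubbed out as a trivial averaging function, and the frame values are toy data; the point is only the feedback wiring:

```python
def toy_network(inputs):
    """Stand-in for the acoustic modeling neural network (not the real model)."""
    return sum(inputs) / len(inputs)

def run_feedback_loop(feature_frames):
    outputs = []
    prev_output = None
    for t, features in enumerate(feature_frames):
        if t == 0:
            modified_input = features
        else:
            # Modified input: the previous step's output combined with
            # this step's acoustic features, as the claim describes.
            modified_input = [prev_output] + features
        prev_output = toy_network(modified_input)
        outputs.append(prev_output)
    return outputs

frames = [[1.0, 3.0], [2.0, 2.0], [0.0, 6.0]]
print(run_feedback_loop(frames))
```

A phoneme representation would then be derived from `outputs`; that decoding step is omitted here.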
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on output of the deep neural network.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
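The front end this abstract describes, convolving each filter with each channel in the time domain and then combining per filter across channels, can be sketched with toy data. The filters here are fixed example values rather than jointly learned parameters, combination is an elementwise sum (one plausible choice; the patent covers combining generally), and the downstream deep network is omitted:

```python
def convolve(signal, kernel):
    """Valid-mode 1-D convolution (kernel reversed, as in true convolution)."""
    k = list(reversed(kernel))
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * k[j] for j in range(len(k)))
            for i in range(n)]

def multichannel_front_end(channels, filters):
    combined = []
    for filt in filters:
        per_channel = [convolve(ch, filt) for ch in channels]
        # Combine the filter's outputs across channels by elementwise sum.
        combined.append([sum(vals) for vals in zip(*per_channel)])
    return combined

# Two toy waveform channels and two toy time-domain filters.
channels = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
filters = [[1.0, -1.0], [0.5, 0.5]]
print(multichannel_front_end(channels, filters))
```

In the patented system the filter parameters are learned jointly with the acoustic model during training, which is what distinguishes it from a fixed filterbank front end.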
  • Publication number: 20170186420
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Application
    Filed: March 9, 2017
    Publication date: June 29, 2017
    Inventors: Hasim Sak, Andrew W. Senior
  • Publication number: 20170103752
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for acoustic modeling of audio data. One method includes receiving audio data representing a portion of an utterance; providing the audio data to a trained recurrent neural network that has been trained to indicate the occurrence of a phone at any of multiple time frames within a predetermined maximum delay of receiving audio data corresponding to the phone; receiving, within the predetermined maximum delay of providing the audio data to the trained recurrent neural network, output of the trained neural network indicating a phone corresponding to the provided audio data; using the output of the trained neural network to determine a transcription for the utterance; and providing the transcription for the utterance.
    Type: Application
    Filed: October 9, 2015
    Publication date: April 13, 2017
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Patent number: 9620108
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Grant
    Filed: December 2, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Hasim Sak, Andrew W. Senior
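The "recurrent projected output" in this abstract refers to a projected LSTM architecture: the cell's hidden output is linearly projected to a lower-dimensional vector, and it is that projected vector which recurs and feeds the output layer. The sketch below reduces the LSTM gating to a single tanh cell and uses made-up toy weights; it only illustrates the dimensionality of the recurrence:

```python
import math

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def projected_recurrent_step(x, r_prev, w_in, w_rec, w_proj):
    """One step: hidden = tanh(W_in x + W_rec r_prev); r = W_proj hidden.
    The recurrence carries r (projected, 2-dim), not hidden (4-dim)."""
    pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, r_prev))]
    hidden = [math.tanh(p) for p in pre]   # 4-dim cell output
    return matvec(w_proj, hidden)          # projected down to 2 dims

# Toy weights: 4-unit cell, 2-dim inputs, 2-dim projection.
w_in = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.2], [0.1, 0.1]]
w_rec = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0]]
w_proj = [[0.25, 0.25, 0.25, 0.25], [0.5, -0.5, 0.5, -0.5]]

r = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0]]:
    r = projected_recurrent_step(x, r, w_in, w_rec, w_proj)
print(len(r))  # recurrent state stays at the projected size, 2
```

The practical benefit of the projection is shrinking the recurrent weight matrices when the cell is large; a real LSTMP layer also carries the cell state and gates, which this sketch drops for brevity.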
  • Patent number: 9570094
    Abstract: A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: February 14, 2017
    Assignee: Google Inc.
    Inventors: Dave Burke, Michael J. LeBeau, Konrad Gianno, Trausti T. Kristjansson, John Nicholas Jitkoff, Andrew W. Senior
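The claim's flow, orientation determines an operating mode, and the mode selects speech-detection (endpointing) parameters, is straightforward to diagram in code. The mode names, pitch thresholds, and parameter values below are all invented for illustration; the patent does not specify them:

```python
def operating_mode(pitch_degrees):
    """Classify device orientation into an operating mode (toy thresholds)."""
    if pitch_degrees > 60:
        return "phone_to_ear"
    if pitch_degrees > 20:
        return "walkie_talkie"
    return "on_table"

# Per-mode speech detection parameters: when detection starts and
# how much trailing silence ends it. Values are hypothetical.
DETECTION_PARAMS = {
    "phone_to_ear":  {"start": "on_raise", "end_silence_ms": 500},
    "walkie_talkie": {"start": "on_button", "end_silence_ms": 700},
    "on_table":      {"start": "on_hotword", "end_silence_ms": 900},
}

def detection_params_for(pitch_degrees):
    return DETECTION_PARAMS[operating_mode(pitch_degrees)]

print(detection_params_for(75)["start"])  # device raised to the ear
```

The "multisensory" part is that orientation sensors, not just audio, gate when the recognizer listens.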
  • Publication number: 20170032802
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a sequence representing an utterance, the sequence comprising a plurality of audio frames; determining one or more warping factors for each audio frame in the sequence using a warping neural network; applying, for each audio frame, the one or more warping factors for the audio frame to the audio frame to generate a respective modified audio frame, wherein the applying comprises using at least one of the warping factors to scale a respective frequency of the audio frame to a new respective frequency in the respective modified audio frame; and decoding the modified audio frames using a decoding neural network, wherein the decoding neural network is configured to output a word sequence that is a transcription of the utterance.
    Type: Application
    Filed: July 27, 2016
    Publication date: February 2, 2017
    Inventor: Andrew W. Senior
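The frequency-scaling step of this application (a vocal-tract-length-normalization-style warp) can be rendered as toy code. Here each frame is a list of component frequencies and the warping factors are fixed constants; in the described system a warping neural network would predict one or more factors per frame:

```python
def warp_frame(frequencies_hz, warp_factor):
    """Scale each frequency in the frame by the warping factor."""
    return [f * warp_factor for f in frequencies_hz]

frames = [[200.0, 400.0], [220.0, 440.0]]
warp_factors = [1.1, 0.9]   # stand-ins for warping-network outputs
modified = [warp_frame(fr, w) for fr, w in zip(frames, warp_factors)]
print(modified)
```

The modified frames would then go to the decoding network, which outputs the word sequence; that part is omitted.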
  • Publication number: 20170011738
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating acoustic models. In some implementations, a first neural network trained as an acoustic model using the connectionist temporal classification algorithm is obtained. Output distributions from the first neural network are obtained for an utterance. A second neural network is trained as an acoustic model using the output distributions produced by the first neural network as output targets for the second neural network. An automated speech recognizer configured to use the trained second neural network is provided.
    Type: Application
    Filed: July 8, 2016
    Publication date: January 12, 2017
    Inventors: Andrew W. Senior, Hasim Sak, Kanury Kanishka Rao
  • Publication number: 20160372118
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for modeling phonemes. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps: processing the acoustic feature representation through each of one or more recurrent neural network layers to generate a recurrent output; processing the recurrent output using a softmax output layer to generate a set of scores, the set of scores comprising a respective score for each of a plurality of context dependent vocabulary phonemes, the score for each context dependent vocabulary phoneme representing a likelihood that the context dependent vocabulary phoneme represents the utterance at the time step; and determining, from the scores for the plurality of time steps, a context dependent phoneme representation of the sequence.
    Type: Application
    Filed: October 7, 2015
    Publication date: December 22, 2016
    Inventors: Andrew W. Senior, Hasim Sak, Izhak Shafran
  • Publication number: 20160372119
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning pronunciations from acoustic sequences. One method includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a sequence of multiple frames of acoustic data at each of a plurality of time steps; stacking one or more frames of acoustic data to generate a sequence of modified frames of acoustic data; processing the sequence of modified frames of acoustic data through an acoustic modeling neural network comprising one or more recurrent neural network (RNN) layers and a final connectionist temporal classification (CTC) output layer to generate a neural network output, wherein processing the sequence of modified frames of acoustic data comprises: subsampling the modified frames of acoustic data; and processing each subsampled modified frame of acoustic data through the acoustic modeling neural network.
    Type: Application
    Filed: December 29, 2015
    Publication date: December 22, 2016
    Inventors: Hasim Sak, Andrew W. Senior
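The stacking-and-subsampling preprocessing named in this abstract is simple to sketch: consecutive frames are concatenated into modified frames, and then only every k-th modified frame is kept, so the recurrent network runs at a lower frame rate. The stack size and stride below are illustrative choices, and the RNN/CTC stages are omitted:

```python
def stack_frames(frames, stack=2):
    """Concatenate each frame with its successor to form modified frames."""
    return [frames[i] + frames[i + 1] for i in range(len(frames) - 1)]

def subsample(frames, stride=2):
    """Keep every `stride`-th modified frame."""
    return frames[::stride]

frames = [[1], [2], [3], [4], [5]]
stacked = stack_frames(frames)   # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(subsample(stacked))        # [[1, 2], [3, 4]]
```

Subsampling after stacking keeps the information from the dropped frames (their features survive inside the stacked frames) while cutting the number of recurrent steps roughly in half.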
  • Publication number: 20160322055
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined based on output of the deep neural network.
    Type: Application
    Filed: July 8, 2016
    Publication date: November 3, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
  • Patent number: 9460704
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for synthesizing speech using acoustic features predicted by a neural network. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: October 4, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Javier Gonzalvo Fructuoso
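The selection step of this patent, comparing network-predicted target acoustic features against stored samples by distance and picking the closest, fits in a few lines. The feature vectors, the Euclidean distance choice, and the sample inventory are all invented for the example; the patent claims distance-based selection generally:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_acoustic_sample(target_features, stored_samples):
    """Pick the stored sample whose features are closest to the target."""
    return min(stored_samples,
               key=lambda s: euclidean(target_features, s["features"]))

# Toy inventory of stored acoustic samples (e.g. speech units).
samples = [
    {"id": "unit_a", "features": [1.0, 0.0]},
    {"id": "unit_b", "features": [0.2, 0.9]},
]
target = [0.1, 1.0]   # as if predicted by the trained network
print(select_acoustic_sample(target, samples)["id"])
```

Synthesis then concatenates or renders the selected samples; the neural network's role is supplying the targets that drive the selection.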
  • Publication number: 20160284347
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
    Type: Application
    Filed: March 25, 2016
    Publication date: September 29, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
  • Patent number: 9401143
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving data representing acoustic characteristics of a user's voice; selecting a cluster for the data from among a plurality of clusters, where each cluster includes a plurality of vectors, and where each cluster is associated with a speech model trained by a neural network using at least one or more vectors of the plurality of vectors in the respective cluster; and in response to receiving one or more utterances of the user, providing the speech model associated with the cluster for transcribing the one or more utterances.
    Type: Grant
    Filed: March 20, 2015
    Date of Patent: July 26, 2016
    Assignee: Google Inc.
    Inventors: Andrew W. Senior, Ignacio Lopez Moreno
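The selection step of this patent, matching a vector of the user's voice characteristics to the nearest cluster and using that cluster's speech model, can be shown with toy data. The centroids, model names, and two-dimensional vectors are placeholders; real systems would use higher-dimensional speaker representations:

```python
import math

def nearest_cluster(voice_vector, clusters):
    """Return the cluster whose centroid is closest to the user's vector."""
    return min(clusters,
               key=lambda c: math.dist(voice_vector, c["centroid"]))

# Each cluster pairs a centroid with the speech model trained on
# the vectors assigned to that cluster (names are invented).
clusters = [
    {"centroid": [0.0, 0.0], "model": "model_low_pitch"},
    {"centroid": [1.0, 1.0], "model": "model_high_pitch"},
]
print(nearest_cluster([0.9, 0.8], clusters)["model"])
```

The chosen cluster's model is then the one provided for transcribing the user's subsequent utterances.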
  • Publication number: 20160099010
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transcribing spoken utterances. One of the methods includes receiving input features of an utterance; and processing the input features using an acoustic model that comprises one or more convolutional neural network (CNN) layers, one or more long short-term memory network (LSTM) layers, and one or more fully connected neural network layers to generate a transcription for the utterance.
    Type: Application
    Filed: September 8, 2015
    Publication date: April 7, 2016
    Inventors: Tara N. Sainath, Andrew W. Senior, Oriol Vinyals, Hasim Sak
  • Publication number: 20160071512
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Application
    Filed: November 16, 2015
    Publication date: March 10, 2016
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
  • Publication number: 20150371631
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score.
    Type: Application
    Filed: June 23, 2014
    Publication date: December 24, 2015
    Inventors: Eugene Weinstein, Sanjiv Kumar, Ignacio L. Moreno, Andrew W. Senior, Nikhil Prasad Bhat
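The caching scheme in this application, deriving an index value from incoming values and looking up a precomputed acoustic model score instead of recomputing it, can be sketched as a quantize-then-memoize pattern. The quantization granularity and the stand-in scoring function are invented for the example:

```python
def index_value(features, quantum=0.5):
    """Quantize features so that nearby inputs share a cache index."""
    return tuple(round(f / quantum) for f in features)

def expensive_score(features):
    """Stand-in for running the acoustic model on the features."""
    return sum(features)

cache = {}

def cached_score(features):
    key = index_value(features)
    if key not in cache:
        cache[key] = expensive_score(features)
    return cache[key]

print(cached_score([1.0, 2.0]))  # computed and stored
print(cached_score([1.1, 2.1]))  # close enough to hit the same index
print(len(cache))                # one cache entry served both requests
```

The trade-off is the usual one for score caching: a coarser quantum raises the hit rate but returns scores computed for slightly different inputs.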
  • Patent number: 9195656
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: November 24, 2015
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Andrew W. Senior, Byungha Chun
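The conditioning this patent describes, one prosody network shared across languages, with the language supplied as an extra input alongside the linguistic features, can be caricatured as a shared weight plus a per-language term. Everything below (the weight, the bias table, the "duration-like" output) is invented to show the shape of the idea, not the trained model:

```python
# A single shared parameter stands in for the network weights that are
# trained on speech from multiple languages.
SHARED_WEIGHT = 0.8

# Per-language conditioning term (values are purely illustrative).
LANGUAGE_BIAS = {"en": 1.0, "ja": 1.4}

def predict_prosody(linguistic_features, language):
    """Return a toy prosody value conditioned on the language input."""
    base = SHARED_WEIGHT * sum(linguistic_features)
    return base * LANGUAGE_BIAS[language]

features = [0.5, 0.25]   # e.g. stress and phrase-position features
print(predict_prosody(features, "en"))
print(predict_prosody(features, "ja"))  # same features, different language
```

The point of the shared network is that languages with little training data borrow prosodic regularities learned from languages with more, while the language input keeps their outputs distinct.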