Patents by Inventor Tara N. Sainath

Tara N. Sainath has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230096821
    Abstract: A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the set of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare-word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances, or that appear in those transcriptions fewer than a threshold number of times. The method further includes training the language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
    Type: Application
    Filed: December 13, 2021
    Publication date: March 30, 2023
    Inventors: Ronny Huang, Tara N. Sainath
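The rare-word filtering step described in the abstract above can be sketched as a simple word-count filter. This is a minimal illustration, not the patented implementation; the function name and the word-level tokenization are assumptions.

```python
from collections import Counter

def filter_rare_word_samples(text_samples, transcriptions, threshold=1):
    """Keep text samples containing at least one word that appears in the
    training transcriptions fewer than `threshold` times (unseen words have
    a count of zero, so they always qualify as rare)."""
    counts = Counter(word for t in transcriptions for word in t.split())
    return [s for s in text_samples
            if any(counts[word] < threshold for word in s.split())]
```

With `threshold=1`, only samples containing a word entirely absent from the transcriptions survive; raising the threshold also admits samples whose words were seen only a few times.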
  • Patent number: 11594212
    Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
    Type: Grant
    Filed: January 21, 2021
    Date of Patent: February 28, 2023
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
  • Patent number: 11580956
    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
    Type: Grant
    Filed: March 17, 2021
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
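The constrained-alignment construction in the entry above pins each word's beginning word piece to the ground-truth start of the word and its ending word piece to the ground-truth end. A minimal sketch, with illustrative data shapes (frame-indexed spans; the real model constrains attention heads rather than returning tuples):

```python
def constrain_word_pieces(word_pieces, word_spans):
    """word_pieces: list of word-piece sequences, one per word.
    word_spans: ground-truth (start_frame, end_frame) per word.
    Pins the beginning word piece of each word to the start of its span and
    the ending word piece to the end, mirroring the constrained alignments."""
    constraints = []
    for pieces, (start, end) in zip(word_pieces, word_spans):
        constraints.append([(pieces[0], start), (pieces[-1], end)])
    return constraints
```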
  • Patent number: 11545142
    Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: January 3, 2023
    Assignee: Google LLC
    Inventors: Ding Zhao, Bo Li, Ruoming Pang, Tara N. Sainath, David Rybach, Deepti Bhatia, Zelin Wu
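The combination of recognition scores and context scores during beam search, as described above, can be sketched as follows. The interpolation weight and the shape of the per-step score dictionaries are assumptions for illustration:

```python
def beam_search(step_scores, context_score, beam_size=2, context_weight=0.5):
    """step_scores: per-frame dicts mapping token -> recognition log-score.
    context_score(prefix, token): contextual bonus (e.g. from a contact list).
    Returns the best token sequence under the combined score."""
    beams = [([], 0.0)]
    for step in step_scores:
        candidates = []
        for prefix, score in beams:
            for token, log_p in step.items():
                total = score + log_p + context_weight * context_score(prefix, token)
                candidates.append((prefix + [token], total))
        # Keep only the highest-scoring prefixes for the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0][0]
```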
  • Publication number: 20220366897
    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
    Type: Application
    Filed: July 26, 2022
    Publication date: November 17, 2022
    Applicant: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
  • Patent number: 11475880
    Abstract: A method includes receiving audio data of an utterance and processing the audio data to obtain, as output from a speech recognition model configured to jointly perform speech decoding and endpointing of utterances: partial speech recognition results for the utterance; and an endpoint indication indicating when the utterance has ended. While processing the audio data, the method also includes detecting, based on the endpoint indication, the end of the utterance. In response to detecting the end of the utterance, the method also includes terminating the processing of any subsequent audio data received after the end of the utterance was detected.
    Type: Grant
    Filed: March 4, 2020
    Date of Patent: October 18, 2022
    Assignee: Google LLC
    Inventors: Shuo-yiin Chang, Rohit Prakash Prabhavalkar, Gabor Simko, Tara N. Sainath, Bo Li, Yanzhang He
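The joint decoding-and-endpointing loop described above can be sketched with a stub model. `model_step` is a hypothetical stand-in for the model that emits partial results together with an endpoint signal:

```python
def stream_decode(audio_frames, model_step):
    """model_step(frame) -> (partial_result, endpoint_detected), standing in
    for the model that jointly decodes and endpoints. Once the endpoint
    fires, any remaining audio is never processed."""
    partials = []
    for frame in audio_frames:
        partial, endpoint = model_step(frame)
        partials.append(partial)
        if endpoint:
            break  # terminate processing of any subsequent audio
    return partials
```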
  • Patent number: 11468244
    Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: October 11, 2022
    Assignee: Google LLC
    Inventors: Anjuli Patricia Kannan, Tara N. Sainath, Yonghui Wu, Ankur Bapna, Arindrima Datta
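Routing through language-specific adapter modules, as in the multilingual model above, can be sketched with plain functions. The class and the scalar "features" are illustrative stand-ins for real network layers:

```python
class AdaptedLayer:
    """A shared layer followed by a language-specific adapter module,
    selected by the language identifier (names are illustrative)."""

    def __init__(self, shared, adapters):
        self.shared = shared        # function applied for every language
        self.adapters = adapters    # language id -> adapter function

    def __call__(self, features, language):
        hidden = self.shared(features)
        return self.adapters[language](hidden)
```

Only the small adapters differ per language; the shared layer's parameters are reused across all languages, which is the point of the adapter design.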
  • Publication number: 20220310072
    Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
    Type: Application
    Filed: June 3, 2020
    Publication date: September 29, 2022
    Inventors: Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian C. McGraw, Chung-Cheng Chiu
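The two-pass scheme above, where a second-pass decoder revises first-pass streaming hypotheses, can be sketched as n-best rescoring. The interpolation weight and the n-best representation are assumptions; the patent family also covers other second-pass modes:

```python
def two_pass_decode(first_pass_nbest, second_pass_score, weight=0.5):
    """Combine first-pass (streaming) log-probabilities with second-pass
    scores over the same hypotheses and return the best rescored one.
    first_pass_nbest: list of (hypothesis, first_pass_log_prob)."""
    return max(
        first_pass_nbest,
        key=lambda h: weight * h[1] + (1 - weight) * second_pass_score(h[0]),
    )[0]
```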
  • Publication number: 20220310067
    Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
    Type: Application
    Filed: February 10, 2022
    Publication date: September 29, 2022
    Applicant: Google LLC
    Inventors: Ronny Huang, Tara N. Sainath, Trevor Strohman, Shankar Kumar
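The per-token concatenation described above can be sketched with stand-in embedding functions. Python lists model the embedding vectors, so `+` performs concatenation; the embedding functions are hypothetical placeholders for learned tables:

```python
def concat_features(tokens, token_emb, ngram_emb, n=2):
    """For each token, concatenate its embedding with an embedding of the
    preceding n-gram context. token_emb and ngram_emb stand in for learned
    embedding lookups; lists model the vectors, `+` concatenates them."""
    outputs = []
    for i, token in enumerate(tokens):
        context = tuple(tokens[max(0, i - n):i])
        outputs.append(token_emb(token) + ngram_emb(context))
    return outputs
```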
  • Patent number: 11423883
    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: August 23, 2022
    Assignee: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
  • Publication number: 20220238101
    Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
    Type: Application
    Filed: December 3, 2020
    Publication date: July 28, 2022
    Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-yiin Chang, Wei Li
  • Patent number: 11393457
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: July 19, 2022
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Tara N. Sainath, Ehsan Variani, Izhak Shafran, Michiel A. U. Bacchiani
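The complex linear projection step described above projects a complex frequency-domain frame with complex-valued weights and reduces the result to real-valued features. A minimal sketch, assuming a log-magnitude output nonlinearity; the weights would be learned jointly with the acoustic model:

```python
import math

def complex_linear_projection(spectrum, weights):
    """Project a complex frequency-domain frame with complex-valued weight
    rows, then take a log-magnitude so each output feature is real-valued."""
    features = []
    for row in weights:
        # Complex dot product of one weight row with the spectrum.
        z = sum(w * x for w, x in zip(row, spectrum))
        features.append(math.log(abs(z) + 1e-7))  # small floor avoids log(0)
    return features
```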
  • Patent number: 11367432
    Abstract: A method for generating final transcriptions representing numerical sequences of utterances in a written domain includes receiving audio data for an utterance containing a numeric sequence, and decoding, using a sequence-to-sequence speech recognition model, the audio data for the utterance to generate, as output from the sequence-to-sequence speech recognition model, an intermediate transcription of the utterance. The method also includes processing, using a neural corrector/denormer, the intermediate transcription to generate a final transcription that represents the numeric sequence of the utterance in a written domain. The neural corrector/denormer is trained on a set of training samples, where each training sample includes a speech recognition hypothesis for a training utterance and a ground-truth transcription of the training utterance. The ground-truth transcription of the training utterance is in the written domain.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: June 21, 2022
    Assignee: Google LLC
    Inventors: Charles Caleb Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu
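The spoken-to-written-domain mapping learned by the neural corrector/denormer above can be illustrated with a small rule table. The patent trains a neural model for this; the hard-coded digit table below only demonstrates the kind of rewrite the model learns:

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def denormalize_digits(intermediate):
    """Rewrite contiguous spoken-domain digit runs into written-domain
    numerals, leaving all other words untouched."""
    out, run = [], []
    for word in intermediate.split():
        if word in DIGITS:
            run.append(DIGITS[word])
        else:
            if run:
                out.append("".join(run))
                run = []
            out.append(word)
    if run:
        out.append("".join(run))
    return " ".join(out)
```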
  • Publication number: 20220172706
    Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
    Type: Application
    Filed: February 16, 2022
    Publication date: June 2, 2022
    Applicant: Google LLC
    Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
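The phoneme-sequence rescoring step above can be sketched as a score bonus for hypotheses matching a biasing term's pronunciation. The flat bonus and the space-separated phoneme strings are assumptions for illustration:

```python
def rescore_phoneme_hyps(phoneme_hyps, biasing_prons, bonus=2.0):
    """phoneme_hyps: list of (phoneme_sequence, log_score). Hypotheses whose
    phoneme sequence matches a biasing term's pronunciation receive a bonus,
    making terms from the second language recoverable at decode time."""
    return [(seq, score + bonus) if seq in biasing_prons else (seq, score)
            for seq, score in phoneme_hyps]
```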
  • Publication number: 20220148582
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: January 26, 2022
    Publication date: May 12, 2022
    Applicant: Google LLC
    Inventors: Bo Li, Ron Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
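The filter-and-sum operation described above, where per-channel filter parameters are predicted and the filtered channels are combined into one, can be sketched as plain FIR filtering. The tap lists stand in for the network-predicted filter parameters:

```python
def filter_and_sum(channel1, channel2, taps1, taps2):
    """Apply a per-channel FIR filter (the filter parameters that the
    network predicts per utterance) to each channel, then sum the filtered
    channels into a single combined channel."""
    def fir(signal, taps):
        return [sum(taps[k] * signal[n - k]
                    for k in range(len(taps)) if n - k >= 0)
                for n in range(len(signal))]
    return [a + b for a, b in zip(fir(channel1, taps1), fir(channel2, taps2))]
```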
  • Publication number: 20220130374
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihoods of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
    Type: Application
    Filed: January 10, 2022
    Publication date: April 28, 2022
    Applicant: Google LLC
    Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen
  • Publication number: 20220101836
    Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
    Type: Application
    Filed: December 9, 2021
    Publication date: March 31, 2022
    Applicant: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
  • Patent number: 11270687
    Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: March 8, 2022
    Assignee: Google LLC
    Inventors: Ke Hu, Antoine Jean Bruguier, Tara N. Sainath, Rohit Prakash Prabhavalkar, Golan Pundak
  • Patent number: 11257485
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 10, 2019
    Date of Patent: February 22, 2022
    Assignee: Google LLC
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Patent number: 11238845
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihoods of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
    Type: Grant
    Filed: November 14, 2019
    Date of Patent: February 1, 2022
    Assignee: Google LLC
    Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen