Patents by Inventor Rohit Prakash Prabhavalkar

Rohit Prakash Prabhavalkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240379095
    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias encoder, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
    Type: Application
    Filed: July 23, 2024
    Publication date: November 14, 2024
    Applicant: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
  • Patent number: 12106749
    Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: October 1, 2024
    Assignee: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. u. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
  • Publication number: 20240304181
    Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the different multiple different domains.
    Type: Application
    Filed: March 7, 2024
    Publication date: September 12, 2024
    Applicant: Google LLC
    Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
  • Publication number: 20240265923
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Application
    Filed: April 15, 2024
    Publication date: August 8, 2024
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A.U. Bacchiani
  • Patent number: 12051407
    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias encoder, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
    Type: Grant
    Filed: July 26, 2022
    Date of Patent: July 30, 2024
    Assignee: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
  • Publication number: 20240221750
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
    Type: Application
    Filed: March 19, 2024
    Publication date: July 4, 2024
    Applicant: Google LLC
    Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
  • Patent number: 12027158
    Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
    Type: Grant
    Filed: February 6, 2023
    Date of Patent: July 2, 2024
    Assignee: Google LLC
    Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
  • Publication number: 20240185841
    Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
    Type: Application
    Filed: October 20, 2023
    Publication date: June 6, 2024
    Applicant: Google LLC
    Inventors: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman
  • Patent number: 11990133
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Grant
    Filed: July 7, 2023
    Date of Patent: May 21, 2024
    Assignee: GOOGLE LLC
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
  • Publication number: 20240153498
    Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.
    Type: Application
    Filed: October 20, 2023
    Publication date: May 9, 2024
    Applicant: Google LLC
    Inventors: Tara N. Sainath, Rohit Prakash Prabhavalkar, Diamantino Antonio Caseiro, Patrick Maxim Rondon, Cyril Allauzen
  • Publication number: 20240144917
    Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription.
    Type: Application
    Filed: October 25, 2023
    Publication date: May 2, 2024
    Applicant: Google LLC
    Inventors: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
  • Patent number: 11948570
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
    Type: Grant
    Filed: March 9, 2022
    Date of Patent: April 2, 2024
    Assignee: Google LLC
    Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. Mcgraw, Anton Bakhtin
  • Patent number: 11948062
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: April 2, 2024
    Assignee: Google LLC
    Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
  • Patent number: 11942076
    Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
    Type: Grant
    Filed: February 16, 2022
    Date of Patent: March 26, 2024
    Assignee: Google LLC
    Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
  • Patent number: 11922932
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: March 5, 2024
    Assignee: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
  • Patent number: 11908461
    Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: February 20, 2024
    Assignee: Google LLC
    Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
  • Publication number: 20240028829
    Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes, tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.
    Type: Application
    Filed: July 1, 2023
    Publication date: January 25, 2024
    Applicant: Google LLC
    Inventors: Tara N. Sainath, Zhouyuan Huo, Zhehuai Chen, Yu Zhang, Weiran Wang, Trevor Strohman, Rohit Prakash Prabhavalkar, Bo Li, Ankur Bapna
  • Publication number: 20230352027
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Application
    Filed: July 7, 2023
    Publication date: November 2, 2023
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A.U. Bacchiani
  • Publication number: 20230343332
    Abstract: A joint segmenting and ASR model includes an encoder and decoder. The encoder configured to: receive a sequence of acoustic frames characterizing one or more utterances; and generate, at each output step, a higher order feature representation for a corresponding acoustic frame. The decoder configured to: receive the higher order feature representation and generate, at each output step: a probability distribution over possible speech recognition hypotheses, and an indication of whether the corresponding output step corresponds to an end of speech segment. The j oint segmenting and ASR model trained on a set of training samples, each training sample including: audio data characterizing a spoken utterance; and a corresponding transcription of the spoken utterance, the corresponding transcription having an end of speech segment ground truth token inserted into the corresponding transcription automatically based on a set of heuristic-based rules and exceptions applied to the training sample.
    Type: Application
    Filed: April 20, 2023
    Publication date: October 26, 2023
    Applicant: Google LLC
    Inventors: Ronny Huang, Shuo-yiin Chang, David Rybach, Rohit Prakash Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Charles Caleb Peyser, Zhiyun Lu
  • Publication number: 20230298570
    Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypotheses corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of t
    Type: Application
    Filed: March 21, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prakash Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Charles Caleb Peyser, Trevor Strohman, Yangzhang He, David Rybach