Patents by Inventor Bhuvana Ramabhadran

Bhuvana Ramabhadran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240153484
    Abstract: A method includes receiving training data that includes a plurality of sets of text-to-speech (TTS) spoken utterances, each set associated with a respective language and including TTS utterances of synthetic speech, each of which includes a corresponding reference speech representation paired with a corresponding input text sequence. For each TTS utterance in each set of the TTS spoken training utterances of the received training data, the method includes generating a corresponding TTS encoded textual representation for the corresponding input text sequence, generating a corresponding speech encoding for the corresponding TTS utterance of synthetic speech, generating a shared encoder output, generating a predicted speech representation for the corresponding TTS utterance of synthetic speech, and determining a reconstruction loss. The method also includes training a TTS model based on the reconstruction losses determined for the TTS utterances in each set of the TTS spoken training utterances.
    Type: Application
    Filed: October 25, 2023
    Publication date: May 9, 2024
    Applicant: Google LLC
    Inventors: Andrew M. Rosenberg, Takaaki Saeki, Zhehuai Chen, Byungha Chun, Bhuvana Ramabhadran
  • Patent number: 11929060
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: March 12, 2024
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
  • Publication number: 20240029715
    Abstract: A method includes receiving training data that includes unspoken textual utterances in a target language. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. The method also includes generating a corresponding alignment output for each unspoken textual utterance using an alignment model trained on transcribed speech utterances in one or more training languages each different than the target language. The method also includes generating a corresponding encoded textual representation for each alignment output using a text encoder and training a speech recognition model on the encoded textual representations generated for the alignment outputs. Training the speech recognition model teaches the speech recognition model to learn how to recognize speech in the target language.
    Type: Application
    Filed: July 20, 2023
    Publication date: January 25, 2024
    Applicant: Google LLC
    Inventors: Andrew Rosenberg, Zhehuai Chen, Ankur Bapna, Yu Zhang, Bhuvana Ramabhadran
  • Patent number: 11837216
    Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
    Type: Grant
    Filed: February 14, 2023
    Date of Patent: December 5, 2023
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
  • Patent number: 11823697
    Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: November 21, 2023
    Assignee: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran
  • Publication number: 20230317059
    Abstract: A method includes receiving training data that includes unspoken textual utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding alignment output for each unspoken textual utterance of the received training data using an alignment model. The method also includes pre-training an audio encoder on the alignment outputs generated for the unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.
    Type: Application
    Filed: February 13, 2023
    Publication date: October 5, 2023
    Applicant: Google LLC
    Inventors: Andrew M. Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
  • Publication number: 20230298570
    Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of t
    Type: Application
    Filed: March 21, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prakash Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Charles Caleb Peyser, Trevor Strohman, Yangzhang He, David Rybach
  • Publication number: 20230298565
    Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation.
    Type: Application
    Filed: April 25, 2022
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy
  • Publication number: 20230274727
    Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input where the TTS pronunciation of the particular word is different than the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting one of the user pronunciation or the TTS pronunciation of the particular word that is associated with a highest confidence. The method also includes providing the TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.
    Type: Application
    Filed: May 4, 2023
    Publication date: August 31, 2023
    Applicant: Google LLC
    Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
  • Patent number: 11741355
    Abstract: A student neural network may be trained by a computer-implemented method, including: inputting common input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output among a plurality of soft label outputs from each teacher neural network among the plurality of teacher neural networks, and training a student neural network with the input data and the plurality of soft label outputs.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: August 29, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Publication number: 20230223009
    Abstract: A method includes obtaining a plurality of training data sets each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample.
    Type: Application
    Filed: March 21, 2023
    Publication date: July 13, 2023
    Applicant: Google LLC
    Inventors: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
  • Publication number: 20230197057
    Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
    Type: Application
    Filed: February 14, 2023
    Publication date: June 22, 2023
    Applicant: Google LLC
    Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
  • Patent number: 11676572
    Abstract: A method for instantaneous learning in text-to-speech (TTS) during dialog includes receiving a user pronunciation of a particular word present in a query spoken by a user. The method also includes receiving a TTS pronunciation of the same particular word that is present in a TTS input where the TTS pronunciation of the particular word is different than the user pronunciation of the particular word. The method also includes obtaining user pronunciation-related features and TTS pronunciation-related features associated with the particular word. The method also includes generating a pronunciation decision selecting one of the user pronunciation or the TTS pronunciation of the particular word that is associated with a highest confidence. The method also includes providing the TTS audio that includes a synthesized speech representation of the response to the query using the user pronunciation or the TTS pronunciation for the particular word.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: June 13, 2023
    Assignee: Google LLC
    Inventors: Vijayaditya Peddinti, Bhuvana Ramabhadran, Andrew Rosenberg, Mateusz Golebiewski
  • Publication number: 20230178068
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Application
    Filed: January 30, 2023
    Publication date: June 8, 2023
    Applicant: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Publication number: 20230103722
    Abstract: A method of guided data selection for masked speech modeling includes obtaining a sequence of encoded representations corresponding to an utterance. For each respective encoded representation, the method includes processing the respective encoded representation to generate a corresponding probability distribution over possible speech recognition hypotheses and assigning, to the respective encoded representation, a confidence score as a highest probability from the corresponding probability distribution over possible speech recognition hypotheses. The method also includes selecting a set of unmasked encoded representations to mask based on the confidence scores assigned to the sequence of encoded representations. The method also includes generating a set of masked encoded representations by masking the selected set of unmasked encoded representations.
    Type: Application
    Filed: August 18, 2022
    Publication date: April 6, 2023
    Applicant: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Murali Karthick Baskar
  • Patent number: 11615779
    Abstract: A method includes obtaining a plurality of training data sets each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample.
    Type: Grant
    Filed: January 19, 2021
    Date of Patent: March 28, 2023
    Assignee: Google LLC
    Inventors: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Brian Roark
  • Patent number: 11610108
    Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network among a plurality of teacher neural networks, inputting an input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output from the selected teacher neural network.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: March 21, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Patent number: 11605368
    Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: March 14, 2023
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
  • Publication number: 20230058447
    Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
    Type: Application
    Filed: August 20, 2021
    Publication date: February 23, 2023
    Applicant: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran
  • Patent number: 11580952
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
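
Illustrative note: the guided data selection steps described in the abstract for publication 20230103722 (score each encoded representation by its highest hypothesis probability, select representations to mask from those scores, then mask them) can be sketched roughly as follows. This is not the patented implementation. The function names, the choice to mask the k highest-confidence positions, and the use of None as the mask value are all assumptions made for illustration; the abstract only says that selection is based on the confidence scores.

```python
from typing import List, Optional, Sequence


def confidence_scores(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Confidence per encoded representation: the highest probability
    in its distribution over recognition hypotheses."""
    return [max(dist) for dist in distributions]


def select_mask_indices(scores: Sequence[float], k: int) -> List[int]:
    """Pick k positions to mask.

    Assumption: the k highest-confidence positions are chosen. The abstract
    does not state the selection rule beyond "based on the confidence scores".
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def apply_mask(
    encoded: Sequence[object],
    indices: Sequence[int],
    mask_value: Optional[object] = None,
) -> List[object]:
    """Return a copy of the sequence with the selected positions replaced
    by mask_value (a hypothetical placeholder for a learned mask embedding)."""
    chosen = set(indices)
    return [mask_value if i in chosen else rep for i, rep in enumerate(encoded)]


# Toy example: three frames, two hypotheses each.
distributions = [[0.1, 0.9], [0.5, 0.5], [0.7, 0.3]]
scores = confidence_scores(distributions)
to_mask = select_mask_indices(scores, k=2)
masked = apply_mask(["frame0", "frame1", "frame2"], to_mask)
```

In this toy run the first and third frames carry the highest confidence, so they are the ones replaced by the placeholder. A real system would operate on encoder output tensors and substitute a mask embedding rather than None.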