Patents by Inventor Pedro J. Moreno

Pedro J. Moreno has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LANGUAGE MODEL BIASING SYSTEM

Publication number: 20250131917

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Application

Filed: December 24, 2024

Publication date: April 24, 2025

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
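
The candidate re-scoring step in this abstract can be illustrated with a minimal Python sketch. It assumes the expansion comes from a hypothetical `related` lookup table and that biasing means adding a fixed log-score `boost` to any candidate found in the expanded set. The language model adjustment step itself is not modelled, and none of the names or values come from the application.

```python
from typing import Dict, List, Set, Tuple


def expand_ngrams(initial: Set[str], related: Dict[str, List[str]]) -> Set[str]:
    """Grow the initial n-gram set with related n-grams from a lookup table.

    The `related` table is a hypothetical stand-in for a real expansion source.
    """
    expanded = set(initial)
    for ngram in initial:
        expanded.update(related.get(ngram, []))
    return expanded


def pick_transcription(candidates: List[Tuple[str, float]],
                       expanded: Set[str],
                       boost: float = 2.0) -> str:
    """Add a fixed log-score boost to candidates in the expanded set and
    return the text of the highest-scoring candidate."""
    def biased_score(candidate: Tuple[str, float]) -> float:
        text, score = candidate
        return score + boost if text in expanded else score

    return max(candidates, key=biased_score)[0]


initial = {"jurassic park"}  # n-gram identified from the context data
expanded = expand_ngrams(initial, {"jurassic park": ["jurassic world"]})
print(pick_transcription([("jurassic word", -1.0), ("jurassic world", -1.5)], expanded))
# prints "jurassic world"
```

The boost is a single constant here only to keep the example small.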

Advancing the use of text and speech in ASR pretraining with consistency and contrastive losses

Patent number: 12272363

Abstract: A method includes receiving training data that includes unspoken text utterances, un-transcribed non-synthetic speech utterances, and transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. Each transcribed non-synthetic speech utterance is paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances, the un-transcribed non-synthetic speech utterances, and the transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.

Type: Grant

Filed: April 15, 2022

Date of Patent: April 8, 2025

Assignee: Google LLC

Inventors: Andrew Rosenberg, Zhehuai Chen, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar, Yuan Wang, Yu Zhang

Conformer-based speech conversion model

Patent number: 12272348

Abstract: A method for speech conversion includes receiving, as input to an encoder of a speech conversion model, an input spectrogram corresponding to an utterance, the encoder including a stack of self-attention blocks. The method further includes generating, as output from the encoder, an encoded spectrogram and receiving, as input to a spectrogram decoder of the speech conversion model, the encoded spectrogram generated as output from the encoder. The method further includes generating, as output from the spectrogram decoder, an output spectrogram corresponding to a synthesized speech representation of the utterance.

Type: Grant

Filed: March 16, 2022

Date of Patent: April 8, 2025

Assignee: Google LLC

Inventors: Bhuvana Ramabhadran, Zhehuai Chen, Fadi Biadsy, Pedro J. Moreno Mengibar

Multilingual re-scoring models for automatic speech recognition

Patent number: 12254875

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

Type: Grant

Filed: February 27, 2024

Date of Patent: March 18, 2025

Assignee: Google LLC

Inventors: Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Bhuvana Ramabhadran, Parisa Haghani, Pedro J. Moreno Mengibar
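
The second-pass selection can be sketched as follows. The abstract says the overall score is based on the three per-hypothesis scores but not how they are combined, so the weighted sum and the weights `w_lik`, `w_elm` and `w_prior` below are assumptions, as are the example scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    likelihood: float  # un-normalized likelihood score
    elm: float         # external language model score
    standalone: float  # score modelling prior statistics of the hypothesis


def overall_score(h: Hypothesis, w_lik: float = 1.0,
                  w_elm: float = 0.3, w_prior: float = 0.3) -> float:
    # Linear interpolation is an assumption; the weights are illustrative only.
    return w_lik * h.likelihood + w_elm * h.elm + w_prior * h.standalone


def rescore(n_best: List[Hypothesis]) -> str:
    """Second pass: return the text of the hypothesis with the highest overall score."""
    return max(n_best, key=overall_score).text


n_best = [
    Hypothesis("wreck a nice beach", likelihood=-2.0, elm=-3.0, standalone=-1.0),
    Hypothesis("recognize speech", likelihood=-2.5, elm=-1.0, standalone=-1.0),
]
print(rescore(n_best))  # prints "recognize speech"
```

Here the first pass favours "wreck a nice beach", but the external language model score tips the overall score toward "recognize speech".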

Multi-dialect and multilingual speech recognition

Patent number: 12254865

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.

Type: Grant

Filed: January 20, 2024

Date of Patent: March 18, 2025

Assignee: Google LLC

Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen
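
Cluster adaptive training is commonly described as expressing a layer's weights as an interpolation of cluster-specific basis weights, with a separate interpolation vector per condition (here, per language or dialect). The sketch below assumes that formulation. The toy bases, the dialect names and the `DIALECT_LAMBDAS` values are hypothetical; a trained model would learn both the bases and the interpolation vectors.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def interpolate_weights(bases: Sequence[Matrix], lambdas: Sequence[float]) -> Matrix:
    """CAT-style layer weights: W = sum_k lambda_k * W_k over cluster bases W_k."""
    rows, cols = len(bases[0]), len(bases[0][0])
    return [[sum(lam * basis[i][j] for lam, basis in zip(lambdas, bases))
             for j in range(cols)]
            for i in range(rows)]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Apply the interpolated layer to an input feature vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Two toy cluster bases and hypothetical per-dialect interpolation vectors.
BASES = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
DIALECT_LAMBDAS: Dict[str, List[float]] = {
    "dialect_a": [0.75, 0.25],
    "dialect_b": [0.25, 0.75],
}

w = interpolate_weights(BASES, DIALECT_LAMBDAS["dialect_a"])
print(matvec(w, [1.0, 0.0]))  # prints [0.75, 0.25]
```

All dialects share the same bases, so only the short interpolation vector changes from one dialect to another.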

Automated calling system

Patent number: 12254883

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.

Type: Grant

Filed: April 15, 2024

Date of Patent: March 18, 2025

Assignee: GOOGLE LLC

Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani

Injecting Text in Self-Supervised Speech Pre-training

Publication number: 20250078807

Abstract: A method includes receiving training data that includes unspoken text utterances and un-transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances and the un-transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.

Type: Application

Filed: November 18, 2024

Publication date: March 6, 2025

Applicant: Google LLC

Inventors: Zhehuai Chen, Bhuvana Ramabhadran, Andrew M. Rosenberg, Yu Zhang, Pedro J. Moreno Mengibar

Sub-models for neural contextual biasing

Patent number: 12230258

Abstract: A method for contextual biasing for speech recognition includes obtaining a base automatic speech recognition (ASR) model trained on non-biased data and a sub-model trained on biased data representative of a particular domain. The method includes receiving a speech recognition request including audio data characterizing an utterance captured in streaming audio. The method further includes determining whether the speech recognition request includes a contextual indicator indicating the particular domain. When the speech recognition request does not include the contextual indicator, the method includes generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data.

Type: Grant

Filed: April 19, 2022

Date of Patent: February 18, 2025

Assignee: Google LLC

Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
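
The routing decision described in the abstract can be sketched as below. The excerpt only specifies the branch without a contextual indicator. Sending indicated requests to the matching domain sub-model, and falling back to the base model for an unknown domain, are assumptions, and the recognizers are stand-in callables rather than real ASR models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Recognizer = Callable[[bytes], str]  # audio in, transcript out


@dataclass
class SpeechRequest:
    audio: bytes
    domain: Optional[str] = None  # contextual indicator naming a domain, if any


def recognize(request: SpeechRequest, base: Recognizer,
              sub_models: Dict[str, Recognizer]) -> str:
    # No contextual indicator: use the base model trained on non-biased data.
    if request.domain is None:
        return base(request.audio)
    # Assumption: an indicator selects the matching domain sub-model, and an
    # unknown domain falls back to the base model.
    return sub_models.get(request.domain, base)(request.audio)


def base(audio: bytes) -> str:  # hypothetical stand-in for the base ASR model
    return "base:" + audio.decode()


def music(audio: bytes) -> str:  # hypothetical stand-in for a "music" sub-model
    return "music:" + audio.decode()


print(recognize(SpeechRequest(b"play it"), base, {"music": music}))           # base:play it
print(recognize(SpeechRequest(b"play it", "music"), base, {"music": music}))  # music:play it
```

This sketch simply swaps in the sub-model; how the patented method combines the sub-model with the base model is not described in the excerpt above.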

Language model biasing modulation

Patent number: 12230251

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters are selected based at least on the likely context associated with the user. A context confidence score associated with the likely context is determined based on at least a portion of the context data. One or more of the language model biasing parameters are adjusted based at least on the context confidence score. A baseline language model is biased based at least on one or more of the adjusted language model biasing parameters. The baseline language model is provided for use by an automated speech recognizer (ASR).

Type: Grant

Filed: December 12, 2022

Date of Patent: February 18, 2025

Assignee: Google LLC

Inventors: Pedro J. Moreno Mengibar, Petar Aleksic
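
A minimal sketch of the select-then-modulate flow, assuming biasing parameters are per-n-gram log-probability boosts and that the adjustment scales them linearly by the context confidence score. `CONTEXT_BOOSTS`, both function names and all numbers are hypothetical.

```python
from typing import Dict

# Hypothetical biasing parameters per likely context: n-gram -> log-prob boost.
CONTEXT_BOOSTS: Dict[str, Dict[str, float]] = {
    "navigation": {"directions to": 3.0, "nearest gas station": 2.0},
}


def select_and_adjust(context: str, confidence: float) -> Dict[str, float]:
    """Select boosts for the likely context and scale them by its confidence."""
    confidence = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return {ngram: boost * confidence
            for ngram, boost in CONTEXT_BOOSTS.get(context, {}).items()}


def bias_language_model(baseline: Dict[str, float],
                        boosts: Dict[str, float]) -> Dict[str, float]:
    """Add boosts to a copy of the baseline's log-probabilities (no renormalization)."""
    biased = dict(baseline)
    for ngram, boost in boosts.items():
        if ngram in biased:  # only n-grams the baseline already scores
            biased[ngram] += boost
    return biased


baseline = {"directions to": -4.0, "the": -1.0}
print(bias_language_model(baseline, select_and_adjust("navigation", 0.5)))
# prints {'directions to': -2.5, 'the': -1.0}
```

Because the boosts are added to log-probabilities without renormalization, the biased table is a scoring aid rather than a proper distribution. A low confidence score shrinks every boost toward zero, leaving the baseline nearly untouched.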

Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition

Publication number: 20250022458

Abstract: A method for unifying streaming and non-streaming speech recognition in an automatic speech recognition (ASR) model includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.

Type: Application

Filed: September 25, 2024

Publication date: January 16, 2025

Applicant: Google LLC

Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar
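
The mixture model attention described here can be illustrated as a weighted mixture of softmax distributions over one context window. In this sketch the mixture weights come from fixed logits, and one component masks future frames with `float("-inf")` to mimic a streaming-style view. Both choices are assumptions made for illustration, not details taken from the application.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_attention_pdf(component_scores: Sequence[Sequence[float]],
                          mixture_logits: Sequence[float]) -> List[float]:
    """Attention PDF over a context window as a weighted mixture of softmaxes.

    component_scores[k][j] is component k's score for window position j;
    float("-inf") removes a position from that component.
    """
    weights = softmax(mixture_logits)
    pdf = [0.0] * len(component_scores[0])
    for weight, scores in zip(weights, component_scores):
        for j, p in enumerate(softmax(scores)):
            pdf[j] += weight * p
    return pdf


inf = float("inf")
full_context = [0.0, 0.0, 0.0, 0.0]    # sees the whole window
left_context = [0.0, 0.0, -inf, -inf]  # future frames masked (streaming-like)
print(mixture_attention_pdf([full_context, left_context], [0.0, 0.0]))
# prints [0.375, 0.375, 0.125, 0.125]
```

Because every component is a valid distribution and the mixture weights sum to one, the result is also a valid distribution over the window.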

Language model biasing system

Patent number: 12183328

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Grant

Filed: May 16, 2023

Date of Patent: December 31, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar

CONTEXTUAL TAGGING AND BIASING OF GRAMMARS INSIDE WORD LATTICES

Publication number: 20240428785

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include, based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.

Type: Application

Filed: September 4, 2024

Publication date: December 26, 2024

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
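
The selection step can be sketched by flattening the word lattice into (candidate, transcription confidence) pairs. The regular-expression grammars, the 0.1 floor for non-matching candidates and the product of the two confidences are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical grammars available in each device context.
GRAMMARS: Dict[str, List[str]] = {
    "media_player": [r"play (?P<item>.+)", r"pause"],
}


def grammar_confidence(text: str, patterns: List[str], miss: float = 0.1) -> float:
    """1.0 if some grammar fully matches the text, else a small floor score."""
    return 1.0 if any(re.fullmatch(p, text) for p in patterns) else miss


def select_candidate(lattice: List[Tuple[str, float]], context: str) -> str:
    """Pick the candidate maximizing transcription confidence x grammar confidence."""
    patterns = GRAMMARS.get(context, [])
    return max(lattice, key=lambda c: c[1] * grammar_confidence(c[0], patterns))[0]


lattice = [("played yesterday", 0.6), ("play yesterday", 0.4)]
print(select_candidate(lattice, "media_player"))  # prints "play yesterday"
```

In the media-player context the grammar match outweighs the higher transcription confidence of "played yesterday"; with no grammars for the context, the acoustically stronger candidate wins.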

DETERMINING DIALOG STATES FOR LANGUAGE MODELS

Publication number: 20240428790

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Application

Filed: September 3, 2024

Publication date: December 26, 2024

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
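
Dialog-state biasing can be sketched as multiplying the probabilities of the n-grams tied to the current dialog state and renormalizing. The dialog states, their n-gram sets and the boost `factor` are hypothetical, and determining the dialog state from the voice input is not modelled here.

```python
from typing import Dict, Set

# Hypothetical n-gram sets associated with each dialog state.
DIALOG_STATE_NGRAMS: Dict[str, Set[str]] = {
    "ask_quantity": {"one", "two", "three"},
    "ask_delivery_time": {"tomorrow", "tonight", "noon"},
}


def bias_for_dialog_state(probs: Dict[str, float], state: str,
                          factor: float = 4.0) -> Dict[str, float]:
    """Multiply probabilities of the state's n-grams by `factor`, then renormalize."""
    ngrams = DIALOG_STATE_NGRAMS.get(state, set())
    boosted = {w: p * factor if w in ngrams else p for w, p in probs.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}


# Acoustically confusable words after a prompt such as "How many do you want?"
probs = {"two": 0.2, "too": 0.5, "to": 0.3}
biased = bias_for_dialog_state(probs, "ask_quantity")
print(max(biased, key=biased.get))  # "two" becomes the most likely word
```

Renormalizing keeps the adjusted scores a probability distribution, so the boost shifts mass toward the state's n-grams rather than inflating every score.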

Multilingual Re-Scoring Models for Automatic Speech Recognition

Publication number: 20240420692

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

Type: Application

Filed: August 28, 2024

Publication date: December 19, 2024

Applicant: Google LLC

Inventors: Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Bhuvana Ramabhadran, Parisa Haghani, Pedro J. Moreno Mengibar

SERVER SIDE HOTWORDING

Publication number: 20240412734

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.

Type: Application

Filed: August 21, 2024

Publication date: December 12, 2024

Applicant: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
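
The two-threshold cascade in this abstract can be sketched as below. The score scale, the default thresholds of 0.5 and 0.8, and the `<hotword>` tag format in the returned text are assumptions; only the ordering of the two checks and the stricter server threshold come from the abstract.

```python
from typing import Callable, Optional, Tuple

ServerVerifier = Callable[[bytes], Tuple[float, str]]  # -> (score, tagged text)


def detect_hotword(audio: bytes,
                   device_score: Callable[[bytes], float],
                   server_verify: ServerVerifier,
                   first_threshold: float = 0.5,
                   second_threshold: float = 0.8) -> Optional[str]:
    """Two-stage check: a permissive on-device threshold, then a stricter server one."""
    if second_threshold <= first_threshold:
        raise ValueError("the server threshold must be more restrictive")
    if device_score(audio) < first_threshold:
        return None  # audio is never sent to the server
    score, tagged_text = server_verify(audio)
    return tagged_text if score >= second_threshold else None


def server(audio: bytes) -> Tuple[float, str]:  # hypothetical server stand-in
    return 0.9, "<hotword>ok computer</hotword> what time is it"


print(detect_hotword(b"\x00\x01", lambda a: 0.6, server))
# prints "<hotword>ok computer</hotword> what time is it"
```

The cheap on-device check filters most audio, so the server only spends effort on audio that already looks like the key phrase.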

Injecting text in self-supervised speech pre-training

Patent number: 12159617

Abstract: A method includes receiving training data that includes unspoken text utterances and un-transcribed non-synthetic speech utterances. Each unspoken text utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on the synthetic speech representations generated for the unspoken textual utterances and the un-transcribed non-synthetic speech utterances to teach the audio encoder to jointly learn shared speech and text representations.

Type: Grant

Filed: June 21, 2022

Date of Patent: December 3, 2024

Assignee: Google LLC

Inventors: Zhehuai Chen, Bhuvana Ramabhadran, Andrew M. Rosenberg, Yu Zhang, Pedro J. Moreno Mengibar

Mixture model attention for flexible streaming and non-streaming automatic speech recognition

Patent number: 12136415

Abstract: A method for unifying streaming and non-streaming speech recognition in an automatic speech recognition (ASR) model includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.

Type: Grant

Filed: December 15, 2021

Date of Patent: November 5, 2024

Assignee: Google LLC

Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar

Server side hotwording

Patent number: 12094472

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.

Type: Grant

Filed: June 30, 2023

Date of Patent: September 17, 2024

Assignee: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar

Multilingual re-scoring models for automatic speech recognition

Patent number: 12080283

Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.

Type: Grant

Filed: March 22, 2022

Date of Patent: September 3, 2024

Assignee: Google LLC

Inventors: Neeraj Gaur, Tongzhou Chen, Ehsan Variani, Bhuvana Ramabhadran, Parisa Haghani, Pedro J. Moreno Mengibar

CHUNK-WISE ATTENTION FOR LONGFORM ASR

Publication number: 20240290321

Abstract: A method includes receiving training data including a corpus of multilingual unspoken textual utterances, a corpus of multilingual un-transcribed non-synthetic speech utterances, and a corpus of multilingual transcribed non-synthetic speech utterances. For each un-transcribed non-synthetic speech utterance, the method includes generating a target quantized vector token and a target token index, generating contrastive context vectors from corresponding masked audio features, and deriving a contrastive loss term. The method also includes generating an alignment output, generating a first probability distribution over possible speech recognition hypotheses for the alignment output, and determining an alignment output loss term. The method also includes generating a second probability distribution over possible speech recognition hypotheses and determining a non-synthetic speech loss term.

Type: Application

Filed: February 23, 2024

Publication date: August 29, 2024

Applicant: Google LLC

Inventors: Yongqiang Wang, Yu Zhang, Wei Han, Parisa Haghani, Pedro J. Moreno Mengibar
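
The abstract does not define its contrastive loss term. One common form for masked speech features is an InfoNCE-style loss, as used in wav2vec 2.0, and the sketch below assumes that form with cosine similarity and a temperature of 0.1. It scores a single masked frame's context vector against its true target vector and a list of distractors; the vectors are toy values.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context: Sequence[float], target: Sequence[float],
                     distractors: Sequence[Sequence[float]],
                     temperature: float = 0.1) -> float:
    """InfoNCE-style loss for one masked frame: -log softmax score of the true target."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_partition - logits[0]


print(contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))  # small: target matches
print(contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))  # large: distractor matches
```

Minimizing this loss pulls each context vector toward its own target and away from the distractors.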