Patents by Inventor Pedro Jose Moreno Mengibar

Pedro Jose Moreno Mengibar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11929060
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: March 12, 2024
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
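    The abstract above describes computing a "consistent loss term" between two probability distributions: one produced from real speech and one from synthesized speech of the same utterance. The patent text does not fix a formula, so the sketch below assumes a symmetric KL divergence between the two per-step distributions; the function name and the epsilon smoothing are illustrative choices, not taken from the patent.

    ```python
    import math

    def consistency_loss(p_real, p_synth, eps=1e-12):
        """Symmetric KL divergence between a distribution over hypotheses
        for the non-synthetic speech (p_real) and one for the synthetic
        speech (p_synth). A loss of 0 means the model treats real and
        synthesized audio of the same utterance identically."""
        kl_rs = sum(p * math.log((p + eps) / (q + eps))
                    for p, q in zip(p_real, p_synth))
        kl_sr = sum(q * math.log((q + eps) / (p + eps))
                    for p, q in zip(p_real, p_synth))
        return 0.5 * (kl_rs + kl_sr)
    ```

    In training, a term like this would be summed over output steps and added to the usual recognition loss, pushing the model toward consistent behavior on real and synthetic audio.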
  • Publication number: 20240021190
    Abstract: A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.
    Type: Application
    Filed: July 18, 2022
    Publication date: January 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
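    The key input to the sub-model above is a "document embedding" computed from each ground-truth transcription. The patent does not specify the embedding encoder, so the toy sketch below stands in for it by hashing each word to a pseudo-random direction and averaging; every name here is hypothetical, and a real system would use a learned text encoder.

    ```python
    import hashlib

    def document_embedding(transcript, dim=8):
        """Toy embedding encoder: map each word of the transcription to a
        deterministic pseudo-random vector (via MD5) and average them.
        Transcriptions from the same domain end up near each other only
        insofar as they share vocabulary."""
        vec = [0.0] * dim
        words = transcript.lower().split()
        for w in words:
            h = hashlib.md5(w.encode()).digest()
            for i in range(dim):
                vec[i] += (h[i] / 255.0) - 0.5
        n = max(len(words), 1)
        return [v / n for v in vec]
    ```

    The biasing sub-model would then be trained on these per-utterance embeddings so that, at inference time, it can steer the base recognizer toward the target domain's vocabulary.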
  • Patent number: 11580994
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user who speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing an alternative speech recognizer on the one or more bias terms identified in the first transcription. The method also includes receiving acoustic features of a second utterance spoken by a second user who speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
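    The flow above has two steps: pull likely bias terms out of the typical speaker's transcription, then favor hypotheses containing those terms when recognizing the atypical speaker. The sketch below illustrates one simple reading, with a vocabulary filter for term extraction and an additive score boost for rescoring; the function names, the boost scheme, and the common-word filter are all assumptions, not the patent's method.

    ```python
    def extract_bias_terms(transcript, common_words):
        """Keep words from the first transcription that are not in a
        common-vocabulary list, treating them as likely bias terms
        (e.g. names or other rare words)."""
        return [w for w in transcript.lower().split() if w not in common_words]

    def rescore_with_bias(hypotheses, bias_terms, boost=2.0):
        """hypotheses: list of (text, score) pairs from the alternative
        recognizer. Add a fixed boost per matched bias term and return
        the best-scoring hypothesis text."""
        def biased_score(text, score):
            hits = sum(t in text.lower().split() for t in bias_terms)
            return score + boost * hits
        return max(hypotheses, key=lambda h: biased_score(*h))[0]
    ```

    Production systems typically apply this kind of biasing inside the decoder rather than as a post-hoc rescoring pass, but the effect is the same: terms seen in the first transcription become easier to recognize in the second.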
  • Publication number: 20230015169
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Applicant: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
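    The aggregation step in the abstract above, removing a subset of candidate embeddings before averaging, can be read as outlier rejection: drop the candidate slices whose embeddings sit farthest from the group, then average the rest. The sketch below assumes that reading; the drop fraction and distance metric are illustrative choices, not taken from the patent.

    ```python
    def aggregate_embedding(candidates, drop_fraction=0.2):
        """candidates: list of equal-length embedding vectors, one per
        audio slice. Drop the fraction farthest from the centroid and
        average the remainder into a single speaker representation."""
        dim = len(candidates[0])
        centroid = [sum(c[i] for c in candidates) / len(candidates)
                    for i in range(dim)]
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(c, centroid)) ** 0.5
        ranked = sorted(candidates, key=dist)
        keep = ranked[: max(1, int(len(ranked) * (1 - drop_fraction)))]
        return [sum(c[i] for c in keep) / len(keep) for i in range(dim)]
    ```

    Rejecting outlier slices this way keeps a noisy slice (background speech, silence) from dragging the aggregate representation away from the speaker's true voice characteristics.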
  • Patent number: 11468900
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: October 11, 2022
    Assignee: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
  • Publication number: 20220310089
    Abstract: Implementations set forth herein relate to an automated assistant that is invoked according to contextual signals, rather than requiring a user to explicitly speak an invocation phrase. When a user is in an environment with an assistant-enabled device, contextual data characterizing features of the environment can be processed to determine whether the user intends to invoke the automated assistant. When such features are detected, the automated assistant can bypass requiring an invocation phrase from the user and instead be responsive to one or more assistant commands from the user. The automated assistant can operate based on a trained machine learning model that is trained using instances of training data characterizing previous interactions in which one or more users invoked or did not invoke the automated assistant.
    Type: Application
    Filed: January 17, 2020
    Publication date: September 29, 2022
    Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
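    The invocation decision described above amounts to a binary classifier over contextual features. The patent does not name a model family, so the sketch below uses a logistic scorer as a minimal stand-in; the feature names, weights, and threshold are all hypothetical.

    ```python
    import math

    def should_invoke(features, weights, bias, threshold=0.5):
        """Hypothetical invocation decision: score contextual features
        (e.g. whether the user is facing the device) with a logistic
        model and invoke the assistant when the probability clears the
        threshold, skipping the explicit invocation phrase."""
        z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
        prob = 1.0 / (1.0 + math.exp(-z))
        return prob >= threshold
    ```

    In the patented setup the weights would come from a model trained on logged interactions where users did or did not go on to invoke the assistant.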
  • Publication number: 20220165270
    Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
    Type: Application
    Filed: February 10, 2022
    Publication date: May 26, 2022
    Applicant: Google LLC
    Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
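    The biasing mechanism in the abstract above adjusts the probability scores a language model assigns to n-grams associated with the detected dialog state. One plausible scheme, sketched below under assumed names, is a multiplicative boost followed by renormalization; the patent does not commit to this particular adjustment.

    ```python
    def bias_language_model(lm_scores, state_ngrams, dialog_state, boost=1.5):
        """lm_scores: n-gram -> probability score. state_ngrams: dialog
        state -> set of n-grams associated with it. Boost the scores of
        n-grams tied to the current dialog state, then renormalize so
        the scores still form a distribution."""
        favored = state_ngrams.get(dialog_state, set())
        adjusted = {ng: s * (boost if ng in favored else 1.0)
                    for ng, s in lm_scores.items()}
        total = sum(adjusted.values())
        return {ng: s / total for ng, s in adjusted.items()}
    ```

    Transcribing the voice input with the biased model then makes state-relevant phrases (say, toppings during a pizza-ordering dialog) more likely to win over acoustically similar alternatives.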
  • Publication number: 20220122612
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Application
    Filed: October 15, 2020
    Publication date: April 21, 2022
    Applicant: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
  • Publication number: 20210280170
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Application
    Filed: February 8, 2021
    Publication date: September 9, 2021
    Applicant: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
  • Publication number: 20210241777
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user who speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing an alternative speech recognizer on the one or more bias terms identified in the first transcription. The method also includes receiving acoustic features of a second utterance spoken by a second user who speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Application
    Filed: January 20, 2021
    Publication date: August 5, 2021
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
  • Patent number: 9836527
    Abstract: An offline semantic processor of a resource-constrained voice-enabled device such as a mobile device utilizes an offline grammar model with reduced resource requirements to parse voice-based queries. In various implementations, a query issued at a resource-constrained device may be semantically processed to identify candidate responsive actions that are performable by the resource-constrained device. Candidate responsive action performance statistics may be analyzed to select, from the one or more candidate responsive actions, a qualifying responsive action. In various implementations, the candidate responsive action performance statistics may relate to performance of the one or more candidate responsive actions by the resource-constrained device following issuance of the query.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: December 5, 2017
    Assignee: Google LLC
    Inventors: Yuli Gao, Sangsoo Sung, Pedro Jose Moreno Mengibar
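    The selection step described above, choosing a "qualifying responsive action" from candidates using past performance statistics, can be sketched as picking the candidate most often actually carried out after similar queries. The data shapes and names below are assumptions for illustration; the patent does not specify how the statistics are stored or compared.

    ```python
    def select_responsive_action(candidates, stats):
        """candidates: action names produced by the offline semantic
        parse of the query. stats: action -> (times_performed,
        times_offered) on this device. Return the candidate with the
        highest historical completion rate."""
        def completion_rate(action):
            performed, offered = stats.get(action, (0, 1))
            return performed / max(offered, 1)
        return max(candidates, key=completion_rate)
    ```

    Keeping both the grammar and the statistics on-device matches the resource-constrained, offline setting the abstract describes: no server round-trip is needed to parse the query or to rank the candidate actions.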
  • Publication number: 20170242914
    Abstract: An offline semantic processor of a resource-constrained voice-enabled device such as a mobile device utilizes an offline grammar model with reduced resource requirements to parse voice-based queries. In various implementations, a query issued at a resource-constrained device may be semantically processed to identify candidate responsive actions that are performable by the resource-constrained device. Candidate responsive action performance statistics may be analyzed to select, from the one or more candidate responsive actions, a qualifying responsive action. In various implementations, the candidate responsive action performance statistics may relate to performance of the one or more candidate responsive actions by the resource-constrained device following issuance of the query.
    Type: Application
    Filed: February 24, 2016
    Publication date: August 24, 2017
    Inventors: Yuli Gao, Sangsoo Sung, Pedro Jose Moreno Mengibar