Patents by Inventor Pedro Jose Moreno Mengibar

Pedro Jose Moreno Mengibar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11929060
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: March 12, 2024
    Assignee: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
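    The abstract above describes computing a "consistent loss term" between two probability distributions: one produced from real speech and one from synthesized speech of the same utterance. The patent text does not fix a formula, so the sketch below assumes a symmetric KL divergence between the two per-step distributions; the function name and the epsilon smoothing are illustrative choices, not taken from the patent.

    ```python
    import math

    def consistency_loss(p_real, p_synth, eps=1e-12):
        """Symmetric KL divergence between a distribution over hypotheses
        for the non-synthetic speech (p_real) and one for the synthetic
        speech (p_synth). A loss of 0 means the model treats real and
        synthesized audio of the same utterance identically."""
        kl_rs = sum(p * math.log((p + eps) / (q + eps))
                    for p, q in zip(p_real, p_synth))
        kl_sr = sum(q * math.log((q + eps) / (p + eps))
                    for p, q in zip(p_real, p_synth))
        return 0.5 * (kl_rs + kl_sr)
    ```

    In training, a term like this would be summed over output steps and added to the usual recognition loss, pushing the model toward consistent behavior on real and synthetic audio.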
  • Publication number: 20240021190
    Abstract: A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.
    Type: Application
    Filed: July 18, 2022
    Publication date: January 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
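    The key input to the sub-model above is a "document embedding" computed from each ground-truth transcription. The patent does not specify the embedding encoder, so the toy sketch below stands in for it by hashing each word to a pseudo-random direction and averaging; every name here is hypothetical, and a real system would use a learned text encoder.

    ```python
    import hashlib

    def document_embedding(transcript, dim=8):
        """Toy embedding encoder: map each word of the transcription to a
        deterministic pseudo-random vector (via MD5) and average them.
        Transcriptions from the same domain end up near each other only
        insofar as they share vocabulary."""
        vec = [0.0] * dim
        words = transcript.lower().split()
        for w in words:
            h = hashlib.md5(w.encode()).digest()
            for i in range(dim):
                vec[i] += (h[i] / 255.0) - 0.5
        n = max(len(words), 1)
        return [v / n for v in vec]
    ```

    The biasing sub-model would then be trained on these per-utterance embeddings so that, at inference time, it can steer the base recognizer toward the target domain's vocabulary.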
  • Patent number: 11580994
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user who speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing an alternative speech recognizer on the one or more bias terms identified in the first transcription. The method also includes receiving acoustic features of a second utterance spoken by a second user who speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
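    The flow above has two steps: pull likely bias terms out of the typical speaker's transcription, then favor hypotheses containing those terms when recognizing the atypical speaker. The sketch below illustrates one simple reading, with a vocabulary filter for term extraction and an additive score boost for rescoring; the function names, the boost scheme, and the common-word filter are all assumptions, not the patent's method.

    ```python
    def extract_bias_terms(transcript, common_words):
        """Keep words from the first transcription that are not in a
        common-vocabulary list, treating them as likely bias terms
        (e.g. names or other rare words)."""
        return [w for w in transcript.lower().split() if w not in common_words]

    def rescore_with_bias(hypotheses, bias_terms, boost=2.0):
        """hypotheses: list of (text, score) pairs from the alternative
        recognizer. Add a fixed boost per matched bias term and return
        the best-scoring hypothesis text."""
        def biased_score(text, score):
            hits = sum(t in text.lower().split() for t in bias_terms)
            return score + boost * hits
        return max(hypotheses, key=lambda h: biased_score(*h))[0]
    ```

    Production systems typically apply this kind of biasing inside the decoder rather than as a post-hoc rescoring pass, but the effect is the same: terms seen in the first transcription become easier to recognize in the second.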
  • Publication number: 20230015169
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Application
    Filed: September 19, 2022
    Publication date: January 19, 2023
    Applicant: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
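    The aggregation step in the abstract above, removing a subset of candidate embeddings before averaging, can be read as outlier rejection: drop the candidate slices whose embeddings sit farthest from the group, then average the rest. The sketch below assumes that reading; the drop fraction and distance metric are illustrative choices, not taken from the patent.

    ```python
    def aggregate_embedding(candidates, drop_fraction=0.2):
        """candidates: list of equal-length embedding vectors, one per
        audio slice. Drop the fraction farthest from the centroid and
        average the remainder into a single speaker representation."""
        dim = len(candidates[0])
        centroid = [sum(c[i] for c in candidates) / len(candidates)
                    for i in range(dim)]
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(c, centroid)) ** 0.5
        ranked = sorted(candidates, key=dist)
        keep = ranked[: max(1, int(len(ranked) * (1 - drop_fraction)))]
        return [sum(c[i] for c in keep) / len(keep) for i in range(dim)]
    ```

    Rejecting outlier slices this way keeps a noisy slice (background speech, silence) from dragging the aggregate representation away from the speaker's true voice characteristics.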
  • Patent number: 11468900
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: October 11, 2022
    Assignee: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
  • Publication number: 20220310089
    Abstract: Implementations set forth herein relate to an automated assistant that is invoked according to contextual signals, rather than requiring a user to explicitly speak an invocation phrase. When a user is in an environment with an assistant-enabled device, contextual data characterizing features of the environment can be processed to determine whether the user intends to invoke the automated assistant. When such features are detected, the automated assistant can bypass requiring an invocation phrase from the user and instead be responsive to one or more assistant commands from the user. The automated assistant can operate based on a trained machine learning model that is trained using instances of training data characterizing previous interactions in which one or more users invoked or did not invoke the automated assistant.
    Type: Application
    Filed: January 17, 2020
    Publication date: September 29, 2022
    Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
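    The invocation decision described above amounts to a binary classifier over contextual features. The patent does not name a model family, so the sketch below uses a logistic scorer as a minimal stand-in; the feature names, weights, and threshold are all hypothetical.

    ```python
    import math

    def should_invoke(features, weights, bias, threshold=0.5):
        """Hypothetical invocation decision: score contextual features
        (e.g. whether the user is facing the device) with a logistic
        model and invoke the assistant when the probability clears the
        threshold, skipping the explicit invocation phrase."""
        z = bias + sum(weights.get(k, 0.0) * v for k, v in features.items())
        prob = 1.0 / (1.0 + math.exp(-z))
        return prob >= threshold
    ```

    In the patented setup the weights would come from a model trained on logged interactions where users did or did not go on to invoke the assistant.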
  • Publication number: 20220165270
    Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.
    Type: Application
    Filed: February 10, 2022
    Publication date: May 26, 2022
    Applicant: Google LLC
    Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
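    The biasing mechanism in the abstract above adjusts the probability scores a language model assigns to n-grams associated with the detected dialog state. One plausible scheme, sketched below under assumed names, is a multiplicative boost followed by renormalization; the patent does not commit to this particular adjustment.

    ```python
    def bias_language_model(lm_scores, state_ngrams, dialog_state, boost=1.5):
        """lm_scores: n-gram -> probability score. state_ngrams: dialog
        state -> set of n-grams associated with it. Boost the scores of
        n-grams tied to the current dialog state, then renormalize so
        the scores still form a distribution."""
        favored = state_ngrams.get(dialog_state, set())
        adjusted = {ng: s * (boost if ng in favored else 1.0)
                    for ng, s in lm_scores.items()}
        total = sum(adjusted.values())
        return {ng: s / total for ng, s in adjusted.items()}
    ```

    Transcribing the voice input with the biased model then makes state-relevant phrases (say, toppings during a pizza-ordering dialog) more likely to win over acoustically similar alternatives.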
  • Publication number: 20220122612
    Abstract: A method of generating an accurate speaker representation for an audio sample includes receiving a first audio sample from a first speaker and a second audio sample from a second speaker. The method includes dividing a respective audio sample into a plurality of audio slices. The method also includes, based on the plurality of slices, generating a set of candidate acoustic embeddings where each candidate acoustic embedding includes a vector representation of acoustic features. The method further includes removing a subset of the candidate acoustic embeddings from the set of candidate acoustic embeddings. The method additionally includes generating an aggregate acoustic embedding from the remaining candidate acoustic embeddings in the set of candidate acoustic embeddings after removing the subset of the candidate acoustic embeddings.
    Type: Application
    Filed: October 15, 2020
    Publication date: April 21, 2022
    Applicant: Google LLC
    Inventors: Yeming Fang, Quan Wang, Pedro Jose Moreno Mengibar, Ignacio Lopez Moreno, Gang Feng, Fang Chu, Jin Shi, Jason William Pelecanos
  • Publication number: 20210280170
    Abstract: A method for training a speech recognition model includes receiving a set of training utterance pairs each including a non-synthetic speech representation and a synthetic speech representation of a same corresponding utterance. At each of a plurality of output steps for each training utterance pair in the set of training utterance pairs, the method also includes determining a consistent loss term for the corresponding training utterance pair based on a first probability distribution over possible non-synthetic speech recognition hypotheses generated for the corresponding non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses generated for the corresponding synthetic speech representation. The first and second probability distributions are generated for output by the speech recognition model.
    Type: Application
    Filed: February 8, 2021
    Publication date: September 9, 2021
    Applicant: Google LLC
    Inventors: Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Jose Moreno Mengibar
  • Publication number: 20210241777
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user who speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing an alternative speech recognizer on the one or more bias terms identified in the first transcription. The method also includes receiving acoustic features of a second utterance spoken by a second user who speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Application
    Filed: January 20, 2021
    Publication date: August 5, 2021
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
  • Patent number: 9836527
    Abstract: An offline semantic processor of a resource-constrained voice-enabled device such as a mobile device utilizes an offline grammar model with reduced resource requirements to parse voice-based queries. In various implementations, a query issued at a resource-constrained device may be semantically processed to identify candidate responsive actions that are performable by the resource-constrained device. Candidate responsive action performance statistics may be analyzed to select, from the one or more candidate responsive actions, a qualifying responsive action. In various implementations, the candidate responsive action performance statistics may relate to performance of the one or more candidate responsive actions by the resource-constrained device following issuance of the query.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: December 5, 2017
    Assignee: Google LLC
    Inventors: Yuli Gao, Sangsoo Sung, Pedro Jose Moreno Mengibar
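    The selection step described above, choosing a "qualifying responsive action" from candidates using past performance statistics, can be sketched as picking the candidate most often actually carried out after similar queries. The data shapes and names below are assumptions for illustration; the patent does not specify how the statistics are stored or compared.

    ```python
    def select_responsive_action(candidates, stats):
        """candidates: action names produced by the offline semantic
        parse of the query. stats: action -> (times_performed,
        times_offered) on this device. Return the candidate with the
        highest historical completion rate."""
        def completion_rate(action):
            performed, offered = stats.get(action, (0, 1))
            return performed / max(offered, 1)
        return max(candidates, key=completion_rate)
    ```

    Keeping both the grammar and the statistics on-device matches the resource-constrained, offline setting the abstract describes: no server round-trip is needed to parse the query or to rank the candidate actions.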
  • Publication number: 20170242914
    Abstract: An offline semantic processor of a resource-constrained voice-enabled device such as a mobile device utilizes an offline grammar model with reduced resource requirements to parse voice-based queries. In various implementations, a query issued at a resource-constrained device may be semantically processed to identify candidate responsive actions that are performable by the resource-constrained device. Candidate responsive action performance statistics may be analyzed to select, from the one or more candidate responsive actions, a qualifying responsive action. In various implementations, the candidate responsive action performance statistics may relate to performance of the one or more candidate responsive actions by the resource-constrained device following issuance of the query.
    Type: Application
    Filed: February 24, 2016
    Publication date: August 24, 2017
    Inventors: Yuli Gao, Sangsoo Sung, Pedro Jose Moreno Mengibar