Patents by Inventor Antoine Jean Bruguier

Antoine Jean Bruguier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Compressed recurrent neural network models

Patent number: 11948062

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.

Type: Grant

Filed: December 4, 2020

Date of Patent: April 2, 2024

Assignee: Google LLC

Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier

Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models

Patent number: 11942076

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

Type: Grant

Filed: February 16, 2022

Date of Patent: March 26, 2024

Assignee: Google LLC

Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath

Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR

Publication number: 20240029718

Abstract: A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the recognizer, a second portion of the data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription.

Type: Application

Filed: July 13, 2023

Publication date: January 25, 2024

Applicant: Google LLC

Inventors: Antoine Jean Bruguier, David Qiu, Yanzhang He, Trevor Strohman
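
The re-ranking step described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the additive `prefix_bonus`, the word-level prefix test, and the function name `rerank_partials` are all assumptions made for illustration. The abstract only states that the re-ranked score depends on the recognition score and on whether the hypothesis shares a prefix with the first partial transcription.

```python
def rerank_partials(hypotheses, previous_partial, prefix_bonus=1.0):
    """Re-rank (text, score) hypotheses, favoring ones that extend previous_partial.

    Higher scores are better. The additive bonus is a hypothetical choice.
    """
    prev_tokens = previous_partial.split()
    reranked = []
    for text, score in hypotheses:
        # Word-level prefix check: does this hypothesis start with the
        # words already shown to the user?
        shares_prefix = text.split()[:len(prev_tokens)] == prev_tokens
        bonus = prefix_bonus if shares_prefix else 0.0
        reranked.append((text, score + bonus))
    # Best hypothesis first.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


previous = "play some"
candidates = [("place um music", -1.0), ("play some music", -1.4)]
best_text = rerank_partials(candidates, previous)[0][0]
```

In this toy example the hypothesis that keeps the words already displayed wins even though its raw score is lower, which is the kind of stability that reduces on-screen flicker.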

Contextual Biasing for Speech Recognition

Publication number: 20230274736

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

Type: Application

Filed: May 4, 2023

Publication date: August 31, 2023

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

Contextual biasing for speech recognition

Patent number: 11664021

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

Type: Grant

Filed: December 9, 2021

Date of Patent: May 30, 2023

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

TWO-PASS END TO END SPEECH RECOGNITION

Publication number: 20220238101

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

Type: Application

Filed: December 3, 2020

Publication date: July 28, 2022

Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-yiin Chang, Wei Li

PHONEME-BASED CONTEXTUALIZATION FOR CROSS-LINGUAL SPEECH RECOGNITION IN END-TO-END MODELS

Publication number: 20220172706

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

Type: Application

Filed: February 16, 2022

Publication date: June 2, 2022

Applicant: Google LLC

Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath

Contextual Biasing for Speech Recognition

Publication number: 20220101836

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

Type: Application

Filed: December 9, 2021

Publication date: March 31, 2022

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models

Patent number: 11270687

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

Type: Grant

Filed: April 28, 2020

Date of Patent: March 8, 2022

Assignee: Google LLC

Inventors: Ke Hu, Antoine Jean Bruguier, Tara N. Sainath, Rohit Prakash Prabhavalkar, Golan Pundak

Contextual biasing for speech recognition using grapheme and phoneme data

Patent number: 11217231

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

Type: Grant

Filed: April 30, 2020

Date of Patent: January 4, 2022

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

COMPRESSED RECURRENT NEURAL NETWORK MODELS

Publication number: 20210089916

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.

Type: Application

Filed: December 4, 2020

Publication date: March 25, 2021

Applicant: Google LLC

Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier

Compressed recurrent neural network models

Patent number: 10878319

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.

Type: Grant

Filed: December 29, 2016

Date of Patent: December 29, 2020

Assignee: Google LLC

Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier

CONTEXTUAL BIASING FOR SPEECH RECOGNITION

Publication number: 20200402501

Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.

Type: Application

Filed: April 30, 2020

Publication date: December 24, 2020

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier

PHONEME-BASED CONTEXTUALIZATION FOR CROSS-LINGUAL SPEECH RECOGNITION IN END-TO-END MODELS

Publication number: 20200349923

Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.

Type: Application

Filed: April 28, 2020

Publication date: November 5, 2020

Applicant: Google LLC

Inventors: Ke Hu, Antoine Jean Bruguier, Tara N. Sainath, Rohit Prakash Prabhavalkar, Golan Pundak

Date and/or time resolution

Patent number: 10484319

Abstract: Methods and apparatus are disclosed for resolving multiple interpretations of an ambiguous temporal term of a resource to a subset of the multiple interpretations. In some implementations, a group of one or more messages is identified, an ambiguous temporal term of the messages determined, additional content of the messages determined, and multiple interpretations of the ambiguous temporal term resolved to a subset based on the additional content.

Type: Grant

Filed: March 21, 2019

Date of Patent: November 19, 2019

Assignee: GOOGLE LLC

Inventors: Bryan Christopher Horling, Ashutosh Shukla, Antoine Jean Bruguier

Date and/or time resolution

Patent number: 10277543

Abstract: Methods and apparatus are disclosed for resolving multiple interpretations of an ambiguous temporal term of a resource to a subset of the multiple interpretations. In some implementations, a group of one or more messages is identified, an ambiguous temporal term of the messages determined, additional content of the messages determined, and multiple interpretations of the ambiguous temporal term resolved to a subset based on the additional content.

Type: Grant

Filed: June 26, 2014

Date of Patent: April 30, 2019

Assignee: GOOGLE LLC

Inventors: Bryan Christopher Horling, Ashutosh Shukla, Antoine Jean Bruguier

DATE AND/OR TIME RESOLUTION

Publication number: 20190068532

Abstract: Methods and apparatus are disclosed for resolving multiple interpretations of an ambiguous temporal term of a resource to a subset of the multiple interpretations. In some implementations, a group of one or more messages is identified, an ambiguous temporal term of the messages determined, additional content of the messages determined, and multiple interpretations of the ambiguous temporal term resolved to a subset based on the additional content.

Type: Application

Filed: June 26, 2014

Publication date: February 28, 2019

Inventors: Bryan Christopher Horling, Ashutosh Shukla, Antoine Jean Bruguier

Learning personalized entity pronunciations

Patent number: 10152965

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.

Type: Grant

Filed: February 3, 2016

Date of Patent: December 11, 2018

Assignee: Google LLC

Inventors: Antoine Jean Bruguier, Fuchun Peng, Francoise Beaufays

LEARNING PERSONALIZED ENTITY PRONUNCIATIONS

Publication number: 20170221475

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage medium, for implementing a pronunciation dictionary that stores entity name pronunciations. In one aspect, a method includes actions of receiving audio data corresponding to an utterance that includes a command and an entity name. Additional actions may include generating, by an automated speech recognizer, an initial transcription for a portion of the audio data that is associated with the entity name, receiving a corrected transcription for the portion of the utterance that is associated with the entity name, obtaining a phonetic pronunciation that is associated with the portion of the audio data that is associated with the entity name, updating a pronunciation dictionary to associate the phonetic pronunciation with the entity name, receiving a subsequent utterance that includes the entity name, and transcribing the subsequent utterance based at least in part on the updated pronunciation dictionary.

Type: Application

Filed: February 3, 2016

Publication date: August 3, 2017

Inventors: Antoine Jean Bruguier, Fuchun Peng, Francoise Beaufays

COMPRESSED RECURRENT NEURAL NETWORK MODELS

Publication number: 20170220925

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.

Type: Application

Filed: December 29, 2016

Publication date: August 3, 2017

Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
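
As a rough illustration of the weight sharing described in the compressed RNN abstract, the sketch below builds a recurrent weight matrix and an inter-layer weight matrix from two compressed matrices and one shared projection matrix. The abstract says only that each matrix is "defined by" these components. The matrix-product form, the shapes, the sizes, and every name here (`P`, `Z_h`, `Z_x`, `rank`) are assumptions for illustration, not the claimed implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def param_count(*matrices):
    """Total number of entries across the given matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)


rng = random.Random(0)
hidden, next_hidden, rank = 8, 8, 2  # illustrative sizes only

P = random_matrix(rank, hidden, rng)         # shared projection matrix
Z_h = random_matrix(hidden, rank, rng)       # first compressed weight matrix
Z_x = random_matrix(next_hidden, rank, rng)  # second compressed weight matrix

W_h = matmul(Z_h, P)  # recurrent weight matrix, hidden x hidden
W_x = matmul(Z_x, P)  # inter-layer weight matrix, next_hidden x hidden

full_params = param_count(W_h, W_x)           # parameters if stored uncompressed
compressed_params = param_count(Z_h, Z_x, P)  # parameters actually stored
```

Under these assumed sizes the two full matrices hold 128 values while the three factors hold 48, and the saving grows as the rank shrinks relative to the layer width.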
