Patents by Inventor Golan Pundak
Golan Pundak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11942076
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
Type: Grant
Filed: February 16, 2022
Date of Patent: March 26, 2024
Assignee: Google LLC
Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
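The rescoring step this abstract describes can be illustrated with a minimal sketch: phoneme-sequence hypotheses whose pronunciation matches a term in the biasing list receive a score boost before decoding. All names, the boost value, and the toy pronunciation lexicon below are illustrative assumptions, not details from the patent.

```python
BIAS_BOOST = 2.0  # additive log-score bonus for matching a biasing term (assumed value)

# Toy pronunciation lexicon mapping biasing terms to phoneme sequences.
lexicon = {"tokyo": ("t", "ow", "k", "y", "ow")}

def rescore(phoneme_hyps, biasing_terms):
    """Boost the score of phoneme sequences that pronounce a biasing term."""
    bias_prons = {lexicon[t] for t in biasing_terms if t in lexicon}
    return [
        (seq, score + BIAS_BOOST if tuple(seq) in bias_prons else score)
        for seq, score in phoneme_hyps
    ]

hyps = [(("t", "ow", "k", "y", "ow"), -4.1), (("t", "aa", "k", "i"), -3.9)]
rescored = rescore(hyps, ["tokyo"])
best = max(rescored, key=lambda p: p[1])  # the biased pronunciation now outranks the other
```

In the full method, these rescored phoneme scores would then feed a decoding graph together with the wordpiece scores; the sketch only shows the list-driven boost itself.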
-
Publication number: 20230419964
Abstract: Implementations are directed to causing a voice bot to utilize a plurality of ML layers in resolving unique personal identifier(s) for a human while the voice bot is engaged in a corresponding conversation with the human. The unique personal identifier(s) can include a unique sequence of alphanumeric characters that is personal to the human. In some implementations, ASR speech hypotheses corresponding to spoken utterance(s) that include the unique personal identifier(s) can be processed to generate candidate unique personal identifier(s), given alphanumeric character(s) of the candidate unique personal identifier(s) can be selected, and the voice bot can prompt the human with clarification request(s) to clarify the given alphanumeric character(s) until it is predicted to correspond to an actual unique personal identifier for the human. The unique personal identifier(s) can then be utilized in performance of further action(s) by the voice bot and/or other systems.
Type: Application
Filed: September 7, 2023
Publication date: December 28, 2023
Inventors: Rafael Goldfarb, Or Guz, Lior Alon, Assaf Hurwitz Michaely, Golan Pundak, Shmuel Leibtag, Tomer Amiaz, Dan Rasin, Asaf Aharoni
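One way to picture the clarification loop in this abstract: given several ASR hypotheses for a spoken alphanumeric identifier, the characters worth clarifying are the positions where the hypotheses disagree. The function names and prompt wording below are assumptions for illustration, not the patent's API, and real systems would use learned confidences rather than simple disagreement.

```python
def ambiguous_positions(hypotheses):
    """Return indices where same-length candidate identifiers disagree on a character."""
    length = len(hypotheses[0])
    assert all(len(h) == length for h in hypotheses)
    return [i for i in range(length)
            if len({h[i] for h in hypotheses}) > 1]

def clarification_prompts(hypotheses):
    """Draft one clarification question per ambiguous character position."""
    return [f"Was character {i + 1} '{hypotheses[0][i]}'?"
            for i in ambiguous_positions(hypotheses)]

# "B" vs "D" and "5" vs "9" are classic confusions when identifiers are spelled aloud.
hyps = ["AB51X", "AD51X", "AB91X"]
positions = ambiguous_positions(hyps)  # positions 1 and 2 need clarification
```

The voice bot would then issue the prompts one at a time until the resolved characters match an actual identifier on record.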
-
Publication number: 20230377564
Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criterion. The penalty criterion indicates that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
Type: Application
Filed: July 31, 2023
Publication date: November 23, 2023
Applicant: Google LLC
Inventors: Charles Caleb Peyser, Tara N. Sainath, Golan Pundak
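A toy numeric sketch of the penalized loss this abstract describes: an expected word-error term over the hypothesis distribution, plus an extra penalty whenever a hypothesis is both confident (probability above a threshold) and wrong about the proper noun. The threshold, penalty value, and function shape are illustrative assumptions; the real training loss operates on model lattices, not tuples.

```python
PROB_THRESHOLD = 0.5  # a hypothesis is "confident" above this probability (assumed)
PENALTY = 1.0         # extra loss for a confident hypothesis that misses the noun (assumed)

def penalized_mwer_loss(hypotheses, reference_noun):
    """hypotheses: list of (text, probability, word_errors) tuples."""
    total_p = sum(p for _, p, _ in hypotheses)
    # Expected word-error count under the (normalized) hypothesis distribution.
    loss = sum(p / total_p * errs for _, p, errs in hypotheses)
    # Penalize confident hypotheses that misrecognize the proper noun.
    for text, p, _ in hypotheses:
        if p > PROB_THRESHOLD and reference_noun not in text:
            loss += PENALTY
    return loss

# A confident misrecognition of "caleb" adds the penalty on top of expected errors.
loss = penalized_mwer_loss([("call kalib", 0.6, 1), ("call caleb", 0.3, 0)], "caleb")
```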
-
Patent number: 11790906
Abstract: Implementations are directed to causing a voice bot to utilize a plurality of ML layers in resolving unique personal identifier(s) for a human while the voice bot is engaged in a corresponding conversation with the human. The unique personal identifier(s) can include a unique sequence of alphanumeric characters that is personal to the human. In some implementations, ASR speech hypotheses corresponding to spoken utterance(s) that include the unique personal identifier(s) can be processed to generate candidate unique personal identifier(s), given alphanumeric character(s) of the candidate unique personal identifier(s) can be selected, and the voice bot can prompt the human with clarification request(s) to clarify the given alphanumeric character(s) until it is predicted to correspond to an actual unique personal identifier for the human. The unique personal identifier(s) can then be utilized in performance of further action(s) by the voice bot and/or other systems.
Type: Grant
Filed: January 25, 2021
Date of Patent: October 17, 2023
Assignee: GOOGLE LLC
Inventors: Rafael Goldfarb, Or Guz, Lior Alon, Assaf Hurwitz Michaely, Golan Pundak, Shmuel Leibtag, Tomer Amiaz, Dan Rasin, Asaf Aharoni
-
Patent number: 11749259
Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criterion. The penalty criterion indicates that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
Type: Grant
Filed: January 15, 2021
Date of Patent: September 5, 2023
Assignee: Google LLC
Inventors: Charles Caleb Peyser, Tara N. Sainath, Golan Pundak
-
Publication number: 20230274736
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Application
Filed: May 4, 2023
Publication date: August 31, 2023
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
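A small sketch of the input preparation this abstract mentions: each biasing phrase yields both a grapheme sequence and a phoneme sequence for the model to attend over. The lookup table stands in for a real grapheme-to-phoneme model, and all names here are illustrative assumptions rather than the patent's implementation.

```python
# Toy G2P lookup; a production system would use a trained grapheme-to-phoneme model.
toy_g2p = {"caleb": ["K", "EY", "L", "AH", "B"]}

def biasing_features(phrases):
    """Return a (graphemes, phonemes) pair for each biasing phrase."""
    feats = []
    for phrase in phrases:
        graphemes = list(phrase)
        phonemes = []
        for word in phrase.split():
            phonemes.extend(toy_g2p.get(word, ["<unk>"]))
        feats.append((graphemes, phonemes))
    return feats

features = biasing_features(["caleb"])
```

Supplying pronunciations alongside spellings is what lets the model bias toward rare names whose written form alone is hard to map to sound.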
-
Patent number: 11664021
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Grant
Filed: December 9, 2021
Date of Patent: May 30, 2023
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
-
Publication number: 20220392489
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
Type: Application
Filed: August 19, 2022
Publication date: December 8, 2022
Inventors: Dimitri Kanevsky, Golan Pundak
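The combine-and-diminish step can be pictured numerically: samples from the two conversation partners are mixed at full gain while the out-of-conversation source is attenuated. The gain value and function names are illustrative assumptions; the patent covers identifying the sources and conversations, which this toy skips.

```python
DIMINISH_GAIN = 0.1  # attenuation for out-of-conversation sources (assumed value)

def mix(sources, conversation_ids):
    """sources: {source_id: [samples]}; keep conversation members, duck the rest."""
    n = len(next(iter(sources.values())))
    out = [0.0] * n
    for sid, samples in sources.items():
        gain = 1.0 if sid in conversation_ids else DIMINISH_GAIN
        for i, s in enumerate(samples):
            out[i] += gain * s
    return out

# Alice and Bob are in the conversation; the TV audio is diminished, not removed.
streams = {"alice": [1.0, 1.0], "bob": [0.5, 0.5], "tv": [2.0, 2.0]}
mixed = mix(streams, {"alice", "bob"})
```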
-
Publication number: 20220366897
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Application
Filed: July 26, 2022
Publication date: November 17, 2022
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
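The bias attention module in this architecture can be sketched as plain dot-product attention: the decoder state attends over bias-phrase embeddings and receives their weighted sum as context. Pure-Python, toy dimensions; the vectors and function names are illustrative assumptions, not the patent's model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def bias_attention(query, bias_embeddings):
    """Return a context vector: attention-weighted sum of bias-phrase embeddings."""
    scores = [sum(q * k for q, k in zip(query, emb)) for emb in bias_embeddings]
    weights = softmax(scores)
    dim = len(bias_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, bias_embeddings))
            for d in range(dim)]

# A query aligned with the first bias phrase draws the context toward it.
ctx = bias_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the full model this context is consumed by the decoder alongside the acoustic attention context, which is how the bias phrases influence the predicted speech elements.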
-
Patent number: 11443769
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
Type: Grant
Filed: March 8, 2021
Date of Patent: September 13, 2022
Assignee: Google LLC
Inventors: Dimitri Kanevsky, Golan Pundak
-
Patent number: 11423883
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Grant
Filed: March 31, 2020
Date of Patent: August 23, 2022
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20220238105
Abstract: Implementations are directed to causing a voice bot to utilize a plurality of ML layers in resolving unique personal identifier(s) for a human while the voice bot is engaged in a corresponding conversation with the human. The unique personal identifier(s) can include a unique sequence of alphanumeric characters that is personal to the human. In some implementations, ASR speech hypotheses corresponding to spoken utterance(s) that include the unique personal identifier(s) can be processed to generate candidate unique personal identifier(s), given alphanumeric character(s) of the candidate unique personal identifier(s) can be selected, and the voice bot can prompt the human with clarification request(s) to clarify the given alphanumeric character(s) until it is predicted to correspond to an actual unique personal identifier for the human. The unique personal identifier(s) can then be utilized in performance of further action(s) by the voice bot and/or other systems.
Type: Application
Filed: January 25, 2021
Publication date: July 28, 2022
Inventors: Rafael Goldfarb, Or Guz, Lior Alon, Assaf Hurwitz Michaely, Golan Pundak, Shmuel Leibtag, Tomer Amiaz, Dan Rasin, Asaf Aharoni
-
Publication number: 20220172706
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
Type: Application
Filed: February 16, 2022
Publication date: June 2, 2022
Applicant: Google LLC
Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
-
Publication number: 20220101836
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Application
Filed: December 9, 2021
Publication date: March 31, 2022
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
-
Patent number: 11270687
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
Type: Grant
Filed: April 28, 2020
Date of Patent: March 8, 2022
Assignee: Google LLC
Inventors: Ke Hu, Antoine Jean Bruguier, Tara N. Sainath, Rohit Prakash Prabhavalkar, Golan Pundak
-
Patent number: 11217231
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Grant
Filed: April 30, 2020
Date of Patent: January 4, 2022
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier
-
Publication number: 20210233512
Abstract: A method for training a speech recognition model with a minimum word error rate loss function includes receiving a training example comprising a proper noun and generating a plurality of hypotheses corresponding to the training example. Each hypothesis of the plurality of hypotheses represents the proper noun and includes a corresponding probability that indicates a likelihood that the hypothesis represents the proper noun. The method also includes determining that the corresponding probability associated with one of the plurality of hypotheses satisfies a penalty criterion. The penalty criterion indicates that the corresponding probability satisfies a probability threshold and that the associated hypothesis incorrectly represents the proper noun. The method also includes applying a penalty to the minimum word error rate loss function.
Type: Application
Filed: January 15, 2021
Publication date: July 29, 2021
Applicant: Google LLC
Inventors: Charles Caleb Peyser, Tara N. Sainath, Golan Pundak
-
Publication number: 20210193180
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
Type: Application
Filed: March 8, 2021
Publication date: June 24, 2021
Inventors: Dimitri Kanevsky, Golan Pundak
-
Patent number: 10943619
Abstract: In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for identifying that a first audio stream includes first, second, and third sources of audio. A computing system identifies that a second audio stream includes the first, second, and third sources of audio. The computing system determines that the first and second sources of audio are part of a first conversation. The computing system generates a third audio stream that combines the first source of audio from the first audio stream, the first source of audio from the second audio stream, the second source of audio from the first audio stream, and the second source of audio from the second audio stream, and diminishes the third source of audio from the first audio stream, and the third source of audio from the second audio stream.
Type: Grant
Filed: March 9, 2020
Date of Patent: March 9, 2021
Assignee: Google LLC
Inventors: Dimitri Kanevsky, Golan Pundak
-
Publication number: 20200402501
Abstract: A method of biasing speech recognition includes receiving audio data encoding an utterance and obtaining a set of one or more biasing phrases corresponding to a context of the utterance. Each biasing phrase in the set of one or more biasing phrases includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data and grapheme and phoneme data derived from the set of one or more biasing phrases to generate an output of the speech recognition model. The method also includes determining a transcription for the utterance based on the output of the speech recognition model.
Type: Application
Filed: April 30, 2020
Publication date: December 24, 2020
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath, Antoine Jean Bruguier