Patents by Inventor Petar Aleksic

Petar Aleksic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MIXED LANGUAGE MODEL SPEECH RECOGNITION

Publication number: 20250232774

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.

Type: Application

Filed: April 3, 2025

Publication date: July 17, 2025

Applicant: Google LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic
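
The selection step described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Recognizer` type, the function name `choose_transcription`, and the fallback to the second transcription when no term matches are all assumptions, and matching is limited to single-word terms for simplicity.

```python
from typing import Callable, Iterable

# Hypothetical recognizer interface: audio bytes in, transcription text out.
Recognizer = Callable[[bytes], str]


def choose_transcription(
    audio: bytes,
    personalized: Recognizer,
    general: Recognizer,
    trigger_terms: Iterable[str],
) -> str:
    """Return the personalized transcription when the general one contains a trigger term."""
    first = personalized(audio)  # language model based on user-specific data
    second = general(audio)      # language model independent of user-specific data
    words = set(second.lower().split())
    if any(term.lower() in words for term in trigger_terms):
        return first
    # Assumption: the abstract does not say what happens otherwise;
    # this sketch falls back to the general transcription.
    return second
```

With stub recognizers, a trigger term such as "call" in the general transcription would cause the personalized transcription (for example, one that knows a contact name) to be returned instead.
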
Selectively invoking an automated assistant based on detected environmental conditions without necessitating voice-based invocation of the automated assistant

Patent number: 12327552

Abstract: Implementations set forth herein relate to an automated assistant that is invoked according to contextual signals—in lieu of requiring a user to explicitly speak an invocation phrase. When a user is in an environment with an assistant-enabled device, contextual data characterizing features of the environment can be processed to determine whether a user intends to invoke the automated assistant. Therefore, when such features are detected by the automated assistant, the automated assistant can bypass requiring an invocation phrase from a user and, instead, be responsive to one or more assistant commands from the user. The automated assistant can operate based on a trained machine learning model that is trained using instances of training data that characterize previous interactions in which one or more users invoked or did not invoke the automated assistant.

Type: Grant

Filed: January 17, 2020

Date of Patent: June 10, 2025

Assignee: GOOGLE LLC

Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
LANGUAGE MODEL BIASING MODULATION

Publication number: 20250174223

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters based at least on the likely context associated with the user is selected. A context confidence score associated with the likely context based on at least a portion of the context data is determined. One or more language model biasing parameters based at least on the context confidence score is adjusted. A baseline language model based at least on the one or more of the adjusted language model biasing parameters is biased. The baseline language model is provided for use by an automated speech recognizer (ASR).

Type: Application

Filed: January 29, 2025

Publication date: May 29, 2025

Applicant: Google LLC

Inventors: Pedro J. Moreno Mengibar, Petar Aleksic
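
One way to picture the modulation described in the abstract above is to scale each biasing parameter by the context confidence score before applying it to the baseline model. The sketch below assumes that reading; the table `CONTEXT_PARAMS`, the function names, the linear scaling rule, and the use of log-probability boosts are hypothetical choices, not details taken from the patent.

```python
from typing import Dict, Tuple

# Hypothetical biasing parameters per context: n-gram -> log-probability boost.
CONTEXT_PARAMS: Dict[str, Dict[str, float]] = {
    "navigation": {"directions to": 2.0, "navigate home": 1.5},
    "music": {"play": 2.0, "next song": 1.0},
}


def likely_context(context_scores: Dict[str, float]) -> Tuple[str, float]:
    """Pick the highest-scoring context and use its score as the confidence."""
    name = max(context_scores, key=context_scores.get)
    return name, context_scores[name]


def modulate(params: Dict[str, float], confidence: float) -> Dict[str, float]:
    """Scale each boost by the confidence, clamped to the range [0, 1]."""
    c = min(max(confidence, 0.0), 1.0)
    return {ngram: boost * c for ngram, boost in params.items()}


def bias(baseline: Dict[str, float], boosts: Dict[str, float]) -> Dict[str, float]:
    """Add boosts to baseline log-probabilities (scores are not renormalized here)."""
    return {ngram: lp + boosts.get(ngram, 0.0) for ngram, lp in baseline.items()}


name, confidence = likely_context({"navigation": 0.5, "music": 0.25})
adjusted = modulate(CONTEXT_PARAMS[name], confidence)
biased_model = bias({"directions to": -5.0, "play": -3.0}, adjusted)
```

A low confidence score shrinks every boost toward zero, so an uncertain context guess changes the baseline model very little.
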
Mixed language model speech recognition

Patent number: 12288559

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.

Type: Grant

Filed: May 3, 2022

Date of Patent: April 29, 2025

Assignee: Google LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic
LANGUAGE MODEL BIASING SYSTEM

Publication number: 20250131917

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Application

Filed: December 24, 2024

Publication date: April 24, 2025

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
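
The n-gram expansion and candidate rescoring steps in the abstract above can be sketched roughly as below. The `related` lookup table, the fixed `bonus`, and the function names are assumptions made for illustration; the abstract does not specify how the expanded set is generated or how scores are adjusted.

```python
from typing import Dict, Iterable, Set


def expand_ngrams(initial: Iterable[str], related: Dict[str, Iterable[str]]) -> Set[str]:
    """Build the expanded set: the initial n-grams plus any related n-grams."""
    expanded = set(initial)
    for ngram in list(expanded):
        expanded.update(related.get(ngram, ()))
    return expanded


def rescore(candidates: Dict[str, float], expanded: Set[str], bonus: float = 1.0) -> Dict[str, float]:
    """Raise the score of each recognition candidate found in the expanded set."""
    return {
        text: (score + bonus) if text in expanded else score
        for text, score in candidates.items()
    }


# Hypothetical context data and recognizer output (log-probability scores).
expanded = expand_ngrams(
    {"italian restaurant"},
    {"italian restaurant": ["pizzeria", "trattoria"]},
)
rescored = rescore({"pizza area": -4.0, "pizzeria": -4.5}, expanded)
best = max(rescored, key=rescored.get)
```

In this example the candidate "pizzeria" starts with a lower score than "pizza area" but is selected after rescoring because it appears in the expanded set.
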
Alphanumeric sequence biasing for automatic speech recognition using a rendered system prompt

Patent number: 12283278

Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one or more candidate recognitions of the alphanumeric sequence using the contextual biasing FST.

Type: Grant

Filed: March 25, 2024

Date of Patent: April 22, 2025

Assignee: GOOGLE LLC

Inventors: Benjamin Haynor, Petar Aleksic
Language model biasing modulation

Patent number: 12230251

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters based at least on the likely context associated with the user is selected. A context confidence score associated with the likely context based on at least a portion of the context data is determined. One or more language model biasing parameters based at least on the context confidence score is adjusted. A baseline language model based at least on the one or more of the adjusted language model biasing parameters is biased. The baseline language model is provided for use by an automated speech recognizer (ASR).

Type: Grant

Filed: December 12, 2022

Date of Patent: February 18, 2025

Assignee: Google LLC

Inventors: Pedro J. Moreno Mengibar, Petar Aleksic
Voice to text conversion based on third-party agent content

Patent number: 12217759

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

Type: Grant

Filed: February 6, 2024

Date of Patent: February 4, 2025

Assignee: GOOGLE LLC

Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
Determining dialog states for language models

Patent number: 12205586

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Grant

Filed: February 10, 2022

Date of Patent: January 21, 2025

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
Language model biasing system

Patent number: 12183328

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Grant

Filed: May 16, 2023

Date of Patent: December 31, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
CONTEXTUAL TAGGING AND BIASING OF GRAMMARS INSIDE WORD LATTICES

Publication number: 20240428785

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.

Type: Application

Filed: September 4, 2024

Publication date: December 26, 2024

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
DETERMINING DIALOG STATES FOR LANGUAGE MODELS

Publication number: 20240428790

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Application

Filed: September 3, 2024

Publication date: December 26, 2024

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
SERVER SIDE HOTWORDING

Publication number: 20240412734

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.

Type: Application

Filed: August 21, 2024

Publication date: December 12, 2024

Applicant: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
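
The two-threshold arrangement in the abstract above can be sketched as a permissive on-device check followed by a stricter server check. The scorer and transcriber callables, the threshold values, and the function names below are hypothetical; only the ordering of the checks and the stricter second threshold come from the abstract.

```python
from typing import Callable, Optional

Scorer = Callable[[bytes], float]                 # hypothetical: key-phrase likelihood in [0, 1]
Transcriber = Callable[[bytes], str]              # hypothetical: returns tagged text
ServerCall = Callable[[bytes], Optional[str]]


def server_check(audio: bytes, scorer: Scorer, transcribe: Transcriber,
                 threshold: float = 0.9) -> Optional[str]:
    """Server side: return tagged text only if the stricter threshold is met."""
    if scorer(audio) >= threshold:
        return transcribe(audio)
    return None


def device_check(audio: bytes, scorer: Scorer, send_to_server: ServerCall,
                 threshold: float = 0.5) -> Optional[str]:
    """Device side: forward audio to the server only if the looser threshold is met."""
    if scorer(audio) < threshold:
        return None  # the audio is not sent to the server
    return send_to_server(audio)
```

A cheap, permissive detector on the device filters out most audio, and the server's more restrictive detector makes the final decision before any tagged text is returned.
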
Server side hotwording

Patent number: 12094472

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting hotwords using a server. One of the methods includes receiving an audio signal encoding one or more utterances including a first utterance; determining whether at least a portion of the first utterance satisfies a first threshold of being at least a portion of a key phrase; in response to determining that at least the portion of the first utterance satisfies the first threshold of being at least a portion of a key phrase, sending the audio signal to a server system that determines whether the first utterance satisfies a second threshold of being the key phrase, the second threshold being more restrictive than the first threshold; and receiving tagged text data representing the one or more utterances encoded in the audio signal when the server system determines that the first utterance satisfies the second threshold.

Type: Grant

Filed: June 30, 2023

Date of Patent: September 17, 2024

Assignee: GOOGLE LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic, Johan Schalkwyk, Pedro J. Moreno Mengibar
Determining dialog states for language models

Patent number: 12080290

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Grant

Filed: February 10, 2022

Date of Patent: September 3, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
VOICE RECOGNITION SYSTEM

Publication number: 20240282309

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

Type: Application

Filed: April 30, 2024

Publication date: August 22, 2024

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
VOICE TO TEXT CONVERSION BASED ON THIRD-PARTY AGENT CONTENT

Publication number: 20240274133

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

Type: Application

Filed: February 6, 2024

Publication date: August 15, 2024

Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
ALPHANUMERIC SEQUENCE BIASING FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20240233732

Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data. Additional or alternative implementations include modifying probabilities of one or more candidate recognitions of the alphanumeric sequence using the contextual biasing FST.

Type: Application

Filed: March 25, 2024

Publication date: July 11, 2024

Inventors: Benjamin Haynor, Petar Aleksic
Speech recognition for keywords

Patent number: 12026753

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition are disclosed. In one aspect, a method includes receiving a candidate adword from an advertiser. The method further includes generating a score for the candidate adword based on a likelihood of a speech recognizer generating, based on an utterance of the candidate adword, a transcription that includes a word that is associated with an expected pronunciation of the candidate adword. The method further includes classifying, based at least on the score, the candidate adword as an appropriate adword for use in a bidding process for advertisements that are selected based on a transcription of a speech query or as not an appropriate adword for use in the bidding process for advertisements that are selected based on the transcription of the speech query.

Type: Grant

Filed: May 5, 2021

Date of Patent: July 2, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Enhanced speech endpointing

Patent number: 11996085

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated b

Type: Grant

Filed: December 8, 2020

Date of Patent: May 28, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Glen Shires, Michael Buchanan
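
The comparison step in the Enhanced speech endpointing abstract above can be sketched as a loop over intermediate recognition results that stops as soon as one matches an expected result. The function names, the whitespace and case normalization, and the exact-match rule are assumptions for illustration only.

```python
from typing import Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def endpoint(intermediate_results: Iterable[str],
             expected_results: Iterable[str]) -> Tuple[Optional[str], bool]:
    """Return (result, end_of_speech), stopping early on a match with an expected result."""
    expected = {normalize(e) for e in expected_results}
    last: Optional[str] = None
    for result in intermediate_results:
        last = result
        if normalize(result) in expected:
            return result, True  # set the end-of-speech condition immediately
    # No match: this sketch leaves endpointing to whatever default the recognizer uses.
    return last, False
```

For a yes/no prompt, the expected results might be "yes" and "no", so the recognizer could stop listening as soon as an intermediate result equals either one rather than waiting for a silence timeout.
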
