Patents by Inventor Petar Aleksic

Petar Aleksic has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SELECTIVELY INVOKING AN AUTOMATED ASSISTANT BASED ON DETECTED ENVIRONMENTAL CONDITIONS WITHOUT NECESSITATING VOICE-BASED INVOCATION OF THE AUTOMATED ASSISTANT

Publication number: 20220310089

Abstract: Implementations set forth herein relate to an automated assistant that is invoked according to contextual signals—in lieu of requiring a user to explicitly speak an invocation phrase. When a user is in an environment with an assistant-enabled device, contextual data characterizing features of the environment can be processed to determine whether a user intends to invoke the automated assistant. Therefore, when such features are detected by the automated assistant, the automated assistant can bypass requiring an invocation phrase from a user and, instead, be responsive to one or more assistant commands from the user. The automated assistant can operate based on a trained machine learning model that is trained using instances of training data that characterize previous interactions in which one or more users invoked or did not invoke the automated assistant.

Type: Application

Filed: January 17, 2020

Publication date: September 29, 2022

Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
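The invocation decision described in the abstract above can be pictured as a binary classifier over environmental features. This is a minimal sketch that assumes a logistic model. The feature names, weights, bias, and 0.7 threshold are hypothetical stand-ins for values a trained model would supply. It is not the applicant's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class EnvironmentFeatures:
    # Hypothetical contextual signals; the abstract does not name specific features.
    user_facing_device: bool
    user_distance_m: float
    other_people_present: bool
    device_recently_used: bool


# Hypothetical weights standing in for a model trained on past interactions
# in which users did or did not invoke the assistant.
WEIGHTS = {
    "user_facing_device": 2.0,
    "user_distance_m": -0.8,
    "other_people_present": -1.5,
    "device_recently_used": 1.2,
}
BIAS = -0.5


def invocation_probability(f: EnvironmentFeatures) -> float:
    """Logistic score: estimated probability that the user intends to invoke."""
    z = BIAS
    z += WEIGHTS["user_facing_device"] * float(f.user_facing_device)
    z += WEIGHTS["user_distance_m"] * f.user_distance_m
    z += WEIGHTS["other_people_present"] * float(f.other_people_present)
    z += WEIGHTS["device_recently_used"] * float(f.device_recently_used)
    return 1.0 / (1.0 + math.exp(-z))


def should_bypass_invocation_phrase(f: EnvironmentFeatures, threshold: float = 0.7) -> bool:
    """True if the assistant should accept commands without a spoken invocation phrase."""
    return invocation_probability(f) >= threshold
```

With these placeholder weights, a nearby user facing a recently used device clears the threshold. A distant user in a room with other people does not.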
CONTEXTUAL TAGGING AND BIASING OF GRAMMARS INSIDE WORD LATTICES

Publication number: 20220310082

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.

Type: Application

Filed: June 16, 2022

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
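The selection step can be sketched in a few lines. This minimal sketch makes three assumptions. Each candidate carries a transcription confidence from the lattice. Grammars are regular expressions keyed by a hypothetical context label. The two confidences are combined by multiplication. The grammar table, the 0.1 floor for non-matching grammars, and the product rule are illustrative only, because the abstract does not say how the scores are combined.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    transcription_confidence: float  # taken from the word lattice


# Hypothetical mapping from device context to (grammar name, pattern) pairs.
GRAMMARS_BY_CONTEXT = {
    "media_app": [("play_song", re.compile(r"^play (?P<song>.+)$"))],
    "phone_app": [("call_contact", re.compile(r"^call (?P<contact>.+)$"))],
}


def grammar_confidence(pattern: re.Pattern, text: str) -> float:
    # Hypothetical scoring: full match -> 1.0, otherwise a small floor.
    return 1.0 if pattern.fullmatch(text) else 0.1


def select_candidate(candidates: list, context: str) -> Candidate:
    """Pick the candidate with the best transcription x grammar confidence."""
    grammars = GRAMMARS_BY_CONTEXT.get(context, [])

    def combined(c: Candidate) -> float:
        best = max((grammar_confidence(p, c.text) for _, p in grammars), default=0.1)
        return c.transcription_confidence * best

    return max(candidates, key=combined)
```

In a media context, "play the bills" outranks the acoustically stronger "pay the bills" because only the former matches a grammar for that context.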
Cross-lingual speech recognition

Patent number: 11437025

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross-lingual speech recognition are disclosed. In one aspect, a method includes the actions of determining a context of a second computing device. The actions further include identifying, by a first computing device, an additional pronunciation for a term of multiple terms. The actions further include including the additional pronunciation for the term in the lexicon. The actions further include receiving audio data of an utterance. The actions further include generating a transcription of the utterance by using the lexicon that includes the multiple terms and the pronunciation for each of the multiple terms and the additional pronunciation for the term. The actions further include after generating the transcription of the utterance, removing the additional pronunciation for the term from the lexicon. The actions further include providing, for output, the transcription.

Type: Grant

Filed: October 4, 2019

Date of Patent: September 6, 2022

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
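The temporary lexicon change in the abstract above maps naturally onto a context manager: add a pronunciation, transcribe, then remove it. This minimal sketch assumes a toy lexicon of phone strings and a hypothetical locale-to-pronunciation table. A stand-in `toy_decode` function takes the place of a real recognizer. The split between the first and second computing devices is omitted.

```python
from contextlib import ExitStack, contextmanager

# Toy lexicon: term -> set of pronunciations written as space-separated phones.
lexicon = {"gracias": {"g r a s i a s"}}

# Hypothetical rule: device locale -> extra (term, pronunciation) pairs to add.
EXTRA_PRONUNCIATIONS = {"en-US": [("gracias", "g r a s i a z")]}


@contextmanager
def temporary_pronunciation(lex, term, pronunciation):
    """Add a pronunciation for the duration of a block, then remove it."""
    prons = lex.setdefault(term, set())
    added = pronunciation not in prons
    prons.add(pronunciation)
    try:
        yield lex
    finally:
        if added:
            prons.discard(pronunciation)
            if not prons:
                del lex[term]


def toy_decode(phones, lex):
    """Stand-in recognizer: exact lookup of a phone string in the lexicon."""
    for term, prons in lex.items():
        if phones in prons:
            return term
    return "<unk>"


def transcribe_in_context(phones, lex, locale):
    # Every extra pronunciation is removed when the `with` block exits.
    with ExitStack() as stack:
        for term, pron in EXTRA_PRONUNCIATIONS.get(locale, []):
            stack.enter_context(temporary_pronunciation(lex, term, pron))
        return toy_decode(phones, lex)
```

The `try`/`finally` restores the lexicon even if decoding raises, so no context leaks its pronunciations into later requests.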
Contextual Denormalization For Automatic Speech Recognition

Publication number: 20220277749

Abstract: A method includes receiving a speech input from a user and obtaining context metadata associated with the speech input. The method also includes generating a raw speech recognition result corresponding to the speech input and selecting a list of one or more denormalizers to apply to the generated raw speech recognition result based on the context metadata associated with the speech input. The generated raw speech recognition result includes normalized text. The method also includes denormalizing the generated raw speech recognition result into denormalized text by applying the list of the one or more denormalizers in sequence to the generated raw speech recognition result.

Type: Application

Filed: February 28, 2022

Publication date: September 1, 2022

Applicant: Google LLC

Inventors: Assaf Hurwitz Michaely, Petar Aleksic, Pedro J. Moreno Mengibar
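The following is a minimal sketch of denormalizers that are selected from context and then applied in sequence. The three denormalizers, the `app` metadata key, and the selection policy are all hypothetical. The sketch shows two things. The chain is chosen from context metadata. Order matters, because each denormalizer sees the previous one's output.

```python
# Hypothetical denormalizers: each maps normalized text to a display form.
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def numbers_to_digits(text: str) -> str:
    return " ".join(NUMBER_WORDS.get(w, w) for w in text.split())


def capitalize_first(text: str) -> str:
    return text[:1].upper() + text[1:]


def add_final_period(text: str) -> str:
    return text if text.endswith((".", "?", "!")) else text + "."


DENORMALIZERS = {
    "numbers": numbers_to_digits,
    "capitalize": capitalize_first,
    "period": add_final_period,
}


def select_denormalizers(context: dict) -> list:
    # Hypothetical policy: messaging apps also get sentence casing and punctuation.
    names = ["numbers"]
    if context.get("app") == "messaging":
        names += ["capitalize", "period"]
    return [DENORMALIZERS[n] for n in names]


def denormalize(raw: str, context: dict) -> str:
    """Apply the selected denormalizers, in order, to the raw recognition result."""
    text = raw
    for denormalizer in select_denormalizers(context):
        text = denormalizer(text)
    return text
```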
MIXED MODEL SPEECH RECOGNITION

Publication number: 20220262365

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.

Type: Application

Filed: May 3, 2022

Publication date: August 18, 2022

Applicant: Google LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic
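Once both recognizers have produced transcriptions, the output rule in the abstract reduces to a small decision function. This minimal sketch assumes the predefined terms are single words. It also assumes that the general-model transcription is returned when no predefined term appears, which the abstract does not say.

```python
# Hypothetical predefined trigger terms.
PREDEFINED_TERMS = {"call", "text", "email"}


def choose_transcription(user_specific: str, general: str,
                         terms: set = PREDEFINED_TERMS) -> str:
    """Prefer the user-specific transcription when the general one contains a trigger term."""
    general_words = set(general.lower().split())
    if general_words & terms:
        return user_specific
    # Assumed fallback: use the user-independent transcription.
    return general
```

For example, if the general model hears "call show you", the trigger word "call" routes the output to the user-specific result, which can resolve a contact name such as "Xiaoyu".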
Voice recognition system

Patent number: 11410660

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for voice recognition. In one aspect, a method includes the actions of receiving a voice input; determining a transcription for the voice input, wherein determining the transcription for the voice input includes, for a plurality of segments of the voice input: obtaining a first candidate transcription for a first segment of the voice input; determining one or more contexts associated with the first candidate transcription; adjusting a respective weight for each of the one or more contexts; and determining a second candidate transcription for a second segment of the voice input based in part on the adjusted weights; and providing the transcription of the plurality of segments of the voice input for output.

Type: Grant

Filed: April 1, 2020

Date of Patent: August 9, 2022

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
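The following is a minimal sketch of segment-by-segment decoding with context weights. It assumes each segment arrives as a list of `(word, score, contexts)` candidates. It also assumes that choosing a candidate raises the weight of each of its contexts by a fixed, hypothetical `boost`. The abstract does not specify the weighting scheme.

```python
from collections import defaultdict


def transcribe_segments(segments, boost=0.5):
    """Decode segments in order; contexts of earlier picks bias later picks."""
    weights = defaultdict(float)  # context name -> accumulated weight
    output = []
    for candidates in segments:
        def score(candidate):
            word, base_score, contexts = candidate
            return base_score + sum(weights[ctx] for ctx in contexts)

        word, _, contexts = max(candidates, key=score)
        output.append(word)
        # Adjust the weight of every context associated with the chosen candidate.
        for ctx in contexts:
            weights[ctx] += boost
    return " ".join(output)
```

Choosing "play" for the first segment boosts a `media` context. That boost then tips a later, acoustically weaker "beatles" over "bills".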
WORD LATTICE AUGMENTATION FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20220229992

Abstract: Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.

Type: Application

Filed: January 31, 2022

Publication date: July 21, 2022

Inventors: Leonid Velikovich, Petar Aleksic, Pedro Moreno
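The following is a minimal sketch of lattice augmentation triggered by a carrier phrase. It simplifies the word lattice to a confusion network: a list of slots, each holding `(word, score)` alternatives. It treats `call` as a hypothetical carrier phrase. It uses `difflib.get_close_matches` as a stand-in fuzzy matcher against a hypothetical contact list. None of these choices comes from the patent.

```python
import difflib


def augment_lattice(lattice, entities, carrier="call", cutoff=0.6, bonus_score=1.0):
    """Return a copy of the lattice with matched named entities added after the carrier phrase."""
    augmented = [list(slot) for slot in lattice]  # copy; leave the input untouched
    for i, slot in enumerate(augmented[:-1]):
        if any(word == carrier for word, _ in slot):
            next_slot = augmented[i + 1]
            # Iterate over a snapshot so newly added entities are not re-matched.
            for word, _ in list(next_slot):
                match = difflib.get_close_matches(word, entities, n=1, cutoff=cutoff)
                if match and all(w != match[0] for w, _ in next_slot):
                    next_slot.append((match[0], bonus_score))
    return augmented
```

For example, a slot containing "katherine" after "call" gains the contact name "kathryn" as an extra alternative. Later rescoring can then select it.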
Contextual tagging and biasing of grammars inside word lattices

Patent number: 11386889

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for implementing contextual grammar selection are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance. The actions include generating a word lattice that includes multiple candidate transcriptions of the utterance and that includes transcription confidence scores. The actions include determining a context of the computing device. The actions include based on the context of the computing device, identifying grammars that correspond to the multiple candidate transcriptions. The actions include determining, for each of the multiple candidate transcriptions, grammar confidence scores that reflect a likelihood that a respective grammar is a match for a respective candidate transcription. The actions include selecting, from among the candidate transcriptions, a candidate transcription.

Type: Grant

Filed: November 27, 2019

Date of Patent: July 12, 2022

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Leonid Velikovich
DETERMINING DIALOG STATES FOR LANGUAGE MODELS

Publication number: 20220165270

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Application

Filed: February 10, 2022

Publication date: May 26, 2022

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro Jose Moreno Mengibar
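The following is a minimal sketch of dialog-state biasing. It assumes the dialog state has already been determined and uses a hypothetical state-to-n-gram table. The language model is represented as a dictionary of phrase log-probabilities. Biasing adds a fixed log-boost without renormalizing, which is a simplification.

```python
import math

# Hypothetical dialog state -> n-grams expected in that state.
DIALOG_STATE_NGRAMS = {
    "ask_payment_method": {"credit card", "debit card"},
    "ask_delivery_date": {"tomorrow", "next week"},
}


def bias_language_model(lm_log_probs, dialog_state, boost=math.log(5.0)):
    """Return a copy of the LM with scores raised for the state's n-grams."""
    biased = dict(lm_log_probs)
    for ngram in DIALOG_STATE_NGRAMS.get(dialog_state, set()):
        if ngram in biased:
            biased[ngram] += boost
    return biased


def transcribe(acoustic_log_probs, lm_log_probs):
    """Pick the candidate maximizing acoustic + language-model log score."""
    return max(
        acoustic_log_probs,
        key=lambda t: acoustic_log_probs[t] + lm_log_probs.get(t, float("-inf")),
    )
```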
Speech recognition using two language models

Patent number: 11341972

Abstract: In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance.

Type: Grant

Filed: October 22, 2020

Date of Patent: May 24, 2022

Assignee: Google LLC

Inventors: Alexander H. Gruenstein, Petar Aleksic
VOICE TO TEXT CONVERSION BASED ON THIRD-PARTY AGENT CONTENT

Publication number: 20220148596

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

Type: Application

Filed: January 24, 2022

Publication date: May 12, 2022

Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
Contextual denormalization for automatic speech recognition

Patent number: 11282525

Abstract: A method includes receiving a speech input from a user and obtaining context metadata associated with the speech input. The method also includes generating a raw speech recognition result corresponding to the speech input and selecting a list of one or more denormalizers to apply to the generated raw speech recognition result based on the context metadata associated with the speech input. The generated raw speech recognition result includes normalized text. The method also includes denormalizing the generated raw speech recognition result into denormalized text by applying the list of the one or more denormalizers in sequence to the generated raw speech recognition result.

Type: Grant

Filed: September 1, 2020

Date of Patent: March 22, 2022

Assignee: Google LLC

Inventors: Assaf Hurwitz Michaely, Petar Aleksic, Pedro J. Moreno Mengibar
Negative n-gram biasing

Patent number: 11282513

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for biasing speech recognition against undesirable n-grams. In one aspect, a method includes obtaining a candidate transcription that an automated speech recognizer generates for an utterance, determining a particular context associated with the utterance, determining that a particular n-gram that is included in the candidate transcription is included among a set of undesirable n-grams that is associated with the context, adjusting a speech recognition confidence score associated with the transcription based on determining that the particular n-gram that is included in the candidate transcription is included among the set of undesirable n-grams that is associated with the context, and determining whether to provide the candidate transcription for output based at least on the adjusted speech recognition confidence score.

Type: Grant

Filed: June 15, 2020

Date of Patent: March 22, 2022

Assignee: Google LLC

Inventors: Pedro J. Moreno Mengibar, Petar Aleksic
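The following is a minimal sketch of the flow in the abstract: apply a penalty, then test against a threshold. The context label, the set of undesirable n-grams, the 0.5 multiplicative penalty, and the 0.6 output threshold are hypothetical values chosen for illustration.

```python
# Hypothetical context -> n-grams that are undesirable in that context.
UNDESIRABLE_NGRAMS = {"weather_app": {"whether"}}


def ngrams(words, n):
    """All contiguous n-grams of a word list, as space-joined strings."""
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def adjusted_confidence(candidate, confidence, context, penalty=0.5):
    """Lower the confidence if any n-gram of the candidate is undesirable in this context."""
    words = candidate.lower().split()
    candidate_ngrams = set()
    for n in range(1, len(words) + 1):
        candidate_ngrams |= ngrams(words, n)
    if candidate_ngrams & UNDESIRABLE_NGRAMS.get(context, set()):
        return confidence * penalty
    return confidence


def should_output(candidate, confidence, context, threshold=0.6):
    """Decide whether to emit the candidate based on its adjusted confidence."""
    return adjusted_confidence(candidate, confidence, context) >= threshold
```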
Determining dialog states for language models

Patent number: 11264028

Abstract: Systems, methods, devices, and other techniques are described herein for determining dialog states that correspond to voice inputs and for biasing a language model based on the determined dialog states. In some implementations, a method includes receiving, at a computing system, audio data that indicates a voice input and determining a particular dialog state, from among a plurality of dialog states, which corresponds to the voice input. A set of n-grams can be identified that are associated with the particular dialog state that corresponds to the voice input. In response to identifying the set of n-grams that are associated with the particular dialog state that corresponds to the voice input, a language model can be biased by adjusting probability scores that the language model indicates for n-grams in the set of n-grams. The voice input can be transcribed using the adjusted language model.

Type: Grant

Filed: January 2, 2020

Date of Patent: March 1, 2022

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Word lattice augmentation for automatic speech recognition

Patent number: 11238227

Abstract: Speech processing techniques are disclosed that enable determining a text representation of named entities in captured audio data. Various implementations include determining the location of a carrier phrase in a word lattice representation of the captured audio data, where the carrier phrase provides an indication of a named entity. Additional or alternative implementations include matching a candidate named entity with the portion of the word lattice, and augmenting the word lattice with the matched candidate named entity.

Type: Grant

Filed: June 27, 2019

Date of Patent: February 1, 2022

Assignee: Google LLC

Inventors: Leonid Velikovich, Petar Aleksic, Pedro Moreno
Voice to text conversion based on third-party agent content

Patent number: 11232797

Abstract: Implementations relate to dynamically, and in a context-sensitive manner, biasing voice to text conversion. In some implementations, the biasing of voice to text conversions is performed by a voice to text engine of a local agent, and the biasing is based at least in part on content provided to the local agent by a third-party (3P) agent that is in network communication with the local agent. In some of those implementations, the content includes contextual parameters that are provided by the 3P agent in combination with responsive content generated by the 3P agent during a dialog that: is between the 3P agent, and a user of a voice-enabled electronic device; and is facilitated by the local agent. The contextual parameters indicate potential feature(s) of further voice input that is to be provided in response to the responsive content generated by the 3P agent.

Type: Grant

Filed: February 14, 2020

Date of Patent: January 25, 2022

Assignee: Google LLC

Inventors: Barnaby James, Bo Wang, Sunil Vemuri, David Schairer, Ulas Kirazci, Ertan Dogrultan, Petar Aleksic
ALPHANUMERIC SEQUENCE BIASING FOR AUTOMATIC SPEECH RECOGNITION

Publication number: 20220013126

Abstract: Speech processing techniques are disclosed that enable determining a text representation of alphanumeric sequences in captured audio data. Various implementations include determining a contextual biasing finite state transducer (FST) based on contextual information corresponding to the captured audio data.

Type: Application

Filed: January 17, 2020

Publication date: January 13, 2022

Inventors: Benjamin Haynor, Petar Aleksic
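The following is a minimal sketch of the idea, not the patent's FST construction. It builds a linear-chain finite-state acceptor from a hypothetical slot pattern, where `A` stands for a letter and `9` for a digit. A context such as a flight-status screen might supply that pattern. The acceptor then gives a score bonus to candidate recognitions it accepts. A production system would more likely compose a weighted transducer with the decoder.

```python
# Pattern alphabet (assumption): "A" = any letter, "9" = any digit.
CHAR_CLASSES = {"A": str.isalpha, "9": str.isdigit}


def build_acceptor(pattern):
    """Linear-chain acceptor: transition i consumes one character of class pattern[i]."""
    return [CHAR_CLASSES[p] for p in pattern]


def accepts(acceptor, s):
    """Run the acceptor; accept only if every transition fires and input is exhausted."""
    state = 0
    for ch in s:
        if state == len(acceptor) or not acceptor[state](ch):
            return False
        state += 1
    return state == len(acceptor)


def rescore(candidates, acceptor, bonus=2.0):
    """Return the best candidate after adding a bonus to acceptor-matching ones."""
    def score(item):
        text, base = item
        normalized = text.replace(" ", "").upper()
        return base + (bonus if accepts(acceptor, normalized) else 0.0)

    return max(candidates.items(), key=score)[0]
```

With the pattern `AA999`, "UA 123" is accepted and overtakes the higher-scoring but ill-formed "you a one two three".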
SPEECH PROCESSING

Publication number: 20210398519

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for adapting a language model are disclosed. In one aspect, a method includes the actions of receiving transcriptions of utterances that were received by computing devices operating in a domain and that are in a source language. The actions further include generating translated transcriptions of the transcriptions of the utterances in a target language. The actions further include receiving a language model for the target language. The actions further include biasing the language model for the target language by increasing the likelihood of the language model selecting terms included in the translated transcriptions. The actions further include generating a transcription of an utterance in the target language using the biased language model and while operating in the domain.

Type: Application

Filed: September 9, 2021

Publication date: December 23, 2021

Applicant: Google LLC

Inventors: Petar Aleksic, Benjamin Paul Hillson Haynor
LANGUAGE MODEL BIASING SYSTEM

Publication number: 20210358479

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Application

Filed: June 2, 2021

Publication date: November 18, 2021

Applicant: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
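The following is a minimal sketch of the expand-then-boost idea. The expansion table is hypothetical. The sketch also folds the two adjustments in the abstract into a single additive bonus per matching word: biasing the language model and adjusting a candidate's score.

```python
# Hypothetical expansion table standing in for the system's n-gram expansion step.
EXPANSIONS = {"pizza": {"pizzeria", "pizzas"}, "italian": {"italy"}}


def expand(initial):
    """Expanded n-gram set: the initial n-grams plus their listed expansions."""
    expanded = set(initial)
    for ngram in initial:
        expanded |= EXPANSIONS.get(ngram, set())
    return expanded


def pick_transcription(candidates, context_ngrams, bonus=1.5):
    """Choose the candidate whose score, plus bonuses for expanded-set words, is highest."""
    expanded = expand(context_ngrams)

    def score(item):
        text, base = item
        return base + sum(bonus for w in text.split() if w in expanded)

    return max(candidates.items(), key=score)[0]
```

Here the context n-gram "italian" never appears in either candidate. Only its expansion "italy" lets the correct candidate win.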
ALLOWING SPELLING OF ARBITRARY WORDS

Publication number: 20210350074

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice input from a user device; generating a first recognition output; receiving a user selection of one or more terms in the first recognition output; receiving a second voice input spelling a correction of the user selection; determining a corrected recognition output for the selected portion; and providing a second recognition output that merges the first recognition output and the corrected recognition output.

Type: Application

Filed: July 24, 2021

Publication date: November 11, 2021

Applicant: Google LLC

Inventors: Evgeny A. Cherepanov, Gleb Skobeltsyn, Jakob Nicolaus Foerster, Petar Aleksic, Assaf Avner Hurwitz Michaely
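The following is a minimal sketch of the merge step. It assumes the user's selection arrives as a half-open word span `(start, end)`. It also assumes the spelled input has already been recognized as space-separated letters. The function name and input formats are hypothetical.

```python
def merge_spelled_correction(first_output, selected, spelled):
    """Replace the selected word span of the first output with the spelled word."""
    start, end = selected  # half-open span of word indices the user selected
    words = first_output.split()
    corrected = "".join(spelled.split())  # e.g. "p e t a r" -> "petar"
    return " ".join(words[:start] + [corrected] + words[end:])
```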
