Patents by Inventor Ágoston Weisz
Ágoston Weisz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12266358
Abstract: Implementations perform, independent of any explicit assistant invocation input(s), automatic speech recognition (ASR) on audio data, that is detected via microphone(s) of an assistant device, to generate ASR text that predicts a spoken utterance that is captured in the audio data. The ASR text is processed and candidate automated assistant action(s) that correspond to the command, if any, are generated. For each of any candidate automated assistant action(s), it is determined whether to (a) cause automatic performance of the automated assistant action responsive to the spoken utterance or, instead, (b) suppress any automatic performance of the automated assistant action responsive to the spoken utterance. Such determination can be made based on processing both (i) action feature(s) for the candidate automated assistant action; and (ii) environment feature(s) that each reflects a corresponding current value for a corresponding dynamic state of an environment of the assistant device.
Type: Grant
Filed: September 1, 2022
Date of Patent: April 1, 2025
Assignee: GOOGLE LLC
Inventors: Konrad Miller, Ágoston Weisz, Herbert Jordan
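The perform-or-suppress decision described in this abstract can be sketched as a rule over the two feature groups. This is an illustrative sketch only: the feature names, thresholds, and rules below are assumptions for clarity, not details taken from the patent, which may use learned models rather than fixed rules.

```python
from dataclasses import dataclass

@dataclass
class ActionFeatures:
    is_destructive: bool   # e.g. the action would delete or change user data
    asr_confidence: float  # confidence of the recognized command

@dataclass
class EnvironmentFeatures:
    people_present: int    # dynamic state of the device's environment
    media_playing: bool

def should_auto_perform(action: ActionFeatures, env: EnvironmentFeatures) -> bool:
    """Decide whether to automatically perform a candidate assistant action
    or suppress it, based on both action and environment features."""
    if action.is_destructive:
        return False  # never auto-perform risky actions without invocation
    if action.asr_confidence < 0.8:
        return False  # recognition too uncertain
    if env.people_present > 1 and env.media_playing:
        return False  # the utterance may not be directed at the assistant
    return True
```

A usage example: a confident, harmless command heard by a lone user would auto-perform, while the same command recognized during a multi-person conversation with media playing would be suppressed.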
-
Publication number: 20250061146
Abstract: Implementations utilize an LLM to respond to queries comprising image data, such as multimodal queries that include both text and image data. A natural language processing system is extended such that when an image is provided, the natural language processing system invokes one or more auxiliary image processing models (e.g., visual query) and/or image search engines. The results, of invoking such model(s) and/or search engine(s), are collected into structured data signals related to the image. These signals form part of the conversation context and are used to extend the text prompt that is sent to the LLM. This allows the LLM to take the context into account when being used to process the user query, thereby enabling generation of an LLM reply that addresses relevant feature(s) of the image.
Type: Application
Filed: August 13, 2024
Publication date: February 20, 2025
Inventors: Olivier Siegenthaler, Ágoston Weisz, Boris Bluntschli, Dan Banica, Kaan Ege Özgün, Daniel Mogoreanu, Filip Sladek
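The prompt-extension step described here can be sketched as folding the structured image signals into the text prompt before it reaches the LLM. The function name, signal keys, and prompt layout below are hypothetical illustrations, not the patent's actual format.

```python
def build_prompt(user_text: str, image_signals: dict) -> str:
    """Extend a text prompt with structured signals derived from an attached
    image (e.g. auxiliary image-model outputs or image-search results), so a
    text-based LLM can answer a multimodal query."""
    lines = ["[image context]"]
    for key, value in sorted(image_signals.items()):
        lines.append(f"{key}: {value}")
    lines.append("[user query]")
    lines.append(user_text)
    return "\n".join(lines)

prompt = build_prompt(
    "What breed is this dog?",
    {"detected_objects": "dog, park bench", "web_match": "golden retriever"},
)
```

With the image signals inlined as context, the LLM can reference features of the image (here, the likely breed) even though it only consumes text.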
-
Publication number: 20250061889
Abstract: A method includes receiving audio data corresponding to a query spoken by a user and processing the audio data to generate multiple candidate hypotheses, each represented by a respective sequence of hypothesized terms. For each candidate hypothesis, the method includes determining whether the sequence of hypothesized terms includes a source phrase from a list of phrase correction pairs. Each phrase correction pair includes a corresponding source phrase that was misrecognized and a corresponding target phrase replacing the source phrase. When the respective sequence of hypothesized terms includes the source phrase, the method includes generating a corresponding additional candidate hypothesis that replaces the source phrase with the corresponding target phrase.
Type: Application
Filed: November 1, 2024
Publication date: February 20, 2025
Applicant: Google LLC
Inventors: Ágoston Weisz, Leonid Velikovich
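The hypothesis-expansion step in this abstract can be sketched directly: for each ASR hypothesis containing a previously misrecognized source phrase, emit an additional hypothesis with the target phrase substituted. The function and data shapes below are illustrative assumptions.

```python
def expand_hypotheses(hypotheses, correction_pairs):
    """Given ASR hypotheses (lists of terms) and phrase correction pairs
    (source-phrase tuple, target-phrase tuple), return the original
    hypotheses plus one additional hypothesis per source-phrase match,
    with the source phrase replaced by the target phrase."""
    extra = []
    for terms in hypotheses:
        for source, target in correction_pairs:
            n = len(source)
            for i in range(len(terms) - n + 1):
                if tuple(terms[i:i + n]) == source:
                    extra.append(terms[:i] + list(target) + terms[i + n:])
    return hypotheses + extra

hyps = [["call", "ann", "smith"]]
pairs = [(("ann",), ("anne",))]
# expand_hypotheses(hyps, pairs) also yields ["call", "anne", "smith"]
```

The expanded set can then be rescored as usual, so the corrected phrasing competes with the raw recognition result rather than silently overwriting it.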
-
Publication number: 20250053751
Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using a LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
Type: Application
Filed: January 16, 2024
Publication date: February 13, 2025
Inventors: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Agoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li
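The tag-based interleaving described here can be sketched as splitting the LLM output on multimedia content tags and resolving each tag into media. The tag syntax (`<media:...>`) and fetch callback below are assumptions for illustration; the patent does not specify this format.

```python
import re

TAG = re.compile(r"<media:(?P<query>[^>]+)>")

def render(llm_output: str, fetch_media):
    """Split LLM output on media tags and replace each tag with content
    obtained via the tag's query, interleaving media between text segments."""
    parts = []
    pos = 0
    for m in TAG.finditer(llm_output):
        if m.start() > pos:
            parts.append(("text", llm_output[pos:m.start()]))
        parts.append(("media", fetch_media(m.group("query"))))
        pos = m.end()
    if pos < len(llm_output):
        parts.append(("text", llm_output[pos:]))
    return parts

out = render(
    "Here is the Eiffel Tower: <media:eiffel tower photo> Built in 1889.",
    lambda q: f"[image for '{q}']",
)
```

The resulting list of `("text", ...)` and `("media", ...)` segments preserves the ordering the LLM chose, so multimedia lands between the text segments it belongs with.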
-
Publication number: 20250046296
Abstract: A method, device, and computer-readable storage medium for predicting pronunciation of a text sample. The method includes selecting a predicted text sample corresponding to an audio sample, receiving a correction text sample corresponding to the audio sample, updating an encoding of allowable pronunciations of the correction text sample based on the predicted text sample and the audio sample, the updated encoding of allowable pronunciations of the correction text sample including a pronunciation of the predicted text sample, and predicting a pronunciation of the correction text sample based on the updated encoding of allowable pronunciations of the correction text sample.
Type: Application
Filed: July 31, 2023
Publication date: February 6, 2025
Applicant: GOOGLE LLC
Inventors: Leonid Velikovich, Ágoston Weisz
-
Publication number: 20240427997
Abstract: A method includes obtaining a set of training queries that each specify a corresponding operation to perform and include a corresponding plurality of speech recognition hypotheses that each represent a corresponding candidate transcription of the training query, and a corresponding ground-truth transcription of the training query. For each training query, the method includes processing, using an encoder of a neural semantic parsing (NSP) model, the corresponding plurality of speech recognition hypotheses to generate a corresponding NSP embedding, processing, using a transcription decoder, the corresponding NSP embedding to generate a corresponding predicted transcription, and determining a corresponding first loss based on the corresponding predicted transcription and the corresponding ground-truth transcription.
Type: Application
Filed: June 20, 2023
Publication date: December 26, 2024
Applicant: Google LLC
Inventors: Khalid Salama, Ágoston Weisz
-
Patent number: 12165641
Abstract: A method includes receiving follow-on audio data captured by an assistant-enabled device, the follow-on audio data corresponding to a follow-on query spoken by a user of the assistant-enabled device to a digital assistant subsequent to the user submitting a previous query to the digital assistant. The method also includes processing, using a speech recognizer, the follow-on audio data to generate multiple candidate hypotheses, each candidate hypothesis corresponding to a candidate transcription for the follow-on query and represented by a respective sequence of hypothesized terms. For each corresponding candidate hypothesis among the multiple candidate hypotheses, the method also includes determining a corresponding similarity metric between the previous query and the corresponding candidate hypothesis and determining a transcription of the follow-on query spoken by the user based on the similarity metrics determined for the multiple candidate hypotheses.
Type: Grant
Filed: July 11, 2022
Date of Patent: December 10, 2024
Assignee: Google LLC
Inventors: Patrick Siegler, Aurélien Boffy, Ágoston Weisz
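The rescoring idea in this abstract (prefer the follow-on transcription most similar to the previous query) can be sketched with a simple similarity metric. Jaccard overlap on terms is an illustrative stand-in here, not necessarily the metric the patent uses, and in practice the similarity would be combined with the recognizer's own scores rather than used alone.

```python
def jaccard(a: str, b: str) -> float:
    """Term-set overlap between two queries, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def pick_transcription(previous_query: str, candidates: list[str]) -> str:
    """Choose the candidate hypothesis most similar to the previous query."""
    return max(candidates, key=lambda c: jaccard(previous_query, c))

best = pick_transcription(
    "weather in Vienna today",
    ["whether in piano tomorrow", "weather in Vienna tomorrow"],
)
```

Follow-on queries tend to reuse terms from the preceding turn, so the similarity signal helps the recognizer recover from acoustically plausible but contextually unlikely hypotheses.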
-
Patent number: 12165628
Abstract: Techniques are disclosed that enable determining and/or utilizing a misrecognition of a spoken utterance, where the misrecognition is generated using an automatic speech recognition (ASR) model. Various implementations include determining a recognition based on the spoken utterance and a previous utterance spoken prior to the spoken utterance. Additionally or alternatively, implementations include personalizing an ASR engine for a user based on the spoken utterance and the previous utterance spoken prior to the spoken utterance (e.g., based on audio data capturing the previous utterance and a text representation of the spoken utterance).
Type: Grant
Filed: July 8, 2020
Date of Patent: December 10, 2024
Assignee: GOOGLE LLC
Inventors: Ágoston Weisz, Ignacio Lopez Moreno, Alexandru Dovlecel
-
Patent number: 12154549
Abstract: A method includes receiving audio data corresponding to a query spoken by a user and processing the audio data to generate multiple candidate hypotheses, each represented by a respective sequence of hypothesized terms. For each candidate hypothesis, the method includes determining whether the sequence of hypothesized terms includes a source phrase from a list of phrase correction pairs. Each phrase correction pair includes a corresponding source phrase that was misrecognized and a corresponding target phrase replacing the source phrase. When the respective sequence of hypothesized terms includes the source phrase, the method includes generating a corresponding additional candidate hypothesis that replaces the source phrase with the corresponding target phrase.
Type: Grant
Filed: December 15, 2021
Date of Patent: November 26, 2024
Assignee: Google LLC
Inventors: Ágoston Weisz, Leonid Velikovich
-
Patent number: 12086504
Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
Type: Grant
Filed: September 18, 2023
Date of Patent: September 10, 2024
Assignee: GOOGLE LLC
Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
-
Publication number: 20240257799
Abstract: A method includes receiving a biased transcription for a voice command spoken by a user and captured by a user device, the biased transcription biased to include a biasing phrase from a set of biasing phrases specific to the user. The method also includes instructing an application executing on the user device to perform an action specified by the biased transcription for the voice command, and receiving one or more user behavior signals responsive to the application performing the action specified by the biased transcription. The method further includes generating, as output from a confidence model, a confidence score of the biased transcription based on the one or more user behavior signals input to the confidence model and, based on the confidence score output from the confidence model, training a speech recognizer on the biased transcription.
Type: Application
Filed: January 30, 2023
Publication date: August 1, 2024
Applicant: Google LLC
Inventors: Dragan Zivkovic, Agoston Weisz
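The feedback loop in this abstract (behavior signals after the action → confidence score → decision whether to train on the transcription) can be sketched as follows. The signal names, weights, and threshold are illustrative assumptions; the patent describes a trained confidence model rather than a hand-weighted rule.

```python
def confidence_score(signals: dict) -> float:
    """Map user-behavior signals observed after acting on a biased
    transcription to a confidence score in [0, 1]."""
    score = 0.5
    if signals.get("user_engaged_with_result"):
        score += 0.4  # user accepted the outcome: likely correct
    if signals.get("user_undid_action"):
        score -= 0.4  # user reversed the action: likely wrong
    if signals.get("user_repeated_query"):
        score -= 0.3  # user re-asked: the transcription probably missed
    return max(0.0, min(1.0, score))

def should_train_on(transcription: str, signals: dict,
                    threshold: float = 0.7) -> bool:
    """Only keep high-confidence biased transcriptions as training data."""
    return confidence_score(signals) >= threshold
```

Gating the training data this way keeps user-specific biasing phrases from reinforcing their own recognition mistakes.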
-
Publication number: 20240194188
Abstract: A method of using voice query history to improve speech recognition includes receiving audio data corresponding to a current query spoken by a user and processing the audio data to generate a lattice of candidate hypotheses. The method also includes obtaining voice query history data associated with the user that includes n-grams extracted from transcriptions of previous queries spoken by the user, and generating, using a biasing context model configured to receive the voice query history data, a biasing context vector. The biasing context vector indicates a likelihood that each n-gram from the n-grams extracted from the transcriptions of the previous queries spoken by the user will appear in the current query. The method also includes augmenting the lattice of candidate hypotheses based on the biasing context vector and determining a transcription for the current query based on the augmented lattice of candidate hypotheses.
Type: Application
Filed: December 8, 2022
Publication date: June 13, 2024
Inventors: Agoston Weisz, Mikhail Dektiarev
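The history-based biasing described here can be sketched as boosting candidate hypotheses that contain n-grams the user has spoken before. The count-and-bonus scoring below is an illustrative simplification; the patent describes a trained biasing context model producing likelihoods, not raw counts.

```python
from collections import Counter

def history_ngrams(previous_queries, n=2):
    """Extract n-gram counts from transcriptions of previous queries."""
    grams = Counter()
    for q in previous_queries:
        terms = q.lower().split()
        for i in range(len(terms) - n + 1):
            grams[tuple(terms[i:i + n])] += 1
    return grams

def biased_score(base_score, hypothesis, grams, weight=0.1, n=2):
    """Augment a hypothesis score with a bonus for n-grams seen in the
    user's query history."""
    terms = hypothesis.lower().split()
    bonus = sum(grams[tuple(terms[i:i + n])]
                for i in range(len(terms) - n + 1))
    return base_score + weight * bonus

grams = history_ngrams(["play jazz music", "play jazz radio"])
```

Applied across a lattice, the bonus nudges recognition toward phrasings the user actually uses, e.g. favoring "play jazz now" over an acoustically similar but historically unseen alternative.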
-
Patent number: 11947923
Abstract: Implementations relate to managing multimedia content that is obtained by large language model(s) (LLM(s)) and/or generated by other generative model(s). Processor(s) of a system can: receive natural language (NL) based input that requests multimedia content, generate a response that is responsive to the NL based input, and cause the response to be rendered. In some implementations, and in generating the response, the processor(s) can process, using a LLM, LLM input to generate LLM output, and determine, based on the LLM output, at least multimedia content to be included in the response. Further, the processor(s) can evaluate the multimedia content to determine whether it should be included in the response. In response to determining that the multimedia content should not be included in the response, the processor(s) can cause the response, including alternative multimedia content or other textual content, to be rendered.
Type: Grant
Filed: November 27, 2023
Date of Patent: April 2, 2024
Assignee: GOOGLE LLC
Inventors: Sanil Jain, Wei Yu, Ágoston Weisz, Michael Andrew Goodman, Diana Avram, Amin Ghafouri, Golnaz Ghiasi, Igor Petrovski, Khyatti Gupta, Oscar Akerlund, Evgeny Sluzhaev, Rakesh Shivanna, Thang Luong, Komal Singh, Yifeng Lu, Vikas Peswani
-
Patent number: 11907674
Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using a LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
Type: Grant
Filed: September 20, 2023
Date of Patent: February 20, 2024
Assignee: GOOGLE LLC
Inventors: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Ágoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li
-
Publication number: 20240046925
Abstract: Implementations perform, independent of any explicit assistant invocation input(s), automatic speech recognition (ASR) on audio data, that is detected via microphone(s) of an assistant device, to generate ASR text that predicts a spoken utterance that is captured in the audio data. The ASR text is processed and candidate automated assistant action(s) that correspond to the command, if any, are generated. For each of any candidate automated assistant action(s), it is determined whether to (a) cause automatic performance of the automated assistant action responsive to the spoken utterance or, instead, (b) suppress any automatic performance of the automated assistant action responsive to the spoken utterance. Such determination can be made based on processing both (i) action feature(s) for the candidate automated assistant action; and (ii) environment feature(s) that each reflects a corresponding current value for a corresponding dynamic state of an environment of the assistant device.
Type: Application
Filed: September 1, 2022
Publication date: February 8, 2024
Inventors: Konrad Miller, Ágoston Weisz, Herbert Jordan
-
Publication number: 20240013782
Abstract: A method includes receiving follow-on audio data captured by an assistant-enabled device, the follow-on audio data corresponding to a follow-on query spoken by a user of the assistant-enabled device to a digital assistant subsequent to the user submitting a previous query to the digital assistant. The method also includes processing, using a speech recognizer, the follow-on audio data to generate multiple candidate hypotheses, each candidate hypothesis corresponding to a candidate transcription for the follow-on query and represented by a respective sequence of hypothesized terms. For each corresponding candidate hypothesis among the multiple candidate hypotheses, the method also includes determining a corresponding similarity metric between the previous query and the corresponding candidate hypothesis and determining a transcription of the follow-on query spoken by the user based on the similarity metrics determined for the multiple candidate hypotheses.
Type: Application
Filed: July 11, 2022
Publication date: January 11, 2024
Applicant: Google LLC
Inventors: Patrick Siegler, Aurélien Boffy, Ágoston Weisz
-
Publication number: 20240004608
Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
Type: Application
Filed: September 18, 2023
Publication date: January 4, 2024
Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
-
Publication number: 20230402034
Abstract: Implementations relate to correcting a speech recognition hypothesis based on prior correction(s) made by a user and/or fulfillment data associated with fulfilling a request embodied in the speech recognition hypothesis. A candidate speech recognition hypothesis can be generated in response to the user providing a spoken utterance to an application, such as an automated assistant. When a confidence metric for the candidate speech recognition hypothesis does not satisfy a threshold, one or more terms of the candidate speech recognition hypothesis can be compared to correcting data. The correcting data can indicate whether the user previously corrected any term(s) present in the candidate speech recognition hypothesis and, if so, correct the term(s) accordingly. Fulfillment data generated for the candidate hypothesis and/or for the corrected hypothesis can also be processed to determine whether to utilize the candidate hypothesis or the corrected hypothesis in responding to the user.
Type: Application
Filed: June 16, 2022
Publication date: December 14, 2023
Inventors: Ágoston Weisz, Miroslaw Michalski, Aurélien Boffy
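The confidence-gated correction lookup in this abstract can be sketched as follows. The threshold value and the shape of the correction store (a term-to-term mapping) are illustrative assumptions; the patent also considers phrase-level corrections and fulfillment data, which this sketch omits.

```python
def maybe_correct(hypothesis: str, confidence: float,
                  corrections: dict, threshold: float = 0.8) -> str:
    """If the ASR confidence is below the threshold, substitute any terms
    the user has previously corrected; otherwise keep the hypothesis as-is.

    corrections maps previously misrecognized terms to the user's fixes,
    e.g. {"jon": "juan"}.
    """
    if confidence >= threshold:
        return hypothesis  # confident enough: no correction needed
    terms = [corrections.get(t, t) for t in hypothesis.split()]
    return " ".join(terms)
```

Only applying the correction data when confidence is low keeps the user's past fixes from overriding recognitions that were probably already right.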
-
Patent number: 11789695
Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
Type: Grant
Filed: October 13, 2022
Date of Patent: October 17, 2023
Assignee: GOOGLE LLC
Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
-
Publication number: 20230186898
Abstract: A method includes receiving audio data corresponding to a query spoken by a user and processing the audio data to generate multiple candidate hypotheses, each represented by a respective sequence of hypothesized terms. For each candidate hypothesis, the method includes determining whether the sequence of hypothesized terms includes a source phrase from a list of phrase correction pairs. Each phrase correction pair includes a corresponding source phrase that was misrecognized and a corresponding target phrase replacing the source phrase. When the respective sequence of hypothesized terms includes the source phrase, the method includes generating a corresponding additional candidate hypothesis that replaces the source phrase with the corresponding target phrase.
Type: Application
Filed: December 15, 2021
Publication date: June 15, 2023
Applicant: Google LLC
Inventors: Ágoston Weisz, Leonid Velikovich