Patents by Inventor Ágoston Weisz

Ágoston Weisz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11947923
    Abstract: Implementations relate to managing multimedia content that is obtained by large language model(s) (LLM(s)) and/or generated by other generative model(s). Processor(s) of a system can: receive natural language (NL) based input that requests multimedia content, generate a response that is responsive to the NL based input, and cause the response to be rendered. In some implementations, and in generating the response, the processor(s) can process, using a LLM, LLM input to generate LLM output, and determine, based on the LLM output, at least multimedia content to be included in the response. Further, the processor(s) can evaluate the multimedia content to determine whether it should be included in the response. In response to determining that the multimedia content should not be included in the response, the processor(s) can cause the response, including alternative multimedia content or other textual content, to be rendered.
    Type: Grant
    Filed: November 27, 2023
    Date of Patent: April 2, 2024
    Assignee: GOOGLE LLC
    Inventors: Sanil Jain, Wei Yu, Ágoston Weisz, Michael Andrew Goodman, Diana Avram, Amin Ghafouri, Golnaz Ghiasi, Igor Petrovski, Khyatti Gupta, Oscar Akerlund, Evgeny Sluzhaev, Rakesh Shivanna, Thang Luong, Komal Singh, Yifeng Lu, Vikas Peswani
  • Patent number: 11907674
    Abstract: Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using a LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
    Type: Grant
    Filed: September 20, 2023
    Date of Patent: February 20, 2024
    Assignee: GOOGLE LLC
    Inventors: Oscar Akerlund, Evgeny Sluzhaev, Golnaz Ghiasi, Thang Luong, Yifeng Lu, Igor Petrovski, Ágoston Weisz, Wei Yu, Rakesh Shivanna, Michael Andrew Goodman, Apoorv Kulshreshtha, Yu Du, Amin Ghafouri, Sanil Jain, Dustin Tran, Vikas Peswani, YaGuang Li
  • Publication number: 20240046925
    Abstract: Implementations perform, independent of any explicit assistant invocation input(s), automatic speech recognition (ASR) on audio data, that is detected via microphone(s) of an assistant device, to generate ASR text that predicts a spoken utterance that is captured in the audio data. The ASR text is processed and candidate automated assistant action(s) that correspond to the command, if any, are generated. For each of any candidate automated assistant action(s), it is determined whether to (a) cause automatic performance of the automated assistant action responsive to the spoken utterance or, instead, (b) suppress any automatic performance of the automated assistant action responsive to the spoken utterance. Such determination can be made based on processing both (i) action feature(s) for the candidate automated assistant action; and (ii) environment feature(s) that each reflects a corresponding current value for a corresponding dynamic state of an environment of the assistant device.
    Type: Application
    Filed: September 1, 2022
    Publication date: February 8, 2024
    Inventors: Konrad Miller, Ágoston Weisz, Herbert Jordan
  • Publication number: 20240013782
    Abstract: A method includes receiving follow-on audio data captured by an assistant-enabled device, the follow-on audio data corresponding to a follow-on query spoken by a user of the assistant-enabled device to a digital assistant subsequent to the user submitting a previous query to the digital assistant. The method also includes processing, using a speech recognizer, the follow-on audio data to generate multiple candidate hypotheses, each candidate hypothesis corresponding to a candidate transcription for the follow-on query and represented by a respective sequence of hypothesized terms. For each corresponding candidate hypothesis among the multiple candidate hypotheses, the method also includes determining a corresponding similarity metric between the previous query and the corresponding candidate hypothesis and determining a transcription of the follow-on query spoken by the user based on the similarity metrics determined for the multiple candidate hypotheses.
    Type: Application
    Filed: July 11, 2022
    Publication date: January 11, 2024
    Applicant: Google LLC
    Inventors: Patrick Siegler, Aurélien Boffy, Ágoston Weisz
  • Publication number: 20240004608
    Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
    Type: Application
    Filed: September 18, 2023
    Publication date: January 4, 2024
    Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
  • Publication number: 20230402034
    Abstract: Implementations relate to correcting a speech recognition hypothesis based on prior correction(s) made by a user and/or fulfillment data associated with fulfilling a request embodied in the speech recognition hypothesis. A candidate speech recognition hypothesis can be generated in response to the user providing a spoken utterance to an application, such as an automated assistant. When a confidence metric for the candidate speech recognition hypothesis does not satisfy a threshold, one or more terms of the candidate speech recognition hypothesis can be compared to correcting data. The correcting data can indicate whether the user previously corrected any term(s) present in the candidate speech recognition hypothesis and, if so, correct the term(s) accordingly. Fulfillment data generated for the candidate hypothesis and/or for the corrected hypothesis can also be processed to determine whether to utilize the candidate hypothesis or the corrected hypothesis in responding to the user.
    Type: Application
    Filed: June 16, 2022
    Publication date: December 14, 2023
    Inventors: Ágoston Weisz, Miroslaw Michalski, Aurélien Boffy
  • Patent number: 11789695
    Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
    Type: Grant
    Filed: October 13, 2022
    Date of Patent: October 17, 2023
    Assignee: GOOGLE LLC
    Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
  • Publication number: 20230186898
    Abstract: A method includes receiving audio data corresponding to a query spoken and processing the audio data to generate multiple candidate hypotheses each represented by a respective sequence of hypothesized terms. For each candidate hypothesis, the method includes determining whether the sequence of hypothesized terms includes a source phrase from a list of phrase correction pairs. Each phrase correction pair includes a corresponding source phrase that was misrecognized and a corresponding target phrase replacing the source phrase. When the respective sequence of hypothesized terms includes the source phrase, the method includes generating a corresponding additional candidate hypothesis that replaces the source phrase.
    Type: Application
    Filed: December 15, 2021
    Publication date: June 15, 2023
    Applicant: Google LLC
    Inventors: Ágoston Weisz, Leonid Velikovich
  • Publication number: 20230033396
    Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
    Type: Application
    Filed: October 13, 2022
    Publication date: February 2, 2023
    Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
  • Patent number: 11474773
    Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
    Type: Grant
    Filed: September 2, 2020
    Date of Patent: October 18, 2022
    Assignee: GOOGLE LLC
    Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
  • Patent number: 11393476
    Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. In various implementations, audio data indicative of a voice input that includes a natural language request from a user may be applied as input across multiple speech-to-text (“STT”) machine learning models to generate multiple candidate speech recognition outputs. Each STT machine learning model may be trained in a particular language. For each respective STT machine learning model of the multiple STT models, the multiple candidate speech recognition outputs may be analyzed to determine an entropy score for the respective STT machine learning model. Based on the entropy scores, a target language associated with at least one STT machine learning model of the multiple STT machine learning models may be selected. The automated assistant may respond to the request using the target language.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: July 19, 2022
    Assignee: GOOGLE LLC
    Inventors: Ignacio Lopez Moreno, Lukas Lopatovsky, Ágoston Weisz
  • Publication number: 20220139373
    Abstract: Techniques are disclosed that enable determining and/or utilizing a misrecognition of a spoken utterance, where the misrecognition is generated using an automatic speech recognition (ASR) model. Various implementations include determining a recognition based on the spoken utterance and a previous utterance spoken prior to the spoken utterance. Additionally or alternatively, implementations include personalizing an ASR engine for a user based on the spoken utterance and the previous utterance spoken prior to the spoken utterance (e.g., based on audio data capturing the previous utterance and a text representation of the spoken utterance).
    Type: Application
    Filed: July 8, 2020
    Publication date: May 5, 2022
    Inventors: Ágoston Weisz, Ignacio Lopez Moreno, Alexandru Dovlecel
  • Publication number: 20220084503
    Abstract: Implementations set forth herein relate to speech recognition techniques for handling variations in speech among users (e.g. due to different accents) and processing features of user context in order to expand a number of speech recognition hypotheses when interpreting a spoken utterance from a user. In order to adapt to an accent of the user, terms common to multiple speech recognition hypotheses can be filtered out in order to identify inconsistent terms apparent in a group of hypotheses. Mappings between inconsistent terms can be stored for subsequent users as term correspondence data. In this way, supplemental speech recognition hypotheses can be generated and subject to probability-based scoring for identifying a speech recognition hypothesis that most correlates to a spoken utterance provided by a user. In some implementations, prior to scoring, hypotheses can be supplemented based on contextual data, such as on-screen content and/or application capabilities.
    Type: Application
    Filed: November 29, 2021
    Publication date: March 17, 2022
    Inventors: Ágoston Weisz, Alexandru Dovlecel, Gleb Skobeltsyn, Evgeny Cherepanov, Justas Klimavicius, Yihui Ma, Lukas Lopatovsky
  • Publication number: 20220066731
    Abstract: Techniques enable an automatic adjustment of a muted response setting of an automated assistant based on a determination of an expectation by a user to hear an audible response to their query, despite the muted setting. Determination of the expectation may be based on historical, empirical data uploaded from multiple users over time for a given response scenario. For example, the system may determine from the historical data that a certain type of query has been associated with a user both repeating their query and increasing a response volume setting within a given timeframe. Metrics may be generated, stored, and invoked in response to attributes associated with identifiable types of queries and query scenarios. Automated response characteristics meant to reduce inefficiencies may be associated with certain queries that can otherwise collectively burden network bandwidth and processing resources.
    Type: Application
    Filed: September 2, 2020
    Publication date: March 3, 2022
    Inventors: Michael Schaer, Vitaly Gatsko, Ágoston Weisz
  • Patent number: 11189264
    Abstract: Implementations set forth herein relate to speech recognition techniques for handling variations in speech among users (e.g. due to different accents) and processing features of user context in order to expand a number of speech recognition hypotheses when interpreting a spoken utterance from a user. In order to adapt to an accent of the user, terms common to multiple speech recognition hypotheses can be filtered out in order to identify inconsistent terms apparent in a group of hypotheses. Mappings between inconsistent terms can be stored for subsequent users as term correspondence data. In this way, supplemental speech recognition hypotheses can be generated and subject to probability-based scoring for identifying a speech recognition hypothesis that most correlates to a spoken utterance provided by a user. In some implementations, prior to scoring, hypotheses can be supplemented based on contextual data, such as on-screen content and/or application capabilities.
    Type: Grant
    Filed: July 17, 2019
    Date of Patent: November 30, 2021
    Assignee: GOOGLE LLC
    Inventors: Ágoston Weisz, Alexandru Dovlecel, Gleb Skobeltsyn, Evgeny Cherepanov, Justas Klimavicius, Yihui Ma, Lukas Lopatovsky
  • Publication number: 20210074295
    Abstract: Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. In various implementations, audio data indicative of a voice input that includes a natural language request from a user may be applied as input across multiple speech-to-text (“STT”) machine learning models to generate multiple candidate speech recognition outputs. Each STT machine learning model may be trained in a particular language. For each respective STT machine learning model of the multiple STT models, the multiple candidate speech recognition outputs may be analyzed to determine an entropy score for the respective STT machine learning model. Based on the entropy scores, a target language associated with at least one STT machine learning model of the multiple STT machine learning models may be selected. The automated assistant may respond to the request using the target language.
    Type: Application
    Filed: January 8, 2019
    Publication date: March 11, 2021
    Inventors: Ignacio Lopez Moreno, Lukas Lopatovsky, Ágoston Weisz
  • Publication number: 20210012765
    Abstract: Implementations set forth herein relate to speech recognition techniques for handling variations in speech among users (e.g. due to different accents) and processing features of user context in order to expand a number of speech recognition hypotheses when interpreting a spoken utterance from a user. In order to adapt to an accent of the user, terms common to multiple speech recognition hypotheses can be filtered out in order to identify inconsistent terms apparent in a group of hypotheses. Mappings between inconsistent terms can be stored for subsequent users as term correspondence data. In this way, supplemental speech recognition hypotheses can be generated and subject to probability-based scoring for identifying a speech recognition hypothesis that most correlates to a spoken utterance provided by a user. In some implementations, prior to scoring, hypotheses can be supplemented based on contextual data, such as on-screen content and/or application capabilities.
    Type: Application
    Filed: July 17, 2019
    Publication date: January 14, 2021
    Inventors: Ágoston Weisz, Alexandru Dovlecel, Gleb Skobeltsyn, Evgeny Cherepanov, Justas Klimavicius, Yihui Ma, Lukas Lopatovsky
  • Patent number: 10853435
    Abstract: Systems and methods for aligning event data recorded by recording devices. Recording devices create, transmit, and store alignment data. Alignment data created by a recording device is stored in the memory of the recording device with a time that is maintained by the recording device and that is relative to the time of event data recorded by the recording device that creates the alignment data. Recording devices further receive and store transmitted alignment data. Alignment data received by a recording device is stored in the memory of the recording device with a time that is maintained by the receiving recording device and that is relative to the time of event data recorded by the recording device that creates alignment data. Stored alignment data may be used to align the playback of event data of devices that have the same alignment data.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: December 1, 2020
    Assignee: Axon Enterprise, Inc.
    Inventors: James Norton Reitz, Raymond T. Fortna, Nathan A. Grubb, Michael J. Bohlander, Tyler J. Conant, Tamas Agoston Weisz, Zachary S. Emmel, Trevin Chow, Melissa S. Kersh, Jacob Davis Hershfield, Patrick W. Smith, Abraham Alvarez Zayas
  • Publication number: 20170364602
    Abstract: Systems and methods for aligning event data recorded by recording devices. Recording devices create, transmit, and store alignment data. Alignment data created by a recording device is stored in the memory of the recording device with a time that is maintained by the recording device and that is relative to the time of event data recorded by the recording device that creates the alignment data. Recording devices further receive and store transmitted alignment data. Alignment data received by a recording device is stored in the memory of the recording device with a time that is maintained by the receiving recording device and that is relative to the time of event data recorded by the recording device that creates alignment data. Stored alignment data may be used to align the playback of event data of devices that have the same alignment data.
    Type: Application
    Filed: June 9, 2017
    Publication date: December 21, 2017
    Applicant: Axon Enterprise, Inc.
    Inventors: James Norton Reitz, Raymond T. Fortna, Nathan A. Grubb, Michael J. Bohlander, Tyler J. Conant, Tamas Agoston Weisz, Zachary S. Emmel, Trevin Chow, Melissa S. Kersh, Jacob Davis Hershfield, Patrick W. Smith, Abraham Alvarez Zayas
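
Illustrative note on publication 20230186898 (phrase correction pairs): the abstract describes scanning each candidate hypothesis for a previously misrecognized source phrase and, when one is found, generating an additional candidate hypothesis with that phrase replaced. The sketch below is one possible reading of that step, not the method claimed in the application. The function name `expand_hypotheses`, the representation of hypotheses as lists of terms, and the sample data are all assumptions made for illustration.

```python
from typing import List, Tuple

Hypothesis = List[str]
# (source phrase that was misrecognized, target phrase that replaces it)
CorrectionPair = Tuple[List[str], List[str]]


def expand_hypotheses(
    hypotheses: List[Hypothesis],
    correction_pairs: List[CorrectionPair],
) -> List[Hypothesis]:
    """Return the original hypotheses plus one additional hypothesis for
    every place where a source phrase from the correction list occurs."""
    expanded = [list(h) for h in hypotheses]
    for terms in hypotheses:
        for source, target in correction_pairs:
            n = len(source)
            if n == 0:
                continue  # an empty source phrase cannot be matched
            # Slide a window of len(source) terms across the hypothesis.
            for i in range(len(terms) - n + 1):
                if terms[i:i + n] == source:
                    candidate = terms[:i] + target + terms[i + n:]
                    if candidate not in expanded:  # skip duplicates
                        expanded.append(candidate)
    return expanded


# Hypothetical sample data, not taken from the application.
hypotheses = [["call", "greg", "now"], ["call", "craig", "now"]]
pairs = [(["greg"], ["gregg"])]
result = expand_hypotheses(hypotheses, pairs)
```

In this sketch the original hypotheses are kept alongside the generated ones, so a later scoring step (not shown, and not specified in the abstract) could still choose among all of them.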