Patents by Inventor Francoise Beaufays
Francoise Beaufays has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Patent number: 11955134
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Grant
  Filed: December 13, 2021
  Date of Patent: April 9, 2024
  Assignee: Google LLC
  Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
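The leakage check this abstract describes can be sketched loosely as: obfuscate a target phrase in the audio, run the model, and flag leakage if the phrase still appears in the prediction. This is an illustrative toy, not the patented implementation; `leaked_phrase`, `memorizing_model`, and the string-matching comparison are all stand-ins invented here.

```python
def leaked_phrase(predict, obfuscated_audio, ground_truth, phrase):
    """Flag potential memorization: the phrase was obfuscated in the
    audio, yet the prediction still contains it, and the ground-truth
    transcription confirms the phrase belongs to this utterance."""
    predicted = predict(obfuscated_audio)
    return phrase in predicted and phrase in ground_truth

# Stand-in "trained ASR model" that has memorized a training utterance.
memorizing_model = lambda _audio: "call me at five five five one two three four"
truth = "call me at five five five one two three four"
leak = leaked_phrase(memorizing_model, b"", truth,
                     "five five five one two three four")
# leak is True: the model reproduced the obfuscated phrase, i.e. it leaked
```

A real system would compare at the transcript-alignment level rather than by substring, but the decision structure is the same.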
- Publication number: 20240112673
  Abstract: Implementations described herein identify and correct automatic speech recognition (ASR) misrecognitions. For example, on-device processor(s) of a client device may generate a predicted textual segment that is predicted to correspond to a spoken utterance of a user of the client device, and may receive further input that modifies the predicted textual segment to an alternate textual segment. Further, the on-device processor(s) may store these textual segments in on-device storage as a candidate correction pair, and transmit the candidate correction pair to a remote system. Moreover, remote processor(s) of the remote system may determine that the candidate correction pair is an actual correction pair, and may cause client devices to generate updates for a global ASR model for the candidate correction pair. Additionally, the remote processor(s) may distribute the global ASR model to the client devices and/or additional client devices.
  Type: Application
  Filed: October 3, 2022
  Publication date: April 4, 2024
  Inventors: Rajiv Mathews, Rohit Prabhavalkar, Giovanni Motta, Mingqing Chen, Lillian Zhou, Dhruv Guliani, Harry Zhang, Trevor Strohman, Françoise Beaufays
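One plausible way to promote a candidate correction pair to an "actual" one, as the abstract's remote system does, is to require that enough distinct clients report the same (predicted, corrected) pair. The threshold and helper below are illustrative assumptions, not the claimed method.

```python
from collections import Counter

def actual_corrections(candidate_pairs, min_clients=3):
    """Keep only (predicted, corrected) pairs reported by enough
    clients to look like real misrecognitions rather than one-off
    edits. `min_clients=3` is an arbitrary illustrative choice."""
    counts = Counter(candidate_pairs)
    return {pair for pair, n in counts.items() if n >= min_clients}

# Four clients made the same correction; one made an idiosyncratic edit.
reports = [("write a male", "write an email")] * 4 + [("ok", "okay")]
confirmed = actual_corrections(reports)
# confirmed == {("write a male", "write an email")}
```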
- Publication number: 20240086063
  Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
  Type: Application
  Filed: November 22, 2023
  Publication date: March 14, 2024
  Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
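The cross-modality hand-off can be pictured as: a term recognized by one modality (say, voice) is packaged as context and used to bias another modality's model (say, the keyboard's language model). The dictionary-shaped "input context data structure" and additive bias below are invented for illustration only.

```python
def share_context(recognized_term, second_model_bias):
    """Sketch of the hand-off: wrap a term recognized by the first
    modality in a context structure, then use it to boost that term
    in the second modality's recognition model (a toy unigram bias)."""
    context = {"terms": [recognized_term]}   # stand-in input context data structure
    for term in context["terms"]:
        second_model_bias[term] = second_model_bias.get(term, 0.0) + 1.0
    return second_model_bias

# Voice recognized an unusual name; the keyboard model now favors it too.
bias = share_context("Zhora", {})
# bias == {"Zhora": 1.0}
```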
- Publication number: 20240080038
  Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
  Type: Application
  Filed: October 27, 2023
  Publication date: March 7, 2024
  Inventors: Giovanni Motta, Francoise Beaufays, Petr Zadrazil
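"Mixed compressibility" in floating-point data comes from the fact that sign and exponent bytes are highly regular across a model's parameters while mantissa bytes look like noise. A common general-purpose trick, shown here purely as a sketch of that idea (not the patented codec), is to split float32 values into byte planes and compress each plane separately.

```python
import struct
import zlib

def compress_floats(values):
    """Split float32s into four byte planes so the regular planes
    (sign/exponent) compress well, then deflate each plane.
    Illustrative only; not the method claimed in the patent."""
    raw = struct.pack(f"<{len(values)}f", *values)
    planes = [raw[i::4] for i in range(4)]          # byte-plane split
    return [zlib.compress(p) for p in planes], len(values)

def decompress_floats(planes, n):
    raw = bytearray(4 * n)
    for i, p in enumerate(planes):
        raw[i::4] = zlib.decompress(p)              # re-interleave planes
    return list(struct.unpack(f"<{n}f", bytes(raw)))

# Lossless roundtrip on exactly float32-representable values.
vals = [0.5, -0.25, 1.5, 2.0]
planes, n = compress_floats(vals)
restored = decompress_floats(planes, n)
# restored == vals
```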
- Publication number: 20240029711
  Abstract: Processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: October 5, 2023
  Publication date: January 25, 2024
  Inventors: Françoise Beaufays, Johan Schalkwyk, Giovanni Motta
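The "compare prediction to the user's correction, produce a gradient, apply it or ship it" loop can be reduced to a toy scalar example. The squared-error derivative and function names below are invented stand-ins for the real model's loss and update machinery.

```python
def correction_gradient(predicted_score, target_score):
    """Toy stand-in for "compare predicted output to ground truth and
    generate a gradient": derivative of squared error for one scalar."""
    return 2.0 * (predicted_score - target_score)

def apply_on_device(weight, gradient, lr=0.1):
    # Option 1 in the abstract: update the on-device model's weights locally.
    return weight - lr * gradient

grad = correction_gradient(predicted_score=0.8, target_score=1.0)
new_weight = apply_on_device(1.0, grad)
# Option 2 would instead transmit `grad` to the remote system for
# updating the global model's weights.
```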
- Patent number: 11843397
  Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
  Type: Grant
  Filed: September 9, 2019
  Date of Patent: December 12, 2023
  Assignee: GOOGLE LLC
  Inventors: Giovanni Motta, Francoise Beaufays, Petr Zadrazil
- Patent number: 11842045
  Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
  Type: Grant
  Filed: August 31, 2022
  Date of Patent: December 12, 2023
  Assignee: Google LLC
  Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
- Patent number: 11817080
  Abstract: Processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: October 11, 2019
  Date of Patent: November 14, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Johan Schalkwyk, Giovanni Motta
- Publication number: 20230352019
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; making a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determining that the decision was incorrect; and in response to determining that the decision was incorrect, generating a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: July 6, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
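The key observation in this abstract is that learning the device made the wrong trigger decision pins down the ground-truth label: it is the opposite of whatever the device decided. A toy version of that correction signal, with an invented squared-error-style gradient and threshold, might look like:

```python
def trigger_gradient(predicted_prob, decision_was_correct):
    """Sketch: the device woke (or ignored) dormant assistant functions
    based on `predicted_prob`; a later signal says whether that decision
    was right, which determines the ground-truth label and yields a
    simple error gradient. Threshold and loss are illustrative."""
    decided_to_trigger = predicted_prob >= 0.5
    ground_truth = decided_to_trigger if decision_was_correct else not decided_to_trigger
    target = 1.0 if ground_truth else 0.0
    return predicted_prob - target

grad = trigger_gradient(0.7, decision_was_correct=False)
# grad = 0.7: a false trigger, so the update pushes the score down
```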
- Publication number: 20230352004
  Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
  Type: Application
  Filed: July 5, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
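The server-side step this abstract describes, blending gradients from client devices with gradients the server computed on its own data, can be sketched with plain averaging. The equal weighting and learning rate are illustrative choices; the patent does not specify them here.

```python
def fed_step(global_weights, client_grads, remote_grads, lr=0.1):
    """One federated update: pool client-device gradients with the
    remote system's own gradients (plain mean here; the actual
    weighting is a design choice) and take one descent step."""
    all_grads = client_grads + remote_grads
    avg = [sum(g[i] for g in all_grads) / len(all_grads)
           for i in range(len(global_weights))]
    return [w - lr * g for w, g in zip(global_weights, avg)]

w = fed_step([1.0, 2.0],
             client_grads=[[0.2, 0.0], [0.0, 0.2]],   # from two devices
             remote_grads=[[0.1, 0.1]],               # from server-side data
             lr=1.0)
# averaged gradient is [0.1, 0.1], so w ≈ [0.9, 1.9]
```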
- Publication number: 20230317082
  Abstract: An unintentional memorization measure can be used to determine whether an automatic speech recognition (ASR) model has unintentionally memorized one or more phrases during training of the ASR model. Various implementations include generating one or more candidate transcripts based on the vocabulary of the ASR model. For example, the system can generate a candidate transcript by appending a token of the vocabulary to a previous candidate transcript. Various implementations include processing the candidate transcript using a speech synthesis model to generate synthesized speech audio data that includes synthesized speech of the candidate transcript. Additionally or alternatively, the synthesized speech audio data can be processed using the ASR model to generate ASR output. Various implementations can include generating a loss based on comparing the ASR output and the candidate transcript.
  Type: Application
  Filed: March 31, 2022
  Publication date: October 5, 2023
  Inventors: Om Dipakbhai Thakkar, Hakim Sidahmed, W. Ronny Huang, Rajiv Mathews, Françoise Beaufays, Florian Tramèr
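The candidate-extension search in this abstract (append a vocabulary token, score the extended transcript via the TTS-to-ASR-to-loss pipeline, keep the best) behaves like a beam search. Below, `toy_loss` stands in for that entire synthesis-and-recognition pipeline, and the secret phrase, vocabulary, and beam width are all invented for illustration.

```python
def extend_candidates(candidates, vocab, loss_fn, beam=2):
    """One expansion step: append each vocab token to each candidate
    transcript, then keep the `beam` lowest-loss candidates, where
    `loss_fn` stands in for the TTS -> ASR -> compare pipeline."""
    expanded = [c + [t] for c in candidates for t in vocab]
    return sorted(expanded, key=loss_fn)[:beam]

# Pretend the ASR model memorized this phrase: candidates matching it
# score a low loss through the (simulated) pipeline.
secret = ["my", "pin", "is", "nine"]
toy_loss = lambda c: (sum(a != b for a, b in zip(c, secret))
                      + abs(len(c) - len(secret)))

cands = [[]]
for _ in range(4):
    cands = extend_candidates(cands, ["my", "pin", "is", "nine", "the"], toy_loss)
# cands[0] == ["my", "pin", "is", "nine"] — the memorized phrase is recovered
```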
- Publication number: 20230306955
  Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: May 31, 2023
  Publication date: September 28, 2023
  Inventors: Françoise Beaufays, Johan Schalkwyk, Khe Chai Sim
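The loop in this abstract, synthesize local text to audio, recognize it with the on-device model, and train against the original text as ground truth, fits in a few lines once the models are abstracted away. Every callable below is a stand-in; the 0/1 "gradient" is a deliberate toy.

```python
def self_training_gradient(text, synthesize, recognize, grad_fn):
    """Sketch of the TTS-based self-training loop: the local text is
    its own ground truth, so no human transcription is needed."""
    audio = synthesize(text)          # local speech synthesis model
    predicted = recognize(audio)      # on-device speech recognition model
    return grad_fn(predicted, text)   # compare prediction to ground truth

grad = self_training_gradient(
    "hello world",
    synthesize=lambda t: t.encode(),                           # fake TTS
    recognize=lambda a: a.decode().replace("world", "word"),   # imperfect ASR
    grad_fn=lambda pred, truth: float(pred != truth))          # toy 0/1 signal
# grad == 1.0: the misrecognition yields a non-zero training signal
```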
- Publication number: 20230281248
  Abstract: A method includes receiving a content feed that includes audio data corresponding to speech utterances and processing the content feed to generate a semantically-rich, structured document. The structured document includes a transcription of the speech utterances and includes a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data. During playback of the content feed, the method also includes receiving a query from a user requesting information contained in the content feed and processing, by a large language model, the query and the structured document to generate a response to the query. The response conveys the requested information contained in the content feed. The method also includes providing, for output from a user device associated with the user, the response to the query.
  Type: Application
  Filed: March 2, 2023
  Publication date: September 7, 2023
  Applicant: Google LLC
  Inventors: Johan Schalkwyk, Francoise Beaufays
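The word-to-audio alignment in the structured document is the part that lends itself to a small sketch: each transcript word carries the time span where it was recognized, so a query answer can point back into the audio. The document shape and helper below are assumptions made for illustration; the abstract does not specify the data structure.

```python
def word_times(structured_doc, word):
    """Look up when a word was spoken, using the per-word alignment to
    audio segments (times in seconds) that the abstract describes."""
    return [seg for w, seg in structured_doc["alignment"] if w == word]

doc = {
    "transcript": "ok turn left at the bridge",
    "alignment": [("ok", (0.0, 0.3)), ("turn", (0.3, 0.6)),
                  ("left", (0.6, 0.9)), ("at", (0.9, 1.0)),
                  ("the", (1.0, 1.1)), ("bridge", (1.1, 1.6))],
}
hits = word_times(doc, "bridge")
# hits == [(1.1, 1.6)] — the LLM's answer can cite this audio span
```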
- Patent number: 11749261
  Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
  Type: Grant
  Filed: March 10, 2021
  Date of Patent: September 5, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
- Patent number: 11741953
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; making a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determining that the decision was incorrect; and in response to determining that the decision was incorrect, generating a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: November 8, 2019
  Date of Patent: August 29, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
- Patent number: 11705106
  Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: September 20, 2021
  Date of Patent: July 18, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Johan Schalkwyk, Khe Chai Sim
- Publication number: 20230178094
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Application
  Filed: December 13, 2021
  Publication date: June 8, 2023
  Applicant: Google LLC
  Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
- Publication number: 20230177382
  Abstract: Implementations disclosed herein are directed to efficient federated learning of machine learning (ML) model(s) at a remote system (e.g., remote server(s)) based on update(s) generated at client device(s). Processor(s) of the client device(s) can receive client data, process, using on-device ML model(s), the client data to generate predicted output(s), generate, using unsupervised learning, gradient(s) based on the predicted output(s), and generate, based on the gradient(s), the update(s) for disparate portions of the on-device ML model(s) and/or global ML model(s) that are remote-based counterparts of the on-device ML model(s). Further, processor(s) of the remote system can receive, from the client device(s), the update(s) for the disparate portions of the on-device ML model(s), and cause the global ML model(s) to be updated based on the update(s) for the disparate portions of the on-device ML model(s) received from disparate client device(s).
  Type: Application
  Filed: December 2, 2021
  Publication date: June 8, 2023
  Inventors: Françoise Beaufays, Giovanni Motta, Khe Chai Sim
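"Updates for disparate portions" suggests each client sends deltas only for the layers it trained, and the server merges per layer across whichever clients covered that layer. The per-layer mean below is one reasonable merging rule, invented here for illustration; the abstract does not commit to it.

```python
def merge_partial_updates(global_model, partial_updates):
    """Sketch: average each layer's deltas over only the clients that
    sent an update for that layer, then apply to the global model."""
    sums, counts = {}, {}
    for update in partial_updates:
        for layer, delta in update.items():
            sums[layer] = sums.get(layer, 0.0) + delta
            counts[layer] = counts.get(layer, 0) + 1
    return {layer: w + sums.get(layer, 0.0) / counts.get(layer, 1)
            for layer, w in global_model.items()}

new = merge_partial_updates(
    {"enc": 1.0, "dec": 2.0},
    [{"enc": 0.2},                  # client 1 only trained the encoder
     {"enc": 0.4, "dec": -0.5}])    # client 2 trained both portions
# "enc" gets the mean of two deltas (+0.3); "dec" gets one client's -0.5
```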
- Publication number: 20230156248
  Abstract: Implementations disclosed herein are directed to ephemeral learning of machine learning ("ML") model(s) based on gradient(s) generated at a remote system (e.g., remote server(s)). Processor(s) of the remote system can receive stream(s) of audio data capturing spoken utterance(s) from a client device of a user. A fulfillment pipeline can process the stream(s) of audio data to cause certain fulfillment(s) of the spoken utterance(s) to be performed. Meanwhile, a training pipeline can process the stream(s) of audio data to generate gradient(s) using unsupervised learning techniques. Subsequent to the processing by the fulfillment pipeline and/or the training pipeline, the stream(s) of audio data are discarded by the remote system. Accordingly, the ML model(s) can be trained at the remote system without storing or logging the stream(s) of audio data in non-transient memory, thereby providing more efficient training mechanisms for training the ML model(s) and also increasing security of user data.
  Type: Application
  Filed: November 23, 2021
  Publication date: May 18, 2023
  Inventors: Françoise Beaufays, Khe Chai Sim, Trevor Strohman, Oren Litvin
- Publication number: 20230107475
  Abstract: A computer-implemented method includes obtaining a multi-domain (MD) dataset and training a neural network model using the MD dataset with short-form data withheld (MD-SF). The neural network model includes a plurality of layers, each having a plurality of parameters. The method also includes resetting each respective layer in the trained neural network model one at a time. For each respective layer in the trained neural network model, and after resetting the respective layer, the method also includes determining a corresponding word error rate of the trained neural network model and identifying the respective layer as corresponding to an ambient layer when the corresponding word error rate satisfies a word error rate threshold. The method also includes transmitting an on-device neural network model to execute on one or more client devices for generating gradients based on the withheld domain (SF) of the MD dataset.
  Type: Application
  Filed: October 4, 2022
  Publication date: April 6, 2023
  Applicant: Google LLC
  Inventors: Dhruv Guliani, Lillian Zhou, Andreas Kebel, Giovanni Motta, Francoise Beaufays
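The layer-reset probe in this abstract, reset one layer at a time, re-measure word error rate, and call the layer "ambient" if quality barely degrades, can be sketched with a toy model. The dict-of-layers model, the reset and WER callables, and the threshold are all invented stand-ins.

```python
import copy

def ambient_layers(model, reset_layer, wer_of, wer_threshold):
    """Reset each layer in turn on a copy of the model, measure word
    error rate, and collect layers whose reset keeps WER under the
    threshold (i.e., layers the model barely depends on)."""
    ambient = []
    for name in model:
        trial = copy.deepcopy(model)
        reset_layer(trial, name)
        if wer_of(trial) <= wer_threshold:
            ambient.append(name)
    return ambient

# Toy two-"layer" model; in this fake evaluation only the decoder matters.
model = {"enc": 1.0, "dec": 2.0}
reset = lambda m, name: m.__setitem__(name, 0.0)
wer = lambda m: 0.05 if m["dec"] == 2.0 else 0.5
found = ambient_layers(model, reset, wer, wer_threshold=0.1)
# found == ["enc"]: resetting the encoder barely changed WER
```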