Patents by Inventor Francoise Beaufays
Francoise Beaufays has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Patent number: 11955134
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Grant
  Filed: December 13, 2021
  Date of Patent: April 9, 2024
  Assignee: Google LLC
  Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
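The leakage check this abstract describes can be sketched loosely as: obfuscate a target phrase in the audio, run the model, and flag leakage if the phrase still appears in the prediction. This is an illustrative toy, not the patented implementation; `leaked_phrase`, `memorizing_model`, and the string-matching comparison are all stand-ins invented here.

```python
def leaked_phrase(predict, obfuscated_audio, ground_truth, phrase):
    """Flag potential memorization: the phrase was obfuscated in the
    audio, yet the prediction still contains it, and the ground-truth
    transcription confirms the phrase belongs to this utterance."""
    predicted = predict(obfuscated_audio)
    return phrase in predicted and phrase in ground_truth

# Stand-in "trained ASR model" that has memorized a training utterance.
memorizing_model = lambda _audio: "call me at five five five one two three four"
truth = "call me at five five five one two three four"
leak = leaked_phrase(memorizing_model, b"", truth,
                     "five five five one two three four")
# leak is True: the model reproduced the obfuscated phrase, i.e. it leaked
```

A real system would compare at the transcript-alignment level rather than by substring, but the decision structure is the same.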
- Publication number: 20240112673
  Abstract: Implementations described herein identify and correct automatic speech recognition (ASR) misrecognitions. For example, on-device processor(s) of a client device may generate a predicted textual segment that is predicted to correspond to a spoken utterance of a user of the client device, and may receive further input that modifies the predicted textual segment to an alternate textual segment. Further, the on-device processor(s) may store these textual segments in on-device storage as a candidate correction pair, and transmit the candidate correction pair to a remote system. Moreover, remote processor(s) of the remote system may determine that the candidate correction pair is an actual correction pair, and may cause client devices to generate updates for a global ASR model for the candidate correction pair. Additionally, the remote processor(s) may distribute the global ASR model to the client devices and/or additional client devices.
  Type: Application
  Filed: October 3, 2022
  Publication date: April 4, 2024
  Inventors: Rajiv Mathews, Rohit Prabhavalkar, Giovanni Motta, Mingqing Chen, Lillian Zhou, Dhruv Guliani, Harry Zhang, Trevor Strohman, Françoise Beaufays
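One plausible way to promote a candidate correction pair to an "actual" one, as the abstract's remote system does, is to require that enough distinct clients report the same (predicted, corrected) pair. The threshold and helper below are illustrative assumptions, not the claimed method.

```python
from collections import Counter

def actual_corrections(candidate_pairs, min_clients=3):
    """Keep only (predicted, corrected) pairs reported by enough
    clients to look like real misrecognitions rather than one-off
    edits. `min_clients=3` is an arbitrary illustrative choice."""
    counts = Counter(candidate_pairs)
    return {pair for pair, n in counts.items() if n >= min_clients}

# Four clients made the same correction; one made an idiosyncratic edit.
reports = [("write a male", "write an email")] * 4 + [("ok", "okay")]
confirmed = actual_corrections(reports)
# confirmed == {("write a male", "write an email")}
```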
- Publication number: 20240086063
  Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
  Type: Application
  Filed: November 22, 2023
  Publication date: March 14, 2024
  Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
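The cross-modality hand-off can be pictured as: a term recognized by one modality (say, voice) is packaged as context and used to bias another modality's model (say, the keyboard's language model). The dictionary-shaped "input context data structure" and additive bias below are invented for illustration only.

```python
def share_context(recognized_term, second_model_bias):
    """Sketch of the hand-off: wrap a term recognized by the first
    modality in a context structure, then use it to boost that term
    in the second modality's recognition model (a toy unigram bias)."""
    context = {"terms": [recognized_term]}   # stand-in input context data structure
    for term in context["terms"]:
        second_model_bias[term] = second_model_bias.get(term, 0.0) + 1.0
    return second_model_bias

# Voice recognized an unusual name; the keyboard model now favors it too.
bias = share_context("Zhora", {})
# bias == {"Zhora": 1.0}
```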
- Publication number: 20240080038
  Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
  Type: Application
  Filed: October 27, 2023
  Publication date: March 7, 2024
  Inventors: Giovanni Motta, Francoise Beaufays, Petr Zadrazil
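"Mixed compressibility" in floating-point data comes from the fact that sign and exponent bytes are highly regular across a model's parameters while mantissa bytes look like noise. A common general-purpose trick, shown here purely as a sketch of that idea (not the patented codec), is to split float32 values into byte planes and compress each plane separately.

```python
import struct
import zlib

def compress_floats(values):
    """Split float32s into four byte planes so the regular planes
    (sign/exponent) compress well, then deflate each plane.
    Illustrative only; not the method claimed in the patent."""
    raw = struct.pack(f"<{len(values)}f", *values)
    planes = [raw[i::4] for i in range(4)]          # byte-plane split
    return [zlib.compress(p) for p in planes], len(values)

def decompress_floats(planes, n):
    raw = bytearray(4 * n)
    for i, p in enumerate(planes):
        raw[i::4] = zlib.decompress(p)              # re-interleave planes
    return list(struct.unpack(f"<{n}f", bytes(raw)))

# Lossless roundtrip on exactly float32-representable values.
vals = [0.5, -0.25, 1.5, 2.0]
planes, n = compress_floats(vals)
restored = decompress_floats(planes, n)
# restored == vals
```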
- Publication number: 20240029711
  Abstract: Processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: October 5, 2023
  Publication date: January 25, 2024
  Inventors: Françoise Beaufays, Johan Schalkwyk, Giovanni Motta
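The "compare prediction to the user's correction, produce a gradient, apply it or ship it" loop can be reduced to a toy scalar example. The squared-error derivative and function names below are invented stand-ins for the real model's loss and update machinery.

```python
def correction_gradient(predicted_score, target_score):
    """Toy stand-in for "compare predicted output to ground truth and
    generate a gradient": derivative of squared error for one scalar."""
    return 2.0 * (predicted_score - target_score)

def apply_on_device(weight, gradient, lr=0.1):
    # Option 1 in the abstract: update the on-device model's weights locally.
    return weight - lr * gradient

grad = correction_gradient(predicted_score=0.8, target_score=1.0)
new_weight = apply_on_device(1.0, grad)
# Option 2 would instead transmit `grad` to the remote system for
# updating the global model's weights.
```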
- Patent number: 11843397
  Abstract: Systems and methods for compression of data that exhibits mixed compressibility, such as floating-point data, are provided. As one example, aspects of the present disclosure can be used to compress floating-point data that represents the values of parameters of a machine-learned model. Therefore, aspects of the present disclosure can be used to compress machine-learned models (e.g., for reducing storage requirements associated with the model, reducing the bandwidth expended to transmit the model, etc.).
  Type: Grant
  Filed: September 9, 2019
  Date of Patent: December 12, 2023
  Assignee: GOOGLE LLC
  Inventors: Giovanni Motta, Francoise Beaufays, Petr Zadrazil
- Patent number: 11842045
  Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for cross input modality learning in a mobile device are disclosed. In one aspect, a method includes activating a first modality user input mode in which user inputs by way of a first modality are recognized using a first modality recognizer; and receiving a user input by way of the first modality. The method includes obtaining, as a result of the first modality recognizer recognizing the user input, a transcription that includes a particular term; and generating an input context data structure that references at least the particular term. The method further includes transmitting, by the first modality recognizer, the input context data structure to a second modality recognizer for use in updating a second modality recognition model associated with the second modality recognizer.
  Type: Grant
  Filed: August 31, 2022
  Date of Patent: December 12, 2023
  Assignee: Google LLC
  Inventors: Yu Ouyang, Diego Melendo Casado, Mohammadinamul Hasan Sheik, Francoise Beaufays, Dragan Zivkovic, Meltem Oktem
- Patent number: 11817080
  Abstract: Processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: October 11, 2019
  Date of Patent: November 14, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Johan Schalkwyk, Giovanni Motta
- Publication number: 20230352019
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; making a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determining that the decision was incorrect; and in response to determining that the decision was incorrect, generating a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: July 6, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
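The key observation in this abstract is that learning the device made the wrong trigger decision pins down the ground-truth label: it is the opposite of whatever the device decided. A toy version of that correction signal, with an invented squared-error-style gradient and threshold, might look like:

```python
def trigger_gradient(predicted_prob, decision_was_correct):
    """Sketch: the device woke (or ignored) dormant assistant functions
    based on `predicted_prob`; a later signal says whether that decision
    was right, which determines the ground-truth label and yields a
    simple error gradient. Threshold and loss are illustrative."""
    decided_to_trigger = predicted_prob >= 0.5
    ground_truth = decided_to_trigger if decision_was_correct else not decided_to_trigger
    target = 1.0 if ground_truth else 0.0
    return predicted_prob - target

grad = trigger_gradient(0.7, decision_was_correct=False)
# grad = 0.7: a false trigger, so the update pushes the score down
```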
- Publication number: 20230352004
  Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
  Type: Application
  Filed: July 5, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
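The server-side step this abstract describes, blending gradients from client devices with gradients the server computed on its own data, can be sketched with plain averaging. The equal weighting and learning rate are illustrative choices; the patent does not specify them here.

```python
def fed_step(global_weights, client_grads, remote_grads, lr=0.1):
    """One federated update: pool client-device gradients with the
    remote system's own gradients (plain mean here; the actual
    weighting is a design choice) and take one descent step."""
    all_grads = client_grads + remote_grads
    avg = [sum(g[i] for g in all_grads) / len(all_grads)
           for i in range(len(global_weights))]
    return [w - lr * g for w, g in zip(global_weights, avg)]

w = fed_step([1.0, 2.0],
             client_grads=[[0.2, 0.0], [0.0, 0.2]],   # from two devices
             remote_grads=[[0.1, 0.1]],               # from server-side data
             lr=1.0)
# averaged gradient is [0.1, 0.1], so w ≈ [0.9, 1.9]
```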
- Publication number: 20230317082
  Abstract: An unintentional memorization measure can be used to determine whether an automatic speech recognition (ASR) model has unintentionally memorized one or more phrases during training of the ASR model. Various implementations include generating one or more candidate transcripts based on the vocabulary of the ASR model. For example, the system can generate a candidate transcript by appending a token of the vocabulary to a previous candidate transcript. Various implementations include processing the candidate transcript using a speech synthesis model to generate synthesized speech audio data that includes synthesized speech of the candidate transcript. Additionally or alternatively, the synthesized speech audio data can be processed using the ASR model to generate ASR output. Various implementations can include generating a loss based on comparing the ASR output and the candidate transcript.
  Type: Application
  Filed: March 31, 2022
  Publication date: October 5, 2023
  Inventors: Om Dipakbhai Thakkar, Hakim Sidahmed, W. Ronny Huang, Rajiv Mathews, Françoise Beaufays, Florian Tramèr
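The candidate-extension search in this abstract (append a vocabulary token, score the extended transcript via the TTS-to-ASR-to-loss pipeline, keep the best) behaves like a beam search. Below, `toy_loss` stands in for that entire synthesis-and-recognition pipeline, and the secret phrase, vocabulary, and beam width are all invented for illustration.

```python
def extend_candidates(candidates, vocab, loss_fn, beam=2):
    """One expansion step: append each vocab token to each candidate
    transcript, then keep the `beam` lowest-loss candidates, where
    `loss_fn` stands in for the TTS -> ASR -> compare pipeline."""
    expanded = [c + [t] for c in candidates for t in vocab]
    return sorted(expanded, key=loss_fn)[:beam]

# Pretend the ASR model memorized this phrase: candidates matching it
# score a low loss through the (simulated) pipeline.
secret = ["my", "pin", "is", "nine"]
toy_loss = lambda c: (sum(a != b for a, b in zip(c, secret))
                      + abs(len(c) - len(secret)))

cands = [[]]
for _ in range(4):
    cands = extend_candidates(cands, ["my", "pin", "is", "nine", "the"], toy_loss)
# cands[0] == ["my", "pin", "is", "nine"] — the memorized phrase is recovered
```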
- Publication number: 20230306955
  Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: May 31, 2023
  Publication date: September 28, 2023
  Inventors: Françoise Beaufays, Johan Schalkwyk, Khe Chai Sim
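The loop in this abstract, synthesize local text to audio, recognize it with the on-device model, and train against the original text as ground truth, fits in a few lines once the models are abstracted away. Every callable below is a stand-in; the 0/1 "gradient" is a deliberate toy.

```python
def self_training_gradient(text, synthesize, recognize, grad_fn):
    """Sketch of the TTS-based self-training loop: the local text is
    its own ground truth, so no human transcription is needed."""
    audio = synthesize(text)          # local speech synthesis model
    predicted = recognize(audio)      # on-device speech recognition model
    return grad_fn(predicted, text)   # compare prediction to ground truth

grad = self_training_gradient(
    "hello world",
    synthesize=lambda t: t.encode(),                           # fake TTS
    recognize=lambda a: a.decode().replace("world", "word"),   # imperfect ASR
    grad_fn=lambda pred, truth: float(pred != truth))          # toy 0/1 signal
# grad == 1.0: the misrecognition yields a non-zero training signal
```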
- Publication number: 20230281248
  Abstract: A method includes receiving a content feed that includes audio data corresponding to speech utterances and processing the content feed to generate a semantically-rich, structured document. The structured document includes a transcription of the speech utterances and includes a plurality of words each aligned with a corresponding audio segment of the audio data that indicates a time when the word was recognized in the audio data. During playback of the content feed, the method also includes receiving a query from a user requesting information contained in the content feed and processing, by a large language model, the query and the structured document to generate a response to the query. The response conveys the requested information contained in the content feed. The method also includes providing, for output from a user device associated with the user, the response to the query.
  Type: Application
  Filed: March 2, 2023
  Publication date: September 7, 2023
  Applicant: Google LLC
  Inventors: Johan Schalkwyk, Francoise Beaufays
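The word-to-audio alignment in the structured document is the part that lends itself to a small sketch: each transcript word carries the time span where it was recognized, so a query answer can point back into the audio. The document shape and helper below are assumptions made for illustration; the abstract does not specify the data structure.

```python
def word_times(structured_doc, word):
    """Look up when a word was spoken, using the per-word alignment to
    audio segments (times in seconds) that the abstract describes."""
    return [seg for w, seg in structured_doc["alignment"] if w == word]

doc = {
    "transcript": "ok turn left at the bridge",
    "alignment": [("ok", (0.0, 0.3)), ("turn", (0.3, 0.6)),
                  ("left", (0.6, 0.9)), ("at", (0.9, 1.0)),
                  ("the", (1.0, 1.1)), ("bridge", (1.1, 1.6))],
}
hits = word_times(doc, "bridge")
# hits == [(1.1, 1.6)] — the LLM's answer can cite this audio span
```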
- Patent number: 11749261
  Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
  Type: Grant
  Filed: March 10, 2021
  Date of Patent: September 5, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
- Patent number: 11741953
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; making a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determining that the decision was incorrect; and in response to determining that the decision was incorrect, generating a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: November 8, 2019
  Date of Patent: August 29, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
- Patent number: 11705106
  Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: September 20, 2021
  Date of Patent: July 18, 2023
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Johan Schalkwyk, Khe Chai Sim
- Publication number: 20230178094
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Application
  Filed: December 13, 2021
  Publication date: June 8, 2023
  Applicant: Google LLC
  Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
- Publication number: 20230177382
  Abstract: Implementations disclosed herein are directed to efficient federated learning of machine learning (ML) model(s) at a remote system (e.g., remote server(s)) based on update(s) generated at client device(s). Processor(s) of the client device(s) can receive client data, process, using on-device ML model(s), the client data to generate predicted output(s), generate, using unsupervised learning, gradient(s) based on the predicted output(s), and generate, based on the gradient(s), the update(s) for disparate portions of the on-device ML model(s) and/or global ML model(s) that are remote-based counterparts of the on-device ML model(s). Further, processor(s) of the remote system can receive, from the client device(s), the update(s) for the disparate portions of the on-device ML model(s), and cause the global ML model(s) to be updated based on the update(s) for the disparate portions of the on-device ML model(s) received from disparate client device(s).
  Type: Application
  Filed: December 2, 2021
  Publication date: June 8, 2023
  Inventors: Françoise Beaufays, Giovanni Motta, Khe Chai Sim
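"Updates for disparate portions" suggests each client sends deltas only for the layers it trained, and the server merges per layer across whichever clients covered that layer. The per-layer mean below is one reasonable merging rule, invented here for illustration; the abstract does not commit to it.

```python
def merge_partial_updates(global_model, partial_updates):
    """Sketch: average each layer's deltas over only the clients that
    sent an update for that layer, then apply to the global model."""
    sums, counts = {}, {}
    for update in partial_updates:
        for layer, delta in update.items():
            sums[layer] = sums.get(layer, 0.0) + delta
            counts[layer] = counts.get(layer, 0) + 1
    return {layer: w + sums.get(layer, 0.0) / counts.get(layer, 1)
            for layer, w in global_model.items()}

new = merge_partial_updates(
    {"enc": 1.0, "dec": 2.0},
    [{"enc": 0.2},                  # client 1 only trained the encoder
     {"enc": 0.4, "dec": -0.5}])    # client 2 trained both portions
# "enc" gets the mean of two deltas (+0.3); "dec" gets one client's -0.5
```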
- Publication number: 20230156248
  Abstract: Implementations disclosed herein are directed to ephemeral learning of machine learning ("ML") model(s) based on gradient(s) generated at a remote system (e.g., remote server(s)). Processor(s) of the remote system can receive stream(s) of audio data capturing spoken utterance(s) from a client device of a user. A fulfillment pipeline can process the stream(s) of audio data to cause certain fulfillment(s) of the spoken utterance(s) to be performed. Meanwhile, a training pipeline can process the stream(s) of audio data to generate gradient(s) using unsupervised learning techniques. Subsequent to the processing by the fulfillment pipeline and/or the training pipeline, the stream(s) of audio data are discarded by the remote system. Accordingly, the ML model(s) can be trained at the remote system without storing or logging the stream(s) of audio data in non-transient memory, thereby providing more efficient training mechanisms for training the ML model(s) and also increasing security of user data.
  Type: Application
  Filed: November 23, 2021
  Publication date: May 18, 2023
  Inventors: Françoise Beaufays, Khe Chai Sim, Trevor Strohman, Oren Litvin
- Publication number: 20230107475
  Abstract: A computer-implemented method includes obtaining a multi-domain (MD) dataset and training a neural network model using the MD dataset with short-form data withheld (MD-SF). The neural network model includes a plurality of layers, each having a plurality of parameters. The method also includes resetting each respective layer in the trained neural network model one at a time. For each respective layer in the trained neural network model, and after resetting the respective layer, the method also includes determining a corresponding word error rate of the trained neural network model and identifying the respective layer as corresponding to an ambient layer when the corresponding word error rate satisfies a word error rate threshold. The method also includes transmitting an on-device neural network model to execute on one or more client devices for generating gradients based on the withheld domain (SF) of the MD dataset.
  Type: Application
  Filed: October 4, 2022
  Publication date: April 6, 2023
  Applicant: Google LLC
  Inventors: Dhruv Guliani, Lillian Zhou, Andreas Kebel, Giovanni Motta, Francoise Beaufays
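The layer-reset probe in this abstract, reset one layer at a time, re-measure word error rate, and call the layer "ambient" if quality barely degrades, can be sketched with a toy model. The dict-of-layers model, the reset and WER callables, and the threshold are all invented stand-ins.

```python
import copy

def ambient_layers(model, reset_layer, wer_of, wer_threshold):
    """Reset each layer in turn on a copy of the model, measure word
    error rate, and collect layers whose reset keeps WER under the
    threshold (i.e., layers the model barely depends on)."""
    ambient = []
    for name in model:
        trial = copy.deepcopy(model)
        reset_layer(trial, name)
        if wer_of(trial) <= wer_threshold:
            ambient.append(name)
    return ambient

# Toy two-"layer" model; in this fake evaluation only the decoder matters.
model = {"enc": 1.0, "dec": 2.0}
reset = lambda m, name: m.__setitem__(name, 0.0)
wer = lambda m: 0.05 if m["dec"] == 2.0 else 0.5
found = ambient_layers(model, reset, wer, wer_threshold=0.1)
# found == ["enc"]: resetting the encoder barely changed WER
```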