Patents by Inventor Rajiv Mathews
Rajiv Mathews has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Publication number: 20240296843
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of an on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: May 7, 2024
  Publication date: September 5, 2024
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
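The core of this abstract is a standard supervised correction signal: once the decision is known to be wrong, the predicted output is compared to a ground-truth output to form a gradient. A minimal sketch, assuming a single trigger logit and a binary cross-entropy loss (the patent does not fix the model or the loss):

```python
import numpy as np

def correction_gradient(logit: float, ground_truth: float) -> float:
    """Gradient of binary cross-entropy w.r.t. the trigger logit.

    ground_truth is 1.0 if the dormant assistant function should have
    been activated, 0.0 otherwise.
    """
    predicted = 1.0 / (1.0 + np.exp(-logit))  # model's trigger probability
    return predicted - ground_truth           # dL/dlogit for BCE loss

# The device declined to trigger, but the decision was later determined
# to be incorrect (e.g., the user immediately repeated the query), so the
# ground truth is 1.0 and a corrective gradient is generated.
grad = correction_gradient(logit=-0.8, ground_truth=1.0)

# Per the abstract, the gradient can update on-device weights and/or be
# transmitted to a remote system for federated updating of global weights.
learning_rate = 0.1
on_device_weight_delta = -learning_rate * grad
```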
- Publication number: 20240265269
  Abstract: Implementations disclosed herein are directed to techniques for enabling decentralized learning of global language models (LMs). Remote processor(s) of a remote system can obtain a global LM that includes a global embedding matrix, generate a global embedding mask for the global embedding matrix using a masking technique, apply the global embedding mask to the global embedding matrix to generate a sparsified global LM that includes a masked version of the global embedding matrix, transmit the sparsified global LM to computing device(s) that are participating in a given round of decentralized learning for the global LM, receive corresponding updates from the computing device(s), and cause the global LM to be updated based on the corresponding updates. By generating the global embedding mask and applying it to the global embedding matrix, the transferable size of the global LM is reduced, thereby enabling decentralized learning thereof.
  Type: Application
  Filed: March 23, 2023
  Publication date: August 8, 2024
  Inventors: Mingqing Chen, Lara McConnaughey, Kaan Ege Özgün, Rajiv Mathews, Françoise Beaufays
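To make the size reduction concrete, here is a minimal sketch of one possible masking technique, keeping only the top-k entries per embedding row by magnitude; the abstract leaves the masking technique open, so this particular choice of mask is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 32_000, 256
global_embeddings = rng.normal(size=(vocab, dim)).astype(np.float32)

# One possible masking technique: keep the k largest-magnitude entries
# per row and zero out the rest.
k = 64
top_idx = np.argsort(np.abs(global_embeddings), axis=1)[:, -k:]
mask = np.zeros_like(global_embeddings, dtype=bool)
np.put_along_axis(mask, top_idx, True, axis=1)

masked_embeddings = np.where(mask, global_embeddings, 0.0)

# Only the surviving values (and their indices) need to be shipped to
# participating devices, shrinking the transferable size of the global LM.
density = mask.mean()  # 64 / 256 = 0.25 of the original payload
```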
- Publication number: 20240249193
  Abstract: Generally, the present disclosure is directed to enhanced federated learning (FL) that employs a set of clients with varying amounts of computational resources (e.g., system memory, storage, and processing bandwidth). To overcome the limitations of conventional FL methods in this setting, the embodiments run multi-directional knowledge distillation between the server models produced by each federated averaging (FedAvg) pool, using unlabeled server data as the distillation dataset. By co-distilling the two (or more) models frequently over the course of FedAvg rounds, information is shared between the pools without sharing model parameters. This leads to increased performance and faster convergence (in fewer federated rounds).
  Type: Application
  Filed: January 19, 2024
  Publication date: July 25, 2024
  Inventors: Jared Alexander Lichtarge, Rajiv Mathews, Rohan Anil, Ehsan Amid, Shankar Kumar
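A minimal sketch of the co-distillation step, assuming softmax classifiers and temperature-scaled soft labels (both common distillation choices, not specified by the abstract): each pool's server model is trained toward the other's predictions on a shared unlabeled batch, so information crosses pools without any parameter exchange.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def co_distillation_targets(logits_a, logits_b, temperature=2.0):
    """Multi-directional distillation on unlabeled server data: the server
    model from each FedAvg pool learns from the other's soft labels."""
    return softmax(logits_b, temperature), softmax(logits_a, temperature)

# toy logits from two pool models on a shared unlabeled batch
rng = np.random.default_rng(0)
logits_a, logits_b = rng.normal(size=(2, 4, 10))
targets_for_a, targets_for_b = co_distillation_targets(logits_a, logits_b)
```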
- Publication number: 20240233707
  Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label for the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
  Type: Application
  Filed: October 17, 2023
  Publication date: July 11, 2024
  Applicant: Google LLC
  Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen
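The data pipeline in this abstract reduces to a simple loop; a sketch with hypothetical `augment` and `teacher_transcribe` callables standing in for the real augmentation and the trained teacher model:

```python
def build_distillation_pairs(utterances, augment, teacher_transcribe):
    """For each out-of-domain utterance: augment it, then have the
    in-domain teacher ASR model produce a pseudo-label for the augmented
    audio. The student is trained on the resulting (audio, label) pairs."""
    pairs = []
    for utterance in utterances:
        augmented = augment(utterance)
        pseudo_label = teacher_transcribe(augmented)
        pairs.append((augmented, pseudo_label))
    return pairs

# stand-ins so the sketch runs; a real system would use audio tensors,
# a SpecAugment-style augmenter, and a trained teacher ASR model
pairs = build_distillation_pairs(
    utterances=["utt_1", "utt_2"],
    augment=lambda u: u + "_aug",
    teacher_transcribe=lambda u: f"pseudo label for {u}",
)
```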
- Publication number: 20240221772
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Application
  Filed: March 19, 2024
  Publication date: July 4, 2024
  Applicant: Google LLC
  Inventors: Ehsan Amid, Om Dipakbhai Thakkar, Rajiv Mathews, Francoise Beaufays
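The leakage test is essentially a containment check on the prediction for obfuscated audio: if the model produces a phrase whose audio it never heard, the phrase must have come from training data. A minimal sketch with a hypothetical `transcribe` callable:

```python
def phrase_leaked(transcribe, modified_audio, ground_truth: str, phrase: str) -> bool:
    """The phrase's audio has been obfuscated (e.g., replaced with noise),
    so the ASR model can only produce it from what it memorized in training."""
    assert phrase in ground_truth      # the phrase is recited in the utterance
    predicted = transcribe(modified_audio)
    return phrase in predicted         # True -> the model leaked the phrase

# stand-in transcriber that (hypothetically) reproduces the phrase anyway
leak = phrase_leaked(
    transcribe=lambda audio: "please call the secret codename",
    modified_audio="<audio with the phrase obfuscated>",
    ground_truth="please call the secret codename",
    phrase="secret codename",
)  # True: this toy model "leaked" the obfuscated phrase
```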
- Patent number: 12014739
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of an on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Grant
  Filed: July 6, 2023
  Date of Patent: June 18, 2024
  Assignee: GOOGLE LLC
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
- Publication number: 20240194192
  Abstract: Information can be distilled from a global automatic speech recognition (ASR) model to a client ASR model. Many implementations include using an RNN-T model as the ASR model, where the global ASR model includes a global encoder, a joint network, and a prediction network, and where the client ASR model includes a client encoder, the joint network, and the prediction network. Various implementations include using principal component analysis (PCA) while training the global ASR model to learn a mean vector and a set of principal components corresponding to the global ASR model. Additional or alternative implementations include training the client ASR model to generate one or more predicted coefficients of the global ASR model.
  Type: Application
  Filed: December 9, 2022
  Publication date: June 13, 2024
  Inventors: Ehsan Amid, Rajiv Mathews, Shankar Kumar, Jared Lichtarge, Mingqing Chen, Tien-Ju Yang, Yuxin Ding
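A sketch of the PCA bottleneck this abstract describes: the global encoder's features are summarized by a learned mean vector and a set of principal components, and the client model only has to predict the low-dimensional coefficients. The dimensions and the squared-error objective below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 512, 16                        # feature dim, number of components
mean = rng.normal(size=d)             # learned mean vector (from PCA over
components = rng.normal(size=(m, d))  # global encoder activations)

def reconstruct(coefficients: np.ndarray) -> np.ndarray:
    """Map predicted PCA coefficients back to the global encoder's
    feature space: x ~= mean + c @ components."""
    return mean + coefficients @ components

# The client model would be trained so that its predicted coefficients c
# minimize ||reconstruct(c) - global_encoder_output||^2 (a sketch, not
# the patent's exact objective).
c = rng.normal(size=m)
approx_global_feature = reconstruct(c)
```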
- Publication number: 20240135918
  Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label for the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
  Type: Application
  Filed: October 16, 2023
  Publication date: April 25, 2024
  Applicant: Google LLC
  Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen
- Patent number: 11955134
  Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
  Type: Grant
  Filed: December 13, 2021
  Date of Patent: April 9, 2024
  Assignee: Google LLC
  Inventors: Ehsan Amid, Om Thakkar, Rajiv Mathews, Francoise Beaufays
- Publication number: 20240112673
  Abstract: Implementations described herein identify and correct automatic speech recognition (ASR) misrecognitions. For example, on-device processor(s) of a client device may generate a predicted textual segment that is predicted to correspond to a spoken utterance of a user of the client device, and may receive further input that modifies the predicted textual segment to an alternate textual segment. Further, the on-device processor(s) may store these textual segments in on-device storage as a candidate correction pair, and transmit the candidate correction pair to a remote system. Moreover, remote processor(s) of the remote system may determine that the candidate correction pair is an actual correction pair, and may cause client devices to generate updates for a global ASR model for the candidate correction pair. Additionally, the remote processor(s) may distribute the global ASR model to the client devices and/or additional client devices.
  Type: Application
  Filed: October 3, 2022
  Publication date: April 4, 2024
  Inventors: Rajiv Mathews, Rohit Prabhavalkar, Giovanni Motta, Mingqing Chen, Lillian Zhou, Dhruv Guliani, Harry Zhang, Trevor Strohman, Françoise Beaufays
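The abstract leaves open how the remote system decides that a candidate correction pair is an actual correction pair; a hedged sketch using cross-client frequency as the (assumed) criterion:

```python
from collections import Counter

def actual_correction_pairs(candidate_pairs, min_clients: int = 50):
    """One plausible server-side filter (the abstract does not specify the
    criterion): treat a candidate (predicted, alternate) pair as an actual
    correction only if many distinct clients independently reported it."""
    counts = Counter(candidate_pairs)
    return [pair for pair, n in counts.items() if n >= min_clients]

candidates = [("weather in pairs", "weather in Paris")] * 60 \
           + [("call a cab", "call Bob")] * 3
confirmed = actual_correction_pairs(candidates)
# [("weather in pairs", "weather in Paris")] -- the rare pair is dropped
```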
- Publication number: 20240112672
  Abstract: On-device processor(s) of a client device may store, in on-device storage and in association with a time to live (TTL) in the on-device storage, a correction directed to ASR processing of audio data. The correction may include a portion of a given speech hypothesis that was modified to an alternate speech hypothesis. Further, the on-device processor(s) may cause an on-device ASR model to be personalized based on the correction. Moreover, and based on additional ASR processing of additional audio data, the on-device processor(s) may store, in the on-device storage and in association with an additional TTL in the on-device storage, a pseudo-correction directed to the additional ASR processing. Accordingly, the on-device processor(s) may cause the on-device ASR model to be personalized based on the pseudo-correction to prevent forgetting by the on-device ASR model.
  Type: Application
  Filed: October 4, 2022
  Publication date: April 4, 2024
  Inventors: Rajiv Mathews, Dragan Zivkovic, Khe Chai Sim
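A minimal sketch of the on-device bookkeeping: corrections and pseudo-corrections are stored with a TTL, and expired entries are dropped before personalization. The class shape and the TTL value are assumptions:

```python
import time

class CorrectionStore:
    """On-device store of (hypothesis, corrected) pairs with a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries = []  # (timestamp, original, corrected, is_pseudo)

    def add(self, original: str, corrected: str, is_pseudo: bool = False):
        self.entries.append((time.time(), original, corrected, is_pseudo))

    def live(self):
        """Drop expired entries, then return what remains for training."""
        now = time.time()
        self.entries = [e for e in self.entries if now - e[0] < self.ttl]
        return self.entries

store = CorrectionStore(ttl_seconds=30 * 24 * 3600)   # e.g., 30 days
store.add("call rajeev", "call Rajiv")                # a user correction
store.add("set a timer", "set a timer", is_pseudo=True)
# The pseudo-correction replays a hypothesis the model already gets right,
# so fine-tuning on real corrections does not cause forgetting.
training_examples = store.live()
```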
- Publication number: 20240095582
  Abstract: During a round of decentralized learning for updating of a global machine learning (ML) model, remote processor(s) of a remote system may transmit, to a population of computing devices, primary weights for a primary version of the global ML model, and cause each of the computing devices to generate a corresponding update for the primary version of the global ML model. Further, the remote processor(s) may cause the primary version of the global ML model to be updated based on the corresponding updates that are received during the round of decentralized learning. However, the remote processor(s) may receive other corresponding updates subsequent to the round of decentralized learning. Accordingly, various techniques described herein (e.g., FARe-DUST, FeAST on MSG, and/or other techniques) enable the other corresponding updates to be utilized in achieving a final version of the global ML model.
  Type: Application
  Filed: December 6, 2022
  Publication date: March 21, 2024
  Inventors: Andrew Hard, Sean Augenstein, Rohan Anil, Rajiv Mathews, Lara McConnaughey, Ehsan Amid, Antonious Girgis
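The abstract names FARe-DUST and FeAST on MSG but does not describe them, so the following is only a deliberately generic sketch of the underlying idea, folding straggler updates that arrive after a round closes into the final weights via a mixing coefficient (both the rule and `alpha` are assumptions):

```python
import numpy as np

def fold_in_late_updates(final_weights, late_client_deltas, alpha=0.1):
    """Mix the average of late client deltas into the final global weights.
    This is illustrative only; it is not a description of FARe-DUST or
    FeAST on MSG, whose details the abstract does not give."""
    mean_delta = np.mean(late_client_deltas, axis=0)
    return final_weights + alpha * mean_delta

final = np.array([0.5, -0.2, 1.0])
late = [np.array([0.1, 0.0, -0.2]), np.array([0.3, -0.1, 0.0])]
updated_final = fold_in_late_updates(final, late)
```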
- Publication number: 20240095594
  Abstract: A method includes training a first differentially private (DP) model using a private training set, the private training set including a plurality of training samples, the first DP model satisfying a differential privacy budget, the differential privacy budget defining an amount of information about individual training samples of the private training set that may be revealed by the first DP model. The method also includes, while training the first DP model, generating a plurality of intermediate checkpoints, each intermediate checkpoint of the plurality of intermediate checkpoints representing a different intermediate state of the first DP model, each of the intermediate checkpoints satisfying the same differential privacy budget. The method further includes determining an aggregate of the first DP model and the plurality of intermediate checkpoints, and determining, using the aggregate, a second DP model, the second DP model satisfying the same differential privacy budget.
  Type: Application
  Filed: August 31, 2023
  Publication date: March 21, 2024
  Applicant: Google LLC
  Inventors: Om Dipakbhai Thakkar, Arun Ganesh, Virat Vishnu Shejwalkar, Abhradeep Guha Thakurta, Rajiv Mathews
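The abstract does not specify the aggregate, so here is a sketch using the simplest choice, a uniform parameter average across checkpoints; because averaging is post-processing of DP outputs, the aggregate satisfies the same budget:

```python
import numpy as np

def aggregate_checkpoints(checkpoints):
    """Uniform per-layer parameter average over DP training checkpoints.
    Averaging is post-processing, so the result satisfies the same
    differential privacy budget as the checkpoints themselves."""
    return [np.mean(layer_group, axis=0) for layer_group in zip(*checkpoints)]

# toy "models": each checkpoint is a list of per-layer weight arrays
ckpts = [[np.full(3, s), np.full(2, s)] for s in (0.8, 0.9, 1.0)]
second_dp_model = aggregate_checkpoints(ckpts)  # per-layer means, all 0.9
```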
- Publication number: 20240070530
  Abstract: Implementations disclosed herein are directed to a hybrid federated learning (FL) technique that utilizes both federated averaging (FA) and federated distillation (FD) during a given round of FL of a given global machine learning (ML) model. Implementations may identify a population of client devices to participate in the given round of FL, determine a corresponding quantity of instances of client data available at each of the client devices that may be utilized during the given round of FL, and select different subsets of the client devices based on the corresponding quantity of instances of client data. Further, implementations may cause a first subset of the client devices to generate a corresponding FA update and a second subset of client devices to generate a corresponding FD update. Moreover, implementations may subsequently update the given global ML model based on the corresponding FA updates and the corresponding FD updates.
  Type: Application
  Filed: December 5, 2022
  Publication date: February 29, 2024
  Inventors: Ehsan Amid, Rajiv Mathews, Rohan Anil, Shankar Kumar, Jared Lichtarge
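A sketch of the client-selection step; the threshold and the rule "more data goes to the FA pool, less data to the FD pool" are assumptions, since the abstract only says the subsets are selected based on the quantity of client data:

```python
def partition_clients(client_data_counts: dict, threshold: int):
    """Split a round's population by available data: data-rich clients
    compute a federated-averaging (weight) update, data-poor clients a
    federated-distillation update. The threshold is illustrative."""
    fa_pool = [c for c, n in client_data_counts.items() if n >= threshold]
    fd_pool = [c for c, n in client_data_counts.items() if n < threshold]
    return fa_pool, fd_pool

fa, fd = partition_clients({"a": 1200, "b": 40, "c": 900}, threshold=500)
# fa == ["a", "c"], fd == ["b"]
```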
- Publication number: 20230359907
  Abstract: Implementations disclosed herein are directed to various techniques for mitigating and/or preventing catastrophic forgetting in federated learning of global machine learning (ML) models. Implementations may identify a global ML model that is initially trained at a remote server based on a server data set, determine server-based data for global weight(s) of the global ML model, and transmit the global ML model and the server-based data to a plurality of client devices. The server-based data may include, for example, EWC loss term(s), client augmenting gradients, and/or server augmenting gradients. Further, the plurality of client devices may generate a corresponding client gradient, based on corresponding predicted output generated using the global ML model and on the server-based data, and transmit the corresponding client gradient to the remote server. Implementations may further generate an updated global ML model based on at least the corresponding client gradients.
  Type: Application
  Filed: July 1, 2022
  Publication date: November 9, 2023
  Inventors: Sean Augenstein, Andrew Hard, Kurt Partridge, Rajiv Mathews, Lin Ning, Karan Singhal
- Publication number: 20230352019
  Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of an on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
  Type: Application
  Filed: July 6, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
- Publication number: 20230351246
  Abstract: Implementations disclosed herein are directed to utilizing elastic weight consolidation (EWC) loss term(s) in federated learning of global machine learning (ML) models. Implementations may identify a global ML model that is initially trained at a remote server based on a server data set, determine the EWC loss term(s) for global weight(s) of the global ML model, and transmit the global ML model and the EWC loss term(s) to a plurality of client devices. The EWC loss term(s) may be determined based on a Fisher information matrix for the server data set. Further, the plurality of client devices may generate a corresponding client gradient, based on corresponding predicted output generated using the global ML model and on the EWC loss term(s), and transmit the corresponding client gradient to the remote server. Implementations may further generate an updated global ML model based on at least the corresponding client gradients.
  Type: Application
  Filed: May 2, 2022
  Publication date: November 2, 2023
  Inventors: Andrew Hard, Kurt Partridge, Rajiv Mathews, Sean Augenstein
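The EWC penalty has a standard closed form, which a short sketch makes concrete. The diagonal-Fisher quadratic penalty below is the usual EWC formulation; the regularization strength `lam` is an assumption:

```python
import numpy as np

def ewc_client_loss(task_loss, weights, global_weights, fisher_diag, lam=0.1):
    """Client objective with an elastic weight consolidation penalty:
    L = task_loss + (lam / 2) * sum_i F_i * (w_i - w*_i)^2,
    where F is the diagonal Fisher information estimated on the server
    data set and w* are the transmitted global weights."""
    penalty = 0.5 * lam * np.sum(fisher_diag * (weights - global_weights) ** 2)
    return task_loss + penalty

w = np.array([1.2, -0.3])
w_star = np.array([1.0, 0.0])
fisher = np.array([2.0, 0.5])  # high Fisher -> that weight is pinned harder
loss = ewc_client_loss(task_loss=0.8, weights=w,
                       global_weights=w_star, fisher_diag=fisher)
```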
- Publication number: 20230352004
  Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
  Type: Application
  Filed: July 5, 2023
  Publication date: November 2, 2023
  Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
- Publication number: 20230335126
  Abstract: A method includes inserting a set of canary text samples into a corpus of training text samples and training an external language model on the corpus of training text samples and the set of canary text samples inserted into the corpus of training text samples. For each canary text sample, the method also includes generating a corresponding synthetic speech utterance and generating an initial transcription for the corresponding synthetic speech utterance. The method also includes rescoring the initial transcription generated for each corresponding synthetic speech utterance using the external language model. The method also includes determining a word error rate (WER) of the external language model based on the rescored initial transcriptions and the canary text samples and detecting memorization of the canary text samples by the external language model based on the WER of the external language model.
  Type: Application
  Filed: April 19, 2023
  Publication date: October 19, 2023
  Applicant: Google LLC
  Inventors: Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews
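The detection signal hinges on word error rate over the canaries, so a self-contained WER sketch is useful context (this is the standard edit-distance computation, not anything patent-specific). Intuitively, if rescoring with the trained LM drives WER on the canaries far below what comparable held-out text achieves, the canaries were memorized.

```python
def wer(ref: list, hyp: list) -> float:
    """Word error rate via Levenshtein distance (standard dynamic program)."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the canary phrase xq7".split(), "the canary phrase xq7".split()))
# 0.0 -> the rescored transcription reproduced the canary exactly
print(wer("the canary phrase xq7".split(), "the canary phrase seven".split()))
# 0.25 -> one substitution out of four reference words
```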
- Publication number: 20230317082
  Abstract: An unintentional memorization measure can be used to determine whether an automatic speech recognition (ASR) model has unintentionally memorized one or more phrases during training of the ASR model. Various implementations include generating one or more candidate transcripts based on the vocabulary of the ASR model. For example, the system can generate a candidate transcript by appending a token of the vocabulary to a previous candidate transcript. Various implementations include processing the candidate transcript using a speech synthesis model to generate synthesized speech audio data that includes synthesized speech of the candidate transcript. Additionally or alternatively, the synthesized speech audio data can be processed using the ASR model to generate ASR output. Various implementations can include generating a loss based on comparing the ASR output and the candidate transcript.
  Type: Application
  Filed: March 31, 2022
  Publication date: October 5, 2023
  Inventors: Om Dipakbhai Thakkar, Hakim Sidahmed, W. Ronny Huang, Rajiv Mathews, Françoise Beaufays, Florian Tramèr
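The token-appending search over the ASR vocabulary can be sketched as a beam-style expansion, with the TTS-plus-ASR loss as the scoring function; the beam and the scorer below are assumptions, since the abstract only describes appending tokens and comparing ASR output to the candidate:

```python
def expand_candidates(candidates, vocabulary, score, beam_size=8):
    """One step of the candidate-transcript search: append every vocabulary
    token to every current candidate and keep the best-scoring expansions.
    `score` is assumed to synthesize speech for the candidate, run it
    through the ASR model, and return the resulting loss (a lower loss
    means the model reproduces the candidate readily, which hints at
    unintentional memorization)."""
    expanded = [prev + [token] for prev in candidates for token in vocabulary]
    return sorted(expanded, key=score)[:beam_size]

# toy run with a scorer that (hypothetically) prefers candidates ending in "b"
step = expand_candidates(
    candidates=[["a"], ["b"]],
    vocabulary=["a", "b"],
    score=lambda cand: 0.0 if cand[-1] == "b" else 1.0,
    beam_size=2,
)  # [["a", "b"], ["b", "b"]]
```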