Patents by Inventor Rajiv Mathews
Rajiv Mathews has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250149026
Abstract: A method includes obtaining an automatic speech recognition (ASR) model pre-trained on an initial training dataset, creating a set of canary speech utterances, and speeding up each canary speech utterance in the set of canary speech utterances. The method also includes fine-tuning the ASR model on the set of sped-up canary speech utterances and measuring unintended memorization of the fine-tuned ASR model based on speech recognition results performed by the fine-tuned ASR model on the sped-up canary speech utterances.
Type: Application
Filed: October 14, 2024
Publication date: May 8, 2025
Applicant: Google LLC
Inventors: Lun Wang, Om Dipakbhai Thakkar, Rajiv Mathews
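The canary workflow in this abstract lends itself to a short sketch. The following is a minimal illustration in Python, assuming a `fine_tune` helper and a `transcribe` method that are not part of the filing; the index-based resampling and the verbatim-match score are stand-ins for whatever the actual implementation uses.

```python
import numpy as np

def speed_up(waveform: np.ndarray, factor: float = 1.5) -> np.ndarray:
    # Naive speed-up by index resampling; a real system would use a
    # proper time-stretching or resampling algorithm.
    idx = np.arange(0, len(waveform), factor).astype(int)
    return waveform[idx]

def memorization_score(transcripts, canary_texts) -> float:
    # Fraction of canaries the fine-tuned model reproduces verbatim:
    # a simple proxy for unintended memorization.
    hits = sum(t.strip() == c.strip() for t, c in zip(transcripts, canary_texts))
    return hits / len(canary_texts)

# Hypothetical end-to-end flow (helpers assumed, not from the filing):
# sped = [(speed_up(w), text) for w, text in canaries]
# model = fine_tune(pretrained_asr, sped)
# transcripts = [model.transcribe(w) for w, _ in sped]
# print(memorization_score(transcripts, [text for _, text in sped]))
```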
-
Publication number: 20250118293
Abstract: A method includes receiving a conversational training dataset including a plurality of conversational training samples, each training sample associated with a corresponding conversation and including: corresponding audio data characterizing a corresponding current utterance spoken by a user during a current turn in the corresponding conversation; a corresponding context for the corresponding current utterance including a transcript of a previous turn in the corresponding conversation that precedes the current turn; a corresponding ground-truth transcription of the corresponding current utterance; and a chain-of-thought (CoT) annotation representing a corresponding logical relationship between the corresponding current utterance and the previous turn.
Type: Application
Filed: September 20, 2024
Publication date: April 10, 2025
Applicant: Google LLC
Inventors: Mingqing Chen, Rajiv Mathews, Andrew Hard, Swaroop Ramaswamy, Kilol Gupta
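A plausible shape for one such training sample, sketched as a Python dataclass; the field names are illustrative rather than drawn from the filing.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ConversationalSample:
    audio: np.ndarray    # current utterance spoken during the current turn
    context: list[str]   # transcript(s) of the preceding turn(s)
    ground_truth: str    # reference transcription of the current utterance
    cot_annotation: str  # logical relationship to the previous turn

sample = ConversationalSample(
    audio=np.zeros(16000),  # one second of 16 kHz audio as a placeholder
    context=["What time does the store open?"],
    ground_truth="and is it open on Sundays",
    cot_annotation="follow-up question about the same entity",
)
```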
-
Patent number: 12272360
Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
Type: Grant
Filed: May 7, 2024
Date of Patent: April 8, 2025
Assignee: GOOGLE LLC
Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
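As a rough illustration of the gradient step the abstract describes, here is a sketch with a linear model standing in for the on-device ML model; the binary cross-entropy objective and the learning rate are assumptions not fixed by the patent.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def activation_gradient(weights, features, ground_truth):
    # Gradient of binary cross-entropy w.r.t. the weights.
    # ground_truth is 1.0 if the assistant should have activated, else 0.0.
    predicted = sigmoid(weights @ features)
    return (predicted - ground_truth) * features

weights = np.zeros(4)
features = np.array([0.2, 0.9, 0.1, 0.5])  # sensor-derived attributes
grad = activation_gradient(weights, features, ground_truth=1.0)
weights -= 0.1 * grad  # on-device update of local weights
# Alternatively (or additionally), transmit `grad` to the remote system,
# which folds it into the global model's weights.
```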
-
Patent number: 12260875
Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
Type: Grant
Filed: March 19, 2024
Date of Patent: March 25, 2025
Assignee: Google LLC
Inventors: Ehsan Amid, Om Dipakbhai Thakkar, Rajiv Mathews, Françoise Beaufays
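The decision at the heart of the method reduces to a membership check on the prediction. A minimal sketch, assuming an `obfuscate` helper and a `transcribe` method that the abstract does not specify:

```python
def leaked_phrase(predicted: str, ground_truth: str, phrase: str) -> bool:
    # The phrase was obfuscated in the input audio, so if it still appears
    # in the prediction (and the ground truth confirms it was recited),
    # the model most plausibly reproduced it from its training data.
    return (phrase.lower() in predicted.lower()
            and phrase.lower() in ground_truth.lower())

# Hypothetical flow (helpers assumed):
# masked_audio = obfuscate(audio, phrase_span)  # e.g., replace with silence
# predicted = trained_asr.transcribe(masked_audio)
# if leaked_phrase(predicted, ground_truth, "4321 Elm Street"):
#     print("trained ASR model leaked the obfuscated phrase")
```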
-
Patent number: 12223952
Abstract: On-device processor(s) of a client device may store, in on-device storage and in association with a time to live (TTL), a correction directed to ASR processing of audio data. The correction may include a portion of a given speech hypothesis that was modified to an alternate speech hypothesis. Further, the on-device processor(s) may cause an on-device ASR model to be personalized based on the correction. Moreover, and based on additional ASR processing of additional audio data, the on-device processor(s) may store, in the on-device storage and in association with an additional TTL, a pseudo-correction directed to the additional ASR processing. Accordingly, the on-device processor(s) may cause the on-device ASR model to be personalized based on the pseudo-correction to prevent forgetting by the on-device ASR model.
Type: Grant
Filed: October 4, 2022
Date of Patent: February 11, 2025
Assignee: GOOGLE LLC
Inventors: Rajiv Mathews, Dragan Zivkovic, Khe Chai Sim
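A small sketch of the TTL-scoped store the abstract implies; the class and field names are invented for illustration, and the personalization step is left as an assumed helper.

```python
import time
from dataclasses import dataclass

@dataclass
class Correction:
    original: str        # portion of the original speech hypothesis
    corrected: str       # the alternate speech hypothesis
    expires_at: float    # TTL: entry is purged after this timestamp
    pseudo: bool = False # True for pseudo-corrections that guard forgetting

class CorrectionStore:
    # On-device store of (pseudo-)corrections, each with a time to live.
    def __init__(self):
        self.entries: list[Correction] = []

    def add(self, original, corrected, ttl_seconds, pseudo=False):
        self.entries.append(
            Correction(original, corrected, time.time() + ttl_seconds, pseudo))

    def live(self):
        # Drop expired entries, return the rest for personalization.
        now = time.time()
        self.entries = [e for e in self.entries if e.expires_at > now]
        return self.entries

store = CorrectionStore()
store.add("call rajeev", "call Rajiv", ttl_seconds=86_400)
store.add("set a timer", "set a timer", ttl_seconds=86_400, pseudo=True)
# personalize(asr_model, store.live())  # assumed fine-tuning helper
```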
-
Publication number: 20250045627
Abstract: Processor(s) of a client device can receive global weights of a global ML model from a remote system, obtain a client device data set, determine a Fisher information matrix for the client data set, and transmit the Fisher information matrix for the client data set to the remote system. Further, processor(s) of the remote system can determine a corresponding elastic weight consolidation (EWC) loss term for each of the global weights based on at least the Fisher information matrix, generate a server update for the global ML model based on (i) processing server data remotely at the remote system using the global ML model and (ii) the corresponding EWC loss term for each of the global weights, and update the global weights of the global ML model based on the server update.
Type: Application
Filed: August 4, 2023
Publication date: February 6, 2025
Inventors: Andrew Hard, Kurt Partridge, Sean Augenstein, Rajiv Mathews
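The EWC loss term has a standard closed form, L_total = L_server + (λ/2) Σ_i F_i (w_i − w*_i)², where F is the Fisher information transmitted by the client and w* are the current global weights. A minimal sketch, with λ and the diagonal-Fisher approximation as assumptions:

```python
import numpy as np

def ewc_penalty(weights, anchor_weights, fisher_diag, lam=1.0):
    # Quadratic penalty anchoring each weight in proportion to its
    # (diagonal) Fisher information, so the server update does not
    # overwrite what the global model learned from clients.
    return 0.5 * lam * np.sum(fisher_diag * (weights - anchor_weights) ** 2)

w_global = np.array([0.5, -1.2, 0.3])  # global weights (the anchor)
w_server = np.array([0.7, -1.0, 0.9])  # weights after server-side training
fisher = np.array([2.0, 0.1, 5.0])     # Fisher diagonal from the client
print(ewc_penalty(w_server, w_global, fisher))
```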
-
Publication number: 20250037707
Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
Type: Application
Filed: October 16, 2024
Publication date: January 30, 2025
Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
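One round of the combined update might look like the following sketch; the plain averaging of client and remote gradients is an assumption, as the abstract does not fix a weighting scheme.

```python
import numpy as np

def combined_update(global_weights, client_grads, server_grads, lr=0.1):
    # Pool gradients computed on client devices with gradients the
    # remote system computed on its own data, then apply their mean
    # to the global model weights.
    all_grads = client_grads + server_grads
    mean_grad = np.mean(all_grads, axis=0)
    return global_weights - lr * mean_grad

w = np.zeros(3)
client_grads = [np.array([0.2, -0.1, 0.0]),   # from client device A
                np.array([0.4, 0.1, -0.2])]   # from client device B
server_grads = [np.array([0.1, 0.0, 0.3])]    # computed at the remote system
w = combined_update(w, client_grads, server_grads)
print(w)
```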
-
Patent number: 12205575
Abstract: Implementations disclosed herein are directed to federated learning of machine learning ("ML") model(s) based on gradient(s) generated at corresponding client devices and a remote system. Processor(s) of the corresponding client devices can process client data generated locally at the corresponding client devices using corresponding on-device ML model(s) to generate corresponding predicted outputs, generate corresponding client gradients based on the corresponding predicted outputs, and transmit the corresponding client gradients to the remote system. Processor(s) of the remote system can process remote data obtained from remote database(s) using global ML model(s) to generate additional corresponding predicted outputs, and generate corresponding remote gradients based on the additional corresponding predicted outputs. Further, the remote system can utilize the corresponding client gradients and the corresponding remote gradients to update the global ML model(s) or weights thereof.
Type: Grant
Filed: July 5, 2023
Date of Patent: January 21, 2025
Assignee: GOOGLE LLC
Inventors: Françoise Beaufays, Andrew Hard, Swaroop Indra Ramaswamy, Om Dipakbhai Thakkar, Rajiv Mathews
-
Publication number: 20240386318
Abstract: Implementations described herein are directed to techniques for mitigating and/or eliminating catastrophic forgetting of a global machine learning (ML) model during decentralized learning thereof. Remote processor(s) of a remote system can initially train a global ML model based on server data that is accessible by the remote system. In subsequent decentralized learning of the global ML model, the remote processor(s) can utilize various checkpoint averaging techniques. As described herein, these checkpoint averaging techniques can include, but are not limited to, a static checkpoint averaging technique, a dynamic checkpoint averaging technique, and/or a mixed centralized and decentralized training technique.
Type: Application
Filed: November 2, 2023
Publication date: November 21, 2024
Inventors: Yuxin Ding, Lillian Zhou, Mingqing Chen, Rajiv Mathews, Andrew Hard, Sean Augenstein
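Static checkpoint averaging, the simplest of the listed techniques, is just a (possibly weighted) mean of checkpoints; the 0.3/0.7 weighting below is illustrative, not from the filing.

```python
import numpy as np

def average_checkpoints(checkpoints, weights=None):
    # Weighted mean of model checkpoints, e.g., a centrally pre-trained
    # checkpoint and the latest federated checkpoint, so the merged model
    # retains what was learned centrally (resisting catastrophic forgetting).
    weights = weights or [1.0 / len(checkpoints)] * len(checkpoints)
    return sum(w * c for w, c in zip(weights, checkpoints))

central = np.array([1.0, 2.0, 3.0])    # checkpoint from server pre-training
federated = np.array([0.0, 2.5, 4.0])  # checkpoint after federated rounds
print(average_checkpoints([central, federated], weights=[0.3, 0.7]))
```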
-
Publication number: 20240371362
Abstract: Implementations are directed to efficient federated learning of machine learning (ML) model(s) through on-the-fly decompression and compression of model parameters of the ML model(s) when facilitating forward propagation and/or back propagation at client device(s). For example, implementations can transmit, from a remote system to a client device, a compressed on-device ML model that includes some compressed parameters. Further, the client device can, in performing forward propagation and/or back propagation using the on-device ML model, decompress those compressed parameters on the fly as they are needed, and the propagation then utilizes the decompressed parameters.
Type: Application
Filed: May 1, 2024
Publication date: November 7, 2024
Inventors: Tien-Ju Yang, Yonghui Xiao, Giovanni Motta, Françoise Beaufays, Rajiv Mathews, Mingqing Chen
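A sketch of the on-the-fly idea using int8 quantization as a stand-in codec (the abstract does not specify the compression scheme): the dequantized weights exist only for the duration of the layer's computation.

```python
import numpy as np

class CompressedLayer:
    # Holds int8-quantized weights and dequantizes them only when the
    # layer is actually used in forward/back propagation.
    def __init__(self, weights: np.ndarray):
        self.scale = np.abs(weights).max() / 127.0
        self.q = np.round(weights / self.scale).astype(np.int8)  # compressed

    def forward(self, x: np.ndarray) -> np.ndarray:
        w = self.q.astype(np.float32) * self.scale  # decompress on the fly
        return x @ w                                 # `w` is discarded after use

layer = CompressedLayer(np.random.randn(4, 2).astype(np.float32))
print(layer.forward(np.ones((1, 4), dtype=np.float32)))
```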
-
Publication number: 20240330767
Abstract: A method includes training a client machine learning (ML) model on client training data at a client device. While training the client ML model, the method also includes obtaining, from a server, server model weights of a server ML model trained on server training data, the server training data different from the client training data. While training the client ML model, the method also includes: transmitting, to the server, client model weights of the client ML model; updating the client ML model using the server model weights; obtaining, from the server, updated server model weights of the server ML model, the updated server model weights updated based on the transmitted client model weights; and further updating the client ML model using the updated server model weights.
Type: Application
Filed: March 20, 2024
Publication date: October 3, 2024
Applicant: Google LLC
Inventors: Andrew Hard, Rajiv Mathews
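A toy version of the exchange loop; the interpolation rule used to fold the other side's weights in is an assumption, since the abstract only says each side updates using the weights it receives.

```python
import numpy as np

def merge_weights(local, remote, alpha=0.5):
    # Interpolate local weights with weights received from the other side;
    # both client and server keep training while exchanging weights.
    return alpha * local + (1.0 - alpha) * remote

client_w = np.array([0.2, 0.8])  # trained on client data
server_w = np.array([0.6, 0.4])  # trained on different server data
for _ in range(3):               # a few exchange rounds
    client_w = merge_weights(client_w, server_w)  # client pulls server weights
    server_w = merge_weights(server_w, client_w)  # server pulls client weights
print(client_w, server_w)
```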
-
Publication number: 20240330766
Abstract: A method includes receiving, from a client device, a client machine learning (ML) model and obtaining a set of training data including a plurality of training samples. The client ML model is trained locally on the client device. For each respective training sample in the plurality of training samples, the method also includes: determining, using the respective training sample, a first loss of the client ML model; determining, using the respective training sample, a second loss of a server ML model; and determining a respective score based on the first loss and the second loss. The method also includes selecting, based on each respective score of each respective training sample in the plurality of training samples, a subset of training samples from the plurality of training samples and training the server ML model using the subset of training samples.
Type: Application
Filed: March 19, 2024
Publication date: October 3, 2024
Applicant: Google LLC
Inventors: Andrew Hard, Rajiv Mathews
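One plausible scoring rule consistent with the abstract is the gap between the two losses, keeping the samples where the server model underperforms the received client model most; the subtraction and the top-k cutoff are assumptions.

```python
import numpy as np

def select_samples(client_losses, server_losses, k):
    # Score each sample by how much more the server model struggles with
    # it than the client model does, and keep the top-k indices.
    scores = np.asarray(server_losses) - np.asarray(client_losses)
    return np.argsort(scores)[-k:]

client_losses = [0.2, 1.5, 0.3, 0.9]  # first loss, under the client ML model
server_losses = [1.1, 1.4, 0.2, 2.0]  # second loss, under the server ML model
print(select_samples(client_losses, server_losses, k=2))
```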
-
Publication number: 20240296843
Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
Type: Application
Filed: May 7, 2024
Publication date: September 5, 2024
Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
-
Publication number: 20240265269
Abstract: Implementations disclosed herein are directed to techniques for enabling decentralized learning of global language models (LMs). Remote processor(s) of a remote system can obtain a global LM that includes a global embedding matrix, generate a global embedding mask for the global embedding matrix using a masking technique, apply the global embedding mask to the global embedding matrix to generate a sparsified global LM that includes a masked version of the global embedding matrix, transmit the sparsified global LM to computing device(s) that are participating in a given round of decentralized learning for the global LM, receive corresponding updates from the computing device(s), and cause the global LM to be updated based on the corresponding updates. By generating the global embedding mask and applying it to the global embedding matrix, the transferable size of the global LM is reduced, thereby enabling decentralized learning thereof.
Type: Application
Filed: March 23, 2023
Publication date: August 8, 2024
Inventors: Mingqing Chen, Lara McConnaughey, Kaan Ege Özgün, Rajiv Mathews, Françoise Beaufays
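A sketch of masking the global embedding matrix; which rows are kept (and whether masked rows are zeroed or dropped from the payload entirely) is an assumption here, not specified by the abstract.

```python
import numpy as np

def sparsify_embeddings(embedding: np.ndarray, keep_rows: np.ndarray):
    # Apply a row mask to the global embedding matrix so only embeddings
    # for, e.g., frequent vocabulary items are shipped to devices,
    # shrinking the transferable size of the global LM.
    mask = np.zeros(embedding.shape[0], dtype=bool)
    mask[keep_rows] = True
    masked = embedding * mask[:, None]  # zero out masked rows
    return masked, mask

vocab_embeddings = np.random.randn(10, 4)
masked, mask = sparsify_embeddings(vocab_embeddings, keep_rows=np.array([0, 3, 7]))
print(int(mask.sum()), "of", len(mask), "embedding rows transmitted")
```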
-
Publication number: 20240249193
Abstract: Generally, the present disclosure is directed to enhanced federated learning (FL) that employs a set of clients with varying amounts of computational resources (e.g., system memory, storage, and processing bandwidth). To overcome the limitations of conventional FL methods on such heterogeneous clients, the embodiments run multi-directional knowledge distillation between the server models produced by each federated averaging (FedAvg) pool, using unlabeled server data as the distillation dataset. By co-distilling the two (or more) models frequently over the course of FedAvg rounds, information is shared between the pools without sharing model parameters. This leads to increased performance and faster convergence (in fewer federated rounds).
Type: Application
Filed: January 19, 2024
Publication date: July 25, 2024
Inventors: Jared Alexander Lichtarge, Rajiv Mathews, Rohan Anil, Ehsan Amid, Shankar Kumar
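Co-distillation between pools can be sketched as a symmetric soft-label cross-entropy on an unlabeled server batch; the temperature and loss weighting a real setup would tune are omitted here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits):
    # Cross-entropy of the student against the teacher's soft labels.
    t = softmax(teacher_logits)
    log_s = np.log(softmax(student_logits) + 1e-12)
    return -np.mean(np.sum(t * log_s, axis=-1))

# Multi-directional: on an unlabeled server batch, pool A's server model
# distills from pool B's and vice versa, so information crosses pools
# without any parameter sharing.
logits_a = np.array([[2.0, 0.5, 0.1]])  # pool A model on the unlabeled batch
logits_b = np.array([[1.0, 1.2, 0.2]])  # pool B model on the same batch
loss_a = distill_loss(logits_a, teacher_logits=logits_b)
loss_b = distill_loss(logits_b, teacher_logits=logits_a)
print(loss_a, loss_b)
```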
-
Publication number: 20240233707
Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
Type: Application
Filed: October 17, 2023
Publication date: July 11, 2024
Applicant: Google LLC
Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen
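The distillation loop reduces to: augment, pseudo-label with the in-domain teacher, train the student on the resulting pairs. A sketch with additive noise as a stand-in augmentation and assumed `transcribe`/`train` helpers:

```python
import numpy as np

def augment(waveform: np.ndarray) -> np.ndarray:
    # Illustrative augmentation (additive noise); the filing does not
    # prescribe a particular augmentation.
    return waveform + 0.01 * np.random.randn(len(waveform))

# Hypothetical flow, assuming teacher/student ASR objects exist:
# pairs = []
# for utt in out_of_domain_utterances:
#     aug = augment(utt)
#     pseudo_label = teacher_asr.transcribe(aug)  # teacher trained in-domain
#     pairs.append((aug, pseudo_label))
# train(student_asr, pairs)  # standard supervised training step
```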
-
Publication number: 20240221772
Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
Type: Application
Filed: March 19, 2024
Publication date: July 4, 2024
Applicant: Google LLC
Inventors: Ehsan Amid, Om Dipakbhai Thakkar, Rajiv Mathews, Françoise Beaufays
-
Patent number: 12014739
Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; make a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determine that the decision was incorrect; and in response to determining that the decision was incorrect, generate a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
Type: Grant
Filed: July 6, 2023
Date of Patent: June 18, 2024
Assignee: GOOGLE LLC
Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
-
Publication number: 20240194192
Abstract: Information can be distilled from a global automatic speech recognition (ASR) model to a client ASR model. Many implementations include using an RNN-T model as the ASR model, where the global ASR model includes a global encoder, a joint network, and a prediction network, and where the client ASR model includes a client encoder, the joint network, and the prediction network. Various implementations include using principal component analysis (PCA) while training the global ASR model to learn a mean vector and a set of principal components corresponding to the global ASR model. Additional or alternative implementations include training the client ASR model to generate one or more predicted coefficients of the global ASR model.
Type: Application
Filed: December 9, 2022
Publication date: June 13, 2024
Inventors: Ehsan Amid, Rajiv Mathews, Shankar Kumar, Jared Lichtarge, Mingqing Chen, Tien-Ju Yang, Yuxin Ding
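A sketch of the PCA step: learn a mean vector and principal components from global-model quantities, then reconstruct from coefficients that, per the abstract, the client model learns to predict. The SVD-based PCA and the toy dimensions are assumptions.

```python
import numpy as np

def pca_basis(data: np.ndarray, k: int):
    # Learn a mean vector and k principal components from rows of a
    # global-model weight/feature matrix (via SVD on centered data).
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]  # mean vector, top-k principal components

def reconstruct(coeffs: np.ndarray, mean: np.ndarray, components: np.ndarray):
    # The client model predicts `coeffs`; combining them with the shared
    # basis approximates the corresponding global-model quantity.
    return mean + coeffs @ components

global_features = np.random.randn(100, 16)
mean, comps = pca_basis(global_features, k=4)
coeffs = (global_features[0] - mean) @ comps.T  # stand-in for predicted coeffs
approx = reconstruct(coeffs, mean, comps)
print(np.linalg.norm(global_features[0] - approx))
```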
-
Publication number: 20240135918
Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
Type: Application
Filed: October 16, 2023
Publication date: April 25, 2024
Applicant: Google LLC
Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen