Patents by Inventor Yuxin Ding

Yuxin Ding has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250078812
    Abstract: Implementations described herein are directed to a framework for decentralized learning of large global machine learning (ML) model(s). In various implementations, remote processor(s) of a remote system can identify a global ML model, select client devices to participate in a given round of decentralized learning of the global ML model, and transmit, to each of the client devices, a processed version of the global ML model that is of a reduced transferable size. Further, client device processor(s) of a client device can receive the processed version of the global ML model, obtain corresponding client data, perform partial model training, based on processing the corresponding client data, for the processed version of the global ML model to generate a corresponding update, and transmit the corresponding update back to the remote system. Moreover, the remote processor(s) can update, based on at least the corresponding update, the global ML model.
    Type: Application
    Filed: August 5, 2024
    Publication date: March 6, 2025
    Inventors: Yonghui Xiao, Françoise Beaufays, Yuxin Ding
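    A minimal sketch of one learning round in the spirit of this abstract, using NumPy and entirely illustrative names (client_update, SUBSET, and the toy least-squares objective are my assumptions, not the patent's method): the "processed version of reduced transferable size" is simulated by transmitting only a subset of weight coordinates, and "partial model training" by a single local gradient step per client.
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    DIM, N_CLIENTS, SUBSET = 64, 4, 16   # model size, clients per round, weights sent

    global_w = rng.normal(size=DIM)      # the "global ML model" as a weight vector

    def make_client_data(n=32):
        # Hypothetical on-device data: features X and targets y for a linear model.
        X = rng.normal(size=(n, DIM))
        y = X @ np.ones(DIM) + 0.1 * rng.normal(size=n)
        return X, y

    clients = [make_client_data() for _ in range(N_CLIENTS)]

    def client_update(w_subset, idx, X, y, lr=0.01):
        # "Partial model training": one gradient step over the client's own data,
        # touching only the coordinates the server transmitted; untransmitted
        # coordinates are treated as zero, a crude stand-in for a smaller model.
        w_full = np.zeros(DIM)
        w_full[idx] = w_subset
        grad = X.T @ (X @ w_full - y) / len(y)   # least-squares gradient
        return w_subset - lr * grad[idx]         # only the subset is sent back

    # One round: the server picks which coordinates to ship (the size reduction),
    # each selected client trains locally, and the server averages the updates.
    idx = rng.choice(DIM, size=SUBSET, replace=False)
    updates = [client_update(global_w[idx], idx, X, y) for X, y in clients]
    global_w[idx] = np.mean(updates, axis=0)
    ```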
  • Publication number: 20240386318
    Abstract: Implementations described herein are directed to techniques for mitigating and/or eliminating catastrophic forgetting of a global machine learning (ML) model during decentralized learning thereof. Remote processor(s) of a remote system can initially train a global ML model based on server data that is accessible by the remote system. In subsequent decentralized learning of the global ML model, the remote processor(s) can utilize various checkpoint averaging techniques. As described herein, these various checkpoint averaging techniques can include, but are not limited to, a static checkpoint averaging technique, a dynamic checkpoint averaging technique, and/or a mixed centralized and decentralized training technique.
    Type: Application
    Filed: November 2, 2023
    Publication date: November 21, 2024
    Inventors: Yuxin Ding, Lillian Zhou, Mingqing Chen, Rajiv Mathews, Andrew Hard, Sean Augenstein
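    A minimal sketch of the "static checkpoint averaging" idea named in the abstract, under my own assumptions (the abstract gives no formulas; alpha and the function name are illustrative): the latest federated checkpoint is blended with a frozen server-trained checkpoint so decentralized training cannot drift arbitrarily far from what the model learned on server data.
    ```python
    import numpy as np

    def static_checkpoint_average(server_ckpt, federated_ckpt, alpha=0.3):
        # Retain a fixed fraction alpha of the frozen server checkpoint; a
        # dynamic variant might instead vary alpha from round to round.
        return {name: alpha * server_ckpt[name] + (1 - alpha) * federated_ckpt[name]
                for name in server_ckpt}

    # Toy usage: "checkpoints" as dicts of parameter arrays.
    rng = np.random.default_rng(1)
    server = {"w": rng.normal(size=(4, 4)), "b": np.zeros(4)}
    federated = {"w": rng.normal(size=(4, 4)), "b": rng.normal(size=4)}
    blended = static_checkpoint_average(server, federated)
    ```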
  • Publication number: 20240233707
    Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
    Type: Application
    Filed: October 17, 2023
    Publication date: July 11, 2024
    Applicant: Google LLC
    Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen
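    A self-contained sketch of the distillation recipe this abstract describes, with toy stand-ins that are my assumptions rather than the patent's models: logistic regression replaces the teacher and student ASR models, and additive noise replaces utterance augmentation. The structure follows the abstract: augment each out-of-domain utterance, pseudo-label the augmented version with the target-domain teacher, then train the student on those pairs.
    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    DIM = 16

    def teacher_predict(x, w_teacher):
        # Stand-in for the teacher ASR model trained on the target domain.
        return (x @ w_teacher > 0).astype(float)   # hard pseudo-label

    def augment(x):
        # Stand-in for utterance augmentation (e.g., added noise or masking).
        return x + 0.1 * rng.normal(size=x.shape)

    w_teacher = rng.normal(size=DIM)               # pretend: target-domain teacher
    ood_utterances = rng.normal(size=(256, DIM))   # out-of-domain distillation data

    # Build the distillation set: each augmented utterance paired with the
    # teacher's pseudo-label for that augmented utterance.
    X_aug = np.array([augment(x) for x in ood_utterances])
    pseudo_y = teacher_predict(X_aug, w_teacher)

    # Distill: train the student on (augmented utterance, pseudo-label) pairs
    # with plain gradient descent on the logistic loss.
    w_student = np.zeros(DIM)
    for _ in range(200):
        p = 1.0 / (1.0 + np.exp(-(X_aug @ w_student)))
        w_student -= 0.5 * X_aug.T @ (p - pseudo_y) / len(pseudo_y)
    ```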
  • Publication number: 20240194192
    Abstract: Information can be distilled from a global automatic speech recognition (ASR) model to a client ASR model. Many implementations include using an RNN-T model as the ASR model, where the global ASR model includes a global encoder, a joint network, and a prediction network, and where the client ASR model includes a client encoder, the joint network, and the prediction network. Various implementations include using principal component analysis (PCA) while training the global ASR model to learn a mean vector and a set of principal components corresponding to the global ASR model. Additional or alternative implementations include training the client ASR model to generate one or more predicted coefficients of the global ASR model.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 13, 2024
    Inventors: Ehsan Amid, Rajiv Mathews, Shankar Kumar, Jared Lichtarge, Mingqing Chen, Tien-Ju Yang, Yuxin Ding
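    A minimal sketch of the PCA idea in this abstract, with random matrices standing in for RNN-T encoder outputs (the shapes, names, and choice of K are my illustrative assumptions): PCA over the global encoder's outputs yields a mean vector and a set of principal components, after which a small client model only needs to predict K coefficients per input, and the shared mean and components expand them back to full size.
    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    N, DIM, K = 512, 64, 8   # samples, encoder output size, retained components

    global_outputs = rng.normal(size=(N, DIM))   # pretend: global encoder outputs

    # Learn the mean vector and principal components via SVD of centered data.
    mean = global_outputs.mean(axis=0)
    _, _, Vt = np.linalg.svd(global_outputs - mean, full_matrices=False)
    components = Vt[:K]                          # (K, DIM) principal directions

    def reconstruct(coeffs):
        # Map K predicted coefficients back to a DIM-sized representation.
        return mean + coeffs @ components

    # The client model's job reduces to predicting K coefficients per input;
    # here we just verify that the true coefficients reconstruct the outputs.
    coeffs = (global_outputs - mean) @ components.T   # (N, K) prediction targets
    approx = reconstruct(coeffs)
    rel_err = np.linalg.norm(approx - global_outputs) / np.linalg.norm(global_outputs)
    ```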
  • Publication number: 20240135918
    Abstract: A method includes receiving distillation data including a plurality of out-of-domain training utterances. For each particular out-of-domain training utterance of the distillation data, the method includes generating a corresponding augmented out-of-domain training utterance, and generating, using a teacher ASR model trained on training data corresponding to a target domain, a pseudo-label corresponding to the corresponding augmented out-of-domain training utterance. The method also includes distilling a student ASR model from the teacher ASR model by training the student ASR model using the corresponding augmented out-of-domain training utterances paired with the corresponding pseudo-labels generated by the teacher ASR model.
    Type: Application
    Filed: October 16, 2023
    Publication date: April 25, 2024
    Applicant: Google LLC
    Inventors: Tien-Ju Yang, You-Chi Cheng, Shankar Kumar, Jared Lichtarge, Ehsan Amid, Yuxin Ding, Rajiv Mathews, Mingqing Chen
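    This abstract is identical to that of related publication 20240233707 above, for which a hard-pseudo-label sketch is given. As a complementary variant (my own choice, not taken from the patent), the sketch below uses temperature-smoothed soft teacher scores as the pseudo-labels, a common knowledge-distillation alternative, with the same toy logistic-regression stand-ins for the teacher and student ASR models.
    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    DIM, TEMP = 16, 2.0

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w_teacher = rng.normal(size=DIM)              # stand-in target-domain teacher
    X = rng.normal(size=(256, DIM))               # out-of-domain utterances
    X_aug = X + 0.1 * rng.normal(size=X.shape)    # stand-in augmentation

    # Soft pseudo-labels: temperature-smoothed teacher probabilities for the
    # augmented utterances, rather than hard 0/1 labels.
    soft_y = sigmoid((X_aug @ w_teacher) / TEMP)

    # Train the student toward the soft targets; the cross-entropy gradient
    # keeps the same (prediction - target) form as in the hard-label case.
    w_student = np.zeros(DIM)
    for _ in range(200):
        p = sigmoid(X_aug @ w_student)
        w_student -= 0.5 * X_aug.T @ (p - soft_y) / len(soft_y)
    ```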