Patents by Inventor Shaojin Ding
Shaojin Ding has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250078815
Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights, and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
Type: Application
Filed: September 5, 2024
Publication date: March 6, 2025
Applicant: Google LLC
Inventors: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov
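A minimal sketch of the pruning-and-quantization step this abstract describes, in hypothetical NumPy code (the sparsity level, bit width, and function names are assumptions, not the patented implementation):

```python
import numpy as np

def prune_and_quantize(weights: np.ndarray, sparsity: float = 0.5, bit_width: int = 8):
    """Zero out the smallest-magnitude weights, then snap the rest to a fixed-bit integer grid."""
    # Sparsity mask: keep only the largest-magnitude weights.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    pruned = weights * mask

    # Symmetric quantization to integers of the chosen fixed-bit width.
    qmax = 2 ** (bit_width - 1) - 1
    scale = np.max(np.abs(pruned)) / qmax
    quantized = np.clip(np.round(pruned / scale), -qmax, qmax).astype(np.int32)
    return quantized, scale, mask

if __name__ == "__main__":
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale, mask = prune_and_quantize(w)
    print(q)            # integer weights with the fixed-bit width
    print(q * scale)    # dequantized, pruned approximation of the original weights
```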
-
Publication number: 20240347043
Abstract: A method includes obtaining a plurality of training samples, determining a minimum integer fixed-bit width representing a maximum quantization of an automatic speech recognition (ASR) model, and training the ASR model on the plurality of training samples using a quantity of random noise. The ASR model includes a plurality of weights that each include a respective float value. The quantity of random noise is based on the minimum integer fixed-bit width. After training the ASR model, the method also includes selecting a target integer fixed-bit width greater than or equal to the minimum integer fixed-bit width, and for each respective weight of the plurality of weights, quantizing the respective weight from the respective float value to a respective integer associated with a value of the selected target integer fixed-bit width. The operations also include providing the quantized trained ASR model to a user device.
Type: Application
Filed: April 10, 2024
Publication date: October 17, 2024
Applicant: Google LLC
Inventors: David Qiu, David Rim, Shaojin Ding, Yanzhang He
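One way to read the abstract is that uniform noise matched to the quantization step of the minimum bit width is injected during training, after which any target bit width at or above that minimum can be chosen. A hypothetical NumPy sketch under that assumption:

```python
import numpy as np

def quantization_noise(weights: np.ndarray, min_bit_width: int) -> np.ndarray:
    """Uniform noise on the order of the quantization step implied by the minimum bit width."""
    qmax = 2 ** (min_bit_width - 1) - 1
    step = np.max(np.abs(weights)) / qmax
    return np.random.uniform(-step / 2, step / 2, size=weights.shape)

def quantize(weights: np.ndarray, target_bit_width: int):
    """Quantize trained float weights to integers of the selected target bit width."""
    qmax = 2 ** (target_bit_width - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    return np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32), scale

w = np.random.randn(3, 3).astype(np.float32)
w_noisy = w + quantization_noise(w, min_bit_width=4)   # used in the forward pass during training
q, scale = quantize(w, target_bit_width=8)             # any width >= the minimum after training
```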
-
Publication number: 20240153495
Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.
Type: Application
Filed: October 26, 2023
Publication date: May 9, 2024
Applicant: Google LLC
Inventors: Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-yiin Chang, David Johannes Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar
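The joint objective amounts to combining the two per-utterance losses; a minimal hypothetical sketch in Python (the mixing weight and function name are assumptions, since the abstract does not state how the losses are combined):

```python
def joint_loss(asr_loss: float, aux_loss: float, aux_weight: float = 0.3) -> float:
    """Combine the speech recognition loss with the auxiliary-token loss for one utterance.

    The fixed mixing coefficient is assumed here for illustration only.
    """
    return asr_loss + aux_weight * aux_loss

# Per batch, the model would be updated on the average of joint_loss over all utterances.
```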
-
Publication number: 20230335107
Abstract: Provided herein is a reference-free foreign accent conversion (FAC) computer system and methods for training models, utilizing a library of algorithms, to directly transform utterances from a foreign, non-native (L2) or second-language (L2) speaker to have the accent of a native (L1) speaker. The models in the reference-free FAC computer system are a speech-independent acoustic model to extract speaker-independent speech embeddings from an L1 speaker utterance and/or the L2 speaker, a speech synthesizer to generate L1 speaker reference-based golden-speaker utterances, and a pronunciation correction model to generate L2 speaker reference-free golden-speaker utterances.
Type: Application
Filed: August 24, 2021
Publication date: October 19, 2023
Applicant: The Texas A&M University
Inventors: Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna
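The three models can be pictured as a simple pipeline; the sketch below is a hypothetical Python skeleton with placeholder callables, not the patented system:

```python
from typing import Callable
import numpy as np

def reference_free_fac(l2_waveform: np.ndarray,
                       acoustic_model: Callable[[np.ndarray], np.ndarray],
                       pronunciation_corrector: Callable[[np.ndarray], np.ndarray],
                       synthesizer: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Convert an L2 utterance toward a native accent without an L1 reference at inference time."""
    embeddings = acoustic_model(l2_waveform)          # speaker-independent speech embeddings
    corrected = pronunciation_corrector(embeddings)   # shift pronunciation toward the L1 target
    return synthesizer(corrected)                     # reference-free golden-speaker utterance
```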
-
Publication number: 20230326461
Abstract: An automated speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The first decoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a first probability distribution over possible speech recognition hypotheses. The second encoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a second higher order feature representation for a corresponding first higher order feature frame. The second decoder receives, as input, the second higher order feature representation generated by the second encoder, and generates a second probability distribution over possible speech recognition hypotheses.
Type: Application
Filed: March 13, 2023
Publication date: October 12, 2023
Applicant: Google LLC
Inventors: Shaojin Ding, Yangzhang He, Xin Wang, Weiran Wang, Trevor Strohman, Tara N. Sainath, Rohit Parkash Prabhavalkar, Robert David, Rina Panigrahy, Rami Botros, Qiao Liang, Ian Mcgraw, Ding Zhao, Dongseong Hwang
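A compact PyTorch sketch of the cascaded arrangement the abstract describes (the layer types and sizes here are illustrative assumptions, not the patented model):

```python
import torch
import torch.nn as nn

class CascadedASRSketch(nn.Module):
    """First encoder feeds both a first decoder and a second encoder/decoder pair."""

    def __init__(self, feat_dim: int = 80, hidden: int = 256, vocab: int = 128):
        super().__init__()
        self.encoder1 = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder1 = nn.Linear(hidden, vocab)   # stand-in for the first decoder
        self.encoder2 = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder2 = nn.Linear(hidden, vocab)   # stand-in for the second decoder

    def forward(self, frames: torch.Tensor):
        h1, _ = self.encoder1(frames)                 # first higher order feature representation
        p1 = self.decoder1(h1).log_softmax(dim=-1)    # first probability distribution
        h2, _ = self.encoder2(h1)                     # second higher order feature representation
        p2 = self.decoder2(h2).log_softmax(dim=-1)    # second probability distribution
        return p1, p2

model = CascadedASRSketch()
p1, p2 = model(torch.randn(1, 50, 80))   # (batch, time, feature) acoustic frames
```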
-
Publication number: 20230298569
Abstract: A method for training a model includes obtaining a plurality of training samples. Each respective training sample of the plurality of training samples includes a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method includes training, using quantization aware training with native integer operations, an automatic speech recognition (ASR) model on the plurality of training samples. The method also includes quantizing the trained ASR model to an integer target fixed-bit width. The quantized trained ASR model includes a plurality of weights. Each weight of the plurality of weights includes an integer with the target fixed-bit width. The method includes providing the quantized trained ASR model to a user device.
Type: Application
Filed: March 20, 2023
Publication date: September 21, 2023
Applicant: Google LLC
Inventors: Shaojin Ding, Oleg Rybakov, Phoenix Meadowlark, Shivani Agrawal, Yanzhang He, Lukasz Lew
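A minimal sketch of quantization aware training's fake-quantization pass followed by the final integer conversion (hypothetical NumPy code; the patent's native-integer training details are not reproduced here):

```python
import numpy as np

def fake_quantize(weights: np.ndarray, bit_width: int = 8) -> np.ndarray:
    """Quantize-dequantize used in the forward pass during quantization aware training."""
    qmax = 2 ** (bit_width - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    return np.clip(np.round(weights / scale), -qmax, qmax) * scale

def to_fixed_bit_integers(weights: np.ndarray, bit_width: int = 8):
    """Final conversion of trained float weights to integers of the target fixed-bit width."""
    qmax = 2 ** (bit_width - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    return np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32), scale

w = np.random.randn(4, 4).astype(np.float32)
w_qat = fake_quantize(w)               # what the model "sees" during training
q, scale = to_fixed_bit_integers(w)    # integer weights provided to the user device
```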
-
Publication number: 20230298591
Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
Type: Application
Filed: March 17, 2023
Publication date: September 21, 2023
Applicant: Google LLC
Inventors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
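The FiLM step reduces to an element-wise affine transformation of the reference speaker embedding; a hypothetical NumPy sketch (the projection matrices and the final scoring function are assumptions for illustration, not the patented classifier):

```python
import numpy as np

def film_score(reference_emb: np.ndarray, target_emb: np.ndarray,
               W_scale: np.ndarray, W_shift: np.ndarray) -> float:
    """Score whether the utterance behind reference_emb was spoken by the target speaker."""
    gamma = W_scale @ target_emb                  # FiLM scaling vector
    beta = W_shift @ target_emb                   # FiLM shifting vector
    transformed = gamma * reference_emb + beta    # affine transformation output
    # Stand-in classifier: cosine similarity squashed into (0, 1).
    cos = transformed @ reference_emb / (
        np.linalg.norm(transformed) * np.linalg.norm(reference_emb) + 1e-8)
    return float(1.0 / (1.0 + np.exp(-cos)))

d = 64
ref, tgt = np.random.randn(d), np.random.randn(d)
print(film_score(ref, tgt, 0.1 * np.random.randn(d, d), 0.1 * np.random.randn(d, d)))
```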