Patents by Inventor David Qiu

David Qiu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SPARSITY MASK LEARNING USING A TOP-K ESTIMATOR

Publication number: 20260134271

Abstract: A method includes obtaining a plurality of training samples each including a corresponding input and a corresponding ground-truth output. The method also includes obtaining a plurality of model weights and a plurality of mask weights of a machine learning (ML) model, determining a sparsity mask based on the plurality of mask weights and generating a plurality of masked model weights by applying the sparsity mask to the model weights. For each training sample, the method also includes processing, using the ML model based on the plurality of masked model weights, the corresponding input to generate a predicted output, and determining a corresponding loss based on the corresponding ground-truth output and the predicted output. The method also includes updating, based on the corresponding losses, the ML model by updating the plurality of model weights and the plurality of mask weights.

Type: Application

Filed: October 6, 2025

Publication date: May 14, 2026

Applicant: Google LLC

Inventors: Ganesh Jawahar, David Qiu, Shaojin Ding, Xingyu Cai, Antoine Jean Bruguier, Steven M. Hernandez, Shivani Agrawal, Yanzhang He
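
The forward pass described in the abstract above can be illustrated with a minimal sketch. It assumes a top-k selection rule (suggested by the title, not spelled out in the abstract), a toy linear model, and a squared-error loss. All function names and values are hypothetical, and the step that updates the model weights and mask weights is omitted because the abstract does not say how it is done.

```python
def top_k_mask(mask_weights, k):
    """Return a 0/1 mask that keeps the k largest mask weights (assumed rule)."""
    order = sorted(range(len(mask_weights)),
                   key=lambda i: mask_weights[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(mask_weights))]


def apply_mask(model_weights, mask):
    """Zero out the model weights that the mask does not keep."""
    return [w * m for w, m in zip(model_weights, mask)]


def predict(masked_weights, x):
    """Toy linear model standing in for the ML model (assumption)."""
    return sum(w * xi for w, xi in zip(masked_weights, x))


def squared_loss(predicted, target):
    """Loss between the predicted output and the ground-truth output."""
    return (predicted - target) ** 2


# Hypothetical values for illustration only.
model_weights = [0.5, -1.2, 0.3, 2.0]
mask_weights = [0.1, 0.9, -0.4, 0.7]

mask = top_k_mask(mask_weights, k=2)
masked_weights = apply_mask(model_weights, mask)
loss = squared_loss(predict(masked_weights, [1.0, 1.0, 1.0, 1.0]), target=1.0)
```

Here the two largest mask weights sit at positions 1 and 3, so only those model weights survive masking.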

Robustness aware norm decay for quantization aware training and generalization

Patent number: 12620388

Abstract: A method includes obtaining a plurality of training samples, determining a minimum integer fixed-bit width representing a maximum quantization of an automatic speech recognition (ASR) model, and training the ASR model on the plurality of training samples using a quantity of random noise. The ASR model includes a plurality of weights that each include a respective float value. The quantity of random noise is based on the minimum integer fixed-bit value. After training the ASR model, the method also includes selecting a target integer fixed-bit width greater than or equal to the minimum integer fixed-bit width, and for each respective weight of the plurality of weights, quantizing the respective weight from the respective float value to a respective integer associated with a value of the selected target integer fixed-bit width. The operations also include providing the quantized trained ASR model to a user device.

Type: Grant

Filed: April 10, 2024

Date of Patent: May 5, 2026

Assignee: Google LLC

Inventors: David Qiu, David Rim, Shaojin Ding, Yanzhang He
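
As a rough illustration of the two stages in the abstract above (noise during training sized by the minimum bit width, then quantization at a target bit width), the following sketch assumes symmetric uniform quantization and uniform noise of at most half a quantization step. Neither choice is stated in the abstract, and all names and values are hypothetical.

```python
import random


def quant_scale(weights, bits):
    """Step size of a symmetric uniform quantizer (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    return max_abs / qmax if max_abs > 0 else 1.0


def add_training_noise(weights, min_bits, rng):
    """Perturb each weight by up to half a step at the minimum bit width."""
    step = quant_scale(weights, min_bits)
    return [w + rng.uniform(-step / 2, step / 2) for w in weights]


def quantize(weights, bits):
    """Map float weights to integers representable with the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = quant_scale(weights, bits)
    ints = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return ints, scale


# Hypothetical values for illustration only.
weights = [0.4, -1.0, 0.25, 0.75]
min_bits, target_bits = 4, 8  # the target width is at least the minimum width

noisy_weights = add_training_noise(weights, min_bits, random.Random(0))
int_weights, scale = quantize(weights, target_bits)
```

Multiplying each integer by `scale` recovers an approximation of the original float weight, with error bounded by half a step at the target width.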

Flickering reduction with partial hypothesis re-ranking for streaming ASR

Patent number: 12620390

Abstract: A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the recognizer, a second portion of the data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription.

Type: Grant

Filed: July 13, 2023

Date of Patent: May 5, 2026

Assignee: Google LLC

Inventors: Antoine Jean Bruguier, David Qiu, Yanzhang He, Trevor Strohman
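
The re-ranking step in the abstract above can be sketched as follows. The sketch assumes that "shares a prefix" means the hypothesis begins with the previous partial transcription and that the re-ranked score is the recognition score plus a fixed bonus. Both are illustrative assumptions, and the function names, bonus value, and hypotheses are hypothetical.

```python
def shares_prefix(hypothesis, previous_partial):
    """True if the hypothesis starts with the previously shown partial result."""
    return hypothesis[:len(previous_partial)] == previous_partial


def rerank(hypotheses, previous_partial, prefix_bonus=1.0):
    """Add a bonus to prefix-sharing hypotheses, then sort best first."""
    rescored = [
        (tokens,
         score + (prefix_bonus if shares_prefix(tokens, previous_partial) else 0.0))
        for tokens, score in hypotheses
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical hypotheses with log-domain recognition scores.
previous_partial = ["play", "some"]
hypotheses = [
    (["place", "sum", "music"], -1.0),
    (["play", "some", "music"], -1.5),
]

best_tokens, best_score = rerank(hypotheses, previous_partial)[0]
```

Without the bonus the first hypothesis would win and the displayed text would change from "play some" to "place sum music"; with the bonus, the hypothesis that extends the earlier partial transcription ranks first.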

Multi-task learning for end-to-end automated speech recognition confidence and deletion estimation

Patent number: 12456460

Abstract: A method including receiving a speech recognition result corresponding to a transcription of an utterance spoken by a user. For each sub-word unit in a sequence of hypothesized sub-word units of the speech recognition result, using a confidence estimation module to: obtain a respective confidence embedding associated with the corresponding output step when the corresponding sub-word unit was output from the first speech recognizer; generate a confidence feature vector; generate an acoustic context vector; and generate a respective confidence output score for the corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the confidence estimation module. The method also includes determining, based on the respective confidence output score generated for each sub-word unit in the sequence of hypothesized sub-word units, an utterance-level confidence score for the transcription of the utterance.

Type: Grant

Filed: December 11, 2021

Date of Patent: October 28, 2025

Assignee: Google LLC

Inventors: David Qiu, Yanzhang He, Yu Zhang, Qiujia Li, Liangliang Cao, Ian McGraw

Context-aware neural confidence estimation for rare word speech recognition

Patent number: 12424206

Abstract: An automatic speech recognition (ASR) system that includes an ASR model, a neural associative memory (NAM) biasing model, and a confidence estimation model (CEM). The ASR model includes an audio encoder configured to encode a sequence of audio frames characterizing a spoken utterance into a sequence of higher-order feature representations, and a decoder configured to receive the sequence of higher-order feature representations and output a final speech recognition result. The NAM biasing model is configured to receive biasing contextual information and modify the sequence of higher-order feature representations based on the biasing contextual information to generate, as output, biasing context vectors. The CEM is configured to compute a confidence of the final speech recognition result output by the decoder. The CEM is connected to the biasing context vectors generated by the NAM biasing model.

Type: Grant

Filed: June 23, 2023

Date of Patent: September 23, 2025

Assignee: Google LLC

Inventors: David Qiu, Tsendsuren Munkhdalai, Yanzhang He, Khe Chai Sim

QUANTIZATION AND SPARSITY AWARE FINE-TUNING FOR SPEECH RECOGNITION WITH UNIVERSAL SPEECH MODELS

Publication number: 20250078815

Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.

Type: Application

Filed: September 5, 2024

Publication date: March 6, 2025

Applicant: Google LLC

Inventors: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov

Robustness Aware Norm Decay for Quantization Aware Training and Generalization

Publication number: 20240347043

Abstract: A method includes obtaining a plurality of training samples, determining a minimum integer fixed-bit width representing a maximum quantization of an automatic speech recognition (ASR) model, and training the ASR model on the plurality of training samples using a quantity of random noise. The ASR model includes a plurality of weights that each include a respective float value. The quantity of random noise is based on the minimum integer fixed-bit value. After training the ASR model, the method also includes selecting a target integer fixed-bit width greater than or equal to the minimum integer fixed-bit width, and for each respective weight of the plurality of weights, quantizing the respective weight from the respective float value to a respective integer associated with a value of the selected target integer fixed-bit width. The operations also include providing the quantized trained ASR model to a user device.

Type: Application

Filed: April 10, 2024

Publication date: October 17, 2024

Applicant: Google LLC

Inventors: David Qiu, David Rim, Shaojin Ding, Yanzhang He

Context-aware Neural Confidence Estimation for Rare Word Speech Recognition

Publication number: 20240029720

Abstract: An automatic speech recognition (ASR) system that includes an ASR model, a neural associative memory (NAM) biasing model, and a confidence estimation model (CEM). The ASR model includes an audio encoder configured to encode a sequence of audio frames characterizing a spoken utterance into a sequence of higher-order feature representations, and a decoder configured to receive the sequence of higher-order feature representations and output a final speech recognition result. The NAM biasing model is configured to receive biasing contextual information and modify the sequence of higher-order feature representations based on the biasing contextual information to generate, as output, biasing context vectors. The CEM is configured to compute a confidence of the final speech recognition result output by the decoder. The CEM is connected to the biasing context vectors generated by the NAM biasing model.

Type: Application

Filed: June 23, 2023

Publication date: January 25, 2024

Inventors: David Qiu, Tsendsuren Munkhdalai, Yanzhang He, Khe Chai Sim

Flickering Reduction with Partial Hypothesis Re-ranking for Streaming ASR

Publication number: 20240029718

Abstract: A method includes processing, using a speech recognizer, a first portion of audio data to generate a first lattice, and generating a first partial transcription for an utterance based on the first lattice. The method includes processing, using the recognizer, a second portion of the data to generate, based on the first lattice, a second lattice representing a plurality of partial speech recognition hypotheses for the utterance and a plurality of corresponding speech recognition scores. For each particular partial speech recognition hypothesis, the method includes generating a corresponding re-ranked score based on the corresponding speech recognition score and whether the particular partial speech recognition hypothesis shares a prefix with the first partial transcription.

Type: Application

Filed: July 13, 2023

Publication date: January 25, 2024

Applicant: Google LLC

Inventors: Antoine Jean Bruguier, David Qiu, Yanzhang He, Trevor Strohman

Learning word-level confidence for subword end-to-end automatic speech recognition

Patent number: 11610586

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

Type: Grant

Filed: February 23, 2021

Date of Patent: March 21, 2023

Assignee: Google LLC

Inventors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara Sainath, Ian McGraw
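
The last two steps of the abstract above (word-level scores from sub-word scores, then an utterance-level score by aggregation) can be sketched as follows. The abstract does not say how a word's score is derived or how scores are aggregated, so this sketch assumes that a word takes the confidence of its final sub-word unit and that the utterance score is the product of the word scores. The "_" word-start marker, function names, and values are hypothetical.

```python
import math


def word_confidences(subwords, scores, word_start="_"):
    """Group sub-word units into words; each word keeps its last unit's score."""
    words = []
    for piece, score in zip(subwords, scores):
        if piece.startswith(word_start) or not words:
            text = piece[len(word_start):] if piece.startswith(word_start) else piece
            words.append([text, score])
        else:
            words[-1][0] += piece
            words[-1][1] = score  # the final sub-word's score wins (assumption)
    return [(text, score) for text, score in words]


def utterance_confidence(word_scores):
    """Aggregate word-level scores by taking their product (assumption)."""
    return math.prod(score for _, score in word_scores)


# Hypothetical sub-word hypothesis and confidence output scores.
subwords = ["_good", "_mor", "ning"]
scores = [0.9, 0.6, 0.8]

words = word_confidences(subwords, scores)
utterance_score = utterance_confidence(words)
```

With these inputs, "morning" is scored by its final unit "ning" rather than by "_mor", so a low score on an intermediate sub-word does not by itself lower the word-level confidence.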

Multi-Task Learning for End-To-End Automated Speech Recognition Confidence and Deletion Estimation

Publication number: 20220310080

Abstract: A method including receiving a speech recognition result corresponding to a transcription of an utterance spoken by a user. For each sub-word unit in a sequence of hypothesized sub-word units of the speech recognition result, using a confidence estimation module to: obtain a respective confidence embedding associated with the corresponding output step when the corresponding sub-word unit was output from the first speech recognizer; generate a confidence feature vector; generate an acoustic context vector; and generate a respective confidence output score for the corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the confidence estimation module. The method also includes determining, based on the respective confidence output score generated for each sub-word unit in the sequence of hypothesized sub-word units, an utterance-level confidence score for the transcription of the utterance.

Type: Application

Filed: December 11, 2021

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: David Qiu, Yanzhang He, Yu Zhang, Qiujia Li, Liangliang Cao, Ian McGraw

Learning Word-Level Confidence for Subword End-To-End Automatic Speech Recognition

Publication number: 20220270597

Abstract: A method includes receiving a speech recognition result, and using a confidence estimation module (CEM), for each sub-word unit in a sequence of hypothesized sub-word units for the speech recognition result: obtaining a respective confidence embedding that represents a set of confidence features; generating, using a first attention mechanism, a confidence feature vector; generating, using a second attention mechanism, an acoustic context vector; and generating, as output from an output layer of the CEM, a respective confidence output score for each corresponding sub-word unit based on the confidence feature vector and the acoustic feature vector received as input by the output layer of the CEM. For each of the one or more words formed by the sequence of hypothesized sub-word units, the method also includes determining a respective word-level confidence score for the word. The method also includes determining an utterance-level confidence score by aggregating the word-level confidence scores.

Type: Application

Filed: February 23, 2021

Publication date: August 25, 2022

Applicant: Google LLC

Inventors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara Sainath, Ian McGraw