Patents by Inventor Shaan Jagdeep Patrick Bijwadia

Shaan Jagdeep Patrick Bijwadia has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240304181
    Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains. (An illustrative sketch of the re-labeling step follows this entry.)
    Type: Application
    Filed: March 7, 2024
    Publication date: September 12, 2024
    Applicant: Google LLC
    Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
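
The re-labeling and parameter-sharing idea above can be pictured with a short sketch. This is a minimal illustration, not the patent's implementation: the `Sample` fields, the `<spk:...>` tag format, and the example domain names are all assumptions.

```python
# Hypothetical sketch of the re-labeling step: annotate each transcription
# with speaker tags, then pool samples from every domain into one training
# set so a single model can share parameters across domains.
from dataclasses import dataclass, replace

@dataclass
class Sample:
    audio: bytes       # audio data characterizing the utterance
    transcript: str    # corresponding transcription
    domain: str        # e.g. "voice_search", "video_captions" (assumed names)

def relabel(sample: Sample, segments: list[tuple[str, str]]) -> Sample:
    """Annotate the transcript with one speaker tag per segment.

    `segments` pairs a speaker type with the text that speaker spoke,
    e.g. [("user", "what time is it"), ("device", "it is noon")].
    """
    tagged = " ".join(f"<spk:{who}> {text}" for who, text in segments)
    return replace(sample, transcript=tagged)

# Pool re-labeled samples from multiple domains into one training set.
corpus = [
    relabel(Sample(b"...", "what time is it it is noon", "voice_search"),
            [("user", "what time is it"), ("device", "it is noon")]),
    relabel(Sample(b"...", "welcome back everyone", "video_captions"),
            [("speaker", "welcome back everyone")]),
]
for s in corpus:
    print(f"[{s.domain}] {s.transcript}")
```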
  • Publication number: 20240296840
    Abstract: A joint auxiliary task and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. The model also includes a multi-output HAT (hybrid autoregressive transducer) decoder to generate, at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses, and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. The model is trained by a JEIT (joint end-to-end and internal language model training) process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task. (A sketch of the two-headed decoder and the combined loss follows this entry.)
    Type: Application
    Filed: March 1, 2024
    Publication date: September 5, 2024
    Applicant: Google LLC
    Inventors: Shaan Jagdeep Patrick Bijwadia, Shuo-yiin Chang, Tara N. Sainath, Weiran Wang, Zhong Meng
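
As a rough picture of the multi-output decoder and the JEIT-style objective, here is a PyTorch sketch. The layer sizes, the binary auxiliary head, the stand-in cross-entropy losses, and the 0.3 mixing weight are assumptions for illustration; the patent's model uses a HAT decoder with transducer-style losses.

```python
# Minimal PyTorch sketch (assumed shapes and losses, not the patent's model):
# one decoder produces both token probabilities and a per-step indication of
# whether the step corresponds to an auxiliary token.
import torch
import torch.nn as nn

class MultiOutputDecoder(nn.Module):
    def __init__(self, hidden: int = 256, vocab: int = 128):
        super().__init__()
        self.token_head = nn.Linear(hidden, vocab)  # speech recognition hypotheses
        self.aux_head = nn.Linear(hidden, 2)        # auxiliary-token indicator

    def forward(self, h: torch.Tensor):
        return self.token_head(h), self.aux_head(h)

decoder = MultiOutputDecoder()
h = torch.randn(4, 10, 256)                 # (batch, output steps, features)
token_logits, aux_logits = decoder(h)

# Paired branch: audio-derived features against transcription targets that
# are annotated with ground-truth auxiliary tokens. Plain cross-entropy
# stands in for the real transducer loss here.
ce = nn.CrossEntropyLoss()
tokens = torch.randint(0, 128, (4, 10))
aux = torch.randint(0, 2, (4, 10))
paired_loss = ce(token_logits.reshape(-1, 128), tokens.reshape(-1)) \
            + ce(aux_logits.reshape(-1, 2), aux.reshape(-1))

# Unpaired branch: text-only utterances (no audio) train the decoder's
# internal language model; sketched here as the same heads on embedded text.
text_h = torch.randn(4, 10, 256)            # stand-in for embedded text
t_logits, a_logits = decoder(text_h)
unpaired_loss = ce(t_logits.reshape(-1, 128), tokens.reshape(-1)) \
              + ce(a_logits.reshape(-1, 2), aux.reshape(-1))

loss = paired_loss + 0.3 * unpaired_loss    # mixing weight is an assumption
loss.backward()
```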
  • Publication number: 20240029719
    Abstract: A single E2E (end-to-end) multitask model includes a speech recognition model and an endpointer model. The speech recognition model includes an audio encoder configured to encode a sequence of audio frames into corresponding higher-order feature representations, and a decoder configured to generate probability distributions over possible speech recognition hypotheses for the sequence of audio frames based on the higher-order feature representations. The endpointer model is configured to operate between a VAD (voice activity detection) mode and an EOQ (end-of-query) detection mode. During the VAD mode, the endpointer model receives input audio frames, and determines, for each input audio frame, whether the input audio frame includes speech. During the EOQ detection mode, the endpointer model receives latent representations for the sequence of audio frames output from the audio encoder, and determines, for each latent representation, whether the latent representation includes final silence. (A sketch of the dual-mode endpointer follows this entry.)
    Type: Application
    Filed: June 23, 2023
    Publication date: January 25, 2024
    Applicant: Google LLC
    Inventors: Shaan Jagdeep Patrick Bijwadia, Shuo-yiin Chang, Bo Li, Yanzhang He, Tara N. Sainath, Chao Zhang
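
A short PyTorch sketch of the dual-mode endpointer described above: VAD mode classifies raw input audio frames as speech versus non-speech, while EOQ mode classifies encoder latents as final silence or not. The dimensions, linear heads, and mode names are assumptions, not the patent's architecture.

```python
# Hypothetical dual-mode endpointer: one module, two classification heads,
# selected by mode. VAD consumes raw frames; EOQ consumes encoder latents.
import torch
import torch.nn as nn

class Endpointer(nn.Module):
    def __init__(self, frame_dim: int = 80, latent_dim: int = 256):
        super().__init__()
        self.vad_head = nn.Linear(frame_dim, 2)   # speech vs. non-speech
        self.eoq_head = nn.Linear(latent_dim, 2)  # final silence vs. not

    def forward(self, x: torch.Tensor, mode: str) -> torch.Tensor:
        if mode == "vad":   # x: raw input audio frames
            return self.vad_head(x).argmax(-1)
        if mode == "eoq":   # x: latent representations from the audio encoder
            return self.eoq_head(x).argmax(-1)
        raise ValueError(f"unknown mode: {mode}")

ep = Endpointer()
print(ep(torch.randn(1, 20, 80), mode="vad"))    # per-frame speech decision
print(ep(torch.randn(1, 20, 256), mode="eoq"))   # per-latent end-of-query decision
```

The multitask point is that the EOQ branch consumes the same encoder outputs the ASR decoder uses, so a single end-to-end model serves both recognition and endpointing.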
  • Publication number: 20230107493
    Abstract: A method includes receiving a sequence of input audio frames corresponding to an utterance captured by a user device, the utterance including a plurality of words. For each input audio frame, the method includes predicting, using a word boundary detection model configured to receive the sequence of input audio frames as input, whether the input audio frame is a word boundary. The method includes batching the input audio frames into a plurality of batches based on the input audio frames predicted as word boundaries, wherein each batch includes a corresponding plurality of batched input audio frames. For each of the plurality of batches, the method includes processing, using a speech recognition model, the corresponding plurality of batched input audio frames in parallel to generate a speech recognition result. (A sketch of the boundary-based batching follows this entry.)
    Type: Application
    Filed: September 21, 2022
    Publication date: April 6, 2023
    Applicant: Google LLC
    Inventors: Shaan Jagdeep Patrick Bijwadia, Tara N. Sainath, Jiahui Yu, Shuo-yiin Chang, Yanzhang He
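
The batching step lends itself to a short sketch. This is a minimal illustration under assumed inputs: `is_boundary` stands in for the word boundary detection model's per-frame predictions, and `recognize` stands in for the speech recognition model.

```python
# Split a stream of audio frames into batches at predicted word boundaries;
# each batch can then be handed to the recognizer in parallel.
from concurrent.futures import ThreadPoolExecutor

def batch_by_boundaries(frames, is_boundary):
    batches, current = [], []
    for frame, boundary in zip(frames, is_boundary):
        current.append(frame)
        if boundary:          # a frame predicted as a word boundary closes a batch
            batches.append(current)
            current = []
    if current:               # keep any trailing frames after the last boundary
        batches.append(current)
    return batches

frames = list(range(10))
is_boundary = [i in (3, 7) for i in frames]   # assumed detector output
batches = batch_by_boundaries(frames, is_boundary)
print(batches)                                # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

def recognize(batch):                         # stand-in for the ASR model
    return f"result for frames {batch[0]}-{batch[-1]}"

with ThreadPoolExecutor() as pool:            # process the batches in parallel
    print(list(pool.map(recognize, batches)))
```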