Patents by Inventor Yashesh GAUR

Yashesh GAUR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10971142
    Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, even where augmentation of audio data is not possible.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: April 6, 2021
    Assignee: Baidu USA LLC
    Inventors: Anuroop Sriram, Hee Woo Jun, Yashesh Gaur, Sanjeev Satheesh
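The Wasserstein GAN objective described in this abstract can be illustrated with a small numeric sketch. This is a hypothetical toy (random embeddings, a linear critic, illustrative names such as `critic_gap`), not the patented implementation: the critic is trained to maximize the gap E[f(clean)] − E[f(noisy)], while the encoder, acting as the GAN's generator, is trained to close that gap so noisy embeddings become indistinguishable from clean ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings: encoder outputs for clean audio and for noisy versions
# of the same audio (noise shifts the embedding).
clean_emb = rng.normal(0.0, 1.0, size=(64, 8))
noisy_emb = clean_emb + rng.normal(0.0, 0.5, size=(64, 8))

def critic(emb, w):
    """Linear critic: scores how 'clean-like' an embedding looks."""
    return emb @ w

w = rng.normal(size=8)
w /= np.linalg.norm(w)  # crude norm constraint standing in for Lipschitz control

# Wasserstein objective the critic maximizes: E[f(clean)] - E[f(noisy)].
critic_gap = critic(clean_emb, w).mean() - critic(noisy_emb, w).mean()

# The encoder (generator) is trained to close the gap, i.e. to raise the
# critic's score on noisy embeddings until the two sets are indistinguishable.
encoder_loss = -critic(noisy_emb, w).mean()
```

In a full training loop the two objectives alternate: several critic steps on `critic_gap`, then one encoder step on `encoder_loss`, alongside the usual seq-to-seq recognition loss.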
  • Publication number: 20210065683
    Abstract: Embodiments are associated with a speaker-independent attention-based encoder-decoder model that classifies output tokens based on input speech frames and is associated with a first output distribution, and a speaker-dependent attention-based encoder-decoder model that classifies output tokens based on input speech frames and is associated with a second output distribution. The speaker-dependent model is trained to classify output tokens based on input speech frames of a target speaker while simultaneously being trained to maintain a similarity between the first output distribution and the second output distribution. Automatic speech recognition is then performed on speech frames of the target speaker using the trained speaker-dependent model.
    Type: Application
    Filed: November 6, 2019
    Publication date: March 4, 2021
    Inventors: Zhong MENG, Yashesh GAUR, Jinyu LI, Yifan GONG
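The joint objective in this abstract (adapt to a target speaker while keeping the adapted model's output distribution similar to the speaker-independent one) resembles KL-divergence-regularized adaptation. A minimal numpy sketch with hypothetical logits and a hypothetical interpolation weight `rho`, not the patented training procedure:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
num_frames, num_tokens = 5, 10

# Frozen speaker-independent (SI) model vs. its adapted speaker-dependent (SD) copy.
si_logits = rng.normal(size=(num_frames, num_tokens))
sd_logits = si_logits + rng.normal(0.0, 0.1, size=(num_frames, num_tokens))
labels = rng.integers(0, num_tokens, size=num_frames)  # target-speaker labels

p_si = softmax(si_logits)
p_sd = softmax(sd_logits)

# Cross-entropy on the target speaker's data drives adaptation.
ce = -np.log(p_sd[np.arange(num_frames), labels]).mean()

# KL(SI || SD) keeps the adapted output distribution close to the SI one.
kl = (p_si * (np.log(p_si) - np.log(p_sd))).sum(axis=-1).mean()

rho = 0.5  # illustrative interpolation weight between the two terms
loss = (1 - rho) * ce + rho * kl
```

Minimizing `loss` trades off fitting the target speaker against drifting away from the speaker-independent distribution, which is what "maintain a similarity between the first output distribution and the second output distribution" amounts to in this reading.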
  • Patent number: 10657955
    Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. By optimizing the recurrent layers and training with alignment information, some unwanted bias induced by purely forward-only recurrences may be removed in a deployed model.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: May 19, 2020
    Assignee: Baidu USA LLC
    Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
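The CTC loss function named in the abstract can be evaluated with the standard forward algorithm. The sketch below is a generic reference implementation in numpy, not code from the patent; `ctc_loss` and its arguments are illustrative names:

```python
import numpy as np

def ctc_loss(log_probs, target, blank=0):
    """Negative log-likelihood of a (non-empty) `target` under CTC.
    log_probs: (T, V) per-frame log-probabilities over characters incl. blank."""
    # Interleave blanks: target [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for ch in target:
        ext += [ch, blank]
    S, T = len(ext), log_probs.shape[0]
    alpha = np.full((T, S), -np.inf)  # forward log-probabilities
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]                 # stay on the same label
            if s > 0:
                cands.append(alpha[t - 1, s - 1])     # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])     # skip a blank between labels
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # Valid endings: final label or the trailing blank.
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

# Uniform per-frame distribution over {blank, 'a'}: two frames, target "a".
lp = np.log(np.full((2, 2), 0.5))
nll = ctc_loss(lp, [1])  # -log(0.75): valid paths are a·a, a·blank, blank·a
```

Because CTC marginalizes over all frame-level alignments, the model can be trained directly on (spectrogram, character sequence) pairs without a frame alignment, which is what enables the end-to-end training the abstract describes.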
  • Publication number: 20190130903
    Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, even where augmentation of audio data is not possible.
    Type: Application
    Filed: October 8, 2018
    Publication date: May 2, 2019
    Applicant: Baidu USA LLC
    Inventors: Anuroop SRIRAM, Hee Woo JUN, Yashesh GAUR, Sanjeev SATHEESH
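A Wasserstein critic such as the one in this framework needs an approximate Lipschitz constraint; the original WGAN recipe enforces it by clipping the critic's weights after each update. A generic sketch, where the clip value `c=0.01` and the function name are illustrative and not taken from the patent:

```python
import numpy as np

def clip_weights(params, c=0.01):
    """Clip every critic weight to [-c, c] after a critic update, the
    original WGAN heuristic for keeping the critic roughly 1-Lipschitz."""
    return [np.clip(w, -c, c) for w in params]

rng = np.random.default_rng(2)
critic_params = [rng.normal(0.0, 1.0, size=(8, 4)),  # layer weights
                 rng.normal(0.0, 1.0, size=4)]        # biases
critic_params = clip_weights(critic_params)
```

Later WGAN variants replace clipping with a gradient penalty, but either way the constraint is what makes the critic's score gap behave like a Wasserstein distance between the clean and noisy embedding distributions.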
  • Publication number: 20180247643
    Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. By optimizing the recurrent layers and training with alignment information, some unwanted bias induced by purely forward-only recurrences may be removed in a deployed model.
    Type: Application
    Filed: January 30, 2018
    Publication date: August 30, 2018
    Applicant: Baidu USA LLC
    Inventors: Eric BATTENBERG, Rewon CHILD, Adam COATES, Christopher FOUGNER, Yashesh GAUR, Jiaji HUANG, Heewoo JUN, Ajay KANNAN, Markus KLIEGL, Atul KUMAR, Hairong LIU, Vinay RAO, Sanjeev SATHEESH, David SEETAPUN, Anuroop SRIRAM, Zhenyao ZHU
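The log spectrogram input features the abstract mentions can be computed with a short framing-plus-FFT routine. A generic numpy sketch; the frame length, hop size, and function name are illustrative choices, not parameters from the patent:

```python
import numpy as np

def log_spectrogram(audio, frame_len=256, hop=128, eps=1e-10):
    """Log-magnitude STFT features: frame, window, FFT, then log."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))       # (n_frames, frame_len//2 + 1)
    return np.log(spec + eps)                        # eps avoids log(0)

# One second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
feats = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

Each row of `feats` is one frame's log-magnitude spectrum; a time-by-frequency array like this is what the 2D-convolutional input layers of the model consume.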