Patents by Inventor Sanjeev Satheesh
Sanjeev Satheesh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11790270
Abstract: A process and a system for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Grant
Filed: October 13, 2021
Date of Patent: October 17, 2023
Assignee: Landing AI
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
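The agreement check at the core of this process can be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation; the `threshold` parameter and both function names are assumptions for the sketch.

```python
from collections import Counter

def consistent_label(labels, threshold=0.8):
    """Return the majority label if enough classifiers agree, else None.

    `labels` holds the tags independent human classifiers gave one image;
    `threshold` is the required agreement fraction (an assumed parameter).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= threshold else None

def grow_training_data(training_data, candidates, threshold=0.8):
    """Add candidate images whose classifier labels are consistent."""
    for image, labels in candidates.items():
        label = consistent_label(labels, threshold)
        if label is not None:
            training_data[image] = label
    return training_data
```

Images whose labels fail the check would be held back for re-review against the visual guide rather than added to the training set.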
-
Patent number: 11620986
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: October 1, 2020
Date of Patent: April 4, 2023
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
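The fusion step described above can be sketched as a fine-grained gate over projected language-model outputs, in the spirit of the Cold Fusion idea: project the LM's logits, compute a per-dimension gate from the decoder state and the projection, then concatenate the gated LM features onto the decoder state. The weight names in `params` and the exact shapes are assumptions for this sketch, not the claimed architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cold_fusion_state(decoder_state, lm_logits, params):
    """Fuse a Seq2Seq decoder state with a pre-trained LM's output.

    `params["W_lm"]` projects the LM logits; `params["W_gate"]` produces a
    per-dimension gate deciding how much LM information to pass through.
    """
    h_lm = lm_logits @ params["W_lm"]                      # project LM logits
    concat = np.concatenate([decoder_state, h_lm], axis=-1)
    gate = sigmoid(concat @ params["W_gate"])              # fine-grained gate
    return np.concatenate([decoder_state, gate * h_lm], axis=-1)
```

The fused state would then feed the decoder's output layer in place of the plain decoder state.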
-
Patent number: 11562733
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Grant
Filed: August 15, 2019
Date of Patent: January 24, 2023
Assignee: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
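One common data-synthesis technique consistent with the abstract is additive-noise superposition at a controlled signal-to-noise ratio: mix a noise clip into clean speech, scaled so the result hits a target SNR. The sketch below is a generic illustration of that idea, not the patented pipeline.

```python
import numpy as np

def synthesize_noisy(clean, noise, snr_db):
    """Superimpose a noise clip on clean speech at a target SNR (in dB).

    A generic additive-noise augmentation sketch; the patent's actual
    data-synthesis techniques are more involved than this illustration.
    """
    # Loop the noise clip so it covers the clean signal, then trim.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Sweeping `snr_db` over many noise sources yields a large amount of varied training data from a modest set of clean recordings.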
-
Publication number: 20220277171
Abstract: Systems and methods are disclosed herein for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Application
Filed: October 13, 2021
Publication date: September 1, 2022
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Patent number: 11348236
Abstract: A processor receives an image of a syringe. After identifying a background and foreground of the image, where the foreground indicates pixels that may be associated with a defect, the processor subtracts the background to generate an updated image with an accentuated foreground. The processor applies a bounding box to a group of pixels in the foreground and inputs the bounding box into a classifier. The classifier outputs a label indicating whether the syringe is defective.
Type: Grant
Filed: April 10, 2020
Date of Patent: May 31, 2022
Assignee: Landing AI
Inventors: Wei Fu, Rahul Devraj Solanki, Mark William Sabini, Yuanzhe Dong, Hao Sheng, Gopi Prashanth Gopal, Ankur Rawat, Sanjeev Satheesh
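The background-subtraction and bounding-box steps can be sketched as below; this is a minimal illustration of the described pipeline, and the `threshold` value is an assumption.

```python
import numpy as np

def defect_bounding_box(image, background, threshold=0.1):
    """Subtract a known background and box the remaining foreground pixels.

    Pixels differing from the background by more than `threshold` are
    treated as foreground. Returns the tightest axis-aligned box around
    them as (row_min, row_max, col_min, col_max), or None if no pixel
    stands out. The resulting crop would then be fed to a classifier.
    """
    foreground = np.abs(image - background) > threshold
    rows, cols = np.nonzero(foreground)
    if rows.size == 0:
        return None
    return (rows.min(), rows.max(), cols.min(), cols.max())
```

In the patented system the cropped box, rather than the whole image, is what the defect classifier sees, which concentrates the model's capacity on the suspicious region.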
-
Patent number: 11182646
Abstract: A process and a system for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Grant
Filed: October 30, 2019
Date of Patent: November 23, 2021
Assignee: LANDING AI
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Publication number: 20210192723
Abstract: A processor receives an image of a syringe. After identifying a background and foreground of the image, where the foreground indicates pixels that may be associated with a defect, the processor subtracts the background to generate an updated image with an accentuated foreground. The processor applies a bounding box to a group of pixels in the foreground and inputs the bounding box into a classifier. The classifier outputs a label indicating whether the syringe is defective.
Type: Application
Filed: April 10, 2020
Publication date: June 24, 2021
Inventors: Wei Fu, Rahul Devraj Solanki, Mark William Sabini, Yuanzhe Dong, Hao Sheng, Gopi Prashanth Gopal, Ankur Rawat, Sanjeev Satheesh
-
Patent number: 10971142
Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, and even where augmentation of audio data is not possible.
Type: Grant
Filed: October 8, 2018
Date of Patent: April 6, 2021
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Hee Woo Jun, Yashesh Gaur, Sanjeev Satheesh
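The Wasserstein objective over encoder embeddings can be sketched as a pair of losses: the critic tries to separate clean-audio embeddings from noisy-audio embeddings, while the encoder (playing the generator) tries to make them indistinguishable. This is a generic WGAN sketch under those assumptions, not the patented training procedure.

```python
import numpy as np

def critic_loss(critic, clean_emb, noisy_emb):
    """Wasserstein critic objective: push clean and noisy scores apart.

    Minimizing this increases E[critic(clean)] - E[critic(noisy)],
    the critic's estimate of the distance between the two embedding
    distributions.
    """
    return np.mean(critic(noisy_emb)) - np.mean(critic(clean_emb))

def encoder_loss(critic, noisy_emb):
    """Generator-side objective: make noisy embeddings score like clean."""
    return -np.mean(critic(noisy_emb))
```

Alternating these two updates drives the encoder toward embeddings that are invariant to noise, which is the robustness property the abstract describes.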
-
Publication number: 20210097337
Abstract: Systems and methods are disclosed herein for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Application
Filed: October 30, 2019
Publication date: April 1, 2021
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Publication number: 20210027767
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: October 1, 2020
Publication date: January 28, 2021
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Patent number: 10867595
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: March 6, 2018
Date of Patent: December 15, 2020
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
-
Patent number: 10657955
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
Type: Grant
Filed: January 30, 2018
Date of Patent: May 19, 2020
Assignee: Baidu USA LLC
Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
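The CTC objective mentioned above pairs with a simple greedy decoding rule (merge repeated per-frame labels, then drop blanks). The sketch below is standard CTC post-processing, not anything specific to this patent.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse a per-frame CTC output into a character sequence.

    Standard CTC collapse: adjacent repeats of the same label are merged
    first, then blank symbols are removed, so "hh_e_ll_lo" -> "hello".
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)
```

The blank symbol is what lets CTC represent genuine double letters: "ll" survives only if a blank frame separates the two runs of "l".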
-
Patent number: 10540957
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Grant
Filed: June 9, 2015
Date of Patent: January 21, 2020
Assignee: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
-
Publication number: 20190371298
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Application
Filed: August 15, 2019
Publication date: December 5, 2019
Applicant: BAIDU USA LLC
Inventors: Awni HANNUN, Carl CASE, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
-
Patent number: 10373610
Abstract: Described herein are systems and methods for automatic unit selection and target decomposition for sequence labelling. Embodiments include a new loss function called Gram-Connectionist Temporal Classification (CTC) loss that extends the popular CTC loss function criterion to alleviate prior limitations. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, embodiments of Gram-CTC allow a model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. It is also demonstrated that embodiments of Gram-CTC improve CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that systems that employ an embodiment of Gram-CTC can outperform the state of the art on a standard speech benchmark.
Type: Grant
Filed: September 7, 2017
Date of Patent: August 6, 2019
Assignee: Baidu USA LLC
Inventors: Hairong Liu, Zhenyao Zhu, Sanjeev Satheesh
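The decoding side of the variable-length-unit idea can be sketched as follows: the collapse rule is the same as plain CTC, but each non-blank unit may be a multi-character gram, so several characters can be emitted per time step. This is an illustrative sketch of that property, not the Gram-CTC loss itself (which also learns the gram set and decomposition during training).

```python
def gram_ctc_greedy_decode(frame_grams, blank="_"):
    """Collapse per-frame Gram-CTC outputs into text.

    Identical to the standard CTC collapse (merge repeats, drop blanks),
    except each unit may be a multi-character gram such as "th" or "ing".
    """
    out = []
    prev = None
    for gram in frame_grams:
        if gram != prev and gram != blank:
            out.append(gram)
        prev = gram
    return "".join(out)
```

Because one frame can emit a whole gram, the output sequence can be longer than it could be under character-level CTC with the same number of frames.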
-
Patent number: 10332509
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 25, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
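The batch dispatch idea can be sketched as grouping whatever requests have queued up into bounded-size GPU batches: under load, many requests share one forward pass, while a lone request is still served immediately. This is a simplified synchronous illustration of the concept, not the deployed system.

```python
from collections import deque

def dispatch_batches(requests, max_batch):
    """Group queued requests into GPU batches of at most `max_batch`.

    Each loop iteration drains up to `max_batch` pending requests into
    one batch, modeling a server that processes whatever has arrived
    since the previous batch finished.
    """
    queue = deque(requests)
    batches = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        batches.append(batch)
    return batches
```

The throughput benefit comes from amortizing a GPU kernel launch over the whole batch; the latency benefit comes from never waiting for a batch to fill before dispatching.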
-
Patent number: 10319374
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 11, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
-
Publication number: 20190130903
Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, and even where augmentation of audio data is not possible.
Type: Application
Filed: October 8, 2018
Publication date: May 2, 2019
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Hee Woo JUN, Yashesh GAUR, Sanjeev SATHEESH
-
Publication number: 20180336884
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: March 6, 2018
Publication date: November 22, 2018
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Publication number: 20180247639
Abstract: Described herein are systems and methods for automatic unit selection and target decomposition for sequence labelling. Embodiments include a new loss function called Gram-Connectionist Temporal Classification (CTC) loss that extends the popular CTC loss function criterion to alleviate prior limitations. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, embodiments of Gram-CTC allow a model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. It is also demonstrated that embodiments of Gram-CTC improve CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that systems that employ an embodiment of Gram-CTC can outperform the state of the art on a standard speech benchmark.
Type: Application
Filed: September 7, 2017
Publication date: August 30, 2018
Applicant: Baidu USA LLC
Inventors: Hairong Liu, Zhenyao Zhu, Sanjeev Satheesh