Patents by Inventor Sanjeev Satheesh
Sanjeev Satheesh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11790270
Abstract: A process and a system for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Grant
Filed: October 13, 2021
Date of Patent: October 17, 2023
Assignee: Landing AI
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
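The agreement check at the core of this process can be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation; the `threshold` parameter and both function names are assumptions for the sketch.

```python
from collections import Counter

def consistent_label(labels, threshold=0.8):
    """Return the majority label if enough classifiers agree, else None.

    `labels` holds the tags independent human classifiers gave one image;
    `threshold` is the required agreement fraction (an assumed parameter).
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= threshold else None

def grow_training_data(training_data, candidates, threshold=0.8):
    """Add candidate images whose classifier labels are consistent."""
    for image, labels in candidates.items():
        label = consistent_label(labels, threshold)
        if label is not None:
            training_data[image] = label
    return training_data
```

Images whose labels fail the check would be held back for re-review against the visual guide rather than added to the training set.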
-
Patent number: 11620986
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: October 1, 2020
Date of Patent: April 4, 2023
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
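The fusion step described above can be sketched as a fine-grained gate over projected language-model outputs, in the spirit of the Cold Fusion idea: project the LM's logits, compute a per-dimension gate from the decoder state and the projection, then concatenate the gated LM features onto the decoder state. The weight names in `params` and the exact shapes are assumptions for this sketch, not the claimed architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cold_fusion_state(decoder_state, lm_logits, params):
    """Fuse a Seq2Seq decoder state with a pre-trained LM's output.

    `params["W_lm"]` projects the LM logits; `params["W_gate"]` produces a
    per-dimension gate deciding how much LM information to pass through.
    """
    h_lm = lm_logits @ params["W_lm"]                      # project LM logits
    concat = np.concatenate([decoder_state, h_lm], axis=-1)
    gate = sigmoid(concat @ params["W_gate"])              # fine-grained gate
    return np.concatenate([decoder_state, gate * h_lm], axis=-1)
```

The fused state would then feed the decoder's output layer in place of the plain decoder state.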
-
Patent number: 11562733
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Grant
Filed: August 15, 2019
Date of Patent: January 24, 2023
Assignee: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
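One common data-synthesis technique consistent with the abstract is additive-noise superposition at a controlled signal-to-noise ratio: mix a noise clip into clean speech, scaled so the result hits a target SNR. The sketch below is a generic illustration of that idea, not the patented pipeline.

```python
import numpy as np

def synthesize_noisy(clean, noise, snr_db):
    """Superimpose a noise clip on clean speech at a target SNR (in dB).

    A generic additive-noise augmentation sketch; the patent's actual
    data-synthesis techniques are more involved than this illustration.
    """
    # Loop the noise clip so it covers the clean signal, then trim.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Sweeping `snr_db` over many noise sources yields a large amount of varied training data from a modest set of clean recordings.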
-
Publication number: 20220277171
Abstract: Systems and methods are disclosed herein for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Application
Filed: October 13, 2021
Publication date: September 1, 2022
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Patent number: 11348236
Abstract: A processor receives an image of a syringe. After identifying a background and foreground of the image, where the foreground indicates pixels that may be associated with a defect, the processor subtracts the background to generate an updated image with an accentuated foreground. The processor applies a bounding box to a group of pixels in the foreground and inputs the bounding box into a classifier. The classifier outputs a label indicating whether the syringe is defective.
Type: Grant
Filed: April 10, 2020
Date of Patent: May 31, 2022
Assignee: Landing AI
Inventors: Wei Fu, Rahul Devraj Solanki, Mark William Sabini, Yuanzhe Dong, Hao Sheng, Gopi Prashanth Gopal, Ankur Rawat, Sanjeev Satheesh
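The background-subtraction and bounding-box steps can be sketched as below; this is a minimal illustration of the described pipeline, and the `threshold` value is an assumption.

```python
import numpy as np

def defect_bounding_box(image, background, threshold=0.1):
    """Subtract a known background and box the remaining foreground pixels.

    Pixels differing from the background by more than `threshold` are
    treated as foreground. Returns the tightest axis-aligned box around
    them as (row_min, row_max, col_min, col_max), or None if no pixel
    stands out. The resulting crop would then be fed to a classifier.
    """
    foreground = np.abs(image - background) > threshold
    rows, cols = np.nonzero(foreground)
    if rows.size == 0:
        return None
    return (rows.min(), rows.max(), cols.min(), cols.max())
```

In the patented system the cropped box, rather than the whole image, is what the defect classifier sees, which concentrates the model's capacity on the suspicious region.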
-
Patent number: 11182646
Abstract: A process and a system for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Grant
Filed: October 30, 2019
Date of Patent: November 23, 2021
Assignee: LANDING AI
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Publication number: 20210192723
Abstract: A processor receives an image of a syringe. After identifying a background and foreground of the image, where the foreground indicates pixels that may be associated with a defect, the processor subtracts the background to generate an updated image with an accentuated foreground. The processor applies a bounding box to a group of pixels in the foreground and inputs the bounding box into a classifier. The classifier outputs a label indicating whether the syringe is defective.
Type: Application
Filed: April 10, 2020
Publication date: June 24, 2021
Inventors: Wei Fu, Rahul Devraj Solanki, Mark William Sabini, Yuanzhe Dong, Hao Sheng, Gopi Prashanth Gopal, Ankur Rawat, Sanjeev Satheesh
-
Patent number: 10971142
Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, and even where augmentation of audio data is not possible.
Type: Grant
Filed: October 8, 2018
Date of Patent: April 6, 2021
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Hee Woo Jun, Yashesh Gaur, Sanjeev Satheesh
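The Wasserstein objective over encoder embeddings can be sketched as a pair of losses: the critic tries to separate clean-audio embeddings from noisy-audio embeddings, while the encoder (playing the generator) tries to make them indistinguishable. This is a generic WGAN sketch under those assumptions, not the patented training procedure.

```python
import numpy as np

def critic_loss(critic, clean_emb, noisy_emb):
    """Wasserstein critic objective: push clean and noisy scores apart.

    Minimizing this increases E[critic(clean)] - E[critic(noisy)],
    the critic's estimate of the distance between the two embedding
    distributions.
    """
    return np.mean(critic(noisy_emb)) - np.mean(critic(clean_emb))

def encoder_loss(critic, noisy_emb):
    """Generator-side objective: make noisy embeddings score like clean."""
    return -np.mean(critic(noisy_emb))
```

Alternating these two updates drives the encoder toward embeddings that are invariant to noise, which is the robustness property the abstract describes.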
-
Publication number: 20210097337
Abstract: Systems and methods are disclosed herein for creating a visual guide for developing training data for a classification of images, where the training data includes images tagged with labels for the classification of the images. A processor may prompt a user to define a framework for the classification. For an initial set of images within the training data, qualified human classifiers are prompted to locate the images within the framework and to tag the images with labels. The processor determines whether the tagged images have consistent labels, and, if so, the processor adds images to the training data. The processor may add the images by providing a visual guide, the visual guide including tagged images arranged according to their locations within the framework and their labels, and prompting human classifiers to tag the additional images with labels for the classification, according to the visual guide.
Type: Application
Filed: October 30, 2019
Publication date: April 1, 2021
Inventors: Dongyan Wang, Gopi Prashanth Gopal, Andrew Yan-Tak Ng, Karthikeyan Thiruppathisamy Nathillvar, Rustam Hashimov, Pingyang He, Dillon Anthony Laird, Yiwen Rong, Alejandro Betancourt, Sanjeev Satheesh, Yu Qing Zhou
-
Publication number: 20210027767
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: October 1, 2020
Publication date: January 28, 2021
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Patent number: 10867595
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: March 6, 2018
Date of Patent: December 15, 2020
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
-
Patent number: 10657955
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
Type: Grant
Filed: January 30, 2018
Date of Patent: May 19, 2020
Assignee: Baidu USA LLC
Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
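The CTC objective mentioned above pairs with a simple greedy decoding rule (merge repeated per-frame labels, then drop blanks). The sketch below is standard CTC post-processing, not anything specific to this patent.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    """Collapse a per-frame CTC output into a character sequence.

    Standard CTC collapse: adjacent repeats of the same label are merged
    first, then blank symbols are removed, so "hh_e_ll_lo" -> "hello".
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)
```

The blank symbol is what lets CTC represent genuine double letters: "ll" survives only if a blank frame separates the two runs of "l".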
-
Patent number: 10540957
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Grant
Filed: June 9, 2015
Date of Patent: January 21, 2020
Assignee: BAIDU USA LLC
Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
-
Publication number: 20190371298
Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow for a large amount of varied data for training to be efficiently obtained.
Type: Application
Filed: August 15, 2019
Publication date: December 5, 2019
Applicant: BAIDU USA LLC
Inventors: Awni HANNUN, Carl CASE, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
-
Patent number: 10373610
Abstract: Described herein are systems and methods for automatic unit selection and target decomposition for sequence labelling. Embodiments include a new loss function called Gram-Connectionist Temporal Classification (CTC) loss that extends the popular CTC loss function criterion to alleviate prior limitations. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, embodiments of Gram-CTC allow a model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. It is also demonstrated that embodiments of Gram-CTC improve CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that systems that employ an embodiment of Gram-CTC can outperform the state of the art on a standard speech benchmark.
Type: Grant
Filed: September 7, 2017
Date of Patent: August 6, 2019
Assignee: Baidu USA LLC
Inventors: Hairong Liu, Zhenyao Zhu, Sanjeev Satheesh
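The decoding side of the variable-length-unit idea can be sketched as follows: the collapse rule is the same as plain CTC, but each non-blank unit may be a multi-character gram, so several characters can be emitted per time step. This is an illustrative sketch of that property, not the Gram-CTC loss itself (which also learns the gram set and decomposition during training).

```python
def gram_ctc_greedy_decode(frame_grams, blank="_"):
    """Collapse per-frame Gram-CTC outputs into text.

    Identical to the standard CTC collapse (merge repeats, drop blanks),
    except each unit may be a multi-character gram such as "th" or "ing".
    """
    out = []
    prev = None
    for gram in frame_grams:
        if gram != prev and gram != blank:
            out.append(gram)
        prev = gram
    return "".join(out)
```

Because one frame can emit a whole gram, the output sequence can be longer than it could be under character-level CTC with the same number of frames.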
-
Patent number: 10332509
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 25, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
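The batch dispatch idea can be sketched as grouping whatever requests have queued up into bounded-size GPU batches: under load, many requests share one forward pass, while a lone request is still served immediately. This is a simplified synchronous illustration of the concept, not the deployed system.

```python
from collections import deque

def dispatch_batches(requests, max_batch):
    """Group queued requests into GPU batches of at most `max_batch`.

    Each loop iteration drains up to `max_batch` pending requests into
    one batch, modeling a server that processes whatever has arrived
    since the previous batch finished.
    """
    queue = deque(requests)
    batches = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        batches.append(batch)
    return batches
```

The throughput benefit comes from amortizing a GPU kernel launch over the whole batch; the latency benefit comes from never waiting for a batch to fill before dispatching.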
-
Patent number: 10319374
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 11, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
-
Publication number: 20190130903
Abstract: Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, and even where augmentation of audio data is not possible.
Type: Application
Filed: October 8, 2018
Publication date: May 2, 2019
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Hee Woo JUN, Yashesh GAUR, Sanjeev SATHEESH
-
Publication number: 20180336884
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: March 6, 2018
Publication date: November 22, 2018
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Publication number: 20180247639
Abstract: Described herein are systems and methods for automatic unit selection and target decomposition for sequence labelling. Embodiments include a new loss function called Gram-Connectionist Temporal Classification (CTC) loss that extends the popular CTC loss function criterion to alleviate prior limitations. While preserving the advantages of CTC, Gram-CTC automatically learns the best set of basic units (grams), as well as the most suitable decomposition of target sequences. Unlike CTC, embodiments of Gram-CTC allow a model to output a variable number of characters at each time step, which enables the model to capture longer-term dependencies and improves computational efficiency. It is also demonstrated that embodiments of Gram-CTC improve CTC in terms of both performance and efficiency on the large vocabulary speech recognition task at multiple scales of data, and that systems that employ an embodiment of Gram-CTC can outperform the state of the art on a standard speech benchmark.
Type: Application
Filed: September 7, 2017
Publication date: August 30, 2018
Applicant: Baidu USA LLC
Inventors: Hairong Liu, Zhenyao Zhu, Sanjeev Satheesh