Patents by Inventor Ryan Prenger

Ryan Prenger has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11562733
    Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: January 24, 2023
    Assignee: BAIDU USA LLC
    Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Eisen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
  • Publication number: 20200394994
    Abstract: Systems and methods to help synthesize a second audio signal based, at least in part, on one or more neural networks trained using one or more characteristics of a first audio signal. Systems and methods to train one or more neural networks to synthesize a second audio signal based, at least in part, on one or more characteristics of a first audio signal.
    Type: Application
    Filed: June 12, 2019
    Publication date: December 17, 2020
    Inventors: Ryan Prenger, Rafael Valle, Bryan Catanzaro
  • Patent number: 10540957
    Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. A phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained.
    Type: Grant
    Filed: June 9, 2015
    Date of Patent: January 21, 2020
    Assignee: BAIDU USA LLC
    Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
  • Patent number: 10540961
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by the large-scale state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers to utilize the structure in the data in time and frequency domains are combined with recurrent layers to utilize context for the entire processed frame. The effect of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ˜230 k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: January 21, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
  • Publication number: 20190371298
    Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. A phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Applicant: BAIDU USA LLC
    Inventors: Awni HANNUN, Carl CASE, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Ng
  • Patent number: 10332509
    Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
    Type: Grant
    Filed: November 21, 2016
    Date of Patent: June 25, 2019
    Assignee: Baidu USA, LLC
    Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
  • Patent number: 10319374
    Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
    Type: Grant
    Filed: November 21, 2016
    Date of Patent: June 11, 2019
    Assignee: Baidu USA, LLC
    Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
  • Publication number: 20180261213
    Abstract: Described herein are systems and methods for creating and using Convolutional Recurrent Neural Networks (CRNNs) for small-footprint keyword spotting (KWS) systems. Inspired by the large-scale state-of-the-art speech recognition systems, in embodiments, the strengths of convolutional layers to utilize the structure in the data in time and frequency domains are combined with recurrent layers to utilize context for the entire processed frame. The effect of architecture parameters were examined to determine preferred model embodiments given the performance versus model size tradeoff. Various training strategies are provided to improve performance. In embodiments, using only ˜230 k parameters and yielding acceptably low latency, a CRNN model embodiment demonstrated high accuracy and robust performance in a wide range of environments.
    Type: Application
    Filed: August 28, 2017
    Publication date: September 13, 2018
    Applicant: Baidu USA LLC
    Inventors: Sercan Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates
  • Publication number: 20170148431
    Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
    Type: Application
    Filed: November 21, 2016
    Publication date: May 25, 2017
    Applicant: Baidu USA LLC
    Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
  • Publication number: 20170148433
    Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
    Type: Application
    Filed: November 21, 2016
    Publication date: May 25, 2017
    Applicant: Baidu USA LLC
    Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
  • Patent number: 9451883
    Abstract: Decoding and reconstructing a subjective perceptual or cognitive experience is described. A first set of brain activity data produced in response to a first brain activity stimulus is acquired from a subject using a brain imaging device. An encoding model is used to convert the brain activity data into a corresponding set of predicted response values. A second set of brain activity data produced in response to a second brain activity stimulus is acquired from a subject and decoded using a decoding distribution derived from the encoding model, and the probability the second set of brain activity data corresponds to said predicted response values is determined. The second set of brain activity stimuli is then reconstructed based on the probability of correspondence between the second set of brain activity data and the predicted response values.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: September 27, 2016
    Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
    Inventors: Jack L. Gallant, Thomas Naselaris, Kendrick Kay, Ryan Prenger
  • Publication number: 20160171974
    Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. A phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained.
    Type: Application
    Filed: June 9, 2015
    Publication date: June 16, 2016
    Applicant: BAIDU USA LLC
    Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng