Patents by Inventor Gregory DIAMOS

Gregory DIAMOS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180247636
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Application
    Filed: January 29, 2018
    Publication date: August 30, 2018
    Applicant: Baidu USA LLC
    Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhabrata SENGUPTA, Mohammad SHOEYBI
  • Publication number: 20170169326
    Abstract: Systems and methods for a multi-core optimized Recurrent Neural Network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the Multi-Bulk-Synchronous-Parallel (MBSP) model for a given processor. The resulting family of network architectures, referred to as MBSP-RNNs, performs similarly to conventional RNNs having the same number of parameters, but is substantially more efficient when mapped onto a modern general-purpose processor. Due to the large gain in computational efficiency, for a fixed computational budget, MBSP-RNNs outperform RNNs at applications such as end-to-end speech recognition.
    Type: Application
    Filed: April 5, 2016
    Publication date: June 15, 2017
    Applicant: Baidu USA LLC
    Inventors: Gregory Diamos, Awni Hannun, Bryan Catanzaro, Dario Amodei, Erich Elsen, Jesse Engel, Shubhabrata Sengupta
  • Patent number: 9645802
    Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
    Type: Grant
    Filed: August 7, 2013
    Date of Patent: May 9, 2017
    Assignee: NVIDIA Corporation
    Inventors: Mojtaba Mehrara, Michael Garland, Gregory Diamos
  • Patent number: 9424038
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 23, 2016
    Assignee: NVIDIA Corporation
    Inventors: Gregory Diamos, Mojtaba Mehrara
  • Publication number: 20160171974
    Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary nor even the concept of a “phoneme” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained.
    Type: Application
    Filed: June 9, 2015
    Publication date: June 16, 2016
    Applicant: Baidu USA LLC
    Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng
  • Patent number: 9274792
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: March 1, 2016
    Assignee: NVIDIA Corporation
    Inventors: Gregory Diamos, Mojtaba Mehrara
  • Patent number: 9229717
    Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
    Type: Grant
    Filed: December 11, 2012
    Date of Patent: January 5, 2016
    Assignee: NVIDIA Corporation
    Inventors: Mojtaba Mehrara, Gregory Diamos
  • Publication number: 20150046684
    Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
    Type: Application
    Filed: August 7, 2013
    Publication date: February 12, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Mojtaba Mehrara, Michael Garland, Gregory Diamos
  • Publication number: 20140165049
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Gregory DIAMOS, Mojtaba MEHRARA
  • Publication number: 20140164745
    Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Mojtaba MEHRARA, Gregory DIAMOS