Patents by Inventor Gregory DIAMOS

Gregory DIAMOS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEMS AND METHODS FOR REAL-TIME NEURAL TEXT-TO-SPEECH

Publication number: 20180247636

Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
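The segmentation embodiments are trained with Connectionist Temporal Classification (CTC) loss. The following is a minimal sketch of the standard CTC forward recursion in pure Python. It assumes that per-frame symbol probabilities are already given (in the described system they would come from the segmentation network) and that index 0 is the blank symbol. The function names are hypothetical, and this is not the patent's implementation.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def _safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(probs: List[List[float]], labels: List[int], blank: int = 0) -> float:
    """Return -log p(labels | probs) via the CTC forward (alpha) recursion.

    probs[t][k] is the probability of symbol k at frame t.
    labels is the target sequence without blanks (e.g. phoneme ids).
    """
    # Extended sequence with blanks interleaved: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    # alpha[s]: log-probability of all alignments of frames 0..t ending at ext[s].
    alpha = [NEG_INF] * S
    alpha[0] = _safe_log(probs[0][ext[0]])
    if S > 1:
        alpha[1] = _safe_log(probs[0][ext[1]])

    for t in range(1, len(probs)):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]                  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])      # advance by one position
            # Skip a blank only between two different non-blank labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = _logsumexp(terms) + _safe_log(probs[t][ext[s]])
        alpha = new

    # Valid alignments end on the final label or on the trailing blank.
    finals = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return -_logsumexp(finals)
```

Working in log space avoids underflow on long utterances. The result equals the negative log of the summed probability of every frame-level path that collapses (merge repeats, drop blanks) to `labels`.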

Type: Application

Filed: January 29, 2018

Publication date: August 30, 2018

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhabrata SENGUPTA, Mohammad SHOEYBI

SYSTEMS AND METHODS FOR A MULTI-CORE OPTIMIZED RECURRENT NEURAL NETWORK

Publication number: 20170169326

Abstract: Systems and methods for a multi-core optimized Recurrent Neural Network (RNN) architecture are disclosed. The various architectures effect communication and synchronization operations according to the Multi-Bulk-Synchronous-Parallel (MBSP) model for a given processor. The resulting family of network architectures, referred to as MBSP-RNNs, performs similarly to conventional RNNs having the same number of parameters but is substantially more efficient when mapped onto a modern general-purpose processor. Because of this large gain in computational efficiency, MBSP-RNNs outperform conventional RNNs under a fixed computational budget in applications such as end-to-end speech recognition.
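The abstract does not spell out the MBSP-RNN connectivity. As a loosely related illustration only, the sketch below assumes that hidden units are split into groups (one per core), that each group updates every step from its own previous state, and that cross-group recurrent weights are applied only every `sync_every` steps, so cores exchange state less often. All names are hypothetical, and this is a simplified stand-in, not the patented architecture.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def grouped_rnn(inputs: List[List[float]], w_in: List[Matrix], w_local: List[Matrix],
                w_cross: List[List[Optional[Matrix]]], sync_every: int) -> List[List[float]]:
    """Run a group-partitioned tanh RNN and return each group's final state.

    inputs[t]: input vector at step t (shared by all groups).
    w_in[g]: input weights of group g; w_local[g]: recurrent weights within group g.
    w_cross[g][h]: weights from group h's state into group g (diagonal unused);
    applied only when t % sync_every == 0, modelling infrequent inter-core exchange.
    """
    G = len(w_local)
    h = [[0.0] * len(w_local[g]) for g in range(G)]
    for t, x in enumerate(inputs):
        sync = t % sync_every == 0
        new_h = []
        for g in range(G):
            pre = [a + b for a, b in zip(matvec(w_in[g], x), matvec(w_local[g], h[g]))]
            if sync:
                for other in range(G):
                    if other != g:
                        pre = [a + b for a, b in zip(pre, matvec(w_cross[g][other], h[other]))]
            new_h.append([math.tanh(v) for v in pre])
        h = new_h  # all groups advance together (bulk-synchronous step)
    return h
```

With `sync_every=1` this reduces to an ordinary dense RNN whose recurrent matrix is assembled from the local and cross blocks; larger values trade some connectivity for less inter-core communication.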

Type: Application

Filed: April 5, 2016

Publication date: June 15, 2017

Applicant: Baidu USA LLC

Inventors: Gregory Diamos, Awni Hannun, Bryan Catanzaro, Dario Amodei, Erich Elsen, Jesse Engel, Shubhabrata Sengupta

Technique for grouping instructions into independent strands

Patent number: 9645802

Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
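A minimal sketch of the grouping step, assuming a straight-line instruction list with explicit register operands and a caller-supplied set of long-latency opcodes (for example global loads). Each instruction gets a strand index equal to the number of long-latency instructions on its longest dependence chain, so strand 0 never waits on a long-latency result. The `Instr` type, opcode names, and this numbering rule are illustrative assumptions, not the patented compiler's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Instr:
    name: str               # unique label, e.g. "i3"
    opcode: str             # e.g. "ld.global", "add"
    dst: str                # register written
    srcs: Tuple[str, ...]   # registers read


def assign_strands(prog: List[Instr], long_latency: Set[str]) -> Dict[str, int]:
    """Map each instruction name to a strand index.

    Only true (read-after-write) dependences are tracked. An instruction's
    strand is the maximum over its operand producers of the producer's strand,
    plus one when that producer is a long-latency instruction.
    """
    strand: Dict[str, int] = {}
    producer: Dict[str, Instr] = {}  # most recent writer of each register
    for ins in prog:
        level = 0
        for reg in ins.srcs:
            p = producer.get(reg)
            if p is not None:
                bump = 1 if p.opcode in long_latency else 0
                level = max(level, strand[p.name] + bump)
        strand[ins.name] = level
        producer[ins.dst] = ins
    return strand


def group_by_strand(prog: List[Instr], strand: Dict[str, int]) -> List[List[str]]:
    """Collect instruction names per strand, preserving program order."""
    groups: Dict[int, List[str]] = {}
    for ins in prog:
        groups.setdefault(strand[ins.name], []).append(ins.name)
    return [groups[k] for k in sorted(groups)]
```

In this model a thread could run strand 0, issue its loads, and give up its hardware resources until the strand-1 operands arrive.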

Type: Grant

Filed: August 7, 2013

Date of Patent: May 9, 2017

Assignee: NVIDIA Corporation

Inventors: Mojtaba Mehrara, Michael Garland, Gregory Diamos

Compiler-controlled region scheduling for SIMD execution of threads

Patent number: 9424038

Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
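A minimal sketch of the code the compiler appends at the end of a region: a predicate-mask update followed by conditional branches to the thread-frontier regions, sorted by execution priority. The frontier is supplied by the caller, smaller numbers are assumed to mean higher priority, and the mnemonics `update_masks` and `branch_if_any` are hypothetical pseudo-instructions rather than a real ISA.

```python
from typing import Dict, List, Set


def emit_region_epilogue(region: str, frontier: Set[str], priority: Dict[str, int]) -> List[str]:
    """Return pseudo-instructions appended at the end of `region`.

    `frontier` holds the regions where threads may be waiting once `region`
    finishes. The mask update records where each thread leaving `region` goes;
    the branches are tried in priority order, so at runtime control transfers to
    the highest-priority frontier region whose mask has any active thread.
    """
    code = [f"update_masks {region}"]
    for target in sorted(frontier, key=lambda name: priority[name]):
        code.append(f"branch_if_any mask[{target}], {target}")
    return code
```

For a diamond `entry -> then/else -> join`, the epilogue of `then` tries `else` before `join`, so divergent threads finish `else` before the warp reconverges at `join`.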

Type: Grant

Filed: December 10, 2012

Date of Patent: August 23, 2016

Assignee: NVIDIA Corporation

Inventors: Gregory Diamos, Mojtaba Mehrara

SYSTEMS AND METHODS FOR SPEECH TRANSCRIPTION

Publication number: 20160171974

Abstract: Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary nor even the concept of a “phoneme” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allow a large amount of varied training data to be obtained efficiently.
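The abstract mentions data synthesis without detail. One common way to synthesize noisy training audio is to add a noise clip to clean speech at a chosen signal-to-noise ratio (SNR). The sketch below assumes that approach, uses a hypothetical function name, and should not be read as the claimed technique.

```python
import math
from typing import List


def mix_at_snr(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    The noise clip is looped if it is shorter than the clean signal.
    SNR uses mean power: 10 * log10(P_clean / P_noise).
    """
    if not clean:
        return []
    if not noise:
        raise ValueError("noise must be non-empty")
    n = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in n) / len(n)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Solve p_clean / (gain**2 * p_noise) == 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * x for c, x in zip(clean, n)]
```

Drawing `snr_db` and the noise clip at random for each utterance yields many noisy variants of the same clean transcript.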

Type: Application

Filed: June 9, 2015

Publication date: June 16, 2016

Applicant: BAIDU USA LLC

Inventors: Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Gregory Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Adam Coates, Andrew Y. Ng

Compiler-controlled region scheduling for SIMD execution of threads

Patent number: 9274792

Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
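The runtime effect of priority-ordered regions can be simulated for one SIMD warp. This is a sketch under assumed conventions: regions are integers, a smaller integer means higher priority, and the hypothetical callback `next_region` gives each thread's branch target. At every step the warp executes the highest-priority region that has any waiting thread, with exactly those threads enabled.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def simulate_warp(num_threads: int, entry: int,
                  next_region: Callable[[int, int], Optional[int]]) -> List[Tuple[int, Set[int]]]:
    """Simulate priority-ordered SIMD execution of a single warp.

    next_region(region, tid) returns the region thread `tid` branches to after
    executing `region`, or None when the thread exits.
    Returns the trace of (region, active thread set) in execution order.
    """
    pending: Dict[int, Optional[int]] = {t: entry for t in range(num_threads)}
    trace: List[Tuple[int, Set[int]]] = []
    while any(r is not None for r in pending.values()):
        # Pick the highest-priority region with at least one waiting thread.
        region = min(r for r in pending.values() if r is not None)
        mask = {t for t, r in pending.items() if r == region}
        trace.append((region, mask))
        for t in mask:
            pending[t] = next_region(region, t)
    return trace
```

In a diamond where even threads take region 1 and odd threads take region 2, the trace runs 0 (all threads), 1 (evens), 2 (odds), then 3 with all threads reconverged.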

Type: Grant

Filed: December 10, 2012

Date of Patent: March 1, 2016

Assignee: NVIDIA Corporation

Inventors: Gregory Diamos, Mojtaba Mehrara

Register allocation for clustered multi-level register files

Patent number: 9229717

Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
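The abstract does not say how the owner cluster is chosen. The following is a minimal sketch. It assumes that the owner is the cluster accessing the virtual register most often within the live range (ties go to the lowest cluster id) and that instructions in other clusters are redirected to a global virtual register. The `g_` prefix and the heuristic are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def assign_owner(accesses: List[Tuple[str, int]], vreg: str) -> Tuple[int, Dict[str, str]]:
    """Pick an owner cluster for `vreg` and rewrite remote accesses.

    accesses: (instruction name, cluster id) for each instruction in the live
    range that reads or writes `vreg` (must be non-empty).
    Returns the owner cluster and a map from instruction name to the register it
    should use: `vreg` inside the owner cluster, a global register elsewhere.
    """
    counts = Counter(cluster for _, cluster in accesses)
    # Most accesses wins; break ties by the smaller cluster id.
    owner = min(counts, key=lambda c: (-counts[c], c))
    rewritten = {name: (vreg if cluster == owner else f"g_{vreg}") for name, cluster in accesses}
    return owner, rewritten
```

Keeping the value local to its heaviest user minimizes how many accesses must go through the slower, shared level of the register file.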

Type: Grant

Filed: December 11, 2012

Date of Patent: January 5, 2016

Assignee: NVIDIA Corporation

Inventors: Mojtaba Mehrara, Gregory Diamos

TECHNIQUE FOR GROUPING INSTRUCTIONS INTO INDEPENDENT STRANDS

Publication number: 20150046684

Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
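To see why releasing resources during a long-latency operation helps, consider a deliberately simplified timing model, assumed here for illustration. There is one execution slot. Thread A issues a long-latency instruction at cycle 0 and then has dependent work. Thread B has independent work. The function and parameter names are hypothetical.

```python
def completion_time(latency: int, a_tail: int, b_work: int, release_on_wait: bool) -> int:
    """Cycle at which both threads finish on a single, non-preemptive slot.

    Thread A's long-latency result is ready at cycle `latency`; A then needs
    `a_tail` cycles of dependent work. Thread B needs `b_work` independent cycles.
    If `release_on_wait` is True, A's slot is handed to B while A's result is in
    flight; otherwise A holds the slot through the stall and B runs afterward.
    """
    if release_on_wait:
        b_done = b_work                   # B runs from cycle 0 during A's stall
        a_start = max(latency, b_done)    # A resumes once its data and the slot are both available
        return a_start + a_tail
    return latency + a_tail + b_work
```

With `latency=100`, `a_tail=10`, and `b_work=50`, both threads finish at cycle 110 with release versus 160 without, because B's 50 cycles are hidden under A's stall.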

Type: Application

Filed: August 7, 2013

Publication date: February 12, 2015

Applicant: NVIDIA CORPORATION

Inventors: Mojtaba Mehrara, Michael Garland, Gregory Diamos

COMPILER-CONTROLLED REGION SCHEDULING FOR SIMD EXECUTION OF THREADS

Publication number: 20140165049

Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
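The thread frontier of each region must be known before the branches can be ordered. The following is a conservative sketch that holds only under stated assumptions. The control flow graph is acyclic, and its priorities (smaller numbers run earlier) follow a topological order. Under those assumptions, once a region finishes, threads can only be waiting at regions scheduled later that are one edge away from a region already executed. Loops would need a more careful analysis, and the function name is hypothetical.

```python
from typing import Dict, List, Set


def thread_frontiers(edges: Dict[str, List[str]], priority: Dict[str, int]) -> Dict[str, Set[str]]:
    """Conservatively compute, for each region, where threads may wait after it.

    edges[src] lists the regions `src` can branch to. The result for region b
    contains every region scheduled after b that is a successor of some region
    scheduled at or before b. Over-approximates when some edges are never taken.
    """
    frontiers: Dict[str, Set[str]] = {}
    for b, pb in priority.items():
        frontiers[b] = {
            dst
            for src, dsts in edges.items() if priority[src] <= pb
            for dst in dsts if priority[dst] > pb
        }
    return frontiers
```

For the diamond `entry -> then/else -> join`, the frontier of `then` is `{else, join}`: odd threads still wait at `else` while the even threads have moved on to `join`.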

Type: Application

Filed: December 10, 2012

Publication date: June 12, 2014

Applicant: NVIDIA CORPORATION

Inventors: Gregory DIAMOS, Mojtaba MEHRARA

REGISTER ALLOCATION FOR CLUSTERED MULTI-LEVEL REGISTER FILES

Publication number: 20140164745

Abstract: A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register.
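The allocation operates on a virtual register "within a live range." The following is a minimal sketch of computing live ranges for straight-line code. It assumes that each instruction is given as (register defined or None, registers used) and that a range runs from a definition to the last use before the next redefinition. Control flow and live-in values are ignored, which a real allocator could not do, and the names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[Optional[str], Tuple[str, ...]]  # (register defined, registers used)


def live_ranges(code: List[Instr]) -> List[Tuple[str, int, int]]:
    """Return (register, start index, end index) for each live range.

    A range starts at the defining instruction and ends at the last use before
    the register is redefined (or at the definition if the value is never used).
    Uses of registers with no prior definition are ignored.
    """
    open_ranges: Dict[str, List[int]] = {}  # register -> [start, end]
    done: List[Tuple[str, int, int]] = []
    for i, (dst, srcs) in enumerate(code):
        for r in srcs:                      # uses happen before the write
            if r in open_ranges:
                open_ranges[r][1] = i
        if dst is not None:
            if dst in open_ranges:          # redefinition closes the old range
                start, end = open_ranges.pop(dst)
                done.append((dst, start, end))
            open_ranges[dst] = [i, i]
    for r, (start, end) in open_ranges.items():
        done.append((r, start, end))
    return sorted(done, key=lambda rng: (rng[1], rng[0]))
```

Each resulting range is the unit to which an owner cluster would then be assigned.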

Type: Application

Filed: December 11, 2012

Publication date: June 12, 2014

Applicant: NVIDIA CORPORATION

Inventors: Mojtaba MEHRARA, Gregory DIAMOS
