Patents by Inventor Mohammad SHOEYBI

Mohammad SHOEYBI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DIALOGUE SYSTEMS USING KNOWLEDGE BASES AND LANGUAGE MODELS FOR AUTOMOTIVE SYSTEMS AND APPLICATIONS

Publication number: 20240095460

Abstract: In various examples, systems and methods that use dialogue systems associated with various machine systems and applications are described. For instance, the systems and methods may receive text data representing speech, such as a question associated with a vehicle or other machine type. The systems and methods then use a retrieval system(s) to retrieve a question/answer pair(s) associated with the text data and/or contextual information associated with the text data. In some examples, the contextual information is associated with a knowledge base associated with or corresponding to the vehicle. The systems and methods then generate a prompt using the text data, the question/answer pair(s), and/or the contextual information. Additionally, the systems and methods determine, using a language model(s) and based at least on the prompt, an output associated with the text data. For instance, the output may include information that answers the question associated with the vehicle.

Type: Application

Filed: September 19, 2022

Publication date: March 21, 2024

Inventors: Peng Xu, Mostofa Patwary, Rajath Shetty, Niral Lalit Pathak, Ratin Kumar, Bryan Catanzaro, Mohammad Shoeybi
NEURAL NETWORK-BASED LANGUAGE RESTRICTION

Publication number: 20240095447

Abstract: Apparatuses, systems, and techniques are presented to identify and prevent generation of restricted content. In at least one embodiment, one or more neural networks are used to identify restricted content based only on the restricted content.

Type: Application

Filed: June 22, 2022

Publication date: March 21, 2024

Inventors: Wei Ping, Boxin Wang, Chaowei Xiao, Mohammad Shoeybi, Mostofa Patwary, Anima Anandkumar, Bryan Catanzaro
Real-time neural text-to-speech

Patent number: 11705107

Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.

Type: Grant

Filed: October 1, 2020

Date of Patent: July 18, 2023

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
Systems and methods for trace norm regularization and faster inference for embedded models

Patent number: 11556775

Abstract: Described herein are systems and methods for compressing and speeding up dense matrix multiplications as found, for examples, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, trace norm regularization technique embodiments were introduced and studied for training low rank factored versions of matrix multiplications. Compared to standard low rank training, the methods more consistently lead to good accuracy versus number of parameter trade-offs and can be used to speed-up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.

Type: Grant

Filed: October 3, 2018

Date of Patent: January 17, 2023

Assignee: Baidu USA LLC

Inventors: Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
VIDEO INTERPOLATION USING ONE OR MORE NEURAL NETWORKS

Publication number: 20210067735

Abstract: Apparatuses, systems, and techniques to enhance video. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
REAL-TIME NEURAL TEXT-TO-SPEECH

Publication number: 20210027762

Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.

Type: Application

Filed: October 1, 2020

Publication date: January 28, 2021

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhahrata SENGUPTA, Mohammad SHOEYBI
Systems and methods for real-time neural text-to-speech

Patent number: 10872598

Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.

Type: Grant

Filed: January 29, 2018

Date of Patent: December 22, 2020

Assignee: Baidu USA LLC

Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
SYSTEMS AND METHODS FOR TRACE NORM REGULARIZATION AND FASTER INFERENCE FOR EMBEDDED MODELS

Publication number: 20190122108

Abstract: Described herein are systems and methods for compressing and speeding up dense matrix multiplications as found, for examples, in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, trace norm regularization technique embodiments were introduced and studied for training low rank factored versions of matrix multiplications. Compared to standard low rank training, the methods more consistently lead to good accuracy versus number of parameter trade-offs and can be used to speed-up training of large models. Faster inference may be further enabled on ARM processors through kernels optimized for small batch sizes, resulting in speed ups over the currently used library. Beyond LVCSR, the techniques are also generally applicable to embedded neural networks with large fully connected or recurrent layers.

Type: Application

Filed: October 3, 2018

Publication date: April 25, 2019

Applicant: Baidu USA LLC

Inventors: Markus KLIEGL, Siddharth GOYAL, Kexin ZHAO, Kavya SRINET, Mohammad SHOEYBI
SYSTEMS AND METHODS FOR REAL-TIME NEURAL TEXT-TO-SPEECH

Publication number: 20180247636

Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.

Type: Application

Filed: January 29, 2018

Publication date: August 30, 2018

Applicant: Baidu USA LLC

Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhahrata SENGUPTA, Mohammad SHOEYBI