Patents by Inventor Rohan Badlani

Rohan Badlani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYNTHESIZING SPEECH IN MULTIPLE LANGUAGES IN CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Publication number: 20250118286

Abstract: In various examples, synthesizing speech in multiple languages in conversational AI systems and applications is described herein. Systems and methods are disclosed that use one or more models to synthesize speech from a first language spoken by a speaker to a second, target language selected by the speaker. In some examples, to perform the translation, the model(s) may disentangle one or more attributes associated with speech from speakers, such as speakers' identities, speakers' accents, and text associated with the speech. Additionally, the model(s) may allow for fine-grained control of additional attributes associated with output speech, such as one or more frequencies, one or more energies, and one or more phoneme durations. Furthermore, the model(s) may be configured to use the accent associated with the target language when generating text, such as when aligning text encodings with one or more phonemes.

Type: Application

Filed: October 9, 2023

Publication date: April 10, 2025

Inventors: Rohan Badlani, José Rafael Valle Gomes da Costa, Kevin Jonathan Shih, Bryan Catanzaro

NORMALIZING FLOWS WITH NEURAL SPLINES FOR HIGH-QUALITY SPEECH SYNTHESIS

Publication number: 20240038212

Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing generative text-to-speech models. The techniques include identifying a mapping of speech characteristics (SC) on a target distribution of a latent variable using a non-linear transformation for at least a subset of the SC. Parameters of the non-linear transformation are determined using a neural network that approximates statistics of the SC with statistics predicted for the SC based on the identified mapping and the target distribution of the latent variable.
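
The abstract describes mapping speech characteristics onto a latent distribution through a spline-based non-linear transformation whose parameters come from a neural network. The sketch below illustrates only the general idea of a monotonic spline transform, under stated assumptions: it uses a simple piecewise-linear spline (not necessarily the spline family used in the patent), assumes the speech characteristic (for example a pitch value) has already been rescaled to [0, 1], and takes the unnormalized bin heights as hypothetical inputs standing in for the network's output. It does not show how the patent fits those parameters.

```python
import numpy as np


def _bin_heights(unnormalized_heights):
    # Softmax keeps every bin height positive and makes them sum to 1,
    # which guarantees a strictly increasing (hence invertible) map.
    v = np.asarray(unnormalized_heights, dtype=float)
    e = np.exp(v - v.max())
    return e / e.sum()


def piecewise_linear_spline(x, unnormalized_heights):
    """Map x in [0, 1] to z in [0, 1]; return (z, log|dz/dx|)."""
    heights = _bin_heights(unnormalized_heights)
    k = len(heights)
    cum = np.concatenate([[0.0], np.cumsum(heights)])
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    b = np.minimum((x * k).astype(int), k - 1)  # active bin (equal-width input bins)
    alpha = x * k - b                           # position inside the bin, in [0, 1]
    z = cum[b] + alpha * heights[b]
    log_det = np.log(heights[b] * k)            # slope of the active linear piece
    return z, log_det


def inverse_piecewise_linear_spline(z, unnormalized_heights):
    """Invert the map above: z in [0, 1] back to x in [0, 1]."""
    heights = _bin_heights(unnormalized_heights)
    k = len(heights)
    cum = np.concatenate([[0.0], np.cumsum(heights)])
    z = np.clip(np.asarray(z, dtype=float), 0.0, 1.0)
    b = np.clip(np.searchsorted(cum, z, side="right") - 1, 0, k - 1)
    return (b + (z - cum[b]) / heights[b]) / k
```

With a uniform base distribution on [0, 1], the log-likelihood of a feature value x is simply the returned log_det, so a network predicting the unnormalized heights could be trained to maximize it; a smoother spline family would give continuous derivatives at the bin edges.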

Type: Application

Filed: January 20, 2023

Publication date: February 1, 2024

Inventors: Kevin Shih, José Rafael Valle Gomes da Costa, Rohan Badlani, João Felipe Santos, Bryan Catanzaro

Unsupervised alignment for text to speech synthesis using neural networks

Patent number: 11869483

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
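
The abstract's key step is treating phoneme duration as something sampled at inference and then generating frames in parallel. The sketch below shows that step under loud assumptions: a per-phoneme log-normal distribution (with hypothetical predicted means and standard deviations) stands in for the patent's generative duration model, and sampled durations are applied by repeating each phoneme encoding once per output frame. Pitch or energy could be sampled with the same pattern.

```python
import numpy as np


def sample_durations(mean_log_dur, std_log_dur, rng, sigma=1.0):
    """Sample per-phoneme durations (in frames).

    Hypothetical stand-in: a log-normal per phoneme. sigma scales the sampling
    noise; sigma=0 returns the deterministic (median) durations.
    """
    mean_log_dur = np.asarray(mean_log_dur, dtype=float)
    eps = rng.standard_normal(len(mean_log_dur))
    log_dur = mean_log_dur + sigma * np.asarray(std_log_dur, dtype=float) * eps
    # Every phoneme must occupy at least one frame.
    return np.maximum(1, np.round(np.exp(log_dur))).astype(int)


def expand_to_frames(phoneme_encodings, durations):
    """Repeat each phoneme encoding 'duration' times so all frames can be decoded in parallel."""
    return np.repeat(phoneme_encodings, durations, axis=0)


rng = np.random.default_rng(0)
encodings = np.arange(12.0).reshape(3, 4)  # 3 phonemes, 4-dim hypothetical encodings
durations = sample_durations(np.log([2.0, 3.0, 4.0]), np.full(3, 0.3), rng)
frames = expand_to_frames(encodings, durations)  # shape: (durations.sum(), 4)
```

Calling sample_durations again with a different random state yields a different rhythm for the same text, which is the source of the diversity the abstract mentions.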

Type: Grant

Filed: October 7, 2021

Date of Patent: January 9, 2024

Assignee: Nvidia Corporation

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

UNSUPERVISED ALIGNMENT FOR TEXT TO SPEECH SYNTHESIS USING NEURAL NETWORKS

Publication number: 20230419947

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.

Type: Application

Filed: August 15, 2023

Publication date: December 28, 2023

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

UNSUPERVISED ALIGNMENT FOR TEXT TO SPEECH SYNTHESIS USING NEURAL NETWORKS

Publication number: 20230402028

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.

Type: Application

Filed: August 28, 2023

Publication date: December 14, 2023

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

Unsupervised alignment for text to speech synthesis using neural networks

Patent number: 11769481

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.

Type: Grant

Filed: October 7, 2021

Date of Patent: September 26, 2023

Assignee: Nvidia Corporation

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

UNSUPERVISED ALIGNMENT FOR TEXT TO SPEECH SYNTHESIS USING NEURAL NETWORKS

Publication number: 20230110905

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.

Type: Application

Filed: October 7, 2021

Publication date: April 13, 2023

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

UNSUPERVISED ALIGNMENT FOR TEXT TO SPEECH SYNTHESIS USING NEURAL NETWORKS

Publication number: 20230113950

Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.

Type: Application

Filed: October 7, 2021

Publication date: April 13, 2023

Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Publication number: 20230035306

Abstract: Apparatuses, systems, and techniques are presented to generate media content.

Type: Application

Filed: July 21, 2021

Publication date: February 2, 2023

Inventors: Ming-Yu Liu, Koki Nagano, Yeongho Seol, Jose Rafael Valle Gomes da Costa, Jaewoo Seo, Ting-Chun Wang, Arun Mallya, Sameh Khamis, Wei Ping, Rohan Badlani, Kevin Jonathan Shih, Bryan Catanzaro, Simon Yuen, Jan Kautz

Generating and using joint representations of source code

Patent number: 11169786

Abstract: Implementations are described herein for generating embeddings of source code using both the language and graph domains, and leveraging combinations of these semantically-rich and structurally-informative embeddings for various purposes. In various implementations, tokens of a source code snippet may be applied as input across a sequence-processing machine learning model to generate a plurality of token embeddings. A graph may also be generated based on the source code snippet. A joint representation may be generated based on the graph and the incorporated token embeddings. The joint representation generated from the source code snippet may be compared to one or more other joint representations generated from one or more other source code snippets to make a determination about the source code snippet.
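
The abstract outlines a pipeline: token embeddings from a sequence model, a graph built from the same snippet, a joint representation combining the two, and a comparison between snippets. The toy sketch below walks through that pipeline for Python snippets under explicit assumptions: a deterministic hash-seeded vector per string is a hypothetical stand-in for a trained (contextual) sequence model, the abstract syntax tree serves as the graph, token embeddings are attached to identifier nodes, one round of mean-neighbour message passing stands in for a learned graph encoder, and cosine similarity is the comparison.

```python
import ast
import hashlib
import io
import tokenize

import numpy as np

DIM = 16  # hypothetical embedding size


def embed(symbol: str) -> np.ndarray:
    """Deterministic pseudo-random vector per string (stand-in for a learned model)."""
    seed = int.from_bytes(hashlib.sha256(symbol.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(DIM)


def code_tokens(code: str) -> list:
    """Lexical tokens of a Python snippet, dropping whitespace-only tokens."""
    readline = io.StringIO(code).readline
    return [t.string for t in tokenize.generate_tokens(readline) if t.string.strip()]


def graph_embedding(code: str, token_vecs: dict) -> np.ndarray:
    """Embed the AST graph, incorporating token embeddings at identifier nodes."""
    nodes = [n for n in ast.walk(ast.parse(code)) if not isinstance(n, ast.expr_context)]
    index = {id(n): i for i, n in enumerate(nodes)}
    feats = np.array([embed(type(n).__name__) for n in nodes])
    neighbors = [[] for _ in nodes]
    for i, n in enumerate(nodes):
        if isinstance(n, ast.Name) and n.id in token_vecs:
            feats[i] += token_vecs[n.id]  # attach the identifier's token embedding
        for child in ast.iter_child_nodes(n):
            if id(child) in index:  # skips the Load/Store context markers
                j = index[id(child)]
                neighbors[i].append(j)
                neighbors[j].append(i)
    # One round of mean-neighbour message passing, then mean pooling over nodes.
    messages = np.array([feats[nb].mean(axis=0) if nb else np.zeros(DIM) for nb in neighbors])
    return (feats + messages).mean(axis=0)


def joint_representation(code: str) -> np.ndarray:
    """Concatenate a pooled token-sequence part with the graph part."""
    tokens = code_tokens(code)
    token_vecs = {tok: embed(tok) for tok in tokens}
    sequence_part = np.mean([token_vecs[t] for t in tokens], axis=0)
    return np.concatenate([sequence_part, graph_embedding(code, token_vecs)])


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because whitespace-only tokens are dropped and the AST ignores formatting, two snippets that differ only in spacing receive identical joint representations, while structurally different snippets generally score lower.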

Type: Grant

Filed: February 4, 2020

Date of Patent: November 9, 2021

Assignee: X DEVELOPMENT LLC

Inventors: Rohan Badlani, Owen Lewis, Georgios Evangelopoulos, Olivia Hatalsky, Bin Ni

GENERATING AND USING JOINT REPRESENTATIONS OF SOURCE CODE

Publication number: 20210240453

Abstract: Implementations are described herein for generating embeddings of source code using both the language and graph domains, and leveraging combinations of these semantically-rich and structurally-informative embeddings for various purposes. In various implementations, tokens of a source code snippet may be applied as input across a sequence-processing machine learning model to generate a plurality of token embeddings. A graph may also be generated based on the source code snippet. A joint representation may be generated based on the graph and the incorporated token embeddings. The joint representation generated from the source code snippet may be compared to one or more other joint representations generated from one or more other source code snippets to make a determination about the source code snippet.

Type: Application

Filed: February 4, 2020

Publication date: August 5, 2021

Inventors: Rohan Badlani, Owen Lewis, Georgios Evangelopoulos, Olivia Hatalsky, Bin Ni