Patents by Inventor Rohan Badlani

Rohan Badlani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118286
    Abstract: In various examples, synthesizing speech in multiple languages in conversational AI systems and applications is described herein. Systems and methods are disclosed that use one or more models to synthesize speech from a first language spoken by a speaker to a second, target language selected by the speaker. In some examples, to perform the translation, the model(s) may disentangle one or more attributes associated with speech from speakers, such as speakers' identities, speakers' accents, and text associated with the speech. Additionally, the model(s) may allow for fine-grained control of additional attributes associated with output speech, such as one or more frequencies, one or more energies, and one or more phoneme durations. Furthermore, the model(s) may be configured to use the accent associated with the target language when generating text, such as when aligning text encodings with one or more phonemes.
    Type: Application
    Filed: October 9, 2023
    Publication date: April 10, 2025
    Inventors: Rohan Badlani, José Rafael Valle Gomes da Costa, Kevin Jonathan Shih, Bryan Catanzaro
  • Publication number: 20240038212
    Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing generative text-to-speech models. The techniques include identifying a mapping of speech characteristics (SC) on a target distribution of a latent variable using a non-linear transformation for at least a subset of the SC. Parameters of the non-linear transformation are determined using a neural network that approximates a statistics of the SC with a statistics predicted for the SC based on the identified mapping and the target distribution of the latent variable.
    Type: Application
    Filed: January 20, 2023
    Publication date: February 1, 2024
    Inventors: Kevin Shih, José Rafael Valle Gomes da Costa, Rohan Badlani, João Felipe Santos, Bryan Catanzaro
  • Patent number: 11869483
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: January 9, 2024
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230419947
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: August 15, 2023
    Publication date: December 28, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230402028
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: August 28, 2023
    Publication date: December 14, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Patent number: 11769481
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: September 26, 2023
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230110905
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: October 7, 2021
    Publication date: April 13, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230113950
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: October 7, 2021
    Publication date: April 13, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230035306
    Abstract: Apparatuses, systems, and techniques are presented to generate media content.
    Type: Application
    Filed: July 21, 2021
    Publication date: February 2, 2023
    Inventors: Ming-Yu Liu, Koki Nagano, Yeongho Seol, Jose Rafael Valle Gomes da Costa, Jaewoo Seo, Ting-Chun Wang, Arun Mallya, Sameh Khamis, Wei Ping, Rohan Badlani, Kevin Jonathan Shih, Bryan Catanzaro, Simon Yuen, Jan Kautz
  • Patent number: 11169786
    Abstract: Implementations are described herein for generating embeddings of source code using both the language and graph domains, and leveraging combinations of these semantically-rich and structurally-informative embeddings for various purposes. In various implementations, tokens of a source code snippet may be applied as input across a sequence-processing machine learning model to generate a plurality of token embeddings. A graph may also be generated based on the source code snippet. A joint representation may be generated based on the graph and the incorporated token embeddings. The joint representation generated from the source code snippet may be compared to one or more other joint representations generated from one or more other source code snippets to make a determination about the source code snippet.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: November 9, 2021
    Assignee: X DEVELOPMENT LLC
    Inventors: Rohan Badlani, Owen Lewis, Georgios Evangelopoulos, Olivia Hatalsky, Bin Ni
  • Publication number: 20210240453
    Abstract: Implementations are described herein for generating embeddings of source code using both the language and graph domains, and leveraging combinations of these semantically-rich and structurally-informative embeddings for various purposes. In various implementations, tokens of a source code snippet may be applied as input across a sequence-processing machine learning model to generate a plurality of token embeddings. A graph may also be generated based on the source code snippet. A joint representation may be generated based on the graph and the incorporated token embeddings. The joint representation generated from the source code snippet may be compared to one or more other joint representations generated from one or more other source code snippets to make a determination about the source code snippet.
    Type: Application
    Filed: February 4, 2020
    Publication date: August 5, 2021
    Inventors: Rohan Badlani, Owen Lewis, Georgios Evangelopoulos, Olivia Hatalsky, Bin Ni