Patents by Inventor Siddharth Gururani

Siddharth Gururani has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12367628
    Abstract: Apparatuses, systems, and techniques are presented to generate digital content. In at least one embodiment, one or more neural networks are used to generate video information based at least in part upon voice information and a combination of image features and facial landmarks corresponding to one or more images of a person.
    Type: Grant
    Filed: September 6, 2022
    Date of Patent: July 22, 2025
    Assignee: NVIDIA Corporation
    Inventors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Jose Rafael Valle da Costa, Ming-Yu Liu
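The abstract above describes generating video from voice information combined with image features and facial landmarks of a person. The following is a minimal, hypothetical sketch of that data flow only, not the patented architecture; all module names, dimensions, and the frame resolution are assumptions.

```python
# Hypothetical sketch (not the patented architecture): a generator that predicts
# video frames from voice features fused with image features and 2-D facial
# landmarks extracted from one or more reference images of a person.
import torch
import torch.nn as nn

class TalkingHeadSketch(nn.Module):
    def __init__(self, audio_dim=80, img_feat_dim=256, n_landmarks=68, hidden=512):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        # Fuse per-frame audio state with static image features and flattened landmarks.
        self.fuse = nn.Linear(hidden + img_feat_dim + n_landmarks * 2, hidden)
        self.to_frame = nn.Linear(hidden, 3 * 64 * 64)  # tiny 64x64 RGB frames for illustration

    def forward(self, audio, img_feat, landmarks):
        # audio: (B, T, audio_dim), img_feat: (B, img_feat_dim), landmarks: (B, n_landmarks, 2)
        h, _ = self.audio_enc(audio)                         # (B, T, hidden)
        cond = torch.cat([img_feat, landmarks.flatten(1)], dim=-1)
        cond = cond.unsqueeze(1).expand(-1, h.size(1), -1)   # repeat conditioning per frame
        fused = torch.relu(self.fuse(torch.cat([h, cond], dim=-1)))
        frames = self.to_frame(fused).view(audio.size(0), -1, 3, 64, 64)
        return frames                                        # (B, T, 3, 64, 64)
```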
  • Patent number: 12340788
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: May 7, 2024
    Date of Patent: June 24, 2025
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
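The abstract above names a text encoder, a speech style encoder, a step that combines their encodings, and a decoder that produces predicted acoustic features. The sketch below mirrors that flow under assumed module types and sizes; it is illustrative only and not the claimed implementation.

```python
# Minimal sketch of the encoder/decoder flow described in the abstract above.
import torch
import torch.nn as nn

class ExpressiveSynthSketch(nn.Module):
    def __init__(self, vocab=256, style_dim=16, hidden=256, n_mels=80):
        super().__init__()
        self.text_encoder = nn.Embedding(vocab, hidden)       # per-token text encodings
        self.style_encoder = nn.Linear(style_dim, hidden)     # speech-style features -> style encoding
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_acoustic = nn.Linear(hidden, n_mels)          # predicted acoustic features (e.g. mel bins)

    def forward(self, token_ids, style_features):
        text_enc = self.text_encoder(token_ids)                     # (B, T, hidden)
        style_enc = self.style_encoder(style_features).unsqueeze(1) # (B, 1, hidden)
        combined = text_enc + style_enc                              # combine text and style encodings
        dec_out, _ = self.decoder(combined)
        return self.to_acoustic(dec_out)                             # (B, T, n_mels)

# Usage: synthesize acoustic features for a short text in a selected style.
model = ExpressiveSynthSketch()
tokens = torch.randint(0, 256, (1, 12))
style = torch.randn(1, 16)
acoustic = model(tokens, style)   # shape (1, 12, 80)
```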
  • Patent number: 12315491
    Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system to generate predicted acoustic features for generated speech audio for use in a video game. The training comprises receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) speech expression data representing speech expression of the speech audio. Parameters of the machine-learned speech audio generation system are updated by: (i) minimizing a measure of difference between the predicted acoustic features for a training example and the corresponding ground-truth acoustic features of the training example, and (ii) minimizing a measure of difference between the predicted prosodic features for the training example and the corresponding ground-truth prosodic features for the training example.
    Type: Grant
    Filed: February 13, 2024
    Date of Patent: May 27, 2025
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Shahab Raji, Siddharth Gururani, Zahra Shakeri, Kilol Gupta, Ping Zhong
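The training described in the abstract above minimizes two measures of difference: one between predicted and ground-truth acoustic features, and one between predicted and ground-truth prosodic features. Below is a hedged sketch of such a training step; the model interface, batch keys, and use of L1 losses are assumptions for illustration.

```python
# Sketch of a training step with the two loss terms named in the abstract above.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, batch):
    # batch carries content/expression inputs plus ground-truth acoustic and prosodic features
    pred_acoustic, pred_prosody = model(batch["content"], batch["expression"])
    loss_acoustic = F.l1_loss(pred_acoustic, batch["gt_acoustic"])
    loss_prosody = F.l1_loss(pred_prosody, batch["gt_prosody"])
    loss = loss_acoustic + loss_prosody   # minimize both measures of difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```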
  • Patent number: 12233338
    Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system for use in video games. The training comprises: receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) a ground-truth speaker identifier for a speaker of the speech audio.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: February 25, 2025
Assignee: ELECTRONIC ARTS INC.
    Inventors: Ping Zhong, Zahra Shakeri, Siddharth Gururani, Kilol Gupta, Shahab Raji
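The abstract above characterizes each training example by its ground-truth acoustic features, speech content data, and a ground-truth speaker identifier. A small illustrative container for such an example might look like the following; the field names and tensor shapes are assumptions, not the patented format.

```python
# Sketch of the training-example structure named in the abstract above.
from dataclasses import dataclass
import torch

@dataclass
class TrainingExample:
    gt_acoustic: torch.Tensor   # (T, n_mels) ground-truth acoustic features of the speech audio
    content: torch.Tensor       # speech content data, e.g. token or phoneme IDs
    speaker_id: int             # ground-truth speaker identifier for the speaker
```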
  • Publication number: 20240290316
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: May 7, 2024
    Publication date: August 29, 2024
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Patent number: 12033611
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: February 28, 2022
    Date of Patent: July 9, 2024
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Publication number: 20240144568
    Abstract: Apparatuses, systems, and techniques are presented to generate digital content. In at least one embodiment, one or more neural networks are used to generate video information based at least in part upon voice information and a combination of image features and facial landmarks corresponding to one or more images of a person.
    Type: Application
    Filed: September 6, 2022
    Publication date: May 2, 2024
    Inventors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Jose Rafael Valle da Costa, Ming-Yu Liu
  • Patent number: 11735158
    Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein: the initial audio signal comprises speech audio; and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: August 22, 2023
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
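The method above feeds an initial audio signal and an age embedding for a target age category into an age convertor model that outputs an age-altered signal. The sketch below illustrates that call pattern only; the module layout, feature representation, and dimensions are assumptions rather than the patented model.

```python
# Hypothetical sketch of the age-conversion step described in the abstract above:
# an age embedding for the target age category conditions a converter that maps
# the initial audio signal to an age-altered version.
import torch
import torch.nn as nn

class AgeConvertorSketch(nn.Module):
    def __init__(self, n_mels=80, age_dim=32, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.condition = nn.Linear(age_dim, hidden)
        self.decoder = nn.Linear(hidden, n_mels)

    def forward(self, initial_audio, age_embedding):
        # initial_audio: (B, T, n_mels); age_embedding: (B, age_dim) for the target age category
        h, _ = self.encoder(initial_audio)
        h = h + self.condition(age_embedding).unsqueeze(1)   # inject target-age conditioning
        return self.decoder(h)                               # age-altered audio features
```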
  • Publication number: 20220208170
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: February 28, 2022
    Publication date: June 30, 2022
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Patent number: 11295721
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: April 5, 2022
    Assignee: ELECTRONIC ARTS INC.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Publication number: 20210151029
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: April 3, 2020
    Publication date: May 20, 2021
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman