Patents by Inventor Siddharth Gururani
Siddharth Gururani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Patent number: 12367628
Abstract: Apparatuses, systems, and techniques are presented to generate digital content. In at least one embodiment, one or more neural networks are used to generate video information based at least in part upon voice information and a combination of image features and facial landmarks corresponding to one or more images of a person.
Type: Grant
Filed: September 6, 2022
Date of Patent: July 22, 2025
Assignee: NVIDIA Corporation
Inventors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Jose Rafael Valle da Costa, Ming-Yu Liu
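
The abstract above concerns audio-driven video generation conditioned on voice information together with image features and facial landmarks. As a rough, purely illustrative sketch (not the patented implementation), the PyTorch snippet below shows one way such inputs could be fused into per-frame predictions; every module, dimension, and name is an assumption.

```python
# Illustrative sketch only: a toy network that fuses a voice feature sequence
# with per-image visual features and facial landmarks to predict video frames.
# Layer choices, sizes, and the fusion-by-concatenation design are assumptions.
import torch
import torch.nn as nn

class VoiceDrivenFrameGenerator(nn.Module):
    def __init__(self, audio_dim=80, image_feat_dim=256, n_landmarks=68, frame_size=64):
        super().__init__()
        self.audio_rnn = nn.GRU(audio_dim, 128, batch_first=True)
        self.landmark_mlp = nn.Sequential(nn.Linear(n_landmarks * 2, 128), nn.ReLU())
        self.image_mlp = nn.Sequential(nn.Linear(image_feat_dim, 128), nn.ReLU())
        self.frame_decoder = nn.Sequential(
            nn.Linear(128 * 3, 512), nn.ReLU(),
            nn.Linear(512, 3 * frame_size * frame_size), nn.Tanh(),
        )
        self.frame_size = frame_size

    def forward(self, audio, image_feats, landmarks):
        # audio: (B, T, audio_dim); image_feats: (B, image_feat_dim); landmarks: (B, n_landmarks, 2)
        audio_h, _ = self.audio_rnn(audio)                        # (B, T, 128)
        img_h = self.image_mlp(image_feats)                       # (B, 128)
        lmk_h = self.landmark_mlp(landmarks.flatten(1))           # (B, 128)
        cond = torch.cat([img_h, lmk_h], dim=-1)                  # combined image features + landmarks
        cond = cond.unsqueeze(1).expand(-1, audio_h.size(1), -1)  # broadcast over time
        frames = self.frame_decoder(torch.cat([audio_h, cond], dim=-1))
        return frames.view(audio.size(0), audio.size(1), 3, self.frame_size, self.frame_size)

# Example: 25 "video" frames driven by 25 audio feature frames.
model = VoiceDrivenFrameGenerator()
video = model(torch.randn(1, 25, 80), torch.randn(1, 256), torch.randn(1, 68, 2))
print(video.shape)  # torch.Size([1, 25, 3, 64, 64])
```

In this toy design the static conditioning (image features and landmarks) is broadcast across the audio time axis, so each predicted frame sees both the voice features and the reference appearance.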

Patent number: 12340788
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Grant
Filed: May 7, 2024
Date of Patent: June 24, 2025
Assignee: ELECTRONIC ARTS INC.
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
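
This patent (and the related applications and grants further down this listing) describes a synthesizer built from a text encoder, a speech style encoder, and a decoder. The minimal PyTorch sketch below only illustrates that encode-combine-decode shape under assumed dimensions and a simple concatenation-based combination; it is not the Electronic Arts implementation.

```python
# Illustrative sketch only: text encoder + speech style encoder + decoder
# producing predicted acoustic features (mel-spectrogram-like frames).
# Vocabulary size, dimensions, and the combination step are assumptions.
import torch
import torch.nn as nn

class ExpressiveSynthesizer(nn.Module):
    def __init__(self, vocab_size=100, style_feat_dim=16, hidden=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.text_encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.style_encoder = nn.Sequential(nn.Linear(style_feat_dim, hidden), nn.Tanh())
        self.decoder = nn.GRU(hidden * 2, hidden, batch_first=True)
        self.to_acoustic = nn.Linear(hidden, n_mels)

    def forward(self, token_ids, style_features):
        # token_ids: (B, T) user-input text as token indices
        # style_features: (B, style_feat_dim) features for the selected speech style
        text_enc, _ = self.text_encoder(self.embed(token_ids))        # text encodings
        style_enc = self.style_encoder(style_features)                 # speech style encoding
        style_enc = style_enc.unsqueeze(1).expand(-1, text_enc.size(1), -1)
        combined = torch.cat([text_enc, style_enc], dim=-1)            # combined encodings
        dec_out, _ = self.decoder(combined)
        return self.to_acoustic(dec_out)                               # (B, T, n_mels)

model = ExpressiveSynthesizer()
mels = model(torch.randint(0, 100, (1, 12)), torch.randn(1, 16))
print(mels.shape)  # torch.Size([1, 12, 80])
```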

Patent number: 12315491
Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system to generate predicted acoustic features for generated speech audio for use in a video game. The training comprises receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) speech expression data representing speech expression of the speech audio. Parameters of the machine-learned speech audio generation system are updated by: (i) minimizing a measure of difference between the predicted acoustic features for a training example and the corresponding ground-truth acoustic features of the training example, and (ii) minimizing a measure of difference between the predicted prosodic features for the training example and the corresponding ground-truth prosodic features for the training example.
Type: Grant
Filed: February 13, 2024
Date of Patent: May 27, 2025
Assignee: ELECTRONIC ARTS INC.
Inventors: Shahab Raji, Siddharth Gururani, Zahra Shakeri, Kilol Gupta, Ping Zhong
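
The training objective described here combines two terms: a measure of difference on acoustic features and one on prosodic features. Below is a minimal sketch of such a two-term training step, using a toy two-headed model and L1 losses purely as placeholders; the real system's architecture, loss functions, and weighting are not specified by the abstract.

```python
# Illustrative sketch only: a toy model with acoustic and prosodic heads,
# trained by summing two feature-reconstruction losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySpeechModel(nn.Module):
    """Toy stand-in predicting acoustic and prosodic features from content and expression inputs."""
    def __init__(self, content_dim=32, expr_dim=8, n_mels=80, n_prosody=3):
        super().__init__()
        self.backbone = nn.Linear(content_dim + expr_dim, 128)
        self.acoustic_head = nn.Linear(128, n_mels)
        self.prosody_head = nn.Linear(128, n_prosody)

    def forward(self, content, expression):
        h = torch.relu(self.backbone(torch.cat([content, expression], dim=-1)))
        return self.acoustic_head(h), self.prosody_head(h)

def training_step(model, optimizer, batch, prosody_weight=1.0):
    pred_acoustic, pred_prosody = model(batch["content"], batch["expression"])
    acoustic_loss = F.l1_loss(pred_acoustic, batch["acoustic"])  # (i) acoustic-feature difference
    prosody_loss = F.l1_loss(pred_prosody, batch["prosody"])     # (ii) prosodic-feature difference
    loss = acoustic_loss + prosody_weight * prosody_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ToySpeechModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = {"content": torch.randn(4, 32), "expression": torch.randn(4, 8),
         "acoustic": torch.randn(4, 80), "prosody": torch.randn(4, 3)}
print(training_step(model, opt, batch))
```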

Patent number: 12233338
Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system for use in video games. The training comprises: receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) a ground-truth speaker identifier for a speaker of the speech audio.
Type: Grant
Filed: November 16, 2021
Date of Patent: February 25, 2025
Assignee: Electronic Arts Inc.
Inventors: Ping Zhong, Zahra Shakeri, Siddharth Gururani, Kilol Gupta, Shahab Raji
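
The abstract is truncated in this listing, so how the ground-truth speaker identifier is used is not stated here. As a hedged illustration only, the sketch below assumes a common pattern: the identifier indexes a learned speaker embedding that conditions prediction of acoustic features from speech content data.

```python
# Illustrative assumption, not the patented training method: condition a toy
# acoustic-feature predictor on a speaker embedding looked up by speaker ID.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSpeakerToyModel(nn.Module):
    def __init__(self, n_speakers=10, content_dim=32, spk_dim=16, n_mels=80):
        super().__init__()
        self.speaker_embedding = nn.Embedding(n_speakers, spk_dim)
        self.net = nn.Sequential(
            nn.Linear(content_dim + spk_dim, 128), nn.ReLU(), nn.Linear(128, n_mels)
        )

    def forward(self, content, speaker_id):
        spk = self.speaker_embedding(speaker_id)             # embedding for the ground-truth speaker ID
        return self.net(torch.cat([content, spk], dim=-1))   # predicted acoustic features

model = MultiSpeakerToyModel()
content = torch.randn(4, 32)                                 # toy speech content data
speaker_id = torch.randint(0, 10, (4,))                      # toy speaker identifiers
target = torch.randn(4, 80)                                  # toy ground-truth acoustic features
loss = F.l1_loss(model(content, speaker_id), target)
print(loss.item())
```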

Publication number: 20240290316
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Application
Filed: May 7, 2024
Publication date: August 29, 2024
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman

Patent number: 12033611
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Grant
Filed: February 28, 2022
Date of Patent: July 9, 2024
Assignee: ELECTRONIC ARTS INC.
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman

Publication number: 20240144568
Abstract: Apparatuses, systems, and techniques are presented to generate digital content. In at least one embodiment, one or more neural networks are used to generate video information based at least in part upon voice information and a combination of image features and facial landmarks corresponding to one or more images of a person.
Type: Application
Filed: September 6, 2022
Publication date: May 2, 2024
Inventors: Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Jose Rafael Valle da Costa, Ming-Yu Liu

Patent number: 11735158
Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein: the initial audio signal comprises speech audio; and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.
Type: Grant
Filed: August 11, 2021
Date of Patent: August 22, 2023
Assignee: ELECTRONIC ARTS INC.
Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
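
This patent describes converting an initial audio signal with an age embedding via a machine-learned age convertor model. The sketch below is an illustrative toy convertor, not the patented model: a small convolutional encoder-decoder in which the age embedding is projected and broadcast over time. How the age embedding is actually formed (e.g., from samples classified into the target age category) is only hinted at in a comment.

```python
# Illustrative sketch only: a toy "age convertor" that maps an initial
# waveform-like signal plus a target-age embedding to an age-altered signal.
import torch
import torch.nn as nn

class ToyAgeConvertor(nn.Module):
    def __init__(self, age_dim=32, channels=64):
        super().__init__()
        self.encode = nn.Conv1d(1, channels, kernel_size=9, padding=4)
        self.age_proj = nn.Linear(age_dim, channels)
        self.decode = nn.Conv1d(channels, 1, kernel_size=9, padding=4)

    def forward(self, audio, age_embedding):
        # audio: (B, 1, n_samples); age_embedding: (B, age_dim)
        h = torch.relu(self.encode(audio))
        h = h + self.age_proj(age_embedding).unsqueeze(-1)   # broadcast age conditioning over time
        return torch.tanh(self.decode(h))                    # age-altered audio signal, same length

# The age embedding could, per the abstract, derive from speech samples
# classified into the target age category; here it is random for illustration.
model = ToyAgeConvertor()
aged = model(torch.randn(1, 1, 16000), torch.randn(1, 32))
print(aged.shape)  # torch.Size([1, 1, 16000])
```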

Publication number: 20220208170
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Application
Filed: February 28, 2022
Publication date: June 30, 2022
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman

Patent number: 11295721
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Grant
Filed: April 3, 2020
Date of Patent: April 5, 2022
Assignee: ELECTRONIC ARTS INC.
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman

Publication number: 20210151029
Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
Type: Application
Filed: April 3, 2020
Publication date: May 20, 2021
Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman