Patents by Inventor Kilol Gupta

Kilol Gupta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

CHAIN OF THOUGHT REASONING FOR ASR

Publication number: 20250118293

Abstract: A method includes receiving a conversational training dataset including a plurality of conversational training samples, each training sample associated with a corresponding conversation and including: corresponding audio data characterizing a corresponding current utterance spoken by a user during a current turn in the corresponding conversation; a corresponding context for the corresponding current utterance including a transcript of a previous turn in the corresponding conversation that precedes the current turn; a corresponding ground-truth transcription of the corresponding current utterance; and a CoT annotation representing a corresponding logical relationship between the corresponding current utterance and the previous turn.

Type: Application

Filed: September 20, 2024

Publication date: April 10, 2025

Applicant: Google LLC

Inventors: Mingqing Chen, Rajiv Mathews, Andrew Hard, Swaroop Ramaswamy, Kilol Gupta
Robust speech audio generation for video games

Patent number: 12233338

Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system for use in video games. The training comprises: receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) a ground-truth speaker identifier for a speaker of the speech audio.

Type: Grant

Filed: November 16, 2021

Date of Patent: February 25, 2025

Assignee: Electronic Arts Inc.

Inventors: Ping Zhong, Zahra Shakeri, Siddharth Gururani, Kilol Gupta, Shahab Raji
Automated pipeline selection for synthesis of audio assets

Patent number: 12159618

Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.

Type: Grant

Filed: October 20, 2022

Date of Patent: December 3, 2024

Assignee: Electronic Arts Inc.

Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
Generating Expressive Speech Audio From Text Data

Publication number: 20240290316

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Application

Filed: May 7, 2024

Publication date: August 29, 2024

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
Generating expressive speech audio from text data

Patent number: 12033611

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Grant

Filed: February 28, 2022

Date of Patent: July 9, 2024

Assignee: ELECTRONIC ARTS INC.

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
Generating speech in the voice of a player of a video game

Patent number: 11790884

Abstract: A computer-implemented method of generating speech audio in a video game is provided. The method includes inputting, into a synthesizer module, input data that represents speech content. Source acoustic features for the speech content in the voice of a source speaker are generated and are input, along with a speaker embedding associated with a player of the video game into an acoustic feature encoder of a voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder, which are inputted into an acoustic feature decoder of the voice convertor to generate target acoustic features. The target acoustic features are processed with one or more modules, to generate speech audio in the voice of the player.

Type: Grant

Filed: October 28, 2020

Date of Patent: October 17, 2023

Assignee: ELECTRONIC ARTS INC.

Inventors: Zahra Shakeri, Jervis Pinto, Kilol Gupta, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kenneth Moss
Voice aging using machine learning

Patent number: 11735158

Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein: the initial audio signal comprises speech audio; and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.

Type: Grant

Filed: August 11, 2021

Date of Patent: August 22, 2023

Assignee: ELECTRONIC ARTS INC.

Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
Speaker conversion for video games

Patent number: 11605388

Abstract: This specification describes a computer-implemented method of generating speech audio for use in a video game, wherein the speech audio is generated using a voice convertor that has been trained to convert audio data for a source speaker into audio data for a target speaker. The method comprises receiving: (i) source speech audio, and (ii) a target speaker identifier. The source speech audio comprises speech content in the voice of a source speaker. Source acoustic features are determined for the source speech audio. A target speaker embedding associated with the target speaker identifier is generated as output of a speaker encoder of the voice convertor. The target speaker embedding and the source acoustic features are inputted into an acoustic feature encoder of the voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder. The one or more acoustic feature encodings are derived from the target speaker embedding and the source acoustic features.

Type: Grant

Filed: November 9, 2020

Date of Patent: March 14, 2023

Assignee: Electronic Arts Inc.

Inventors: Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kazi Zaman
AUTOMATED PIPELINE SELECTION FOR SYNTHESIS OF AUDIO ASSETS

Publication number: 20230039540

Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.

Type: Application

Filed: October 20, 2022

Publication date: February 9, 2023

Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
Automated pipeline selection for synthesis of audio assets

Patent number: 11521594

Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.

Type: Grant

Filed: November 10, 2020

Date of Patent: December 6, 2022

Assignee: Electronic Arts Inc.

Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
Generating Expressive Speech Audio From Text Data

Publication number: 20220208170

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Application

Filed: February 28, 2022

Publication date: June 30, 2022

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
AUTOMATED PIPELINE SELECTION FOR SYNTHESIS OF AUDIO ASSETS

Publication number: 20220148561

Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.

Type: Application

Filed: November 10, 2020

Publication date: May 12, 2022

Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
Generating expressive speech audio from text data

Patent number: 11295721

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Grant

Filed: April 3, 2020

Date of Patent: April 5, 2022

Assignee: ELECTRONIC ARTS INC.

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
Generating Expressive Speech Audio From Text Data

Publication number: 20210151029

Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.

Type: Application

Filed: April 3, 2020

Publication date: May 20, 2021

Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman