Patents by Inventor Kilol Gupta

Kilol Gupta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118293
    Abstract: A method includes receiving a conversational training dataset including a plurality of conversational training samples, each training sample associated with a corresponding conversation and including: corresponding audio data characterizing a corresponding current utterance spoken by a user during a current turn in the corresponding conversation; a corresponding context for the corresponding current utterance including a transcript of a previous turn in the corresponding conversation that precedes the current turn; a corresponding ground-truth transcription of the corresponding current utterance; and a chain-of-thought (CoT) annotation representing a corresponding logical relationship between the corresponding current utterance and the previous turn.
    Type: Application
    Filed: September 20, 2024
    Publication date: April 10, 2025
    Applicant: Google LLC
    Inventors: Mingqing Chen, Rajiv Mathews, Andrew Hard, Swaroop Ramaswamy, Kilol Gupta
  • Patent number: 12233338
    Abstract: This specification describes a computer-implemented method of training a machine-learned speech audio generation system for use in video games. The training comprises: receiving one or more training examples. Each training example comprises: (i) ground-truth acoustic features for speech audio, (ii) speech content data representing speech content of the speech audio, and (iii) a ground-truth speaker identifier for a speaker of the speech audio.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: February 25, 2025
    Assignee: Electronic Arts Inc.
    Inventors: Ping Zhong, Zahra Shakeri, Siddharth Gururani, Kilol Gupta, Shahab Raji
  • Patent number: 12159618
    Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
    Type: Grant
    Filed: October 20, 2022
    Date of Patent: December 3, 2024
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
  • Publication number: 20240290316
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: May 7, 2024
    Publication date: August 29, 2024
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Patent number: 12033611
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: February 28, 2022
    Date of Patent: July 9, 2024
    Assignee: Electronic Arts Inc.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Patent number: 11790884
    Abstract: A computer-implemented method of generating speech audio in a video game is provided. The method includes inputting, into a synthesizer module, input data that represents speech content. Source acoustic features for the speech content in the voice of a source speaker are generated and are input, along with a speaker embedding associated with a player of the video game, into an acoustic feature encoder of a voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder and are input into an acoustic feature decoder of the voice convertor to generate target acoustic features. The target acoustic features are processed with one or more modules to generate speech audio in the voice of the player.
    Type: Grant
    Filed: October 28, 2020
    Date of Patent: October 17, 2023
    Assignee: Electronic Arts Inc.
    Inventors: Zahra Shakeri, Jervis Pinto, Kilol Gupta, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kenneth Moss
  • Patent number: 11735158
    Abstract: This specification describes systems and methods for aging voice audio, in particular voice audio in computer games. According to one aspect of this specification, there is described a method for aging speech audio data. The method comprises: inputting an initial audio signal and an age embedding into a machine-learned age convertor model, wherein: the initial audio signal comprises speech audio; and the age embedding is based on an age classification of a plurality of speech audio samples of subjects in a target age category; processing, by the machine-learned age convertor model, the initial audio signal and the age embedding to generate an age-altered audio signal, wherein the age-altered audio signal corresponds to a version of the initial audio signal in the target age category; and outputting, from the machine-learned age convertor model, the age-altered audio signal.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: August 22, 2023
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Zahra Shakeri, Ping Zhong, Siddharth Gururani, Mohsen Sardari
  • Patent number: 11605388
    Abstract: This specification describes a computer-implemented method of generating speech audio for use in a video game, wherein the speech audio is generated using a voice convertor that has been trained to convert audio data for a source speaker into audio data for a target speaker. The method comprises receiving: (i) source speech audio, and (ii) a target speaker identifier. The source speech audio comprises speech content in the voice of a source speaker. Source acoustic features are determined for the source speech audio. A target speaker embedding associated with the target speaker identifier is generated as output of a speaker encoder of the voice convertor. The target speaker embedding and the source acoustic features are inputted into an acoustic feature encoder of the voice convertor. One or more acoustic feature encodings are generated as output of the acoustic feature encoder. The one or more acoustic feature encodings are derived from the target speaker embedding and the source acoustic features.
    Type: Grant
    Filed: November 9, 2020
    Date of Patent: March 14, 2023
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Harold Chaput, Navid Aghdaie, Kazi Zaman
  • Publication number: 20230039540
    Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
    Type: Application
    Filed: October 20, 2022
    Publication date: February 9, 2023
    Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
  • Patent number: 11521594
    Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: December 6, 2022
    Assignee: Electronic Arts Inc.
    Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
  • Publication number: 20220208170
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: February 28, 2022
    Publication date: June 30, 2022
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Publication number: 20220148561
    Abstract: An example method of automated selection of audio asset synthesizing pipelines includes: receiving an audio stream comprising human speech; determining one or more features of the audio stream; selecting, based on the one or more features of the audio stream, an audio asset synthesizing pipeline; training, using the audio stream, one or more audio asset synthesizing models implementing respective stages of the selected audio asset synthesizing pipeline; and responsive to determining that a quality metric of the audio asset synthesizing pipeline satisfies a predetermined quality condition, synthesizing one or more audio assets by the selected audio asset synthesizing pipeline.
    Type: Application
    Filed: November 10, 2020
    Publication date: May 12, 2022
    Inventors: Kilol Gupta, Tushar Agarwal, Zahra Shakeri, Mohsen Sardari, Harold Henry Chaput, Navid Aghdaie
  • Patent number: 11295721
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: April 5, 2022
    Assignee: Electronic Arts Inc.
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman
  • Publication number: 20210151029
    Abstract: A system for use in video game development to generate expressive speech audio comprises a user interface configured to receive user-input text data and a user selection of a speech style. The system includes a machine-learned synthesizer comprising a text encoder, a speech style encoder and a decoder. The machine-learned synthesizer is configured to generate one or more text encodings derived from the user-input text data, using the text encoder of the machine-learned synthesizer; generate a speech style encoding by processing a set of speech style features associated with the selected speech style using the speech style encoder of the machine-learned synthesizer; combine the one or more text encodings and the speech style encoding to generate one or more combined encodings; and decode the one or more combined encodings with the decoder of the machine-learned synthesizer to generate predicted acoustic features.
    Type: Application
    Filed: April 3, 2020
    Publication date: May 20, 2021
    Inventors: Siddharth Gururani, Kilol Gupta, Dhaval Shah, Zahra Shakeri, Jervis Pinto, Mohsen Sardari, Navid Aghdaie, Kazi Zaman