Patents by Inventor Jitong CHEN

Jitong CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230419930
    Abstract: A music generation system is provided comprising a processor and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates, and a music generation program stored in the memory and executed by the processor, the program being configured to receive a user input of lyrics, identify a plurality of syllables in the lyrics, determine a syllable pattern in the identified syllables, match the syllable pattern to a selected rhythm template of the plurality of rhythm templates, generate a melody based on the selected rhythm template, generate a music file encoding the melody and the lyrics, and output the music file.
    Type: Application
    Filed: June 24, 2022
    Publication date: December 28, 2023
    Inventors: Yilin ZHANG, Andrew SHAW, Jitong CHEN
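
A minimal sketch of the syllable-counting and rhythm-template matching steps from the abstract above. The template database, the vowel-cluster syllable counter, and the distance metric are all illustrative stand-ins; the patent does not pin any of these down:

```python
import re

# Hypothetical rhythm template database: each key is a syllables-per-line
# pattern, each value is a list of note durations (in beats) per line.
RHYTHM_TEMPLATES = {
    (4, 4): [[1, 1, 1, 1], [1, 1, 1, 1]],
    (7, 7): [[0.5, 0.5, 0.5, 0.5, 1, 1, 2], [0.5, 0.5, 0.5, 0.5, 1, 1, 2]],
}

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel clusters (illustrative only)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def syllable_pattern(lyrics: str) -> tuple:
    """Syllables per lyric line, used as the key for template matching."""
    return tuple(
        sum(count_syllables(w) for w in line.split())
        for line in lyrics.strip().splitlines()
    )

def match_template(pattern: tuple):
    """Pick the template whose per-line pattern is closest to the lyrics'."""
    return min(
        RHYTHM_TEMPLATES.items(),
        key=lambda kv: sum(abs(a - b) for a, b in zip(kv[0], pattern))
        + abs(len(kv[0]) - len(pattern)),
    )[1]

lyrics = "twinkle twinkle little star\nhow I wonder what you are"
pattern = syllable_pattern(lyrics)
print(pattern, "->", match_template(pattern))
```
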
  • Publication number: 20230377608
    Abstract: The present application provides a special effect processing method and apparatus. The method includes: generating an audio signal in response to a touch operation performed by a user while a video is playing; segmenting the audio signal into multiple audio frames; and performing special effect processing, according to attributes of the audio frames, on the picture that is currently being played in the video.
    Type: Application
    Filed: August 7, 2023
    Publication date: November 23, 2023
    Inventors: Chenyu SUN, Jitong CHEN, Nathanael SCHAGER, Maryyann CRICHTON, Josiah John SERRANO, Bochen LI, Xuefan HU, Fraser SMITH, Hwankyoo Shawn KIM, David TREVELYAN, Suiyu FENG, Brandon WU, Tao XIONG
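
A rough sketch of the segmentation-and-attributes step described above. RMS energy is assumed as the frame attribute and a scalar effect strength as the output; both are guesses, since the abstract leaves the attribute and the effect unspecified:

```python
import numpy as np

def frame_attributes(audio: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Segment the signal into fixed-length frames and compute a per-frame
    attribute (RMS energy here, as a stand-in for the patent's attributes)."""
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def effect_intensity(energies: np.ndarray) -> np.ndarray:
    """Map each frame's energy to a 0..1 effect strength for the video frame
    currently being displayed (a placeholder for real effect rendering)."""
    peak = float(energies.max()) or 1.0
    return energies / peak

audio = np.random.randn(44100).astype(np.float32)  # 1 s of mock touch-generated audio
print(effect_intensity(frame_attributes(audio))[:5])
```
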
  • Publication number: 20230360619
    Abstract: In examples, a method for generating a remixed audio sample is provided. The method may include receiving an audio portion, obtaining metadata from the received audio portion, analyzing the metadata, and generating a symbolic music representation based on the analyzed metadata. In some examples, a selection of a style asset is received and applied to the symbolic music representation. Accordingly, a remixed audio portion may be rendered based on the stylized symbolic representation. That is, metadata associated with a song or song portion may be analyzed to identify a tempo, key, structure, chords, and/or progressions, such that a remixed version of the song can be provided with customized instrumental arrangements and styles.
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Vibert THIO, Haonan CHEN, Xuefan HU, Jitong CHEN
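
A toy version of the metadata-to-symbolic-representation pipeline from the abstract above. The `SymbolicMusic` container and the style-as-tempo-multiplier rule are invented here for illustration:

```python
from dataclasses import dataclass

@dataclass
class SymbolicMusic:
    """Symbolic representation derived from audio metadata (simplified)."""
    tempo: float
    key: str
    chords: list

def analyze_metadata(meta: dict) -> SymbolicMusic:
    """Turn song metadata (tempo, key, chord progression) into symbols."""
    return SymbolicMusic(meta["tempo"], meta["key"], list(meta["chords"]))

def apply_style(music: SymbolicMusic, style: str) -> SymbolicMusic:
    """Apply a selected style asset; here a hypothetical tempo tweak stands
    in for a real instrumental-arrangement style."""
    factor = {"lofi": 0.85, "edm": 1.25}.get(style, 1.0)
    return SymbolicMusic(music.tempo * factor, music.key, music.chords)

meta = {"tempo": 120.0, "key": "C major", "chords": ["C", "G", "Am", "F"]}
print(apply_style(analyze_metadata(meta), "lofi"))
```
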
  • Publication number: 20230360618
    Abstract: Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received. The first audio track is separated into a vocal component and one or more accompaniment components. The second audio track is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined. The first audio track and the second audio track are aligned based on the determined structures of the tracks. The vocal component of the first audio track is stretched to match a tempo of the second audio track. The stretched vocal component of the first audio track is added to the one or more accompaniment components of the second audio track.
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Vibert THIO, Bochen LI, Haonan CHEN, Jitong CHEN
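
The core tempo-matching arithmetic from the abstract above, in miniature. The source separation is assumed to be done by an external model, and `naive_stretch` is a crude resampling stand-in for a proper time-stretch (which would preserve pitch):

```python
import numpy as np

def stretch_factor(vocal_tempo: float, target_tempo: float) -> float:
    """Length ratio that makes the first track's vocal match the second
    track's tempo (factor < 1 shortens, i.e. speeds up, the vocal)."""
    return vocal_tempo / target_tempo

def naive_stretch(signal: np.ndarray, factor: float) -> np.ndarray:
    """Crude linear-interpolation resample; illustrative only."""
    idx = np.linspace(0, len(signal) - 1, int(len(signal) * factor))
    return np.interp(idx, np.arange(len(signal)), signal)

def mix(vocal: np.ndarray, accompaniment: np.ndarray) -> np.ndarray:
    """Add the stretched vocal onto the second track's accompaniment."""
    n = min(len(vocal), len(accompaniment))
    return vocal[:n] + accompaniment[:n]

vocal_a = np.random.randn(44100)   # separated vocal of track A (mock)
accomp_b = np.random.randn(44100)  # separated accompaniment of track B (mock)
stretched = naive_stretch(vocal_a, stretch_factor(98.0, 120.0))
print(mix(stretched, accomp_b).shape)
```
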
  • Publication number: 20230360620
    Abstract: In examples, a method for converting audio samples to full song arrangements is provided. The method includes receiving audio sample data, determining a melodic transcription, based on the audio sample data, and determining a sequence of music chords, based on the melodic transcription. The method further includes generating a full song arrangement, based on the sequence of music chords, and the audio sample data.
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Andrew SHAW, Jitong CHEN
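
An illustrative reduction of the melody-to-chords-to-arrangement chain described above. The most-common-pitch-class harmonization and the two-part arrangement are placeholders for the models the patent actually covers:

```python
# Toy pipeline: MIDI pitches -> one chord per bar -> per-instrument parts.
MAJOR_TRIADS = {0: "C", 2: "D", 4: "E", 5: "F", 7: "G", 9: "A", 11: "B"}

def chords_from_melody(midi_pitches, notes_per_bar=4):
    """Harmonize each bar with the triad rooted on its most common pitch
    class (a stand-in for the patent's chord-sequence determination)."""
    chords = []
    for i in range(0, len(midi_pitches), notes_per_bar):
        bar = [p % 12 for p in midi_pitches[i:i + notes_per_bar]]
        root = max(set(bar), key=bar.count)
        chords.append(MAJOR_TRIADS.get(root, "C"))
    return chords

def arrange(chords):
    """Expand the chord sequence into hypothetical per-instrument parts."""
    return {"pads": chords, "bass": [c + " (root)" for c in chords]}

melody = [60, 64, 67, 60, 65, 69, 72, 65]  # mock melodic transcription (MIDI)
print(arrange(chords_from_melody(melody)))
```
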
  • Publication number: 20230282188
    Abstract: Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output.
    Type: Application
    Filed: March 7, 2022
    Publication date: September 7, 2023
    Inventors: Bochen Li, Rodrigo Castellon, Daiyu Zhang, Jitong Chen
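
A small sketch of the final decoding step: turning a (sounds x frames) activation map into a timed beatbox transcript via threshold-plus-peak picking. The sound vocabulary, frame rate, and threshold are assumptions; the map itself would come from the trained network:

```python
import numpy as np

SOUNDS = ["kick", "snare", "hi-hat"]  # hypothetical beatbox vocabulary

def decode_activation_map(act: np.ndarray, fps: float = 100.0,
                          thresh: float = 0.5):
    """Decode a (sounds x frames) activation map into (time, sound) events
    by picking local maxima above a threshold."""
    events = []
    for s, row in enumerate(act):
        for t in range(1, len(row) - 1):
            if row[t] > thresh and row[t] >= row[t - 1] and row[t] >= row[t + 1]:
                events.append((t / fps, SOUNDS[s]))
    return sorted(events)

act = np.zeros((3, 400))
act[0, [10, 210]] = 0.9   # kicks at 0.10 s and 2.10 s
act[2, [110, 310]] = 0.8  # hi-hats at 1.10 s and 3.10 s
print(decode_activation_map(act))
```
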
  • Publication number: 20230197040
    Abstract: A method for generating an audio output is described. Image inputs of interactive movements by a user captured by an image sensor are received. The interactive movements are mapped to a sequence of audio element identifiers. The sequence of audio element identifiers are processed to generate a musical sequence by performing music theory rule enforcement on the sequence of audio element identifiers. An audio output that represents the musical sequence is generated.
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: Bochen Li, Daiyu Zhang, Shawn Chan Zhen Yi, Jitong Chen
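
One plausible reading of the music-theory rule enforcement step above, using snap-to-scale as the rule; the gesture-to-note mapping and the C-major constraint are hypothetical:

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes allowed by the "rule"

def movements_to_ids(hand_heights):
    """Map a tracked vertical hand position (0..1, from the image sensor)
    to raw MIDI note identifiers (a hypothetical mapping)."""
    return [48 + int(h * 24) for h in hand_heights]

def enforce_music_theory(note_ids):
    """Snap each note down to the nearest scale tone, a simple instance of
    the music-theory rule enforcement the abstract describes."""
    out = []
    for n in note_ids:
        while n % 12 not in C_MAJOR:
            n -= 1
        out.append(n)
    return out

raw = movements_to_ids([0.1, 0.42, 0.77, 0.5])
print(raw, "->", enforce_music_theory(raw))
```
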
  • Publication number: 20230154451
    Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, where N represents a number of wavetables and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable for performing audio-related tasks.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 18, 2023
    Inventors: Lamtharn HANTRAKUL, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
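
A minimal PyTorch sketch of the N×L learnable wavetable bank with differentiable (linearly interpolated) reads, which is the core data structure the abstract describes; sizes and initialization here are arbitrary:

```python
import torch

class WavetableBank(torch.nn.Module):
    """N learnable wavetables of length L; reading at fractional phase is
    differentiable via linear interpolation, so gradients reach the tables."""
    def __init__(self, n_tables: int = 8, length: int = 512):
        super().__init__()
        self.tables = torch.nn.Parameter(0.1 * torch.randn(n_tables, length))

    def forward(self, table_idx: int, phase: torch.Tensor) -> torch.Tensor:
        """Read one table at phase positions in [0, 1)."""
        pos = phase * self.tables.shape[1]
        i0 = pos.long() % self.tables.shape[1]
        i1 = (i0 + 1) % self.tables.shape[1]
        frac = pos - pos.floor()
        table = self.tables[table_idx]
        return (1 - frac) * table[i0] + frac * table[i1]

bank = WavetableBank()
# Phase accumulator for a 440 Hz tone at a 16 kHz sample rate (100 samples).
phase = torch.cumsum(torch.full((100,), 440.0 / 16000.0), dim=0) % 1.0
audio = bank(0, phase)  # gradients flow back into bank.tables
print(audio.shape)
```
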
  • Patent number: 11482207
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: October 25, 2022
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
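
The closed-form KL divergence between two Gaussians that this distillation minimizes, plus a hedged version of the regularization the abstract mentions; the penalty form and the weight `lam` follow the related ClariNet paper rather than the claims themselves:

```python
import torch

def gaussian_kl(mu_q, log_sig_q, mu_p, log_sig_p):
    """Closed-form KL(q || p) between diagonal Gaussians: the quantity
    minimized between student (IAF) and teacher (WaveNet) distributions."""
    return (log_sig_p - log_sig_q
            + (torch.exp(2 * log_sig_q) + (mu_q - mu_p) ** 2)
              / (2 * torch.exp(2 * log_sig_p))
            - 0.5)

def regularized_kl(mu_q, log_sig_q, mu_p, log_sig_p, lam=4.0):
    """Add a penalty on the log-scale gap to stabilize training when the
    teacher's distribution is highly peaked (lam value is an assumption)."""
    return lam * (log_sig_q - log_sig_p) ** 2 + gaussian_kl(
        mu_q, log_sig_q, mu_p, log_sig_p)

q = torch.zeros(4), torch.zeros(4)             # student mean / log-std
p = torch.ones(4) * 0.1, torch.ones(4) * 0.2   # teacher mean / log-std
print(regularized_kl(*q, *p).mean())
```
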
  • Patent number: 11238843
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
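
A toy illustration of the speaker-encoding approach: pool features from a few cloning clips into a single speaker embedding that conditions the multi-speaker generative model. The one-layer encoder here is invented for illustration; the patent's encoder is more elaborate:

```python
import torch

class SpeakerEncoder(torch.nn.Module):
    """Pool per-clip features from a few cloning samples into one speaker
    embedding (minimal sketch of the speaker-encoding approach)."""
    def __init__(self, n_mels: int = 80, dim: int = 128):
        super().__init__()
        self.proj = torch.nn.Linear(n_mels, dim)

    def forward(self, cloning_clips):
        # cloning_clips: list of (frames, n_mels) mel spectrograms
        per_clip = [self.proj(c).mean(dim=0) for c in cloning_clips]
        return torch.stack(per_clip).mean(dim=0)  # one embedding per speaker

encoder = SpeakerEncoder()
clips = [torch.randn(200, 80), torch.randn(150, 80)]  # a few cloning samples
embedding = encoder(clips)
print(embedding.shape)  # conditions the multi-speaker generative model
```
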
  • Publication number: 20210110810
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN
  • Patent number: 10872596
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
  • Publication number: 20190251952
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
    Type: Application
    Filed: September 26, 2018
    Publication date: August 15, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan O. ARIK, Jitong CHEN, Kainan PENG, Wei PING, Yanqi ZHOU
  • Publication number: 20190180732
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: February 15, 2019
    Publication date: June 13, 2019
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN