Patents by Inventor Jitong CHEN

Jitong CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250069585
    Abstract: The present disclosure relates to a music generation method, apparatus and system, and storage medium. An embodiment of the present disclosure includes: obtaining text information and converting the text information into a corresponding voice audio; obtaining an initial music audio, wherein the initial music audio comprises a music key point and the music characteristics of the initial music audio change suddenly at the position of the music key point; and, on the basis of the position of the music key point, synthesizing the voice audio and the initial music audio to obtain a target music audio. In the target music audio, the voice audio appears at the position of the music key point of the initial music audio. Thus, a music audio is generated from text information, and the user can customize both the content of the text information and the initial music audio. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: April 27, 2023
    Publication date: February 27, 2025
    Inventors: Andrew SHAW, Yilin ZHANG, Jitong CHEN, Vibert THIO, Shawn Chan Zhen YI, Liangqin XU, Yufan XUE
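    The placement step this abstract describes (voice audio appearing at the music key point) can be illustrated with a minimal, purely hypothetical sketch; the function name, the additive mix, and the fixed sample rate are assumptions, not the patent's method.

    ```python
    # Hypothetical sketch: overlay the voice audio onto the initial music
    # audio so that it begins at the music key point.
    import numpy as np

    def synthesize_at_key_point(music: np.ndarray, voice: np.ndarray,
                                key_point_sec: float, sr: int = 44100) -> np.ndarray:
        """Mix `voice` into `music` starting at the key-point position."""
        start = int(key_point_sec * sr)
        end = min(start + len(voice), len(music))
        target = music.copy()
        # Simple additive mix; a production system would duck or crossfade.
        target[start:end] += voice[: end - start]
        return np.clip(target, -1.0, 1.0)

    # Example: a 30 s music bed whose key point sits at 12.5 s.
    sr = 44100
    music = np.zeros(30 * sr, dtype=np.float32)
    voice = 0.1 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr).astype(np.float32)
    target_music = synthesize_at_key_point(music, voice, key_point_sec=12.5, sr=sr)
    ```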
  • Patent number: 12198673
    Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, the wavetables are used to initialize another machine learning model, which helps reduce the computational complexity of the audio synthesis obtained as that model's output. (A sketch of the wavetable bank follows this entry.)
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: January 14, 2025
    Assignee: LEMON INC.
    Inventors: Lamtharn Hantrakul, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
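    As a rough illustration of the N×L learnable wavetable dictionary, here is a minimal PyTorch sketch; the nearest-neighbor lookup, the softmax mixing weights, and all sizes are assumptions rather than the patented design.

    ```python
    # Sketch of an N x L bank of learnable single-cycle wavetables.
    import torch
    import torch.nn as nn

    class WavetableBank(nn.Module):
        def __init__(self, n_tables: int = 16, table_len: int = 512):
            super().__init__()
            # The N x L learnable parameters: one waveform per wavetable.
            self.tables = nn.Parameter(torch.randn(n_tables, table_len) * 0.01)

        def forward(self, phase: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
            """Read every table at `phase` (in [0, 1)) and mix with `weights`."""
            n, l = self.tables.shape
            idx = (phase * l).long() % l          # nearest-neighbor table lookup
            per_table = self.tables[:, idx]       # (N, T) waveforms
            return weights @ per_table            # (T,) mixed output

    bank = WavetableBank()
    t = torch.arange(16000) / 16000.0
    phase = (440.0 * t) % 1.0                         # phase ramp for a 440 Hz tone
    weights = torch.softmax(torch.randn(16), dim=0)   # e.g. from a timbre embedding
    audio = bank(phase, weights)
    ```

    Because `tables` is an `nn.Parameter`, gradients flow through the lookup-and-mix, which is what makes the synthesizer differentiable and the resulting dictionary reusable for other audio tasks.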
  • Publication number: 20240371345
    Abstract: Embodiments of the present disclosure relate to a music generation method, apparatus, system and storage medium. In at least some embodiments of the present disclosure, a music generation interface including a text input box, a music generation control and a music configuration item is displayed in response to a user operation, so that the user can input a custom text in the text input box and configure a music melody through the music configuration item. Then, in response to an operation by the user triggering the music generation control, a voice is generated based on the custom text input by the user, and a music including the voice corresponding to the custom text is generated based on the generated voice and the user-configured music melody. (A minimal flow sketch follows this entry.)
    Type: Application
    Filed: April 27, 2023
    Publication date: November 7, 2024
    Inventors: Yufan XUE, Qiang ZHENG, Dong NIU, Liangqin XU, Xiaochan WANG, Jitong CHEN, Bochen LI, Naihan LI
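    The interaction flow described here can be modeled with a small, entirely hypothetical sketch; the configuration class, stub synthesis functions, and lengths below are illustrative only.

    ```python
    # Hypothetical model of the generate-on-trigger flow: the interface
    # gathers a custom text and a melody configuration, and triggering the
    # music generation control produces a voice and mixes it with the melody.
    from dataclasses import dataclass

    @dataclass
    class MusicConfig:
        custom_text: str       # from the text input box
        melody_preset: str     # from the music configuration item

    def generate_voice(text: str) -> list[float]:
        return [0.0] * (len(text) * 800)          # stub: a real system runs TTS here

    def render_melody(preset: str, n_samples: int) -> list[float]:
        return [0.0] * n_samples                  # stub melody from the configuration

    def on_generate_triggered(cfg: MusicConfig) -> list[float]:
        voice = generate_voice(cfg.custom_text)
        melody = render_melody(cfg.melody_preset, max(len(voice), 48000))
        voice += [0.0] * (len(melody) - len(voice))
        return [v + m for v, m in zip(voice, melody)]

    song = on_generate_triggered(MusicConfig("hello world", "pop"))
    ```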
  • Publication number: 20240290306
    Abstract: The present disclosure relates to a song generation method, apparatus and system, and a storage medium. The song generation method includes acquiring a target lyric text input by a user; aligning the target lyric text with a singing melody of an initial song, to determine a correspondence between text units in the target lyric text and notes in the singing melody, wherein the singing melody is the melody sung for the initial lyrics in the initial song; performing voice synthesis on the target lyric text based on the correspondence between the text units in the target lyric text and the notes in the singing melody, to obtain a singing voice singing the target lyric text with the singing melody; and combining the singing voice with an accompaniment audio of the initial song to generate a target song. (A toy alignment sketch follows this entry.)
    Type: Application
    Filed: May 8, 2023
    Publication date: August 29, 2024
    Inventors: Yilin ZHANG, Bochen LI, Vibert THIO, Shizhu LIU, Jitong CHEN, Naihan LI, Yuping WANG
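    A toy sketch of the alignment step, under heavy assumptions (one text unit per note, cyclic reuse when counts differ; real alignment would respect phrasing and durations):

    ```python
    # Distribute text units over the notes of the initial song's singing melody.
    def align_lyrics_to_melody(text_units: list[str],
                               notes: list[tuple[int, float]]) -> list[tuple[str, int, float]]:
        """notes are (midi_pitch, duration_sec); returns (unit, pitch, duration)."""
        return [(unit, *notes[i % len(notes)]) for i, unit in enumerate(text_units)]

    melody = [(60, 0.5), (62, 0.5), (64, 1.0)]    # C4, D4, E4
    alignment = align_lyrics_to_melody(["twin", "kle", "twin", "kle"], melody)
    # -> [('twin', 60, 0.5), ('kle', 62, 0.5), ('twin', 64, 1.0), ('kle', 60, 0.5)]
    ```

    The resulting (text unit, note) pairs are what the voice-synthesis step would consume to sing the target lyrics with the original melody.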
  • Patent number: 12040000
    Abstract: The present application provides a special effect processing method and apparatus. The method includes: generating an audio signal in response to a touch operation of a user during playback of a video; segmenting the audio signal into multiple audio frames; and performing, according to attributes of the audio frames, special effect processing on the picture currently being played in the video. (An illustrative frame-analysis sketch follows this entry.)
    Type: Grant
    Filed: August 7, 2023
    Date of Patent: July 16, 2024
    Assignee: LEMON INC.
    Inventors: Chenyu Sun, Jitong Chen, Nathanael Schager, Maryyann Crichton, Josiah John Serrano, Bochen Li, Xuefan Hu, Fraser Smith, Hwankyoo Shawn Kim, David Trevelyan, Suiyu Feng, Brandon Wu, Tao Xiong
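    A hedged sketch of the frame-attribute step: the frame size and the RMS-to-strength mapping below are assumptions, not the patent's parameters.

    ```python
    # Segment the touch-generated audio into frames, compute a per-frame
    # attribute (RMS energy), and map it to a special-effect strength for
    # the currently played picture.
    import numpy as np

    def frame_attributes(audio: np.ndarray, frame_len: int = 1024) -> np.ndarray:
        n_frames = len(audio) // frame_len
        frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
        return np.sqrt((frames ** 2).mean(axis=1))    # RMS per frame

    def effect_strength(rms: np.ndarray) -> np.ndarray:
        # Louder frames drive a stronger effect on the current picture.
        return np.clip(rms / (rms.max() + 1e-8), 0.0, 1.0)

    audio = np.random.uniform(-1, 1, 44100).astype(np.float32)
    strength_per_frame = effect_strength(frame_attributes(audio))
    ```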
  • Publication number: 20230419930
    Abstract: A music generation system is provided comprising a processor, and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates, and a music generation program stored in the memory and executed by the processor. The program is configured to receive a user input of lyrics, identify a plurality of syllables in the lyrics, determine a syllable pattern in the identified plurality of syllables, match the syllable pattern to a selected rhythm template of the plurality of rhythm templates, generate a melody based on the selected rhythm template, generate a music file encoding the melody and the lyrics, and output the music file. (A toy matching sketch follows this entry.)
    Type: Application
    Filed: June 24, 2022
    Publication date: December 28, 2023
    Inventors: Yilin ZHANG, Andrew SHAW, Jitong CHEN
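    A toy sketch of the pattern-matching step; the template database, key format, and fallback metric are illustrative assumptions.

    ```python
    # Match a lyric's syllable pattern to a stored rhythm template.
    RHYTHM_TEMPLATES = {
        "4-4-4-4": [[1.0, 1.0, 1.0, 1.0]] * 4,   # one beat per syllable, 4 lines
        "6-6":     [[0.5] * 6] * 2,
    }

    def syllable_pattern(lines: list[list[str]]) -> str:
        return "-".join(str(len(line)) for line in lines)

    def match_template(lines: list[list[str]]) -> list[list[float]]:
        pattern = syllable_pattern(lines)
        if pattern in RHYTHM_TEMPLATES:
            return RHYTHM_TEMPLATES[pattern]
        # Fallback: template with the closest total syllable count.
        total = sum(len(line) for line in lines)
        key = min(RHYTHM_TEMPLATES,
                  key=lambda k: abs(sum(map(int, k.split("-"))) - total))
        return RHYTHM_TEMPLATES[key]

    lyrics = [["twin", "kle", "twin", "kle"], ["lit", "tle", "star", "shines"]]
    rhythm = match_template(lyrics)   # note durations that drive melody generation
    ```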
  • Publication number: 20230377608
    Abstract: The present application provides a special effect processing method and apparatus. The method includes: generating an audio signal in response to a touch operation of a user during playback of a video; segmenting the audio signal into multiple audio frames; and performing, according to attributes of the audio frames, special effect processing on the picture currently being played in the video.
    Type: Application
    Filed: August 7, 2023
    Publication date: November 23, 2023
    Inventors: Chenyu SUN, Jitong CHEN, Nathanael SCHAGER, Maryyann CRICHTON, Josiah John SERRANO, Bochen LI, Xuefan HU, Fraser SMITH, Hwankyoo Shawn KIM, David TREVELYAN, Suiyu FENG, Brandon WU, Tao XIONG
  • Publication number: 20230360618
    Abstract: Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received, and each is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined, and the two tracks are aligned based on the determined structures. The vocal component of the first audio track is stretched to match a tempo of the second audio track, and the stretched vocal component is added to the one or more accompaniment components of the second audio track. (A sketch of the tempo-matching step follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Vibert THIO, Bochen LI, Haonan CHEN, Jitong CHEN
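    The stretch ratio in the tempo-matching step follows directly from the two tempos. The sketch below uses plain linear resampling so it stays dependency-light; a real system would use a pitch-preserving time-stretch (e.g. a phase vocoder), and all names here are assumptions.

    ```python
    # Stretch track A's vocal component to match track B's tempo, then add
    # it to track B's accompaniment.
    import numpy as np

    def stretch_vocal(vocal: np.ndarray, src_bpm: float, dst_bpm: float) -> np.ndarray:
        rate = dst_bpm / src_bpm                       # > 1 means speed up
        n_out = int(len(vocal) / rate)
        src_idx = np.linspace(0, len(vocal) - 1, n_out)
        return np.interp(src_idx, np.arange(len(vocal)), vocal)

    vocal_a = np.random.randn(44100).astype(np.float32)    # vocals of track A (90 BPM)
    accomp_b = np.zeros(2 * 44100, dtype=np.float32)       # accompaniment of track B (120 BPM)
    stretched = stretch_vocal(vocal_a, src_bpm=90, dst_bpm=120).astype(np.float32)
    mix = accomp_b.copy()
    mix[: len(stretched)] += stretched
    ```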
  • Publication number: 20230360619
    Abstract: In examples, a method for generating a remixed audio sample is provided. The method may include receiving an audio portion, obtaining metadata from the received audio portion, analyzing the metadata, and generating a symbolic music representation based on the analyzed metadata. In some examples, a selection of a style asset is received and applied to the symbolic music representation. Accordingly, a remixed audio portion may be rendered based on the stylized symbolic representation. That is, metadata associated with a song or song portion may be analyzed to identify a tempo, key, structure, chord progression, etc., such that a remixed version of the song can be provided with customized instrumental arrangements and styles. (A hypothetical sketch of this flow follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Vibert THIO, Haonan CHEN, Xuefan HU, Jitong CHEN
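    A hypothetical sketch of the metadata-to-remix flow; the symbolic structure, the style table, and the tempo-scaling rule are invented for illustration.

    ```python
    # Metadata becomes a symbolic representation; a selected style asset is
    # applied before the stylized representation is rendered to audio.
    from dataclasses import dataclass, field

    @dataclass
    class SymbolicMusic:
        tempo: float
        key: str
        chords: list[str]
        instruments: list[str] = field(default_factory=list)

    STYLE_ASSETS = {
        "lofi":  {"tempo_scale": 0.85, "instruments": ["e-piano", "vinyl-kit"]},
        "disco": {"tempo_scale": 1.10, "instruments": ["strings", "funk-bass"]},
    }

    def apply_style(symbolic: SymbolicMusic, style: str) -> SymbolicMusic:
        asset = STYLE_ASSETS[style]
        return SymbolicMusic(tempo=symbolic.tempo * asset["tempo_scale"],
                             key=symbolic.key,
                             chords=list(symbolic.chords),
                             instruments=list(asset["instruments"]))

    meta = SymbolicMusic(tempo=120.0, key="C", chords=["C", "Am", "F", "G"])
    stylized = apply_style(meta, "lofi")   # then rendered as the remixed audio portion
    ```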
  • Publication number: 20230360620
    Abstract: In examples, a method for converting audio samples to full song arrangements is provided. The method includes receiving audio sample data, determining a melodic transcription based on the audio sample data, and determining a sequence of music chords based on the melodic transcription. The method further includes generating a full song arrangement based on the sequence of music chords and the audio sample data. (A toy chord-inference sketch follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Andrew SHAW, Jitong CHEN
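    A toy sketch of the chord-determination step: per bar, pick the C-major diatonic triad covering the most melody pitches. Real chord estimation is far richer; this only shows the melody-to-chords-to-arrangement data flow, and every structure below is an assumption.

    ```python
    # Infer one chord per bar of the melodic transcription.
    TRIADS = {"C": {0, 4, 7}, "Dm": {2, 5, 9}, "Em": {4, 7, 11},
              "F": {5, 9, 0}, "G": {7, 11, 2}, "Am": {9, 0, 4}}

    def chords_from_melody(bars: list[list[int]]) -> list[str]:
        chords = []
        for bar in bars:
            pitch_classes = [p % 12 for p in bar]
            chords.append(max(TRIADS, key=lambda c: sum(pc in TRIADS[c]
                                                        for pc in pitch_classes)))
        return chords

    melody_bars = [[60, 64, 67, 64], [65, 69, 72, 69]]   # C-major, then F-major material
    progression = chords_from_melody(melody_bars)        # -> ['C', 'F']
    ```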
  • Publication number: 20230282188
    Abstract: Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output. (A decoding sketch follows this entry.)
    Type: Application
    Filed: March 7, 2022
    Publication date: September 7, 2023
    Inventors: Bochen Li, Rodrigo Castellon, Daiyu Zhang, Jitong Chen
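    A sketch of the decoding step, with an assumed class set, frame rate, threshold, and refractory period; the model's actual decoding scheme is not specified in the abstract.

    ```python
    # Turn the per-class activation map into a time-stamped beatbox transcript.
    import numpy as np

    CLASSES = ["kick", "snare", "hi-hat"]

    def decode_activations(act: np.ndarray, fps: float = 100.0,
                           thresh: float = 0.5, min_gap_s: float = 0.05):
        """act: (n_classes, n_frames) in [0, 1] -> sorted (time_sec, label) list."""
        transcript, min_gap = [], int(min_gap_s * fps)
        for c, row in enumerate(act):
            last = -min_gap
            for t in np.flatnonzero(row > thresh):
                if t - last >= min_gap:          # suppress duplicate onsets
                    transcript.append((t / fps, CLASSES[c]))
                    last = t
        return sorted(transcript)

    act = np.zeros((3, 400))
    act[0, [10, 11, 210]] = 0.9                  # kick activations (frame 11 is a duplicate)
    act[1, 110] = 0.8                            # one snare activation
    print(decode_activations(act))               # kicks at 0.10 s / 2.10 s, snare at 1.10 s
    ```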
  • Publication number: 20230197040
    Abstract: A method for generating an audio output is described. Image inputs of interactive movements by a user, captured by an image sensor, are received. The interactive movements are mapped to a sequence of audio element identifiers. The sequence of audio element identifiers is processed to generate a musical sequence by enforcing music theory rules on the sequence of audio element identifiers. An audio output that represents the musical sequence is generated. (A rule-enforcement sketch follows this entry.)
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: Bochen Li, Daiyu Zhang, Shawn Chan Zhen Yi, Jitong Chen
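    One plausible reading of "music theory rule enforcement" is snapping out-of-scale pitches to a scale; the sketch below implements only that single, assumed rule.

    ```python
    # Map gesture-derived audio element identifiers (as MIDI pitches) into
    # C major before generating the musical sequence.
    C_MAJOR = [0, 2, 4, 5, 7, 9, 11]

    def enforce_scale(pitches: list[int]) -> list[int]:
        out = []
        for p in pitches:
            pc = p % 12
            nearest = min(C_MAJOR,
                          key=lambda d: min(abs(d - pc), 12 - abs(d - pc)))
            out.append(p - pc + nearest)
        return out

    gesture_pitches = [60, 61, 66, 70]                 # C4, C#4, F#4, A#4
    musical_sequence = enforce_scale(gesture_pitches)  # -> [60, 60, 65, 69]
    ```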
  • Publication number: 20230154451
    Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 18, 2023
    Inventors: Lamtharn HANTRAKUL, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
  • Patent number: 11482207
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled. (A sketch of the closed-form KL follows this entry.)
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: October 25, 2022
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
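    The KL divergence between two Gaussians indeed has a standard closed form, which is what makes this distillation efficient. The sketch below shows that formula plus a squared log-scale regularization term of the kind described in the related ClariNet work; the regularizer's exact form and weight are assumptions about the patent.

    ```python
    # Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) between the
    # student (IAF) and teacher (WaveNet) output distributions, plus a
    # log-scale regularizer (weight `lam` is a tunable assumption).
    import math

    def kl_gaussians(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
        return (math.log(sigma_p / sigma_q)
                + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
                - 0.5)

    def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam: float = 4.0) -> float:
        reg = (math.log(sigma_p) - math.log(sigma_q)) ** 2
        return lam * reg + kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)

    # Per-timestep distillation loss for one (student, teacher) prediction:
    loss = regularized_kl(mu_q=0.1, sigma_q=0.2, mu_p=0.0, sigma_p=0.25)
    ```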
  • Patent number: 11238843
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of the naturalness of the speech and its similarity to the original speaker, even with very few cloning audios. (A speaker-encoder sketch follows this entry.)
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
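    A highly simplified sketch of the speaker-encoding approach: a separate encoder pools per-frame features of a few cloning audios into a single speaker embedding that conditions a multi-speaker generative model. The tiny network below is a stand-in, not the patented architecture.

    ```python
    # Infer one speaker embedding from a handful of cloning samples.
    import torch
    import torch.nn as nn

    class SpeakerEncoder(nn.Module):
        def __init__(self, n_mels: int = 80, emb_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_mels, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))

        def forward(self, mels: torch.Tensor) -> torch.Tensor:
            """mels: (n_clips, n_frames, n_mels) -> one (emb_dim,) embedding."""
            per_frame = self.net(mels)            # (n_clips, n_frames, emb_dim)
            per_clip = per_frame.mean(dim=1)      # temporal pooling
            return per_clip.mean(dim=0)           # average over cloning samples

    encoder = SpeakerEncoder()
    cloning_mels = torch.randn(3, 200, 80)        # three short cloning audios
    speaker_embedding = encoder(cloning_mels)     # conditions the multi-speaker TTS
    ```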
  • Publication number: 20210110810
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN
  • Patent number: 10872596
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
  • Publication number: 20190251952
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of the naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
    Type: Application
    Filed: September 26, 2018
    Publication date: August 15, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan O. ARIK, Jitong CHEN, Kainan PENG, Wei PING, Yanqi ZHOU
  • Publication number: 20190180732
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: February 15, 2019
    Publication date: June 13, 2019
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN