Patents by Inventor Jitong CHEN

Jitong CHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250069585
    Abstract: The present disclosure relates to a music generation method, apparatus and system, and storage medium. An embodiment of the present disclosure includes: obtaining text information and converting the text information into a corresponding voice audio; obtaining an initial music audio, wherein the initial music audio comprises a music key point and the music characteristics of the initial music audio change suddenly at the position of the music key point; and, on the basis of the position of the music key point, synthesizing the voice audio and the initial music audio to obtain a target music audio. In the target music audio, the voice audio appears at the position of the music key point of the initial music audio. Thus, a music audio is generated from text information, and the user can customize both the content of the text information and the initial music audio. (An illustrative code sketch follows this entry.)
    Type: Application
    Filed: April 27, 2023
    Publication date: February 27, 2025
    Inventors: Andrew SHAW, Yilin ZHANG, Jitong CHEN, Vibert THIO, Shawn Chan Zhen YI, Liangqin XU, Yufan XUE
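    The placement step this abstract describes (voice audio appearing at the music key point) can be illustrated with a minimal, purely hypothetical sketch; the function name, the additive mix, and the fixed sample rate are assumptions, not the patent's method.

    ```python
    # Hypothetical sketch: overlay the voice audio onto the initial music
    # audio so that it begins at the music key point.
    import numpy as np

    def synthesize_at_key_point(music: np.ndarray, voice: np.ndarray,
                                key_point_sec: float, sr: int = 44100) -> np.ndarray:
        """Mix `voice` into `music` starting at the key-point position."""
        start = int(key_point_sec * sr)
        end = min(start + len(voice), len(music))
        target = music.copy()
        # Simple additive mix; a production system would duck or crossfade.
        target[start:end] += voice[: end - start]
        return np.clip(target, -1.0, 1.0)

    # Example: a 30 s music bed whose key point sits at 12.5 s.
    sr = 44100
    music = np.zeros(30 * sr, dtype=np.float32)
    voice = 0.1 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr).astype(np.float32)
    target_music = synthesize_at_key_point(music, voice, key_point_sec=12.5, sr=sr)
    ```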
  • Patent number: 12198673
    Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks. Finally, the wavetables are used to initialize another machine learning model, which helps reduce the computational complexity of the audio synthesis obtained as that model's output. (A sketch of the wavetable bank follows this entry.)
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: January 14, 2025
    Assignee: LEMON INC.
    Inventors: Lamtharn Hantrakul, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
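    As a rough illustration of the N×L learnable wavetable dictionary, here is a minimal PyTorch sketch; the nearest-neighbor lookup, the softmax mixing weights, and all sizes are assumptions rather than the patented design.

    ```python
    # Sketch of an N x L bank of learnable single-cycle wavetables.
    import torch
    import torch.nn as nn

    class WavetableBank(nn.Module):
        def __init__(self, n_tables: int = 16, table_len: int = 512):
            super().__init__()
            # The N x L learnable parameters: one waveform per wavetable.
            self.tables = nn.Parameter(torch.randn(n_tables, table_len) * 0.01)

        def forward(self, phase: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
            """Read every table at `phase` (in [0, 1)) and mix with `weights`."""
            n, l = self.tables.shape
            idx = (phase * l).long() % l          # nearest-neighbor table lookup
            per_table = self.tables[:, idx]       # (N, T) waveforms
            return weights @ per_table            # (T,) mixed output

    bank = WavetableBank()
    t = torch.arange(16000) / 16000.0
    phase = (440.0 * t) % 1.0                         # phase ramp for a 440 Hz tone
    weights = torch.softmax(torch.randn(16), dim=0)   # e.g. from a timbre embedding
    audio = bank(phase, weights)
    ```

    Because `tables` is an `nn.Parameter`, gradients flow through the lookup-and-mix, which is what makes the synthesizer differentiable and the resulting dictionary reusable for other audio tasks.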
  • Publication number: 20240371345
    Abstract: Embodiments of the present disclosure relate to a music generation method, apparatus, system and storage medium. In at least some embodiments of the present disclosure, a music generation interface including a text input box, a music generation control and a music configuration item is displayed in response to a user operation, so that the user can input a custom text in the text input box and configure a music melody through the music configuration item. Then, in response to an operation by the user triggering the music generation control, a voice is generated based on the custom text input by the user, and a music including the voice corresponding to the custom text is generated based on the generated voice and the user-configured music melody. (A minimal flow sketch follows this entry.)
    Type: Application
    Filed: April 27, 2023
    Publication date: November 7, 2024
    Inventors: Yufan XUE, Qiang ZHENG, Dong NIU, Liangqin XU, Xiaochan WANG, Jitong CHEN, Bochen LI, Naihan LI
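    The interaction flow described here can be modeled with a small, entirely hypothetical sketch; the configuration class, stub synthesis functions, and lengths below are illustrative only.

    ```python
    # Hypothetical model of the generate-on-trigger flow: the interface
    # gathers a custom text and a melody configuration, and triggering the
    # music generation control produces a voice and mixes it with the melody.
    from dataclasses import dataclass

    @dataclass
    class MusicConfig:
        custom_text: str       # from the text input box
        melody_preset: str     # from the music configuration item

    def generate_voice(text: str) -> list[float]:
        return [0.0] * (len(text) * 800)          # stub: a real system runs TTS here

    def render_melody(preset: str, n_samples: int) -> list[float]:
        return [0.0] * n_samples                  # stub melody from the configuration

    def on_generate_triggered(cfg: MusicConfig) -> list[float]:
        voice = generate_voice(cfg.custom_text)
        melody = render_melody(cfg.melody_preset, max(len(voice), 48000))
        voice += [0.0] * (len(melody) - len(voice))
        return [v + m for v, m in zip(voice, melody)]

    song = on_generate_triggered(MusicConfig("hello world", "pop"))
    ```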
  • Publication number: 20240290306
    Abstract: The present disclosure relates to a song generation method, apparatus and system, and a storage medium. The song generation method includes acquiring a target lyric text input by a user; aligning the target lyric text with a singing melody of an initial song, to determine a correspondence between text units in the target lyric text and notes in the singing melody, wherein the singing melody is the melody sung for the initial lyrics in the initial song; performing voice synthesis on the target lyric text based on the correspondence between the text units in the target lyric text and the notes in the singing melody, to obtain a singing voice singing the target lyric text with the singing melody; and combining the singing voice with an accompaniment audio of the initial song to generate a target song. (A toy alignment sketch follows this entry.)
    Type: Application
    Filed: May 8, 2023
    Publication date: August 29, 2024
    Inventors: Yilin ZHANG, Bochen LI, Vibert THIO, Shizhu LIU, Jitong CHEN, Naihan LI, Yuping WANG
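    A toy sketch of the alignment step, under heavy assumptions (one text unit per note, cyclic reuse when counts differ; real alignment would respect phrasing and durations):

    ```python
    # Distribute text units over the notes of the initial song's singing melody.
    def align_lyrics_to_melody(text_units: list[str],
                               notes: list[tuple[int, float]]) -> list[tuple[str, int, float]]:
        """notes are (midi_pitch, duration_sec); returns (unit, pitch, duration)."""
        return [(unit, *notes[i % len(notes)]) for i, unit in enumerate(text_units)]

    melody = [(60, 0.5), (62, 0.5), (64, 1.0)]    # C4, D4, E4
    alignment = align_lyrics_to_melody(["twin", "kle", "twin", "kle"], melody)
    # -> [('twin', 60, 0.5), ('kle', 62, 0.5), ('twin', 64, 1.0), ('kle', 60, 0.5)]
    ```

    The resulting (text unit, note) pairs are what the voice-synthesis step would consume to sing the target lyrics with the original melody.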
  • Patent number: 12040000
    Abstract: The present application provides a special effect processing method and apparatus. The method includes: generating an audio signal in response to a touch operation of a user during playback of a video; segmenting the audio signal into multiple audio frames; and performing, according to attributes of the audio frames, special effect processing on the picture currently being played in the video. (An illustrative frame-analysis sketch follows this entry.)
    Type: Grant
    Filed: August 7, 2023
    Date of Patent: July 16, 2024
    Assignee: LEMON INC.
    Inventors: Chenyu Sun, Jitong Chen, Nathanael Schager, Maryyann Crichton, Josiah John Serrano, Bochen Li, Xuefan Hu, Fraser Smith, Hwankyoo Shawn Kim, David Trevelyan, Suiyu Feng, Brandon Wu, Tao Xiong
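    A hedged sketch of the frame-attribute step: the frame size and the RMS-to-strength mapping below are assumptions, not the patent's parameters.

    ```python
    # Segment the touch-generated audio into frames, compute a per-frame
    # attribute (RMS energy), and map it to a special-effect strength for
    # the currently played picture.
    import numpy as np

    def frame_attributes(audio: np.ndarray, frame_len: int = 1024) -> np.ndarray:
        n_frames = len(audio) // frame_len
        frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
        return np.sqrt((frames ** 2).mean(axis=1))    # RMS per frame

    def effect_strength(rms: np.ndarray) -> np.ndarray:
        # Louder frames drive a stronger effect on the current picture.
        return np.clip(rms / (rms.max() + 1e-8), 0.0, 1.0)

    audio = np.random.uniform(-1, 1, 44100).astype(np.float32)
    strength_per_frame = effect_strength(frame_attributes(audio))
    ```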
  • Publication number: 20230419930
    Abstract: A music generation system is provided comprising a processor, and a memory operatively coupled to the processor and storing a rhythm template database comprising a plurality of rhythm templates, and a music generation program stored in the memory and executed by the processor. The program is configured to receive a user input of lyrics, identify a plurality of syllables in the lyrics, determine a syllable pattern in the identified plurality of syllables, match the syllable pattern to a selected rhythm template of the plurality of rhythm templates, generate a melody based on the selected rhythm template, generate a music file encoding the melody and the lyrics, and output the music file. (A toy matching sketch follows this entry.)
    Type: Application
    Filed: June 24, 2022
    Publication date: December 28, 2023
    Inventors: Yilin ZHANG, Andrew SHAW, Jitong CHEN
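    A toy sketch of the pattern-matching step; the template database, key format, and fallback metric are illustrative assumptions.

    ```python
    # Match a lyric's syllable pattern to a stored rhythm template.
    RHYTHM_TEMPLATES = {
        "4-4-4-4": [[1.0, 1.0, 1.0, 1.0]] * 4,   # one beat per syllable, 4 lines
        "6-6":     [[0.5] * 6] * 2,
    }

    def syllable_pattern(lines: list[list[str]]) -> str:
        return "-".join(str(len(line)) for line in lines)

    def match_template(lines: list[list[str]]) -> list[list[float]]:
        pattern = syllable_pattern(lines)
        if pattern in RHYTHM_TEMPLATES:
            return RHYTHM_TEMPLATES[pattern]
        # Fallback: template with the closest total syllable count.
        total = sum(len(line) for line in lines)
        key = min(RHYTHM_TEMPLATES,
                  key=lambda k: abs(sum(map(int, k.split("-"))) - total))
        return RHYTHM_TEMPLATES[key]

    lyrics = [["twin", "kle", "twin", "kle"], ["lit", "tle", "star", "shines"]]
    rhythm = match_template(lyrics)   # note durations that drive melody generation
    ```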
  • Publication number: 20230377608
    Abstract: The present application provides a special effect processing method and apparatus. The method includes: generating an audio signal in response to a touch operation of a user during playback of a video; segmenting the audio signal into multiple audio frames; and performing, according to attributes of the audio frames, special effect processing on the picture currently being played in the video.
    Type: Application
    Filed: August 7, 2023
    Publication date: November 23, 2023
    Inventors: Chenyu SUN, Jitong CHEN, Nathanael SCHAGER, Maryyann CRICHTON, Josiah John SERRANO, Bochen LI, Xuefan HU, Fraser SMITH, Hwankyoo Shawn KIM, David TREVELYAN, Suiyu FENG, Brandon WU, Tao XIONG
  • Publication number: 20230360618
    Abstract: Systems and methods directed to combining audio tracks are provided. More specifically, a first audio track and a second audio track are received, and each is separated into a vocal component and one or more accompaniment components. A structure of the first audio track and a structure of the second audio track are determined, and the two tracks are aligned based on the determined structures. The vocal component of the first audio track is stretched to match a tempo of the second audio track, and the stretched vocal component is added to the one or more accompaniment components of the second audio track. (A sketch of the tempo-matching step follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Vibert THIO, Bochen LI, Haonan CHEN, Jitong CHEN
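    The stretch ratio in the tempo-matching step follows directly from the two tempos. The sketch below uses plain linear resampling so it stays dependency-light; a real system would use a pitch-preserving time-stretch (e.g. a phase vocoder), and all names here are assumptions.

    ```python
    # Stretch track A's vocal component to match track B's tempo, then add
    # it to track B's accompaniment.
    import numpy as np

    def stretch_vocal(vocal: np.ndarray, src_bpm: float, dst_bpm: float) -> np.ndarray:
        rate = dst_bpm / src_bpm                       # > 1 means speed up
        n_out = int(len(vocal) / rate)
        src_idx = np.linspace(0, len(vocal) - 1, n_out)
        return np.interp(src_idx, np.arange(len(vocal)), vocal)

    vocal_a = np.random.randn(44100).astype(np.float32)    # vocals of track A (90 BPM)
    accomp_b = np.zeros(2 * 44100, dtype=np.float32)       # accompaniment of track B (120 BPM)
    stretched = stretch_vocal(vocal_a, src_bpm=90, dst_bpm=120).astype(np.float32)
    mix = accomp_b.copy()
    mix[: len(stretched)] += stretched
    ```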
  • Publication number: 20230360619
    Abstract: In examples, a method for generating a remixed audio sample is provided. The method may include receiving an audio portion, obtaining metadata from the received audio portion, analyzing the metadata, and generating a symbolic music representation based on the analyzed metadata. In some examples, a selection of a style asset is received and applied to the symbolic music representation. Accordingly, a remixed audio portion may be rendered based on the stylized symbolic representation. That is, metadata associated with a song or song portion may be analyzed to identify a tempo, key, structure, chord progression, etc., such that a remixed version of the song can be provided with customized instrumental arrangements and styles. (A hypothetical sketch of this flow follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Vibert THIO, Haonan CHEN, Xuefan HU, Jitong CHEN
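    A hypothetical sketch of the metadata-to-remix flow; the symbolic structure, the style table, and the tempo-scaling rule are invented for illustration.

    ```python
    # Metadata becomes a symbolic representation; a selected style asset is
    # applied before the stylized representation is rendered to audio.
    from dataclasses import dataclass, field

    @dataclass
    class SymbolicMusic:
        tempo: float
        key: str
        chords: list[str]
        instruments: list[str] = field(default_factory=list)

    STYLE_ASSETS = {
        "lofi":  {"tempo_scale": 0.85, "instruments": ["e-piano", "vinyl-kit"]},
        "disco": {"tempo_scale": 1.10, "instruments": ["strings", "funk-bass"]},
    }

    def apply_style(symbolic: SymbolicMusic, style: str) -> SymbolicMusic:
        asset = STYLE_ASSETS[style]
        return SymbolicMusic(tempo=symbolic.tempo * asset["tempo_scale"],
                             key=symbolic.key,
                             chords=list(symbolic.chords),
                             instruments=list(asset["instruments"]))

    meta = SymbolicMusic(tempo=120.0, key="C", chords=["C", "Am", "F", "G"])
    stylized = apply_style(meta, "lofi")   # then rendered as the remixed audio portion
    ```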
  • Publication number: 20230360620
    Abstract: In examples, a method for converting audio samples to full song arrangements is provided. The method includes receiving audio sample data, determining a melodic transcription based on the audio sample data, and determining a sequence of music chords based on the melodic transcription. The method further includes generating a full song arrangement based on the sequence of music chords and the audio sample data. (A toy chord-inference sketch follows this entry.)
    Type: Application
    Filed: May 5, 2022
    Publication date: November 9, 2023
    Inventors: Bochen LI, Andrew SHAW, Jitong CHEN
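    A toy sketch of the chord-determination step: per bar, pick the C-major diatonic triad covering the most melody pitches. Real chord estimation is far richer; this only shows the melody-to-chords-to-arrangement data flow, and every structure below is an assumption.

    ```python
    # Infer one chord per bar of the melodic transcription.
    TRIADS = {"C": {0, 4, 7}, "Dm": {2, 5, 9}, "Em": {4, 7, 11},
              "F": {5, 9, 0}, "G": {7, 11, 2}, "Am": {9, 0, 4}}

    def chords_from_melody(bars: list[list[int]]) -> list[str]:
        chords = []
        for bar in bars:
            pitch_classes = [p % 12 for p in bar]
            chords.append(max(TRIADS, key=lambda c: sum(pc in TRIADS[c]
                                                        for pc in pitch_classes)))
        return chords

    melody_bars = [[60, 64, 67, 64], [65, 69, 72, 69]]   # C-major, then F-major material
    progression = chords_from_melody(melody_bars)        # -> ['C', 'F']
    ```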
  • Publication number: 20230282188
    Abstract: Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for the plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output. (A decoding sketch follows this entry.)
    Type: Application
    Filed: March 7, 2022
    Publication date: September 7, 2023
    Inventors: Bochen Li, Rodrigo Castellon, Daiyu Zhang, Jitong Chen
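    A sketch of the decoding step, with an assumed class set, frame rate, threshold, and refractory period; the model's actual decoding scheme is not specified in the abstract.

    ```python
    # Turn the per-class activation map into a time-stamped beatbox transcript.
    import numpy as np

    CLASSES = ["kick", "snare", "hi-hat"]

    def decode_activations(act: np.ndarray, fps: float = 100.0,
                           thresh: float = 0.5, min_gap_s: float = 0.05):
        """act: (n_classes, n_frames) in [0, 1] -> sorted (time_sec, label) list."""
        transcript, min_gap = [], int(min_gap_s * fps)
        for c, row in enumerate(act):
            last = -min_gap
            for t in np.flatnonzero(row > thresh):
                if t - last >= min_gap:          # suppress duplicate onsets
                    transcript.append((t / fps, CLASSES[c]))
                    last = t
        return sorted(transcript)

    act = np.zeros((3, 400))
    act[0, [10, 11, 210]] = 0.9                  # kick activations (frame 11 is a duplicate)
    act[1, 110] = 0.8                            # one snare activation
    print(decode_activations(act))               # kicks at 0.10 s / 2.10 s, snare at 1.10 s
    ```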
  • Publication number: 20230197040
    Abstract: A method for generating an audio output is described. Image inputs of interactive movements by a user, captured by an image sensor, are received. The interactive movements are mapped to a sequence of audio element identifiers. The sequence of audio element identifiers is processed to generate a musical sequence by enforcing music theory rules on the sequence of audio element identifiers. An audio output that represents the musical sequence is generated. (A rule-enforcement sketch follows this entry.)
    Type: Application
    Filed: December 20, 2021
    Publication date: June 22, 2023
    Inventors: Bochen Li, Daiyu Zhang, Shawn Chan Zhen Yi, Jitong Chen
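    One plausible reading of "music theory rule enforcement" is snapping out-of-scale pitches to a scale; the sketch below implements only that single, assumed rule.

    ```python
    # Map gesture-derived audio element identifiers (as MIDI pitches) into
    # C major before generating the musical sequence.
    C_MAJOR = [0, 2, 4, 5, 7, 9, 11]

    def enforce_scale(pitches: list[int]) -> list[int]:
        out = []
        for p in pitches:
            pc = p % 12
            nearest = min(C_MAJOR,
                          key=lambda d: min(abs(d - pc), 12 - abs(d - pc)))
            out.append(p - pc + nearest)
        return out

    gesture_pitches = [60, 61, 66, 70]                 # C4, C#4, F#4, A#4
    musical_sequence = enforce_scale(gesture_pitches)  # -> [60, 60, 65, 69]
    ```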
  • Publication number: 20230154451
    Abstract: The present disclosure describes techniques for a differentiable wavetable synthesizer. The techniques comprise extracting features from a dataset of sounds, wherein the features comprise at least a timbre embedding; inputting the features to a first machine learning model, wherein the first machine learning model is configured to extract a set of N×L learnable parameters, N represents a number of wavetables, and L represents a wavetable length; and outputting a plurality of wavetables, wherein each of the plurality of wavetables comprises a waveform associated with a unique timbre, the plurality of wavetables form a dictionary, and the plurality of wavetables are portable to perform audio-related tasks.
    Type: Application
    Filed: November 12, 2021
    Publication date: May 18, 2023
    Inventors: Lamtharn HANTRAKUL, Siyuan Shan, Jitong Chen, Matthew David Avent, David Trevelyan
  • Patent number: 11482207
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled. (A sketch of the closed-form KL follows this entry.)
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: October 25, 2022
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
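    The KL divergence between two Gaussians indeed has a standard closed form, which is what makes this distillation efficient. The sketch below shows that formula plus a squared log-scale regularization term of the kind described in the related ClariNet work; the regularizer's exact form and weight are assumptions about the patent.

    ```python
    # Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) between the
    # student (IAF) and teacher (WaveNet) output distributions, plus a
    # log-scale regularizer (weight `lam` is a tunable assumption).
    import math

    def kl_gaussians(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
        return (math.log(sigma_p / sigma_q)
                + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
                - 0.5)

    def regularized_kl(mu_q, sigma_q, mu_p, sigma_p, lam: float = 4.0) -> float:
        reg = (math.log(sigma_p) - math.log(sigma_q)) ** 2
        return lam * reg + kl_gaussians(mu_q, sigma_q, mu_p, sigma_p)

    # Per-timestep distillation loss for one (student, teacher) prediction:
    loss = regularized_kl(mu_q=0.1, sigma_q=0.2, mu_p=0.0, sigma_p=0.25)
    ```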
  • Patent number: 11238843
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of the naturalness of the speech and its similarity to the original speaker, even with very few cloning audios. (A speaker-encoder sketch follows this entry.)
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
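    A highly simplified sketch of the speaker-encoding approach: a separate encoder pools per-frame features of a few cloning audios into a single speaker embedding that conditions a multi-speaker generative model. The tiny network below is a stand-in, not the patented architecture.

    ```python
    # Infer one speaker embedding from a handful of cloning samples.
    import torch
    import torch.nn as nn

    class SpeakerEncoder(nn.Module):
        def __init__(self, n_mels: int = 80, emb_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_mels, 256), nn.ReLU(),
                                     nn.Linear(256, emb_dim))

        def forward(self, mels: torch.Tensor) -> torch.Tensor:
            """mels: (n_clips, n_frames, n_mels) -> one (emb_dim,) embedding."""
            per_frame = self.net(mels)            # (n_clips, n_frames, emb_dim)
            per_clip = per_frame.mean(dim=1)      # temporal pooling
            return per_clip.mean(dim=0)           # average over cloning samples

    encoder = SpeakerEncoder()
    cloning_mels = torch.randn(3, 200, 80)        # three short cloning audios
    speaker_embedding = encoder(cloning_mels)     # conditions the multi-speaker TTS
    ```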
  • Publication number: 20210110810
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN
  • Patent number: 10872596
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Wei Ping, Kainan Peng, Jitong Chen
  • Publication number: 20190251952
    Abstract: Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of the naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
    Type: Application
    Filed: September 26, 2018
    Publication date: August 15, 2019
    Applicant: Baidu USA LLC
    Inventors: Sercan O. ARIK, Jitong CHEN, Kainan PENG, Wei PING, Yanqi ZHOU
  • Publication number: 20190180732
    Abstract: Described herein are embodiments of an end-to-end text-to-speech (TTS) system with parallel wave generation. In one or more embodiments, a Gaussian inverse autoregressive flow is distilled from an autoregressive WaveNet by minimizing a novel regularized Kullback-Leibler (KL) divergence between their highly-peaked output distributions. Embodiments of the methodology compute the KL divergence in closed form, which simplifies the training process and provides very efficient distillation. Embodiments of a novel text-to-wave neural architecture for speech synthesis are also described, which are fully convolutional and enable fast end-to-end training from scratch. These embodiments significantly outperform the previous pipeline that connects a text-to-spectrogram model to a separately trained WaveNet. Also, a parallel waveform synthesizer embodiment conditioned on the hidden representation in an embodiment of this end-to-end model was successfully distilled.
    Type: Application
    Filed: February 15, 2019
    Publication date: June 13, 2019
    Applicant: Baidu USA LLC
    Inventors: Wei PING, Kainan PENG, Jitong CHEN