Patents by Inventor Chengzhu YU
Chengzhu YU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220317613
Abstract: The present disclosure provides a consumable chip, a consumable, and a communication method between an image forming apparatus and the consumable chip. The consumable chip can be installed on the consumable, and the consumable can be detachably installed on the image forming apparatus. The consumable chip includes a first storage unit and a chip control unit. The first storage unit is configured to store first fixed data representing attribute information of the consumable. The chip control unit is configured to receive an authentication request sent by the image forming apparatus, obtain the first fixed data and first variable data representing consumption information of the consumable, generate authentication data by performing a calculation on the first fixed data and the first variable data according to a first preset algorithm, and send the authentication data to the image forming apparatus for determining whether the consumable meets expectations.
Type: Application
Filed: March 25, 2022
Publication date: October 6, 2022
Inventors: Chengzhu YU, Dan NING
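The challenge–response exchange described above can be sketched in a few lines. The patent does not disclose the "first preset algorithm"; the keyed SHA-256 hash, the shared key, and the example field values below are all stand-ins chosen only to illustrate the chip/apparatus interface.

```python
import hashlib

def make_auth_data(fixed_data: bytes, variable_data: bytes, key: bytes) -> bytes:
    # Illustrative "first preset algorithm": hash the consumable's fixed
    # attribute data together with its variable consumption data.
    return hashlib.sha256(key + fixed_data + variable_data).digest()

def verify_consumable(fixed_data: bytes, variable_data: bytes,
                      key: bytes, auth_data: bytes) -> bool:
    # The image forming apparatus recomputes the value and compares it
    # to what the chip sent back.
    return make_auth_data(fixed_data, variable_data, key) == auth_data

# Chip side: respond to an authentication request.
chip_key = b"shared-secret"               # hypothetical shared key
fixed = b"model=X100;capacity=2600"       # attribute information (first fixed data)
variable = b"pages_printed=1042"          # consumption information (first variable data)
token = make_auth_data(fixed, variable, chip_key)

# Apparatus side: decide whether the consumable meets expectations.
print(verify_consumable(fixed, variable, chip_key, token))  # True
```

A real chip would use whatever algorithm and key material the vendor provisions; the point here is only that authentication binds both the fixed and the variable data.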
-
Patent number: 11430431
Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person using a speaking voice of the second person, associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
Type: Grant
Filed: February 6, 2020
Date of Patent: August 30, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
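The encode → align → recursively generate pipeline in this abstract can be sketched with toy scalars. Every function body below is a placeholder (the patent's encoder, aligner, and generator are neural networks); only the data flow — each mel frame depending on the previous frame, the aligned phoneme context, and the target speaker's sample — follows the abstract.

```python
def encode_context(phonemes):
    # Stand-in encoder: each phoneme id becomes a 1-D "embedding".
    return [p * 0.1 for p in phonemes]

def align(encoded, n_frames):
    # Stand-in alignment: spread the encoded phonemes evenly over
    # the target acoustic frames.
    return [encoded[i * len(encoded) // n_frames] for i in range(n_frames)]

def generate_mels(aligned, speaker_sample):
    # Recursive (autoregressive) generation: each frame depends on the
    # previous frame, the aligned phoneme, and the target speaker sample.
    mels, prev = [], 0.0
    for a in aligned:
        frame = 0.5 * prev + a + speaker_sample
        mels.append(frame)
        prev = frame
    return mels

phonemes = [3, 7, 5]     # singing voice of the first person (toy phoneme ids)
speaker_sample = 0.05    # speaking-voice feature of the second person (toy scalar)
aligned = align(encode_context(phonemes), n_frames=6)
mels = generate_mels(aligned, speaker_sample)
print(len(mels))  # 6
```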
-
Publication number: 20220189495
Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
Type: Application
Filed: March 4, 2022
Publication date: June 16, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Meng YU, Heng LU, Dong YU
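The band-split-then-downsample front end can be illustrated with a crude two-band example. The moving-average low-pass filter below is a stand-in for a real analysis filter bank, and the neural vocoder stage is omitted entirely; the sketch only shows why each band can run at a lower sample rate.

```python
import math

def lowpass(x, taps=8):
    # Crude FIR low-pass: a moving average (stand-in for a real filter bank).
    return [sum(x[max(0, i - taps + 1): i + 1]) / min(taps, i + 1)
            for i in range(len(x))]

def split_bands(x):
    low = lowpass(x)
    high = [a - b for a, b in zip(x, low)]  # complementary high band
    return low, high

def downsample(x, factor=2):
    # Each band carries less bandwidth, so it can run at a lower rate --
    # this is what lets the per-band vocoders operate cheaply in sync.
    return x[::factor]

sr = 16000
signal = [math.sin(2 * math.pi * 200 * n / sr) +        # low-frequency component
          0.3 * math.sin(2 * math.pi * 6000 * n / sr)   # high-frequency component
          for n in range(512)]
low, high = split_bands(signal)
bands = [downsample(low), downsample(high)]
print(len(bands), len(bands[0]))  # 2 256
```

In a real multi-band vocoder each downsampled band would be synthesized by its own (or a shared) neural vocoder and the bands recombined by an upsampling synthesis filter bank.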
-
Publication number: 20220180856
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
Type: Application
Filed: February 24, 2022
Publication date: June 9, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
-
Publication number: 20220115005
Abstract: Methods and apparatuses are provided for sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Jia CUI, Chao WENG, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
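The CTC branch of this training scheme can be made concrete on a toy input. The sketch below computes the CTC loss by brute-force enumeration of alignment paths (real systems use the forward-backward recursion; enumeration only works for tiny inputs), then combines it with a placeholder attention-branch loss. The 0.5/0.5 interpolation weight and the logits are illustrative, not values from the patent.

```python
import itertools, math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def collapse(path, blank=0):
    # CTC collapsing rule: merge consecutive repeats, then drop blanks.
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out

def ctc_loss_brute_force(logits, target, blank=0):
    # Negative log-probability of `target` under CTC, summing the
    # probability of every frame-level path that collapses to it.
    probs = [softmax(frame) for frame in logits]
    vocab = range(len(logits[0]))
    total = 0.0
    for path in itertools.product(vocab, repeat=len(logits)):
        if collapse(path, blank) == target:
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return -math.log(total)

# 4 frames, vocabulary {blank=0, 1, 2}; target label sequence [1, 2].
logits = [[0.1, 2.0, 0.1],
          [0.1, 2.0, 0.1],
          [0.1, 0.1, 2.0],
          [2.0, 0.1, 0.1]]
ctc = ctc_loss_brute_force(logits, target=[1, 2])
attention = 0.7                        # placeholder attention-branch loss
joint = 0.5 * ctc + 0.5 * attention    # branches trained independently, then combined
print(round(ctc, 3))
```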
-
Patent number: 11302301
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
Type: Grant
Filed: March 3, 2020
Date of Patent: April 12, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
-
Patent number: 11295751
Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
Type: Grant
Filed: September 20, 2019
Date of Patent: April 5, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Meng Yu, Heng Lu, Dong Yu
-
Patent number: 11257481
Abstract: Methods and apparatuses are provided for sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Grant
Filed: October 24, 2018
Date of Patent: February 22, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Patent number: 11257480
Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
Type: Grant
Filed: March 3, 2020
Date of Patent: February 22, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
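Pitch extraction is the step of this pipeline that is easiest to demonstrate. The classical autocorrelation method below is only a stand-in for the adversarially trained extraction the abstract describes; it recovers the fundamental frequency of a synthetic tone.

```python
import math

def estimate_pitch(x, sr, fmin=50, fmax=500):
    # Toy pitch extractor: pick the autocorrelation peak within the
    # lag range corresponding to [fmin, fmax] Hz.
    best_lag, best_corr = 0, float("-inf")
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        corr = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sr / best_lag

sr = 16000
f0 = 220.0  # A3, a plausible sung pitch
frame = [math.sin(2 * math.pi * f0 * n / sr) for n in range(1024)]
print(round(estimate_pitch(frame, sr)))  # close to 220
```

In the patented system the extracted pitch contour and timbre features would then condition a generator that renders the converted audio samples.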
-
Publication number: 20220036874
Abstract: A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
Type: Application
Filed: October 14, 2021
Publication date: February 3, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Heng LU, Chao WENG, Dong YU
-
Publication number: 20220027567
Abstract: A method and apparatus are provided for automatically predicting lexical sememes using a lexical dictionary, comprising inputting a word, retrieving the word's semantic definition and the sememes corresponding to the word from an online dictionary, setting each of the retrieved sememes as a candidate sememe, inputting the word's semantic definition and the candidate sememe, and estimating the probability that the candidate sememe can be inferred from the word's semantic definition.
Type: Application
Filed: September 8, 2021
Publication date: January 27, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Kun XU, Chao WENG, Chengzhu YU, Dong YU
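The definition-plus-candidate-sememe interface can be sketched with a toy scorer. The patent estimates this probability with a trained model; the word-overlap score, the example definition, and the related-terms list below are all illustrative placeholders for that interface.

```python
def sememe_probability(definition: str, candidate_sememe: str, related_terms=()):
    # Toy estimator: fraction of the candidate sememe's associated terms
    # that appear in the word's dictionary definition. A real system would
    # learn this score from (definition, sememe) pairs.
    terms = {candidate_sememe, *related_terms}
    def_words = set(definition.lower().split())
    return len(terms & def_words) / len(terms)

# Word: "apple"; definition as retrieved from a (hypothetical) online dictionary.
definition = "the round fruit of a tree of the rose family"
score = sememe_probability(definition, "fruit", ["tree", "edible"])
print(score)
```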
-
Publication number: 20210375259
Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
Type: Application
Filed: August 6, 2021
Publication date: December 2, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Heng LU, Chengzhu Yu, Dong Yu
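The duration-model step — expanding each text component to the number of spectrogram frames it should occupy — is commonly implemented as a "length regulator". The sketch below uses scalar encodings and hand-picked durations purely for illustration; the patented system predicts durations with a trained model.

```python
def length_regulate(encodings, durations):
    # Expand each text-component encoding by its predicted duration
    # (in frames) so the frame-level generator sees one entry per frame.
    frames = []
    for enc, dur in zip(encodings, durations):
        frames.extend([enc] * dur)
    return frames

# Two text components with predicted durations of 3 and 5 frames.
encodings = [0.2, 0.9]
durations = [3, 5]
spectrogram_frames = length_regulate(encodings, durations)
print(len(spectrogram_frames))  # 8
```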
-
Patent number: 11183168
Abstract: A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
Type: Grant
Filed: February 13, 2020
Date of Patent: November 23, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
-
Patent number: 11170167
Abstract: A method and apparatus are provided for automatically predicting lexical sememes using a lexical dictionary, comprising inputting a word, retrieving the word's semantic definition and the sememes corresponding to the word from an online dictionary, setting each of the retrieved sememes as a candidate sememe, inputting the word's semantic definition and the candidate sememe, and estimating the probability that the candidate sememe can be inferred from the word's semantic definition.
Type: Grant
Filed: March 26, 2019
Date of Patent: November 9, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Kun Xu, Chao Weng, Chengzhu Yu, Dong Yu
-
Patent number: 11151979
Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
Type: Grant
Filed: August 23, 2019
Date of Patent: October 19, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Heng Lu, Chengzhu Yu, Dong Yu
-
Patent number: 11138966
Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.
Type: Grant
Filed: February 7, 2019
Date of Patent: October 5, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Jianshu Chen, Chengzhu Yu, Dong Yu, Chih-Kuan Yeh
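The core idea of output distribution matching can be shown with a minimal divergence penalty: the model's phoneme posteriors, averaged over unlabeled speech segments, should match the phoneme distribution derived from text alone, so no paired transcripts are needed. The three-phoneme prior and the posterior values below are toy numbers, and a KL divergence is used as one plausible matching criterion; this is a sketch of the principle, not the patented training procedure.

```python
import math

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def odm_loss(model_outputs, prior):
    # Average the model's per-segment phoneme posteriors over the speech
    # data and penalize divergence from the text-derived phoneme prior.
    n = len(model_outputs)
    avg = [sum(frame[k] for frame in model_outputs) / n
           for k in range(len(prior))]
    return kl_divergence(avg, prior)

# Phoneme prior from text (3 phonemes) vs. posteriors on 4 speech segments.
prior = [0.5, 0.3, 0.2]
outputs = [[0.6, 0.3, 0.1],
           [0.5, 0.2, 0.3],
           [0.4, 0.4, 0.2],
           [0.5, 0.3, 0.2]]
print(round(odm_loss(outputs, prior), 4))  # ~0.0: averaged outputs match the prior
```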
-
Publication number: 20210280165
Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
Type: Application
Filed: March 3, 2020
Publication date: September 9, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Heng Lu, Chao Weng, Dong Yu
-
Publication number: 20210280164
Abstract: A method, computer program, and computer system are provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
Type: Application
Filed: March 3, 2020
Publication date: September 9, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Dong YU
-
Publication number: 20210256958
Abstract: A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
Type: Application
Filed: February 13, 2020
Publication date: August 19, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Heng LU, Chao WENG, Dong YU
-
Publication number: 20210248997
Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person using a speaking voice of the second person, associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu YU, Heng LU, Chao WENG, Dong YU