Patents by Inventor Chengzhu YU

Chengzhu YU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11004443
    Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: May 11, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu
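
A minimal sketch of the layered CTC initialization this abstract describes, assuming PyTorch; the layer counts, hidden size, and the PHONES/GRAPHEMES/WORDS inventory sizes are illustrative placeholders, not values from the patent:

```python
import torch
import torch.nn as nn

PHONES, GRAPHEMES, WORDS = 46, 29, 10000   # hypothetical inventory sizes

class A2WModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512):
        super().__init__()
        # "first layers": to be initialized from a phone-CTC pretrained stack
        self.lower = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        # "second layers": to be initialized from a grapheme-CTC pretrained stack
        self.upper = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.word_head = nn.Linear(hidden, WORDS + 1)   # +1 for the CTC blank

    def forward(self, feats):
        h, _ = self.lower(feats)
        h, _ = self.upper(h)
        return self.word_head(h).log_softmax(-1)

def initialize(model, phone_pretrained_lstm, grapheme_pretrained_lstm):
    # Copy weights from the two separately pretrained CTC stacks; word-level
    # (A2W) CTC training then starts from this initialization.
    model.lower.load_state_dict(phone_pretrained_lstm.state_dict())
    model.upper.load_state_dict(grapheme_pretrained_lstm.state_dict())

ctc = nn.CTCLoss(blank=WORDS)   # A2W training objective over the word vocabulary
```

The point of the two-stage initialization, per the abstract, is that word-level CTC training starts from phone- and grapheme-informed layers rather than from random weights.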
  • Publication number: 20210090584
    Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Meng YU, Heng LU, Dong YU
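
The band-splitting and downsampling front end described above can be sketched as follows, assuming NumPy and SciPy; the four-band split, Butterworth filter order, and naive decimation are my own illustrative choices, and the neural vocoder itself is out of scope here:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_and_downsample(audio, sr=16000, n_bands=4):
    """Separate audio into n_bands frequency bands, then downsample each
    band by n_bands so all bands can be generated synchronously."""
    nyquist = sr / 2
    edges = np.linspace(0, nyquist, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == 0:
            sos = butter(8, hi, btype="lowpass", fs=sr, output="sos")
        elif hi >= nyquist:
            sos = butter(8, lo, btype="highpass", fs=sr, output="sos")
        else:
            sos = butter(8, [lo, hi], btype="bandpass", fs=sr, output="sos")
        bands.append(sosfilt(sos, audio)[::n_bands])  # naive decimation
    return np.stack(bands)   # shape: (n_bands, len(audio) // n_bands)

x = np.random.randn(16000)        # 1 s of dummy audio
sub = split_and_downsample(x)     # each band now runs at sr / n_bands
```

Because each band runs at a fraction of the original rate, the vocoder can predict all bands in parallel at a lower per-band sample rate.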
  • Publication number: 20210056949
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 25, 2021
    Applicant: TENCENT AMERICA LLC
    Inventors: Heng LU, Chengzhu YU, Dong YU
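
Read as a pipeline, the abstract chains four stages: durations, spectrogram, waveform, video. A sketch under that reading, with every function a hypothetical stub standing in for a trained component (the frame counts, hop size, and frame rate are assumptions):

```python
from typing import List
import numpy as np

def predict_durations(text_components: List[str]) -> List[int]:
    """Duration model: frames per text component (stub: 5 frames each)."""
    return [5 for _ in text_components]

def generate_spectrogram(text_components, durations) -> np.ndarray:
    """Acoustic model: expand components by duration into mel frames (stub)."""
    return np.zeros((sum(durations), 80))        # (frames, mel bins)

def vocode(spectrogram: np.ndarray) -> np.ndarray:
    """Vocoder: spectrogram frames -> audio waveform (hop size 256 assumed)."""
    return np.zeros(spectrogram.shape[0] * 256)

def animate(waveform: np.ndarray) -> np.ndarray:
    """Video model: audio -> per-frame video parameters (25 fps at 16 kHz assumed)."""
    return np.zeros((len(waveform) // 640, 3))

text = ["HH", "AH", "L", "OW"]
durations = predict_durations(text)
audio = vocode(generate_spectrogram(text, durations))
video = animate(audio)   # the audio waveform and corresponding video are output
```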
  • Publication number: 20200342849
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
    Type: Application
    Filed: April 29, 2019
    Publication date: October 29, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Heng LU, Dong YU
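
The step that distinguishes this abstract is generating the second, frame-level set of spectra by expanding the first, component-level set according to the predicted durations. A minimal NumPy sketch of one plausible reading of that expansion; the dimensions and repeat-based upsampling are my assumptions:

```python
import numpy as np

def expand_by_duration(component_spectra: np.ndarray,
                       durations: np.ndarray) -> np.ndarray:
    """First set of spectra: one vector per text component.
    Second set: each vector repeated for its predicted duration, giving the
    frame-level sequence from which spectrogram frames are generated."""
    return np.repeat(component_spectra, durations, axis=0)

comp = np.random.randn(4, 80)     # 4 text components, 80-dim spectra
dur = np.array([3, 7, 5, 4])      # frame counts from a duration model
frames = expand_by_duration(comp, dur)
assert frames.shape == (dur.sum(), 80)   # frame-level spectrogram input
```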
  • Publication number: 20200311196
    Abstract: A method and apparatus are provided for automatically predicting lexical sememes using a lexical dictionary, comprising inputting a word, retrieving the word's semantic definition and corresponding sememes from an online dictionary, setting each of the retrieved sememes as a candidate sememe, inputting the word's semantic definition and each candidate sememe, and estimating the probability that the candidate sememe can be inferred from the word's semantic definition.
    Type: Application
    Filed: March 26, 2019
    Publication date: October 1, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Kun XU, Chao WENG, Chengzhu YU, Dong YU
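
One way to picture the candidate-scoring step is a model that embeds the definition and a candidate sememe and outputs an inferability probability. A sketch assuming PyTorch; the toy DICTIONARY, token ids, and mean-pooled encoder are hypothetical stand-ins, not the patented model:

```python
import torch
import torch.nn as nn

DICTIONARY = {  # toy stand-in for the online dictionary lookup
    "apple": {"definition": "round fruit of a tree",
              "sememes": ["fruit", "food", "plant"]},
}

class SememeScorer(nn.Module):
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim * 2, 1)

    def forward(self, definition_ids, sememe_id):
        d = self.embed(definition_ids).mean(dim=0)   # pooled definition encoding
        s = self.embed(sememe_id).squeeze(0)         # candidate sememe embedding
        # probability that the candidate is inferable from the definition
        return torch.sigmoid(self.out(torch.cat([d, s])))

model = SememeScorer()
entry = DICTIONARY["apple"]
def_ids = torch.tensor([1, 2, 3, 4, 5])          # toy ids for definition tokens
for i, sememe in enumerate(entry["sememes"]):
    p = model(def_ids, torch.tensor([10 + i]))   # toy id for the candidate
    print(f"P({sememe!r} | definition) = {p.item():.3f}")
```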
  • Publication number: 20200258497
    Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.
    Type: Application
    Filed: February 7, 2019
    Publication date: August 13, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Jianshu Chen, Chengzhu Yu, Dong Yu, Chih-Kuan Yeh
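
The core idea, matching the model's output distribution over speech segments to phoneme statistics derived from unpaired text, can be sketched as a KL objective. This assumes PyTorch; the uniform text prior and random segment logits are placeholders, and the patent's exact ODM formulation is not reproduced here:

```python
import torch

def odm_loss(segment_logits, text_prior):
    """Match the model's average phoneme distribution over speech segments
    (delimited by the detected phoneme boundaries) to the phoneme
    distribution derived from text, via a KL divergence."""
    probs = segment_logits.softmax(-1).mean(0)   # model's output distribution
    return torch.sum(text_prior * (text_prior.log() - probs.log()))

n_segments, n_phones = 200, 46
logits = torch.randn(n_segments, n_phones, requires_grad=True)
prior = torch.full((n_phones,), 1.0 / n_phones)  # phoneme stats from text
loss = odm_loss(logits, prior)
loss.backward()   # trains the acoustic model without paired transcripts
```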
  • Patent number: 10672382
    Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on previous embedded label prediction information and previous attentional hidden state information generated based on the attention weights; and generating current embedded label prediction information based on a result of the decoding operation and the attention weights.
    Type: Grant
    Filed: October 15, 2018
    Date of Patent: June 2, 2020
    Assignee: TENCENT AMERICA LLC
    Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
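
A single decoder step matching the abstract's data flow (attention weights from the encoder states and current decoder state, then decoding from the previous embedded label and previous attentional hidden state) might look like the following sketch, assuming PyTorch; the sizes and the additive-style scorer are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    def __init__(self, hid=256, vocab=30):
        super().__init__()
        self.score = nn.Linear(hid * 2, 1)      # scores encoder vs. decoder state
        self.embed = nn.Embedding(vocab, hid)
        self.cell = nn.LSTMCell(hid * 2, hid)   # the decoding operation
        self.comb = nn.Linear(hid * 2, hid)     # forms the attentional hidden state
        self.out = nn.Linear(hid, vocab)

    def forward(self, enc, state, prev_label, prev_attn_h):
        dec_h, dec_c = state                                   # each (1, hid)
        # attention weights from every encoder state and the current decoder state
        s = self.score(torch.cat([enc, dec_h.expand(enc.size(0), -1)], -1))
        w = s.softmax(dim=0)                                   # (T, 1)
        context = (w * enc).sum(dim=0, keepdim=True)           # (1, hid)
        # decode from previous embedded label + previous attentional hidden state
        inp = torch.cat([self.embed(prev_label), prev_attn_h], dim=-1)
        dec_h, dec_c = self.cell(inp, (dec_h, dec_c))
        attn_h = torch.tanh(self.comb(torch.cat([dec_h, context], dim=-1)))
        # current embedded label prediction from the decode result and attention
        return self.out(attn_h).log_softmax(-1), (dec_h, dec_c), attn_h

step = AttentionDecoderStep()
enc = torch.randn(50, 256)                       # sequence of encoder hidden states
state = (torch.zeros(1, 256), torch.zeros(1, 256))
logp, state, attn_h = step(enc, state, torch.tensor([0]), torch.zeros(1, 256))
```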
  • Publication number: 20200151623
    Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation stage of the MBR training.
    Type: Application
    Filed: November 14, 2018
    Publication date: May 14, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chao WENG, Jia CUI, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
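
The softmax-smoothing idea can be shown on a toy N-best list: scaling the hypothesis log scores before normalization flattens the posterior used to weight per-hypothesis errors. A minimal sketch assuming PyTorch; the smoothing factor and scores are made up:

```python
import torch

def mbr_risk(hyp_scores, hyp_errors, smooth=0.8):
    """Expected error over an N-best list. `smooth` scales the log scores
    before normalization (softmax smoothing), flattening the hypothesis
    posterior so the N-best generation covers more diverse hypotheses."""
    post = (smooth * hyp_scores).softmax(-1)   # smoothed hypothesis posterior
    return (post * hyp_errors).sum()           # risk to minimize

scores = torch.tensor([12.0, 11.5, 10.9], requires_grad=True)  # hypothesis log scores
errors = torch.tensor([0.0, 2.0, 1.0])        # edit distances to the reference
loss = mbr_risk(scores, errors)
loss.backward()   # gradients push probability mass toward low-error hypotheses
```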
  • Publication number: 20200135174
    Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 30, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Jia CUI, Chao WENG, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
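
Structurally, the abstract describes a shared encoder whose hidden-state sequence feeds a CTC branch and an attention branch that are trained independently. A minimal sketch assuming PyTorch; the model sizes are assumptions and the attention decoder is reduced to a stand-in layer:

```python
import torch
import torch.nn as nn

class JointCTCAttention(nn.Module):
    def __init__(self, feat=40, hid=256, vocab=30):
        super().__init__()
        self.encoder = nn.LSTM(feat, hid, num_layers=3, batch_first=True)
        self.ctc_head = nn.Linear(hid, vocab + 1)   # +1 for the CTC blank
        self.attn_decoder = nn.LSTM(hid, hid, batch_first=True)  # simplified stand-in
        self.attn_head = nn.Linear(hid, vocab)

    def forward(self, x):
        h, _ = self.encoder(x)          # shared sequence of hidden states
        ctc_logp = self.ctc_head(h).log_softmax(-1)     # CTC branch
        dec, _ = self.attn_decoder(h)                   # attention branch
        return ctc_logp, self.attn_head(dec)

# The two objectives are computed independently on the shared hidden states
# and combined, e.g. loss = w * ctc_loss + (1 - w) * attention_loss.
model = JointCTCAttention()
ctc_logp, attn_logits = model(torch.randn(2, 100, 40))  # (batch, frames, feats)
```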
  • Publication number: 20200118547
    Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on previous embedded label prediction information and previous attentional hidden state information generated based on the attention weights; and generating current embedded label prediction information based on a result of the decoding operation and the attention weights.
    Type: Application
    Filed: October 15, 2018
    Publication date: April 16, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chao WENG, Jia CUI, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
  • Publication number: 20200074983
    Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
    Type: Application
    Filed: August 30, 2018
    Publication date: March 5, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Chao WENG, Jia CUI, Dong YU