Patents by Inventor Chengzhu YU

Chengzhu YU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11004443
    Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: May 11, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu
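
A minimal sketch of the layered CTC initialization this abstract describes, assuming PyTorch; the layer counts, hidden size, and the PHONES/GRAPHEMES/WORDS inventory sizes are illustrative placeholders, not values from the patent:

```python
import torch
import torch.nn as nn

PHONES, GRAPHEMES, WORDS = 46, 29, 10000   # hypothetical inventory sizes

class A2WModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=512):
        super().__init__()
        # "first layers": to be initialized from a phone-CTC pretrained stack
        self.lower = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        # "second layers": to be initialized from a grapheme-CTC pretrained stack
        self.upper = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.word_head = nn.Linear(hidden, WORDS + 1)   # +1 for the CTC blank

    def forward(self, feats):
        h, _ = self.lower(feats)
        h, _ = self.upper(h)
        return self.word_head(h).log_softmax(-1)

def initialize(model, phone_pretrained_lstm, grapheme_pretrained_lstm):
    # Copy weights from the two separately pretrained CTC stacks; word-level
    # (A2W) CTC training then starts from this initialization.
    model.lower.load_state_dict(phone_pretrained_lstm.state_dict())
    model.upper.load_state_dict(grapheme_pretrained_lstm.state_dict())

ctc = nn.CTCLoss(blank=WORDS)   # A2W training objective over the word vocabulary
```

The point of the two-stage initialization, per the abstract, is that word-level CTC training starts from phone- and grapheme-informed layers rather than from random weights.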
  • Publication number: 20210090584
    Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
    Type: Application
    Filed: September 20, 2019
    Publication date: March 25, 2021
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Meng YU, Heng LU, Dong YU
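
The band-splitting and downsampling front end described above can be sketched as follows, assuming NumPy and SciPy; the four-band split, Butterworth filter order, and naive decimation are my own illustrative choices, and the neural vocoder itself is out of scope here:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def split_and_downsample(audio, sr=16000, n_bands=4):
    """Separate audio into n_bands frequency bands, then downsample each
    band by n_bands so all bands can be generated synchronously."""
    nyquist = sr / 2
    edges = np.linspace(0, nyquist, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == 0:
            sos = butter(8, hi, btype="lowpass", fs=sr, output="sos")
        elif hi >= nyquist:
            sos = butter(8, lo, btype="highpass", fs=sr, output="sos")
        else:
            sos = butter(8, [lo, hi], btype="bandpass", fs=sr, output="sos")
        bands.append(sosfilt(sos, audio)[::n_bands])  # naive decimation
    return np.stack(bands)   # shape: (n_bands, len(audio) // n_bands)

x = np.random.randn(16000)        # 1 s of dummy audio
sub = split_and_downsample(x)     # each band now runs at sr / n_bands
```

Because each band runs at a fraction of the original rate, the vocoder can predict all bands in parallel at a lower per-band sample rate.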
  • Publication number: 20210056949
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 25, 2021
    Applicant: TENCENT AMERICA LLC
    Inventors: Heng LU, Chengzhu YU, Dong YU
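
Read as a pipeline, the abstract chains four stages: durations, spectrogram, waveform, video. A sketch under that reading, with every function a hypothetical stub standing in for a trained component (the frame counts, hop size, and frame rate are assumptions):

```python
from typing import List
import numpy as np

def predict_durations(text_components: List[str]) -> List[int]:
    """Duration model: frames per text component (stub: 5 frames each)."""
    return [5 for _ in text_components]

def generate_spectrogram(text_components, durations) -> np.ndarray:
    """Acoustic model: expand components by duration into mel frames (stub)."""
    return np.zeros((sum(durations), 80))        # (frames, mel bins)

def vocode(spectrogram: np.ndarray) -> np.ndarray:
    """Vocoder: spectrogram frames -> audio waveform (hop size 256 assumed)."""
    return np.zeros(spectrogram.shape[0] * 256)

def animate(waveform: np.ndarray) -> np.ndarray:
    """Video model: audio -> per-frame video parameters (25 fps at 16 kHz assumed)."""
    return np.zeros((len(waveform) // 640, 3))

text = ["HH", "AH", "L", "OW"]
durations = predict_durations(text)
audio = vocode(generate_spectrogram(text, durations))
video = animate(audio)   # the audio waveform and corresponding video are output
```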
  • Publication number: 20200342849
    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
    Type: Application
    Filed: April 29, 2019
    Publication date: October 29, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Heng LU, Dong YU
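
The step that distinguishes this abstract is generating the second, frame-level set of spectra by expanding the first, component-level set according to the predicted durations. A minimal NumPy sketch of one plausible reading of that expansion; the dimensions and repeat-based upsampling are my assumptions:

```python
import numpy as np

def expand_by_duration(component_spectra: np.ndarray,
                       durations: np.ndarray) -> np.ndarray:
    """First set of spectra: one vector per text component.
    Second set: each vector repeated for its predicted duration, giving the
    frame-level sequence from which spectrogram frames are generated."""
    return np.repeat(component_spectra, durations, axis=0)

comp = np.random.randn(4, 80)     # 4 text components, 80-dim spectra
dur = np.array([3, 7, 5, 4])      # frame counts from a duration model
frames = expand_by_duration(comp, dur)
assert frames.shape == (dur.sum(), 80)   # frame-level spectrogram input
```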
  • Publication number: 20200311196
    Abstract: A method and apparatus are provided for automatically predicting lexical sememes using a lexical dictionary, comprising inputting a word, retrieving the word's semantic definition and corresponding sememes from an online dictionary, setting each of the retrieved sememes as a candidate sememe, inputting the word's semantic definition and each candidate sememe, and estimating the probability that the candidate sememe can be inferred from the word's semantic definition.
    Type: Application
    Filed: March 26, 2019
    Publication date: October 1, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Kun XU, Chao WENG, Chengzhu YU, Dong YU
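
One way to picture the candidate-scoring step is a model that embeds the definition and a candidate sememe and outputs an inferability probability. A sketch assuming PyTorch; the toy DICTIONARY, token ids, and mean-pooled encoder are hypothetical stand-ins, not the patented model:

```python
import torch
import torch.nn as nn

DICTIONARY = {  # toy stand-in for the online dictionary lookup
    "apple": {"definition": "round fruit of a tree",
              "sememes": ["fruit", "food", "plant"]},
}

class SememeScorer(nn.Module):
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.out = nn.Linear(dim * 2, 1)

    def forward(self, definition_ids, sememe_id):
        d = self.embed(definition_ids).mean(dim=0)   # pooled definition encoding
        s = self.embed(sememe_id).squeeze(0)         # candidate sememe embedding
        # probability that the candidate is inferable from the definition
        return torch.sigmoid(self.out(torch.cat([d, s])))

model = SememeScorer()
entry = DICTIONARY["apple"]
def_ids = torch.tensor([1, 2, 3, 4, 5])          # toy ids for definition tokens
for i, sememe in enumerate(entry["sememes"]):
    p = model(def_ids, torch.tensor([10 + i]))   # toy id for the candidate
    print(f"P({sememe!r} | definition) = {p.item():.3f}")
```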
  • Publication number: 20200258497
    Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.
    Type: Application
    Filed: February 7, 2019
    Publication date: August 13, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Jianshu Chen, Chengzhu Yu, Dong Yu, Chih-Kuan Yeh
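
The core idea, matching the model's output distribution over speech segments to phoneme statistics derived from unpaired text, can be sketched as a KL objective. This assumes PyTorch; the uniform text prior and random segment logits are placeholders, and the patent's exact ODM formulation is not reproduced here:

```python
import torch

def odm_loss(segment_logits, text_prior):
    """Match the model's average phoneme distribution over speech segments
    (delimited by the detected phoneme boundaries) to the phoneme
    distribution derived from text, via a KL divergence."""
    probs = segment_logits.softmax(-1).mean(0)   # model's output distribution
    return torch.sum(text_prior * (text_prior.log() - probs.log()))

n_segments, n_phones = 200, 46
logits = torch.randn(n_segments, n_phones, requires_grad=True)
prior = torch.full((n_phones,), 1.0 / n_phones)  # phoneme stats from text
loss = odm_loss(logits, prior)
loss.backward()   # trains the acoustic model without paired transcripts
```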
  • Patent number: 10672382
    Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on previous embedded label prediction information and previous attentional hidden state information generated based on the attention weights; and generating current embedded label prediction information based on a result of the decoding operation and the attention weights.
    Type: Grant
    Filed: October 15, 2018
    Date of Patent: June 2, 2020
    Assignee: TENCENT AMERICA LLC
    Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
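
A single decoder step matching the abstract's data flow (attention weights from the encoder states and current decoder state, then decoding from the previous embedded label and previous attentional hidden state) might look like the following sketch, assuming PyTorch; the sizes and the additive-style scorer are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentionDecoderStep(nn.Module):
    def __init__(self, hid=256, vocab=30):
        super().__init__()
        self.score = nn.Linear(hid * 2, 1)      # scores encoder vs. decoder state
        self.embed = nn.Embedding(vocab, hid)
        self.cell = nn.LSTMCell(hid * 2, hid)   # the decoding operation
        self.comb = nn.Linear(hid * 2, hid)     # forms the attentional hidden state
        self.out = nn.Linear(hid, vocab)

    def forward(self, enc, state, prev_label, prev_attn_h):
        dec_h, dec_c = state                                   # each (1, hid)
        # attention weights from every encoder state and the current decoder state
        s = self.score(torch.cat([enc, dec_h.expand(enc.size(0), -1)], -1))
        w = s.softmax(dim=0)                                   # (T, 1)
        context = (w * enc).sum(dim=0, keepdim=True)           # (1, hid)
        # decode from previous embedded label + previous attentional hidden state
        inp = torch.cat([self.embed(prev_label), prev_attn_h], dim=-1)
        dec_h, dec_c = self.cell(inp, (dec_h, dec_c))
        attn_h = torch.tanh(self.comb(torch.cat([dec_h, context], dim=-1)))
        # current embedded label prediction from the decode result and attention
        return self.out(attn_h).log_softmax(-1), (dec_h, dec_c), attn_h

step = AttentionDecoderStep()
enc = torch.randn(50, 256)                       # sequence of encoder hidden states
state = (torch.zeros(1, 256), torch.zeros(1, 256))
logp, state, attn_h = step(enc, state, torch.tensor([0]), torch.zeros(1, 256))
```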
  • Publication number: 20200151623
    Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation stage of the MBR training.
    Type: Application
    Filed: November 14, 2018
    Publication date: May 14, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chao WENG, Jia CUI, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
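
The softmax-smoothing idea can be shown on a toy N-best list: scaling the hypothesis log scores before normalization flattens the posterior used to weight per-hypothesis errors. A minimal sketch assuming PyTorch; the smoothing factor and scores are made up:

```python
import torch

def mbr_risk(hyp_scores, hyp_errors, smooth=0.8):
    """Expected error over an N-best list. `smooth` scales the log scores
    before normalization (softmax smoothing), flattening the hypothesis
    posterior so the N-best generation covers more diverse hypotheses."""
    post = (smooth * hyp_scores).softmax(-1)   # smoothed hypothesis posterior
    return (post * hyp_errors).sum()           # risk to minimize

scores = torch.tensor([12.0, 11.5, 10.9], requires_grad=True)  # hypothesis log scores
errors = torch.tensor([0.0, 2.0, 1.0])        # edit distances to the reference
loss = mbr_risk(scores, errors)
loss.backward()   # gradients push probability mass toward low-error hypotheses
```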
  • Publication number: 20200135174
    Abstract: Methods and apparatuses are provided for performing sequence to sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 30, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Jia CUI, Chao WENG, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
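
Structurally, the abstract describes a shared encoder whose hidden-state sequence feeds a CTC branch and an attention branch that are trained independently. A minimal sketch assuming PyTorch; the model sizes are assumptions and the attention decoder is reduced to a stand-in layer:

```python
import torch
import torch.nn as nn

class JointCTCAttention(nn.Module):
    def __init__(self, feat=40, hid=256, vocab=30):
        super().__init__()
        self.encoder = nn.LSTM(feat, hid, num_layers=3, batch_first=True)
        self.ctc_head = nn.Linear(hid, vocab + 1)   # +1 for the CTC blank
        self.attn_decoder = nn.LSTM(hid, hid, batch_first=True)  # simplified stand-in
        self.attn_head = nn.Linear(hid, vocab)

    def forward(self, x):
        h, _ = self.encoder(x)          # shared sequence of hidden states
        ctc_logp = self.ctc_head(h).log_softmax(-1)     # CTC branch
        dec, _ = self.attn_decoder(h)                   # attention branch
        return ctc_logp, self.attn_head(dec)

# The two objectives are computed independently on the shared hidden states
# and combined, e.g. loss = w * ctc_loss + (1 - w) * attention_loss.
model = JointCTCAttention()
ctc_logp, attn_logits = model(torch.randn(2, 100, 40))  # (batch, frames, feats)
```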
  • Publication number: 20200118547
    Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on previous embedded label prediction information and previous attentional hidden state information generated based on the attention weights; and generating current embedded label prediction information based on a result of the decoding operation and the attention weights.
    Type: Application
    Filed: October 15, 2018
    Publication date: April 16, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chao WENG, Jia CUI, Guangsen WANG, Jun WANG, Chengzhu YU, Dan SU, Dong YU
  • Publication number: 20200074983
    Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
    Type: Application
    Filed: August 30, 2018
    Publication date: March 5, 2020
    Applicant: TENCENT AMERICA LLC
    Inventors: Chengzhu YU, Chao WENG, Jia CUI, Dong YU