Patents by Inventor Jia Cui
Jia Cui has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11972754
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Grant
Filed: December 22, 2021
Date of Patent: April 30, 2024
Assignee: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
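The multi-task objective described above — independent CTC and attention branches trained over a shared sequence of hidden states — can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy CTC forward pass and the interpolation weight `lam` are assumptions for exposition.

```python
import numpy as np

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Toy CTC forward algorithm over the blank-augmented target sequence.
    log_probs: (T, V) per-frame log-posteriors; target: non-empty label list."""
    T, _ = log_probs.shape
    ext = [blank]
    for tok in target:
        ext.extend([tok, blank])
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]
            if s > 0:
                cands.append(alpha[t - 1, s - 1])
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # Total probability ends in either the final label or the trailing blank.
    return -np.logaddexp(alpha[-1, -1], alpha[-1, -2])

def joint_loss(ctc_loss, attention_loss, lam=0.2):
    """Multi-task objective: interpolate the two independently computed losses."""
    return lam * ctc_loss + (1.0 - lam) * attention_loss
```

In practice both branches would share the encoder that produced `log_probs`, and `lam` would be tuned on held-out data.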
-
Publication number: 20240086637
Abstract: Methods and devices efficiently normalize text by processing input text with a text normalization model: the input text is processed in a first stage comprising a statistical model to produce a first output, the first output is processed in a second stage comprising a rule-based model to produce normalized text, and the normalized text is output.
Type: Application
Filed: September 8, 2022
Publication date: March 14, 2024
Applicant: Tencent America LLC
Inventors: Jia Cui, Dong Yu
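The two-stage pipeline above (statistical model first, rule-based model second) can be sketched like this. The digit tagger standing in for the statistical stage and the digit-expansion rules are hypothetical stand-ins chosen for illustration; a real first stage would be a trained model.

```python
# Hypothetical stand-in for the trained statistical stage: tag tokens that
# look like numerals so the rule stage knows what to expand.
def statistical_stage(tokens):
    return [("NUM" if tok.isdigit() else "PLAIN", tok) for tok in tokens]

DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def rule_stage(tagged):
    # Deterministic rules turn each tagged numeral into its spoken form.
    out = []
    for tag, tok in tagged:
        out.append(" ".join(DIGIT_WORDS[d] for d in tok) if tag == "NUM" else tok)
    return " ".join(out)

def normalize(text):
    """First stage produces tagged output; second stage emits normalized text."""
    return rule_stage(statistical_stage(text.split()))
```

For example, `normalize("call 911 now")` yields `"call nine one one now"`.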
-
Publication number: 20240054989
Abstract: Systems and methods for training a model to perform end-to-end character-to-phoneme (C2P) conversion include: selecting a plurality of unlabeled sentences from a first data source, selecting a plurality of labeled sentences from a second data source, preprocessing a combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features, generating mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features, and training a pre-trained model, using the mixed training data, to perform end-to-end C2P conversion.
Type: Application
Filed: August 15, 2022
Publication date: February 15, 2024
Applicant: TENCENT AMERICA LLC
Inventor: Jia Cui
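The mixed-data step above — automatically labeling tokens from the unlabeled corpus and combining them with hand-labeled pairs — can be sketched as follows. The seed lexicon and lookup-based auto-labeler are illustrative assumptions; the abstract's "extracted linguistic features" could drive a far richer labeler.

```python
# Hypothetical seed lexicon standing in for an existing pronunciation resource.
SEED_LEXICON = {"cat": "K AE T", "dog": "D AO G"}

def auto_label(sentences, lexicon):
    """Automatically label tokens in the unlabeled corpus via lexicon lookup."""
    pairs = []
    for sentence in sentences:
        for tok in sentence.split():
            if tok in lexicon:
                pairs.append((tok, lexicon[tok]))
    return pairs

def build_mixed_training_data(unlabeled, labeled, lexicon):
    """Mix auto-labeled pairs with hand-labeled character/phoneme pairs."""
    return auto_label(unlabeled, lexicon) + list(labeled)
```

The mixed set would then fine-tune a pre-trained model for end-to-end C2P conversion.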
-
Patent number: 11803618
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Grant
Filed: November 17, 2022
Date of Patent: October 31, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
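The two ingredients named in the abstract — softmax smoothing during N-best generation and the MBR expected-risk criterion — can be sketched as follows. The smoothing factor `gamma` and the edit-distance risks are illustrative assumptions, not values from the patent.

```python
import numpy as np

def smoothed_softmax(logits, gamma=0.8):
    """Softmax with smoothing factor gamma < 1, which flattens the distribution
    so N-best generation explores more diverse hypotheses."""
    z = gamma * logits
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def expected_risk(nbest_log_scores, risks):
    """MBR criterion: expected risk over the N-best list under the model
    posterior (risks would typically be edit distances to the reference)."""
    w = np.exp(nbest_log_scores - np.max(nbest_log_scores))
    w = w / w.sum()
    return float(np.dot(w, risks))
```

MBR training would then adjust model parameters to lower this expected risk.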
-
Patent number: 11636848
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Grant
Filed: May 11, 2021
Date of Patent: April 25, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
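The first-wrong-token idea above can be sketched like this. How the loss is derived from the posterior vector at the error step is an assumption here (negative log posterior of the reference token); the patent may define it differently.

```python
import numpy as np

def first_wrong_token_loss(posteriors, hypothesis, reference):
    """Loss at the first wrong token: negative log posterior assigned to the
    reference token at the first step where the hypothesis disagrees.
    posteriors: (T, V) posterior probability vectors, one per output step."""
    for t, (hyp_tok, ref_tok) in enumerate(zip(hypothesis, reference)):
        if hyp_tok != ref_tok:
            return -float(np.log(posteriors[t, ref_tok]))
    return 0.0  # no wrong token: contributes nothing

def total_training_loss(examples):
    """Total loss over a training set of (posteriors, hypothesis, reference)."""
    return sum(first_wrong_token_loss(*ex) for ex in examples)
```

The model update would then backpropagate this total loss, as in ordinary gradient training.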
-
Publication number: 20230092440
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Application
Filed: November 17, 2022
Publication date: March 23, 2023
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Patent number: 11551136
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Grant
Filed: November 14, 2018
Date of Patent: January 10, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20220115005
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Patent number: 11257481
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Grant
Filed: October 24, 2018
Date of Patent: February 22, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20210264901
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Application
Filed: May 11, 2021
Publication date: August 26, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 11037547
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Grant
Filed: February 14, 2019
Date of Patent: June 15, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 11004443
Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone-based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme-based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
Type: Grant
Filed: August 30, 2018
Date of Patent: May 11, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu
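The layer-initialization scheme above — lower layers from a phone-based CTC model, upper layers from a grapheme-based CTC model — can be sketched as follows. Representing each layer as a bare weight array and the split point `n_first` are simplifying assumptions.

```python
import numpy as np

def init_a2w_layers(phone_ctc_layers, grapheme_ctc_layers, n_first=2):
    """Build the A2W network's initial weights: the first n_first layers come
    from the phone-CTC model, the remaining layers from the grapheme-CTC
    model. A2W training then fine-tunes the stacked network end to end."""
    layers = [w.copy() for w in phone_ctc_layers[:n_first]]
    layers += [w.copy() for w in grapheme_ctc_layers[n_first:]]
    return layers
```

Copying (rather than sharing) the arrays keeps the pretrained CTC models intact while A2W training updates its own parameters.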
-
Patent number: 10923117
Abstract: A method for selecting an optimal language model weight (LMW) used to perform automatic speech recognition, including decoding test audio into a lattice using a language model; analyzing the lattice using a first LMW of a plurality of LMWs to determine a first plurality of best paths; analyzing the lattice using a second LMW of the plurality of LMWs to determine a second plurality of best paths; determining a first best path change rate (BCPR) corresponding to the first LMW based on a number of best path changes between the first plurality of best paths and the second plurality of best paths; and determining the first LMW to be the optimal LMW based on the first BCPR being a lowest BCPR from among a plurality of BCPRs corresponding to the plurality of LMWs.
Type: Grant
Filed: February 19, 2019
Date of Patent: February 16, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
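The selection rule above — compute a best path change rate between adjacent LMW settings and pick the weight with the lowest rate — can be sketched like this. Representing each lattice's best paths as a list of strings, and comparing each LMW against the next larger one, are assumptions for illustration.

```python
def best_path_change_rate(paths_a, paths_b):
    """Fraction of utterances whose lattice best path changes between two
    LMW settings."""
    return sum(a != b for a, b in zip(paths_a, paths_b)) / len(paths_a)

def select_lmw(lmw_to_best_paths):
    """Pick the LMW whose best paths are most stable against the next LMW,
    i.e. the one with the lowest change rate."""
    lmws = sorted(lmw_to_best_paths)
    rates = {w: best_path_change_rate(lmw_to_best_paths[w],
                                      lmw_to_best_paths[nxt])
             for w, nxt in zip(lmws, lmws[1:])}
    return min(rates, key=rates.get)
```

The intuition is that near the optimal weight, small perturbations of the LMW barely reshuffle the decoded best paths.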
-
Patent number: 10861441
Abstract: A method of attention-based end-to-end (E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, performing beam searching of the model of which the cross-entropy training is performed, to generate an n-best hypotheses list of output hypotheses, and determining a one-best hypothesis among the generated n-best hypotheses list. The method further includes determining a character-based gradient and a word-based gradient, based on the model of which the cross-entropy training is performed and a loss function in which a distance between a reference sequence and the determined one-best hypothesis is maximized, and performing backpropagation of the determined character-based gradient and the determined word-based gradient to the model, to update the model.
Type: Grant
Filed: February 14, 2019
Date of Patent: December 8, 2020
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
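The character-level and word-level distances that would drive the two gradients above can be sketched as follows. Combining them with a weight `alpha` is an illustrative assumption; in the patented method each level drives its own gradient rather than a single blended scalar.

```python
def edit_distance(ref, hyp):
    """Classic Levenshtein distance between two token sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1]

def sequence_distance_loss(reference, hypothesis, alpha=0.5):
    """Distance between reference and one-best hypothesis, measured at both
    the character level and the word level."""
    char_d = edit_distance(list(reference), list(hypothesis))
    word_d = edit_distance(reference.split(), hypothesis.split())
    return alpha * char_d + (1.0 - alpha) * word_d
```

Beam search would supply the one-best `hypothesis`, and backpropagation would then push the model away from high-distance outputs.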
-
Publication number: 20200265831
Abstract: A method of attention-based end-to-end (E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, performing beam searching of the model of which the cross-entropy training is performed, to generate an n-best hypotheses list of output hypotheses, and determining a one-best hypothesis among the generated n-best hypotheses list. The method further includes determining a character-based gradient and a word-based gradient, based on the model of which the cross-entropy training is performed and a loss function in which a distance between a reference sequence and the determined one-best hypothesis is maximized, and performing backpropagation of the determined character-based gradient and the determined word-based gradient to the model, to update the model.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: Tencent America LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Publication number: 20200265830
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 10672382
Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights, and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.
Type: Grant
Filed: October 15, 2018
Date of Patent: June 2, 2020
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
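One step of the attention-based decoding loop described above can be sketched as follows. Dot-product scoring and the single output projection `output_weights` are simplifying assumptions; the patent's decoder also carries embedded label predictions and attentional hidden state between steps.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_decode_step(encoder_states, decoder_state, output_weights):
    """One decoder step: attention weights over encoder hidden states, a
    context vector, and an output distribution for the next label.
    encoder_states: (T, H); decoder_state: (H,); output_weights: (V, 2H)."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = softmax(scores)                      # attention weights
    context = weights @ encoder_states             # (H,) attentional context
    logits = output_weights @ np.concatenate([context, decoder_state])
    return weights, softmax(logits)                # (T,), (V,)
```

Training would run this step at every output position and backpropagate through the attention weights.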
-
Publication number: 20200151623
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Application
Filed: November 14, 2018
Publication date: May 14, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20200135174
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Application
Filed: October 24, 2018
Publication date: April 30, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20200118547
Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights, and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.
Type: Application
Filed: October 15, 2018
Publication date: April 16, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu