Patents by Inventor Jia Cui
Jia Cui has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11972754
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Grant
Filed: December 22, 2021
Date of Patent: April 30, 2024
Assignee: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
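The multi-task objective described above — independent CTC and attention branches trained over a shared sequence of hidden states — can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy CTC forward pass and the interpolation weight `lam` are assumptions for exposition.

```python
import numpy as np

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Toy CTC forward algorithm over the blank-augmented target sequence.
    log_probs: (T, V) per-frame log-posteriors; target: non-empty label list."""
    T, _ = log_probs.shape
    ext = [blank]
    for tok in target:
        ext.extend([tok, blank])
    S = len(ext)
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1, s]]
            if s > 0:
                cands.append(alpha[t - 1, s - 1])
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1, s - 2])
            alpha[t, s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]
    # Total probability ends in either the final label or the trailing blank.
    return -np.logaddexp(alpha[-1, -1], alpha[-1, -2])

def joint_loss(ctc_loss, attention_loss, lam=0.2):
    """Multi-task objective: interpolate the two independently computed losses."""
    return lam * ctc_loss + (1.0 - lam) * attention_loss
```

In practice both branches would share the encoder that produced `log_probs`, and `lam` would be tuned on held-out data.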
-
Publication number: 20240086637
Abstract: Methods and devices efficiently normalize text by processing input text with a text normalization model: the input text is processed in a first stage comprising a statistical model to produce a first output, the first output is processed in a second stage comprising a rule-based model to produce normalized text, and the normalized text is output.
Type: Application
Filed: September 8, 2022
Publication date: March 14, 2024
Applicant: Tencent America LLC
Inventors: Jia Cui, Dong Yu
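The two-stage pipeline above (statistical model first, rule-based model second) can be sketched like this. The digit tagger standing in for the statistical stage and the digit-expansion rules are hypothetical stand-ins chosen for illustration; a real first stage would be a trained model.

```python
# Hypothetical stand-in for the trained statistical stage: tag tokens that
# look like numerals so the rule stage knows what to expand.
def statistical_stage(tokens):
    return [("NUM" if tok.isdigit() else "PLAIN", tok) for tok in tokens]

DIGIT_WORDS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def rule_stage(tagged):
    # Deterministic rules turn each tagged numeral into its spoken form.
    out = []
    for tag, tok in tagged:
        out.append(" ".join(DIGIT_WORDS[d] for d in tok) if tag == "NUM" else tok)
    return " ".join(out)

def normalize(text):
    """First stage produces tagged output; second stage emits normalized text."""
    return rule_stage(statistical_stage(text.split()))
```

For example, `normalize("call 911 now")` yields `"call nine one one now"`.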
-
Publication number: 20240054989
Abstract: Systems and methods for training a model to perform end-to-end character-to-phoneme (C2P) conversion include: selecting a plurality of unlabeled sentences from a first data source, selecting a plurality of labeled sentences from a second data source, preprocessing a combined corpus of the selected unlabeled and labeled sentences to extract a plurality of linguistic features, generating mixed training data by automatically labeling tokens in the preprocessed corpus based on the plurality of extracted linguistic features, and training a pre-trained model, using the mixed training data, to perform end-to-end C2P conversion.
Type: Application
Filed: August 15, 2022
Publication date: February 15, 2024
Applicant: TENCENT AMERICA LLC
Inventor: Jia Cui
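The mixed-data step above — automatically labeling tokens from the unlabeled corpus and combining them with hand-labeled pairs — can be sketched as follows. The seed lexicon and lookup-based auto-labeler are illustrative assumptions; the abstract's "extracted linguistic features" could drive a far richer labeler.

```python
# Hypothetical seed lexicon standing in for an existing pronunciation resource.
SEED_LEXICON = {"cat": "K AE T", "dog": "D AO G"}

def auto_label(sentences, lexicon):
    """Automatically label tokens in the unlabeled corpus via lexicon lookup."""
    pairs = []
    for sentence in sentences:
        for tok in sentence.split():
            if tok in lexicon:
                pairs.append((tok, lexicon[tok]))
    return pairs

def build_mixed_training_data(unlabeled, labeled, lexicon):
    """Mix auto-labeled pairs with hand-labeled character/phoneme pairs."""
    return auto_label(unlabeled, lexicon) + list(labeled)
```

The mixed set would then fine-tune a pre-trained model for end-to-end C2P conversion.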
-
Patent number: 11803618
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Grant
Filed: November 17, 2022
Date of Patent: October 31, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
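The two ingredients named in the abstract — softmax smoothing during N-best generation and the MBR expected-risk criterion — can be sketched as follows. The smoothing factor `gamma` and the edit-distance risks are illustrative assumptions, not values from the patent.

```python
import numpy as np

def smoothed_softmax(logits, gamma=0.8):
    """Softmax with smoothing factor gamma < 1, which flattens the distribution
    so N-best generation explores more diverse hypotheses."""
    z = gamma * logits
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def expected_risk(nbest_log_scores, risks):
    """MBR criterion: expected risk over the N-best list under the model
    posterior (risks would typically be edit distances to the reference)."""
    w = np.exp(nbest_log_scores - np.max(nbest_log_scores))
    w = w / w.sum()
    return float(np.dot(w, risks))
```

MBR training would then adjust model parameters to lower this expected risk.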
-
Patent number: 11636848
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Grant
Filed: May 11, 2021
Date of Patent: April 25, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
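The first-wrong-token idea above can be sketched like this. How the loss is derived from the posterior vector at the error step is an assumption here (negative log posterior of the reference token); the patent may define it differently.

```python
import numpy as np

def first_wrong_token_loss(posteriors, hypothesis, reference):
    """Loss at the first wrong token: negative log posterior assigned to the
    reference token at the first step where the hypothesis disagrees.
    posteriors: (T, V) posterior probability vectors, one per output step."""
    for t, (hyp_tok, ref_tok) in enumerate(zip(hypothesis, reference)):
        if hyp_tok != ref_tok:
            return -float(np.log(posteriors[t, ref_tok]))
    return 0.0  # no wrong token: contributes nothing

def total_training_loss(examples):
    """Total loss over a training set of (posteriors, hypothesis, reference)."""
    return sum(first_wrong_token_loss(*ex) for ex in examples)
```

The model update would then backpropagate this total loss, as in ordinary gradient training.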
-
Publication number: 20230092440
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Application
Filed: November 17, 2022
Publication date: March 23, 2023
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Patent number: 11551136
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Grant
Filed: November 14, 2018
Date of Patent: January 10, 2023
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20220115005
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Applicant: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Patent number: 11257481
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Grant
Filed: October 24, 2018
Date of Patent: February 22, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20210264901
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Application
Filed: May 11, 2021
Publication date: August 26, 2021
Applicant: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 11037547
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Grant
Filed: February 14, 2019
Date of Patent: June 15, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 11004443
Abstract: Methods and apparatuses are provided for performing acoustic to word (A2W) speech recognition training performed by at least one processor. The method includes initializing, by the at least one processor, one or more first layers of a neural network with phone-based Connectionist Temporal Classification (CTC), initializing, by the at least one processor, one or more second layers of the neural network with grapheme-based CTC, acquiring, by the at least one processor, training data, and performing, by the at least one processor, A2W speech recognition training based on the initialized one or more first layers and one or more second layers of the neural network using the training data.
Type: Grant
Filed: August 30, 2018
Date of Patent: May 11, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Chao Weng, Jia Cui, Dong Yu
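The layer-initialization scheme above — lower layers from a phone-based CTC model, upper layers from a grapheme-based CTC model — can be sketched as follows. Representing each layer as a bare weight array and the split point `n_first` are simplifying assumptions.

```python
import numpy as np

def init_a2w_layers(phone_ctc_layers, grapheme_ctc_layers, n_first=2):
    """Build the A2W network's initial weights: the first n_first layers come
    from the phone-CTC model, the remaining layers from the grapheme-CTC
    model. A2W training then fine-tunes the stacked network end to end."""
    layers = [w.copy() for w in phone_ctc_layers[:n_first]]
    layers += [w.copy() for w in grapheme_ctc_layers[n_first:]]
    return layers
```

Copying (rather than sharing) the arrays keeps the pretrained CTC models intact while A2W training updates its own parameters.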
-
Patent number: 10923117
Abstract: A method for selecting an optimal language model weight (LMW) used to perform automatic speech recognition, including decoding test audio into a lattice using a language model; analyzing the lattice using a first LMW of a plurality of LMWs to determine a first plurality of best paths; analyzing the lattice using a second LMW of the plurality of LMWs to determine a second plurality of best paths; determining a first best path change rate (BCPR) corresponding to the first LMW based on a number of best path changes between the first plurality of best paths and the second plurality of best paths; and determining the first LMW to be the optimal LMW based on the first BCPR being a lowest BCPR from among a plurality of BCPRs corresponding to the plurality of LMWs.
Type: Grant
Filed: February 19, 2019
Date of Patent: February 16, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
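The selection rule above — compute a best path change rate between adjacent LMW settings and pick the weight with the lowest rate — can be sketched like this. Representing each lattice's best paths as a list of strings, and comparing each LMW against the next larger one, are assumptions for illustration.

```python
def best_path_change_rate(paths_a, paths_b):
    """Fraction of utterances whose lattice best path changes between two
    LMW settings."""
    return sum(a != b for a, b in zip(paths_a, paths_b)) / len(paths_a)

def select_lmw(lmw_to_best_paths):
    """Pick the LMW whose best paths are most stable against the next LMW,
    i.e. the one with the lowest change rate."""
    lmws = sorted(lmw_to_best_paths)
    rates = {w: best_path_change_rate(lmw_to_best_paths[w],
                                      lmw_to_best_paths[nxt])
             for w, nxt in zip(lmws, lmws[1:])}
    return min(rates, key=rates.get)
```

The intuition is that near the optimal weight, small perturbations of the LMW barely reshuffle the decoded best paths.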
-
Patent number: 10861441
Abstract: A method of attention-based end-to-end (E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, performing beam searching of the model of which the cross-entropy training is performed, to generate an n-best hypotheses list of output hypotheses, and determining a one-best hypothesis among the generated n-best hypotheses list. The method further includes determining a character-based gradient and a word-based gradient, based on the model of which the cross-entropy training is performed and a loss function in which a distance between a reference sequence and the determined one-best hypothesis is maximized, and performing backpropagation of the determined character-based gradient and the determined word-based gradient to the model, to update the model.
Type: Grant
Filed: February 14, 2019
Date of Patent: December 8, 2020
Assignee: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
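The character-level and word-level distances that would drive the two gradients above can be sketched as follows. Combining them with a weight `alpha` is an illustrative assumption; in the patented method each level drives its own gradient rather than a single blended scalar.

```python
def edit_distance(ref, hyp):
    """Classic Levenshtein distance between two token sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)]
         for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1]

def sequence_distance_loss(reference, hypothesis, alpha=0.5):
    """Distance between reference and one-best hypothesis, measured at both
    the character level and the word level."""
    char_d = edit_distance(list(reference), list(hypothesis))
    word_d = edit_distance(reference.split(), hypothesis.split())
    return alpha * char_d + (1.0 - alpha) * word_d
```

Beam search would supply the one-best `hypothesis`, and backpropagation would then push the model away from high-distance outputs.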
-
Publication number: 20200265831
Abstract: A method of attention-based end-to-end (E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, performing beam searching of the model of which the cross-entropy training is performed, to generate an n-best hypotheses list of output hypotheses, and determining a one-best hypothesis among the generated n-best hypotheses list. The method further includes determining a character-based gradient and a word-based gradient, based on the model of which the cross-entropy training is performed and a loss function in which a distance between a reference sequence and the determined one-best hypothesis is maximized, and performing backpropagation of the determined character-based gradient and the determined word-based gradient to the model, to update the model.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: Tencent America LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Publication number: 20200265830
Abstract: A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.
Type: Application
Filed: February 14, 2019
Publication date: August 20, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Peidong Wang, Jia Cui, Chao Weng, Dong Yu
-
Patent number: 10672382
Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights, and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.
Type: Grant
Filed: October 15, 2018
Date of Patent: June 2, 2020
Assignee: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
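One step of the attention-based decoding loop described above can be sketched as follows. Dot-product scoring and the single output projection `output_weights` are simplifying assumptions; the patent's decoder also carries embedded label predictions and attentional hidden state between steps.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_decode_step(encoder_states, decoder_state, output_weights):
    """One decoder step: attention weights over encoder hidden states, a
    context vector, and an output distribution for the next label.
    encoder_states: (T, H); decoder_state: (H,); output_weights: (V, 2H)."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = softmax(scores)                      # attention weights
    context = weights @ encoder_states             # (H,) attentional context
    logits = output_weights @ np.concatenate([context, decoder_state])
    return weights, softmax(logits)                # (T,), (V,)
```

Training would run this step at every output position and backpropagate through the attention weights.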
-
Publication number: 20200151623
Abstract: A method and apparatus are provided for analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data, by minimum Bayes risk (MBR) training of a sequence-to-sequence model, with softmax smoothing applied to the N-best generation of the MBR training.
Type: Application
Filed: November 14, 2018
Publication date: May 14, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20200135174
Abstract: Methods and apparatuses are provided for performing sequence-to-sequence (Seq2Seq) speech recognition training performed by at least one processor. The method includes acquiring a training set comprising a plurality of pairs of input data and target data corresponding to the input data, encoding the input data into a sequence of hidden states, performing a connectionist temporal classification (CTC) model training based on the sequence of hidden states, performing an attention model training based on the sequence of hidden states, and decoding the sequence of hidden states to generate target labels by independently performing the CTC model training and the attention model training.
Type: Application
Filed: October 24, 2018
Publication date: April 30, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Jia Cui, Chao Weng, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
-
Publication number: 20200118547
Abstract: Methods and apparatuses are provided for performing end-to-end speech recognition training performed by at least one processor. The method includes receiving, by the at least one processor, one or more input speech frames, generating, by the at least one processor, a sequence of encoder hidden states by transforming the input speech frames, computing, by the at least one processor, attention weights based on each of the sequence of encoder hidden states and a current decoder hidden state, performing, by the at least one processor, a decoding operation based on a previous embedded label prediction information and a previous attentional hidden state information generated based on the attention weights, and generating a current embedded label prediction information based on a result of the decoding operation and the attention weights.
Type: Application
Filed: October 15, 2018
Publication date: April 16, 2020
Applicant: TENCENT AMERICA LLC
Inventors: Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu