Patents by Inventor George Andrei Saon
George Andrei Saon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11942078Abstract: A computer-implemented method is provided for improving accuracy recognition of digital speech. The method includes receiving the digital speech. The method further includes splitting the digital speech into overlapping chunks. The method also includes computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings. The method additionally includes combining the bidirectional encoder embeddings. The method further includes interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.Type: GrantFiled: February 26, 2021Date of Patent: March 26, 2024Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: George Andrei Saon
-
Patent number: 11908458Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer implemented method further includes restoring the updated encoder to the initial condition.Type: GrantFiled: December 29, 2020Date of Patent: February 20, 2024Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
-
Patent number: 11908454Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computer device receives speech data, and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.Type: GrantFiled: December 1, 2021Date of Patent: February 20, 2024Assignee: International Business Machines CorporationInventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
-
Publication number: 20230410797Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.Type: ApplicationFiled: September 1, 2023Publication date: December 21, 2023Inventors: Gakuto Kurata, George Andrei Saon
-
Patent number: 11783811Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.Type: GrantFiled: September 24, 2020Date of Patent: October 10, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Gakuto Kurata, George Andrei Saon
-
Patent number: 11741946Abstract: Using an encoder neural network model, an encoder vector is computed, the encoder vector comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is predicted, the prediction performed using a previous prediction vector and a previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with a corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising a probability that a current output symbol corresponds to the current portion of input data in the input sequence.Type: GrantFiled: August 21, 2020Date of Patent: August 29, 2023Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: George Andrei Saon, Daniel Bolanos
-
Publication number: 20230186903Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates a modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processing the modified training data through the prediction network to generate a prediction output. Machine learning training is executed of the prediction network based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.Type: ApplicationFiled: December 13, 2021Publication date: June 15, 2023Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
-
Publication number: 20230169954Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computer device receives speech data, and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.Type: ApplicationFiled: December 1, 2021Publication date: June 1, 2023Inventors: SAMUEL THOMAS, HONG-KWANG KUO, BRIAN E.D. KINGSBURY, GEORGE ANDREI SAON, KAGUTO KURATA
-
Publication number: 20230081306Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.Type: ApplicationFiled: August 27, 2021Publication date: March 16, 2023Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
-
Publication number: 20230056680Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.Type: ApplicationFiled: August 18, 2021Publication date: February 23, 2023Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Publication number: 20220319494Abstract: An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted to a domain specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may provide transcription data annotated with spoken language understanding labels. Adaptation may include audio data may also be provided for in addition to verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity and/or intent based with values associated with each label.Type: ApplicationFiled: March 31, 2021Publication date: October 6, 2022Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Publication number: 20220277734Abstract: A computer-implemented method is provided for improving accuracy recognition of digital speech. The method includes receiving the digital speech. The method further includes splitting the digital speech into overlapping chunks. The method also includes computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings. The method additionally includes combining the bidirectional encoder embeddings. The method further includes interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.Type: ApplicationFiled: February 26, 2021Publication date: September 1, 2022Inventor: George Andrei Saon
-
Publication number: 20220208179Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer implemented method further includes restoring the updated encoder to the initial condition.Type: ApplicationFiled: December 29, 2020Publication date: June 30, 2022Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
-
Publication number: 20220093083Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.Type: ApplicationFiled: September 24, 2020Publication date: March 24, 2022Inventors: Gakuto Kurata, George Andrei Saon
-
Publication number: 20220059082Abstract: Using an encoder neural network model, an encoder vector is computed, the encoder vector comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is predicted, the prediction performed using a previous prediction vector and a previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with a corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising a probability that a current output symbol corresponds to the current portion of input data in the input sequence.Type: ApplicationFiled: August 21, 2020Publication date: February 24, 2022Applicant: International Business Machines CorporationInventors: George Andrei Saon, Daniel Bolanos
-
Patent number: 11158303Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.Type: GrantFiled: August 27, 2019Date of Patent: October 26, 2021Assignee: International Business Machines CorporationInventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
-
Publication number: 20210065680Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.Type: ApplicationFiled: August 27, 2019Publication date: March 4, 2021Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
-
Patent number: 6609093Abstract: The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal covariance gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT—maximum likelihood linear transformation) to the HDA space results in an increased classification accuracy.Type: GrantFiled: June 1, 2000Date of Patent: August 19, 2003Assignee: International Business Machines CorporationInventors: Ramesh Ambat Gopinath, Mukund Padmanabhan, George Andrei Saon
-
Patent number: 6385579Abstract: A method of forming an augmented textual training corpus with compound words for use with an associated with a speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair.Type: GrantFiled: April 29, 1999Date of Patent: May 7, 2002Assignee: International Business Machines CorporationInventors: Mukund Padmanabhan, George Andrei Saon