Patents by Inventor George Andrei Saon
George Andrei Saon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12288551
Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.
Type: Grant
Filed: September 1, 2023
Date of Patent: April 29, 2025
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gakuto Kurata, George Andrei Saon
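A minimal sketch of the teacher-student idea in this abstract, assuming generic toy tensors stand in for the two models' output probability lattices; this illustrates plain KL-divergence knowledge distillation rather than the patented training procedure itself.

```python
# Hypothetical distillation sketch: bidirectional-encoder teacher guides a
# unidirectional-encoder student by matching output distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student posteriors over a
    (time x label) output probability lattice."""
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_log_probs, log_target=True, reduction="batchmean")

# toy example: batch of 2 utterances, 50 frames, 30 output symbols
teacher_logits = torch.randn(2, 50, 30)                       # bidirectional-encoder teacher
student_logits = torch.randn(2, 50, 30, requires_grad=True)   # unidirectional-encoder student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```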
-
Publication number: 20250005348
Abstract: Systems and techniques that facilitate neuronal activity modulation of artificial neural networks are provided. In various embodiments, an artificial neural network can comprise a set of base neuron populations that collectively generate, during an inferencing phase or a training phase of the artificial neural network, an inferencing task result based on a data candidate. In various aspects, the artificial neural network can comprise a control neuron population that is independent of the set of base neuron populations. In various instances, the control neuron population can modulate, during the inferencing phase or the training phase, neuronal activity of at least one base neuron population of the set of base neuron populations. In various cases, the control neuron population can modulate the neuronal activity of the at least one base neuron population by scaling one or more operands internally produced by the at least one base neuron population.
Type: Application
Filed: June 27, 2023
Publication date: January 2, 2025
Inventors: Thomas Ortner, Ayush Garg, Stanislaw Andrzej Wozniak, George Andrei Saon, Angeliki Pantazi
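A minimal sketch of the modulation idea, assuming "scaling one or more operands" is realized as a control population that produces multiplicative gains applied to a base layer's pre-activations; the class and layer names are illustrative, not taken from the application.

```python
# Hypothetical neuronal-activity modulation: a control population scales
# the operands produced by a base population.
import torch
import torch.nn as nn

class ModulatedLayer(nn.Module):
    def __init__(self, in_dim, hidden_dim, ctrl_dim):
        super().__init__()
        self.base = nn.Linear(in_dim, hidden_dim)        # base neuron population
        self.control = nn.Linear(ctrl_dim, hidden_dim)   # independent control population

    def forward(self, x, ctrl_input):
        gain = torch.sigmoid(self.control(ctrl_input))   # modulation signal in (0, 1)
        pre_activation = self.base(x) * gain             # scale the base operand
        return torch.relu(pre_activation)

layer = ModulatedLayer(in_dim=16, hidden_dim=32, ctrl_dim=8)
y = layer(torch.randn(4, 16), torch.randn(4, 8))
```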
-
Patent number: 12148419
Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processed through the prediction network to generate a prediction output. Machine learning training of the prediction network is executed based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.
Type: Grant
Filed: December 13, 2021
Date of Patent: November 19, 2024
Assignee: International Business Machines Corporation
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
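A minimal sketch of the perturbation-generator idea, assuming the corruption is additive noise plus random feature dropout on acoustic features; the actual generator in the patent may work differently.

```python
# Hypothetical perturbation generator: corrupt the original training data,
# then train against targets tied to the original (uncorrupted) data.
import torch

def perturb(features, noise_std=0.1, drop_prob=0.05):
    """Corrupt original training features with additive noise and random drops."""
    noisy = features + noise_std * torch.randn_like(features)
    mask = (torch.rand_like(features) > drop_prob).float()
    return noisy * mask

original = torch.randn(8, 100, 40)   # batch x frames x filterbank dims
corrupted = perturb(original)
# 'corrupted' is fed to the prediction network; the training objective is
# computed against the original data, as described in the abstract.
```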
-
Publication number: 20240371361
Abstract: Systems, computer-implemented methods, and computer program products to facilitate fine-grained textual knowledge transfer to improve speech recognition and understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a deriving component that can derive one or more speech-based embeddings from an utterance via a speech encoder. The computer executable components can comprise a cross-attention component that can align, at a token level, one or more large language model (LLM) based sentence embeddings with the one or more speech-based embeddings. The computer executable components can comprise a loss component that can combine an alignment loss and an automatic speech recognition (ASR) loss.
Type: Application
Filed: May 2, 2023
Publication date: November 7, 2024
Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Brian E. D. Kingsbury, Eric Fosler-Lussier, George Andrei Saon
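A minimal sketch of the cross-attention alignment and combined loss, assuming a single cross-attention layer, a cosine-distance alignment term, and a placeholder ASR loss; the dimensions, loss weights, and architecture are illustrative assumptions.

```python
# Hypothetical token-level alignment of LLM sentence embeddings with
# speech-based embeddings, combining an alignment loss with an ASR loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

speech_emb = torch.randn(2, 120, 256)   # speech-encoder outputs (frames)
llm_emb = torch.randn(2, 20, 256)       # LLM-based sentence/token embeddings

# speech frames attend to the LLM embeddings
aligned, _ = cross_attn(query=speech_emb, key=llm_emb, value=llm_emb)

# alignment loss pulls the attended representation toward the speech embedding
alignment_loss = 1.0 - F.cosine_similarity(aligned, speech_emb, dim=-1).mean()

asr_loss = torch.tensor(2.3)                     # placeholder for the usual ASR loss
total_loss = asr_loss + 0.5 * alignment_loss     # combined objective
```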
-
Patent number: 12136414
Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
Type: Grant
Filed: August 18, 2021
Date of Patent: November 5, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
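A minimal sketch of conditioning a spoken language understanding model on a dialog-history embedding, assuming the embedding is simply broadcast and concatenated to per-frame speech features; the fusion scheme and dimensions are illustrative assumptions.

```python
# Hypothetical history-conditioned SLU model: speech features plus a
# dialog-history embedding feed a classifier for dialog action / intent.
import torch
import torch.nn as nn

class HistoryConditionedSLU(nn.Module):
    def __init__(self, speech_dim=40, hist_dim=128, hidden=256, n_intents=10):
        super().__init__()
        self.encoder = nn.LSTM(speech_dim + hist_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_intents)

    def forward(self, speech_feats, history_embedding):
        # broadcast the utterance-level history embedding to every frame
        hist = history_embedding.unsqueeze(1).expand(-1, speech_feats.size(1), -1)
        x = torch.cat([speech_feats, hist], dim=-1)
        _, (h, _) = self.encoder(x)
        return self.classifier(h[-1])    # dialog-action / intent logits

model = HistoryConditionedSLU()
logits = model(torch.randn(2, 300, 40), torch.randn(2, 128))
```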
-
Publication number: 20240331684
Abstract: Features of two or more single speaker utterances are concatenated together and corresponding labels of the two or more single speaker utterances are concatenated together. Single speaker acoustic embeddings for each of the single speaker utterances of the concatenated single speaker utterances are generated using a single speaker teacher encoder network. An enhanced model is trained on the concatenated single speaker utterances using a classification loss L_CLASS and a representation similarity loss L_REP, the representation similarity loss L_REP defined to influence an embedding derived from the concatenated single speaker utterances, the influence being based on the single speaker acoustic embeddings derived from the single speaker teacher encoder network.
Type: Application
Filed: March 31, 2023
Publication date: October 3, 2024
Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Brian E. D. Kingsbury
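A minimal sketch of the two-term objective, assuming the representation similarity loss L_REP is a mean-squared error between the enhanced model's embedding of the concatenated utterance and the frame-aligned teacher embeddings, with a placeholder for L_CLASS; these choices are assumptions, not the application's definitions.

```python
# Hypothetical training step on concatenated single-speaker utterances with
# a classification loss plus a representation similarity loss.
import torch
import torch.nn.functional as F

# features and teacher embeddings for two single-speaker utterances
feats_a, feats_b = torch.randn(150, 40), torch.randn(120, 40)
teach_a, teach_b = torch.randn(150, 256), torch.randn(120, 256)

concat_feats = torch.cat([feats_a, feats_b], dim=0)       # concatenated input features
teacher_emb = torch.cat([teach_a, teach_b], dim=0)        # single-speaker teacher targets

student_emb = torch.randn(270, 256, requires_grad=True)   # stand-in for the enhanced model's embedding

l_rep = F.mse_loss(student_emb, teacher_emb)              # L_REP (representation similarity)
l_class = torch.tensor(1.7)                               # placeholder for L_CLASS (e.g. an ASR loss)
loss = l_class + l_rep
```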
-
Publication number: 20240331687
Abstract: A word-level confidence score is calculated using a computerized automatic speech recognition system by computing an average of the confidence levels for each character in a word and a trailing space character delineating the end of the word. The word is then managed using the computerized automatic speech recognition system and a threshold process based on the calculated word-level confidence score.
Type: Application
Filed: March 30, 2023
Publication date: October 3, 2024
Inventors: Takashi Fukuda, George Andrei Saon
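A minimal sketch of the averaging and thresholding described above, assuming the per-character confidences come from a character-level recognizer; the values and threshold are illustrative.

```python
# Hypothetical word-level confidence: average the per-character confidences,
# including the trailing space, then apply a threshold.
def word_confidence(char_confidences):
    """Average the confidences of each character in the word plus the
    trailing-space character that delineates the end of the word."""
    return sum(char_confidences) / len(char_confidences)

# confidences for "c", "a", "t" and the trailing space
conf = word_confidence([0.92, 0.88, 0.95, 0.81])   # -> 0.89
THRESHOLD = 0.85
keep_word = conf >= THRESHOLD                      # threshold-based word management
```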
-
Patent number: 12046236
Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
Type: Grant
Filed: August 27, 2021
Date of Patent: July 23, 2024
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
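A minimal sketch of the reorder-and-perturb idea, assuming spoken order is recovered by sorting entities on alignment time stamps and the perturbation is a random shuffle; the actual alignment technique is not specified here.

```python
# Hypothetical reordering of semantic entities into spoken order plus
# random-order perturbations used as data augmentation.
import random

entities = [
    {"slot": "destination", "value": "Boston", "align_time": 2.4},
    {"slot": "date", "value": "Friday", "align_time": 0.9},
]

# reorder into spoken order using alignment time stamps
spoken_order = sorted(entities, key=lambda e: e["align_time"])

# perturb: random-order sequence variations that augment the training data
augmented = [random.sample(spoken_order, k=len(spoken_order)) for _ in range(3)]
```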
-
Publication number: 20240170005
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to length perturbation techniques for improving generalization of DNN acoustic models. A computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a frame skipping component that can remove one or more frames from an acoustic utterance via frame skipping. The computer executable components can further comprise a frame insertion component that can insert one or more replacement frames into the acoustic utterance via frame insertion to replace the one or more frames with the one or more replacement frames to enable length perturbation of the acoustic utterance.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon
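A minimal sketch of length perturbation, assuming frames are skipped at random and replacement frames are copies of neighboring frames; the rates and the choice of replacement frames are illustrative assumptions.

```python
# Hypothetical length perturbation of an acoustic utterance via frame
# skipping and frame insertion.
import random

def length_perturb(frames, skip_rate=0.1, insert_rate=0.1):
    """Shorten the utterance by frame skipping, lengthen it by frame insertion."""
    out = []
    for frame in frames:
        if random.random() < skip_rate:
            continue                    # frame skipping: drop this frame
        out.append(frame)
        if random.random() < insert_rate:
            out.append(frame)           # frame insertion: duplicate as a replacement frame
    return out

utterance = [[float(t)] * 40 for t in range(200)]   # 200 frames x 40 dims
perturbed = length_perturb(utterance)
```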
-
Publication number: 20240169197
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to n-best based label smoothing techniques for improving generalization of DNN acoustic models. A computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a generation component that can generate one or more n-best hypotheses of a ground truth label sequence, using one or more acoustic models, wherein the one or more n-best hypotheses of the ground truth label sequence can represent one or more competing labels that can be used to smooth out the ground truth label sequence.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon
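A minimal sketch of n-best based label smoothing, assuming the smoothing mixes a near-one-hot ground-truth distribution with mass spread over competing n-best labels at each position; the weighting scheme is an illustrative assumption.

```python
# Hypothetical n-best label smoothing: soft targets built from the ground
# truth and competing labels taken from n-best hypotheses.
import numpy as np

def nbest_label_smoothing(truth_ids, nbest_ids, vocab_size, alpha=0.1):
    """Per-position soft targets: (1 - alpha) on the ground-truth label and
    alpha spread over competing labels from the n-best hypotheses."""
    targets = np.zeros((len(truth_ids), vocab_size))
    for t, truth in enumerate(truth_ids):
        competitors = {hyp[t] for hyp in nbest_ids if t < len(hyp) and hyp[t] != truth}
        if competitors:
            targets[t, truth] = 1.0 - alpha
            for c in competitors:
                targets[t, c] = alpha / len(competitors)
        else:
            targets[t, truth] = 1.0    # no competitors at this position
    return targets

soft = nbest_label_smoothing([3, 7, 2], [[3, 5, 2], [4, 7, 2]], vocab_size=10)
```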
-
Patent number: 11942078
Abstract: A computer-implemented method is provided for improving the recognition accuracy of digital speech. The method includes receiving the digital speech. The method further includes splitting the digital speech into overlapping chunks. The method also includes computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings. The method additionally includes combining the bidirectional encoder embeddings. The method further includes interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.
Type: Grant
Filed: February 26, 2021
Date of Patent: March 26, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: George Andrei Saon
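A minimal sketch of chunked bidirectional encoding, assuming the chunk embeddings are combined by averaging the overlapped frames and a small BiLSTM stands in for the bidirectional encoder; chunk and overlap sizes are illustrative.

```python
# Hypothetical overlapping-chunk encoding: encode each chunk with a
# bidirectional encoder, then combine embeddings by averaging overlaps.
import torch
import torch.nn as nn

encoder = nn.LSTM(40, 128, bidirectional=True, batch_first=True)   # stand-in bidirectional encoder

@torch.no_grad()
def chunked_encode(feats, chunk=100, overlap=20):
    total, dim = feats.size(0), 256                     # 256 = 2 x hidden for the BiLSTM
    summed = torch.zeros(total, dim)
    counts = torch.zeros(total, 1)
    step = chunk - overlap
    for start in range(0, total, step):
        end = min(start + chunk, total)
        emb, _ = encoder(feats[start:end].unsqueeze(0))  # bidirectional embedding of the chunk
        summed[start:end] += emb.squeeze(0)
        counts[start:end] += 1
        if end == total:
            break
    return summed / counts                               # average over overlapping chunks

combined = chunked_encode(torch.randn(350, 40))
```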
-
Patent number: 11908454
Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
Type: Grant
Filed: December 1, 2021
Date of Patent: February 20, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
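A minimal sketch of one plausible "textogram" construction, assuming it is a frame-level one-hot representation of the text with each character repeated over several frames so it can be consumed like a spectrogram; the exact construction in the patent may differ.

```python
# Hypothetical textogram: text rendered as a (frames x vocab) array analogous
# to a spectrogram, so speech and text can share one training pipeline.
import numpy as np

def textogram(text, vocab, repeat=4):
    frames = []
    for ch in text:
        onehot = np.zeros(len(vocab))
        onehot[vocab.index(ch)] = 1.0
        frames.extend([onehot] * repeat)    # repeat each symbol over several frames
    return np.stack(frames)

vocab = list("abcdefghijklmnopqrstuvwxyz ")
tg = textogram("hello world", vocab)        # shape (44, 27)
```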
-
Patent number: 11908458
Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
Type: Grant
Filed: December 29, 2020
Date of Patent: February 20, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
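A minimal sketch of the save-update-restore flow from the abstract, assuming generic LSTM modules stand in for the RNN-T encoder and prediction network and the training loops and TTS synthesis are omitted as placeholders.

```python
# Hypothetical RNN-T customization flow: snapshot the encoder's initial
# condition, update on synthesized audio, then restore the encoder.
import copy
import torch.nn as nn

encoder = nn.LSTM(40, 128, batch_first=True)          # stand-in for the trained RNN-T encoder
prediction_net = nn.LSTM(64, 128, batch_first=True)   # stand-in for the RNN-T prediction network

initial_encoder_state = copy.deepcopy(encoder.state_dict())  # initial condition

# step 1: synthesize first-domain audio from first-domain text and update
#         the encoder on it (training loop omitted in this sketch)
# step 2: synthesize second-domain audio, feed it through the *updated*
#         encoder, and update the prediction network (loop omitted)

# step 3: restore the encoder to its initial condition
encoder.load_state_dict(initial_encoder_state)
```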
-
Publication number: 20230410797
Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.
Type: Application
Filed: September 1, 2023
Publication date: December 21, 2023
Inventors: Gakuto Kurata, George Andrei Saon
-
Patent number: 11783811
Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.
Type: Grant
Filed: September 24, 2020
Date of Patent: October 10, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gakuto Kurata, George Andrei Saon
-
Patent number: 11741946
Abstract: Using an encoder neural network model, an encoder vector is computed, the encoder vector comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is predicted, the prediction performed using a previous prediction vector and a previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with a corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising a probability that a current output symbol corresponds to the current portion of input data in the input sequence.
Type: Grant
Filed: August 21, 2020
Date of Patent: August 29, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: George Andrei Saon, Daniel Bolanos
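A minimal sketch of the multiplicative joint combination named in the abstract, assuming an element-wise product followed by a linear projection and softmax; the projection layer and vocabulary size are illustrative, and this is not the full RNN-T described in the patent.

```python
# Hypothetical multiplicative joint network: combine encoder and prediction
# vectors element-wise, then produce a symbol distribution with a softmax.
import torch
import torch.nn as nn
import torch.nn.functional as F

joint_proj = nn.Linear(256, 1000)        # 1000 output symbols (illustrative)

encoder_vec = torch.randn(256)           # current portion of the input sequence
prediction_vec = torch.randn(256)        # from the previous prediction state / symbol

joint_vec = encoder_vec * prediction_vec                 # element-wise multiplicative combination
probs = F.softmax(joint_proj(joint_vec), dim=-1)         # distribution over output symbols
```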
-
Publication number: 20230186903
Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processed through the prediction network to generate a prediction output. Machine learning training of the prediction network is executed based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.
Type: Application
Filed: December 13, 2021
Publication date: June 15, 2023
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
-
Publication number: 20230081306
Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
Type: Application
Filed: August 27, 2021
Publication date: March 16, 2023
Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
-
Publication number: 20230056680
Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
Type: Application
Filed: August 18, 2021
Publication date: February 23, 2023
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Publication number: 20220319494
Abstract: An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted to a domain specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may provide transcription data annotated with spoken language understanding labels. The adaptation may also include audio data provided in addition to verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity and/or intent based with values associated with each label.
Type: Application
Filed: March 31, 2021
Publication date: October 6, 2022
Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
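A minimal sketch of one way such an adaptation could be set up, assuming the ASR output vocabulary is extended with entity and intent tags and the model is fine-tuned on transcripts annotated with those tags; the tag format and names are hypothetical, not taken from the application.

```python
# Hypothetical data setup for adapting a general ASR model to domain-specific
# SLU: extend the output space with SLU labels and pair audio with annotated
# verbatim transcripts.
asr_vocab = list("abcdefghijklmnopqrstuvwxyz '")
slu_tags = ["<intent:book_flight>", "<city>", "</city>", "<date>", "</date>"]   # hypothetical tags
slu_vocab = asr_vocab + slu_tags       # extended output space for the adapted model

# verbatim transcript annotated with spoken language understanding labels
annotated = "<intent:book_flight> i want to fly to <city> boston </city> on <date> friday </date>"

# adaptation step (training loop omitted): fine-tune the pre-trained general
# ASR model on (audio, annotated transcript) pairs so it emits entity and
# intent tokens alongside the words.
```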