Patents by Inventor Brian E. D. Kingsbury
Brian E. D. Kingsbury has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240169197
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to n-best based label smoothing techniques for improving generalization of DNN acoustic models. A computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a generation component that can generate one or more n-best hypotheses of a ground truth label sequence, using one or more acoustic models, wherein the one or more n-best hypotheses of the ground truth label sequence can represent one or more competing labels that can be used to smooth out the ground truth label sequence.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon
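For illustration only, the n-best label smoothing idea this abstract describes can be sketched as redistributing ground-truth probability mass onto the competing labels proposed by n-best hypotheses. This is a minimal sketch, not the patented implementation; the function name, the `alpha` parameter, and the equal-split rule are assumptions.

```python
import numpy as np

def nbest_label_smoothing(truth, nbest, vocab_size, alpha=0.1):
    """Smooth a one-hot ground-truth label sequence with competing
    labels drawn from n-best hypotheses (hypothetical sketch).

    truth : list of int label ids (ground-truth sequence)
    nbest : list of competing label sequences, same length as truth
    alpha : probability mass moved onto competing labels per frame
    """
    T = len(truth)
    targets = np.zeros((T, vocab_size))
    targets[np.arange(T), truth] = 1.0 - alpha
    for t in range(T):
        # competing labels proposed by the n-best hypotheses at frame t
        competitors = [h[t] for h in nbest if h[t] != truth[t]]
        if competitors:
            for c in competitors:
                targets[t, c] += alpha / len(competitors)
        else:
            targets[t, truth[t]] += alpha  # no competitors: keep full mass
    return targets
```

With `alpha=0.2` and a single competing label at a frame, the target becomes 0.8 on the ground-truth label and 0.2 on the competitor, rather than a hard one-hot vector.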
-
Publication number: 20240170005
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to length perturbation techniques for improving generalization of DNN acoustic models. A computer-implemented system can comprise a memory that can store computer executable components. The computer-implemented system can further comprise a processor that can execute the computer executable components stored in the memory, wherein the computer executable components can comprise a frame skipping component that can remove one or more frames from an acoustic utterance via frame skipping. The computer executable components can further comprise a frame insertion component that can insert one or more replacement frames into the acoustic utterance via frame insertion to replace the one or more frames with the one or more replacement frames to enable length perturbation of the acoustic utterance.
Type: Application
Filed: November 22, 2022
Publication date: May 23, 2024
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon
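A minimal sketch of the frame-skipping and frame-insertion idea above, assuming an utterance is a list of feature frames. The probabilities and the choice to duplicate the current frame as the "replacement frame" are illustrative assumptions, not the patented method.

```python
import random

def length_perturb(frames, skip_prob=0.1, insert_prob=0.1, rng=None):
    """Perturb utterance length by randomly dropping frames (frame
    skipping) and inserting replacement frames (frame insertion)."""
    rng = rng or random.Random(0)
    out = []
    for f in frames:
        if rng.random() < skip_prob:
            continue              # frame skipping: drop this frame
        out.append(f)
        if rng.random() < insert_prob:
            # frame insertion: duplicate the current frame as a cheap
            # stand-in for a synthesized replacement frame
            out.append(f)
    return out
```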
-
Publication number: 20240095515
Abstract: Decentralized bilevel optimization techniques for personalized learning over a heterogeneous network are provided. In one aspect, a decentralized learning system includes: a distributed machine learning network with multiple nodes, and datasets associated with the nodes; and a bilevel learning structure at each of the nodes for optimizing one or more features from each of the datasets using a decentralized bilevel optimization solver, while maintaining distinct features from each of the datasets. A method for decentralized learning is also provided.
Type: Application
Filed: September 13, 2022
Publication date: March 21, 2024
Inventors: Songtao Lu, Xiaodong Cui, Mark S. Squillante, Brian E. D. Kingsbury, Lior Horesh
-
Patent number: 11929062
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Grant
Filed: September 15, 2020
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
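To illustrate the label-extraction step described above, here is a toy sketch that splits a meaning representation into entity labels with values and intent labels. The `label=value` serialization is a hypothetical assumption for the example, not the patent's format.

```python
def extract_labels(semantics):
    """Split a meaning representation into entity labels/values and
    intent labels (hypothetical 'label=value' serialization assumed)."""
    entities, intents = {}, []
    for item in semantics:
        if "=" in item:
            label, value = item.split("=", 1)  # entity label and its value
            entities[label] = value
        else:
            intents.append(item)               # bare items treated as intents
    return entities, intents
```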
-
Patent number: 11914678
Abstract: Techniques for classifier generalization in a supervised learning process using input encoding are provided. In one aspect, a method for classification generalization includes: encoding original input features from at least one input sample x⃗_S with a uniquely decodable code using an encoder E(·) to produce encoded input features E(x⃗_S), wherein the at least one input sample x⃗_S comprises uncoded input features; feeding the uncoded input features and the encoded input features E(x⃗_S) to a base model to build an encoded model; and learning a classification function C̃_E(·) using the encoded model, wherein the classification function C̃_E(·) learned using the encoded model is more general than that learned using the uncoded input features alone.
Type: Grant
Filed: September 23, 2020
Date of Patent: February 27, 2024
Assignee: International Business Machines Corporation
Inventors: Hazar Yueksel, Kush Raj Varshney, Brian E. D. Kingsbury
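A minimal sketch of the input-encoding step E(·): fixed-length binary codes are trivially uniquely decodable, so one simple (assumed, illustrative) encoder quantizes each feature and appends its bit pattern to the uncoded features before they are fed to a base model.

```python
import numpy as np

def encode_features(x, n_bits=8):
    """Encode quantized input features with a fixed-length binary code
    (fixed-length codes are uniquely decodable) and concatenate the
    code bits with the uncoded features. x has shape (N, D)."""
    lo, hi = x.min(), x.max()
    # quantize each feature into the integer range [0, 2**n_bits)
    q = np.round((x - lo) / max(hi - lo, 1e-12) * (2**n_bits - 1)).astype(int)
    # unpack each quantized value into its n_bits binary digits
    bits = ((q[..., None] >> np.arange(n_bits)) & 1).astype(float)
    bits = bits.reshape(x.shape[0], -1)
    return np.concatenate([x, bits], axis=1)  # uncoded + encoded features
```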
-
Patent number: 11908458
Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
Type: Grant
Filed: December 29, 2020
Date of Patent: February 20, 2024
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
-
Patent number: 11908454
Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
Type: Grant
Filed: December 1, 2021
Date of Patent: February 20, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
-
Publication number: 20230298596
Abstract: Systems, computer-implemented methods, and computer program products to facilitate end-to-end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.
Type: Application
Filed: March 18, 2022
Publication date: September 21, 2023
Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier
-
Publication number: 20230186903
Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processed through the prediction network to generate a prediction output. Machine learning training of the prediction network is executed based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.
Type: Application
Filed: December 13, 2021
Publication date: June 15, 2023
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
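A sketch of the perturbation-injection step described above: the model sees a corrupted copy of the batch while the clean original remains the training target (a denoising-style setup). The specific corruptions here, additive Gaussian noise plus random feature dropout, are illustrative assumptions.

```python
import numpy as np

def make_perturbed_batch(batch, noise_std=0.1, drop_prob=0.05, rng=None):
    """Inject perturbations (additive noise plus random feature dropout)
    into a training batch; the clean batch stays the training target."""
    rng = rng or np.random.default_rng(0)
    noisy = batch + rng.normal(0.0, noise_std, batch.shape)
    mask = rng.random(batch.shape) >= drop_prob   # 1 = keep, 0 = corrupt
    noisy = noisy * mask
    return noisy, batch   # (model input, training target)
```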
-
Publication number: 20230169954
Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
Type: Application
Filed: December 1, 2021
Publication date: June 1, 2023
Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
-
Publication number: 20230081306
Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into the spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
Type: Application
Filed: August 27, 2021
Publication date: March 16, 2023
Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
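The random-order perturbation step above can be sketched as generating distinct shuffles of a semantic-entity sequence for data augmentation. This is a toy sketch under assumed names; the patent's alignment technique for recovering spoken order is not modeled here.

```python
import math
import random

def order_perturbations(entities, n_variants=3, rng=None):
    """Create distinct random-order variations of a semantic-entity
    sequence to augment SLU training data."""
    rng = rng or random.Random(0)
    # cannot produce more distinct variants than permutations minus the original
    n_variants = min(n_variants, math.factorial(len(entities)) - 1)
    variants, seen = [], {tuple(entities)}
    while len(variants) < n_variants:
        shuffled = entities[:]
        rng.shuffle(shuffled)
        if tuple(shuffled) not in seen:   # keep only new orderings
            seen.add(tuple(shuffled))
            variants.append(shuffled)
    return variants
```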
-
Publication number: 20230056680
Abstract: Audio signals representing a current utterance in a conversation, and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation, can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
Type: Application
Filed: August 18, 2021
Publication date: February 23, 2023
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Patent number: 11568858
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
Type: Grant
Filed: October 17, 2020
Date of Patent: January 31, 2023
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
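The filter-then-augment step above can be sketched as follows, assuming (purely for illustration) that each transliterated utterance carries a confidence-style score and that the filtering metric is a simple threshold on that score.

```python
def augment_with_transliterations(original, transliterated, min_score=0.8):
    """Apply a threshold-style filtering metric to a pool of
    transliterated data and add the surviving portion back to the
    original transcribed training data.

    original       : list of (audio_id, transcript) pairs
    transliterated : list of (audio_id, transcript, score) triples
    """
    # keep only transliterations whose score clears the threshold
    selected = [(a, t) for a, t, s in transliterated if s >= min_score]
    return original + selected
```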
-
Publication number: 20220343218
Abstract: Embodiments relate to an input-encoding technique in conjunction with federation. Participating entities are arranged in a collaborative relationship. Each participating entity trains a machine learning model with an encoder on a training data set. The performance of each of the models is measured, and at least one of the models is selectively identified based on the measured performance. An encoder of the selectively identified machine learning model is shared with each of the participating entities. The shared encoder is configured to be applied by the participating entities to train their machine learning models, which are configured to be merged and shared in the federated learning environment.
Type: Application
Filed: April 26, 2021
Publication date: October 27, 2022
Applicant: International Business Machines Corporation
Inventors: Hazar Yueksel, Brian E. D. Kingsbury, Kush Raj Varshney, Pradip Bose, Dinesh C. Verma, Shiqiang Wang, Augusto Vega, Ashish Verma, Supriyo Chakraborty
-
Publication number: 20220319494
Abstract: An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted to a domain-specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may provide transcription data annotated with spoken language understanding labels. Adaptation may also include audio data in addition to verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity and/or intent based, with values associated with each label.
Type: Application
Filed: March 31, 2021
Publication date: October 6, 2022
Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Publication number: 20220208179
Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network of the RNN-T is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
Type: Application
Filed: December 29, 2020
Publication date: June 30, 2022
Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
-
Publication number: 20220122585
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. The training of a new multilingual acoustic model through the multilingual network is performed using the updated training data.
Type: Application
Filed: October 17, 2020
Publication date: April 21, 2022
Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
-
Publication number: 20220092365
Abstract: Techniques for classifier generalization in a supervised learning process using input encoding are provided. In one aspect, a method for classification generalization includes: encoding original input features from at least one input sample x⃗_S with a uniquely decodable code using an encoder E(·) to produce encoded input features E(x⃗_S), wherein the at least one input sample x⃗_S comprises uncoded input features; feeding the uncoded input features and the encoded input features E(x⃗_S) to a base model to build an encoded model; and learning a classification function C̃_E(·) using the encoded model, wherein the classification function C̃_E(·) learned using the encoded model is more general than that learned using the uncoded input features alone.
Type: Application
Filed: September 23, 2020
Publication date: March 24, 2022
Inventors: Hazar Yueksel, Kush Raj Varshney, Brian E. D. Kingsbury
-
Publication number: 20220084508
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Application
Filed: September 15, 2020
Publication date: March 17, 2022
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
-
Patent number: 11158303
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Grant
Filed: August 27, 2019
Date of Patent: October 26, 2021
Assignee: International Business Machines Corporation
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
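The jittered-block step above, partitioning an utterance into non-overlapping contiguous blocks whose sizes vary randomly around a mean, can be sketched as follows. The mean block size and jitter range are assumed values for illustration; the twin-regularization step is not modeled here.

```python
import random

def jittered_blocks(n_frames, mean_block=20, jitter=5, rng=None):
    """Partition an utterance of n_frames into non-overlapping
    contiguous (start, end) blocks with jittered sizes."""
    rng = rng or random.Random(0)
    blocks, start = [], 0
    while start < n_frames:
        # jitter the block size uniformly around the mean
        size = mean_block + rng.randint(-jitter, jitter)
        end = min(start + max(size, 1), n_frames)
        blocks.append((start, end))
        start = end
    return blocks
```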