Patents by Inventor Zoltan Tueske
Zoltan Tueske has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12148419
Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processed through the prediction network to generate a prediction output. Machine learning training of the prediction network is executed based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.
Type: Grant
Filed: December 13, 2021
Date of Patent: November 19, 2024
Assignee: International Business Machines Corporation
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
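As a rough illustration of the training scheme this abstract describes, here is a minimal Python sketch assuming a toy feed-forward prediction network and noise-plus-dropout perturbations; the module names, perturbation choices, and hyperparameters are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch: train a prediction network on perturbed inputs against the
# original (clean) training data. All names here are illustrative assumptions.
import torch
import torch.nn as nn

class PredictionNetwork(nn.Module):
    def __init__(self, dim=40):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.net(x)

def perturb(x, noise_std=0.1, drop_prob=0.1):
    """Perturbation generator: inject noise and randomly zero out features,
    corrupting a copy of the original training data."""
    noisy = x + noise_std * torch.randn_like(x)
    mask = (torch.rand_like(x) > drop_prob).float()
    return noisy * mask

model = PredictionNetwork()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

original = torch.randn(256, 40)  # stand-in for real training features
for step in range(100):
    modified = perturb(original)          # perturbations injected into original data
    prediction = model(modified)          # forward pass through the prediction network
    loss = loss_fn(prediction, original)  # train against the uncorrupted targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point the sketch captures is that the loss compares the network's output on corrupted inputs against the original, uncorrupted data; the patent's perturbation generator may be far more sophisticated than the additive noise used here.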
-
Patent number: 12136414
Abstract: Audio signals representing a current utterance in a conversation, and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation, can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can also include predicting a dialog intent or overall topic of the conversation.
Type: Grant
Filed: August 18, 2021
Date of Patent: November 5, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
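To make the data flow concrete, here is a minimal sketch assuming a mean-pooling history encoder and a linear classifier head over concatenated features; the module names, dimensions, and pooling choice are assumptions for illustration, not the patented architecture.

```python
# Minimal sketch: condition a spoken-language-understanding classifier on an
# embedding of the dialog history plus speech features of the current utterance.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Encode a dialog history (vectors for past utterances) into one embedding."""
    def __init__(self, utt_dim=64, emb_dim=32):
        super().__init__()
        self.proj = nn.Linear(utt_dim, emb_dim)

    def forward(self, history):  # history: (num_past_utterances, utt_dim)
        return torch.tanh(self.proj(history.mean(dim=0)))  # (emb_dim,)

class SLUModel(nn.Module):
    """Predict a dialog action from speech features plus the history embedding."""
    def __init__(self, speech_dim=40, emb_dim=32, num_actions=10):
        super().__init__()
        self.classifier = nn.Linear(speech_dim + emb_dim, num_actions)

    def forward(self, speech_feats, hist_emb):
        return self.classifier(torch.cat([speech_feats, hist_emb]))

encoder, slu = HistoryEncoder(), SLUModel()
history = torch.randn(5, 64)  # stand-in for five past-utterance representations
speech = torch.randn(40)      # stand-in for current-utterance speech features
logits = slu(speech, encoder(history))  # dialog-action scores conditioned on history
```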
-
Patent number: 12046236
Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
Type: Grant
Filed: August 27, 2021
Date of Patent: July 23, 2024
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
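A minimal sketch of the two data-preparation steps follows. The patent aligns entities to the speech itself; here, simple text matching against a reference transcript stands in for the alignment technique, and all names and data layouts are assumptions.

```python
# Minimal sketch: reorder semantic entities into spoken order, and perturb the
# order to create augmentation variants. Text matching is a stand-in for the
# patented alignment technique.
import random

def reorder_into_spoken_order(transcript_words, entities):
    """Sort semantic entities by where their value first appears in the speech."""
    def first_position(entity):
        value_words = entity["value"].lower().split()
        for i, word in enumerate(transcript_words):
            if word == value_words[0]:
                return i
        return len(transcript_words)  # unmatched entities sort to the end
    return sorted(entities, key=first_position)

def perturb_order(entities, n_variants=3, seed=0):
    """Create random-order sequence variations of the entities to augment data."""
    rng = random.Random(seed)
    return [rng.sample(entities, len(entities)) for _ in range(n_variants)]

transcript = "i want to fly from boston to denver on tuesday".split()
entities = [{"slot": "day", "value": "tuesday"},
            {"slot": "from_city", "value": "boston"},
            {"slot": "to_city", "value": "denver"}]
print(reorder_into_spoken_order(transcript, entities))  # boston, denver, tuesday
print(perturb_order(entities))                          # shuffled augmentations
```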
-
Patent number: 11929062
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Grant
Filed: September 15, 2020
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
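The sketch below illustrates the transcript-free idea under simplifying assumptions: semantic annotations arrive as plain dictionaries, and a single linear intent classifier stands in for the SLU model. The frame layout, intent inventory, and feature dimensions are all assumptions for illustration.

```python
# Minimal sketch: extract entity and intent labels from semantic annotations
# and train on speech features directly, with no transcript involved.
import torch
import torch.nn as nn

def extract_targets(frame):
    """Pull entity labels/values and the intent label from one annotation;
    no transcript of the recording is required."""
    entities = [(label, value) for label, value in frame.get("entities", {}).items()]
    return entities, frame["intent"]

frame = {"intent": "book_flight", "entities": {"to_city": "denver", "day": "tuesday"}}
entities, intent = extract_targets(frame)

# Train an intent classifier on pooled speech features and the extracted intent
# label (entity prediction would add a second output head in a fuller model).
INTENTS = ["book_flight", "cancel", "status"]
model = nn.Linear(40, len(INTENTS))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
speech_feats = torch.randn(1, 40)  # stand-in for one recording's pooled features
target = torch.tensor([INTENTS.index(intent)])
loss = nn.functional.cross_entropy(model(speech_feats), target)
loss.backward()
optimizer.step()
```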
-
Publication number: 20230186903
Abstract: Mechanisms are provided for performing machine learning training of a computer model. A perturbation generator generates modified training data comprising perturbations injected into original training data, where the perturbations cause a data corruption of the original training data. The modified training data is input into a prediction network of the computer model and processed through the prediction network to generate a prediction output. Machine learning training of the prediction network is executed based on the prediction output and the original training data to generate a trained prediction network of a trained computer model. The trained computer model is deployed to an artificial intelligence computing system for performance of an inference operation.
Type: Application
Filed: December 13, 2021
Publication date: June 15, 2023
Inventors: Xiaodong Cui, Brian E. D. Kingsbury, George Andrei Saon, David Haws, Zoltan Tueske
-
Publication number: 20230081306
Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
Type: Application
Filed: August 27, 2021
Publication date: March 16, 2023
Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
-
Publication number: 20230056680
Abstract: Audio signals representing a current utterance in a conversation, and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation, can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can also include predicting a dialog intent or overall topic of the conversation.
Type: Application
Filed: August 18, 2021
Publication date: February 23, 2023
Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
-
Publication number: 20220319494
Abstract: An approach to training an end-to-end spoken language understanding model may be provided. A pre-trained general automatic speech recognition model may be adapted to a domain-specific spoken language understanding model. The pre-trained general automatic speech recognition model may be a recurrent neural network transducer model. The adaptation may use transcription data annotated with spoken language understanding labels. During adaptation, audio data may also be provided in addition to the verbatim transcripts annotated with spoken language understanding labels. The spoken language understanding labels may be entity- and/or intent-based, with values associated with each label.
Type: Application
Filed: March 31, 2021
Publication date: October 6, 2022
Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
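One way to picture the adaptation data is shown below: SLU labels are inlined into the verbatim transcript so a pre-trained transducer can be fine-tuned to emit them alongside the words. The tag format and helper names are assumptions for illustration, not the published annotation scheme.

```python
# Minimal sketch: build annotated transcripts for adapting a pre-trained ASR
# model to spoken language understanding. Tag conventions are assumptions.
def annotate_transcript(words, entities, intent):
    """Insert intent and entity tags into the verbatim transcript."""
    value_to_label = {e["value"]: e["label"] for e in entities}
    tagged = []
    for word in words:
        if word in value_to_label:
            tagged.append(f"<{value_to_label[word]}>")  # entity tag before its value
        tagged.append(word)
    return f"<intent:{intent}> " + " ".join(tagged)

words = "book a flight to denver".split()
entities = [{"label": "to_city", "value": "denver"}]
print(annotate_transcript(words, entities, "book_flight"))
# <intent:book_flight> book a flight to <to_city> denver
```

Fine-tuning would then proceed on (audio, annotated transcript) pairs with the usual transducer training objective.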
-
Publication number: 20220084508
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Application
Filed: September 15, 2020
Publication date: March 17, 2022
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
-
Patent number: 11250872
Abstract: A method, apparatus, and computer program product are provided for customizing an automatic closed captioning system. In some embodiments, at a data use (DU) location, an automatic closed captioning system that includes a base model is provided. Search criteria are defined to request data from one or more data collection (DC) locations, and a search request based on the search criteria is sent to the one or more DC locations. Relevant closed caption data are received from the one or more DC locations in response to the search request. The received data are processed by computing a confidence score for each of a plurality of data sub-sets of the received relevant closed caption data and selecting one or more of the data sub-sets based on the confidence scores. The automatic closed captioning system is then customized by using the selected data sub-sets to train the base model.
Type: Grant
Filed: December 14, 2019
Date of Patent: February 15, 2022
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Yinghui Huang, Masayuki Suzuki, Zoltan Tueske, Laurence P. Sansone, Michael A. Picheny
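The selection step can be sketched as follows, assuming each DC location supplies per-caption confidence values; the scoring function and the data layout are stand-in assumptions, not the patented scorer.

```python
# Minimal sketch: score each received closed-caption data sub-set and keep only
# the sub-sets whose confidence clears a threshold, for training the base model.
def select_subsets(subsets, score_fn, threshold=0.8):
    """Return (score, subset) pairs for sub-sets with sufficient confidence."""
    scored = [(score_fn(subset), subset) for subset in subsets]
    return [(score, subset) for score, subset in scored if score >= threshold]

def mean_caption_confidence(subset):
    """Stand-in confidence: average per-caption score within the sub-set."""
    return sum(item["confidence"] for item in subset) / len(subset)

subsets = [
    [{"caption": "hello world", "confidence": 0.95}],
    [{"caption": "noisy captions", "confidence": 0.40}],
]
selected = select_subsets(subsets, mean_caption_confidence)
print(selected)  # only the high-confidence sub-set survives for training
```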
-
Patent number: 11183194
Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.
Type: Grant
Filed: September 13, 2019
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
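A minimal sketch of the replacement step follows, assuming the word recognizer marks unknown words with an "<unk>" token and a sub-unit (e.g., character or phone) recognizer supplies a hypothesis for each such span; the token conventions and names are assumptions for illustration.

```python
# Minimal sketch: splice sub-unit reconstructions of unknown words back into
# the word-level transcript produced by the word recognition model.
def recover_oov_words(word_hypothesis, subunit_spans):
    """Replace each <unk> token with the word rebuilt from its sub-units."""
    spans = iter(subunit_spans)
    recovered = []
    for token in word_hypothesis:
        if token == "<unk>":
            recovered.append("".join(next(spans)))  # reconstruct from sub-units
        else:
            recovered.append(token)
    return recovered

words = ["please", "call", "<unk>", "today"]
subunits = [["t", "ue", "s", "ke"]]        # sub-unit hypothesis for the unknown span
print(recover_oov_words(words, subunits))  # ['please', 'call', 'tueske', 'today']
```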
-
Patent number: 11158303
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Grant
Filed: August 27, 2019
Date of Patent: October 26, 2021
Assignee: International Business Machines Corporation
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
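A minimal sketch is given below, assuming simple recurrent models; the block splitter and the twin-regularization term (here an MSE penalty tying the second model's outputs to the already-trained first model's) are one plausible reading of the abstract, not the patented recipe.

```python
# Minimal sketch: unroll the second model over non-overlapping contiguous blocks
# of jittered size, regularizing it toward the trained first (twin) model.
import random
import torch
import torch.nn as nn

def jittered_blocks(seq_len, base=50, jitter=10, seed=0):
    """Split [0, seq_len) into non-overlapping contiguous blocks of jittered size."""
    rng = random.Random(seed)
    blocks, start = [], 0
    while start < seq_len:
        end = min(start + base + rng.randint(-jitter, jitter), seq_len)
        blocks.append((start, end))
        start = end
    return blocks

first = nn.GRU(40, 64, batch_first=True)   # already-trained first model (held fixed)
second = nn.GRU(40, 64, batch_first=True)  # second model being trained
optimizer = torch.optim.Adam(second.parameters(), lr=1e-3)

features = torch.randn(1, 200, 40)         # one training batch of information
for start, end in jittered_blocks(200):    # unroll only within each block
    block = features[:, start:end]
    with torch.no_grad():
        twin_target, _ = first(block)      # first model's outputs as the twin target
    out, _ = second(block)
    loss = nn.functional.mse_loss(out, twin_target)  # twin regularization term
    # (a task loss on `out` would be added here in a full training recipe)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```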
-
Publication number: 20210183404
Abstract: A method, apparatus, and computer program product are provided for customizing an automatic closed captioning system. In some embodiments, at a data use (DU) location, an automatic closed captioning system that includes a base model is provided. Search criteria are defined to request data from one or more data collection (DC) locations, and a search request based on the search criteria is sent to the one or more DC locations. Relevant closed caption data are received from the one or more DC locations in response to the search request. The received data are processed by computing a confidence score for each of a plurality of data sub-sets of the received relevant closed caption data and selecting one or more of the data sub-sets based on the confidence scores. The automatic closed captioning system is then customized by using the selected data sub-sets to train the base model.
Type: Application
Filed: December 14, 2019
Publication date: June 17, 2021
Inventors: Samuel Thomas, Yinghui Huang, Masayuki Suzuki, Zoltan Tueske, Laurence P. Sansone, Michael A. Picheny
-
Publication number: 20210082437
Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
-
Publication number: 20210065680
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Application
Filed: August 27, 2019
Publication date: March 4, 2021
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny