Patents by Inventor Kartik Audhkhasi
Kartik Audhkhasi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11929062
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. An SLU model is trained based upon the one or more entity labels and corresponding values, and the one or more intent labels, of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Grant
Filed: September 15, 2020
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
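The label-extraction step described above can be sketched in a few lines. This is a hypothetical illustration, not the patented implementation: the annotation schema, field names, and the sample utterance are all invented for the example; the point is that training targets (entity labels, values, intents) come from the semantic annotation alone, with no transcript.

```python
# Hypothetical sketch: derive SLU training targets (entity label/value pairs and
# intent labels) from a semantic annotation, with no transcript involved.
def extract_targets(annotation):
    """Split a semantic annotation into entity label/value pairs and intent labels."""
    entity_labels = [(e["label"], e["value"]) for e in annotation.get("entities", [])]
    intent_labels = list(annotation.get("intents", []))
    return entity_labels, intent_labels

# One training example: a speech recording paired only with its semantic annotation.
sample = {
    "recording": "utt_001.wav",
    "annotation": {
        "entities": [{"label": "toloc.city", "value": "Boston"},
                     {"label": "depart_date.day", "value": "Tuesday"}],
        "intents": ["flight"],
    },
}

entities, intents = extract_targets(sample["annotation"])
```

The extracted pairs and intents would then serve as the supervision signal for the SLU model in place of a transcript.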
-
Patent number: 11900922
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products, and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech-to-text training data in a single language. Embodiments of the present invention can use the accessed intents and associated entities to locate speech-to-text training data in one or more other languages different from the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech-to-text training data in the single language and the located speech-to-text training data in the one or more other languages.
Type: Grant
Filed: November 10, 2020
Date of Patent: February 13, 2024
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
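The data-location step can be sketched as a filter over a multilingual pool. This is a toy illustration under assumed data structures (the `intent`/`entities` fields and the sample pool are invented); the abstract does not specify how matching is done, so simple intent-and-entity overlap stands in here.

```python
# Hypothetical sketch: given intents and entities taken from a small
# single-language corpus, pull matching examples from pools in other languages.
def locate_matching_data(seed_intents, seed_entities, multilingual_pool):
    located = []
    for example in multilingual_pool:
        # Keep an example if its intent is known and it shares at least one entity type.
        if example["intent"] in seed_intents and seed_entities & set(example["entities"]):
            located.append(example)
    return located

seed_intents = {"flight"}
seed_entities = {"toloc.city"}
pool = [
    {"lang": "es", "intent": "flight", "entities": ["toloc.city"], "audio": "es_01.wav"},
    {"lang": "fr", "intent": "weather", "entities": ["city"], "audio": "fr_07.wav"},
]
matches = locate_matching_data(seed_intents, seed_entities, pool)
```

The located examples would then be pooled with the original single-language data to train the network.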
-
Publication number: 20230196107
Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
Type: Application
Filed: February 14, 2023
Publication date: June 22, 2023
Inventors: Gakuto Kurata, Kartik Audhkhasi
-
Patent number: 11625595
Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
Type: Grant
Filed: August 29, 2018
Date of Patent: April 11, 2023
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, Kartik Audhkhasi
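The selection step, choosing the bidirectional (teacher) output most similar to a unidirectional (student) output, can be sketched with cosine similarity. This is a minimal illustration under assumed vectors; the patent does not fix the similarity measure, so cosine is an example choice.

```python
# Minimal sketch (not the patented implementation): for a unidirectional model's
# output, select the most similar output from the bidirectional model's sequence,
# measured by cosine similarity, so a distillation-style loss can pull them together.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def select_teacher_output(bi_outputs, uni_output):
    """Return the bidirectional output most similar to the unidirectional one."""
    return max(bi_outputs, key=lambda out: cosine(out, uni_output))

bi_outputs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]   # teacher (bidirectional RNN)
uni_output = [0.5, 0.9]                              # student (unidirectional RNN)
teacher = select_teacher_output(bi_outputs, uni_output)
```

Training would then maximize the similarity between `teacher` and `uni_output` (equivalently, minimize a dissimilarity loss between them).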
-
Patent number: 11587551
Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech, and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text, and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text, and training the E2E SLU system using the synthetic speech.
Type: Grant
Filed: April 7, 2020
Date of Patent: February 21, 2023
Assignee: International Business Machines Corporation
Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny
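The text-to-intent-then-speech-to-intent pipeline can be sketched as a teacher-student setup. Everything here is a toy stand-in: keyword lookup replaces a real text classifier, and the corpus and utterances are invented; the point is only that a model trained on unpaired text can produce intent targets for speech training.

```python
# Hedged sketch: a toy text-to-intent "model" trained on unpaired labeled text,
# then used as a teacher to produce intent targets for a speech-to-intent student.
def train_text_to_intent(labeled_text):
    """Build a keyword -> intent lookup from text labeled with semantic labels."""
    keyword_to_intent = {}
    for text, intent in labeled_text:
        for word in text.lower().split():
            keyword_to_intent.setdefault(word, intent)
    return keyword_to_intent

def text_to_intent(model, text):
    for word in text.lower().split():
        if word in model:
            return model[word]
    return "unknown"

corpus = [("book a flight to boston", "flight"), ("what is the weather", "weather")]
teacher = train_text_to_intent(corpus)
# The teacher labels a new utterance; the label becomes a speech-to-intent target.
pseudo_label = text_to_intent(teacher, "flight to denver")
```

In the TTS variant described in the abstract, the unpaired text would instead be synthesized into speech, giving paired (synthetic speech, label) training data directly.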
-
Patent number: 11568858
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding the one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. A new multilingual acoustic model is then trained through the multilingual network using the updated training data.
Type: Grant
Filed: October 17, 2020
Date of Patent: January 31, 2023
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
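The filter-then-augment step can be sketched as follows. The confidence score used as the filtering metric, the threshold, and the sample data are all invented for illustration; the abstract does not specify the metric.

```python
# Illustrative sketch: keep only transliterated utterances whose (assumed)
# confidence score clears a threshold, then add them back to the original
# transcribed training data to form the updated training set.
def filter_transliterations(pool, threshold):
    return [item for item in pool if item["score"] >= threshold]

def augment(original_data, selected):
    return original_data + [(item["audio"], item["text"]) for item in selected]

original = [("en_01.wav", "hello world")]
pool = [
    {"audio": "de_01.wav", "text": "hallo welt", "score": 0.92},
    {"audio": "de_02.wav", "text": "???", "score": 0.31},
]
updated = augment(original, filter_transliterations(pool, threshold=0.5))
```

The new multilingual acoustic model would then be retrained on `updated` rather than `original`.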
-
Patent number: 11495213
Abstract: A learning computer system may estimate unknown parameters and states of a stochastic or uncertain system having a probability structure. The system may include a data processing system that may include a hardware processor that has a configuration that: receives data; generates random, chaotic, fuzzy, or other numerical perturbations of the data, one or more of the states, or the probability structure; estimates observed and hidden states of the stochastic or uncertain system using the data, the generated perturbations, previous states of the stochastic or uncertain system, or estimated states of the stochastic or uncertain system; and causes perturbations or independent noise to be injected into the data, the states, or the stochastic or uncertain system so as to speed up training or learning of the probability structure and of the system parameters or the states.
Type: Grant
Filed: July 17, 2015
Date of Patent: November 8, 2022
Assignee: University of Southern California
Inventors: Kartik Audhkhasi, Osonde Osoba, Bart Kosko
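A toy sketch of the noise-injection idea follows. The update rule, acceptance criterion, and target are all invented for illustration; the patent concerns noise that speeds learning in a probabilistic estimator, and here a perturbation is simply kept only when it does not move a scalar estimate further from its target.

```python
# Toy sketch of beneficial noise injection: perturb an iterative estimate and
# accept the perturbation only if it does not worsen the distance to the target.
# Both the surrogate objective and the acceptance rule are illustrative choices.
import random

rng = random.Random(0)

def noisy_update(estimate, target, step=0.5, noise_scale=0.1):
    base = estimate + step * (target - estimate)     # plain update
    perturbed = base + rng.gauss(0.0, noise_scale)   # inject noise
    # Keep the noise only when it helps (or at least does no harm).
    return perturbed if abs(perturbed - target) <= abs(base - target) else base

x = 0.0
for _ in range(50):
    x = noisy_update(x, target=1.0)
```

Because accepted perturbations never increase the distance to the target, the estimate converges at least as fast as the unperturbed update.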
-
Publication number: 20220310061
Abstract: A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding the longest subword unit from the input word that is present in the trained vocabulary set, until the end of the input word is reached.
Type: Application
Filed: March 23, 2022
Publication date: September 29, 2022
Applicant: Google LLC
Inventors: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi
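The greedy longest-match routine described above can be sketched directly. The toy vocabulary is invented, and the single-character back-off for out-of-vocabulary characters is an assumed detail the abstract does not specify.

```python
# Sketch of greedy longest-match subword segmentation: repeatedly take the
# longest vocabulary subword that prefixes the remaining input until the end
# of the word is reached. Single characters serve as an assumed back-off.
def segment(word, vocab):
    subwords = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):      # try longest piece first
            piece = word[start:end]
            if piece in vocab or end == start + 1:   # back off to a single char
                subwords.append(piece)
                start = end
                break
    return subwords

vocab = {"un", "believ", "able", "a", "ble"}
pieces = segment("unbelievable", vocab)
```

Note the greedy choice: `"believ"` is taken over the shorter `"a"`-prefixed splits because the longest match wins at each step.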
-
Publication number: 20220310073
Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher-order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher-order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
Type: Application
Filed: December 15, 2021
Publication date: September 29, 2022
Applicant: Google LLC
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar
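The mixture-of-softmaxes construction of the attention PDF can be sketched numerically. The scores and mixture weights below are invented; the sketch only shows that a convex combination of softmax distributions over a context window is itself a valid probability distribution.

```python
# Hedged sketch of an attention PDF formed as a mixture of softmaxes over a
# context window. Scores and mixture weights are illustrative values.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_attention(score_sets, mixture_weights):
    """Combine several softmax distributions into one attention PDF."""
    components = [softmax(scores) for scores in score_sets]
    num_positions = len(score_sets[0])
    return [sum(w * comp[i] for w, comp in zip(mixture_weights, components))
            for i in range(num_positions)]

# Two mixture components over a 3-frame context window; weights sum to 1.
attn = mixture_attention([[1.0, 2.0, 0.5], [0.2, 0.1, 3.0]], [0.6, 0.4])
```

Because each component is a softmax and the weights sum to one, the mixture remains a proper attention distribution over the window.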
-
Publication number: 20220310074
Abstract: A method for an automatic speech recognition (ASR) model that unifies streaming and non-streaming speech recognition includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher-order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint encoder of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher-order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
Type: Application
Filed: December 15, 2021
Publication date: September 29, 2022
Applicant: Google LLC
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar
-
Patent number: 11404047
Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a mean squared error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
Type: Grant
Filed: March 8, 2019
Date of Patent: August 2, 2022
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, Kartik Audhkhasi
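The routing of encoder outputs can be sketched as follows. The placeholder squared-error "losses" stand in for real CTC and MSE computations, and the toy vectors are invented; the sketch only shows the split: the sub network sees the first set of encoder outputs, while the primary network sees both.

```python
# Toy sketch of the multi-task split described above. The losses are placeholder
# squared errors, not real CTC/MSE implementations.
def multitask_losses(encoder_out_1, encoder_out_2, targets, features):
    primary_in = encoder_out_1 + encoder_out_2   # primary network: both sets
    sub_in = encoder_out_1                       # sub network: first set only
    primary_loss = sum((p - t) ** 2 for p, t in zip(primary_in, targets))
    sub_loss = sum((s - f) ** 2 for s, f in zip(sub_in, features))
    return primary_loss, sub_loss

primary_loss, sub_loss = multitask_losses(
    encoder_out_1=[0.1, 0.2], encoder_out_2=[0.3],
    targets=[0.1, 0.2, 0.0], features=[0.0, 0.0],
)
```

In training, both losses would be minimized jointly, so the shared encoder learns features useful for recognition and reconstruction at once.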
-
Publication number: 20220148581
Abstract: Embodiments of the present invention provide computer-implemented methods, computer program products, and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech-to-text training data in a single language. Embodiments of the present invention can use the accessed intents and associated entities to locate speech-to-text training data in one or more other languages different from the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech-to-text training data in the single language and the located speech-to-text training data in the one or more other languages.
Type: Application
Filed: November 10, 2020
Publication date: May 12, 2022
Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
-
Publication number: 20220122585
Abstract: A computer-implemented method of building a multilingual acoustic model for automatic speech recognition in a low-resource setting includes training a multilingual network on a set of training languages with original transcribed training data to create a baseline multilingual acoustic model. Transliteration of transcribed training data is performed by processing through the multilingual network a plurality of multilingual data types from the set of languages, and outputting a pool of transliterated data. A filtering metric is applied to the pool of transliterated data to select one or more portions of the transliterated data for retraining of the acoustic model. Data augmentation is performed by adding the one or more selected portions of the transliterated data back to the original transcribed training data to update the training data. A new multilingual acoustic model is then trained through the multilingual network using the updated training data.
Type: Application
Filed: October 17, 2020
Publication date: April 21, 2022
Inventors: Samuel Thomas, Kartik Audhkhasi, Brian E. D. Kingsbury
-
Patent number: 11302309
Abstract: A technique for aligning spike timing of models is disclosed. A first model, having a first architecture and trained with a set of training samples, is generated. Each training sample includes an input sequence of observations and an output sequence of symbols having a different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model, and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
Type: Grant
Filed: September 13, 2019
Date of Patent: April 12, 2022
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, Kartik Audhkhasi
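One way to picture "dissimilarity in spike timing" is to compare, per symbol, the frame where each model's posterior peaks. The squared-difference guide loss below is an illustrative stand-in, and the posterior matrices are invented; the patent does not commit to this particular formulation.

```python
# Rough sketch: find, for each symbol, the frame where each model's posterior
# spikes (peaks), and penalize timing differences between teacher and student.
def spike_frames(posteriors):
    """posteriors[t][k]: probability of symbol k at frame t; return peak frame per symbol."""
    num_symbols = len(posteriors[0])
    return [max(range(len(posteriors)), key=lambda t: posteriors[t][k])
            for k in range(num_symbols)]

def guide_loss(teacher_posteriors, student_posteriors):
    t_spikes = spike_frames(teacher_posteriors)
    s_spikes = spike_frames(student_posteriors)
    return sum((a - b) ** 2 for a, b in zip(t_spikes, s_spikes))

# 3 frames x 2 symbols; the student's spikes lag the teacher's by one frame each.
teacher = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.3]]
student = [[0.3, 0.2], [0.8, 0.1], [0.1, 0.9]]
loss = guide_loss(teacher, student)
```

Minimizing such a loss jointly with the normal training loss would push the second model's spikes toward the first model's timing.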
-
Publication number: 20220084508
Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels, are extracted from the corresponding semantic entities and/or overall intent. An SLU model is trained based upon the one or more entity labels and corresponding values, and the one or more intent labels, of the corresponding speech recordings, without a need for a transcript of the corresponding speech recording.
Type: Application
Filed: September 15, 2020
Publication date: March 17, 2022
Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
-
Patent number: 11256982
Abstract: A learning computer system may include a data processing system and a hardware processor, and may estimate parameters and states of a stochastic or uncertain system. The system may receive data from a user or other source. Parameters and states of the stochastic or uncertain system are estimated using the received data, numerical perturbations, and previous parameters and states of the stochastic or uncertain system. It is determined whether the generated numerical perturbations satisfy a condition. If the numerical perturbations satisfy the condition, they are injected into the estimated parameters or states, the received data, the processed data, the masked or filtered data, or the processing units.
Type: Grant
Filed: July 20, 2015
Date of Patent: February 22, 2022
Assignee: University of Southern California
Inventors: Kartik Audhkhasi, Bart Kosko, Osonde Osoba
-
Patent number: 11183194
Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.
Type: Grant
Filed: September 13, 2019
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
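The final replacement step can be sketched simply. The `<unk>` marker, the sub-unit hypothesis, and the join-by-concatenation reconstruction are assumed details for illustration; the abstract specifies only that the unknown word is replaced by a reconstruction built from recognized sub-units.

```python
# Sketch of the recovery step: splice a sub-unit recognizer's output back into
# the word-level transcript wherever the word model emitted an unknown marker.
def recover_oov(word_transcript, subunit_outputs, unk="<unk>"):
    recovered = []
    subunit_iter = iter(subunit_outputs)
    for word in word_transcript:
        if word == unk:
            recovered.append("".join(next(subunit_iter)))  # join sub-units into a word
        else:
            recovered.append(word)
    return recovered

transcript = ["please", "call", "<unk>", "today"]
subunits = [["aud", "h", "khasi"]]   # sub-unit hypothesis for the unknown word
fixed = recover_oov(transcript, subunits)
```

The modified transcript, with the reconstructed word in place, is what would be output.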
-
Patent number: 11158303
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Grant
Filed: August 27, 2019
Date of Patent: October 26, 2021
Assignee: International Business Machines Corporation
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
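The jittered-block partitioning can be sketched as follows. The base size, jitter range, and sequence are invented values; the sketch shows only the property the abstract names: the blocks are non-overlapping, contiguous, and randomly sized.

```python
# Illustrative sketch: partition a frame sequence into non-overlapping
# contiguous blocks whose sizes are randomly jittered around a base size.
import random

def jittered_blocks(sequence, base_size, jitter, rng):
    blocks, start = [], 0
    while start < len(sequence):
        size = max(1, base_size + rng.randint(-jitter, jitter))  # jittered block size
        blocks.append(sequence[start:start + size])
        start += size
    return blocks

rng = random.Random(7)
frames = list(range(20))
blocks = jittered_blocks(frames, base_size=5, jitter=2, rng=rng)
flat = [f for block in blocks for f in block]
```

The second model would then be unrolled over `blocks`; because the blocks tile the sequence exactly, no frame is dropped or duplicated.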
-
Publication number: 20210312294
Abstract: A technique for training a model is disclosed. A training sample is obtained, including an input sequence of observations and a target sequence of symbols having a length different from the input sequence of observations. The input sequence of observations is fed into the model to obtain a sequence of predictions. The sequence of predictions is shifted by an amount with respect to the input sequence of observations. The model is updated based on a loss using the shifted sequence of predictions and the target sequence of symbols.
Type: Application
Filed: April 3, 2020
Publication date: October 7, 2021
Inventors: Gakuto Kurata, Kartik Audhkhasi
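The shift operation can be sketched in isolation. The padding token, the mismatch-count "loss", and the toy sequences are illustrative choices, not details from the patent; the sketch shows only that predictions are delayed by a fixed amount before being compared against the targets.

```python
# Toy sketch of the shift: delay the prediction sequence by `amount` positions
# (padding the front) before computing a loss against the target alignment.
def shift_predictions(predictions, amount, pad="<blank>"):
    return [pad] * amount + predictions[:len(predictions) - amount]

def toy_loss(shifted, targets):
    """Placeholder loss: count of positions where shifted prediction != target."""
    return sum(1 for s, t in zip(shifted, targets) if s != t)

preds = ["h", "i", "<blank>"]
targets = ["<blank>", "h", "i"]
loss = toy_loss(shift_predictions(preds, amount=1), targets)
```

Here the unshifted predictions disagree with the targets at every position, while the shifted ones match exactly, which is the situation the technique exploits.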
-
Publication number: 20210312906
Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech, and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text, and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text, and training the E2E SLU system using the synthetic speech.
Type: Application
Filed: April 7, 2020
Publication date: October 7, 2021
Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny