Patents by Inventor Kartik Audhkhasi
Kartik Audhkhasi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210082437
Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
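A minimal sketch of the replacement step this abstract describes: the word-level model emits an unknown-word token, and the sub-unit model's output for the same span is stitched in as the reconstruction. The "<unk>" token, the position-keyed alignment, and all names are illustrative assumptions, not details from the patent.

```python
def recover_oov(word_hyp, subunit_hyp):
    """Replace each "<unk>" token in the word-level hypothesis with a
    word stitched together from the sub-unit model's pieces for that
    position (subunit_hyp maps positions to lists of sub-units)."""
    out = []
    for i, word in enumerate(word_hyp):
        if word == "<unk>" and i in subunit_hyp:
            out.append("".join(subunit_hyp[i]))  # stitch sub-units into a word
        else:
            out.append(word)
    return out

word_hyp = ["play", "songs", "by", "<unk>"]
subunit_hyp = {3: ["rad", "io", "head"]}
print(recover_oov(word_hyp, subunit_hyp))  # ['play', 'songs', 'by', 'radiohead']
```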
-
Publication number: 20210082399
Abstract: A technique for aligning spike timing of models is disclosed. A first model having a first architecture, trained with a set of training samples, is generated. Each training sample includes an input sequence of observations and an output sequence of symbols having a different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model, and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Gakuto Kurata, Kartik Audhkhasi
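A minimal numeric sketch of the joint objective this abstract describes: each second (student) model minimizes its normal task loss plus a guide loss measuring spike-timing dissimilarity against the trained first (guide) model. The squared-error form over frame-wise posteriors and the weight `lam` are assumptions for illustration.

```python
def guide_loss(guide_post, student_post):
    """Frame-wise dissimilarity between the guide model's posteriors
    and the student's; near zero when their spikes align in time."""
    total = sum(sum((g - s) ** 2 for g, s in zip(gf, sf))
                for gf, sf in zip(guide_post, student_post))
    return total / len(guide_post)

def joint_loss(normal_loss, g_loss, lam=0.5):
    """The normal task loss minimized jointly with the guide loss."""
    return normal_loss + lam * g_loss

aligned = [[1.0, 0.0], [0.0, 1.0]]
shifted = [[0.0, 1.0], [1.0, 0.0]]
print(guide_loss(aligned, aligned))                    # 0.0
print(joint_loss(1.0, guide_loss(aligned, shifted)))   # 2.0
```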
-
Publication number: 20210065680
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Application
Filed: August 27, 2019
Publication date: March 4, 2021
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
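A minimal sketch of the jittering step this abstract describes: each sequence is split into non-overlapping contiguous blocks whose size is randomly jittered. The uniform jitter range and names are assumptions, and the twin-regularization step between the two models is omitted.

```python
import random

def jittered_blocks(frames, base_size, jitter):
    """Split a sequence into non-overlapping contiguous blocks whose
    sizes are jittered uniformly around base_size."""
    blocks, i = [], 0
    while i < len(frames):
        size = max(1, base_size + random.randint(-jitter, jitter))
        blocks.append(frames[i:i + size])  # contiguous, non-overlapping
        i += size
    return blocks

blocks = jittered_blocks(list(range(100)), base_size=20, jitter=5)
# Concatenating the blocks always recovers the original sequence.
assert sum(blocks, []) == list(range(100))
```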
-
Patent number: 10839792
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Grant
Filed: February 5, 2019
Date of Patent: November 17, 2020
Assignees: International Business Machines Corporation, Toyota Technological Institute at Chicago
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
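A toy sketch of the data flow this claim describes: a character sequence for a new OOV word is mapped to an AWE vector, the AWE vector is mapped to an output-layer weight row, and that row is inserted into the A2W embedding table so the system can emit the word. Both networks here are deterministic stand-in functions, not the patent's trained RNNs; all names are assumptions.

```python
def awe_from_chars(chars):
    """Stand-in for the AWE RNN: a deterministic toy embedding."""
    return [float(ord(c) % 7) for c in chars[:3]]

def weight_from_awe(awe_vec):
    """Stand-in for the AWE-to-A2W network: scale the AWE vector."""
    return [0.1 * v for v in awe_vec]

def insert_oov(a2w_table, word):
    """Add the new word's weight row to the A2W embedding table,
    alongside the existing in-vocabulary rows."""
    a2w_table[word] = weight_from_awe(awe_from_chars(word))
    return a2w_table

table = {"hello": [0.2, 0.2, 0.2]}
insert_oov(table, "zyzzyva")
print(sorted(table))  # ['hello', 'zyzzyva']
```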
-
Publication number: 20200286464
Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a mean squared error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
Type: Application
Filed: March 8, 2019
Publication date: September 10, 2020
Inventors: Gakuto Kurata, Kartik Audhkhasi
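A minimal sketch of the routing this abstract describes: the common encoder produces two sets of outputs; the primary (CTC) branch receives both, while the reconstruction (MSE) sub network receives only the first set. The encoder here is a placeholder that splits its input in half; all names are assumptions.

```python
def common_encoder(features):
    """Placeholder encoder emitting two output sets."""
    mid = len(features) // 2
    return features[:mid], features[mid:]

def route(features):
    """Route encoder outputs per the abstract: both sets to the
    primary CTC branch, only the first set to the MSE sub network."""
    set1, set2 = common_encoder(features)
    primary_input = set1 + set2
    sub_input = set1
    return primary_input, sub_input

p, s = route([1, 2, 3, 4])
print(p, s)  # [1, 2, 3, 4] [1, 2]
```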
-
Publication number: 20200251096
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Application
Filed: February 5, 2019
Publication date: August 6, 2020
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
-
Patent number: 10692488
Abstract: A computer selects a test set of sentences from among sentences applied to train a whole sentence recurrent neural network language model to estimate the likelihood that each whole sentence processed by natural language processing is correct. The computer generates imposter sentences from the test set of sentences by substituting one word in each sentence of the test set. The computer generates, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set and at least one additional score for each of the imposter sentences. The computer evaluates the accuracy of the natural language processing system in performing sequential classification tasks based on how accurately the first score reflects a correct sentence and the at least one additional score reflects an incorrect sentence.
Type: Grant
Filed: August 23, 2019
Date of Patent: June 23, 2020
Assignee: International Business Machines Corporation
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
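A minimal sketch of the evaluation this abstract describes: imposters are made by substituting one word of each test sentence, and accuracy is the fraction of pairs in which the scoring model prefers the true sentence. The uniform substitution choice and the toy scoring function are assumptions.

```python
import random

def make_imposter(sentence, vocab, rng):
    """Substitute one word of the sentence with a different word."""
    words = sentence.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice([w for w in vocab if w != words[i]])
    return " ".join(words)

def eval_accuracy(score_fn, pairs):
    """Fraction of (true, imposter) pairs where the true sentence
    receives the higher score."""
    wins = sum(score_fn(true) > score_fn(imp) for true, imp in pairs)
    return wins / len(pairs)

rng = random.Random(0)
truths = ["the cat sat", "dogs bark loudly"]
vocab = ["zebra", "quark"]
pairs = [(t, make_imposter(t, vocab, rng)) for t in truths]
score = lambda s: -s.count("z") - s.count("q")  # toy LM: penalize odd words
print(eval_accuracy(score, pairs))  # 1.0
```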
-
Publication number: 20200074292
Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
Type: Application
Filed: August 29, 2018
Publication date: March 5, 2020
Inventors: Gakuto Kurata, Kartik Audhkhasi
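A minimal sketch of the selection step this abstract describes: from the bidirectional teacher's output sequence, pick the output most similar to the unidirectional student's output (the subsequent training step that increases this similarity is omitted). Dot-product similarity is an assumption.

```python
def dot(a, b):
    """Dot-product similarity between two output vectors."""
    return sum(x * y for x, y in zip(a, b))

def select_guide(bi_outputs, uni_output):
    """Select the bidirectional model's output most similar to the
    unidirectional model's output."""
    return max(bi_outputs, key=lambda frame: dot(frame, uni_output))

bi = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
uni = [0.1, 0.9]
print(select_guide(bi, uni))  # [0.2, 0.8]
```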
-
Publication number: 20200013393
Abstract: A computer selects a test set of sentences from among sentences applied to train a whole sentence recurrent neural network language model to estimate the likelihood that each whole sentence processed by natural language processing is correct. The computer generates imposter sentences from the test set of sentences by substituting one word in each sentence of the test set. The computer generates, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set and at least one additional score for each of the imposter sentences. The computer evaluates the accuracy of the natural language processing system in performing sequential classification tasks based on how accurately the first score reflects a correct sentence and the at least one additional score reflects an incorrect sentence.
Type: Application
Filed: August 23, 2019
Publication date: January 9, 2020
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
-
Publication number: 20190318732
Abstract: A whole sentence recurrent neural network (RNN) language model (LM) is provided for estimating the likelihood that each whole sentence processed by natural language processing is correct. A noise contrastive estimation sampler is applied against at least one entire sentence from a corpus of multiple sentences to generate at least one incorrect sentence. The whole sentence RNN LM is trained, using the at least one entire sentence from the corpus and the at least one incorrect sentence, to distinguish the at least one entire sentence as correct. The whole sentence recurrent neural network language model is then applied to estimate the likelihood that each whole sentence processed by natural language processing is correct.
Type: Application
Filed: April 16, 2018
Publication date: October 17, 2019
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
-
Patent number: 10431210
Abstract: A whole sentence recurrent neural network (RNN) language model (LM) is provided for estimating the likelihood that each whole sentence processed by natural language processing is correct. A noise contrastive estimation sampler is applied against at least one entire sentence from a corpus of multiple sentences to generate at least one incorrect sentence. The whole sentence RNN LM is trained, using the at least one entire sentence from the corpus and the at least one incorrect sentence, to distinguish the at least one entire sentence as correct. The whole sentence recurrent neural network language model is then applied to estimate the likelihood that each whole sentence processed by natural language processing is correct.
Type: Grant
Filed: April 16, 2018
Date of Patent: October 1, 2019
Assignee: International Business Machines Corporation
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
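A compact sketch of noise-contrastive estimation as this abstract applies it: the whole-sentence LM is trained as a binary classifier that separates corpus sentences from sampler-generated noise sentences. The logistic loss below is the standard NCE form with k noise samples per real sentence; its pairing with a sentence-level LM score follows the abstract, but the specific numbers are illustrative only.

```python
import math

def nce_loss(model_log_score, log_noise_prob, is_real, k=1):
    """Standard NCE logistic loss: the posterior that a sentence is
    real is a logistic function of the model's log score minus the
    noise distribution's log probability (and log k)."""
    logit = model_log_score - log_noise_prob - math.log(k)
    p_real = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p_real if is_real else 1.0 - p_real)

# A corpus sentence the model scores far above the noise distribution
# incurs a small loss; a noise sentence scored that highly incurs a
# large one, pushing the model to separate the two.
print(nce_loss(5.0, 0.0, is_real=True))
print(nce_loss(5.0, 0.0, is_real=False))
```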
-
Patent number: 10019438
Abstract: A mechanism is provided in a data processing system for external word embedding neural network language models. The mechanism configures the data processing system with an external word embedding neural network language model that accepts as input a sequence of words and predicts a current word based on the sequence of words. The external word embedding neural network language model combines an external embedding matrix with a history word embedding matrix and a prediction word embedding matrix of the external word embedding neural network language model. The mechanism receives a sequence of input words by the data processing system. The mechanism applies a plurality of previous words in the sequence of input words as inputs to the external word embedding neural network language model. The external word embedding neural network language model generates a predicted current word based on the plurality of previous words.
Type: Grant
Filed: March 18, 2016
Date of Patent: July 10, 2018
Assignee: International Business Machines Corporation
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Abhinav Sethy
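A minimal sketch of the combination step this abstract describes: an external embedding matrix is merged with one of the model's own embedding matrices. The abstract does not specify the operation; elementwise interpolation with a weight `alpha` is an assumption for illustration.

```python
def combine_embeddings(internal_E, external_E, alpha=0.5):
    """Elementwise interpolation of a model-internal embedding matrix
    (e.g. the history word embedding matrix) with an external one."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(internal_E, external_E)]

history = [[1.0, 0.0], [0.0, 1.0]]
external = [[0.0, 2.0], [2.0, 0.0]]
print(combine_embeddings(history, external))  # [[0.5, 1.0], [1.0, 0.5]]
```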
-
Publication number: 20170270100
Abstract: A mechanism is provided in a data processing system for external word embedding neural network language models. The mechanism configures the data processing system with an external word embedding neural network language model that accepts as input a sequence of words and predicts a current word based on the sequence of words. The external word embedding neural network language model combines an external embedding matrix with a history word embedding matrix and a prediction word embedding matrix of the external word embedding neural network language model. The mechanism receives a sequence of input words by the data processing system. The mechanism applies a plurality of previous words in the sequence of input words as inputs to the external word embedding neural network language model. The external word embedding neural network language model generates a predicted current word based on the plurality of previous words.
Type: Application
Filed: March 18, 2016
Publication date: September 21, 2017
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Abhinav Sethy
-
Publication number: 20160034814
Abstract: A learning computer system may update parameters and states of an uncertain system.
Type: Application
Filed: August 3, 2015
Publication date: February 4, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Osonde Osoba, Bart Kosko
-
Publication number: 20160019459
Abstract: A learning computer system may include a data processing system and a hardware processor and may estimate parameters and states of a stochastic or uncertain system.
Type: Application
Filed: July 20, 2015
Publication date: January 21, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Bart Kosko, Osonde Osoba
-
Publication number: 20160005399
Abstract: A learning computer system may estimate unknown parameters and states of a stochastic or uncertain system having a probability structure. The system may include a data processing system that may include a hardware processor that has a configuration that: receives data; generates random, chaotic, fuzzy, or other numerical perturbations of the data, one or more of the states, or the probability structure; estimates observed and hidden states of the stochastic or uncertain system using the data, the generated perturbations, previous states of the stochastic or uncertain system, or estimated states of the stochastic or uncertain system; and causes perturbations or independent noise to be injected into the data, the states, or the stochastic or uncertain system so as to speed up training or learning of the probability structure and of the system parameters or the states.
Type: Application
Filed: July 17, 2015
Publication date: January 7, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Osonde Osoba, Bart Kosko
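A toy sketch of the perturbation step these USC filings center on: independent noise is injected into the data before each estimation update, which the abstract says can speed up learning of the probability structure. Gaussian noise, the annealed scale, and the toy mean-estimation loop are illustrative assumptions; no speedup is demonstrated here.

```python
import random

def inject_noise(data, scale, rng):
    """Perturb each observation with independent Gaussian noise
    before the next estimation step."""
    return [x + rng.gauss(0.0, scale) for x in data]

def noisy_updates(data, steps, rng):
    """Toy estimation loop: re-estimate the data mean from a freshly
    noise-injected copy each step, annealing the noise scale."""
    estimate = 0.0
    for t in range(1, steps + 1):
        noisy = inject_noise(data, scale=1.0 / t, rng=rng)
        estimate = sum(noisy) / len(noisy)
    return estimate

rng = random.Random(42)
print(noisy_updates([1.0, 2.0, 3.0], steps=200, rng=rng))
```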
-
Patent number: 8457967
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score that quantifies the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Grant
Filed: August 15, 2009
Date of Patent: June 4, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
-
Publication number: 20110040554
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score that quantifies the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Application
Filed: August 15, 2009
Publication date: February 17, 2011
Applicant: International Business Machines Corporation
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
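A minimal sketch of one lexical feature both fluency filings list: the normalized average distance between consecutive occurrences of repeated N-grams (closely repeated phrases are a disfluency cue). Normalizing by transcript length and handling only exact repeats are assumptions; inexact-repeat detection is omitted.

```python
def ngram_repeat_feature(words, n):
    """Average gap between consecutive occurrences of each repeated
    n-gram, normalized by transcript length; 0.0 if nothing repeats."""
    positions = {}
    for i in range(len(words) - n + 1):
        positions.setdefault(tuple(words[i:i + n]), []).append(i)
    gaps = [b - a for occ in positions.values() if len(occ) > 1
            for a, b in zip(occ, occ[1:])]
    if not gaps:
        return 0.0
    return (sum(gaps) / len(gaps)) / len(words)

words = "i i went to the to the store".split()
print(ngram_repeat_feature(words, 2))  # 0.25 (one repeated bigram, gap 2, length 8)
```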