Patents by Inventor Kartik Audhkhasi
Kartik Audhkhasi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210082437
Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
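A minimal sketch of the replacement step this abstract describes: the word-level model emits an unknown-word token, and the sub-unit model's output for the same span is stitched in as the reconstruction. The "<unk>" token, the position-keyed alignment, and all names are illustrative assumptions, not details from the patent.

```python
def recover_oov(word_hyp, subunit_hyp):
    """Replace each "<unk>" token in the word-level hypothesis with a
    word stitched together from the sub-unit model's pieces for that
    position (subunit_hyp maps positions to lists of sub-units)."""
    out = []
    for i, word in enumerate(word_hyp):
        if word == "<unk>" and i in subunit_hyp:
            out.append("".join(subunit_hyp[i]))  # stitch sub-units into a word
        else:
            out.append(word)
    return out

word_hyp = ["play", "songs", "by", "<unk>"]
subunit_hyp = {3: ["rad", "io", "head"]}
print(recover_oov(word_hyp, subunit_hyp))  # ['play', 'songs', 'by', 'radiohead']
```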
-
Publication number: 20210082399
Abstract: A technique for aligning spike timing of models is disclosed. A first model having a first architecture, trained with a set of training samples, is generated. Each training sample includes an input sequence of observations and an output sequence of symbols having a different length from the input sequence. Then, one or more second models are trained with the trained first model by minimizing a guide loss jointly with a normal loss for each second model, and a sequence recognition task is performed using the one or more second models. The guide loss evaluates dissimilarity in spike timing between the trained first model and each second model being trained.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Gakuto Kurata, Kartik Audhkhasi
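A minimal numeric sketch of the joint objective this abstract describes: each second (student) model minimizes its normal task loss plus a guide loss measuring spike-timing dissimilarity against the trained first (guide) model. The squared-error form over frame-wise posteriors and the weight `lam` are assumptions for illustration.

```python
def guide_loss(guide_post, student_post):
    """Frame-wise dissimilarity between the guide model's posteriors
    and the student's; near zero when their spikes align in time."""
    total = sum(sum((g - s) ** 2 for g, s in zip(gf, sf))
                for gf, sf in zip(guide_post, student_post))
    return total / len(guide_post)

def joint_loss(normal_loss, g_loss, lam=0.5):
    """The normal task loss minimized jointly with the guide loss."""
    return normal_loss + lam * g_loss

aligned = [[1.0, 0.0], [0.0, 1.0]]
shifted = [[0.0, 1.0], [1.0, 0.0]]
print(guide_loss(aligned, aligned))                    # 0.0
print(joint_loss(1.0, guide_loss(aligned, shifted)))   # 2.0
```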
-
Publication number: 20210065680
Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches, wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
Type: Application
Filed: August 27, 2019
Publication date: March 4, 2021
Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
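A minimal sketch of the jittering step this abstract describes: each sequence is split into non-overlapping contiguous blocks whose size is randomly jittered. The uniform jitter range and names are assumptions, and the twin-regularization step between the two models is omitted.

```python
import random

def jittered_blocks(frames, base_size, jitter):
    """Split a sequence into non-overlapping contiguous blocks whose
    sizes are jittered uniformly around base_size."""
    blocks, i = [], 0
    while i < len(frames):
        size = max(1, base_size + random.randint(-jitter, jitter))
        blocks.append(frames[i:i + size])  # contiguous, non-overlapping
        i += size
    return blocks

blocks = jittered_blocks(list(range(100)), base_size=20, jitter=5)
# Concatenating the blocks always recovers the original sequence.
assert sum(blocks, []) == list(range(100))
```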
-
Patent number: 10839792
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Grant
Filed: February 5, 2019
Date of Patent: November 17, 2020
Assignees: International Business Machines Corporation, Toyota Technological Institute at Chicago
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
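A toy sketch of the data flow this claim describes: a character sequence for a new OOV word is mapped to an AWE vector, the AWE vector is mapped to an output-layer weight row, and that row is inserted into the A2W embedding table so the system can emit the word. Both networks here are deterministic stand-in functions, not the patent's trained RNNs; all names are assumptions.

```python
def awe_from_chars(chars):
    """Stand-in for the AWE RNN: a deterministic toy embedding."""
    return [float(ord(c) % 7) for c in chars[:3]]

def weight_from_awe(awe_vec):
    """Stand-in for the AWE-to-A2W network: scale the AWE vector."""
    return [0.1 * v for v in awe_vec]

def insert_oov(a2w_table, word):
    """Add the new word's weight row to the A2W embedding table,
    alongside the existing in-vocabulary rows."""
    a2w_table[word] = weight_from_awe(awe_from_chars(word))
    return a2w_table

table = {"hello": [0.2, 0.2, 0.2]}
insert_oov(table, "zyzzyva")
print(sorted(table))  # ['hello', 'zyzzyva']
```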
-
Publication number: 20200286464
Abstract: A multi-task learning system is provided for speech recognition. The system includes a common encoder network. The system further includes a primary network for minimizing a Connectionist Temporal Classification (CTC) loss for speech recognition. The system also includes a sub network for minimizing a mean squared error (MSE) loss for feature reconstruction. A first set of output data of the common encoder network is received by both the primary network and the sub network. A second set of the output data of the common encoder network is received only by the primary network from among the primary network and the sub network.
Type: Application
Filed: March 8, 2019
Publication date: September 10, 2020
Inventors: Gakuto Kurata, Kartik Audhkhasi
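A minimal sketch of the routing this abstract describes: the common encoder produces two sets of outputs; the primary (CTC) branch receives both, while the reconstruction (MSE) sub network receives only the first set. The encoder here is a placeholder that splits its input in half; all names are assumptions.

```python
def common_encoder(features):
    """Placeholder encoder emitting two output sets."""
    mid = len(features) // 2
    return features[:mid], features[mid:]

def route(features):
    """Route encoder outputs per the abstract: both sets to the
    primary CTC branch, only the first set to the MSE sub network."""
    set1, set2 = common_encoder(features)
    primary_input = set1 + set2
    sub_input = set1
    return primary_input, sub_input

p, s = route([1, 2, 3, 4])
print(p, s)  # [1, 2, 3, 4] [1, 2]
```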
-
Publication number: 20200251096
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Application
Filed: February 5, 2019
Publication date: August 6, 2020
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
-
Patent number: 10692488
Abstract: A computer selects a test set of sentences from among sentences applied to train a whole sentence recurrent neural network language model to estimate the likelihood that each whole sentence processed by natural language processing is correct. The computer generates imposter sentences from the test set of sentences by substituting one word in each sentence of the test set. The computer generates, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set and at least one additional score for each of the imposter sentences. The computer evaluates the accuracy of the natural language processing system in performing sequential classification tasks based on how accurately the first score reflects a correct sentence and the at least one additional score reflects an incorrect sentence.
Type: Grant
Filed: August 23, 2019
Date of Patent: June 23, 2020
Assignee: International Business Machines Corporation
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
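A minimal sketch of the evaluation this abstract describes: imposters are made by substituting one word of each test sentence, and accuracy is the fraction of pairs in which the scoring model prefers the true sentence. The uniform substitution choice and the toy scoring function are assumptions.

```python
import random

def make_imposter(sentence, vocab, rng):
    """Substitute one word of the sentence with a different word."""
    words = sentence.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice([w for w in vocab if w != words[i]])
    return " ".join(words)

def eval_accuracy(score_fn, pairs):
    """Fraction of (true, imposter) pairs where the true sentence
    receives the higher score."""
    wins = sum(score_fn(true) > score_fn(imp) for true, imp in pairs)
    return wins / len(pairs)

rng = random.Random(0)
truths = ["the cat sat", "dogs bark loudly"]
vocab = ["zebra", "quark"]
pairs = [(t, make_imposter(t, vocab, rng)) for t in truths]
score = lambda s: -s.count("z") - s.count("q")  # toy LM: penalize odd words
print(eval_accuracy(score, pairs))  # 1.0
```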
-
Publication number: 20200074292
Abstract: Knowledge transfer between recurrent neural networks is performed by obtaining a first output sequence from a bidirectional Recurrent Neural Network (RNN) model for an input sequence, obtaining a second output sequence from a unidirectional RNN model for the input sequence, selecting at least one first output from the first output sequence based on a similarity between the at least one first output and a second output from the second output sequence, and training the unidirectional RNN model to increase the similarity between the at least one first output and the second output.
Type: Application
Filed: August 29, 2018
Publication date: March 5, 2020
Inventors: Gakuto Kurata, Kartik Audhkhasi
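A minimal sketch of the selection step this abstract describes: from the bidirectional teacher's output sequence, pick the output most similar to the unidirectional student's output (the subsequent training step that increases this similarity is omitted). Dot-product similarity is an assumption.

```python
def dot(a, b):
    """Dot-product similarity between two output vectors."""
    return sum(x * y for x, y in zip(a, b))

def select_guide(bi_outputs, uni_output):
    """Select the bidirectional model's output most similar to the
    unidirectional model's output."""
    return max(bi_outputs, key=lambda frame: dot(frame, uni_output))

bi = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
uni = [0.1, 0.9]
print(select_guide(bi, uni))  # [0.2, 0.8]
```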
-
Publication number: 20200013393
Abstract: A computer selects a test set of sentences from among sentences applied to train a whole sentence recurrent neural network language model to estimate the likelihood that each whole sentence processed by natural language processing is correct. The computer generates imposter sentences from the test set of sentences by substituting one word in each sentence of the test set. The computer generates, through the whole sentence recurrent neural network language model, a first score for each sentence of the test set and at least one additional score for each of the imposter sentences. The computer evaluates the accuracy of the natural language processing system in performing sequential classification tasks based on how accurately the first score reflects a correct sentence and the at least one additional score reflects an incorrect sentence.
Type: Application
Filed: August 23, 2019
Publication date: January 9, 2020
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
-
Publication number: 20190318732
Abstract: A whole sentence recurrent neural network (RNN) language model (LM) is provided for estimating the likelihood that each whole sentence processed by natural language processing is correct. A noise contrastive estimation sampler is applied against at least one entire sentence from a corpus of multiple sentences to generate at least one incorrect sentence. The whole sentence RNN LM is trained, using the at least one entire sentence from the corpus and the at least one incorrect sentence, to distinguish the at least one entire sentence as correct. The whole sentence recurrent neural network language model is then applied to estimate the likelihood that each whole sentence processed by natural language processing is correct.
Type: Application
Filed: April 16, 2018
Publication date: October 17, 2019
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
-
Patent number: 10431210
Abstract: A whole sentence recurrent neural network (RNN) language model (LM) is provided for estimating the likelihood that each whole sentence processed by natural language processing is correct. A noise contrastive estimation sampler is applied against at least one entire sentence from a corpus of multiple sentences to generate at least one incorrect sentence. The whole sentence RNN LM is trained, using the at least one entire sentence from the corpus and the at least one incorrect sentence, to distinguish the at least one entire sentence as correct. The whole sentence recurrent neural network language model is then applied to estimate the likelihood that each whole sentence processed by natural language processing is correct.
Type: Grant
Filed: April 16, 2018
Date of Patent: October 1, 2019
Assignee: International Business Machines Corporation
Inventors: Yinghui Huang, Abhinav Sethy, Kartik Audhkhasi, Bhuvana Ramabhadran
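A compact sketch of noise-contrastive estimation as this abstract applies it: the whole-sentence LM is trained as a binary classifier that separates corpus sentences from sampler-generated noise sentences. The logistic loss below is the standard NCE form with k noise samples per real sentence; its pairing with a sentence-level LM score follows the abstract, but the specific numbers are illustrative only.

```python
import math

def nce_loss(model_log_score, log_noise_prob, is_real, k=1):
    """Standard NCE logistic loss: the posterior that a sentence is
    real is a logistic function of the model's log score minus the
    noise distribution's log probability (and log k)."""
    logit = model_log_score - log_noise_prob - math.log(k)
    p_real = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p_real if is_real else 1.0 - p_real)

# A corpus sentence the model scores far above the noise distribution
# incurs a small loss; a noise sentence scored that highly incurs a
# large one, pushing the model to separate the two.
print(nce_loss(5.0, 0.0, is_real=True))
print(nce_loss(5.0, 0.0, is_real=False))
```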
-
Patent number: 10019438
Abstract: A mechanism is provided in a data processing system for external word embedding neural network language models. The mechanism configures the data processing system with an external word embedding neural network language model that accepts as input a sequence of words and predicts a current word based on the sequence of words. The external word embedding neural network language model combines an external embedding matrix with a history word embedding matrix and a prediction word embedding matrix of the external word embedding neural network language model. The mechanism receives a sequence of input words by the data processing system. The mechanism applies a plurality of previous words in the sequence of input words as inputs to the external word embedding neural network language model. The external word embedding neural network language model generates a predicted current word based on the plurality of previous words.
Type: Grant
Filed: March 18, 2016
Date of Patent: July 10, 2018
Assignee: International Business Machines Corporation
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Abhinav Sethy
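A minimal sketch of the combination step this abstract describes: an external embedding matrix is merged with one of the model's own embedding matrices. The abstract does not specify the operation; elementwise interpolation with a weight `alpha` is an assumption for illustration.

```python
def combine_embeddings(internal_E, external_E, alpha=0.5):
    """Elementwise interpolation of a model-internal embedding matrix
    (e.g. the history word embedding matrix) with an external one."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(internal_E, external_E)]

history = [[1.0, 0.0], [0.0, 1.0]]
external = [[0.0, 2.0], [2.0, 0.0]]
print(combine_embeddings(history, external))  # [[0.5, 1.0], [1.0, 0.5]]
```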
-
Publication number: 20170270100
Abstract: A mechanism is provided in a data processing system for external word embedding neural network language models. The mechanism configures the data processing system with an external word embedding neural network language model that accepts as input a sequence of words and predicts a current word based on the sequence of words. The external word embedding neural network language model combines an external embedding matrix with a history word embedding matrix and a prediction word embedding matrix of the external word embedding neural network language model. The mechanism receives a sequence of input words by the data processing system. The mechanism applies a plurality of previous words in the sequence of input words as inputs to the external word embedding neural network language model. The external word embedding neural network language model generates a predicted current word based on the plurality of previous words.
Type: Application
Filed: March 18, 2016
Publication date: September 21, 2017
Inventors: Kartik Audhkhasi, Bhuvana Ramabhadran, Abhinav Sethy
-
Publication number: 20160034814
Abstract: A learning computer system may update parameters and states of an uncertain system.
Type: Application
Filed: August 3, 2015
Publication date: February 4, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Osonde Osoba, Bart Kosko
-
Publication number: 20160019459
Abstract: A learning computer system may include a data processing system and a hardware processor and may estimate parameters and states of a stochastic or uncertain system.
Type: Application
Filed: July 20, 2015
Publication date: January 21, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Bart Kosko, Osonde Osoba
-
Publication number: 20160005399
Abstract: A learning computer system may estimate unknown parameters and states of a stochastic or uncertain system having a probability structure. The system may include a data processing system that may include a hardware processor that has a configuration that: receives data; generates random, chaotic, fuzzy, or other numerical perturbations of the data, one or more of the states, or the probability structure; estimates observed and hidden states of the stochastic or uncertain system using the data, the generated perturbations, previous states of the stochastic or uncertain system, or estimated states of the stochastic or uncertain system; and causes perturbations or independent noise to be injected into the data, the states, or the stochastic or uncertain system so as to speed up training or learning of the probability structure and of the system parameters or the states.
Type: Application
Filed: July 17, 2015
Publication date: January 7, 2016
Applicant: University of Southern California
Inventors: Kartik Audhkhasi, Osonde Osoba, Bart Kosko
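A toy sketch of the perturbation step these USC filings center on: independent noise is injected into the data before each estimation update, which the abstract says can speed up learning of the probability structure. Gaussian noise, the annealed scale, and the toy mean-estimation loop are illustrative assumptions; no speedup is demonstrated here.

```python
import random

def inject_noise(data, scale, rng):
    """Perturb each observation with independent Gaussian noise
    before the next estimation step."""
    return [x + rng.gauss(0.0, scale) for x in data]

def noisy_updates(data, steps, rng):
    """Toy estimation loop: re-estimate the data mean from a freshly
    noise-injected copy each step, annealing the noise scale."""
    estimate = 0.0
    for t in range(1, steps + 1):
        noisy = inject_noise(data, scale=1.0 / t, rng=rng)
        estimate = sum(noisy) / len(noisy)
    return estimate

rng = random.Random(42)
print(noisy_updates([1.0, 2.0, 3.0], steps=200, rng=rng))
```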
-
Patent number: 8457967
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score that quantifies the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Grant
Filed: August 15, 2009
Date of Patent: June 4, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
-
Publication number: 20110040554
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score that quantifies the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Application
Filed: August 15, 2009
Publication date: February 17, 2011
Applicant: International Business Machines Corporation
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
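A minimal sketch of one lexical feature both fluency filings list: the normalized average distance between consecutive occurrences of repeated N-grams (closely repeated phrases are a disfluency cue). Normalizing by transcript length and handling only exact repeats are assumptions; inexact-repeat detection is omitted.

```python
def ngram_repeat_feature(words, n):
    """Average gap between consecutive occurrences of each repeated
    n-gram, normalized by transcript length; 0.0 if nothing repeats."""
    positions = {}
    for i in range(len(words) - n + 1):
        positions.setdefault(tuple(words[i:i + n]), []).append(i)
    gaps = [b - a for occ in positions.values() if len(occ) > 1
            for a, b in zip(occ, occ[1:])]
    if not gaps:
        return 0.0
    return (sum(gaps) / len(gaps)) / len(words)

words = "i i went to the to the store".split()
print(ngram_repeat_feature(words, 2))  # 0.25 (one repeated bigram, gap 2, length 8)
```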