Patents by Inventor Michael Alan Picheny

Michael Alan Picheny has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multilingual intent recognition

Patent number: 11900922

Abstract: Embodiments of the present invention provide computer implemented methods, computer program products and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from limited amount of speech to text training data in a single language. Embodiments of the present invention can locate speech to text training data in one or more other languages using the accessed one or more intents and associated entities to locate speech to text training data in the one or more other languages different than the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech to text training data in the single language and the located speech to text training data in the one or more other languages.

Type: Grant

Filed: November 10, 2020

Date of Patent: February 13, 2024

Assignee: International Business Machines Corporation

Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
Leveraging unpaired text data for training end-to-end spoken language understanding systems

Patent number: 11587551

Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text; and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text; and training the E2E SLU system using the synthetic speech.

Type: Grant

Filed: April 7, 2020

Date of Patent: February 21, 2023

Assignee: International Business Machines Corporation

Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny
MULTILINGUAL INTENT RECOGNITION

Publication number: 20220148581

Abstract: Embodiments of the present invention provide computer implemented methods, computer program products and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from limited amount of speech to text training data in a single language. Embodiments of the present invention can locate speech to text training data in one or more other languages using the accessed one or more intents and associated entities to locate speech to text training data in the one or more other languages different than the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech to text training data in the single language and the located speech to text training data in the one or more other languages.

Type: Application

Filed: November 10, 2020

Publication date: May 12, 2022

Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
Detecting and recovering out-of-vocabulary words in voice-to-text transcription systems

Patent number: 11183194

Abstract: Aspects of the present disclosure describe techniques for identifying and recovering out-of-vocabulary words in transcripts of a voice data recording using word recognition models and word sub-unit recognition models. An example method generally includes receiving a voice data recording for transcription into a textual representation of the voice data recording. The voice data recording is transcribed into the textual representation using a word recognition model. An unknown word is identified in the textual representation, and the unknown word is reconstructed based on recognition of sub-units of the unknown word generated by a sub-unit recognition model. The textual representation of the voice data recording is modified by replacing the unknown word with the reconstruction of the unknown word, and the modified textual representation is output.

Type: Grant

Filed: September 13, 2019

Date of Patent: November 23, 2021

Assignee: International Business Machines Corporation

Inventors: Samuel Thomas, Kartik Audhkhasi, Zoltan Tueske, Yinghui Huang, Michael Alan Picheny
Soft-forgetting for connectionist temporal classification based automatic speech recognition

Patent number: 11158303

Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.

Type: Grant

Filed: August 27, 2019

Date of Patent: October 26, 2021

Assignee: International Business Machines Corporation

Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
LEVERAGING UNPAIRED TEXT DATA FOR TRAINING END-TO-END SPOKEN LANGUAGE UNDERSTANDING SYSTEMS

Publication number: 20210312906

Abstract: An illustrative embodiment includes a method for training an end-to-end (E2E) spoken language understanding (SLU) system. The method includes receiving a training corpus comprising a set of text classified using one or more sets of semantic labels but unpaired with speech and using the set of unpaired text to train the E2E SLU system to classify speech using at least one of the one or more sets of semantic labels. The method may include training a text-to-intent model using the set of unpaired text; and training a speech-to-intent model using the text-to-intent model. Alternatively or additionally, the method may include using a text-to-speech (TTS) system to generate synthetic speech from the unpaired text; and training the E2E SLU system using the synthetic speech.

Type: Application

Filed: April 7, 2020

Publication date: October 7, 2021

Inventors: Hong-Kwang Jeff Kuo, Yinghui Huang, Samuel Thomas, Kartik Audhkhasi, Michael Alan Picheny
TEXT TO SPEECH PROMPT TUNING BY EXAMPLE

Publication number: 20210280167

Abstract: According to one embodiment, a method, computer system, and computer program product for customizing the rendering of a synthesized speech prompt is provided. The present invention may include extracting prosodic information from a received audio recording of a prompt by parsing the text corresponding with the prompt and generating phonetic units, aligning the phonetic units with the audio recording, and calculating, based on the alignment, prosodic values for the phonetic units. The invention may further include adapting the prosodic values to match a text-to-speech voice in use, and then synthesizing speech for the prompt based upon the adapted prosodic information.

Type: Application

Filed: March 4, 2020

Publication date: September 9, 2021

Inventors: Maria E. Smith, Radek Kazbunda, Michael Alan Picheny, Raul Fernandez
SOFT-FORGETTING FOR CONNECTIONIST TEMPORAL CLASSIFICATION BASED AUTOMATIC SPEECH RECOGNITION

Publication number: 20210065680

Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
Techniques for enhancing the performance of concatenative speech synthesis

Patent number: 8145491

Abstract: When pitch of a speech segment is being modified from a current pitch to a requested pitch, and the difference between these is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between current and requested pitches is relatively small, the pitch of the speech segment is not modified. After one or the other speech modification techniques are used, then the resultant modified speech segment is overlapped and added to previously modified speech segments. A modification ratio is determined in order to quantify the difference between the current and requested pitches for a speech segment. The modification ratio is a ratio between the requested and current pitches. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether pitch of the speech segment will or will not be modified.

Type: Grant

Filed: July 30, 2002

Date of Patent: March 27, 2012

Assignee: Nuance Communications, Inc.

Inventors: Wael Mohamed Hamza, Michael Alan Picheny
Method and apparatus for fast semi-automatic semantic annotation

Patent number: 7996211

Abstract: A method, apparatus and computer instructions is provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and a SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the efforts required for human annotation is reduced.

Type: Grant

Filed: May 20, 2008

Date of Patent: August 9, 2011

Assignee: Nuance Communications, Inc.

Inventors: Yuqing Gao, Michael Alan Picheny, Ruhi Sarikaya
Method for fast semi-automatic semantic annotation

Patent number: 7610191

Abstract: A method, apparatus and computer instructions is provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and a SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the efforts required for human annotation is reduced.

Type: Grant

Filed: October 6, 2004

Date of Patent: October 27, 2009

Assignee: Nuance Communications, Inc.

Inventors: Yuqing Gao, Michael Alan Picheny, Ruhi Sarikaya
Methods and apparatus for adapting output speech in accordance with context of communication

Patent number: 7490042

Abstract: A technique for producing speech output in an automatic dialog system in accordance with a detected context is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is created in a natural language generator of the automatic dialog system in communication with the context detector. The message is conveyed to the user through a speech synthesis system of the automatic dialog system, in communication with the natural language generator and the context detector. Responsive to a detected level of ambient noise, the context detector provides at least one command in a markup language to cause the natural language generator to create the message using maximally intelligible words and to cause the speech synthesis system to convey the message with increased volume and decreased speed.

Type: Grant

Filed: March 29, 2005

Date of Patent: February 10, 2009

Assignee: International Business Machines Corporation

Inventors: Ellen Marie Eide, Wael Mohamed Hamza, Michael Alan Picheny
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES

Publication number: 20080312921

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Application

Filed: August 20, 2008

Publication date: December 18, 2008

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Rameah A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Speech recognition utilizing multitude of speech features

Patent number: 7464031

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Grant

Filed: November 28, 2003

Date of Patent: December 9, 2008

Assignee: International Business Machines Corporation

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
Method and Apparatus for Fast Semi-Automatic Semantic Annotation

Publication number: 20080221874

Abstract: A method, apparatus and computer instructions is provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and a SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the efforts required for human annotation is reduced.

Type: Application

Filed: May 20, 2008

Publication date: September 11, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yuging Cao, Michael Alan Picheny, Ruhi Sarikaya
Methods and apparatus for training a pattern recognition system using maximal rank likelihood as an optimization function

Patent number: 6850888

Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based rank likelihood objective function is transformed to allow the parameter estimations to be optimized during the training phase.

Type: Grant

Filed: October 6, 2000

Date of Patent: February 1, 2005

Assignee: International Business Machines Corporation

Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
Augmentation of alternate word lists by acoustic confusability criterion

Patent number: 6754625

Abstract: There is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying at least one acoustically confusable word with respect to the wrongly decoded word. The alternate word list is augmented with the at least one acoustically confusable word.

Type: Grant

Filed: December 26, 2000

Date of Patent: June 22, 2004

Assignee: International Business Machines Corporation

Inventors: Peder Andreas Olsen, Michael Alan Picheny, Harry W. Printz, Karthik Visweswariah
Techniques for enhancing the performance of concatenative speech synthesis

Publication number: 20040024600

Abstract: When pitch of a speech segment is being modified from a current pitch to a requested pitch, and the difference between these is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between current and requested pitches is relatively small, the pitch of the speech segment is not modified. After one or the other speech modification techniques are used, then the resultant modified speech segment is overlapped and added to previously modified speech segments. A modification ratio is determined in order to quantify the difference between the current and requested pitches for a speech segment. The modification ratio is a ratio between the requested and current pitches. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether pitch of the speech segment will or will not be modified.

Type: Application

Filed: July 30, 2002

Publication date: February 5, 2004

Applicant: International Business Machines Corporation

Inventors: Wael Mohamed Hamza, Michael Alan Picheny
Augmentation of alternate word lists by acoustic confusability criterion

Publication number: 20020116191

Abstract: There is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying at least one acoustically confusable word with respect to the wrongly decoded word. The alternate word list is augmented with the at least one acoustically confusable word.

Type: Application

Filed: December 26, 2000

Publication date: August 22, 2002

Applicant: International Business Machines Corporation

Inventors: Peder Andreas Olsen, Michael Alan Picheny, Harry W. Printz, Karthik Visweswariah
Speaker adaptation system and method based on class-specific pre-clustering training speakers

Patent number: 6073096

Abstract: A method of speech recognition, in accordance with the present invention includes the steps of grouping acoustics to form classes based on acoustic features, clustering training speakers by the classes to provide class-specific cluster systems, selecting from the cluster systems, a subset of cluster systems closest to adaptation data from a test speaker, transforming the subset of cluster systems to bring the subset of cluster systems closer to the test speaker based on the adaptation data to form adapted cluster systems and combining the adapted cluster systems to create a speaker adapted system for decoding speech from the test speaker. System and methods for building speech recognition systems as well as adapting speaker systems for class-specific speaker clusters are included.

Type: Grant

Filed: February 4, 1998

Date of Patent: June 6, 2000

Assignee: International Business Machines Corporation

Inventors: Yuqing Gao, Mukund Padmanabhan, Michael Alan Picheny

1 2 next