Patents by Inventor Michael Picheny
Michael Picheny has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11094322
Abstract: A method, a system, and a computer program product are provided. Speech signals from a medical conversation between a medical provider and a patient are converted to text based on a first domain model associated with a medical scenario. The first domain model is selected from multiple domain models associated with a workflow of the medical provider. One or more triggers are detected, each of which indicates a respective change in the medical scenario. A corresponding second domain model is applied to the medical conversation to more accurately convert the speech signals to text in response to each of the detected one or more triggers. The corresponding second domain model is associated with a respective change in the medical scenario of the workflow of the medical provider. A clinical note is provided based on the text produced by converting the speech signals.
Type: Grant
Filed: February 7, 2019
Date of Patent: August 17, 2021
Assignee: International Business Machines Corporation
Inventors: Andrew J. Lavery, Kenney Ng, Michael Picheny, Paul C. Tang
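The abstract above describes switching among domain models when workflow triggers are detected. The following is a minimal illustrative sketch of that idea, not the patented implementation; the trigger phrases, domain names, and decoder functions are hypothetical stand-ins.

```python
# Minimal sketch of trigger-driven domain-model switching (hypothetical
# trigger phrases and domain names; illustrative only).
TRIGGER_TO_DOMAIN = {
    "let's review your medications": "medication_review",
    "i'm going to examine you now": "physical_exam",
    "here is your treatment plan": "care_plan",
}

class DomainAwareTranscriber:
    def __init__(self, domain_models, initial_domain):
        self.domain_models = domain_models   # domain name -> decode(audio) -> text
        self.active_domain = initial_domain

    def detect_trigger(self, text):
        lowered = text.lower()
        for phrase, domain in TRIGGER_TO_DOMAIN.items():
            if phrase in lowered:
                return domain
        return None

    def transcribe(self, audio_chunks):
        note_lines = []
        for chunk in audio_chunks:
            decode = self.domain_models[self.active_domain]
            text = decode(chunk)
            new_domain = self.detect_trigger(text)
            if new_domain and new_domain != self.active_domain:
                self.active_domain = new_domain   # later chunks use the new domain model
            note_lines.append(f"[{self.active_domain}] {text}")
        return "\n".join(note_lines)
```

A clinical note could then be assembled from the domain-labeled lines returned by transcribe().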
-
Patent number: 10902843
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: November 15, 2019
Date of Patent: January 26, 2021
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
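A rough sketch of the clustering step, assuming made-up feature shapes: frame-level features are grouped into k-means clusters whose IDs then serve as training targets for a frame-labeling RNN. Plain Euclidean k-means stands in here for the Mahalanobis-style distances described in the abstract.

```python
# Minimal sketch (assumed shapes, not the patented system): cluster frame
# features and return cluster IDs to use as RNN training targets.
import numpy as np
from sklearn.cluster import KMeans

def frame_cluster_targets(features, n_clusters=8):
    """features: (num_frames, feat_dim) array, e.g. PLP features plus deltas."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_ids = kmeans.fit_predict(features)
    # (features[i], cluster_ids[i]) pairs would then train an RNN that maps
    # each incoming frame to a cluster ID; new audio is segmented wherever
    # the predicted ID changes.
    return cluster_ids

if __name__ == "__main__":
    fake_frames = np.random.randn(1000, 39)   # hypothetical 39-dim features
    print(frame_cluster_targets(fake_frames)[:20])
```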
-
Patent number: 10839792
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Grant
Filed: February 5, 2019
Date of Patent: November 17, 2020
Assignees: International Business Machines Corporation, Toyota Technological Institute at Chicago
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
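A rough sketch of the AWE-to-A2W idea using PyTorch, with hypothetical dimensions and module names: a character-level RNN produces an acoustic word embedding for the new word, a second network maps it to an output-layer weight row, and that row is appended to the A2W embedding table. It only illustrates the data flow, not the patented network.

```python
# Minimal sketch (hypothetical dimensions; illustrative only).
import torch
import torch.nn as nn

class CharToAWE(nn.Module):
    def __init__(self, n_chars=30, char_dim=16, awe_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.LSTM(char_dim, awe_dim, batch_first=True)

    def forward(self, char_ids):               # char_ids: (1, seq_len)
        _, (h, _) = self.rnn(self.embed(char_ids))
        return h[-1]                            # (1, awe_dim) AWE vector

class AWEToA2WWeight(nn.Module):
    def __init__(self, awe_dim=64, a2w_dim=256):
        super().__init__()
        self.proj = nn.Linear(awe_dim, a2w_dim)

    def forward(self, awe):
        return self.proj(awe)                   # candidate weight row for the OOV word

def add_oov_word(a2w_weights, char_ids, char_to_awe, awe_to_weight):
    """Append a weight row for the new word to the (vocab, a2w_dim) table."""
    with torch.no_grad():
        new_row = awe_to_weight(char_to_awe(char_ids))
    return torch.cat([a2w_weights, new_row], dim=0)
```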
-
Publication number: 20200258510
Abstract: A method, a system, and a computer program product are provided. Speech signals from a medical conversation between a medical provider and a patient are converted to text based on a first domain model associated with a medical scenario. The first domain model is selected from multiple domain models associated with a workflow of the medical provider. One or more triggers are detected, each of which indicates a respective change in the medical scenario. A corresponding second domain model is applied to the medical conversation to more accurately convert the speech signals to text in response to each of the detected one or more triggers. The corresponding second domain model is associated with a respective change in the medical scenario of the workflow of the medical provider. A clinical note is provided based on the text produced by converting the speech signals.
Type: Application
Filed: February 7, 2019
Publication date: August 13, 2020
Inventors: Andrew J. Lavery, Kenney Ng, Michael Picheny, Paul C. Tang
-
Publication number: 20200251096
Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
Type: Application
Filed: February 5, 2019
Publication date: August 6, 2020
Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
-
Patent number: 10546575
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Grant
Filed: December 14, 2016
Date of Patent: January 28, 2020
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 10249292
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
Type: Grant
Filed: December 14, 2016
Date of Patent: April 2, 2019
Assignee: International Business Machines Corporation
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
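A small sketch of the change-point idea, assuming per-frame labels have already been predicted by an LSTM (the label names and frame shift are made up): segments are emitted wherever the label switches.

```python
# Minimal sketch (hypothetical labels, not the patented diarizer): turn a
# sequence of per-frame labels into (start_s, end_s, label) segments.
SPEAKER_1, SPEAKER_2, SILENCE = "spk1", "spk2", "sil"

def segments_from_frame_labels(frame_labels, frame_shift_s=0.01):
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        at_end = i == len(frame_labels)
        if at_end or frame_labels[i] != frame_labels[start]:
            segments.append((start * frame_shift_s, i * frame_shift_s,
                             frame_labels[start]))
            start = i
    return segments

print(segments_from_frame_labels(
    [SILENCE, SILENCE, SPEAKER_1, SPEAKER_1, SPEAKER_2, SPEAKER_2, SILENCE]))
```

Speech recognition would then be run only on the segments labeled spk1 or spk2.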
-
Publication number: 20180166067
Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
Type: Application
Filed: December 14, 2016
Publication date: June 14, 2018
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Publication number: 20180166066
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
Type: Application
Filed: December 14, 2016
Publication date: June 14, 2018
Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
-
Patent number: 8924210
Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
Type: Grant
Filed: May 28, 2014
Date of Patent: December 30, 2014
Assignee: Nuance Communications, Inc.
Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
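The three-stage pipeline in the abstract (transcribe, map to a formal utterance, apply written style) can be illustrated with a toy sketch; the filler list and rewrite rules below are invented examples, not the patented technique.

```python
# Minimal sketch of a spoken-to-written pipeline (toy rules; illustrative only).
import re

FILLERS = {"um", "uh", "you know", "like,"}

def to_formal(utterance: str) -> str:
    """Drop fillers and expand casual contractions into a formal utterance."""
    words = [w for w in utterance.split() if w.lower() not in FILLERS]
    text = " ".join(words)
    return text.replace("gonna", "going to").replace("wanna", "want to")

def to_written_style(formal: str) -> str:
    """Apply written-style formatting: capitalization and final punctuation."""
    sentence = formal.strip().capitalize()
    return sentence if re.search(r"[.!?]$", sentence) else sentence + "."

def spoken_to_written(transcribe, audio):
    return to_written_style(to_formal(transcribe(audio)))

# Example with a stand-in recognizer:
print(spoken_to_written(lambda _a: "um we're gonna meet like, tomorrow", None))
```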
-
Patent number: 8856004
Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
Type: Grant
Filed: May 13, 2011
Date of Patent: October 7, 2014
Assignee: Nuance Communications, Inc.
Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
-
Publication number: 20140278410
Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
Type: Application
Filed: May 28, 2014
Publication date: September 18, 2014
Applicant: Nuance Communications, Inc.
Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
-
Publication number: 20120290299
Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
Type: Application
Filed: May 13, 2011
Publication date: November 15, 2012
Applicant: International Business Machines Corporation
Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
-
Publication number: 20060229873
Abstract: A technique for producing speech output in an automatic dialog system is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is provided to the user from a text-to-speech system of the automatic dialog system in communication with the context detector, wherein the message is provided in accordance with the detected context of the communication.
Type: Application
Filed: March 29, 2005
Publication date: October 12, 2006
Applicant: International Business Machines Corporation
Inventors: Ellen Eide, Wael Hamza, Michael Picheny
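A toy sketch of context-conditioned speech output: a context detector inspects the user's turn and the text-to-speech message is rendered with settings chosen for that context. The context classes, keyword rules, and prosody parameters are invented for illustration and are not from the application.

```python
# Minimal sketch (hypothetical contexts and prosody settings; illustrative only).
def detect_context(user_turn: str) -> str:
    if any(w in user_turn.lower() for w in ("again", "still", "not working")):
        return "frustrated"
    return "neutral"

PROSODY = {
    "frustrated": {"rate": 0.9, "pitch": "low", "style": "apologetic"},
    "neutral":    {"rate": 1.0, "pitch": "mid", "style": "informative"},
}

def respond(user_turn, message, synthesize):
    context = detect_context(user_turn)
    return synthesize(message, **PROSODY[context])

# synthesize would be the TTS front end; here a stand-in that just reports:
print(respond("it is still not working",
              "I'm sorry about the trouble. Let's try once more.",
              lambda text, **kw: f"{kw} -> {text}"))
```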
-
Publication number: 20060229876
Abstract: A method, apparatus, and computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector comprising at least one attribute vector element that identifies the speaker from which the speech segment was derived.
Type: Application
Filed: April 7, 2005
Publication date: October 12, 2006
Inventors: Andrew Aaron, Ellen Eide, Wael Hamza, Michael Picheny, Charles Rutherfoord, Zhi Shuang, Maria Smith
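A toy sketch of cost-based segment selection with a speaker attribute: for each target unit one candidate segment is chosen by minimizing a target cost plus a concatenation penalty for mixing speakers. Real unit selection typically uses dynamic programming over the whole utterance; this greedy version, with invented data structures, is only to show the costs in the abstract.

```python
# Minimal sketch (toy costs and greedy search; illustrative only).
def select_segments(targets, candidates, target_cost, speaker_penalty=1.0):
    """candidates: unit -> list of dicts like {"id": ..., "speaker": ...}."""
    chosen, prev = [], None
    for unit in targets:
        def total_cost(seg):
            # Concatenation cost penalizes switching speakers mid-word.
            concat = speaker_penalty if prev and seg["speaker"] != prev["speaker"] else 0.0
            return target_cost(unit, seg) + concat
        best = min(candidates[unit], key=total_cost)
        chosen.append(best)
        prev = best
    return chosen

cands = {"AH": [{"id": 1, "speaker": "A"}, {"id": 2, "speaker": "B"}],
         "T":  [{"id": 3, "speaker": "B"}]}
print(select_segments(["AH", "T"], cands, lambda unit, seg: 0.0))
```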
-
Publication number: 20060074634
Abstract: A method, apparatus, and computer instructions are provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and an SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the effort required for human annotation is reduced.
Type: Application
Filed: October 6, 2004
Publication date: April 6, 2006
Applicant: International Business Machines Corporation
Inventors: Yuqing Gao, Michael Picheny, Ruhi Sarikaya
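The rover idea of combining three engines' outputs can be sketched as per-word voting with a confidence score; low-confidence words are flagged for human review. The label names and threshold below are hypothetical, and majority voting stands in for whatever combination the application actually describes.

```python
# Minimal sketch (toy voting over three hypothetical engines; illustrative only).
from collections import Counter

def rover_combine(words, parser_labels, similarity_labels, svm_labels,
                  review_threshold=2 / 3):
    combined = []
    for word, votes in zip(words, zip(parser_labels, similarity_labels, svm_labels)):
        label, count = Counter(votes).most_common(1)[0]
        confidence = count / len(votes)
        needs_review = confidence < review_threshold   # send to a human annotator
        combined.append((word, label, confidence, needs_review))
    return combined

print(rover_combine(["book", "a", "flight"],
                    ["ACTION", "O", "TRIP"],
                    ["ACTION", "O", "TRIP"],
                    ["OBJECT", "O", "TRIP"]))
```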
-
Publication number: 20050119885
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
Type: Application
Filed: November 28, 2003
Publication date: June 2, 2005
Inventors: Scott Axelrod, Sreeram Balakrishnan, Stanley Chen, Yuqing Gao, Ramesh Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Picheny, George Saon, Geoffrey Zweig
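The core of a log-linear posterior is a softmax over weighted feature sums, which tolerates overlapping, non-independent features. Here is a small sketch with made-up feature names and weights, not the trained model from the application.

```python
# Minimal sketch of a log-linear posterior over candidate units
# (toy features and weights; illustrative only).
import math

def log_linear_posterior(candidates, features, weights):
    """candidates: list of units; features(unit) -> dict of feature values."""
    scores = {c: sum(weights.get(name, 0.0) * value
                     for name, value in features(c).items())
              for c in candidates}
    z = sum(math.exp(s) for s in scores.values())        # softmax normalizer
    return {c: math.exp(s) / z for c, s in scores.items()}

weights = {"acoustic": 1.5, "lm": 0.8}                     # hypothetical weights
feats = lambda word: {"acoustic": {"wreck": -1.2, "recognize": -0.4}[word],
                      "lm": {"wreck": -2.0, "recognize": -0.5}[word]}
print(log_linear_posterior(["wreck", "recognize"], feats, weights))
```

Because each feature simply contributes a weighted term to the score, a feature missing at recognition time just contributes nothing, matching the abstract's point that not all training features need to appear in testing.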
-
Publication number: 20050055209
Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
Type: Application
Filed: September 5, 2003
Publication date: March 10, 2005
Inventors: Mark Epstein, Hakan Erdogan, Yuqing Gao, Michael Picheny, Ruhi Sarikaya
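A small sketch of the rescoring step: each hypothesis in an N-best list gets its recognizer score combined with a semantic parse score, and the best rescored sentence wins. The scores, weight, and stand-in semantic scorer are invented for illustration.

```python
# Minimal sketch of N-best rescoring with a semantic score (toy values;
# illustrative only, not the patented semantic structured language model).
def rescore_nbest(nbest, parse_score, semantic_weight=0.5):
    """nbest: list of (sentence, asr_score); parse_score(sentence) -> float."""
    def combined(item):
        sentence, asr_score = item
        return asr_score + semantic_weight * parse_score(sentence)
    return max(nbest, key=combined)[0]

nbest = [("flights to boston on may first", -10.2),
         ("flights to boston on may fist", -10.0)]
# Stand-in semantic scorer that rewards a hypothesis whose parse fills a DATE slot:
print(rescore_nbest(nbest, lambda s: 5.0 if "first" in s else 0.0))
```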
-
Patent number: 6859778
Abstract: A multi-lingual translation system that provides multiple output sentences for a given word or phrase. Each output sentence for a given word or phrase reflects, for example, a different emotional emphasis, dialect, accent, loudness, or rate of speech. A given output sentence could be selected automatically, or manually as desired, to create a desired effect. For example, the same output sentence for a given word or phrase can be recorded three times, to selectively reflect excitement, sadness, or fear. The multi-lingual translation system includes a phrase-spotting mechanism, a translation mechanism, a speech output mechanism and, optionally, a language understanding mechanism or an event measuring mechanism or both. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases.
Type: Grant
Filed: March 16, 2000
Date of Patent: February 22, 2005
Assignees: International Business Machines Corporation, OIPENN, Inc.
Inventors: Raimo Bakis, Mark Edward Epstein, William Stuart Meisel, Miroslav Novak, Michael Picheny, Ridley M. Whitaker
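A toy sketch of the variant-selection idea: a spotted phrase is mapped to a formal phrase, and one of several target-language renderings differing in emotional emphasis is selected for output. The phrase table, file names, and style labels are hypothetical.

```python
# Minimal sketch of selecting an emotionally styled translation
# (hypothetical phrase table and recordings; illustrative only).
FORMAL = {"we won the game": "VICTORY_ANNOUNCEMENT"}   # language-understanding map
RENDERINGS = {
    ("VICTORY_ANNOUNCEMENT", "es", "excited"): "victory_es_excited.wav",
    ("VICTORY_ANNOUNCEMENT", "es", "neutral"): "victory_es_neutral.wav",
}

def translate_phrase(spotted, target_lang, style="neutral"):
    formal = FORMAL[spotted]                            # map to a formal phrase
    return RENDERINGS[(formal, target_lang, style)]     # pick the styled rendering

print(translate_phrase("we won the game", "es", style="excited"))
```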
-
Patent number: 6556972
Abstract: A multi-lingual time-synchronized translation system and method provide automatic time-synchronized spoken translations of spoken phrases. The multi-lingual time-synchronized translation system includes a phrase-spotting mechanism, optionally a language understanding mechanism, a translation mechanism, a speech output mechanism, and an event measuring mechanism. The phrase-spotting mechanism identifies a spoken phrase from a restricted domain of phrases. The language understanding mechanism, if present, maps the identified phrase onto a small set of formal phrases. The translation mechanism maps the formal phrase onto a well-formed phrase in one or more target languages. The speech output mechanism produces high-quality output speech using the output of the event measuring mechanism for time synchronization. The event-measuring mechanism measures the duration of various key events in the source phrase.
Type: Grant
Filed: March 16, 2000
Date of Patent: April 29, 2003
Assignee: International Business Machines Corporation
Inventors: Raimo Bakis, Mark Edward Epstein, William Stuart Meisel, Miroslav Novak, Michael Picheny, Ridley M. Whitaker
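One way to use the event-measuring output for synchronization is to scale the playback rate of the translated speech so it spans the measured source duration. The sketch below illustrates only that timing calculation, with made-up durations and rate limits, not the patented mechanism.

```python
# Minimal sketch of duration-based time synchronization (toy numbers;
# illustrative only).
def synchronized_rate(source_duration_s, target_natural_duration_s,
                      min_rate=0.7, max_rate=1.4):
    """Playback rate for the target phrase so it spans the source duration."""
    rate = target_natural_duration_s / source_duration_s
    return max(min_rate, min(max_rate, rate))           # clamp to keep speech natural

# Source phrase took 2.0 s; the translated audio is naturally 2.6 s long,
# so it is played ~1.3x faster to stay synchronized.
print(synchronized_rate(2.0, 2.6))
```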