Patents by Inventor Michael A. Picheny

Michael A. Picheny has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10902843
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
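The clustering step in this abstract can be sketched in a few lines. This is a minimal illustration with synthetic 2-D feature vectors standing in for real PLP features, plain Euclidean k-means in place of the Mahalanobis-style distances the patent describes, and a deterministic initialization chosen only to keep the sketch reproducible:

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(frames, k, iters=20):
    """Cluster feature frames into k clusters. Plain Euclidean k-means;
    the patent describes Mahalanobis-style distances over the frames'
    feature means and variances, which this sketch simplifies away."""
    # Deterministic init for the sketch: spread centers across the sequence.
    centers = [frames[i * len(frames) // k] for i in range(k)]
    for _ in range(iters):
        # Assign each frame to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(x, centers[c])) for x in frames]
        # Recompute each center as the mean of its assigned frames.
        for c in range(k):
            members = [x for x, a in zip(frames, labels) if a == c]
            if members:
                centers[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

# Synthetic 2-D "audio features": two speakers plus silence as
# well-separated clusters (real PLP features are higher-dimensional).
frames = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.9, 0.0), (10.0, 0.1)]
labels = kmeans(frames, k=3)
```

Frames from the same region receive the same cluster identifier, and these identifiers are what the patent's RNN is then trained to predict as segment labels.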
  • Patent number: 10839792
    Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: November 17, 2020
    Assignees: International Business Machines Corporation, Toyota Technological Institute at Chicago
    Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
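The data flow described in this abstract (character sequence → AWE vector → A2W weight → embedding table) can be sketched with trivial stand-ins for the trained networks. All table entries, the character-hash embedding, and the identity mapping below are hypothetical placeholders; a real system would use the trained AWE RNN and AWE→A2W network:

```python
# Toy A2W output-embedding table: word -> weight vector. Real entries
# would be learned by the acoustic-to-word model.
a2w_weights = {"hello": [0.9, 0.1], "world": [0.1, 0.9]}

def char_sequence_to_awe(word):
    """Stand-in for the AWE RNN: map a character sequence to a fixed-size
    embedding vector. A real system runs a trained recurrent network."""
    vec = [0.0, 0.0]
    for i, ch in enumerate(word):
        vec[i % 2] += ord(ch) / 1000.0
    return vec

def awe_to_a2w(awe_vector):
    """Stand-in for the AWE-to-A2W network that maps an acoustic word
    embedding to an A2W weight vector; here just an identity mapping."""
    return list(awe_vector)

def add_oov_word(word, table):
    """Insert the OOV word's weight into the A2W embedding table so the
    recognizer can emit it alongside existing in-vocabulary weights."""
    table[word] = awe_to_a2w(char_sequence_to_awe(word))

add_oov_word("zyzzyva", a2w_weights)
```

The point of the arrangement is that no retraining of the A2W recognizer is needed: the new word's weight is simply placed in the output-embedding list relative to the existing weights.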
  • Publication number: 20200258510
    Abstract: A method, a system, and a computer program product are provided. Speech signals from a medical conversation between a medical provider and a patient are converted to text based on a first domain model associated with a medical scenario. The first domain model is selected from multiple domain models associated with a workflow of the medical provider. One or more triggers are detected, each of which indicates a respective change in the medical scenario. A corresponding second domain model is applied to the medical conversation to more accurately convert the speech signals to text in response to each of the detected one or more triggers. The corresponding second domain model is associated with a respective change in the medical scenario of the workflow of the medical provider. A clinical note is provided based on the text produced by converting the speech signals.
    Type: Application
    Filed: February 7, 2019
    Publication date: August 13, 2020
    Inventors: Andrew J. Lavery, Kenney Ng, Michael Picheny, Paul C. Tang
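The trigger-driven switching between domain models can be sketched as follows. The model names, scenarios, and trigger phrases are invented for illustration; the patent's domain models would be trained ASR models tied to the medical provider's workflow, not strings:

```python
# Hypothetical domain models keyed by medical scenario; a real system
# would hold trained speech models here, not labels.
DOMAIN_MODELS = {"intake": "general-model",
                 "cardiology": "cardio-model",
                 "prescribing": "rx-model"}

# Hypothetical trigger phrases, each signaling a change in the
# medical scenario within the provider's workflow.
TRIGGERS = {"chest pain": "cardiology",
            "write you a prescription": "prescribing"}

def select_model(transcript_so_far, current="intake"):
    """Return the domain model to apply to the conversation, switching
    to a second domain model whenever a trigger is detected."""
    for phrase, scenario in TRIGGERS.items():
        if phrase in transcript_so_far.lower():
            current = scenario
    return DOMAIN_MODELS[current]
```

Each detected trigger swaps in the domain model associated with the new scenario, so later speech is converted to text with the model most likely to transcribe it accurately.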
  • Publication number: 20200251096
    Abstract: A method (and structure and computer product) for learning Out-of-Vocabulary (OOV) words in an Automatic Speech Recognition (ASR) system includes using an Acoustic Word Embedding Recurrent Neural Network (AWE RNN) to receive a character sequence for a new OOV word for the ASR system, the RNN providing an Acoustic Word Embedding (AWE) vector as an output thereof. The AWE vector output from the AWE RNN is provided as an input into an Acoustic Word Embedding-to-Acoustic-to-Word Neural Network (AWE→A2W NN) trained to provide an OOV word weight value from the AWE vector. The OOV word weight is inserted into a listing of Acoustic-to-Word (A2W) word embeddings used by the ASR system to output recognized words from an input of speech acoustic features, wherein the OOV word weight is inserted into the A2W word embeddings list relative to existing weights in the A2W word embeddings list.
    Type: Application
    Filed: February 5, 2019
    Publication date: August 6, 2020
    Inventors: Kartik Audhkhasi, Karen Livescu, Michael Picheny, Shane Settle
  • Patent number: 10726844
    Abstract: A method, computer system, and a computer program product for optimizing speech recognition in a smart medical room. The present invention may include selecting, from a database, one or more speech domain models based on a plurality of signals from a plurality of biometric sensors associated with a plurality of medical equipment, wherein the one or more speech domain models are trained with feedback from a clinician based on a medical encounter and from a continuous feedback display in the smart medical room, wherein the feedback from the clinician is based on an optional notification to the clinician to confirm the one or more speech models in use.
    Type: Grant
    Filed: September 9, 2019
    Date of Patent: July 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Andrew J. Lavery, Kenney Ng, Michael A. Picheny, Paul C. Tang
  • Publication number: 20200105271
    Abstract: A method, computer system, and a computer program product for optimizing speech recognition in a smart medical room. The present invention may include selecting, from a database, one or more speech domain models based on a plurality of signals from a plurality of biometric sensors associated with a plurality of medical equipment, wherein the one or more speech domain models are trained with feedback from a clinician based on a medical encounter and from a continuous feedback display in the smart medical room, wherein the feedback from the clinician is based on an optional notification to the clinician to confirm the one or more speech models in use.
    Type: Application
    Filed: September 9, 2019
    Publication date: April 2, 2020
    Inventors: Andrew J. Lavery, Kenney Ng, Michael A. Picheny, Paul C. Tang
  • Publication number: 20200082809
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Application
    Filed: November 15, 2019
    Publication date: March 12, 2020
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10546575
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Patent number: 10510348
    Abstract: A method, computer system, and a computer program product for optimizing speech recognition in a smart medical room. The present invention may include receiving a piece of verbal data associated with a medical encounter from one or more audio recording devices. The present invention may also include accessing a plurality of signals from a plurality of biometric sensors associated with a plurality of medical equipment associated with the smart medical room based on the received piece of verbal data associated with the medical encounter. The present invention may further include selecting, from a database, one or more speech domain models based on the accessed plurality of signals from the plurality of biometric sensors associated with the plurality of medical equipment, wherein the one or more speech domain models are utilized to optimize a transcription of speech during the medical encounter in the smart medical room.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: December 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Andrew J. Lavery, Kenney Ng, Michael A. Picheny, Paul C. Tang
  • Patent number: 10249292
    Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
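The change-point and segment logic in this abstract is easy to show once per-frame labels exist. The sketch below starts from a toy label sequence of the kind the LSTM RNN would emit (one of the two speakers or silence per frame) and recovers change points and labeled segments from it:

```python
def change_points(frame_labels):
    """Given per-frame labels (as the LSTM RNN would emit: first speaker,
    second speaker, or silence), return the change-point indices and the
    labeled segments they delimit."""
    points, segments = [], []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # A segment ends at the end of input or where the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((start, i, frame_labels[start]))
            if i < len(frame_labels):
                points.append(i)
            start = i
    return points, segments

frame_labels = ["spk1", "spk1", "silence", "spk2", "spk2", "spk1"]
points, segments = change_points(frame_labels)
```

Each change point is a transition between the first speaker, the second speaker, and silence, and speech recognition would then run only on the speaker-labeled segments.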
  • Publication number: 20180166067
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Publication number: 20180166066
    Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Publication number: 20180005111
    Abstract: Disclosed herein are systems, methods, and computer-readable media for classifying a set of inputs via a supervised classifier model that utilizes a novel activation function that provides the capability to learn a scale parameter in addition to a bias parameter and other weight parameters.
    Type: Application
    Filed: June 30, 2016
    Publication date: January 4, 2018
    Inventors: Upendra V. Chaudhari, Michael A. Picheny
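The idea of an activation with a learnable scale parameter alongside the bias can be sketched on a toy problem. The sigmoid form and the squared-error objective below are guesses for illustration; the filing does not pin down the exact functional form:

```python
import math

def act(x, scale, bias):
    """Sigmoid-style activation with a learnable scale parameter in
    addition to the usual bias (one plausible form; illustrative only)."""
    return 1.0 / (1.0 + math.exp(-(scale * x + bias)))

# Toy binary task: learn scale and bias by gradient descent so that
# act(x) approximates the labels.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
scale, bias, lr = 1.0, 0.0, 0.5
for _ in range(200):
    g_scale = g_bias = 0.0
    for x, y in data:
        p = act(x, scale, bias)
        # Gradient of squared error through the sigmoid: d/dz of
        # (p - y)^2 where p = sigma(z), z = scale*x + bias.
        g = 2 * (p - y) * p * (1 - p)
        g_scale += g * x
        g_bias += g
    scale -= lr * g_scale
    bias -= lr * g_bias
```

On this symmetric data the bias stays near zero while the scale grows, sharpening the activation; this is the extra degree of freedom the abstract highlights over a bias-only parameterization.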
  • Patent number: 9704482
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance, i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: July 11, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
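The index-then-search flow of this method can be sketched compactly. The patent builds a word-loop weighted finite state transducer per utterance; the sketch below substitutes a plain inverted index over the time-marked word list (and invented timings) purely to show the pipeline shape:

```python
from collections import defaultdict

# Time-marked word list: per utterance, (word, start_time, end_time)
# tuples as an ASR system would emit. Values are illustrative.
tmwl = {
    "utt1": [("the", 0.0, 0.1), ("quick", 0.1, 0.4), ("fox", 0.4, 0.7)],
    "utt2": [("quick", 0.0, 0.3), ("brown", 0.3, 0.6)],
}

def build_index(word_list):
    """Build a keyword index over the time-marked word list. A plain
    inverted index here, standing in for the per-utterance word-loop
    WFSTs described in the patent."""
    index = defaultdict(list)
    for utt, words in word_list.items():
        for word, start, end in words:
            index[word].append((utt, start, end))
    return index

def search(index, queries):
    """Return all time-marked hits for each keyword query."""
    return {q: index.get(q, []) for q in queries}

hits = search(build_index(tmwl), ["quick", "fox", "lazy"])
```

Each hit carries its utterance and time marks, which is what makes the result usable for spoken term detection rather than plain text search.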
  • Patent number: 9697830
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance, i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: July 4, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Publication number: 20170154264
    Abstract: A method, executed by a computer, includes monitoring a conversation between a plurality of meeting participants, identifying a conversational focus within the conversation, generating at least one question corresponding to the conversational focus, and retrieving at least one answer corresponding to the at least one question. A computer system and computer program product corresponding to the method are also disclosed herein.
    Type: Application
    Filed: November 30, 2015
    Publication date: June 1, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Robert G. Farrell, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Michael A. Picheny, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
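The monitor → focus → question → answer loop of this method can be sketched with toy components. The focus terms, question template, and answer store below are invented stand-ins for the trained components a real system would use:

```python
# Hypothetical answer store keyed by focus term.
ANSWERS = {"release date": "The release is scheduled for March.",
           "budget": "The budget is fifty thousand dollars."}

def conversational_focus(utterances):
    """Identify the conversational focus as the known focus term
    mentioned most recently in the monitored conversation."""
    focus = None
    for utt in utterances:
        for term in ANSWERS:
            if term in utt.lower():
                focus = term
    return focus

def answer_for_meeting(utterances):
    """Generate a question for the detected focus and retrieve its answer."""
    focus = conversational_focus(utterances)
    if focus is None:
        return None, None
    return f"What is the {focus}?", ANSWERS[focus]

question, answer = answer_for_meeting(
    ["Any update on the launch?", "We still need the release date."])
```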
  • Publication number: 20160267906
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance, i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Application
    Filed: March 11, 2015
    Publication date: September 15, 2016
    Inventors: Brian E.D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Publication number: 20160267907
    Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance, i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Application
    Filed: June 25, 2015
    Publication date: September 15, 2016
    Inventors: Brian E.D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Patent number: 9195650
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: September 23, 2014
    Date of Patent: November 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick A. Hamilton, II, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael A. Picheny, Bhuvana Ramabhadran, Tara N. Sainath
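The two mapping stages in this abstract (spoken utterance → formal utterance → stylistically formatted written utterance) can be sketched with rule tables. The lookup table below is a hypothetical stand-in; a real system would use trained models rather than string rules:

```python
# Hypothetical spoken-to-formal rules: disfluency removal and
# colloquialism expansion.
SPOKEN_TO_FORMAL = {"gonna": "going to", "wanna": "want to", "um": ""}

def to_formal(utterance):
    """Map a spoken utterance to a corresponding formal utterance."""
    words = [SPOKEN_TO_FORMAL.get(w, w) for w in utterance.split()]
    return " ".join(w for w in words if w)

def to_written(formal):
    """Map a formal utterance to a stylistically formatted written
    utterance (capitalization and terminal punctuation)."""
    return formal[:1].upper() + formal[1:] + "."

written = to_written(to_formal("um i'm gonna call you"))
```

Keeping the two stages separate, as the patent does, lets the formalization and the written-style formatting be developed and swapped independently.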
  • Publication number: 20150120275
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Application
    Filed: September 23, 2014
    Publication date: April 30, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick A. Hamilton, II, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael A. Picheny, Bhuvana Ramabhadran, Tara N. Sainath