Patents by Inventor Michael Picheny

Michael Picheny has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10249292
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
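The core bookkeeping in the abstract above can be illustrated with a minimal sketch. The LSTM RNN itself is not shown; we assume per-frame labels have already been predicted by such a model, and show only how change points (transitions between speaker-1, speaker-2, and silence) divide the audio into labeled segments:

```python
# Hypothetical sketch: assumes an upstream model has already labeled
# each audio frame; a change point is any transition between labels.
def diarize(frame_labels):
    """Return (start, end, label) segments delimited by change points."""
    segments = []
    start = 0
    for i in range(1, len(frame_labels)):
        if frame_labels[i] != frame_labels[i - 1]:  # change point
            segments.append((start, i, frame_labels[start]))
            start = i
    segments.append((start, len(frame_labels), frame_labels[start]))
    return segments

labels = ["spk1", "spk1", "sil", "spk2", "spk2", "spk1"]
segments = diarize(labels)
```

Segments corresponding to `spk1` or `spk2` (but not `sil`) would then be passed on to speech recognition, as the abstract describes.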
  • Publication number: 20180166067
    Abstract: Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
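The clustering step named in the abstract above can be sketched in miniature. This is an illustrative stand-in only: Euclidean distance on scalar features replaces the Mahalanobis measure on PLP feature statistics, and the cluster ids it produces are what the RNN would be trained against:

```python
# Toy k-means over scalar frame features (Euclidean distance stands in
# for the Mahalanobis measure named in the abstract). Returns a cluster
# id per frame plus the refined cluster centers.
def kmeans_1d(xs, centers, iters=10):
    assign = [0] * len(xs)
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                  for x in xs]
        for k in range(len(centers)):
            members = [x for x, a in zip(xs, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign, centers

features = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
assign, centers = kmeans_1d(features, centers=[0.0, 6.0])
```

In the patented method, each frame's cluster id becomes its training target, so that at test time the RNN segments new audio into spans sharing a cluster identifier.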
  • Publication number: 20180166066
Abstract: Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Dimitrios B. Dimitriadis, David C. Haws, Michael Picheny, George Saon, Samuel Thomas
  • Publication number: 20180005111
    Abstract: Disclosed herein are systems, methods, and computer-readable media for classifying a set of inputs via a supervised classifier model that utilizes a novel activation function that provides the capability to learn a scale parameter in addition to a bias parameter and other weight parameters.
    Type: Application
    Filed: June 30, 2016
    Publication date: January 4, 2018
    Inventors: Upendra V. Chaudhari, Michael A. Picheny
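The forward pass of such an activation can be sketched as follows. This is a hedged illustration: the specific functional form (a tanh squashing with a multiplicative scale `s` alongside the bias `b`) is an assumption, and in the patented classifier both parameters would be learned by gradient descent rather than fixed by hand:

```python
import math

# Hypothetical activation with a learnable scale s in addition to the
# usual bias b; only the forward computation is shown, with s and b
# set by hand instead of learned.
def scaled_activation(x, w, s, b):
    # weighted sum of inputs, then a tanh squashing scaled by s
    z = sum(wi * xi for wi, xi in zip(w, x))
    return math.tanh(s * (z + b))

y = scaled_activation([1.0, 2.0], w=[0.5, 0.25], s=2.0, b=-1.0)
```

Because `s` multiplies the pre-activation, learning it lets the model sharpen or flatten the activation's transition region independently of the weights.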
  • Patent number: 9704482
Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: July 11, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
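The index-and-search flow in the abstract above can be sketched simply. This is a deliberate simplification: the patent builds a word-loop weighted finite state transducer per utterance, whereas this stand-in indexes the time-marked word list as a plain inverted index keyed by word:

```python
# Hedged sketch: a plain inverted index over a time-marked word list
# stands in for the per-utterance word-loop WFST of the patent.
# Each entry is (utterance_id, word, start_time, end_time).
def build_index(time_marked_words):
    index = {}
    for utt_id, word, start, end in time_marked_words:
        index.setdefault(word, []).append((utt_id, start, end))
    return index

def search(index, queries):
    # Return every hit (utterance id plus time span) for each query.
    return {q: index.get(q, []) for q in queries}

tmwl = [(0, "hello", 0.0, 0.4), (0, "world", 0.4, 0.9),
        (1, "hello", 1.2, 1.6)]
hits = search(build_index(tmwl), ["hello", "missing"])
```

The WFST formulation of the patent additionally supports multi-word and approximate queries via transducer composition, which a flat dictionary cannot express.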
  • Patent number: 9697830
Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: July 4, 2017
    Assignee: International Business Machines Corporation
    Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Publication number: 20170154264
    Abstract: A method, executed by a computer, includes monitoring a conversation between a plurality of meeting participants, identifying a conversational focus within the conversation, generating at least one question corresponding to the conversational focus, and retrieving at least one answer corresponding to the at least one question. A computer system and computer program product corresponding to the method are also disclosed herein.
    Type: Application
    Filed: November 30, 2015
    Publication date: June 1, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Robert G. Farrell, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Michael A. Picheny, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
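The monitor → focus → question → answer pipeline above can be illustrated with toy stand-ins. Everything here is an assumption for illustration: conversational focus is approximated by the most frequent content word, and "retrieval" is a lookup in a hand-written knowledge table:

```python
from collections import Counter

# Illustrative sketch only: the focus detector and answer retriever
# below are toy stand-ins for the patented components.
STOPWORDS = {"the", "a", "is", "about", "we", "what"}

def conversational_focus(utterances):
    # Approximate the focus as the most frequent non-stopword.
    words = [w for u in utterances for w in u.lower().split()
             if w not in STOPWORDS]
    return Counter(words).most_common(1)[0][0]

def answer_question(focus, knowledge):
    question = f"What is {focus}?"
    return question, knowledge.get(focus, "unknown")

utts = ["we should discuss latency", "latency is the concern",
        "what about latency"]
q, a = answer_question(conversational_focus(utts),
                       {"latency": "delay before data transfer begins"})
```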
  • Publication number: 20160267906
Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Application
    Filed: March 11, 2015
    Publication date: September 15, 2016
Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Publication number: 20160267907
Abstract: A method for spoken term detection, comprising generating a time-marked word list, wherein the time-marked word list is an output of an automatic speech recognition system, generating an index from the time-marked word list, wherein generating the index comprises creating a word loop weighted finite state transducer for each utterance i, receiving a plurality of keyword queries, and searching the index for a plurality of keyword hits.
    Type: Application
    Filed: June 25, 2015
    Publication date: September 15, 2016
Inventors: Brian E. D. Kingsbury, Lidia Mangu, Michael A. Picheny, George A. Saon
  • Patent number: 9195650
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: September 23, 2014
    Date of Patent: November 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick A. Hamilton, II, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael A. Picheny, Bhuvana Ramabhadran, Tara N. Sainath
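The two mapping stages the abstract names (spoken utterance → formal utterance, formal utterance → formatted written utterance) can be sketched with toy rules. The filler list and formatting rules here are illustrative assumptions, not the patented mappings:

```python
# Hedged sketch of the two mapping stages: disfluency removal, then
# stylistic formatting (capitalization and terminal punctuation).
FILLERS = {"um", "uh", "like"}

def to_formal(spoken):
    # Spoken utterance -> formal utterance: drop filler words.
    words = [w for w in spoken.split() if w not in FILLERS]
    return " ".join(words)

def to_written(formal):
    # Formal utterance -> stylistically formatted written utterance.
    text = formal.strip()
    return text[0].upper() + text[1:] + "."

out = to_written(to_formal("um the meeting uh starts at noon"))
```

Both mappings in the patent would be learned or rule-driven at far greater coverage; the sketch only shows the pipeline shape.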
  • Publication number: 20150120275
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Application
    Filed: September 23, 2014
    Publication date: April 30, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick A. Hamilton, II, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael A. Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8924210
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: December 30, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8856004
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Grant
    Filed: May 13, 2011
    Date of Patent: October 7, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Publication number: 20140278410
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Application
    Filed: May 28, 2014
    Publication date: September 18, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Publication number: 20120290299
    Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
    Type: Application
    Filed: May 13, 2011
    Publication date: November 15, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 7716052
    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
    Type: Grant
    Filed: April 7, 2005
    Date of Patent: May 11, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
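The cost-based concatenation described above can be sketched with a greedy selector. The cost model is an assumption for illustration: each candidate segment carries a speaker id (the attribute-vector element the abstract mentions), and the join cost simply penalizes switching speakers between consecutive units:

```python
# Hedged sketch of multi-speaker unit selection: pick the cheapest
# candidate per target unit, where cost = the segment's own target
# cost plus a penalty for changing speakers (a toy join cost).
def select_segments(candidates_per_unit, switch_penalty=1.0):
    chosen, prev_speaker = [], None
    for candidates in candidates_per_unit:
        def cost(c):
            join = 0.0 if prev_speaker in (None, c["speaker"]) else switch_penalty
            return c["cost"] + join
        best = min(candidates, key=cost)
        chosen.append(best["id"])
        prev_speaker = best["speaker"]
    return chosen

units = [
    [{"id": "a1", "cost": 0.2, "speaker": "s1"},
     {"id": "a2", "cost": 0.1, "speaker": "s2"}],
    [{"id": "b1", "cost": 0.3, "speaker": "s2"},
     {"id": "b2", "cost": 0.5, "speaker": "s1"}],
]
picked = select_segments(units)
```

A production unit-selection system would search all paths jointly (e.g. by Viterbi decoding) rather than greedily, but the speaker-aware cost is the idea the abstract highlights.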
  • Patent number: 7702510
    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criteria and selecting one of the three waveforms based on the selected of the three scores.
    Type: Grant
    Filed: January 12, 2007
    Date of Patent: April 20, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Ellen M. Eide, Raul Fernandez, Wael M. Hamza, Michael A. Picheny
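The select-by-score flow above can be shown in a few lines. The three back-end names, their fixed scores, and the "highest score wins" criterion are all assumptions for illustration; the patent leaves the selection criterion open:

```python
# Minimal sketch: three hypothetical TTS back-ends each return a
# (waveform, score) pair for the same text; the criterion here is
# simply the highest score.
def synthesize_all(text, systems):
    return [(name, fn(text)) for name, fn in systems.items()]

def pick_best(results):
    # results: list of (system_name, (waveform, score))
    return max(results, key=lambda r: r[1][1])

systems = {
    "unit_selection": lambda t: (f"wav_a:{t}", 0.72),
    "hmm":            lambda t: (f"wav_b:{t}", 0.61),
    "hybrid":         lambda t: (f"wav_c:{t}", 0.85),
}
best_name, (best_wave, best_score) = pick_best(synthesize_all("hello", systems))
```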
  • Patent number: 7475015
    Abstract: A system and method for speech recognition includes generating a set of likely hypotheses in recognizing speech, rescoring the likely hypotheses by using semantic content by employing semantic structured language models, and scoring parse trees to identify a best sentence according to the sentence's parse tree by employing the semantic structured language models to clarify the recognized speech.
    Type: Grant
    Filed: September 5, 2003
    Date of Patent: January 6, 2009
    Assignee: International Business Machines Corporation
    Inventors: Mark E. Epstein, Hakan Erdogan, Yuqing Gao, Michael A. Picheny, Ruhi Sarikaya
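The rescoring step above can be sketched on an n-best list. The interpolation weight and the toy semantic scorer are assumptions; in the patent the semantic score comes from scoring each hypothesis's parse tree under a semantic structured language model:

```python
# Hedged sketch of n-best rescoring: combine each hypothesis's
# first-pass score with a semantic-LM score and keep the best.
def rescore(nbest, semantic_lm, lam=0.5):
    def combined(hyp):
        return (1 - lam) * hyp["score"] + lam * semantic_lm(hyp["text"])
    return max(nbest, key=combined)["text"]

def toy_semantic_lm(text):
    # Stand-in for scoring the sentence's semantic parse tree.
    return 1.0 if "flight to boston" in text else 0.2

nbest = [{"text": "book a flight to boston", "score": 0.60},
         {"text": "book a fright to boston", "score": 0.62}]
best = rescore(nbest, toy_semantic_lm)
```

Here the semantically coherent hypothesis wins despite a slightly lower acoustic score, which is precisely the clarification the abstract claims.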
  • Publication number: 20080172234
    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing each of the three scores, selecting a score based on a criteria and selecting one of the three waveforms based on the selected of the three scores.
    Type: Application
    Filed: January 12, 2007
    Publication date: July 17, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ellen M. Eide, Raul Fernandez, Wael M. Hamza, Michael A. Picheny
  • Publication number: 20080167876
    Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
    Type: Application
    Filed: January 4, 2007
    Publication date: July 10, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza, Michael A. Picheny
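The paraphrase-then-select flow above can be sketched with toy components. The paraphrase table and the synthesis scorer are illustrative assumptions; only the selection logic (score each paraphrase's synthesized speech, output the top scorer) follows the abstract:

```python
# Illustrative sketch: a hand-written paraphrase table and a toy
# per-utterance synthesis score stand in for the patented components.
def paraphrases(text, table):
    return [text] + table.get(text, [])

def synth_score(text):
    # Hypothetical proxy: shorter sentences synthesize more naturally.
    return 1.0 / len(text.split())

def best_synthesis(text, table):
    return max(paraphrases(text, table), key=synth_score)

table = {"could you please close the door":
         ["please close the door", "close the door"]}
out = best_synthesis("could you please close the door", table)
```

The point of the method is that when the literal input synthesizes poorly, a meaning-preserving paraphrase may yield better-sounding speech.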