Patents by Inventor Mei-Yuh Hwang

Mei-Yuh Hwang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7076422
    Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise such as, but not limited to, breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be performed with N-gram, context-free grammar, or hybrid language models.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: July 11, 2006
    Assignee: Microsoft Corporation
    Inventor: Mei-Yuh Hwang
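A minimal sketch of the ergodic topology this abstract relies on: unlike a left-to-right HMM, every state may follow every other state, which is what lets a single model absorb filled pauses and noises of arbitrary length and order. The state inventory, self-loop probability, and uniform off-diagonal scheme below are illustrative assumptions, not the patent's parameterization.

```python
import numpy as np

def ergodic_transition_matrix(n_states: int, self_loop: float = 0.6) -> np.ndarray:
    """Build a fully connected (ergodic) HMM transition matrix: each state
    keeps `self_loop` probability mass and spreads the rest uniformly over
    all other states, so any noise state can follow any other."""
    off_diag = (1.0 - self_loop) / (n_states - 1)
    A = np.full((n_states, n_states), off_diag)
    np.fill_diagonal(A, self_loop)
    return A

# Three hypothetical noise states: filled pause ("um"), breath, keyboard click.
A = ergodic_transition_matrix(3)
assert np.allclose(A.sum(axis=1), 1.0)  # every row is a valid distribution
```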
  • Patent number: 7031918
    Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. An additional task-independent model can be used. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model. The acoustic model can be formed from a Gaussian mean and variance of the data.
    Type: Grant
    Filed: March 20, 2002
    Date of Patent: April 18, 2006
    Assignee: Microsoft Corporation
    Inventor: Mei Yuh Hwang
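The core of this abstract, confidence-weighted sufficient statistics, can be sketched as follows. Using each word's confidence directly as a per-frame weight is an assumption made for illustration; the patent does not pin the weighting to this exact form.

```python
import numpy as np

def weighted_gaussian(features: np.ndarray, confidences: np.ndarray):
    """Estimate a Gaussian mean and variance from recognized speech frames,
    weighting each frame by recognition confidence so likely misrecognized
    data contributes little to the adapted acoustic model.

    features:    (n_frames, dim) acoustic feature vectors
    confidences: (n_frames,) confidence in [0, 1] for each frame's word
    """
    w = confidences / confidences.sum()
    mean = (w[:, None] * features).sum(axis=0)
    var = (w[:, None] * (features - mean) ** 2).sum(axis=0)
    return mean, var

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 13))      # e.g. 13-dimensional MFCC frames
conf = rng.uniform(0.2, 1.0, size=100)  # per-frame word confidences
mean, var = weighted_gaussian(feats, conf)
```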
  • Patent number: 7006972
    Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
    Type: Grant
    Filed: March 20, 2002
    Date of Patent: February 28, 2006
    Assignee: Microsoft Corporation
    Inventor: Mei Yuh Hwang
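The abstract leaves the relevance measure unspecified; one plausible instantiation is the ratio of a word's frequency in the task-dependent data to its frequency in the task-independent data, sketched below. That definition is an assumption for illustration only.

```python
from collections import Counter

def relevance_weights(task_independent: list[str], task_dependent: list[str],
                      smoothing: float = 1e-6) -> dict[str, float]:
    """Weight each task-independent training word by its relevance to the
    task: words frequent in the task-dependent corpus get larger weights."""
    ti, td = Counter(task_independent), Counter(task_dependent)
    n_ti, n_td = sum(ti.values()), sum(td.values())
    return {
        w: (td[w] / n_td + smoothing) / (c / n_ti + smoothing)
        for w, c in ti.items()
    }

weights = relevance_weights(
    "please call the main office tomorrow".split(),
    "call the support office please".split(),
)
# "call" and "office" outweigh "tomorrow", which the task never uses.
```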
  • Patent number: 6973427
    Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
    Type: Grant
    Filed: December 26, 2000
    Date of Patent: December 6, 2005
    Assignee: Microsoft Corporation
    Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
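A toy version of the selection step: score a letter-to-sound candidate and a decoded-audio candidate against the user's pronunciation and keep the winner. Phoneme-level edit similarity stands in for the patent's acoustic scoring, and all phoneme strings here are hypothetical.

```python
import difflib

def similarity(hyp: list[str], ref: list[str]) -> float:
    """Crude stand-in score: phoneme-sequence similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, hyp, ref).ratio()

def pick_pronunciation(text_based: list[str], speech_based: list[str],
                       user_phones: list[str]) -> list[str]:
    """Keep whichever candidate better matches the user's pronunciation."""
    return max([text_based, speech_based],
               key=lambda cand: similarity(cand, user_phones))

# "read" pronounced /r eh d/: the speech-based description should win.
best = pick_pronunciation(["r", "iy", "d"], ["r", "eh", "d"],
                          user_phones=["r", "eh", "d"])
```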
  • Publication number: 20050203739
    Abstract: A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair are combined, based on the mutual information score, to form a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained based on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information, and a morpheme n-gram model is trained that can be used to segment a new word into a sequence of morphemes.
    Type: Application
    Filed: March 10, 2004
    Publication date: September 15, 2005
    Applicant: Microsoft Corporation
    Inventors: Mei-Yuh Hwang, Li Jiang
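The iterative merge resembles byte-pair encoding driven by mutual information rather than raw counts: score every adjacent pair of graphoneme units, combine the best pair into a new unit, and repeat. The pointwise-MI formula below is an assumed concrete choice, since the abstract does not state one.

```python
import math
from collections import Counter

def best_merge(segmented_words: list[list[str]]):
    """Return the adjacent unit pair with the highest pointwise mutual
    information, log p(a,b) - log p(a) - log p(b), or None if no pairs."""
    unit_counts, pair_counts = Counter(), Counter()
    for units in segmented_words:
        unit_counts.update(units)
        pair_counts.update(zip(units, units[1:]))
    if not pair_counts:
        return None
    n_units, n_pairs = sum(unit_counts.values()), sum(pair_counts.values())
    def pmi(pair):
        a, b = pair
        return (math.log(pair_counts[pair] / n_pairs)
                - math.log(unit_counts[a] / n_units)
                - math.log(unit_counts[b] / n_units))
    return max(pair_counts, key=pmi)

def merge(segmented_words, pair):
    """Rewrite every occurrence of the pair as one combined unit."""
    a, b = pair
    out = []
    for units in segmented_words:
        merged, i = [], 0
        while i < len(units):
            if i + 1 < len(units) and (units[i], units[i + 1]) == (a, b):
                merged.append(a + b); i += 2
            else:
                merged.append(units[i]); i += 1
        out.append(merged)
    return out

words = [list("singing"), list("ringing"), list("sing")]
words = merge(words, best_merge(words))  # e.g. "in" or "ng" becomes one unit
```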
  • Publication number: 20050203738
    Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, at least two possible phonetic descriptions are generated. One phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. At least one other phonetic description is generated from the text of the word. The speech-based and text-based phonetic descriptions are aligned and scored in a single graph based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
    Type: Application
    Filed: March 10, 2004
    Publication date: September 15, 2005
    Applicant: Microsoft Corporation
    Inventor: Mei-Yuh Hwang
  • Publication number: 20050187769
    Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on them that can be used in the decoding process. Because SLUs are longer than phonemes, they contain more acoustic contextual clues and provide better lexical constraints for speech recognition. Thus, the phoneme accuracy produced by SLU recognition is much better than that of all-phone sequence recognition.
    Type: Application
    Filed: April 20, 2005
    Publication date: August 25, 2005
    Applicant: Microsoft Corporation
    Inventors: Mei-Yuh Hwang, Fileno Alleva, Rebecca Weiss
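To make the idea concrete, here is a greedy longest-match segmentation of a phoneme string into SLUs from a given inventory. Both the inventory and the greedy strategy are illustrative assumptions; the patent decodes with an SLU language model rather than greedy matching.

```python
def segment_into_slus(phones: list[str], inventory: set,
                      max_len: int = 4) -> list:
    """Greedy longest-match segmentation of a phoneme sequence into
    syllable-like units, falling back to single phonemes."""
    out, i = [], 0
    while i < len(phones):
        for k in range(min(max_len, len(phones) - i), 0, -1):
            cand = tuple(phones[i:i + k])
            if k == 1 or cand in inventory:
                out.append(cand)
                i += k
                break
    return out

# Hypothetical SLU inventory and pronunciation of "wonderful":
inventory = {("w", "ah", "n"), ("d", "er"), ("f", "ah", "l")}
slus = segment_into_slus(["w", "ah", "n", "d", "er", "f", "ah", "l"], inventory)
# [('w', 'ah', 'n'), ('d', 'er'), ('f', 'ah', 'l')]
```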
  • Publication number: 20050159949
    Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
    Type: Application
    Filed: January 20, 2004
    Publication date: July 21, 2005
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
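One plausible reading of the inference step: a replacement that sounds like the original dictation was probably a recognition error worth learning from, while an unrelated replacement suggests the user changed their mind. The similarity measure and threshold below are illustrative assumptions, not the patent's decision logic.

```python
import difflib

def classify_correction(recognized: str, corrected: str,
                        threshold: float = 0.5) -> str:
    """Heuristic: similar-sounding replacements are treated as recognition
    errors, dissimilar ones as deliberate user edits. Character similarity
    stands in for a phonetic comparison."""
    sim = difflib.SequenceMatcher(None, recognized.lower(),
                                  corrected.lower()).ratio()
    return "recognition_error" if sim >= threshold else "user_edit"

print(classify_correction("their", "there"))          # recognition_error
print(classify_correction("meeting", "appointment"))  # user_edit
```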
  • Publication number: 20040181410
    Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise such as, but not limited to, breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be performed with N-gram, context-free grammar, or hybrid language models.
    Type: Application
    Filed: March 13, 2003
    Publication date: September 16, 2004
    Applicant: Microsoft Corporation
    Inventor: Mei-Yuh Hwang
  • Patent number: 6694296
    Abstract: The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterance. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon.
    Type: Grant
    Filed: November 3, 2000
    Date of Patent: February 17, 2004
    Assignee: Microsoft Corporation
    Inventors: Fileno A. Alleva, Mei-Yuh Hwang, Yun-Cheng Ju
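A minimal sketch of the weighting step: blend the dictation LM's probability and the spelling LM's probability for a hypothesis, then add the acoustic score. The linear interpolation and the weight value are assumptions for illustration.

```python
import math

def combined_score(acoustic_logp: float, dictation_logp: float,
                   spelling_logp: float, spelling_weight: float = 0.3) -> float:
    """Interpolate dictation-LM and spelling-LM probabilities so the
    recognizer can entertain both fluent dictation and letter-by-letter
    spelling, then combine with the acoustic model's score."""
    lm_prob = ((1 - spelling_weight) * math.exp(dictation_logp)
               + spelling_weight * math.exp(spelling_logp))
    return acoustic_logp + math.log(lm_prob)

# A spelled-out hypothesis: the spelling LM likes it, the dictation LM does not.
score = combined_score(acoustic_logp=-12.0,
                       dictation_logp=-25.0, spelling_logp=-8.0)
```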
  • Publication number: 20030182120
    Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Inventor: Mei Yuh Hwang
  • Publication number: 20030182121
    Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Inventor: Mei Yuh Hwang
  • Publication number: 20020082831
    Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
    Type: Application
    Filed: December 26, 2000
    Publication date: June 27, 2002
    Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
  • Patent number: 6336108
    Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given the common hidden variable being in a particular state.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: January 1, 2002
    Assignee: Microsoft Corporation
    Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
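At scoring time the mixture structure reduces to marginalizing out the common external hidden variable: the MBN's likelihood is a prior-weighted sum of the per-HSBN likelihoods. A short numeric sketch with made-up values:

```python
def mbn_likelihood(state_priors: list[float],
                   hsbn_likelihoods: list[float]) -> float:
    """p(obs) = sum over states c of p(c) * p(obs | HSBN_c), where c is the
    common external hidden variable shared across the mixture."""
    return sum(p * lk for p, lk in zip(state_priors, hsbn_likelihoods))

# Two hidden states with illustrative priors and per-network likelihoods:
print(mbn_likelihood([0.4, 0.6], [1e-5, 3e-5]))  # 2.2e-05
```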
  • Patent number: 6263308
    Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data, initially using a speaker-independent acoustic model. The recognized text, along with audio time stamps, is produced by the speech recognition operation. The recognized text is compared to the text in the text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and the corresponding audio segments from the audio data, transforming the initial acoustic model into a speaker-trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.
    Type: Grant
    Filed: March 20, 2000
    Date of Patent: July 17, 2001
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
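The data-selection step can be sketched as: compare the recognizer's output against the reference text and keep only time-stamped words the text confirms, as material for retraining. difflib stands in for the comparison, and the data shapes are illustrative.

```python
import difflib

def correctly_recognized(recognized: list[tuple[str, float, float]],
                         reference: list[str]) -> list[tuple[str, float, float]]:
    """Keep (word, start, end) hypotheses confirmed by the reference text,
    for retraining the speaker-independent model into a speaker-trained one."""
    words = [w for w, _, _ in recognized]
    keep = []
    sm = difflib.SequenceMatcher(None, words, reference)
    for block in sm.get_matching_blocks():
        keep.extend(recognized[block.a:block.a + block.size])
    return keep

rec = [("call", 0.0, 0.4), ("me", 0.4, 0.6), ("fishmeal", 0.6, 1.2)]
print(correctly_recognized(rec, ["call", "me", "ishmael"]))
# [('call', 0.0, 0.4), ('me', 0.4, 0.6)] -- the misrecognition is dropped
```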
  • Patent number: 6260011
    Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker-independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time-indexed set of recognized words, some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence that correspond to expected periods of silence and are adjoined by one or more correctly recognized words are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers.
    Type: Grant
    Filed: March 20, 2000
    Date of Patent: July 10, 2001
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
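A sketch of the anchoring step: globally align the recognized word/silence stream with the reference text, then keep a recognized silence as a synchronization point only when an adjacent word was correctly recognized. difflib stands in for the patent's global alignment, and the <sil> token and data are illustrative.

```python
import difflib

def silence_anchors(recognized: list[tuple[str, float]],
                    reference: list[str], sil: str = "<sil>") -> list[float]:
    """Return timestamps of recognized silences adjoined by at least one
    correctly aligned word -- safe points for bi-directional pointers."""
    non_sil = [w for w, _ in recognized if w != sil]
    matched = set()
    sm = difflib.SequenceMatcher(None, non_sil, reference)
    for block in sm.get_matching_blocks():
        matched.update(non_sil[block.a:block.a + block.size])
    anchors = []
    for i, (w, t) in enumerate(recognized):
        if w != sil:
            continue
        prev_w = recognized[i - 1][0] if i > 0 else None
        next_w = recognized[i + 1][0] if i + 1 < len(recognized) else None
        if prev_w in matched or next_w in matched:
            anchors.append(t)
    return anchors

rec = [("chapter", 0.0), ("one", 0.5), ("<sil>", 0.9), ("it", 1.6), ("begins", 1.9)]
print(silence_anchors(rec, ["chapter", "one", "it", "begins"]))  # [0.9]
```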
  • Patent number: 6141641
    Abstract: The present invention includes a system for recognizing speech based on an input data stream. The system includes an acoustic model whose size is adjustable to a desired size based on the characteristics of the computer system on which the recognition system is run.
    Type: Grant
    Filed: April 15, 1998
    Date of Patent: October 31, 2000
    Assignee: Microsoft Corporation
    Inventors: Mei-Yuh Hwang, Xuedong D. Huang
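A trivial sketch of sizing the model to the host machine: choose as many Gaussian components as fit within a memory budget. The byte costs and budget rule are illustrative assumptions; the patent speaks only of adjusting model size to system characteristics.

```python
def choose_n_gaussians(available_ram_mb: float, dim: int = 39,
                       budget_fraction: float = 0.25) -> int:
    """Size the acoustic model to a fraction of available RAM, assuming
    each diagonal-covariance Gaussian stores a mean and a variance per
    feature dimension as 4-byte floats."""
    bytes_per_gaussian = 2 * dim * 4
    budget = available_ram_mb * 1024 * 1024 * budget_fraction
    return int(budget // bytes_per_gaussian)

print(choose_n_gaussians(64.0))   # a smaller model for a 64 MB machine
print(choose_n_gaussians(512.0))  # a larger one when memory is plentiful
```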
  • Patent number: 6076056
    Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.
    Type: Grant
    Filed: September 19, 1997
    Date of Patent: June 13, 2000
    Assignee: Microsoft Corporation
    Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
  • Patent number: 5963903
    Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns each codeword of each utterance with the phoneme it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.
    Type: Grant
    Filed: June 28, 1996
    Date of Patent: October 5, 1999
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Xuedong D. Huang, Mei-Yuh Hwang, Li Jiang, Yun-Cheng Ju, Milind V. Mahajan, Michael J. Rozak
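The selection loop can be sketched as: average each phoneme's per-frame rank over the aligned data, then pick words containing the worst-ranked phonemes. The rank convention (higher = better modeled) and the word-scoring rule are illustrative assumptions.

```python
from collections import defaultdict

def average_phoneme_ranks(aligned_frames: list[tuple[str, float]]) -> dict[str, float]:
    """aligned_frames holds one (phoneme, rank) pair per frame, where rank
    reflects how probable the frame's codeword is under that phoneme.
    Returns each phoneme's mean rank; low means poorly modeled."""
    sums, counts = defaultdict(float), defaultdict(int)
    for phone, rank in aligned_frames:
        sums[phone] += rank
        counts[phone] += 1
    return {p: sums[p] / counts[p] for p in sums}

def pick_training_words(lexicon: dict[str, list[str]],
                        avg_rank: dict[str, float], n: int = 1) -> list[str]:
    """Select the words whose phonemes have the lowest average rank --
    the phonemes the current models handle worst."""
    def mean_rank(word):
        phones = lexicon[word]
        return sum(avg_rank.get(p, 1.0) for p in phones) / len(phones)
    return sorted(lexicon, key=mean_rank)[:n]

frames = [("ae", 0.8), ("ae", 0.7), ("t", 0.95), ("k", 0.2), ("k", 0.3)]
lex = {"cat": ["k", "ae", "t"], "tat": ["t", "ae", "t"]}
print(pick_training_words(lex, average_phoneme_ranks(frames)))  # ['cat']
```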
  • Patent number: 5794197
    Abstract: A speech recognition method provides improved modeling and recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree, beginning with the root node, is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone.
    Type: Grant
    Filed: May 2, 1997
    Date of Patent: August 11, 1998
    Assignee: Microsoft Corporation
    Inventors: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang
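A compact sketch of one node split: a yes/no linguistic question about the left or right context phone partitions a node's triphone output distributions into two children, and the split is evaluated by how much it reduces entropy. The question set and the entropy criterion are assumed details for illustration.

```python
import math

def entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of pooled output-distribution counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def pooled(dists: list[list[int]]) -> list[int]:
    """Element-wise sum of several output-distribution count vectors."""
    return [sum(col) for col in zip(*dists)]

def split_gain(node, side: str, phone_class: set) -> float:
    """Entropy reduction from splitting a node by a linguistic question
    such as 'is the left-context phone a nasal?'.

    node entries: (left_phone, right_phone, output_counts) per triphone."""
    yes = [c for l, r, c in node if (l if side == "left" else r) in phone_class]
    no = [c for l, r, c in node if (l if side == "left" else r) not in phone_class]
    if not yes or not no:
        return 0.0
    n = sum(sum(c) for _, _, c in node)
    h_node = entropy(pooled([c for _, _, c in node]))
    h_children = sum(
        sum(map(sum, child)) / n * entropy(pooled(child)) for child in (yes, no)
    )
    return h_node - h_children

# Triphones of one central phone with toy 3-codeword output counts:
node = [("m", "t", [8, 1, 1]), ("n", "k", [7, 2, 1]), ("s", "t", [1, 8, 1])]
print(split_gain(node, "left", {"m", "n", "ng"}))  # > 0: nasals cluster together
```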