Patents by Inventor Mei-Yuh Hwang
Mei-Yuh Hwang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7076422
Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise, such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
Type: Grant
Filed: March 13, 2003
Date of Patent: July 11, 2006
Assignee: Microsoft Corporation
Inventor: Mei-Yuh Hwang
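A minimal sketch of the topology difference the abstract relies on: an ergodic (fully connected) transition matrix lets a filled-pause or noise model revisit its states in any order, unlike a conventional left-to-right topology. State counts and probabilities here are illustrative, not taken from the patent.

```python
import numpy as np

def ergodic_transitions(n_states: int) -> np.ndarray:
    """Fully connected (ergodic) transition matrix: every state can
    reach every other state, so a varying filled pause ("uh", "um...")
    can loop, skip, or revisit states in any order."""
    return np.full((n_states, n_states), 1.0 / n_states)

def left_to_right_transitions(n_states: int, p_stay: float = 0.5) -> np.ndarray:
    """Conventional left-to-right topology for comparison: each state
    may only stay put or advance to the next state."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i + 1 < n_states:
            A[i, i] = p_stay
            A[i, i + 1] = 1.0 - p_stay
        else:
            A[i, i] = 1.0  # final state absorbs
    return A
```

In the ergodic matrix every transition probability is nonzero, which is what allows utterances of the same filled pause to differ freely in ordering and length.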
-
Patent number: 7031918
Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. An additional task-independent model can also be used. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model. The acoustic model can be formed from a Gaussian mean and variance of the data.
Type: Grant
Filed: March 20, 2002
Date of Patent: April 18, 2006
Assignee: Microsoft Corporation
Inventor: Mei Yuh Hwang
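The confidence-weighted Gaussian estimate the abstract describes can be sketched directly: each frame's contribution to the mean and variance is scaled by the recognizer's confidence in the word it came from (the one-dimensional features and the function name are illustrative assumptions).

```python
def weighted_gaussian(frames, confidences):
    """Mean and variance in which each frame is weighted by the
    confidence of its recognized word, so data recognized with low
    confidence (likely misrecognized) barely moves the model."""
    total = sum(confidences)
    mean = sum(c * x for c, x in zip(confidences, frames)) / total
    var = sum(c * (x - mean) ** 2 for c, x in zip(confidences, frames)) / total
    return mean, var
```

With equal confidences this reduces to the ordinary sample mean and variance; a zero-confidence frame is ignored entirely.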
-
Patent number: 7006972
Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
Type: Grant
Filed: March 20, 2002
Date of Patent: February 28, 2006
Assignee: Microsoft Corporation
Inventor: Mei Yuh Hwang
-
Patent number: 6973427
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Grant
Filed: December 26, 2000
Date of Patent: December 6, 2005
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
-
Publication number: 20050203739
Abstract: A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair of graphoneme units are combined based on the mutual information score. This forms a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained based on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information and a morpheme n-gram is trained that can be used to segment a new word into a sequence of morphemes.
Type: Application
Filed: March 10, 2004
Publication date: September 15, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Li Jiang
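One merge step of the pair-combination idea can be sketched as follows: score every adjacent pair of units by pointwise mutual information over a segmented word list, and fuse the best-scoring pair into a new, longer unit. The scoring detail (MI weighted by joint probability) is an illustrative choice, not the patent's exact formula.

```python
import math
from collections import Counter

def best_merge(words):
    """Score each adjacent pair of units by (weighted) pointwise mutual
    information and return the pair to combine into a new unit.
    `words` is a list of unit lists, e.g. [['h','e','l','p'], ...]."""
    unit_counts, pair_counts, total = Counter(), Counter(), 0
    for units in words:
        unit_counts.update(units)
        pair_counts.update(zip(units, units[1:]))
        total += len(units)
    def mi(pair):
        a, b = pair
        p_ab = pair_counts[pair] / total
        return p_ab * math.log(p_ab / ((unit_counts[a] / total) * (unit_counts[b] / total)))
    return max(pair_counts, key=mi)

def apply_merge(units, pair):
    """Rewrite one word, fusing every occurrence of the chosen pair."""
    out, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            out.append(units[i] + units[i + 1])
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out
```

Repeating this merge step grows an inventory of progressively longer graphoneme-like units, in the same spirit as modern subword-merge algorithms.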
-
Publication number: 20050203738
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, a plurality of at least two possible phonetic descriptions are generated. One phonetic description is formed by decoding a speech signal representing a user's pronunciation of the word. At least one other phonetic description is generated from the text of the word. The plurality of possible sequences comprising speech-based and text-based phonetic descriptions are aligned and scored in a single graph based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Application
Filed: March 10, 2004
Publication date: September 15, 2005
Applicant: Microsoft Corporation
Inventor: Mei-Yuh Hwang
-
Publication number: 20050187769
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Application
Filed: April 20, 2005
Publication date: August 25, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno Alleva, Rebecca Weiss
-
Publication number: 20050159949
Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
Type: Application
Filed: January 20, 2004
Publication date: July 21, 2005
Applicant: Microsoft Corporation
Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
-
Publication number: 20040181410
Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise, such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
Type: Application
Filed: March 13, 2003
Publication date: September 16, 2004
Applicant: Microsoft Corporation
Inventor: Mei-Yuh Hwang
-
Patent number: 6694296
Abstract: The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterances. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon.
Type: Grant
Filed: November 3, 2000
Date of Patent: February 17, 2004
Assignee: Microsoft Corporation
Inventors: Fileno A. Alleva, Mei-Yuh Hwang, Yun-Cheng Ju
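The weighting of the two language-model outputs can be sketched as a log-linear combination with the acoustic score when ranking hypotheses. The tuple layout, weight values, and function name are illustrative assumptions, not the patent's formulation.

```python
def best_hypothesis(hypotheses, w_dict=0.7, w_spell=0.3):
    """Pick the hypothesis with the best combined score.
    Each hypothesis is (text, acoustic_logp, dictation_lm_logp,
    spelling_lm_logp); the two LM log-scores are weighted before being
    added to the acoustic log-score."""
    def score(h):
        _, acoustic, dictation, spelling = h
        return acoustic + w_dict * dictation + w_spell * spelling
    return max(hypotheses, key=score)[0]
```

Raising `w_spell` relative to `w_dict` favors letter-sequence (spelled) interpretations of the utterance over dictated-word interpretations.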
-
Publication number: 20030182120
Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Inventor: Mei Yuh Hwang
-
Publication number: 20030182121
Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Inventor: Mei Yuh Hwang
-
Publication number: 20020082831
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Application
Filed: December 26, 2000
Publication date: June 27, 2002
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
-
Patent number: 6336108
Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state.
Type: Grant
Filed: December 23, 1998
Date of Patent: January 1, 2002
Assignee: Microsoft Corporation
Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
-
Patent number: 6263308
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data, transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.
Type: Grant
Filed: March 20, 2000
Date of Patent: July 17, 2001
Assignee: Microsoft Corporation
Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
-
Patent number: 6260011
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time indexed set of recognized words, some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence, which correspond to expected periods of silence and are adjoined by one or more correctly recognized words, are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers.
Type: Grant
Filed: March 20, 2000
Date of Patent: July 10, 2001
Assignee: Microsoft Corporation
Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
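The alignment-and-anchor idea can be sketched with a standard global alignment: match the time-indexed recognized words against the reference text, then keep only those recognized silences that sit next to a correctly matched word. The `<sil>` token, the data layout, and the use of `difflib` are illustrative assumptions standing in for the patent's alignment machinery.

```python
from difflib import SequenceMatcher

def sync_points(recognized, reference):
    """Return timestamps of recognized silences adjoined by a correctly
    recognized word; these are candidate audio/text synchronization
    anchors. `recognized` is a list of (word, time) pairs where word
    may be "<sil>"; `reference` is the reference text as a word list."""
    rec_words = [w for w, _ in recognized]
    matched = set()
    sm = SequenceMatcher(None, rec_words, reference, autojunk=False)
    for a, _, n in sm.get_matching_blocks():
        matched.update(range(a, a + n))   # indices of correctly recognized words
    return [t for i, (w, t) in enumerate(recognized)
            if w == "<sil>" and (i - 1 in matched or i + 1 in matched)]
```

Silences next to misrecognized stretches are deliberately excluded, since their timestamps cannot be trusted as anchor points.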
-
Patent number: 6141641
Abstract: The present invention includes a system for recognizing speech based on an input data stream. The system includes an acoustic model which has a model size. The model is adjustable to a desired size based on characteristics of a computer system on which the recognition system is run.
Type: Grant
Filed: April 15, 1998
Date of Patent: October 31, 2000
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Xuedong D. Huang
-
Patent number: 6076056
Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.
Type: Grant
Filed: September 19, 1997
Date of Patent: June 13, 2000
Assignee: Microsoft Corporation
Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
-
Patent number: 5963903
Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns the codewords of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.
Type: Grant
Filed: June 28, 1996
Date of Patent: October 5, 1999
Assignee: Microsoft Corporation
Inventors: Hsiao-Wuen Hon, Xuedong D. Huang, Mei-Yuh Hwang, Li Jiang, Yun-Cheng Ju, Milind V. Mahajan, Michael J. Rozak
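The final selection step can be sketched as a simple ranking over a pronunciation lexicon. Here a larger rank number is assumed to mean the phoneme is ranked lower (modeled more poorly), so the words whose phonemes average worst are chosen; the data layout and function name are illustrative, not the patent's.

```python
def select_training_words(word_phonemes, phoneme_rank, k=1):
    """Pick the k words whose phonemes have the worst (numerically
    largest, i.e. lowest-ranked) average rank, on the assumption that
    those phonemes most need additional training data.
    `word_phonemes` maps word -> phoneme list; `phoneme_rank` maps
    phoneme -> its average rank from the aligned frames."""
    def avg_rank(word):
        phones = word_phonemes[word]
        return sum(phoneme_rank[p] for p in phones) / len(phones)
    return sorted(word_phonemes, key=avg_rank, reverse=True)[:k]
```

Selecting by average rather than total rank keeps long words from being chosen merely for containing many phonemes.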
-
Patent number: 5794197
Abstract: A speech recognition method provides improved modeling and recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree, beginning with the root node, is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone.
Type: Grant
Filed: May 2, 1997
Date of Patent: August 11, 1998
Assignee: Microsoft Corporation
Inventors: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang
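One split of the tree-building step can be sketched as follows. Triphones clustered at a node are divided by a yes/no linguistic question about the left or right context phoneme; as a deliberate simplification, a partition-balance heuristic stands in for the real criterion (likelihood gain of the clustered output distributions), and the question set is a made-up example.

```python
def split_node(triphones, questions):
    """Divide one senone-tree node in two. `triphones` are
    (left, center, right) tuples; `questions` are (side, phone_set)
    pairs where side 0 asks about the left context and side 2 about
    the right context. Returns the chosen question and the two
    child-node clusters."""
    best = None
    for side, phone_set in questions:
        yes = [t for t in triphones if t[side] in phone_set]
        no = [t for t in triphones if t[side] not in phone_set]
        balance = min(len(yes), len(no))        # stand-in for likelihood gain
        if best is None or balance > best[0]:
            best = (balance, (side, phone_set), yes, no)
    _, question, yes, no = best
    return question, yes, no
```

Applying this recursively to each child until a stopping point yields leaves (senones), and any triphone, seen in training or not, can then be mapped to a senone by answering the same questions down the tree.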