Patents by Inventor Mei-Yuh Hwang
Mei-Yuh Hwang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7076422
Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise, such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
Type: Grant
Filed: March 13, 2003
Date of Patent: July 11, 2006
Assignee: Microsoft Corporation
Inventor: Mei-Yuh Hwang
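A minimal sketch of the topology difference the abstract relies on: an ergodic (fully connected) transition matrix lets a filled-pause or noise model revisit its states in any order, unlike a conventional left-to-right topology. State counts and probabilities here are illustrative, not taken from the patent.

```python
import numpy as np

def ergodic_transitions(n_states: int) -> np.ndarray:
    """Fully connected (ergodic) transition matrix: every state can
    reach every other state, so a varying filled pause ("uh", "um...")
    can loop, skip, or revisit states in any order."""
    return np.full((n_states, n_states), 1.0 / n_states)

def left_to_right_transitions(n_states: int, p_stay: float = 0.5) -> np.ndarray:
    """Conventional left-to-right topology for comparison: each state
    may only stay put or advance to the next state."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i + 1 < n_states:
            A[i, i] = p_stay
            A[i, i + 1] = 1.0 - p_stay
        else:
            A[i, i] = 1.0  # final state absorbs
    return A
```

In the ergodic matrix every transition probability is nonzero, which is what allows utterances of the same filled pause to differ freely in ordering and length.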
-
Patent number: 7031918
Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. An additional task-independent model can also be used. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model. The acoustic model can be formed from a Gaussian mean and variance of the data.
Type: Grant
Filed: March 20, 2002
Date of Patent: April 18, 2006
Assignee: Microsoft Corporation
Inventor: Mei Yuh Hwang
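The confidence-weighted Gaussian estimate the abstract describes can be sketched directly: each frame's contribution to the mean and variance is scaled by the recognizer's confidence in the word it came from (the one-dimensional features and the function name are illustrative assumptions).

```python
def weighted_gaussian(frames, confidences):
    """Mean and variance in which each frame is weighted by the
    confidence of its recognized word, so data recognized with low
    confidence (likely misrecognized) barely moves the model."""
    total = sum(confidences)
    mean = sum(c * x for c, x in zip(confidences, frames)) / total
    var = sum(c * (x - mean) ** 2 for c, x in zip(confidences, frames)) / total
    return mean, var
```

With equal confidences this reduces to the ordinary sample mean and variance; a zero-confidence frame is ignored entirely.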
-
Patent number: 7006972
Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
Type: Grant
Filed: March 20, 2002
Date of Patent: February 28, 2006
Assignee: Microsoft Corporation
Inventor: Mei Yuh Hwang
-
Patent number: 6973427
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Grant
Filed: December 26, 2000
Date of Patent: December 6, 2005
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
-
Publication number: 20050203739
Abstract: A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair of graphoneme units are combined based on the mutual information score. This forms a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained based on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information and a morpheme n-gram is trained that can be used to segment a new word into a sequence of morphemes.
Type: Application
Filed: March 10, 2004
Publication date: September 15, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Li Jiang
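One merge step of the pair-combination idea can be sketched as follows: score every adjacent pair of units by pointwise mutual information over a segmented word list, and fuse the best-scoring pair into a new, longer unit. The scoring detail (MI weighted by joint probability) is an illustrative choice, not the patent's exact formula.

```python
import math
from collections import Counter

def best_merge(words):
    """Score each adjacent pair of units by (weighted) pointwise mutual
    information and return the pair to combine into a new unit.
    `words` is a list of unit lists, e.g. [['h','e','l','p'], ...]."""
    unit_counts, pair_counts, total = Counter(), Counter(), 0
    for units in words:
        unit_counts.update(units)
        pair_counts.update(zip(units, units[1:]))
        total += len(units)
    def mi(pair):
        a, b = pair
        p_ab = pair_counts[pair] / total
        return p_ab * math.log(p_ab / ((unit_counts[a] / total) * (unit_counts[b] / total)))
    return max(pair_counts, key=mi)

def apply_merge(units, pair):
    """Rewrite one word, fusing every occurrence of the chosen pair."""
    out, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            out.append(units[i] + units[i + 1])
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out
```

Repeating this merge step grows an inventory of progressively longer graphoneme-like units, in the same spirit as modern subword-merge algorithms.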
-
Publication number: 20050203738
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, a plurality of at least two possible phonetic descriptions are generated. One phonetic description is formed by decoding a speech signal representing a user's pronunciation of the word. At least one other phonetic description is generated from the text of the word. The plurality of possible sequences comprising speech-based and text-based phonetic descriptions are aligned and scored in a single graph based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Application
Filed: March 10, 2004
Publication date: September 15, 2005
Applicant: Microsoft Corporation
Inventor: Mei-Yuh Hwang
-
Publication number: 20050187769
Abstract: A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on these syllable-like units that can be used in the decoding process. As SLUs are longer than phonemes, they contain more acoustic contextual clues and better lexical constraints for speech recognition. Thus, the phoneme accuracy produced from SLU recognition is much better than all-phone sequence recognition.
Type: Application
Filed: April 20, 2005
Publication date: August 25, 2005
Applicant: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Fileno Alleva, Rebecca Weiss
-
Publication number: 20050159949
Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.
Type: Application
Filed: January 20, 2004
Publication date: July 21, 2005
Applicant: Microsoft Corporation
Inventors: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
-
Publication number: 20040181410
Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing the flexibility to accommodate varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise, such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
Type: Application
Filed: March 13, 2003
Publication date: September 16, 2004
Applicant: Microsoft Corporation
Inventor: Mei-Yuh Hwang
-
Patent number: 6694296
Abstract: The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterances. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon.
Type: Grant
Filed: November 3, 2000
Date of Patent: February 17, 2004
Assignee: Microsoft Corporation
Inventors: Fileno A. Alleva, Mei-Yuh Hwang, Yun-Cheng Ju
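The weighting of the two language-model outputs can be sketched as a log-linear combination with the acoustic score when ranking hypotheses. The tuple layout, weight values, and function name are illustrative assumptions, not the patent's formulation.

```python
def best_hypothesis(hypotheses, w_dict=0.7, w_spell=0.3):
    """Pick the hypothesis with the best combined score.
    Each hypothesis is (text, acoustic_logp, dictation_lm_logp,
    spelling_lm_logp); the two LM log-scores are weighted before being
    added to the acoustic log-score."""
    def score(h):
        _, acoustic, dictation, spelling = h
        return acoustic + w_dict * dictation + w_spell * spelling
    return max(hypotheses, key=score)[0]
```

Raising `w_spell` relative to `w_dict` favors letter-sequence (spelled) interpretations of the utterance over dictated-word interpretations.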
-
Publication number: 20030182120
Abstract: Unsupervised speech data is provided to a speech recognizer that recognizes the speech data and outputs a recognition result along with a confidence measure for each recognized word. A task-related acoustic model is generated based on the recognition result, the speech data and the confidence measure. The speech data can be weighted by the confidence measure in generating the acoustic model so that only data that has been recognized with a high degree of confidence will weigh heavily in generation of the acoustic model.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Inventor: Mei Yuh Hwang
-
Publication number: 20030182121
Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Inventor: Mei Yuh Hwang
-
Publication number: 20020082831
Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
Type: Application
Filed: December 26, 2000
Publication date: June 27, 2002
Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss
-
Patent number: 6336108
Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state.
Type: Grant
Filed: December 23, 1998
Date of Patent: January 1, 2002
Assignee: Microsoft Corporation
Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang
-
Patent number: 6263308
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data, transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.
Type: Grant
Filed: March 20, 2000
Date of Patent: July 17, 2001
Assignee: Microsoft Corporation
Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
-
Patent number: 6260011
Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time indexed set of recognized words, some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence, which correspond to expected periods of silence and are adjoined by one or more correctly recognized words, are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers.
Type: Grant
Filed: March 20, 2000
Date of Patent: July 10, 2001
Assignee: Microsoft Corporation
Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
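The alignment-and-anchor idea can be sketched with a standard global alignment: match the time-indexed recognized words against the reference text, then keep only those recognized silences that sit next to a correctly matched word. The `<sil>` token, the data layout, and the use of `difflib` are illustrative assumptions standing in for the patent's alignment machinery.

```python
from difflib import SequenceMatcher

def sync_points(recognized, reference):
    """Return timestamps of recognized silences adjoined by a correctly
    recognized word; these are candidate audio/text synchronization
    anchors. `recognized` is a list of (word, time) pairs where word
    may be "<sil>"; `reference` is the reference text as a word list."""
    rec_words = [w for w, _ in recognized]
    matched = set()
    sm = SequenceMatcher(None, rec_words, reference, autojunk=False)
    for a, _, n in sm.get_matching_blocks():
        matched.update(range(a, a + n))   # indices of correctly recognized words
    return [t for i, (w, t) in enumerate(recognized)
            if w == "<sil>" and (i - 1 in matched or i + 1 in matched)]
```

Silences next to misrecognized stretches are deliberately excluded, since their timestamps cannot be trusted as anchor points.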
-
Patent number: 6141641
Abstract: The present invention includes a system for recognizing speech based on an input data stream. The system includes an acoustic model which has a model size. The model is adjustable to a desired size based on characteristics of a computer system on which the recognition system is run.
Type: Grant
Filed: April 15, 1998
Date of Patent: October 31, 2000
Assignee: Microsoft Corporation
Inventors: Mei-Yuh Hwang, Xuedong D. Huang
-
Patent number: 6076056
Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.
Type: Grant
Filed: September 19, 1997
Date of Patent: June 13, 2000
Assignee: Microsoft Corporation
Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang
-
Patent number: 5963903
Abstract: A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns the codewords of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the aligned codewords for the aligned frames. Finally, the training system selects words for training that contain phonemes with a low rank.
Type: Grant
Filed: June 28, 1996
Date of Patent: October 5, 1999
Assignee: Microsoft Corporation
Inventors: Hsiao-Wuen Hon, Xuedong D. Huang, Mei-Yuh Hwang, Li Jiang, Yun-Cheng Ju, Milind V. Mahajan, Michael J. Rozak
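The final selection step can be sketched as a simple ranking over a pronunciation lexicon. Here a larger rank number is assumed to mean the phoneme is ranked lower (modeled more poorly), so the words whose phonemes average worst are chosen; the data layout and function name are illustrative, not the patent's.

```python
def select_training_words(word_phonemes, phoneme_rank, k=1):
    """Pick the k words whose phonemes have the worst (numerically
    largest, i.e. lowest-ranked) average rank, on the assumption that
    those phonemes most need additional training data.
    `word_phonemes` maps word -> phoneme list; `phoneme_rank` maps
    phoneme -> its average rank from the aligned frames."""
    def avg_rank(word):
        phones = word_phonemes[word]
        return sum(phoneme_rank[p] for p in phones) / len(phones)
    return sorted(word_phonemes, key=avg_rank, reverse=True)[:k]
```

Selecting by average rather than total rank keeps long words from being chosen merely for containing many phonemes.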
-
Patent number: 5794197
Abstract: A speech recognition method provides improved modeling and recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree, beginning with the root node, is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone.
Type: Grant
Filed: May 2, 1997
Date of Patent: August 11, 1998
Assignee: Microsoft Corporation
Inventors: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang
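One split of the tree-building step can be sketched as follows. Triphones clustered at a node are divided by a yes/no linguistic question about the left or right context phoneme; as a deliberate simplification, a partition-balance heuristic stands in for the real criterion (likelihood gain of the clustered output distributions), and the question set is a made-up example.

```python
def split_node(triphones, questions):
    """Divide one senone-tree node in two. `triphones` are
    (left, center, right) tuples; `questions` are (side, phone_set)
    pairs where side 0 asks about the left context and side 2 about
    the right context. Returns the chosen question and the two
    child-node clusters."""
    best = None
    for side, phone_set in questions:
        yes = [t for t in triphones if t[side] in phone_set]
        no = [t for t in triphones if t[side] not in phone_set]
        balance = min(len(yes), len(no))        # stand-in for likelihood gain
        if best is None or balance > best[0]:
            best = (balance, (side, phone_set), yes, no)
    _, question, yes, no = best
    return question, yes, no
```

Applying this recursively to each child until a stopping point yields leaves (senones), and any triphone, seen in training or not, can then be mapped to a senone by answering the same questions down the tree.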