Patents by Inventor Fileno A. Alleva

Fileno A. Alleva has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for generating and displaying N-Best alternatives in a speech recognition system

Publication number: 20050091054

Abstract: The present invention is directed to a method and apparatus for generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.
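
The boundary-condition idea in this abstract can be sketched in Python. This is a minimal illustration, not the patented method. It assumes a hypothetical hypothesis store held as a flat list of word arcs `Arc(word, start, end, score)` with log-domain scores. It also assumes that "satisfying the boundary conditions" means an alternate subpath must start at the reference subpath's first frame and end at its last frame. `Arc` and `alternate_subpaths` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Arc:
    """One word hypothesis in a hypothetical hypothesis store (word lattice)."""
    word: str
    start: int    # first frame covered by the word
    end: int      # frame where the word ends (exclusive)
    score: float  # log-domain score


def alternate_subpaths(
    store: List[Arc], reference: List[Arc], n: int = 5
) -> List[Tuple[float, List[str]]]:
    """Return up to n word sequences that exactly cover the frames of the
    selected reference subpath, highest total score first."""
    # Boundary conditions come from the bounds of the reference subpath.
    begin, finish = reference[0].start, reference[-1].end
    ref_words = [arc.word for arc in reference]

    # Index usable arcs by start frame. Arcs outside the bounds can never be
    # part of an alternative, and zero-length arcs are ignored.
    by_start: Dict[int, List[Arc]] = {}
    for arc in store:
        if begin <= arc.start < arc.end <= finish:
            by_start.setdefault(arc.start, []).append(arc)

    results: List[Tuple[float, List[str]]] = []

    def extend(frame: int, words: List[str], total: float) -> None:
        # Depth-first search over chains of arcs that end exactly at `finish`.
        if frame == finish:
            if words != ref_words:  # the reference is not its own alternative
                results.append((total, list(words)))
            return
        for arc in by_start.get(frame, []):
            words.append(arc.word)
            extend(arc.end, words, total + arc.score)
            words.pop()

    extend(begin, [], 0.0)
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:n]


store = [
    Arc("recognize", 0, 10, -5.0),
    Arc("wreck", 0, 4, -3.0), Arc("a", 4, 5, -1.0), Arc("nice", 5, 10, -2.5),
    Arc("speech", 10, 20, -4.0),
]
print(alternate_subpaths(store, [Arc("recognize", 0, 10, -5.0)]))
# [(-6.5, ['wreck', 'a', 'nice'])]
```

The exhaustive search is exponential in the worst case. A real recognizer would prune it or run an N-best search over the lattice instead.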

Type: Application

Filed: November 23, 2004

Publication date: April 28, 2005

Applicant: Microsoft Corporation

Inventors: Chris Thrasher, Fileno Alleva

Method and system for frame alignment and unsupervised adaptation of acoustic models

Publication number: 20050071162

Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.
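
The per-unit alignment and the running per-state sums described above can be sketched as follows. The sketch has several assumptions. Each alignment unit is a left-to-right HMM whose states are modeled by hypothetical unit-variance Gaussians. Alignment is plain Viterbi within the unit's boundaries. The final adaptation step is a MAP-style mean update with a made-up prior weight `tau`, since the abstract does not say which adaptation formula is used. All function and class names are illustrative.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def log_likelihood(x: Sequence[float], mean: Sequence[float]) -> float:
    # Hypothetical state model: unit-variance Gaussian, constant term dropped.
    return -0.5 * sum((xi - mi) ** 2 for xi, mi in zip(x, mean))


def align_unit(frames: List[Vector], means: List[Vector]) -> List[int]:
    """Viterbi-align the frames of ONE alignment unit to its left-to-right
    states. Each state emits at least one frame, so len(frames) >= len(means)."""
    T, S, NEG = len(frames), len(means), float("-inf")
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_likelihood(frames[0], means[0])
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG
            if stay >= move:
                best, back[t][s] = stay, s
            else:
                best, back[t][s] = move, s - 1
            if best > NEG:
                score[t][s] = best + log_likelihood(frames[t], means[s])
    path, s = [0] * T, S - 1  # the unit must end in its last state
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    return path


class StateAccumulator:
    """Running per-state dimension sums and frame counts. Once a unit's frames
    are accumulated, the frames themselves are no longer needed."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.sums: Dict[str, Vector] = {}
        self.counts: Dict[str, int] = {}

    def add(self, state: str, frame: Vector) -> None:
        acc = self.sums.setdefault(state, [0.0] * self.dim)
        for i, value in enumerate(frame):
            acc[i] += value
        self.counts[state] = self.counts.get(state, 0) + 1

    def total_frames(self) -> int:
        return sum(self.counts.values())


def adapt_means(means: Dict[str, Vector], acc: StateAccumulator,
                tau: float = 10.0) -> Dict[str, Vector]:
    # Assumed MAP-style mean update: (tau * prior_mean + sum) / (tau + count).
    adapted = dict(means)
    for state, total in acc.sums.items():
        n = acc.counts[state]
        adapted[state] = [(tau * m + s) / (tau + n) for m, s in zip(means[state], total)]
    return adapted


# Usage with a hypothetical two-state unit and 1-D features.
means = {"u_0": [0.0], "u_1": [5.0]}
states = ["u_0", "u_1"]
frames = [[0.1], [-0.2], [4.8], [5.3]]
acc = StateAccumulator(dim=1)
path = align_unit(frames, [means[s] for s in states])  # [0, 0, 1, 1]
for frame, s in zip(frames, path):
    acc.add(states[s], frame)
del frames  # only the per-state sums are kept from here on
MIN_FRAMES = 4  # hypothetical "sufficient data" threshold
if acc.total_frames() >= MIN_FRAMES:
    means = adapt_means(means, acc)
```

Because Viterbi runs over one unit's frames at a time, the trellis stays small. This is the storage and time saving the abstract attributes to aligning within unit boundaries.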

Type: Application

Filed: November 12, 2004

Publication date: March 31, 2005

Applicant: Microsoft Corporation

Inventors: William Rockenbeck, Milind Mahajan, Fileno Alleva

Method and apparatus for generating and displaying N-best alternatives in a speech recognition system

Patent number: 6856956

Abstract: The present invention is directed to a method and apparatus for generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.

Type: Grant

Filed: March 12, 2001

Date of Patent: February 15, 2005

Assignee: Microsoft Corporation

Inventors: Chris Thrasher, Fileno A. Alleva

Method and apparatus for the recognition of spelled spoken words

Patent number: 6694296

Abstract: The speech recognizer includes a dictation language model providing a dictation model output indicative of a likely word sequence recognized based on an input utterance. A spelling language model provides a spelling model output indicative of a likely letter sequence recognized based on the input utterance. An acoustic model provides an acoustic model output indicative of a likely speech unit recognized based on the input utterances. A speech recognition component is configured to access the dictation language model, the spelling language model and the acoustic model. The speech recognition component weights the dictation model output and the spelling model output in calculating likely recognized speech based on the input utterance. The speech recognizer can also be configured to confine spelled speech to an active lexicon.
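
The weighting of the dictation and spelling model outputs, together with the active-lexicon constraint, can be illustrated with a small Python sketch. It assumes, for illustration only, the following:

- Both language models are unigram tables combined by linear interpolation with weight `w_dict`.
- Acoustic scores arrive precomputed per hypothesis.
- A hypothesis counts as spelled when every token is in the spelling model's letter vocabulary.

`best_hypothesis` and the other names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def is_spelled(tokens: List[str], p_spell: Dict[str, float]) -> bool:
    # A hypothesis counts as spelled if every token is in the spelling vocabulary.
    return all(tok in p_spell for tok in tokens)


def lm_logprob(tokens: List[str], p_dict: Dict[str, float],
               p_spell: Dict[str, float], w_dict: float) -> float:
    """Weighted (here: linearly interpolated, an assumption) unigram
    log-probability from the dictation and spelling models."""
    total = 0.0
    for tok in tokens:
        p = w_dict * p_dict.get(tok, 0.0) + (1.0 - w_dict) * p_spell.get(tok, 0.0)
        if p == 0.0:
            return -math.inf
        total += math.log(p)
    return total


def best_hypothesis(hyps: List[Tuple[List[str], float]], p_dict: Dict[str, float],
                    p_spell: Dict[str, float], w_dict: float,
                    lexicon: set) -> Optional[List[str]]:
    best, best_score = None, -math.inf
    for tokens, acoustic in hyps:
        # Confine spelled input to words in the active lexicon.
        if is_spelled(tokens, p_spell) and "".join(tokens).lower() not in lexicon:
            continue
        score = acoustic + lm_logprob(tokens, p_dict, p_spell, w_dict)
        if score > best_score:
            best, best_score = tokens, score
    return best


p_dict = {"bob": 0.01, "alice": 0.01}                  # hypothetical dictation unigrams
p_spell = {ch: 1 / 26 for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}
lexicon = {"bob", "alice"}                              # hypothetical active lexicon
hyps = [(["B", "O", "P"], -9.5), (["B", "O", "B"], -10.0), (["bob"], -12.0)]
print(best_hypothesis(hyps, p_dict, p_spell, 0.5, lexicon))  # ['bob']
```

Here "B O P" is rejected outright because "bop" is not in the active lexicon, even though it has the best acoustic score.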

Type: Grant

Filed: November 3, 2000

Date of Patent: February 17, 2004

Assignee: Microsoft Corporation

Inventors: Fileno A. Alleva, Mei-Yuh Hwang, Yun-Cheng Ju

Disambiguation language model

Publication number: 20020128831

Abstract: A language model for a language processing system such as a speech recognition system is formed as a function of associated characters, word phrases and context cues. A method and apparatus for generating the training corpus used to train the language model and a system or module using such a language model is disclosed.

Type: Application

Filed: January 31, 2001

Publication date: September 12, 2002

Inventors: Yun-Cheng Ju, Fileno A. Alleva

Method and system for frame alignment and unsupervised adaptation of acoustic models

Publication number: 20020116190

Abstract: An unsupervised adaptation method and apparatus are provided that reduce the storage and time requirements associated with adaptation. Under the invention, utterances are converted into feature vectors, which are decoded to produce a transcript and alignment unit boundaries for the utterance. Individual alignment units and the feature vectors associated with those alignment units are then provided to an alignment function, which aligns the feature vectors with the states of each alignment unit. Because the alignment is performed within alignment unit boundaries, fewer feature vectors are used and the time for alignment is reduced. After alignment, the feature vector dimensions aligned to a state are added to dimension sums that are kept for that state. After all the states in an utterance have had their sums updated, the speech signal and the alignment units are deleted. Once sufficient frames of data have been received to perform adaptive training, the acoustic model is adapted.

Type: Application

Filed: December 22, 2000

Publication date: August 22, 2002

Inventors: William H. Rockenbeck, Milind V. Mahajan, Fileno A. Alleva

Method for adding phonetic descriptions to a speech recognition lexicon

Publication number: 20020082831

Abstract: A method and computer-readable medium convert the text of a word and a user's pronunciation of the word into a phonetic description to be added to a speech recognition lexicon. Initially, two possible phonetic descriptions are generated. One phonetic description is formed from the text of the word. The other phonetic description is formed by decoding a speech signal representing the user's pronunciation of the word. Both phonetic descriptions are scored based on their correspondence to the user's pronunciation. The phonetic description with the highest score is then selected for entry in the speech recognition lexicon.
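
The selection step can be sketched as follows. Two assumptions are illustration only. A tiny letter-to-sound table stands in for real text-to-phoneme rules, and a caller-supplied `score` function stands in for scoring against the user's recorded pronunciation. In a real system that scorer would use the acoustic model and the speech signal. `phones_from_text`, `add_pronunciation`, and the toy scores are hypothetical.

```python
from typing import Callable, Dict, List

Phones = List[str]

# Hypothetical one-phone-per-letter table standing in for letter-to-sound rules.
LETTER_TO_SOUND: Dict[str, str] = {"c": "k", "k": "k", "a": "ae", "t": "t"}


def phones_from_text(word: str) -> Phones:
    return [LETTER_TO_SOUND[ch] for ch in word.lower() if ch in LETTER_TO_SOUND]


def add_pronunciation(
    lexicon: Dict[str, List[Phones]],
    word: str,
    decoded: Phones,
    score: Callable[[Phones], float],
) -> Phones:
    """Score the text-derived and the decoded phonetic descriptions against
    the user's pronunciation and add the higher-scoring one to the lexicon."""
    candidates = [phones_from_text(word), decoded]
    best = max(candidates, key=score)  # on a tie, max keeps the text-derived form
    lexicon.setdefault(word, []).append(best)
    return best


# Toy scorer: hypothetical per-phone log-likelihoods for one recording.
FIT = {"k": -1.0, "ae": -1.5, "t": -0.8, "eh": -3.0}


def toy_score(phones: Phones) -> float:
    return sum(FIT.get(p, -10.0) for p in phones)


lexicon: Dict[str, List[Phones]] = {}
print(add_pronunciation(lexicon, "cat", ["k", "eh", "t"], toy_score))  # ['k', 'ae', 't']
```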

Type: Application

Filed: December 26, 2000

Publication date: June 27, 2002

Inventors: Mei-Yuh Hwang, Fileno A. Alleva, Rebecca C. Weiss

Method and apparatus for generating and displaying N-best alternatives in a speech recognition system

Publication number: 20020052742

Abstract: The present invention is directed to a method and apparatus for generating alternatives to words indicative of recognized speech. A reference path of recognized words is generated, based upon input speech data. An operator selection input is received and is indicative of a selected portion of the recognized speech, for which alternatives are to be generated. Boundary conditions for alternatives to be generated are calculated based upon bounds of a reference subpath corresponding to the selected portion of the recognized speech. Alternate subpaths satisfying the boundary conditions are constructed from a hypothesis store which corresponds to the input speech data.

Type: Application

Filed: March 12, 2001

Publication date: May 2, 2002

Inventors: Chris Thrasher, Fileno A. Alleva

Speech recognition with mixtures of Bayesian networks

Patent number: 6336108

Abstract: The invention performs speech recognition using an array of mixtures of Bayesian networks. A mixture of Bayesian networks (MBN) consists of plural hypothesis-specific Bayesian networks (HSBNs) having possibly hidden and observed variables. A common external hidden variable is associated with the MBN, but is not included in any of the HSBNs. The number of HSBNs in the MBN corresponds to the number of states of the common external hidden variable, and each HSBN models the world under the hypothesis that the common external hidden variable is in a corresponding one of those states. In accordance with the invention, the MBNs encode the probabilities of observing the sets of acoustic observations given the utterance of a respective one of said parts of speech. Each of the HSBNs encodes the probabilities of observing the sets of acoustic observations given the utterance of a respective one of the parts of speech and given a hidden common variable being in a particular state.
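
The mixture structure reduces to summing over the states of the common external hidden variable C. For each speech unit, log p(x | unit) = log Σ_c p(c) p(x | unit, c). The sketch below is a deliberate simplification. Each HSBN is replaced by a single diagonal Gaussian, whereas a real HSBN is a full Bayesian network over hidden and observed variables. The class and function names and all parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance), diagonal


def diag_gauss_logpdf(x: Sequence[float], g: Gaussian) -> float:
    mean, var = g
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: List[float]) -> float:
    top = max(values)
    if top == -math.inf:
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


class MixtureOfNetworks:
    """Stand-in for the MBN of one speech unit: component c plays the role of
    the HSBN for state c of the common external hidden variable."""

    def __init__(self, priors: List[float], components: List[Gaussian]) -> None:
        assert len(priors) == len(components) and all(p > 0 for p in priors)
        self.priors, self.components = priors, components

    def log_likelihood(self, x: Sequence[float]) -> float:
        # log p(x | unit) = log sum_c p(c) * p(x | unit, c)
        return logsumexp(
            [math.log(p) + diag_gauss_logpdf(x, g) for p, g in zip(self.priors, self.components)]
        )


def recognize(x: Sequence[float], models: Dict[str, MixtureOfNetworks]) -> str:
    # Pick the speech unit whose mixture gives the observation the highest likelihood.
    return max(models, key=lambda unit: models[unit].log_likelihood(x))


models = {
    "aa": MixtureOfNetworks([0.5, 0.5], [([0.0], [1.0]), ([1.0], [1.0])]),
    "iy": MixtureOfNetworks([1.0], [([5.0], [1.0])]),
}
print(recognize([0.4], models), recognize([4.6], models))  # aa iy
```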

Type: Grant

Filed: December 23, 1998

Date of Patent: January 1, 2002

Assignee: Microsoft Corporation

Inventors: Bo Thiesson, Christopher A. Meek, David Maxwell Chickering, David Earl Heckerman, Fileno A. Alleva, Mei-Yuh Hwang

Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process

Patent number: 6263308

Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.
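
The step that decides which words are "correctly recognized" can be sketched with Python's standard `difflib.SequenceMatcher`. The abstract does not name an alignment algorithm, so `difflib` is a swapped-in choice. The sketch keeps matched words with their time stamps as retraining material and reports every mismatch as a discrepancy. The retraining itself is not shown, and `split_for_retraining` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple

Word = Tuple[str, float, float]  # (recognized word, start_sec, end_sec)


def split_for_retraining(recognized: List[Word], text: str):
    """Return (correct, discrepancies): correctly recognized words with their
    time stamps, and the spans where the audio and text versions disagree."""
    rec = [w.lower() for w, _, _ in recognized]
    ref = re.findall(r"[\w']+", text.lower())
    correct: List[Word] = []
    discrepancies: List[Tuple[str, List[str], List[str]]] = []
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching words and their audio segments become training data.
            correct.extend(recognized[i1:i2])
        else:
            # replace/delete/insert spans flag recognition errors or genuine
            # differences between the audio and text versions of the work.
            discrepancies.append((tag, rec[i1:i2], ref[j1:j2]))
    return correct, discrepancies


recognized = [("the", 0.0, 0.2), ("cat", 0.2, 0.5), ("sat", 0.5, 0.8),
              ("on", 0.8, 0.9), ("them", 0.9, 1.1), ("at", 1.1, 1.3)]
correct, discrepancies = split_for_retraining(recognized, "The cat sat on the mat.")
print([w for w, _, _ in correct])  # ['the', 'cat', 'sat', 'on']
print(discrepancies)               # [('replace', ['them', 'at'], ['the', 'mat'])]
```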

Type: Grant

Filed: March 20, 2000

Date of Patent: July 17, 2001

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli

Methods and apparatus for automatically synchronizing electronic audio files with electronic text files

Patent number: 6260011

Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. A statistical language model is generated from the text data. A speech recognition operation is then performed on the audio data using the generated language model and a speaker independent acoustic model. Silence is modeled as a word which can be recognized. The speech recognition operation produces a time indexed set of recognized words some of which may be silence. The recognized words are globally aligned with the words in the text data. Recognized periods of silence, which correspond to expected periods of silence, and are adjoined by one or more correctly recognized words are identified as points where the text and audio files should be synchronized, e.g., by the insertion of bi-directional pointers.
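
The rule for choosing synchronization points can be sketched in Python. The sketch has three assumptions:

- Silence is recognized as a hypothetical `<sil>` token.
- Global alignment uses the standard library's `difflib.SequenceMatcher`, a swapped-in choice.
- An "expected period of silence" means the gap after a text word that carries attached punctuation.

`sync_points` returns hypothetical (text word index, audio time) pairs that could back bi-directional pointers.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

SIL = "<sil>"  # hypothetical token the recognizer emits for silence
Token = Tuple[str, float, float]  # (word or SIL, start_sec, end_sec)


def sync_points(recognized: List[Token], text: str) -> List[Tuple[int, float]]:
    """Return (index of text word before the pause, audio time) pairs."""
    # Expected silences: assumed to follow words with attached punctuation.
    pairs = re.findall(r"([\w']+)([.,;:!?]?)", text)
    ref = [word.lower() for word, _ in pairs]
    pause_after = {j for j, (_, punct) in enumerate(pairs) if punct}

    # Globally align the non-silence recognized words with the text words.
    speech = [k for k, (tok, _, _) in enumerate(recognized) if tok != SIL]
    hyp = [recognized[k][0].lower() for k in speech]
    to_text: Dict[int, int] = {}  # recognized index -> text index, correct words only
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for tag, i1, i2, j1, _ in matcher.get_opcodes():
        if tag == "equal":
            for offset in range(i2 - i1):
                to_text[speech[i1 + offset]] = j1 + offset

    points = []
    for k, (tok, _, end) in enumerate(recognized):
        if tok != SIL:
            continue
        before, after = to_text.get(k - 1), to_text.get(k + 1)
        # Keep a silence only if an adjoining word was recognized correctly
        # and the text expects a pause at that position.
        if before is not None and before in pause_after:
            points.append((before, end))
        elif after is not None and after - 1 in pause_after:
            points.append((after - 1, end))
    return points


recognized = [("<sil>", 0.0, 0.3), ("hello", 0.3, 0.7), ("there", 0.7, 1.0),
              ("<sil>", 1.0, 1.6), ("how", 1.6, 1.8), ("are", 1.8, 1.9),
              ("<sil>", 1.9, 2.0), ("you", 2.0, 2.3)]
print(sync_points(recognized, "Hello there. How are you?"))  # [(1, 1.6)]
```

The short pause between "are" and "you" is recognized correctly but is not an expected silence in the text, so it does not become a synchronization point.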

Type: Grant

Filed: March 20, 2000

Date of Patent: July 10, 2001

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli

Speech recognition system for recognizing continuous and isolated speech

Patent number: 6076056

Abstract: Speech recognition is performed by receiving isolated speech training data indicative of a plurality of discretely spoken training words, and receiving continuous speech training data indicative of a plurality of continuously spoken training words. A plurality of speech unit models is trained based on the isolated speech training data and the continuous speech training data. Speech is recognized based on the speech unit models trained.

Type: Grant

Filed: September 19, 1997

Date of Patent: June 13, 2000

Assignee: Microsoft Corporation

Inventors: Xuedong D. Huang, Fileno A. Alleva, Li Jiang, Mei-Yuh Hwang

Text normalization using a context-free grammar

Patent number: 5970449

Abstract: A text normalizer normalizes text that is output from a speech recognizer. The normalization of the text produces text that is less awkward and more familiar to recipients of the text. The text may be normalized to include audio content, video content, or combinations of audio and video contents. The text may also be normalized to produce a hypertext document. The text normalization is performed using a context-free grammar. The context-free grammar includes rules that specify how text is to be normalized. The context-free grammar may be organized as a tree that is used to parse text and facilitate normalization. The context-free grammar is extensible and may be readily changed.
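
Grammar-driven normalization can be sketched with a hand-written recursive-descent parser over a tiny, hypothetical grammar. In this grammar, `number -> TENS UNITS | TENS | TEENS | UNITS` and `money -> number "dollars"`, and each rule also says how its match is rewritten. This illustrates the idea of rules that specify how text is normalized. It is not the patent's grammar format or engine. The word tables only cover 1 to 99 and assume lowercase recognizer output.

```python
from typing import List, Optional, Tuple

# Terminal symbols of a tiny, hypothetical normalization grammar.
UNITS = {w: i for i, w in enumerate("one two three four five six seven eight nine".split(), start=1)}
TEENS = {w: i for i, w in enumerate(
    "ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split(), start=10)}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

Parse = Optional[Tuple[str, int]]  # (normalized text, next token position)


def parse_number(tokens: List[str], i: int) -> Parse:
    # number -> TENS UNITS | TENS | TEENS | UNITS   (rewritten as digits)
    if i >= len(tokens):
        return None
    tok = tokens[i]
    if tok in TENS:
        if i + 1 < len(tokens) and tokens[i + 1] in UNITS:
            return str(TENS[tok] + UNITS[tokens[i + 1]]), i + 2
        return str(TENS[tok]), i + 1
    if tok in TEENS:
        return str(TEENS[tok]), i + 1
    if tok in UNITS:
        return str(UNITS[tok]), i + 1
    return None


def parse_money(tokens: List[str], i: int) -> Parse:
    # money -> number "dollars"   (rewritten as "$<digits>")
    number = parse_number(tokens, i)
    if number and number[1] < len(tokens) and tokens[number[1]] == "dollars":
        return "$" + number[0], number[1] + 1
    return None


def normalize(text: str) -> str:
    tokens, out, i = text.split(), [], 0
    while i < len(tokens):
        # Try the more specific rule first, then fall back to copying the token.
        parsed = parse_money(tokens, i) or parse_number(tokens, i)
        if parsed:
            out.append(parsed[0])
            i = parsed[1]
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(normalize("send twenty five dollars to nine people"))  # send $25 to 9 people
```

Extending this grammar means adding a rule function and trying it in `normalize`. A table-driven grammar, as the abstract suggests, would make that extension a data change rather than a code change.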

Type: Grant

Filed: April 3, 1997

Date of Patent: October 19, 1999

Assignee: Microsoft Corporation

Inventors: Fileno A. Alleva, Michael J. Rozak, Larry J. Israel

Method and system for editing phrases during continuous speech recognition

Patent number: 5884258

Abstract: A method and system for editing words that have been misrecognized. The system allows a speaker to specify a number of alternative words to be displayed in a correction window by resizing the correction window. The system also displays the words in the correction window in alphabetical order. A preferred system eliminates the possibility, when a misrecognized word is respoken, that the respoken utterance will be again recognized as the same misrecognized word. The system, when operating with a word processor, allows the speaker to specify the amount of speech that is buffered before transferring to the word processor.
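
The correction-window behavior described above can be sketched in a few lines. The function works as follows:

- It excludes the word being corrected from the alternatives for the respoken utterance.
- It keeps as many of the best-scoring alternatives as the resized window has rows.
- It shows those alternatives alphabetically.

The N-best input format and the name `correction_alternatives` are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def correction_alternatives(
    nbest: List[Tuple[str, float]], window_rows: int, misrecognized: Optional[str] = None
) -> List[str]:
    """Words to list in the correction window.

    nbest: (word, score) pairs for the (re)spoken utterance; window_rows: how
    many rows the user made visible by resizing the window; misrecognized: the
    word being corrected, which must not be offered again."""
    ranked = sorted(nbest, key=lambda ws: ws[1], reverse=True)
    candidates: List[str] = []
    for word, _ in ranked:
        if word != misrecognized and word not in candidates:
            candidates.append(word)
    # Keep the best-scoring words that fit, then display them alphabetically.
    return sorted(candidates[:window_rows], key=str.lower)


nbest = [("wreck", -3.0), ("recognize", -3.5), ("reckon", -4.0), ("record", -6.0)]
print(correction_alternatives(nbest, 2, misrecognized="wreck"))  # ['reckon', 'recognize']
```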

Type: Grant

Filed: October 31, 1996

Date of Patent: March 16, 1999

Assignee: Microsoft Corporation

Inventors: Michael J. Rozak, Fileno A. Alleva

Senone tree representation and evaluation

Patent number: 5794197

Abstract: A speech recognition method provides improved modeling in recognition accuracy using hidden Markov models. During training, the method creates a senone tree for each state of each phoneme encountered in a data set of training words. All output distributions received for a selected state of a selected phoneme in the set of training words are clustered together in a root node of a senone tree. Each node of the tree beginning with the root node is divided into two nodes by asking linguistic questions regarding the phonemes immediately to the left and right of a central phoneme of a triphone. At a predetermined point, the tree creation stops, resulting in leaves representing clustered output distributions known as senones. The senone trees allow all possible triphones to be mapped into a sequence of senones simply by traversing the senone trees associated with the central phoneme of the triphone.
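
The lookup side of senone trees, mapping any triphone to a sequence of senones by walking one tree per state of the central phoneme, can be sketched as below. The phone classes, questions, tree shapes, and senone IDs are all hypothetical. The training step that grows the trees by clustering output distributions is not shown.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical phone classes used by the linguistic questions.
VOWELS = frozenset({"aa", "ae", "ah", "eh", "iy", "uw"})
NASALS = frozenset({"m", "n", "ng"})


@dataclass
class Node:
    senone: Optional[int] = None              # set only on leaves
    side: str = "left"                        # which context phone the question inspects
    phone_class: FrozenSet[str] = frozenset()
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def find_senone(tree: Node, left: str, right: str) -> int:
    # Answer "is the left/right neighbor in this class?" until reaching a leaf.
    node = tree
    while node.senone is None:
        context = left if node.side == "left" else right
        node = node.yes if context in node.phone_class else node.no
    return node.senone


def triphone_to_senones(trees: Dict[str, List[Node]], left: str, center: str,
                        right: str) -> List[int]:
    # One tree per HMM state of the central phoneme.
    return [find_senone(tree, left, right) for tree in trees[center]]


trees = {
    "t": [
        Node(side="left", phone_class=VOWELS, yes=Node(senone=11), no=Node(senone=12)),
        Node(senone=20),
        Node(side="right", phone_class=NASALS, yes=Node(senone=31),
             no=Node(side="right", phone_class=VOWELS, yes=Node(senone=32), no=Node(senone=33))),
    ]
}
print(triphone_to_senones(trees, "ae", "t", "n"))  # [11, 20, 31]
print(triphone_to_senones(trees, "s", "t", "iy"))  # [12, 20, 32]
```

Because every question has a yes and a no branch, even a triphone never seen in training reaches some leaf.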

Type: Grant

Filed: May 2, 1997

Date of Patent: August 11, 1998

Assignee: Microsoft Corporation

Inventors: Fileno A. Alleva, Xuedong Huang, Mei-Yuh Hwang

Method and system for encoding pronunciation prefix trees

Patent number: 5758024

Abstract: A computer system for linearly encoding a pronunciation prefix tree. The pronunciation prefix tree has nodes such that each non-root and non-leaf node represents a phoneme and wherein each leaf node represents a word formed by the phonemes represented by the non-leaf nodes in a path from the root node to the leaf node. Each leaf node has a probability associated with the word of the leaf node. The computer system creates a tree node dictionary containing an indication of the phonemes that compose each word. The computer system then orders the child nodes of each non-leaf node based on the highest probability of descendent leaf nodes of the child node. Then, for each non-leaf node, the computer system sets the probability of the non-leaf node to a probability based on the probability of its child nodes, and for each node, sets a factor of the node to the probability of the node divided by the probability of the parent node of the node.
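
The probability and factor assignment can be sketched in Python. The abstract only says a non-leaf node's probability is "based on" its children. This sketch assumes the maximum, which matches ordering children by their highest descendant leaf. `linearize` shows one possible pre-order linear encoding, not the patented layout. The lexicon and unigram values are hypothetical, and all probabilities are assumed positive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TreeNode:
    phoneme: Optional[str] = None  # set on interior (non-root, non-leaf) nodes
    word: Optional[str] = None     # set on leaf nodes
    prob: float = 0.0
    factor: float = 1.0
    children: List["TreeNode"] = field(default_factory=list)


def build_tree(lexicon: Dict[str, List[str]], unigram: Dict[str, float]) -> TreeNode:
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            child = next((c for c in node.children if c.phoneme == ph), None)
            if child is None:
                child = TreeNode(phoneme=ph)
                node.children.append(child)
            node = child
        node.children.append(TreeNode(word=word, prob=unigram[word]))
    return root


def assign_probs(node: TreeNode) -> float:
    """Order children by their best descendant leaf and give each non-leaf
    node the maximum child probability (the max is an assumption)."""
    if node.word is not None:
        return node.prob
    for child in node.children:
        assign_probs(child)
    node.children.sort(key=lambda c: c.prob, reverse=True)
    node.prob = node.children[0].prob
    return node.prob


def set_factors(node: TreeNode) -> None:
    # factor = P(node) / P(parent); multiplying factors along a path from the
    # root recovers a word's probability relative to the root's probability.
    for child in node.children:
        child.factor = child.prob / node.prob
        set_factors(child)


def linearize(node: TreeNode, depth: int = 0) -> List[Tuple[int, str, float]]:
    # Pre-order (depth, label, factor) records: one possible linear encoding.
    out: List[Tuple[int, str, float]] = []
    for child in node.children:
        out.append((depth, child.phoneme or child.word, child.factor))
        out.extend(linearize(child, depth + 1))
    return out


lexicon = {"cat": ["k", "ae", "t"], "can": ["k", "ae", "n"], "dog": ["d", "ao", "g"]}
unigram = {"cat": 0.2, "can": 0.5, "dog": 0.3}
root = build_tree(lexicon, unigram)
assign_probs(root)
set_factors(root)
for record in linearize(root):
    print(record)
```

With the max rule, the factor on the most probable branch is 1.0 at every level. Only branches toward less likely words carry a factor below 1, which applies the language-model penalty as early in the tree as possible.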

Type: Grant

Filed: June 25, 1996

Date of Patent: May 26, 1998

Assignee: Microsoft Corporation

Inventor: Fileno A. Alleva

System and method for speech recognition using dynamically adjusted confidence measure

Patent number: 5710866

Abstract: A computer-implemented method of recognizing an input speech utterance compares the input speech utterance to a plurality of hidden Markov models to obtain a constrained acoustic score that reflects the probability that the hidden Markov model matches the input speech utterance. The method computes a confidence measure for each hidden Markov model that reflects the probability of the constrained acoustic score being correct. The computed confidence measure is then used to adjust the constrained acoustic score. Preferably, the confidence measure is computed based on a difference between the constrained acoustic score and an unconstrained acoustic score that is computed independently of any language context. In addition, a new confidence measure preferably is computed for each input speech frame from the input speech utterance so that the constrained acoustic score is adjusted for each input speech frame.
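
The per-frame adjustment can be sketched as follows. The abstract gives only the ingredients: a constrained score, an unconstrained score computed without language context, and a confidence based on their difference. The mapping used here is an assumption. Confidence is exp(-(u - c)), and the constrained score is adjusted by `weight` times the log of that confidence. `frame_confidence`, `adjusted_score`, and `weight` are hypothetical names.

```python
import math
from typing import Sequence


def frame_confidence(constrained: float, unconstrained: float) -> float:
    """Assumed confidence in (0, 1]: exp(-(u - c)). The unconstrained score u
    ignores language context, so it is normally >= the constrained score c."""
    return math.exp(-(unconstrained - constrained))


def adjusted_score(
    constrained: Sequence[float], unconstrained: Sequence[float], weight: float = 0.5
) -> float:
    """Adjust the constrained score frame by frame using that frame's confidence."""
    total = 0.0
    for c, u in zip(constrained, unconstrained):
        # Equals c + weight * log(frame_confidence(c, u)), computed without
        # exp/log so that very low confidences cannot underflow to log(0).
        total += c - weight * (u - c)
    return total


print(adjusted_score([-2.0, -3.0], [-1.5, -3.0]))  # -5.25
```

Frames where the model's constrained match is far from the best context-free match get low confidence and are penalized. Frames where the two scores agree are left unchanged.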

Type: Grant

Filed: May 26, 1995

Date of Patent: January 20, 1998

Assignee: Microsoft Corporation

Inventors: Fileno A. Alleva, Douglas H. Beeferman, Xuedong Huang
