Patents by Inventor Ciprian Chelba

Ciprian Chelba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for indexing speech

Publication number: 20060265222

Abstract: A method of indexing a speech segment includes identifying at least two alternative word sequences based on the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. The information indicates the position of the word in at least one of the alternative sequences.

Type: Application

Filed: May 20, 2005

Publication date: November 23, 2006

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero
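
The indexing scheme summarized in the abstract of publication 20060265222 can be illustrated with a small inverted index. This is only a sketch under stated assumptions, not the method claimed in the application: the function names `build_index` and `positions`, the segment identifier `seg0`, and the two example word sequences are hypothetical. Each index entry records which alternative sequence a word appeared in and at what position.

```python
from collections import defaultdict


def build_index(segments):
    """Build an inverted index from alternative word sequences.

    segments maps a segment id to a list of alternative word sequences
    (hypothetical recognizer hypotheses for that speech segment).
    """
    index = defaultdict(list)
    for seg_id, alternatives in segments.items():
        for alt_id, words in enumerate(alternatives):
            for pos, word in enumerate(words):
                # Record the word's position within this alternative sequence.
                index[word].append((seg_id, alt_id, pos))
    return index


def positions(index, word):
    """Return all (segment, alternative, position) entries for a word."""
    return index.get(word, [])


# Two alternative word sequences for the same (made-up) speech segment.
segments = {"seg0": [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]}
index = build_index(segments)
```

Keeping the alternative id alongside the position lets a query match words that appear only in a lower-ranked hypothesis.
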
Method and apparatus for predicting word error rates from text

Patent number: 7117153

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

Type: Grant

Filed: February 13, 2003

Date of Patent: October 3, 2006

Assignee: Microsoft Corporation

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Method and apparatus for predicting word error rates from text

Patent number: 7103544

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

Type: Grant

Filed: June 6, 2005

Date of Patent: September 5, 2006

Assignee: Microsoft Corporation

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Generic spelling mnemonics

Publication number: 20060111907

Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary, and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.

Type: Application

Filed: November 24, 2004

Publication date: May 25, 2006

Applicant: Microsoft Corporation

Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
Conditional maximum likelihood estimation of naive bayes probability models

Publication number: 20060074630

Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.

Type: Application

Filed: September 15, 2004

Publication date: April 6, 2006

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero
Method and apparatus for capitalizing text using maximum entropy

Publication number: 20060020448

Abstract: A method and apparatus are provided for selecting a form of capitalization for a text by determining a probability of a capitalization form for a word using a weighted sum of features. The features are based on the capitalization form and a context for the word.

Type: Application

Filed: October 29, 2004

Publication date: January 26, 2006

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero
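
The abstract of publication 20060020448 describes scoring a capitalization form with a weighted sum of features drawn from the form and the word's context. The sketch below is one way to read that, assuming the usual maximum entropy normalization (exponentiate each form's score and divide by the total). The form inventory, the feature templates, and the hand-set weights are all hypothetical and are not taken from the application.

```python
import math

# Hypothetical inventory of capitalization forms.
FORMS = ["lower", "first_cap", "all_caps"]


def features(form, word, prev_word):
    """Feature names that pair a capitalization form with the word and its context."""
    return [
        f"form={form}",
        f"form={form}|word={word.lower()}",
        f"form={form}|prev={prev_word.lower()}",
    ]


def form_probabilities(word, prev_word, weights):
    """Probability of each form from a weighted sum of active features."""
    scores = {
        form: sum(weights.get(name, 0.0) for name in features(form, word, prev_word))
        for form in FORMS
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {form: math.exp(score - top) for form, score in scores.items()}
    total = sum(exps.values())
    return {form: value / total for form, value in exps.items()}


# Hand-set illustrative weights; a real model would estimate them from data.
weights = {
    "form=lower": 1.0,
    "form=first_cap|word=microsoft": 3.0,
}
probs = form_probabilities("microsoft", "at", weights)
best = max(probs, key=probs.get)
```

With these weights the first-letter-capitalized form scores highest for "microsoft", because its word-specific feature outweighs the general preference for lowercase.
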
Adaptation of exponential models

Publication number: 20060018541

Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is more specific to an adaptation data set of interest. The adaptation data set is generally of much smaller size than the background data set. A second set of model parameters is then determined for the adapted probability model based on the set of adaptation data and the prior model.

Type: Application

Filed: October 29, 2004

Publication date: January 26, 2006

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero
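
The two-stage procedure in the abstract of publication 20060018541 can be sketched with a very small exponential model. The example assumes a binary logistic model and a Gaussian prior centered on the background parameters; the abstract itself does not specify the form of the prior, so that choice, the function name `train`, the variances, and the toy data sets are all assumptions made for illustration.

```python
import math


def train(data, prior_mean, sigma2, steps=500, lr=0.1):
    """MAP estimate of logistic-model weights by gradient ascent.

    data is a list of (feature_vector, label) pairs with label 0 or 1.
    Each weight has an assumed Gaussian prior N(prior_mean[i], sigma2).
    """
    w = list(prior_mean)
    for _ in range(steps):
        # Gradient of the log prior pulls weights back toward prior_mean.
        grad = [-(wi - mi) / sigma2 for wi, mi in zip(w, prior_mean)]
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i, xi in enumerate(x):
                grad[i] += (y - p) * xi  # gradient of the log-likelihood
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w


# Stage 1: background model from the larger data set, prior centered at zero.
background_data = [([1.0, 1.0], 1), ([1.0, -1.0], 0), ([1.0, 2.0], 1), ([1.0, -2.0], 0)]
background = train(background_data, [0.0, 0.0], sigma2=10.0)

# Stage 2: adapted model from a much smaller set, prior centered on stage 1.
adaptation_data = [([1.0, 1.0], 0), ([1.0, 1.0], 0)]
adapted = train(adaptation_data, background, sigma2=1.0)
```

A smaller prior variance keeps the adapted weights closer to the background model, which matters when the adaptation set is tiny.
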
Language model adaptation using semantic supervision

Publication number: 20050228641

Abstract: A method and apparatus are provided for adapting a language model. The method and apparatus provide supervised class-based adaptation of the language model utilizing in-domain semantic information.

Type: Application

Filed: March 31, 2004

Publication date: October 13, 2005

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero, Yik-Cheung Tam
Method and apparatus for predicting word error rates from text

Publication number: 20050228670

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

Type: Application

Filed: June 6, 2005

Publication date: October 13, 2005

Applicant: Microsoft Corporation

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela Gunawardana, Ciprian Chelba
Representation of a deleted interpolation N-gram language model in ARPA standard format

Publication number: 20050216265

Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.

Type: Application

Filed: March 26, 2004

Publication date: September 29, 2005

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
Discriminative training of language models for text and speech classification

Publication number: 20040249628

Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.

Type: Application

Filed: June 3, 2003

Publication date: December 9, 2004

Applicant: Microsoft Corporation

Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
System with composite statistical and rules-based grammar model for speech recognition and natural language understanding

Publication number: 20040220809

Abstract: The present invention uses a composite statistical model and rules-based grammar language model to perform both the speech recognition task and the natural language understanding task.

Type: Application

Filed: November 20, 2003

Publication date: November 4, 2004

Applicant: Microsoft Corporation

Inventors: Ye-Yi Wang, Alejandro Acero, Ciprian Chelba
Method and apparatus for predicting word error rates from text

Publication number: 20040162730

Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.

Type: Application

Filed: February 13, 2003

Publication date: August 19, 2004

Applicant: Microsoft Corporation

Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
Statistical classifiers for spoken language understanding and command/control scenarios

Publication number: 20040148170

Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification. In one application, a statistical classifier is used in order to ascertain if an input is a search query or a natural-language input.

Type: Application

Filed: May 30, 2003

Publication date: July 29, 2004

Inventors: Alejandro Acero, Ciprian Chelba, YeYi Wang, Leon Wong, Ravi Shahani, Michael Calcagno, Domenic Cipollone, Curtis Huttenhower
System for using statistical classifiers for spoken language understanding

Publication number: 20040148154

Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.

Type: Application

Filed: January 23, 2003

Publication date: July 29, 2004

Inventors: Alejandro Acero, Ciprian Chelba, YeYi Wang, Leon Wong
Applying a structured language model to information extraction

Publication number: 20030216905

Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.

Type: Application

Filed: May 20, 2002

Publication date: November 20, 2003

Inventors: Ciprian Chelba, Milind Mahajan
