Patents by Inventor Ciprian Chelba

Ciprian Chelba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20060265222
    Abstract: A method of indexing a speech segment includes identifying at least two alternative word sequences based on the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. The information indicates the position of the word in at least one of the alternative sequences.
    Type: Application
    Filed: May 20, 2005
    Publication date: November 23, 2006
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero
  • Patent number: 7117153
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: October 3, 2006
    Assignee: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
  • Patent number: 7103544
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: September 5, 2006
    Assignee: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
  • Publication number: 20060111907
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters; constructing a new Language Model (LM) token for each of the at least one character; extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation; creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary; and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Application
    Filed: November 24, 2004
    Publication date: May 25, 2006
    Applicant: Microsoft Corporation
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Publication number: 20060074630
    Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.
    Type: Application
    Filed: September 15, 2004
    Publication date: April 6, 2006
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero
  • Publication number: 20060020448
    Abstract: A method and apparatus are provided for selecting a form of capitalization for a text by determining a probability of a capitalization form for a word using a weighted sum of features. The features are based on the capitalization form and a context for the word.
    Type: Application
    Filed: October 29, 2004
    Publication date: January 26, 2006
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero
  • Publication number: 20060018541
    Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is adapted and more specific to an adaptation data set of interest. The adaptation data set is generally of much smaller size than the background data set. A second set of model parameters is then determined for the adapted probability model based on the set of adaptation data and the prior model.
    Type: Application
    Filed: October 29, 2004
    Publication date: January 26, 2006
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero
  • Publication number: 20050228641
    Abstract: A method and apparatus are provided for adapting a language model. The method and apparatus provide supervised class-based adaptation of the language model utilizing in-domain semantic information.
    Type: Application
    Filed: March 31, 2004
    Publication date: October 13, 2005
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero, Yik-Cheung Tam
  • Publication number: 20050228670
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Application
    Filed: June 6, 2005
    Publication date: October 13, 2005
    Applicant: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela Gunawardana, Ciprian Chelba
  • Publication number: 20050216265
    Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.
    Type: Application
    Filed: March 26, 2004
    Publication date: September 29, 2005
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
  • Publication number: 20040249628
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Application
    Filed: June 3, 2003
    Publication date: December 9, 2004
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
  • Publication number: 20040220809
    Abstract: The present invention uses a composite statistical model and rules-based grammar language model to perform both the speech recognition task and the natural language understanding task.
    Type: Application
    Filed: November 20, 2003
    Publication date: November 4, 2004
    Applicant: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, Ciprian Chelba
  • Publication number: 20040162730
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Application
    Filed: February 13, 2003
    Publication date: August 19, 2004
    Applicant: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
  • Publication number: 20040148170
    Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification. In one application, a statistical classifier is used in order to ascertain whether an input is a search query or a natural-language input.
    Type: Application
    Filed: May 30, 2003
    Publication date: July 29, 2004
    Inventors: Alejandro Acero, Ciprian Chelba, YeYi Wang, Leon Wong, Ravi Shahani, Michael Calcagno, Domenic Cipollone, Curtis Huttenhower
  • Publication number: 20040148154
    Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.
    Type: Application
    Filed: January 23, 2003
    Publication date: July 29, 2004
    Inventors: Alejandro Acero, Ciprian Chelba, YeYi Wang, Leon Wong
  • Publication number: 20030216905
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Application
    Filed: May 20, 2002
    Publication date: November 20, 2003
    Inventors: Ciprian Chelba, Milind Mahajan
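
As a rough illustration of the speech-indexing abstract listed above (publication number 20060265222), the following minimal sketch builds an index in which each word's entry records its position in the alternative word sequences where it occurs. This is not the patented method. The function name `build_position_index`, the `(sequence_id, position)` entry layout, and the sample hypotheses are all assumptions made for illustration.

```python
from collections import defaultdict


def build_position_index(alternatives):
    """Map each word to the (sequence_id, position) pairs where it occurs.

    `alternatives` is a list of alternative word sequences hypothesized
    for one speech segment (hypothetical input format).
    """
    index = defaultdict(list)
    for seq_id, words in enumerate(alternatives):
        for pos, word in enumerate(words):
            # The entry for a word records where it sits in each alternative.
            index[word].append((seq_id, pos))
    return dict(index)


# Two made-up recognizer hypotheses for the same speech segment.
alternatives = [
    ["recognize", "speech"],
    ["wreck", "a", "nice", "beach"],
]
index = build_position_index(alternatives)
```

Looking up a word in `index` then returns every alternative sequence and position in which the recognizer hypothesized that word, so a query can match words that appear only in a lower-ranked alternative.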