Patents by Inventor Ciprian Chelba

Ciprian Chelba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230025739
    Abstract: Aspects of the technology employ a machine translation quality prediction (MTQP) model to refine datasets that are used in training machine translation systems. This includes receiving, by a machine translation quality prediction model, a sentence pair of a source sentence and a translated output (802). Feature extraction is then performed on the sentence pair using a set of two or more feature extractors, each of which generates a corresponding feature vector (804). The corresponding feature vectors from the set of feature extractors are concatenated together (806), and the concatenated feature vectors are applied to a feedforward neural network, which generates a machine translation quality prediction score for the translated output (808).
    Type: Application
    Filed: June 29, 2022
    Publication date: January 26, 2023
    Inventors: Junpei Zhou, Yuezhang Li, Ciprian Chelba, Fangxiaoyu Feng, Bowen Liang, Pidong Wang
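The pipeline in the abstract — per-extractor feature vectors, concatenation, then a feedforward scorer — can be sketched as follows. The two toy extractors, the network shape, and the untrained random weights are illustrative assumptions, not the patented model:

```python
import math
import random

random.seed(0)

# Two toy feature extractors standing in for step 804; a real MTQP
# system would use much richer extractors. All names are assumptions.
def length_ratio(source, translation):
    return [len(translation.split()) / max(len(source.split()), 1)]

def token_overlap(source, translation):
    src, tgt = set(source.split()), set(translation.split())
    return [len(src & tgt) / max(len(src | tgt), 1)]

EXTRACTORS = [length_ratio, token_overlap]
HIDDEN = 4

# Untrained single-hidden-layer feedforward network (step 808).
W1 = [[random.uniform(-1, 1) for _ in range(len(EXTRACTORS))]
      for _ in range(HIDDEN)]
W2 = [random.uniform(-1, 1) for _ in range(HIDDEN)]

def mtqp_score(source, translation):
    # Steps 804-806: run each extractor and concatenate the vectors.
    feats = [x for f in EXTRACTORS for x in f(source, translation)]
    # Step 808: feedforward pass producing a scalar quality score.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, feats)))
              for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))
```

In a real system the score would be trained against human quality judgments; here the weights are random, so only the data flow of steps 802-808 is meaningful.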
  • Patent number: 8725509
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: May 13, 2014
    Assignee: Google Inc.
    Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat
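The core data structure of this abstract — a trie over an n-gram collection, represented as flat integer arrays — can be sketched as below. The array layout is one plausible choice; the patent additionally block-encodes and compresses these arrays, which this sketch omits:

```python
from collections import defaultdict

def build_trie(ngrams):
    """ngrams: iterable of tuples of word ids. Returns three parallel
    integer arrays: for node i, its children are the (word, child-node)
    pairs at positions offsets[i]..offsets[i+1]."""
    children = defaultdict(dict)  # node -> {word_id: child_node}
    next_node = 1                 # node 0 is the root
    for ngram in ngrams:
        node = 0
        for w in ngram:
            if w not in children[node]:
                children[node][w] = next_node
                next_node += 1
            node = children[node][w]
    words, kids, offsets = [], [], [0]
    for node in range(next_node):
        for w in sorted(children[node]):  # sorted runs allow binary search
            words.append(w)
            kids.append(children[node][w])
        offsets.append(len(words))
    return words, kids, offsets

def lookup(arrays, ngram):
    """True iff the n-gram is present in the trie."""
    words, kids, offsets = arrays
    node = 0
    for w in ngram:
        lo, hi = offsets[node], offsets[node + 1]
        try:
            node = kids[lo + words[lo:hi].index(w)]
        except ValueError:
            return False
    return True
```

Because each node's children occupy one contiguous sorted run, the arrays compress well and support the block encoding the abstract describes.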
  • Patent number: 8706491
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Grant
    Filed: August 24, 2010
    Date of Patent: April 22, 2014
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan
  • Patent number: 8335683
    Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.
    Type: Grant
    Filed: January 23, 2003
    Date of Patent: December 18, 2012
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Ciprian Chelba, Ye-Yi Wang, Leon Wong, Brendan Frey
  • Patent number: 8306818
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Grant
    Filed: April 15, 2008
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
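The training criterion described above — maximize the conditional likelihood of the class given the word string, jointly over all classes and all training sentences — can be written as (notation is ours, not the patent's):

```latex
\hat{\theta} \;=\; \arg\max_{\theta} \sum_{i=1}^{N} \log P_{\theta}(c_i \mid w_i)
\;=\; \arg\max_{\theta} \sum_{i=1}^{N} \log
\frac{P_{\theta}(w_i \mid c_i)\, P(c_i)}{\sum_{c} P_{\theta}(w_i \mid c)\, P(c)}
```

where \(P_{\theta}(w \mid c)\) is the class-conditional n-gram language model. The objective is a ratio of polynomials in the model parameters, which is what makes the rational function growth transform applicable as the update rule.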
  • Patent number: 8175878
    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: May 8, 2012
    Assignee: Google Inc.
    Inventors: Ciprian Chelba, Thorsten Brants
  • Patent number: 7877258
    Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model, including receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: January 25, 2011
    Assignee: Google Inc.
    Inventors: Ciprian Chelba, Thorsten Brants
  • Publication number: 20100318348
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Application
    Filed: August 24, 2010
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Ciprian Chelba, Milind Mahajan
  • Patent number: 7805302
    Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
    Type: Grant
    Filed: May 20, 2002
    Date of Patent: September 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan
  • Patent number: 7765102
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Grant
    Filed: July 11, 2008
    Date of Patent: July 27, 2010
    Assignee: Microsoft Corporation
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Patent number: 7624006
    Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.
    Type: Grant
    Filed: September 15, 2004
    Date of Patent: November 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero
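A minimal Naïve Bayes text classifier and the conditional likelihood P(class | words) that the patent's training criterion maximizes can be sketched as follows. The sketch uses plain maximum-likelihood estimates with add-one smoothing; the patent's actual contribution — tuning these parameters discriminatively via the rational function growth transform — is beyond this illustration:

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (class_label, list_of_words).
    Returns priors, per-class word counts, class totals, and the vocabulary."""
    prior, counts, totals = Counter(), {}, Counter()
    for c, words in examples:
        prior[c] += 1
        counts.setdefault(c, Counter()).update(words)
        totals[c] += len(words)
    vocab = {w for c in counts for w in counts[c]}
    return prior, counts, totals, vocab

def posterior(model, words):
    """Conditional likelihood P(c | words) under the Naive Bayes model."""
    prior, counts, totals, vocab = model
    scores = {}
    for c in prior:
        logp = math.log(prior[c] / sum(prior.values()))
        for w in words:
            # Add-one smoothing so unseen words get nonzero probability.
            logp += math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
        scores[c] = logp
    z = max(scores.values())  # log-sum-exp for numerical stability
    norm = sum(math.exp(s - z) for s in scores.values())
    return {c: math.exp(s - z) / norm for c, s in scores.items()}
```

Discriminative training would adjust the counts-derived parameters jointly for all classes so that the posterior of the correct class rises relative to the incorrect ones on each training utterance.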
  • Patent number: 7478038
    Abstract: A method and apparatus are provided for adapting a language model. The method and apparatus provide supervised class-based adaptation of the language model utilizing in-domain semantic information.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: January 13, 2009
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero, Yik-Cheung Tam
  • Publication number: 20080319749
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Application
    Filed: July 11, 2008
    Publication date: December 25, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Publication number: 20080215311
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Application
    Filed: April 15, 2008
    Publication date: September 4, 2008
    Applicant: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
  • Patent number: 7418387
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Grant
    Filed: November 24, 2004
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Patent number: 7406416
    Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: July 29, 2008
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
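The identity behind this abstract is that a bigram deleted-interpolation model P(w | h) = λ·f(w | h) + (1 − λ)·P_uni(w) can be stored exactly in standard ARPA backoff form: seen bigrams carry the full interpolated probability, and each history's backoff weight is (1 − λ). The sketch below assumes that simple bigram case with a single shared λ; real models interpolate per-order and per-history:

```python
import math

def to_arpa(unigrams, bigram_freqs, lam):
    """unigrams: {word: P_uni(word)}; bigram_freqs: {h: {w: f(w|h)}};
    lam: interpolation weight on the bigram relative frequency."""
    lines = ["\\data\\",
             f"ngram 1={len(unigrams)}",
             f"ngram 2={sum(len(v) for v in bigram_freqs.values())}",
             "", "\\1-grams:"]
    for w, p in unigrams.items():
        # Backoff weight (1 - lam) for words that appear as histories.
        bow = math.log10(1 - lam) if w in bigram_freqs else 0.0
        lines.append(f"{math.log10(p):.4f}\t{w}\t{bow:.4f}")
    lines += ["", "\\2-grams:"]
    for h, succ in bigram_freqs.items():
        for w, f in succ.items():
            # Seen bigrams store the full interpolated probability.
            p = lam * f + (1 - lam) * unigrams[w]
            lines.append(f"{math.log10(p):.4f}\t{h} {w}")
    lines += ["", "\\end\\"]
    return "\n".join(lines)
```

Unseen bigrams then score as bow(h)·P_uni(w) = (1 − λ)·P_uni(w), which matches the interpolated model exactly, so the conditional distributions still sum to one.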
  • Patent number: 7379867
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Grant
    Filed: June 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
  • Publication number: 20070143110
    Abstract: A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word that have time ranges satisfying a criterion. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word.
    Type: Application
    Filed: December 15, 2005
    Publication date: June 21, 2007
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Asela Gunawardana, Ciprian Chelba, Erik Selberg, Frank Torsten Seide, Patrick Nguyen, Roger Yu
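The merge step can be sketched as below. The overlap criterion (fraction of the shorter hypothesis's duration) and the posterior-summing merge are plausible assumptions for illustration, not the specific rule claimed by the patent:

```python
def merge_hypotheses(hyps, min_overlap=0.5):
    """hyps: list of (word, start, end, posterior) tuples from a lattice.
    Hypotheses for the same word whose time ranges overlap enough are
    merged into one entry whose posterior is the accumulated mass."""
    merged = []
    for word, start, end, post in hyps:
        for m in merged:
            if m["word"] != word:
                continue
            overlap = min(m["end"], end) - max(m["start"], start)
            shorter = min(m["end"] - m["start"], end - start)
            if shorter > 0 and overlap / shorter >= min_overlap:
                # Merge: widen the time range, accumulate posterior mass.
                m["start"] = min(m["start"], start)
                m["end"] = max(m["end"], end)
                m["posterior"] += post
                break
        else:
            merged.append({"word": word, "start": start, "end": end,
                           "posterior": post})
    return merged
```

Merging collapses the many near-duplicate time alignments a lattice produces for one spoken word, so the index stores one entry per word occurrence rather than one per lattice arc.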
  • Publication number: 20070106512
    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
    Type: Application
    Filed: November 9, 2005
    Publication date: May 10, 2007
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Ciprian Chelba, Jorge Sanchez
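The pruning step — drop index entries whose occurrence probability falls below a threshold — is simple to sketch. The posting layout (document id, position, posterior) is an assumption for illustration:

```python
def prune_index(index, threshold=0.1):
    """index: {word: [(doc_id, position, posterior), ...]}.
    Keeps only postings whose posterior meets the threshold; words left
    with no postings are removed from the index entirely."""
    pruned = {}
    for word, entries in index.items():
        kept = [e for e in entries if e[2] >= threshold]
        if kept:
            pruned[word] = kept
    return pruned
```
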
  • Publication number: 20070106509
    Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words of the speech data and combining them with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same, differing only in category.
    Type: Application
    Filed: November 8, 2005
    Publication date: May 10, 2007
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, Ciprian Chelba, Jorge Sanchez
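The single-index idea can be sketched by tagging every posting with its category, so uncertain speech hypotheses and certain text meta-data share one structure. The field names and posting layout are assumptions, not the patented encoding:

```python
from collections import defaultdict

def build_index(documents):
    """documents: {doc_id: {"speech": [(word, position, probability)],
                            "meta":   [(word, position)]}}.
    Returns one inverted index whose postings carry a category tag."""
    index = defaultdict(list)
    for doc_id, fields in documents.items():
        for word, pos, prob in fields.get("speech", ()):
            index[word].append((doc_id, "speech", pos, prob))
        for word, pos in fields.get("meta", ()):
            # Text meta-data is certain: store occurrence probability 1.0.
            index[word].append((doc_id, "meta", pos, 1.0))
    return dict(index)
```

Queries can then search both sources uniformly, optionally weighting hits by category or by the stored occurrence probability.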