Patents by Inventor Ciprian Chelba
Ciprian Chelba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230025739
Abstract: Aspects of the technology employ a machine translation quality prediction (MTQP) model to refine datasets that are used in training machine translation systems. This includes receiving, by a machine translation quality prediction model, a sentence pair of a source sentence and a translated output (802). Feature extraction is then performed on the sentence pair using a set of two or more feature extractors, each of which generates a corresponding feature vector (804). The feature vectors from the set of feature extractors are concatenated (806), and the concatenated vectors are applied to a feedforward neural network, which generates a machine translation quality prediction score for the translated output (808).
Type: Application
Filed: June 29, 2022
Publication date: January 26, 2023
Inventors: Junpei Zhou, Yuezhang Li, Ciprian Chelba, Fangxiaoyu Feng, Bowen Liang, Pidong Wang
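The scoring pipeline in steps 802–808 can be sketched in plain Python. Everything below is illustrative, not the patented model: the two toy feature extractors, the layer sizes, and all network weights are invented for the example.

```python
import math

# Two toy feature extractors (hypothetical stand-ins for the patent's
# extractors): each maps a (source, translation) pair to a feature vector.
def length_ratio_features(src, tgt):
    return [len(tgt.split()) / max(1, len(src.split()))]

def overlap_features(src, tgt):
    s, t = set(src.lower().split()), set(tgt.lower().split())
    return [len(s & t) / max(1, len(s | t))]

def feedforward(x, w1, b1, w2, b2):
    # One hidden layer with ReLU, then a sigmoid output in (0, 1)
    # read as the translation-quality score.
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    z = sum(wi * hi for wi, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def mtqp_score(src, tgt):
    # Steps 804-806: run each extractor, then concatenate the vectors.
    feats = length_ratio_features(src, tgt) + overlap_features(src, tgt)
    # Step 808: apply the feedforward network (arbitrary weights).
    w1 = [[0.5, 1.0], [-0.3, 0.8]]
    b1 = [0.1, 0.0]
    w2 = [1.2, -0.7]
    b2 = 0.05
    return feedforward(feats, w1, b1, w2, b2)

score = mtqp_score("the cat sat", "le chat sat")
```

In a real system the extractors would themselves be learned models and the network would be trained on labeled sentence pairs; only the concatenate-then-score shape is taken from the abstract.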
-
Patent number: 8725509
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to language models stored for digital language processing. In one aspect, a method includes the actions of generating a language model, including: receiving a collection of n-grams from a corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus, and generating a trie representing the collection of n-grams, the trie being represented using one or more arrays of integers, and compressing an array representation of the trie using block encoding; and using the language model to identify a second probability of a particular string of words occurring.
Type: Grant
Filed: June 17, 2009
Date of Patent: May 13, 2014
Assignee: Google Inc.
Inventors: Boulos Harb, Ciprian Chelba, Jeffrey A. Dean, Sanjay Ghemawat
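The trie-as-integer-arrays idea can be sketched as follows. The breadth-first layout, the 1/1000 probability quantization, and the array names are illustrative assumptions standing in for the patented encoding; the point is that the trie becomes flat arrays of small integers, which is what block encoding then compresses.

```python
def build_trie(ngrams):
    """Build a nested trie from {word-tuple: probability}; each node
    maps a word to [probability, children-dict]."""
    root = {}
    for gram, p in ngrams.items():
        node = root
        for w in gram[:-1]:
            node = node.setdefault(w, [None, {}])[1]
        node.setdefault(gram[-1], [None, {}])[0] = p
    return root

def flatten(trie, vocab):
    """Flatten the trie into parallel integer arrays: word ids, child
    counts, quantized probabilities, and child-block offsets.
    Breadth-first order makes each node's children a contiguous,
    id-sorted block, which compresses well under block encoding."""
    words, counts, probs = [], [], []
    queue = [trie]
    while queue:
        node = queue.pop(0)
        for w in sorted(node, key=vocab.get):
            p, children = node[w]
            words.append(vocab[w])
            counts.append(len(children))
            probs.append(0 if p is None else round(p * 1000))
            queue.append(children)
    start = len(trie)  # root's children occupy indices 0..start-1
    first_child = []
    for c in counts:
        first_child.append(start)
        start += c
    return words, counts, probs, first_child

vocab = {"the": 0, "cat": 1}
ngrams = {("the",): 0.4, ("cat",): 0.3, ("the", "cat"): 0.6}
words, counts, probs, first_child = flatten(build_trie(ngrams), vocab)
```

Node `i`'s children then live at indices `first_child[i] .. first_child[i] + counts[i] - 1`, so lookup walks the arrays without pointer chasing.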
-
Patent number: 8706491
Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
Type: Grant
Filed: August 24, 2010
Date of Patent: April 22, 2014
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan
-
Patent number: 8335683
Abstract: The present invention involves using one or more statistical classifiers in order to perform task classification on natural language inputs. In another embodiment, the statistical classifiers can be used in conjunction with a rule-based classifier to perform task classification.
Type: Grant
Filed: January 23, 2003
Date of Patent: December 18, 2012
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, Ciprian Chelba, Ye-Yi Wang, Leon Wong, Brendan Frey
-
Patent number: 8306818
Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
Type: Grant
Filed: April 15, 2008
Date of Patent: November 6, 2012
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
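The training objective named here, the conditional likelihood of the correct class given the word string, can be sketched for a toy two-class unigram classifier. All probabilities below are invented for illustration, and the growth-transform parameter update itself is omitted; only the objective being maximized is shown.

```python
import math

# Toy per-class unigram language models and class priors
# (hypothetical numbers, not from the patent).
models = {
    "weather": {"rain": 0.5, "sunny": 0.4, "stock": 0.1},
    "finance": {"rain": 0.1, "sunny": 0.1, "stock": 0.8},
}
priors = {"weather": 0.5, "finance": 0.5}

def class_posterior(words, cls):
    # P(c | w) = P(c) P(w | c) / sum_c' P(c') P(w | c'),
    # with unigram P(w | c) = product of per-word probabilities.
    def joint(c):
        p = priors[c]
        for w in words:
            p *= models[c][w]
        return p
    return joint(cls) / sum(joint(c) for c in models)

def conditional_log_likelihood(data):
    # The objective from the abstract: sum over labeled sentences of
    # log P(correct class | sentence). Discriminative training (via the
    # rational function growth transform) adjusts the language-model
    # parameters jointly for all classes to increase this quantity.
    return sum(math.log(class_posterior(ws, c)) for ws, c in data)

data = [(["rain", "sunny"], "weather"), (["stock"], "finance")]
cll = conditional_log_likelihood(data)
```

Maximum-likelihood training would instead fit each class model to its own sentences in isolation; the point of the conditional objective is that parameters of all classes compete on every training example.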
-
Patent number: 8175878
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
Type: Grant
Filed: December 14, 2010
Date of Patent: May 8, 2012
Assignee: Google Inc.
Inventors: Ciprian Chelba, Thorsten Brants
-
Patent number: 7877258
Abstract: Systems, methods, and apparatuses, including computer program products, are provided for representing language models. In some implementations, a computer-implemented method is provided. The method includes generating a compact language model including receiving a collection of n-grams from the corpus, each n-gram of the collection having a corresponding first probability of occurring in the corpus and generating a trie representing the collection of n-grams. The method also includes using the language model to identify a second probability of a particular string of words occurring.
Type: Grant
Filed: March 29, 2007
Date of Patent: January 25, 2011
Assignee: Google Inc.
Inventors: Ciprian Chelba, Thorsten Brants
-
Publication number: 20100318348
Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
Type: Application
Filed: August 24, 2010
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan
-
Patent number: 7805302
Abstract: One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.
Type: Grant
Filed: May 20, 2002
Date of Patent: September 28, 2010
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan
-
Patent number: 7765102
Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
Type: Grant
Filed: July 11, 2008
Date of Patent: July 27, 2010
Assignee: Microsoft Corporation
Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
-
Patent number: 7624006
Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.
Type: Grant
Filed: September 15, 2004
Date of Patent: November 24, 2009
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero
-
Patent number: 7478038
Abstract: A method and apparatus are provided for adapting a language model. The method and apparatus provide supervised class-based adaptation of the language model utilizing in-domain semantic information.
Type: Grant
Filed: March 31, 2004
Date of Patent: January 13, 2009
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero, Yik-Cheung Tam
-
Publication number: 20080319749
Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
Type: Application
Filed: July 11, 2008
Publication date: December 25, 2008
Applicant: Microsoft Corporation
Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
-
Publication number: 20080215311
Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
Type: Application
Filed: April 15, 2008
Publication date: September 4, 2008
Applicant: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Patent number: 7418387
Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
Type: Grant
Filed: November 24, 2004
Date of Patent: August 26, 2008
Assignee: Microsoft Corporation
Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
-
Patent number: 7406416
Abstract: A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.
Type: Grant
Filed: March 26, 2004
Date of Patent: July 29, 2008
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Milind Mahajan, Alejandro Acero
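The conversion being stored can be sketched with the standard interpolated-to-backoff rewrite: seen n-grams get explicit interpolated probabilities, and each history gets a backoff weight chosen so unseen n-grams fall back to the lower-order model while the whole distribution still sums to one. The counts (fractional, echoing the abstract) and the interpolation weight below are illustrative choices.

```python
import math

# Toy fractional counts and interpolation weight (illustrative values).
unigram = {"the": 4.0, "cat": 2.0}
bigram = {("the", "cat"): 1.5}
lam = 0.7

total = sum(unigram.values())

def p_uni(w):
    return unigram[w] / total

def p_interp(h, w):
    # Deleted interpolation: lambda * ML bigram + (1 - lambda) * unigram.
    return lam * bigram.get((h, w), 0.0) / unigram[h] + (1.0 - lam) * p_uni(w)

def backoff_weight(h):
    # alpha(h) chosen so the backoff model reproduces the interpolated
    # one exactly: seen bigrams use p_interp, unseen ones use
    # alpha(h) * p_uni(w), and the distribution sums to one.
    seen = [w for (hh, w) in bigram if hh == h]
    num = 1.0 - sum(p_interp(h, w) for w in seen)
    den = 1.0 - sum(p_uni(w) for w in seen)
    return num / den

def arpa_lines():
    # Emit the model in the standard ARPA backoff format: log10
    # probabilities, with a log10 backoff weight after each 1-gram
    # that begins some seen bigram.
    lines = ["\\1-grams:"]
    histories = {h for (h, _) in bigram}
    for w in unigram:
        line = f"{math.log10(p_uni(w)):.4f}\t{w}"
        if w in histories:
            line += f"\t{math.log10(backoff_weight(w)):.4f}"
        lines.append(line)
    lines.append("\\2-grams:")
    for (h, w) in bigram:
        lines.append(f"{math.log10(p_interp(h, w)):.4f}\t{h} {w}")
    lines.append("\\end\\")
    return lines
```

Because the backoff weight renormalizes exactly, any decoder that reads ARPA backoff files can consume the interpolated model unchanged, which is the practical payoff described in the abstract.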
-
Patent number: 7379867
Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
Type: Grant
Filed: June 3, 2003
Date of Patent: May 27, 2008
Assignee: Microsoft Corporation
Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Publication number: 20070143110
Abstract: A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word which have time ranges satisfying a criterion. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word.
Type: Application
Filed: December 15, 2005
Publication date: June 21, 2007
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, Asela Gunawardana, Ciprian Chelba, Erik Selberg, Frank Torsten Seide, Patrick Nguyen, Roger Yu
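The merging step can be sketched as follows. The abstract leaves the time-range criterion open, so the overlap test here (fraction of the shorter span covered by the intersection) and the greedy merge are illustrative assumptions.

```python
def overlap_fraction(a, b):
    # a, b: (start, end) time ranges. Fraction of the shorter span
    # covered by their intersection (an assumed criterion).
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / min(a[1] - a[0], b[1] - b[0])

def merge_hypotheses(hyps, min_overlap=0.5):
    """hyps: list of (start, end, posterior) hypotheses for one word
    taken from a speech lattice. Greedily merge hypotheses whose time
    ranges overlap enough, summing posteriors and widening the span,
    so the index stores one entry per word occurrence."""
    merged = []
    for start, end, p in sorted(hyps):
        for i, (ms, me, mp) in enumerate(merged):
            if overlap_fraction((start, end), (ms, me)) >= min_overlap:
                merged[i] = (min(ms, start), max(me, end), mp + p)
                break
        else:
            merged.append((start, end, p))
    return merged

merged = merge_hypotheses([(1.0, 1.5, 0.4), (1.1, 1.6, 0.3), (3.0, 3.4, 0.2)])
```

Merging matters because a lattice typically contains many near-duplicate hypotheses for the same spoken word; without it the index both bloats and splits the word's evidence across entries.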
-
Publication number: 20070106512
Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
Type: Application
Filed: November 9, 2005
Publication date: May 10, 2007
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, Ciprian Chelba, Jorge Sanchez
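The index-then-prune scheme can be sketched as follows; the per-hypothesis posteriors and the pruning threshold are illustrative assumptions.

```python
from collections import defaultdict

def build_index(hypotheses):
    """hypotheses: list of (segment_id, word_sequence, posterior),
    i.e. alternative recognitions of each speech segment. Each word
    gets an index entry whose value is the probability that the word
    appears in the segment (posterior mass of hypotheses containing it)."""
    index = defaultdict(lambda: defaultdict(float))
    for seg, words, p in hypotheses:
        for w in set(words):
            index[w][seg] += p
    return index

def prune(index, threshold):
    # Eliminate entries whose appearance probability falls below the
    # threshold, as in the abstract: low-confidence words are dropped,
    # shrinking the index at a controlled cost in recall.
    return {w: {s: p for s, p in segs.items() if p >= threshold}
            for w, segs in index.items()}

idx = build_index([(1, ["play", "music"], 0.6), (1, ["play", "musing"], 0.3)])
pruned = prune(idx, 0.5)
```

With these numbers, "play" (probability 0.9) and "music" (0.6) survive pruning, while the low-confidence "musing" (0.3) is eliminated.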
-
Publication number: 20070106509
Abstract: An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words of the speech data and combining them with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same, differing only in category.
Type: Application
Filed: November 8, 2005
Publication date: May 10, 2007
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, Ciprian Chelba, Jorge Sanchez