Patents by Inventor Cyril Goutte
Cyril Goutte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8594992
Abstract: This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the corresponding translated words in the target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
Type: Grant
Filed: June 9, 2009
Date of Patent: November 26, 2013
Inventors: Roland Kuhn, Cyril Goutte, Pierre Isabelle, Michel Simard
-
Patent number: 8438009
Abstract: The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
Type: Grant
Filed: October 21, 2010
Date of Patent: May 7, 2013
Assignee: National Research Council of Canada
Inventors: Massih Amini, Cyril Goutte
-
Publication number: 20110098999
Abstract: The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
Type: Application
Filed: October 21, 2010
Publication date: April 28, 2011
Applicant: National Research Council of Canada
Inventors: Massih Amini, Cyril Goutte
-
Publication number: 20110093254
Abstract: This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the corresponding translated words in the target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
Type: Application
Filed: June 9, 2009
Publication date: April 21, 2011
Inventors: Roland Kuhn, Cyril Goutte, Pierre Isabelle, Michel Simard
-
Patent number: 7849087
Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 7, 2010
Assignee: Xerox Corporation
Inventors: Cyril Goutte, Eric Gaussier
-
Patent number: 7813919
Abstract: A class is to be characterized of a probabilistic classifier or clustering system that includes probabilistic model parameters. For each of a plurality of candidate words or word combinations, divergence of the class from other classes is computed based on one or more probabilistic model parameters profiling the candidate word or word combination. One or more words or word combinations are selected for characterizing the class as those candidate words or word combinations for which the class has substantial computed divergence from the other classes.
Type: Grant
Filed: December 20, 2005
Date of Patent: October 12, 2010
Assignee: Xerox Corporation
Inventor: Cyril Goutte
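Illustration (not part of the patent record): the abstract above does not say which divergence is used, so the sketch below assumes per-word contributions to a Kullback-Leibler divergence between the target class and the average of the other classes. The function name `characteristic_words`, the `word_probs` layout, and the smoothing constant are all hypothetical choices made for this example.

```python
import math

def characteristic_words(word_probs, target, top_k=3):
    """Rank the words of `target` by their contribution to an assumed
    KL divergence between `target` and the average of the other classes.

    word_probs maps class name -> {word: P(word | class)}.
    """
    others = [c for c in word_probs if c != target]
    if not others:
        raise ValueError("need at least one other class to compare against")
    eps = 1e-12  # assumed smoothing so unseen words do not divide by zero
    scores = {}
    for word, p in word_probs[target].items():
        # Average probability of the word across all other classes.
        q = sum(word_probs[c].get(word, 0.0) for c in others) / len(others)
        scores[word] = p * math.log((p + eps) / (q + eps))
    # Highest-scoring words diverge most from the other classes.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

word_probs = {
    "sports":  {"goal": 0.5, "team": 0.4, "the": 0.1},
    "finance": {"bank": 0.5, "rate": 0.4, "the": 0.1},
}
top_sports = characteristic_words(word_probs, "sports", top_k=2)
```

Words shared equally by all classes (here "the") score zero and are not selected.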
-
Patent number: 7720848
Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
Type: Grant
Filed: March 29, 2006
Date of Patent: May 18, 2010
Assignee: Xerox Corporation
Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
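Illustration (not part of the patent record): one simple reading of the "local update" in the abstract above, assuming the model parameters are plain per-class word counts. The function name `move_document` and the data layout are hypothetical; the patent may use ratios or frequencies instead.

```python
from collections import Counter

def move_document(class_counts, doc_words, source, destination):
    """Move one document between classes, touching only those two classes.

    class_counts maps class name -> Counter of word counts for that class.
    """
    doc = Counter(doc_words)
    src = class_counts[source]
    src.subtract(doc)  # remove the document's words from the source class
    # Drop words whose count fell to zero (or below) in the source class.
    for word in [w for w, n in src.items() if n <= 0]:
        del src[word]
    class_counts[destination].update(doc)  # add them to the destination class

counts = {
    "a": Counter({"x": 2, "y": 1}),
    "b": Counter({"z": 1}),
    "c": Counter({"q": 5}),
}
move_document(counts, ["x", "y"], source="a", destination="b")
```

Class "c" is never read or written, which is the point of updating locally rather than re-estimating the whole model.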
-
Patent number: 7672830
Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
Type: Grant
Filed: May 26, 2005
Date of Patent: March 2, 2010
Assignee: Xerox Corporation
Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
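Illustration (not part of the patent record): a minimal sketch of the second method in the abstract above, assuming that "closed by transitivity" means linking every source and target word that fall in the same connected component of the thresholded association graph. The function name `close_alignment`, the threshold value, and the example matrix are hypothetical, and the coverage constraint is not enforced here.

```python
def close_alignment(assoc, threshold):
    """Threshold an association matrix, then close it by transitivity.

    assoc[i][j] is the association between source word i and target word j.
    Returns a 0/1 alignment matrix of the same shape.
    """
    n_src = len(assoc)
    n_tgt = len(assoc[0]) if assoc else 0
    # Union-find over source nodes 0..n_src-1 and target nodes n_src..n_src+n_tgt-1.
    parent = list(range(n_src + n_tgt))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Thresholding: keep only sufficiently strong links and merge their endpoints.
    for i in range(n_src):
        for j in range(n_tgt):
            if assoc[i][j] >= threshold:
                parent[find(i)] = find(n_src + j)

    # Closure: words in the same component are all aligned to each other.
    return [[1 if find(i) == find(n_src + j) else 0 for j in range(n_tgt)]
            for i in range(n_src)]

assoc = [
    [0.9, 0.6, 0.0],
    [0.1, 0.7, 0.0],
    [0.0, 0.0, 0.8],
]
aligned = close_alignment(assoc, threshold=0.5)
```

In this example source word 1 is linked to target word 0 only through the closure: both are connected via source word 0 and target word 1, so the three words end up in the same group, which corresponds to what the abstract calls a cept.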
-
Publication number: 20090326913
Abstract: The invention relates to a method and a means for automatically post-editing a translated text. A source language text is translated into an initial target language text. This initial target language text is then post-edited by an automatic post-editor into an improved target language text. The automatic post-editor is trained on a sentence aligned parallel corpus created from sentence pairs T′ and T, where T′ is an initial training translation of a source training language text, and T is a second, independently derived, training translation of a source training language text.
Type: Application
Filed: January 9, 2008
Publication date: December 31, 2009
Inventors: Michel Simard, Pierre Isabelle, George Foster, Cyril Goutte, Roland Kuhn
-
Patent number: 7630977
Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 8, 2009
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
-
Patent number: 7620539
Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
Type: Grant
Filed: November 1, 2004
Date of Patent: November 17, 2009
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
-
Patent number: 7542893
Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Type: Grant
Filed: May 10, 2006
Date of Patent: June 2, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
-
Patent number: 7536295
Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Type: Grant
Filed: December 22, 2005
Date of Patent: May 19, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
-
Patent number: 7457808
Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
Type: Grant
Filed: December 17, 2004
Date of Patent: November 25, 2008
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Cyril Goutte
-
Publication number: 20070265825
Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Type: Application
Filed: May 10, 2006
Publication date: November 15, 2007
Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
-
Publication number: 20070239745
Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
Type: Application
Filed: March 29, 2006
Publication date: October 11, 2007
Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
-
Publication number: 20070150257
Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Type: Application
Filed: December 22, 2005
Publication date: June 28, 2007
Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
-
Publication number: 20070143101
Abstract: A class is to be characterized of a probabilistic classifier or clustering system that includes probabilistic model parameters. For each of a plurality of candidate words or word combinations, divergence of the class from other classes is computed based on one or more probabilistic model parameters profiling the candidate word or word combination. One or more words or word combinations are selected for characterizing the class as those candidate words or word combinations for which the class has substantial computed divergence from the other classes.
Type: Application
Filed: December 20, 2005
Publication date: June 21, 2007
Inventor: Cyril Goutte
-
Publication number: 20070005340
Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
Type: Application
Filed: June 29, 2005
Publication date: January 4, 2007
Inventors: Cyril Goutte, Eric Gaussier
-
Publication number: 20070005639
Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
Type: Application
Filed: June 29, 2005
Publication date: January 4, 2007
Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault