Patents by Inventor Cyril Goutte

Cyril Goutte has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8594992
    Abstract: This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the corresponding translated words in a target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: November 26, 2013
    Inventors: Roland Kuhn, Cyril Goutte, Pierre Isabelle, Michel Simard
  • Patent number: 8438009
    Abstract: The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: May 7, 2013
    Assignee: National Research Council of Canada
    Inventors: Massih Amini, Cyril Goutte
  • Publication number: 20110098999
    Abstract: The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
    Type: Application
    Filed: October 21, 2010
    Publication date: April 28, 2011
    Applicant: National Research Council of Canada
    Inventors: Massih Amini, Cyril Goutte
  • Publication number: 20110093254
    Abstract: This application is related to a means and a method for facilitating the use of translation memories by aligning words of an input source language sentence with the corresponding translated words in a target language sentence. More specifically, this invention relates to such a means and method where there is an enhanced translation memory comprising an alignment function.
    Type: Application
    Filed: June 9, 2009
    Publication date: April 21, 2011
    Inventors: Roland Kuhn, Cyril Goutte, Pierre Isabelle, Michel Simard
  • Patent number: 7849087
    Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: December 7, 2010
    Assignee: Xerox Corporation
    Inventors: Cyril Goutte, Eric Gaussier
  • Patent number: 7813919
    Abstract: A class of a probabilistic classifier or clustering system that includes probabilistic model parameters is to be characterized. For each of a plurality of candidate words or word combinations, divergence of the class from other classes is computed based on one or more probabilistic model parameters profiling the candidate word or word combination. One or more words or word combinations are selected for characterizing the class as those candidate words or word combinations for which the class has substantial computed divergence from the other classes.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: October 12, 2010
    Assignee: Xerox Corporation
    Inventor: Cyril Goutte
  • Patent number: 7720848
    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
    Type: Grant
    Filed: March 29, 2006
    Date of Patent: May 18, 2010
    Assignee: Xerox Corporation
    Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
  • Patent number: 7672830
    Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: March 2, 2010
    Assignee: Xerox Corporation
    Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
  • Publication number: 20090326913
    Abstract: The invention relates to a method and a means for automatically post-editing a translated text. A source language text is translated into an initial target language text. This initial target language text is then post-edited by an automatic post-editor into an improved target language text. The automatic post-editor is trained on a sentence-aligned parallel corpus created from sentence pairs T′ and T, where T′ is an initial training translation of a source training language text, and T is a second, independently derived, training translation of a source training language text.
    Type: Application
    Filed: January 9, 2008
    Publication date: December 31, 2009
    Inventors: Michel Simard, Pierre Isabelle, George Foster, Cyril Goutte, Roland Kuhn
  • Patent number: 7630977
    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: December 8, 2009
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
  • Patent number: 7620539
    Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
    Type: Grant
    Filed: November 1, 2004
    Date of Patent: November 17, 2009
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
  • Patent number: 7542893
    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
    Type: Grant
    Filed: May 10, 2006
    Date of Patent: June 2, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
  • Patent number: 7536295
    Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
    Type: Grant
    Filed: December 22, 2005
    Date of Patent: May 19, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
  • Patent number: 7457808
    Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: November 25, 2008
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Cyril Goutte
  • Publication number: 20070265825
    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
    Type: Application
    Filed: May 10, 2006
    Publication date: November 15, 2007
    Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
  • Publication number: 20070239745
    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
    Type: Application
    Filed: March 29, 2006
    Publication date: October 11, 2007
    Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
  • Publication number: 20070150257
    Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
    Type: Application
    Filed: December 22, 2005
    Publication date: June 28, 2007
    Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
  • Publication number: 20070143101
    Abstract: A class of a probabilistic classifier or clustering system that includes probabilistic model parameters is to be characterized. For each of a plurality of candidate words or word combinations, divergence of the class from other classes is computed based on one or more probabilistic model parameters profiling the candidate word or word combination. One or more words or word combinations are selected for characterizing the class as those candidate words or word combinations for which the class has substantial computed divergence from the other classes.
    Type: Application
    Filed: December 20, 2005
    Publication date: June 21, 2007
    Inventor: Cyril Goutte
  • Publication number: 20070005340
    Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
    Type: Application
    Filed: June 29, 2005
    Publication date: January 4, 2007
    Inventors: Cyril Goutte, Eric Gaussier
  • Publication number: 20070005639
    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
    Type: Application
    Filed: June 29, 2005
    Publication date: January 4, 2007
    Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
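
Illustrative note on patent 7672830: the abstract's second method thresholds the association measures in a translation matrix and then closes the result by transitivity to obtain an alignment matrix. The following is only a minimal sketch of that idea, not the patented implementation. The function name close_alignment, the example matrix, and the 0.5 threshold are hypothetical, and the sketch does not enforce the coverage constraint that the abstract also mentions.

```python
def close_alignment(assoc, threshold):
    """Threshold a source-by-target association matrix, then close it by transitivity.

    assoc[i][j] is an association measure between source word i and target
    word j (hypothetical values; the abstract does not fix a particular measure).
    Returns a boolean alignment matrix of the same shape.
    """
    n_src, n_tgt = len(assoc), len(assoc[0])

    # Step 1: keep only links whose association measure reaches the threshold.
    aligned = [[assoc[i][j] >= threshold for j in range(n_tgt)]
               for i in range(n_src)]

    # Step 2: transitive closure. If two source words share a target word,
    # each must be linked to every target word of the other. Repeat until
    # nothing changes, so each connected group becomes fully linked.
    changed = True
    while changed:
        changed = False
        for i in range(n_src):
            for k in range(n_src):
                if i == k:
                    continue
                shares_target = any(aligned[i][j] and aligned[k][j]
                                    for j in range(n_tgt))
                if not shares_target:
                    continue
                for j in range(n_tgt):
                    if aligned[k][j] and not aligned[i][j]:
                        aligned[i][j] = True
                        changed = True
    return aligned


# Hypothetical 3x3 example: source words 0 and 1 share target word 0,
# so closure also links source word 0 to target word 1.
example = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.0],
    [0.0, 0.0, 0.6],
]
result = close_alignment(example, threshold=0.5)
```

After closure, each group of mutually linked source and target words forms a fully connected block, which is one way to read the "cepts" that the abstract says may optionally be derived from the alignment matrix.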