Patents by Inventor Eric Gaussier

Eric Gaussier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9020804
    Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
    Type: Grant
    Filed: June 1, 2007
    Date of Patent: April 28, 2015
    Assignee: Xerox Corporation
    Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilárd Zsolt Fazekas, Tamás Gaál, Eric Gaussier
  • Patent number: 7849087
    Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: December 7, 2010
    Assignee: Xerox Corporation
    Inventors: Cyril Goutte, Eric Gaussier
  • Patent number: 7720848
    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
    Type: Grant
    Filed: March 29, 2006
    Date of Patent: May 18, 2010
    Assignee: Xerox Corporation
    Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
  • Patent number: 7672830
    Abstract: Methods are disclosed for performing proper word alignment that satisfies the constraints of coverage and transitive closure. Initially, a translation matrix, which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences, is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words; the resulting matrix factors may then optionally be multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded and then closed by transitivity to produce an alignment matrix, which may then optionally be factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: March 2, 2010
    Assignee: Xerox Corporation
    Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
  • Patent number: 7644102
    Abstract: Methods, systems, and articles of manufacture consistent with certain principles related to the present invention enable a computing system to perform hierarchical topical clustering of text data based on statistical modeling of co-occurrences of (document, word) pairs. The computing system may be configured to receive a collection of documents, each document including a plurality of words, and perform a modified deterministic annealing Expectation-Maximization (EM) process on the collection to produce a softly assigned hierarchy of nodes. The process may involve assigning documents and document fragments to multiple nodes in the hierarchy based on words included in the documents, such that a document may be assigned to any ancestor node included in the hierarchy, thus eliminating the hard assignment of documents in the hierarchy.
    Type: Grant
    Filed: October 19, 2001
    Date of Patent: January 5, 2010
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Francine Chen, Ashok Chhabedia Popat
  • Patent number: 7630977
    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: December 8, 2009
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
  • Patent number: 7620539
    Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
    Type: Grant
    Filed: November 1, 2004
    Date of Patent: November 17, 2009
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
  • Patent number: 7542893
    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
    Type: Grant
    Filed: May 10, 2006
    Date of Patent: June 2, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
  • Patent number: 7536295
    Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
    Type: Grant
    Filed: December 22, 2005
    Date of Patent: May 19, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
  • Publication number: 20080300857
    Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
    Type: Application
    Filed: June 1, 2007
    Publication date: December 4, 2008
    Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilárd Zsolt Fazekas, Tamás Gaál, Eric Gaussier
  • Patent number: 7457808
    Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
    Type: Grant
    Filed: December 17, 2004
    Date of Patent: November 25, 2008
    Assignee: Xerox Corporation
    Inventors: Eric Gaussier, Cyril Goutte
  • Publication number: 20070265825
    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
    Type: Application
    Filed: May 10, 2006
    Publication date: November 15, 2007
    Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
  • Publication number: 20070239745
    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
    Type: Application
    Filed: March 29, 2006
    Publication date: October 11, 2007
    Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
  • Publication number: 20070150257
    Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
    Type: Application
    Filed: December 22, 2005
    Publication date: June 28, 2007
    Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
  • Publication number: 20070005639
    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
    Type: Application
    Filed: June 29, 2005
    Publication date: January 4, 2007
    Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
  • Publication number: 20070005340
    Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
    Type: Application
    Filed: June 29, 2005
    Publication date: January 4, 2007
    Inventors: Cyril Goutte, Eric Gaussier
  • Patent number: 7139754
    Abstract: A method of categorizing objects in which there can be multiple categories of objects and each object can belong to more than one category is described. The method defines a set of categories in which at least one category is dependent on another category and then organizes the categories in a hierarchy that embodies any dependencies among them. Each object is assigned to one or more categories in the set. A set of labels corresponding to all combinations of any number of the categories is defined, wherein if an object is relevant to several categories, the object must be assigned the label corresponding to the subset of all relevant categories. Once the new labels are defined, the multi-category, multi-label problem has been reduced to a multi-category, single-label problem, and the categorization task is reduced to choosing the single best label set for an object.
    Type: Grant
    Filed: February 9, 2004
    Date of Patent: November 21, 2006
    Assignee: Xerox Corporation
    Inventors: Cyril Goutte, Eric Gaussier
  • Publication number: 20060190241
    Abstract: Methods are disclosed for performing proper word alignment that satisfies the constraints of coverage and transitive closure. Initially, a translation matrix, which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences, is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words; the resulting matrix factors may then optionally be multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded and then closed by transitivity to produce an alignment matrix, which may then optionally be factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
    Type: Application
    Filed: May 26, 2005
    Publication date: August 24, 2006
    Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
  • Publication number: 20060136410
    Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
    Type: Application
    Filed: December 17, 2004
    Publication date: June 22, 2006
    Inventors: Eric Gaussier, Cyril Goutte
  • Publication number: 20060123083
    Abstract: Electronic content is filtered to identify spam using image and linguistic processing. A plurality of information type gatherers assimilate and output different message attributes relating to message content associated with an information type. A categorizer may have a plurality of decision makers for providing as output a message class for classifying the message data. A history processor records the message attributes and the class decision as part of the prior history information and/or modifies the prior history information to reflect changes to fixed data and/or probability data. A categorizer coalescer assesses the message class output by the set of decision makers together with optional user input for producing a class decision identifying whether the message data is spam.
    Type: Application
    Filed: December 3, 2004
    Publication date: June 8, 2006
    Inventors: Cyril Goutte, Pierre Isabelle, Eric Gaussier, Stephen Kruger
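The "second method" in the abstract of patent 7672830 (publication 20060190241) thresholds a word-association matrix and then closes the surviving links by transitivity. The following is a minimal sketch of that idea, not the patented implementation. It treats the closure as connected components of a bipartite graph, computed with union-find. The coverage step, which links every word to its best-scoring partner, is an assumption added here so that no word is left unaligned. The function name `threshold_and_close` is hypothetical.

```python
from typing import List


def threshold_and_close(matrix: List[List[float]], threshold: float) -> List[List[int]]:
    """Threshold an association matrix, then close the links by transitivity.

    Rows are source words and columns are target words. Returns a 0/1 alignment
    matrix in which two words are aligned iff they are connected (directly or
    through other words) by links that survived thresholding.
    """
    n_src = len(matrix)
    n_tgt = len(matrix[0]) if n_src else 0
    if n_src == 0 or n_tgt == 0:
        return [[] for _ in range(n_src)]

    # Union-find over all words: source word i is node i, target word j is node n_src + j.
    parent = list(range(n_src + n_tgt))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    # Step 1: keep every link whose association measure reaches the threshold.
    for i in range(n_src):
        for j in range(n_tgt):
            if matrix[i][j] >= threshold:
                union(i, n_src + j)

    # Step 2 (assumption, not taken from the patent): guarantee coverage by
    # linking every word to its highest-scoring partner.
    for i in range(n_src):
        union(i, n_src + max(range(n_tgt), key=lambda j: matrix[i][j]))
    for j in range(n_tgt):
        union(max(range(n_src), key=lambda i: matrix[i][j]), n_src + j)

    # Step 3: transitive closure, i.e. align every source/target pair in the same component.
    return [[1 if find(i) == find(n_src + j) else 0 for j in range(n_tgt)]
            for i in range(n_src)]


scores = [[0.9, 0.1, 0.0],
          [0.2, 0.8, 0.7],
          [0.0, 0.1, 0.6]]
print(threshold_and_close(scores, 0.5))  # [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
```

In this example, source word 2 ends up aligned to target word 1 even though their score is only 0.1. Source word 2 links to target word 2, which links to source word 1, which links to target word 1, so the closure merges all four words into one block. Such a block is roughly what the abstract calls a cept.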
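Patent 9020804 (publication 20080300857) requires that all target words aligned with a source candidate term form a contiguous subsequence of the target sentence. The patent enforces this constraint while searching for the optimal alignment. The sketch below only checks whether a given candidate alignment satisfies the constraint, so it could serve to filter hypotheses. It represents an alignment as a set of (source index, target index) links, which is an assumption. The function name `satisfies_contiguity` is hypothetical.

```python
from typing import Set, Tuple

# An alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def satisfies_contiguity(alignment: Alignment, term_start: int, term_end: int) -> bool:
    """Check the contiguity constraint for the source term spanning [term_start, term_end).

    All target positions linked to any word of the term must form one
    contiguous run of target positions (no gaps).
    """
    targets = sorted({t for s, t in alignment if term_start <= s < term_end})
    if not targets:
        return True  # assumption: an unaligned term does not violate the constraint
    # Distinct sorted positions are contiguous iff they span exactly len(targets) slots.
    return targets[-1] - targets[0] + 1 == len(targets)


links = {(0, 0), (1, 2), (2, 3), (3, 1)}
print(satisfies_contiguity(links, 1, 3))  # True: the term maps to target positions 2 and 3
print(satisfies_contiguity(links, 0, 2))  # False: target positions 0 and 2 leave a gap at 1
```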
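Patent 7139754 reduces a multi-category, multi-label problem to a single-label one. It does this by defining one label per combination of categories and then choosing the single best label set. The following is a minimal sketch of that reduction only. It assumes the empty combination is excluded and that per-label scores come from some separate classifier. It does not model the category hierarchy or dependencies that the abstract also describes. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Sequence

Label = FrozenSet[str]


def build_label_set(categories: Sequence[str]) -> List[Label]:
    """Create one single label for every non-empty combination of categories."""
    labels: List[Label] = []
    for size in range(1, len(categories) + 1):
        for combo in combinations(categories, size):
            labels.append(frozenset(combo))
    return labels


def to_single_label(relevant_categories: Iterable[str]) -> Label:
    """Map an object's set of relevant categories to its single combined label."""
    return frozenset(relevant_categories)


def pick_best_label(label_scores: Dict[Label, float]) -> Label:
    """Single-label decision: return the label set with the highest score."""
    return max(label_scores, key=lambda label: label_scores[label])


labels = build_label_set(["sports", "politics", "economy"])
print(len(labels))  # 7 combined labels for 3 categories
print(to_single_label(["economy", "politics"]) in labels)  # True
```

With n categories, this label set has 2^n - 1 members. This is one practical reason the abstract's hierarchy of dependent categories matters, since dependencies can rule out combinations.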