Patents by Inventor Eric Gaussier

Eric Gaussier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method for aligning sentences at the word level enforcing selective contiguity constraints

Patent number: 9020804

Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.

Type: Grant

Filed: June 1, 2007

Date of Patent: April 28, 2015

Assignee: Xerox Corporation

Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilárd Zsolt Fazekas, Tamás Gaál, Eric Gaussier
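
As a minimal illustration of the contiguity constraint described in the abstract, the sketch below only checks whether a given word alignment satisfies it. The patented method enforces the constraint while searching for the optimal alignment, which is not shown here. The alignment representation (a list of (source index, target index) links) and the half-open term span are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]  # (source position, target position); this representation is an assumption


def aligned_targets(alignment: Iterable[Link], term: Tuple[int, int]) -> Set[int]:
    """Target positions linked to any source word inside the half-open span term = (start, end)."""
    start, end = term
    return {t for s, t in alignment if start <= s < end}


def satisfies_contiguity(alignment: Iterable[Link], term: Tuple[int, int]) -> bool:
    """True if the target words aligned with the term form one contiguous block."""
    targets = aligned_targets(alignment, term)
    if not targets:
        return True  # assumption: an unaligned term does not violate the constraint
    return max(targets) - min(targets) + 1 == len(targets)


# The candidate term covers source positions 1 and 2.
print(satisfies_contiguity([(0, 0), (1, 2), (2, 3)], (1, 3)))  # True: targets {2, 3}
print(satisfies_contiguity([(0, 0), (1, 1), (2, 3)], (1, 3)))  # False: targets {1, 3}
```
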
Incremental training for probabilistic categorizer

Patent number: 7849087

Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.

Type: Grant

Filed: June 29, 2005

Date of Patent: December 7, 2010

Assignee: Xerox Corporation

Inventors: Cyril Goutte, Eric Gaussier
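
If the categorizer's parameters are assumed to be smoothed relative frequencies of word counts (as in a multinomial naive Bayes or PLSA-style model), incremental training reduces to updating counts plus a running collection size. The class and parameter names below are hypothetical, and this is a sketch of the general idea, not the specific update rule claimed in the patent.

```python
from collections import Counter, defaultdict
from typing import Iterable


class IncrementalCategorizer:
    """Count-based multinomial categorizer updated one labelled document at a time.

    Hypothetical sketch: parameters are smoothed relative frequencies derived from counts,
    so adding a document only increments counts instead of retraining from scratch.
    """

    def __init__(self, vocabulary: Iterable[str], alpha: float = 1.0):
        self.vocabulary = set(vocabulary)
        self.alpha = alpha  # additive smoothing constant (assumption)
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.category_totals = Counter()  # category -> vocabulary tokens seen in that category
        self.collection_size = 0  # total vocabulary-word instances in the collection

    def add_document(self, words: Iterable[str], category: str) -> None:
        in_vocab = [w for w in words if w in self.vocabulary]  # out-of-vocabulary words ignored
        self.word_counts[category].update(in_vocab)
        self.category_totals[category] += len(in_vocab)
        self.collection_size += len(in_vocab)

    def p_word_given_category(self, word: str, category: str) -> float:
        v = len(self.vocabulary)
        return (self.word_counts[category][word] + self.alpha) / (
            self.category_totals[category] + self.alpha * v
        )

    def p_category(self, category: str) -> float:
        return self.category_totals[category] / self.collection_size
```

Because every parameter is a ratio of counts, adding a document touches only the counts of the words it contains and the totals of its category.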
Hierarchical clustering with real-time updating

Patent number: 7720848

Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.

Type: Grant

Filed: March 29, 2006

Date of Patent: May 18, 2010

Assignee: Xerox Corporation

Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
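
The local-update idea can be sketched for a class hierarchy in which, by assumption, each node's word counts aggregate all documents beneath it. Moving a document then changes only the nodes between each class and their lowest common ancestor; the common ancestors and every unrelated class keep their counts. The function names and the parent-map representation are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent.get(current)
    return path


def move_document(counts: Dict[str, Counter], parent: Dict[str, Optional[str]],
                  doc_counts: Counter, source: str, destination: str) -> List[str]:
    """Move one document's word counts from `source` to `destination`; return updated nodes.

    Ancestors shared by both classes keep the same totals, so only nodes below the
    lowest common ancestor are touched.
    """
    src_path = path_to_root(source, parent)
    dst_path = path_to_root(destination, parent)
    shared = set(src_path) & set(dst_path)
    touched = []
    for node in src_path:
        if node in shared:
            break
        counts[node].subtract(doc_counts)
        counts[node] = +counts[node]  # drop entries that reached zero
        touched.append(node)
    for node in dst_path:
        if node in shared:
            break
        counts[node].update(doc_counts)
        touched.append(node)
    return touched
```
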
Apparatus and methods for aligning words in bilingual sentences

Patent number: 7672830

Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.

Type: Grant

Filed: May 26, 2005

Date of Patent: March 2, 2010

Assignee: Xerox Corporation

Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
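
The second method in the abstract (threshold the translation matrix, then close the links by transitivity) can be sketched by treating source and target words as nodes of one graph: after closure, a source word and a target word are aligned exactly when they fall in the same connected component. The union-find implementation and function name are choices made for this example, and the coverage constraint is not enforced here (a word with no link above the threshold stays unaligned).

```python
from typing import List


def threshold_and_close(assoc: List[List[float]], threshold: float) -> List[List[bool]]:
    """Threshold a source x target association matrix, then close the links by transitivity."""
    n_src = len(assoc)
    n_tgt = len(assoc[0]) if assoc else 0
    parent = list(range(n_src + n_tgt))  # union-find; target j is node n_src + j

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Join each source/target pair whose association clears the threshold.
    for i in range(n_src):
        for j in range(n_tgt):
            if assoc[i][j] >= threshold:
                parent[find(i)] = find(n_src + j)

    # Transitive closure: aligned iff both words are in the same connected component.
    return [[find(i) == find(n_src + j) for j in range(n_tgt)] for i in range(n_src)]


assoc = [[0.9, 0.1, 0.0],
         [0.2, 0.8, 0.7],
         [0.0, 0.0, 0.6]]
print(threshold_and_close(assoc, 0.5))
```

In this example, source word 2 is linked only to target word 2, but closure also aligns it with target word 1 because both are connected through source word 1; each component corresponds to a cept-like block.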
Methods, systems, and articles of manufacture for soft hierarchical clustering of co-occurring objects

Patent number: 7644102

Abstract: Methods, systems, and articles of manufacture consistent with certain principles related to the present invention enable a computing system to perform hierarchical topical clustering of text data based on statistical modeling of co-occurrences of (document, word) pairs. The computing system may be configured to receive a collection of documents, each document including a plurality of words, and perform a modified deterministic annealing Expectation-Maximization (EM) process on the collection to produce a softly assigned hierarchy of nodes. The process may involve assigning documents and document fragments to multiple nodes in the hierarchy based on words included in the documents, such that a document may be assigned to any ancestor node included in the hierarchy, thus eliminating the hard assignment of documents in the hierarchy.

Type: Grant

Filed: October 19, 2001

Date of Patent: January 5, 2010

Assignee: Xerox Corporation

Inventors: Eric Gaussier, Francine Chen, Ashok Chhabedia Popat
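
The core of deterministic annealing EM is a tempered E-step: soft assignments are computed from scaled log-probabilities, so that at a low inverse temperature beta every node receives similar weight and assignments sharpen as beta increases. The sketch below shows only that standard tempered E-step under this assumption; the patent's modified process over a hierarchy of nodes is not reproduced, and the function name is hypothetical.

```python
import math
from typing import List


def tempered_posteriors(log_joint: List[float], beta: float) -> List[float]:
    """Soft assignment p_beta(k | d) proportional to exp(beta * log_joint[k]).

    log_joint[k] stands for log p(k) + log p(d | k). At beta = 0 every node gets the same
    weight; larger beta concentrates the weight on the most probable nodes.
    """
    scaled = [beta * v for v in log_joint]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```
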
Categorization including dependencies between different category systems

Patent number: 7630977

Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.

Type: Grant

Filed: June 29, 2005

Date of Patent: December 8, 2009

Assignee: Xerox Corporation

Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
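
One simple way to let the probabilities of one categorization dimension influence the label chosen in another is to score label pairs jointly. The sketch assumes two dimensions and a hypothetical table of dependency weights between their categories; it illustrates the idea in the abstract, not the patent's specific decision rule.

```python
from itertools import product
from typing import Dict, Tuple


def joint_labels(p_dim1: Dict[str, float], p_dim2: Dict[str, float],
                 compat: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pick one label per dimension, maximising p1(a) * p2(b) * compat(a, b).

    compat holds hypothetical dependency weights between categories of the two
    dimensions; pairs that are not listed get a neutral weight of 1.0.
    """
    return max(product(p_dim1, p_dim2),
               key=lambda ab: p_dim1[ab[0]] * p_dim2[ab[1]] * compat.get(ab, 1.0))


p_topic = {"sports": 0.6, "finance": 0.4}
p_genre = {"news": 0.45, "market-report": 0.55}
compat = {("sports", "market-report"): 0.1, ("finance", "news"): 0.5}
print(joint_labels(p_topic, p_genre, compat))  # ('sports', 'news')
```

Choosing each dimension independently would give "market-report" for the second dimension; the dependency weight between "sports" and "market-report" changes that choice to "news".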
Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing

Patent number: 7620539

Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.

Type: Grant

Filed: November 1, 2004

Date of Patent: November 17, 2009

Assignee: Xerox Corporation

Inventors: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
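
The abstract does not spell out the geometric methods. As background only, the sketch shows the common context-vector approach to comparable corpora that uses a bilingual dictionary: a source word's context vector is projected onto target words through the dictionary and compared with a candidate's context vector by cosine similarity. The even splitting of weight across multiple translations and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def translate_context(src_context: Mapping[str, float],
                      dictionary: Dict[str, List[str]]) -> Counter:
    """Project a source-language context vector onto target words via a bilingual dictionary.

    A source word with several translations splits its weight evenly among them
    (assumption); words missing from the dictionary are dropped.
    """
    projected: Counter = Counter()
    for word, weight in src_context.items():
        translations = dictionary.get(word, [])
        for t in translations:
            projected[t] += weight / len(translations)
    return projected


def cosine(u: Mapping[str, float], v: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as word -> weight mappings."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```
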
Machine translation using elastic chunks

Patent number: 7542893

Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.

Type: Grant

Filed: May 10, 2006

Date of Patent: June 2, 2009

Assignee: Xerox Corporation

Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
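
A gap size scoring feature of the kind described can be pictured as the log-probability of each gap size under a distribution estimated from training data; as one term of the translation score, it rewards hypotheses whose gaps have typical sizes. The add-alpha smoothing, the maximum gap size, and the function name below are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def gap_size_log_score(observed_gaps: Iterable[int], training_gaps: Sequence[int],
                       max_gap: int, alpha: float = 1.0) -> float:
    """Sum of add-alpha smoothed log-probabilities of the gap sizes used by a hypothesis.

    Gap sizes range over 0..max_gap; the empirical distribution comes from gap sizes
    observed in training data (hypothetical input).
    """
    counts = Counter(training_gaps)
    total = len(training_gaps) + alpha * (max_gap + 1)
    score = 0.0
    for g in observed_gaps:
        if not 0 <= g <= max_gap:
            raise ValueError(f"gap size {g} outside 0..{max_gap}")
        score += math.log((counts[g] + alpha) / total)
    return score
```

With training gaps [1, 1, 1, 2] and max_gap = 3, a gap of size 1 gets probability 4/8 while a gap of size 3 gets 1/8, so hypotheses using the more frequent gap size score higher.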
Machine translation using non-contiguous fragments of text

Patent number: 7536295

Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.

Type: Grant

Filed: December 22, 2005

Date of Patent: May 19, 2009

Assignee: Xerox Corporation

Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
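
A non-contiguous source fragment can be represented as a word sequence containing gap markers. The sketch below finds where such a fragment matches a source sentence, assuming each gap marker stands for exactly one word (the elastic chunks listing above relaxes this to variable-size gaps). The GAP marker and the function name are hypothetical.

```python
from typing import List, Optional, Sequence

GAP = None  # hypothetical gap marker; here each GAP stands for exactly one word


def match_positions(fragment: Sequence[Optional[str]], sentence: Sequence[str]) -> List[int]:
    """Start indices where a (possibly non-contiguous) source fragment matches the sentence."""
    n = len(fragment)
    return [
        i
        for i in range(len(sentence) - n + 1)
        if all(f is GAP or f == sentence[i + k] for k, f in enumerate(fragment))
    ]


print(match_positions(["ne", GAP, "plus"], "je ne veux plus".split()))  # [1]
```
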
Method for aligning sentences at the word level enforcing selective contiguity constraints

Publication number: 20080300857

Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.

Type: Application

Filed: June 1, 2007

Publication date: December 4, 2008

Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilárd Zsolt Fazekas, Tamás Gaál, Eric Gaussier
Method and apparatus for explaining categorization decisions

Patent number: 7457808

Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.

Type: Grant

Filed: December 17, 2004

Date of Patent: November 25, 2008

Assignee: Xerox Corporation

Inventors: Eric Gaussier, Cyril Goutte
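
In the spirit of the second embodiment (using both the model parameters and the features of the categorized document), a word's influence on a decision between two classes can be measured as its contribution to the log-odds under a naive-Bayes-style model. The model form, the toy parameters, and the function name are assumptions for this sketch.

```python
import math
from typing import Dict, List, Tuple


def feature_influence(doc_counts: Dict[str, int], log_p_word: Dict[str, Dict[str, float]],
                      chosen: str, other: str) -> List[Tuple[str, float]]:
    """Rank document words by how much they favour `chosen` over `other`.

    Influence of word w = count(w) * (log p(w | chosen) - log p(w | other)), i.e. its share
    of the log-odds in a naive-Bayes-style model. Words unknown to the model are skipped.
    """
    scores = {
        w: n * (log_p_word[chosen][w] - log_p_word[other][w])
        for w, n in doc_counts.items()
        if w in log_p_word[chosen] and w in log_p_word[other]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy model parameters (hypothetical).
model = {
    "spam": {"free": math.log(0.3), "meeting": math.log(0.05)},
    "ham": {"free": math.log(0.05), "meeting": math.log(0.3)},
}
ranking = feature_influence({"free": 2, "meeting": 1}, model, "spam", "ham")
print(ranking[0][0])  # free
```

Positive scores mark words that pushed the document toward the chosen class; negative scores mark words that argued against it.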
Machine translation using elastic chunks

Publication number: 20070265825

Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.

Type: Application

Filed: May 10, 2006

Publication date: November 15, 2007

Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
Hierarchical clustering with real-time updating

Publication number: 20070239745

Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.

Type: Application

Filed: March 29, 2006

Publication date: October 11, 2007

Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
Machine translation using non-contiguous fragments of text

Publication number: 20070150257

Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.

Type: Application

Filed: December 22, 2005

Publication date: June 28, 2007

Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
Incremental training for probabilistic categorizer

Publication number: 20070005340

Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.

Type: Application

Filed: June 29, 2005

Publication date: January 4, 2007

Inventors: Cyril Goutte, Eric Gaussier
Categorization including dependencies between different category systems

Publication number: 20070005639

Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.

Type: Application

Filed: June 29, 2005

Publication date: January 4, 2007

Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
Method for multi-class, multi-label categorization using probabilistic hierarchical modeling

Patent number: 7139754

Abstract: A method of categorizing objects in which there can be multiple categories of objects and each object can belong to more than one category is described. The method defines a set of categories in which at least one category is dependent on another category and then organizes the categories in a hierarchy that embodies any dependencies among them. Each object is assigned to one or more categories in the set. A set of labels corresponding to all combinations of any number of the categories is defined, wherein if an object is relevant to several categories, the object must be assigned the label corresponding to the subset of all relevant categories. Once the new labels are defined, the multi-category, multi-label problem has been reduced to a multi-category, single-label problem, and the categorization task is reduced down to choosing the single best label set for an object.

Type: Grant

Filed: February 9, 2004

Date of Patent: November 21, 2006

Assignee: Xerox Corporation

Inventors: Cyril Goutte, Eric Gaussier
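
The reduction from a multi-label to a single-label problem can be sketched by turning every admissible set of categories into one label and picking the best-scoring set. Encoding the hierarchy's dependencies as a child-to-parent map, and discarding sets that contain a child without its parent, are assumptions made for this example; the scoring function is supplied by the caller.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


def label_sets(categories: Iterable[str],
               requires: Optional[Dict[str, str]] = None) -> List[FrozenSet[str]]:
    """Every non-empty subset of categories, each to be used as one single label.

    `requires` maps a child category to its parent; subsets holding a child without its
    parent are dropped (a hypothetical way of encoding the hierarchy's dependencies).
    """
    requires = requires or {}
    cats = sorted(categories)
    subsets = (frozenset(c) for r in range(1, len(cats) + 1) for c in combinations(cats, r))
    return [s for s in subsets if all(requires[c] in s for c in s if c in requires)]


def best_label_set(score: Callable[[FrozenSet[str]], float], categories: Iterable[str],
                   requires: Optional[Dict[str, str]] = None) -> FrozenSet[str]:
    """Single-label decision: the highest-scoring label set."""
    return max(label_sets(categories, requires), key=score)
```

With K categories there are up to 2^K - 1 label sets, so explicit enumeration like this only suits small category sets.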
Apparatus and methods for aligning words in bilingual sentences

Publication number: 20060190241

Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.

Type: Application

Filed: May 26, 2005

Publication date: August 24, 2006

Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
Method and apparatus for explaining categorization decisions

Publication number: 20060136410

Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.

Type: Application

Filed: December 17, 2004

Publication date: June 22, 2006

Inventors: Eric Gaussier, Cyril Goutte
Adaptive spam message detector

Publication number: 20060123083

Abstract: Electronic content is filtered to identify spam using image and linguistic processing. A plurality of information type gatherers assimilate and output different message attributes relating to message content associated with an information type. A categorizer may have a plurality of decision makers for providing as output a message class for classifying the message data. A history processor records the message attributes and the class decision as part of the prior history information and/or modifies the prior history information to reflect changes to fixed data and/or probability data. A categorizer coalescer assesses the message class output by the set of decision makers together with optional user input for producing a class decision identifying whether the message data is spam.

Type: Application

Filed: December 3, 2004

Publication date: June 8, 2006

Inventors: Cyril Goutte, Pierre Isabelle, Eric Gaussier, Stephen Kruger
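
The abstract leaves open how the categorizer coalescer combines the outputs of the decision makers. One simple possibility, shown here purely as an assumed example, is a weighted average of per-decision-maker spam probabilities compared with a threshold, with optional user input overriding the automatic decision. The weights, threshold, and function name are hypothetical.

```python
from typing import Dict, Optional


def coalesce(decisions: Dict[str, float], weights: Dict[str, float],
             threshold: float = 0.5, user_label: Optional[str] = None) -> str:
    """Combine per-decision-maker spam probabilities into one class decision.

    A weighted average is compared with `threshold`; an explicit user label, when given,
    overrides the automatic decision. Weights and threshold are hypothetical settings.
    """
    if user_label is not None:
        return user_label
    total = sum(weights[name] for name in decisions)
    score = sum(weights[name] * p for name, p in decisions.items()) / total
    return "spam" if score >= threshold else "not spam"


print(coalesce({"image": 0.9, "text": 0.2}, {"image": 1.0, "text": 1.0}))  # spam
```
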
