Patents by Inventor Eric Gaussier
Eric Gaussier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9020804
Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
Type: Grant
Filed: June 1, 2007
Date of Patent: April 28, 2015
Assignee: Xerox Corporation
Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilárd Zsolt Fazekas, Tamás Gaál, Eric Gaussier
-
Patent number: 7849087
Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 7, 2010
Assignee: Xerox Corporation
Inventors: Cyril Goutte, Eric Gaussier
-
Patent number: 7720848
Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
Type: Grant
Filed: March 29, 2006
Date of Patent: May 18, 2010
Assignee: Xerox Corporation
Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
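Illustrative sketch (not taken from the patent): the local-update idea in this abstract can be pictured with per-class word counts, where moving a document changes only the source and destination classes. The function name move_document and the dictionary-of-Counter layout are assumptions made for this example; the patented system may use ratios or frequencies and a different update rule.

```python
from collections import Counter


def move_document(class_word_counts, doc_words, source, destination):
    """Move one document between classes, updating only the two affected classes.

    class_word_counts is a hypothetical structure: class name -> Counter of words.
    """
    doc_counts = Counter(doc_words)
    # Remove the document's words from the source class.
    class_word_counts[source].subtract(doc_counts)
    # Unary plus drops entries whose count fell to zero or below.
    class_word_counts[source] = +class_word_counts[source]
    # Add the document's words to the destination class.
    class_word_counts[destination].update(doc_counts)
    return class_word_counts


counts = {
    "sports": Counter({"goal": 2, "team": 1}),
    "finance": Counter({"stock": 3}),
    "other": Counter({"misc": 1}),
}
move_document(counts, ["goal", "team"], "sports", "finance")
```

Classes other than the source and destination (here "other") are never read or written, which is the point of a local update.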
-
Patent number: 7672830
Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
Type: Grant
Filed: May 26, 2005
Date of Patent: March 2, 2010
Assignee: Xerox Corporation
Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
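Illustrative sketch (not taken from the patent): the second method in this abstract, thresholding followed by closure by transitivity, can be approximated by treating above-threshold associations as links in a bipartite graph and aligning every source word with every target word in the same connected component. The function name closed_alignment, the union-find implementation, and the example values are assumptions; the sketch does not enforce the coverage constraint, so words with no above-threshold link stay unaligned.

```python
def closed_alignment(assoc, threshold):
    """Threshold a source-by-target association matrix, then close it by transitivity.

    assoc[i][j] is the association between source word i and target word j.
    Returns a 0/1 matrix in which linked words form fully connected groups.
    """
    n_src, n_tgt = len(assoc), len(assoc[0])
    # Union-find over all words: source i is node i, target j is node n_src + j.
    parent = list(range(n_src + n_tgt))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Thresholding: keep only sufficiently strong associations as links.
    for i in range(n_src):
        for j in range(n_tgt):
            if assoc[i][j] >= threshold:
                parent[find(i)] = find(n_src + j)

    # Transitive closure: align every pair that ends up in the same group.
    return [
        [1 if find(i) == find(n_src + j) else 0 for j in range(n_tgt)]
        for i in range(n_src)
    ]


example = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.0],
    [0.0, 0.0, 0.2],
]
alignment = closed_alignment(example, threshold=0.5)
```

In this example the first source word is linked to the second target word only through the closure step, since their direct association (0.1) is below the threshold.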
-
Patent number: 7644102
Abstract: Methods, systems, and articles of manufacture consistent with certain principles related to the present invention enable a computing system to perform hierarchical topical clustering of text data based on statistical modeling of co-occurrences of (document, word) pairs. The computing system may be configured to receive a collection of documents, each document including a plurality of words, and perform a modified deterministic annealing Expectation-Maximization (EM) process on the collection to produce a softly assigned hierarchy of nodes. The process may involve assigning documents and document fragments to multiple nodes in the hierarchy based on words included in the documents, such that a document may be assigned to any ancestor node included in the hierarchy, thus eliminating the hard assignment of documents in the hierarchy.
Type: Grant
Filed: October 19, 2001
Date of Patent: January 5, 2010
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Francine Chen, Ashok Chhabedia Popat
-
Patent number: 7630977
Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
Type: Grant
Filed: June 29, 2005
Date of Patent: December 8, 2009
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
-
Patent number: 7620539
Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
Type: Grant
Filed: November 1, 2004
Date of Patent: November 17, 2009
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
-
Patent number: 7542893
Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Type: Grant
Filed: May 10, 2006
Date of Patent: June 2, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
-
Patent number: 7536295
Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Type: Grant
Filed: December 22, 2005
Date of Patent: May 19, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
-
Publication number: 20080300857
Abstract: An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence.
Type: Application
Filed: June 1, 2007
Publication date: December 4, 2008
Inventors: Madalina Barbaiani, Nicola Cancedda, Christopher R. Dance, Szilard Zsolt Fazekas, Tamas Gaal, Eric Gaussier
-
Patent number: 7457808
Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
Type: Grant
Filed: December 17, 2004
Date of Patent: November 25, 2008
Assignee: Xerox Corporation
Inventors: Eric Gaussier, Cyril Goutte
-
Publication number: 20070265825
Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Type: Application
Filed: May 10, 2006
Publication date: November 15, 2007
Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
-
Publication number: 20070239745
Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
Type: Application
Filed: March 29, 2006
Publication date: October 11, 2007
Inventors: Agnes Guerraz, Caroline Privault, Cyril Goutte, Eric Gaussier, Francois Pacull, Jean-Michel Renders
-
Publication number: 20070150257
Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Type: Application
Filed: December 22, 2005
Publication date: June 28, 2007
Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
-
Publication number: 20070005340
Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
Type: Application
Filed: June 29, 2005
Publication date: January 4, 2007
Inventors: Cyril Goutte, Eric Gaussier
-
Publication number: 20070005639
Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
Type: Application
Filed: June 29, 2005
Publication date: January 4, 2007
Inventors: Eric Gaussier, Jean-Michel Renders, Cyril Goutte, Caroline Privault
-
Patent number: 7139754
Abstract: A method of categorizing objects in which there can be multiple categories of objects and each object can belong to more than one category is described. The method defines a set of categories in which at least one category is dependent on another category and then organizes the categories in a hierarchy that embodies any dependencies among them. Each object is assigned to one or more categories in the set. A set of labels corresponding to all combinations of any number of the categories is defined, wherein if an object is relevant to several categories, the object must be assigned the label corresponding to the subset of all relevant categories. Once the new labels are defined, the multi-category, multi-label problem has been reduced to a multi-category, single-label problem, and the categorization task is reduced down to choosing the single best label set for an object.
Type: Grant
Filed: February 9, 2004
Date of Patent: November 21, 2006
Assignee: Xerox Corporation
Inventors: Cyril Goutte, Eric Gaussier
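Illustrative sketch (not taken from the patent): the reduction from a multi-label problem to a single-label problem described in this abstract can be pictured by mapping each distinct subset of categories to one new label. The function name to_single_labels is an assumption, and as a simplification the sketch only creates labels for subsets that actually occur in the data and ignores the category hierarchy, whereas the abstract defines labels for all combinations.

```python
def to_single_labels(category_sets):
    """Map each object's set of categories to a single integer label.

    Returns the label per object and the subset -> label mapping.
    """
    label_ids = {}
    labels = []
    for categories in category_sets:
        key = frozenset(categories)  # order of categories does not matter
        if key not in label_ids:
            label_ids[key] = len(label_ids)  # next unused label id
        labels.append(label_ids[key])
    return labels, label_ids


labels, label_ids = to_single_labels([{"a"}, {"a", "b"}, {"b", "a"}, {"c"}])
```

Any ordinary single-label classifier could then be trained on the resulting labels, with a predicted label mapped back to its category subset through label_ids.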
-
Publication number: 20060190241
Abstract: Methods are disclosed for performing proper word alignment that satisfy constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
Type: Application
Filed: May 26, 2005
Publication date: August 24, 2006
Inventors: Cyril Goutte, Michel Simard, Kenji Yamada, Eric Gaussier, Arne Mauser
-
Publication number: 20060136410
Abstract: Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
Type: Application
Filed: December 17, 2004
Publication date: June 22, 2006
Inventors: Eric Gaussier, Cyril Goutte
-
Publication number: 20060123083
Abstract: Electronic content is filtered to identify spam using image and linguistic processing. A plurality of information type gatherers assimilate and output different message attributes relating to message content associated with an information type. A categorizer may have a plurality of decision makers for providing as output a message class for classifying the message data. A history processor records the message attributes and the class decision as part of the prior history information and/or modifies the prior history information to reflect changes to fixed data and/or probability data. A categorizer coalescer assesses the message class output by the set of decision makers together with optional user input for producing a class decision identifying whether the message data is spam.
Type: Application
Filed: December 3, 2004
Publication date: June 8, 2006
Inventors: Cyril Goutte, Pierre Isabelle, Eric Gaussier, Stephen Kruger