Patents by Inventor Matthias Galle
Matthias Galle has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12361214
Abstract: There is disclosed a computer-implemented method for detecting machine-generated documents in a collection of documents including machine-generated and human-authored documents. The computer-implemented method includes computing a set of long-repeated substrings (such as super-maximal repeats) with respect to the collection of documents and using a subset of the long-repeated substrings to designate documents containing the subset of the repeated substrings as machine-generated. The documents designated as machine-generated serve as positive examples of machine-generated documents and a set of documents including at least one human-authored document serves as negative examples of machine-generated documents. A plurality of classifiers are trained with a dataset including both the positive and negative examples of machine-generated documents. Classified output of the classifiers is then used to detect an extent to which a given document of the dataset is machine-generated.
Type: Grant
Filed: August 5, 2022
Date of Patent: July 15, 2025
Assignee: Naver Corporation
Inventors: Matthias Galle, Hady Elsahar, Joseph Rozen, German Kruszewski
-
Patent number: 11907663
Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
Type: Grant
Filed: April 26, 2021
Date of Patent: February 20, 2024
Assignee: NAVER FRANCE
Inventors: Matthias Galle, Hady Elsahar
-
Patent number: 11797591
Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
Type: Grant
Filed: March 5, 2021
Date of Patent: October 24, 2023
Assignee: NAVER CORPORATION
Inventors: Matthias Galle, Maximin Coavoux, Hady Elsahar
-
Patent number: 11494564
Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
Type: Grant
Filed: March 27, 2020
Date of Patent: November 8, 2022
Assignee: NAVER CORPORATION
Inventors: Hady Elsahar, Maximin Coavoux, Matthias Galle
-
Patent number: 9760546
Abstract: A system and method of identifying, in an input sequence, repeat subsequences that have at least a threshold number x of different left contexts and at least a threshold number y of different right contexts are disclosed. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order, comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix are representative of occurrences of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat.
Type: Grant
Filed: May 24, 2013
Date of Patent: September 12, 2017
Assignee: XEROX CORPORATION
Inventor: Matthias Galle
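To make the suffix array and longest common prefix (LCP) array in this abstract concrete, here is a minimal Python sketch. It is not the patented traversal: it builds both arrays naively and then filters repeats by brute force. The function names, the use of `None` as a boundary context at the ends of the sequence, and the reading of x and y as minimum counts of distinct neighbouring symbols are all assumptions made for illustration.

```python
from collections import defaultdict


def suffix_array(s):
    # Naive construction: sort suffix start positions lexicographically.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # lcp[k] is the length of the longest common prefix of the suffixes
    # starting at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    def common(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n

    return [0] + [common(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def repeats_with_contexts(s, x, y):
    # Hypothetical helper: return {repeat: sorted start positions} for every
    # repeat with at least x distinct left and y distinct right contexts.
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    occurrences = defaultdict(set)
    for k in range(1, len(sa)):
        # Every prefix shared by two adjacent suffixes occurs at both positions.
        for length in range(1, lcp[k] + 1):
            word = s[sa[k]:sa[k] + length]
            occurrences[word].update((sa[k - 1], sa[k]))
    result = {}
    for word, starts in occurrences.items():
        # None stands in for "no symbol" at either end of the sequence.
        left = {s[i - 1] if i > 0 else None for i in starts}
        right = {s[i + len(word)] if i + len(word) < len(s) else None
                 for i in starts}
        if len(left) >= x and len(right) >= y:
            result[word] = sorted(starts)
    return result
```

For the input `"xabyabz"` with x = y = 2, only `"ab"` qualifies: `"a"` is always followed by `b`, and `"b"` is always preceded by `a`.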
-
Patent number: 9483463
Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. The occurrences of the first text block are detected to identify second text blocks in the vicinity of the occurrences of the first text block on the basis of pre-defined parameters. The text motifs are determined by combining the first text block and the second text block. Finally, the text motifs are extracted from the electronic documents.
Type: Grant
Filed: September 10, 2012
Date of Patent: November 1, 2016
Assignee: Xerox Corporation
Inventors: Matthias Galle, Jean-Michel Renders
-
Patent number: 9268749
Abstract: A method of updating a suffix tree includes providing an initial suffix tree based on a first sequence of symbols drawn from an alphabet. The suffix tree includes existing nodes representing respective subsequences occurring in the first sequence of symbols. The existing nodes are associated with information relating to membership of the subsequences in at least one class of repeat subsequences. A second sequence of symbols is received and the initial suffix tree is updated to form an updated suffix tree by adding new nodes representing subsequences occurring in the second sequence of symbols that are not represented by the existing nodes. The subsequences represented by the new nodes are ordered in a new node data structure, which is processed to update the information relating to the at least one class of repeat subsequences associated with at least some of the nodes in the updated suffix tree.
Type: Grant
Filed: October 7, 2013
Date of Patent: February 23, 2016
Assignee: XEROX CORPORATION
Inventors: Matias D. Tealdi, Matthias Galle
-
Patent number: 9201980
Abstract: A method for reconstruction includes providing a directed input graph generated from a set of n-grams and statistics for the n-grams, edges of the graph being joined through nodes of the graph. Each edge has an associated label and a multiplicity of at least one. Each of the n-grams in the set is represented by a respective one of the labels, whereby an Eulerian cycle through the graph traverses each edge the respective multiplicity of times. Reduction rules are applied iteratively to generate a refined graph which is both irreducible and equivalent to the input graph. Information is output based on the labels of the refined graph.
Type: Grant
Filed: November 19, 2013
Date of Patent: December 1, 2015
Assignee: XEROX Corporation
Inventors: Matias D. Tealdi, Matthias Galle
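The input graph described in this abstract can be illustrated with a short Python sketch. It covers only the graph construction, not the reduction rules. Several choices are assumptions made for illustration and do not come from the patent: nodes are taken to be the (n-1)-gram prefix and suffix of each n-gram, the token sequence is treated as circular so that a cycle can exist, and the names `ngram_graph` and `is_balanced` are hypothetical.

```python
from collections import Counter


def ngram_graph(tokens, n):
    # Count n-grams over the sequence read circularly (assumption).
    m = len(tokens)
    counts = Counter(
        tuple(tokens[(i + j) % m] for j in range(n)) for i in range(m)
    )
    # Each n-gram labels one edge from its (n-1)-gram prefix to its
    # (n-1)-gram suffix; the edge multiplicity is the n-gram count.
    return {gram: (gram[:-1], gram[1:], mult) for gram, mult in counts.items()}


def is_balanced(edges):
    # Necessary condition for an Eulerian cycle: at every node, the summed
    # multiplicity of incoming edges equals that of outgoing edges.
    out_degree, in_degree = Counter(), Counter()
    for source, target, mult in edges.values():
        out_degree[source] += mult
        in_degree[target] += mult
    return out_degree == in_degree
```

For example, `ngram_graph("a b a b c".split(), 2)` yields an edge labelled `("a", "b")` with multiplicity 2, and `is_balanced` returns `True` for that graph.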
-
Patent number: 9183193
Abstract: A system and method for representing a textual document based on the occurrence of repeats are disclosed. The system includes a sequence generator which defines a sequence representing words forming a collection of documents. A repeat calculator identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once. A representation generator generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats.
Type: Grant
Filed: February 12, 2013
Date of Patent: November 10, 2015
Assignee: XEROX CORPORATION
Inventor: Matthias Galle
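The three components named in this abstract (sequence generator, repeat calculator, representation generator) can be sketched in a few lines of Python. This is a brute-force illustration, not the patented system: the whitespace tokenisation, the cap `max_len` on repeat length, the restriction of repeats to within-document word n-grams, and the function names are assumptions.

```python
from collections import Counter


def word_ngrams(words, max_len):
    # Yield every contiguous word subsequence of length 1..max_len.
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def repeat_representation(docs, max_len=3):
    # Sequence generation: lower-case each document and split it into words.
    tokenized = [doc.lower().split() for doc in docs]
    # Repeat calculation: keep subsequences occurring more than once
    # across the whole collection.
    totals = Counter(g for words in tokenized for g in word_ngrams(words, max_len))
    repeats = sorted(g for g, count in totals.items() if count > 1)
    # Representation: one count vector per document, indexed by repeat.
    vectors = []
    for words in tokenized:
        counts = Counter(word_ngrams(words, max_len))
        vectors.append([counts[g] for g in repeats])
    return repeats, vectors
```

Calling `repeat_representation(["a b a b", "c"], max_len=2)` gives the repeats `("a",)`, `("a", "b")` and `("b",)`, with the first document represented as `[2, 2, 2]` and the second as `[0, 0, 0]`.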
-
Publication number: 20150142853
Abstract: A method for reconstruction includes providing a directed input graph generated from a set of n-grams and statistics for the n-grams, edges of the graph being joined through nodes of the graph. Each edge has an associated label and a multiplicity of at least one. Each of the n-grams in the set is represented by a respective one of the labels, whereby an Eulerian cycle through the graph traverses each edge the respective multiplicity of times. Reduction rules are applied iteratively to generate a refined graph which is both irreducible and equivalent to the input graph. Information is output based on the labels of the refined graph.
Type: Application
Filed: November 19, 2013
Publication date: May 21, 2015
Applicant: Xerox Corporation
Inventors: Matias D. Tealdi, Matthias Galle
-
Publication number: 20150100304
Abstract: A method of updating a suffix tree includes providing an initial suffix tree based on a first sequence of symbols drawn from an alphabet. The suffix tree includes existing nodes representing respective subsequences occurring in the first sequence of symbols. The existing nodes are associated with information relating to membership of the subsequences in at least one class of repeat subsequences. A second sequence of symbols is received and the initial suffix tree is updated to form an updated suffix tree by adding new nodes representing subsequences occurring in the second sequence of symbols that are not represented by the existing nodes. The subsequences represented by the new nodes are ordered in a new node data structure, which is processed to update the information relating to the at least one class of repeat subsequences associated with at least some of the nodes in the updated suffix tree.
Type: Application
Filed: October 7, 2013
Publication date: April 9, 2015
Applicant: Xerox Corporation
Inventors: Matias D. Tealdi, Matthias Galle
-
Publication number: 20140350917
Abstract: A system and method of identifying, in an input sequence, repeat subsequences that have at least a threshold number x of different left contexts and at least a threshold number y of different right contexts are disclosed. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order, comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix are representative of occurrences of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat.
Type: Application
Filed: May 24, 2013
Publication date: November 27, 2014
Applicant: Xerox Corporation
Inventor: Matthias Galle
-
Patent number: 8880525
Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. Each data point is initially assigned to a respective cluster and serves as that cluster's initial representative point. Thereafter, in an iterative process, the data points are clustered among the clusters, by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
Type: Grant
Filed: April 2, 2012
Date of Patent: November 4, 2014
Assignee: Xerox Corporation
Inventors: Matthias Galle, Jean-Michel Renders
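The iterative loop in this abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented method: the comparison measure is taken to be Euclidean distance, the new representative point is the mean of a cluster's members, "merging overlapping clusters" is reduced to collapsing clusters with identical membership, batch processing is omitted, and the name `threshold_cluster` is hypothetical.

```python
import math


def threshold_cluster(points, threshold, iterations=10):
    # Every data point starts as the representative of its own cluster.
    reps = [list(p) for p in points]
    clusters = [frozenset([i]) for i in range(len(points))]
    for _ in range(iterations):
        # Assign each point to every cluster whose representative lies
        # within the distance threshold (assumed comparison measure).
        members = [
            frozenset(i for i, p in enumerate(points)
                      if math.dist(p, rep) <= threshold)
            for rep in reps
        ]
        # Simplified merge step: collapse clusters with identical members
        # and drop empty ones, preserving first-seen order.
        clusters = list(dict.fromkeys(m for m in members if m))
        # New representative: coordinate-wise mean of the members.
        new_reps = [
            [sum(points[i][d] for i in sorted(m)) / len(m)
             for d in range(len(points[0]))]
            for m in clusters
        ]
        if new_reps == reps:
            break  # representatives stopped moving
        reps = new_reps
    return [sorted(m) for m in clusters]
```

With the points `(0, 0)`, `(0, 1)`, `(10, 10)` and `(10, 11)` and a threshold of 2, the sketch settles on two clusters, `[0, 1]` and `[2, 3]`, after the second iteration.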
-
Publication number: 20140229160
Abstract: A system and method for representing a textual document based on the occurrence of repeats are disclosed. The system includes a sequence generator which defines a sequence representing words forming a collection of documents. A repeat calculator identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once. A representation generator generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats.
Type: Application
Filed: February 12, 2013
Publication date: August 14, 2014
Applicant: Xerox Corporation
Inventor: Matthias Galle
-
Publication number: 20140074455
Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. The occurrences of the first text block are detected to identify second text blocks in the vicinity of the occurrences of the first text block on the basis of pre-defined parameters. The text motifs are determined by combining the first text block and the second text block. Finally, the text motifs are extracted from the electronic documents.
Type: Application
Filed: September 10, 2012
Publication date: March 13, 2014
Applicant: Xerox Corporation
Inventors: Matthias Galle, Jean-Michel Renders
-
Publication number: 20130262465
Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. Each data point is initially assigned to a respective cluster and serves as that cluster's initial representative point. Thereafter, in an iterative process, the data points are clustered among the clusters, by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
Type: Application
Filed: April 2, 2012
Publication date: October 3, 2013
Applicant: Xerox Corporation
Inventors: Matthias Galle, Jean-Michel Renders