Patents by Inventor Matthias Galle

Matthias Galle has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12361214
    Abstract: There is disclosed a computer-implemented method for detecting machine-generated documents in a collection of documents including machine-generated and human-authored documents. The computer-implemented method includes computing a set of long-repeated substrings (such as super-maximal repeats) with respect to the collection of documents and using a subset of the long-repeated substrings to designate documents containing the subset of the repeated substrings as machine-generated. The documents designated as machine-generated serve as positive examples of machine-generated documents and a set of documents including at least one human-authored document serves as negative examples of machine-generated documents. A plurality of classifiers are trained with a dataset including both the positive and negative examples of machine-generated documents. Classified output of the classifiers is then used to detect an extent to which a given document of the dataset is machine-generated.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: July 15, 2025
    Assignee: Naver Corporation
    Inventors: Matthias Galle, Hady Elsahar, Joseph Rozen, German Kruszewski
  • Patent number: 11907663
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 20, 2024
    Assignee: Naver France
    Inventors: Matthias Galle, Hady Elsahar
  • Patent number: 11797591
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: October 24, 2023
    Assignee: Naver Corporation
    Inventors: Matthias Galle, Maximin Coavoux, Hady Elsahar
  • Patent number: 11494564
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: November 8, 2022
    Assignee: Naver Corporation
    Inventors: Hady Elsahar, Maximin Coavoux, Matthias Galle
  • Patent number: 9760546
    Abstract: A system and method of identifying repeat subsequences of an input sequence that have at least a threshold number x of different left contexts and at least a threshold number y of different right contexts are disclosed. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order, comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix represent occurrences of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat.
    Type: Grant
    Filed: May 24, 2013
    Date of Patent: September 12, 2017
    Assignee: Xerox Corporation
    Inventor: Matthias Galle
  • Patent number: 9483463
    Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. The occurrences of the first text block are detected to identify second text blocks in the vicinity of those occurrences on the basis of pre-defined parameters. The text motifs are determined by combining the first text block and the second text block. Finally, the text motifs are extracted from the electronic documents.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: November 1, 2016
    Assignee: Xerox Corporation
    Inventors: Matthias Galle, Jean-Michel Renders
  • Patent number: 9268749
    Abstract: A method of updating a suffix tree includes providing an initial suffix tree based on a first sequence of symbols drawn from an alphabet. The suffix tree includes existing nodes representing respective subsequences occurring in the first sequence of symbols. The existing nodes are associated with information relating to membership of the subsequences in at least one class of repeat subsequences. A second sequence of symbols is received and the initial suffix tree is updated to form an updated suffix tree by adding new nodes representing subsequences occurring in the second sequence of symbols that are not represented by the existing nodes. The subsequences represented by the new nodes are ordered in a new node data structure, which is processed to update the information relating to the at least one class of repeat subsequences associated with at least some of the nodes in the updated suffix tree.
    Type: Grant
    Filed: October 7, 2013
    Date of Patent: February 23, 2016
    Assignee: Xerox Corporation
    Inventors: Matias D. Tealdi, Matthias Galle
  • Patent number: 9201980
    Abstract: A method for reconstruction includes providing a directed input graph generated from a set of n-grams and statistics for the n-grams, edges of the graph being joined through nodes of the graph. Each edge has an associated label and a multiplicity of at least one. Each of the n-grams in the set is represented by a respective one of the labels, whereby an Eulerian cycle through the graph traverses each edge the respective multiplicity of times. Reduction rules are applied iteratively to generate a refined graph which is both irreducible and equivalent to the input graph. Information is output based on the labels of the refined graph.
    Type: Grant
    Filed: November 19, 2013
    Date of Patent: December 1, 2015
    Assignee: Xerox Corporation
    Inventors: Matias D. Tealdi, Matthias Galle
  • Patent number: 9183193
    Abstract: A system and method for representing a textual document based on the occurrence of repeats are disclosed. The system includes a sequence generator which defines a sequence representing words forming a collection of documents. A repeat calculator identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once. A representation generator generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats.
    Type: Grant
    Filed: February 12, 2013
    Date of Patent: November 10, 2015
    Assignee: Xerox Corporation
    Inventor: Matthias Galle
  • Publication number: 20150142853
    Abstract: A method for reconstruction includes providing a directed input graph generated from a set of n-grams and statistics for the n-grams, edges of the graph being joined through nodes of the graph. Each edge has an associated label and a multiplicity of at least one. Each of the n-grams in the set is represented by a respective one of the labels, whereby an Eulerian cycle through the graph traverses each edge the respective multiplicity of times. Reduction rules are applied iteratively to generate a refined graph which is both irreducible and equivalent to the input graph. Information is output based on the labels of the refined graph.
    Type: Application
    Filed: November 19, 2013
    Publication date: May 21, 2015
    Applicant: Xerox Corporation
    Inventors: Matias D. Tealdi, Matthias Galle
  • Publication number: 20150100304
    Abstract: A method of updating a suffix tree includes providing an initial suffix tree based on a first sequence of symbols drawn from an alphabet. The suffix tree includes existing nodes representing respective subsequences occurring in the first sequence of symbols. The existing nodes are associated with information relating to membership of the subsequences in at least one class of repeat subsequences. A second sequence of symbols is received and the initial suffix tree is updated to form an updated suffix tree by adding new nodes representing subsequences occurring in the second sequence of symbols that are not represented by the existing nodes. The subsequences represented by the new nodes are ordered in a new node data structure, which is processed to update the information relating to the at least one class of repeat subsequences associated with at least some of the nodes in the updated suffix tree.
    Type: Application
    Filed: October 7, 2013
    Publication date: April 9, 2015
    Applicant: Xerox Corporation
    Inventors: Matias D. Tealdi, Matthias Galle
  • Publication number: 20140350917
    Abstract: A system and method of identifying repeat subsequences of an input sequence that have at least a threshold number x of different left contexts and at least a threshold number y of different right contexts are disclosed. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order, comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix represent occurrences of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat.
    Type: Application
    Filed: May 24, 2013
    Publication date: November 27, 2014
    Applicant: Xerox Corporation
    Inventor: Matthias Galle
  • Patent number: 8880525
    Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. The data points are initially assigned to a respective cluster and serve as their initial representative points. Thereafter, in an iterative process, the data points are clustered among the clusters, by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
    Type: Grant
    Filed: April 2, 2012
    Date of Patent: November 4, 2014
    Assignee: Xerox Corporation
    Inventors: Matthias Galle, Jean-Michel Renders
  • Publication number: 20140229160
    Abstract: A system and method for representing a textual document based on the occurrence of repeats are disclosed. The system includes a sequence generator which defines a sequence representing words forming a collection of documents. A repeat calculator identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once. A representation generator generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats.
    Type: Application
    Filed: February 12, 2013
    Publication date: August 14, 2014
    Applicant: Xerox Corporation
    Inventor: Matthias Galle
  • Publication number: 20140074455
    Abstract: A method, system, and computer program product for extracting text motifs from electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. The occurrences of the first text block are detected to identify second text blocks in the vicinity of those occurrences on the basis of pre-defined parameters. The text motifs are determined by combining the first text block and the second text block. Finally, the text motifs are extracted from the electronic documents.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 13, 2014
    Applicant: Xerox Corporation
    Inventors: Matthias Galle, Jean-Michel Renders
  • Publication number: 20130262465
    Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. The data points are initially assigned to a respective cluster and serve as their initial representative points. Thereafter, in an iterative process, the data points are clustered among the clusters, by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
    Type: Application
    Filed: April 2, 2012
    Publication date: October 3, 2013
    Applicant: Xerox Corporation
    Inventors: Matthias Galle, Jean-Michel Renders
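
Several of the abstracts above (patent 9760546 and publication 20140350917 in particular) concern repeats filtered by how many different left and right contexts they occur in. As a rough illustration of that idea only, the following Python sketch enumerates such repeats by brute force. It does not use the suffix array and longest common prefix traversal described in the abstract, and the function name context_repeats and its parameters x, y, and min_len are hypothetical, chosen here for illustration.

```python
from collections import defaultdict


def context_repeats(seq, x=2, y=2, min_len=1):
    """Brute-force illustration (not the patented suffix-array method).

    Returns a dict mapping each substring of `seq` that occurs at least
    twice, with at least `x` distinct left contexts and at least `y`
    distinct right contexts, to the sorted list of its start positions.
    The sequence boundaries count as a context of their own (None).
    """
    n = len(seq)

    # Collect the start positions of every substring: O(n^2) substrings,
    # so this is only suitable for short inputs.
    occurrences = defaultdict(list)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            occurrences[seq[i:j]].append(i)

    result = {}
    for sub, starts in occurrences.items():
        if len(starts) < 2:
            continue  # not a repeat
        # Symbol immediately before / after each occurrence.
        left = {seq[s - 1] if s > 0 else None for s in starts}
        right = {
            seq[s + len(sub)] if s + len(sub) < n else None for s in starts
        }
        if len(left) >= x and len(right) >= y:
            result[sub] = sorted(starts)
    return result


if __name__ == "__main__":
    # "ab" occurs twice, preceded by (start, "c") and followed by ("c", "d").
    print(context_repeats("abcabd", x=2, y=2))
```

With x = 2 and y = 2 this filter keeps only repeats that cannot be extended to the left or right without losing an occurrence; setting x = y = 1 returns every repeated substring. A suffix array with a longest common prefix array, as the abstract describes, would be the choice for inputs of realistic size.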