Patents by Inventor Matthias Gallé

Matthias Gallé has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907663
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 20, 2024
    Assignee: NAVER FRANCE
    Inventors: Matthias Galle, Hady Elsahar
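As an illustration of the retraining trigger described in the abstract above, the following sketch uses the Jensen–Shannon divergence between unigram distributions as the domain shift metric and a linear mapping from shift to predicted accuracy drop. Both choices, and the `slope` and `max_drop` constants, are assumptions for illustration; the abstract does not fix them.

```python
import math
from collections import Counter

def unigram_dist(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence, base 2, so the value lies in [0, 1]."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * p.get(w, 0.0) + 0.5 * q.get(w, 0.0) for w in vocab}
    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def should_retrain(train_tokens, input_tokens, slope=0.3, max_drop=0.05):
    """Predict the accuracy decrease from the shift metric and decide
    whether to trigger retraining. `slope` (drop per unit of divergence)
    and `max_drop` are illustrative calibration constants."""
    shift = js_divergence(unigram_dist(train_tokens), unigram_dist(input_tokens))
    predicted_drop = slope * shift
    return predicted_drop > max_drop, predicted_drop
```

Identical distributions yield zero shift and no retraining; fully disjoint vocabularies yield the maximum divergence of 1 and trigger it.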
  • Patent number: 11797591
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: October 24, 2023
    Assignee: NAVER CORPORATION
    Inventors: Matthias Galle, Maximin Coavoux, Hady Elsahar
  • Publication number: 20230214605
    Abstract: Methods and systems for unsupervised training of a neural multilingual sequence-to-sequence (seq2seq) model. Denoising adapters for each of one or more languages are inserted into an encoder and/or a decoder of the seq2seq model. Parameters of the one or more denoising adapters are trained on a language-specific denoising task using monolingual text for each of the one or more languages. Cross-attention weights of the seq2seq model with the trained denoising adapter layers are fine-tuned on a translation task in at least one of the one or more languages with parallel data.
    Type: Application
    Filed: September 9, 2022
    Publication date: July 6, 2023
    Inventors: Alexandre BÉRARD, Laurent BESACIER, Matthias GALLÉ, Ahmet ÜSTÜN
  • Publication number: 20230109734
    Abstract: There is disclosed a computer-implemented method for detecting machine-generated documents in a collection of documents including machine-generated and human-authored documents. The computer-implemented method includes computing a set of long-repeated substrings (such as super-maximal repeats) with respect to the collection of documents and using a subset of the long-repeated substrings to designate documents containing the subset of the repeated substrings as machine-generated. The documents designated as machine-generated serve as positive examples of machine-generated documents and a set of documents including at least one human-authored document serves as negative examples of machine-generated documents. A plurality of classifiers are trained with a dataset including both the positive and negative examples of machine-generated documents. Classified output of the classifiers is then used to detect an extent to which a given document of the dataset is machine-generated.
    Type: Application
    Filed: August 5, 2022
    Publication date: April 13, 2023
    Applicant: Naver Corporation
    Inventors: Matthias GALLE, Hady ELSAHAR, Joseph ROZEN, German KRUSZEWSKI
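A minimal sketch of the labeling step in the abstract above, using character n-grams shared across documents as a stand-in for the super-maximal repeats the method computes (which require suffix-array machinery). The `n`, `min_docs`, and `min_hits` parameters are illustrative.

```python
from collections import Counter

def repeated_ngrams(docs, n=4, min_docs=2):
    """Character n-grams occurring in at least `min_docs` documents,
    a simple proxy for long repeats shared across the collection."""
    seen = Counter()
    for doc in docs:
        seen.update({doc[i:i + n] for i in range(len(doc) - n + 1)})
    return {g for g, c in seen.items() if c >= min_docs}

def label_machine_generated(docs, repeats, min_hits=1):
    """Mark documents containing repeated substrings as positive
    (machine-generated) training examples."""
    return [sum(g in doc for g in repeats) >= min_hits for doc in docs]
```

The resulting boolean labels would serve as the positive examples for the downstream classifier ensemble.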
  • Patent number: 11494564
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: November 8, 2022
    Assignee: NAVER CORPORATION
    Inventors: Hady Elsahar, Maximin Coavoux, Matthias Galle
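The grouping and group-representation modules in the abstract above can be sketched with plain cosine similarity: each sentence vector is assigned to its nearest aspect vector, and each group is averaged into a single representation. The nearest-aspect assignment rule and mean pooling are assumptions; the abstract does not specify them.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_by_aspect(sentence_vecs, aspect_vecs):
    """Assign each sentence vector to the nearest aspect vector, then
    average each non-empty group into one group representation."""
    groups = {i: [] for i in range(len(aspect_vecs))}
    for vec in sentence_vecs:
        best = max(range(len(aspect_vecs)), key=lambda i: cosine(vec, aspect_vecs[i]))
        groups[best].append(vec)
    reps = {i: [sum(col) / len(vecs) for col in zip(*vecs)]
            for i, vecs in groups.items() if vecs}
    return groups, reps
```

Each group representation would then condition the second model's generation of one summary sentence per aspect.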
  • Publication number: 20220147721
    Abstract: Multilingual neural machine translation systems having monolingual adapter layers and bilingual adapter layers for zero-shot translation include an encoder configured for encoding an input sentence in a source language into an encoder representation and a decoder configured for processing output of the encoder adapter layer to generate a decoder representation. The encoder includes an encoder adapter selector for selecting, from a plurality of encoder adapter layers, an encoder adapter layer for the source language to process the encoder representation. The decoder includes a decoder adapter selector for selecting, from a plurality of decoder adapter layers, a decoder adapter layer for a target language for generating a translated sentence of the input sentence in the target language from the decoder representation.
    Type: Application
    Filed: November 8, 2021
    Publication date: May 12, 2022
    Applicant: Naver Corporation
    Inventors: Matthias GALLE, Alexandre BERARD, Laurent BESACIER, Jerin PHILIP
  • Publication number: 20210342377
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Application
    Filed: March 5, 2021
    Publication date: November 4, 2021
    Inventors: Matthias GALLE, Maximin COAVOUX, Hady ELSAHAR
  • Publication number: 20210342544
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Application
    Filed: April 26, 2021
    Publication date: November 4, 2021
    Applicant: NAVER FRANCE
    Inventors: Matthias GALLE, Hady ELSAHAR
  • Publication number: 20210303796
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 30, 2021
    Applicant: NAVER CORPORATION
    Inventors: Hady ELSAHAR, Maximin COAVOUX, Matthias GALLE
  • Patent number: 10546009
    Abstract: A computer-implemented system and method provide for mapping a set of strings onto an ontology which may be represented as a graph. The method includes receiving a set of strings, each string denoting a respective object. For each of the strings, a pairwise similarity is computed between the string and each of a set of objects in the ontology. For each of a set of candidate subsets (subgraphs) of the set of objects, a global score is computed, which is a function of the pairwise similarities between the strings and the objects in the subset and a tightness score. The tightness score is computed on the objects in the subset with a submodular function. An optimal subset is identified from the set of candidate subsets based on the global scores. Strings in the set of strings are mapped to the objects in the optimal subset, based on the pairwise similarities.
    Type: Grant
    Filed: October 22, 2014
    Date of Patent: January 28, 2020
    Assignee: CONDUENT BUSINESS SERVICES, LLC
    Inventors: Matthias Gallé, Nikolaos Lagos
  • Patent number: 10339122
    Abstract: A computer-implemented linking system and method provide for linking actionable phrases in a first document to other documents in a document corpus. The method includes identifying at least one actionable phrase in a first document. The actionable phrase may include an action, its direct object, and any modifier of the direct object. For each identified actionable phrase, the document corpus is searched to identify other documents, which are scored using a scoring function that takes into account occurrences of words of the actionable phrase in each identified document. The actionable phrase is linked to at least a part of one of the most highly ranked documents in the set of documents.
    Type: Grant
    Filed: September 10, 2015
    Date of Patent: July 2, 2019
    Assignee: Conduent Business Services, LLC
    Inventors: Nikolaos Lagos, Matthias Gallé, Alexandr Chernov
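The scoring step above can be illustrated with a bag-of-words count: each corpus document is scored by how many of its tokens occur in the actionable phrase, and matching documents are returned by decreasing score. This is a deliberately simple stand-in for the patent's scoring function.

```python
def rank_documents(phrase, corpus):
    """Rank documents by the number of tokens they share with the
    actionable phrase; only documents with a positive score are kept."""
    words = set(phrase.lower().split())
    scored = sorted(
        ((sum(tok in words for tok in text.lower().split()), doc_id)
         for doc_id, text in corpus.items()),
        reverse=True)
    return [doc_id for score, doc_id in scored if score > 0]
```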
  • Patent number: 10311046
    Abstract: A pruning method includes representing a set of sequences in a data structure. Each sequence s includes a first symbol w and a context c of at least one symbol. Some of the sequences are associated with a conditional probability p(w|c), based on observations of cw in training data. For others, p(w|c) is computed as a function of the probability p(w|c′) of the respective symbol w in a back-off context c′, p(w|c′) being based on observations of sequence c′w in the training data. A scoring function ƒ(cw) value is computed for each sequence in the set, based on p(w|c) for the sequence and a probability distribution p(s) of each symbol in the sequence if it is removed from the set of sequences. Iteratively, one of the represented sequences is selected to be removed, based on the computed scoring function values, and the scoring function values of remaining sequences are updated.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: June 4, 2019
    Assignee: Conduent Business Services, LLC
    Inventors: Matias Hunicken, Matthias Gallé
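A toy version of the pruning loop above: explicit probabilities back off to shorter contexts once pruned, and the n-gram whose removal costs least is removed iteratively. The score used here (observation count times the log-ratio of the explicit probability to its back-off estimate) is a simplified stand-in for the patent's ƒ(cw).

```python
import math

def backoff_prob(table, context, word, uniform=1e-3):
    """p(word | context), backing off to successively shorter contexts
    when the full sequence is absent from the table."""
    while context:
        if (context, word) in table:
            return table[(context, word)]
        context = context[1:]          # drop the oldest symbol
    return table.get(((), word), uniform)

def prune(table, counts, keep):
    """Iteratively remove the n-gram whose removal costs least until
    only `keep` entries with a non-empty context remain; remaining
    scores are recomputed on each iteration."""
    table = dict(table)
    while sum(1 for c, _ in table if c) > keep:
        def score(entry):
            context, word = entry
            without = dict(table)
            del without[entry]
            return counts[entry] * math.log(table[entry] / backoff_prob(without, context, word))
        victim = min((e for e in table if e[0]), key=score)
        del table[victim]
    return table
```

After pruning, queries for removed n-grams fall through to the back-off estimate, here the unigram probability.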
  • Publication number: 20180204126
    Abstract: A system and method guide the modification of an input feature vector to an automatic classifier model to cause the classifier to give a desired class without modifying the classifier. A user defines costs for independently modifying feature values for at least some of the features in an initial feature vector that the classifier model has given an undesired class. Subspaces are identified in a feature space in which the classifier model classifies feature vectors in the desired class. With a cost function which takes into account the user-defined costs, a modified feature vector is identified in one of the identified subspaces which optimizes the cost function. The modified feature vector or information based thereon is output.
    Type: Application
    Filed: January 17, 2017
    Publication date: July 19, 2018
    Applicant: Xerox Corporation
    Inventor: Matthias Gallé
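A brute-force sketch of the idea above: among candidate feature vectors that the fixed classifier accepts, return the one with the smallest total user-defined modification cost. The exhaustive candidate list replaces the patent's subspace identification and is purely illustrative.

```python
def cheapest_fix(x, costs, classify, candidates):
    """Return the accepted candidate vector whose feature changes
    relative to `x` have the lowest total user-defined cost."""
    def total_cost(y):
        return sum(c for xi, yi, c in zip(x, y, costs) if xi != yi)
    accepted = [y for y in candidates if classify(y)]
    return min(accepted, key=total_cost) if accepted else None
```

Note that the classifier itself is never modified; only the input vector is.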
  • Patent number: 9977817
    Abstract: A system and method provide for identifying coreference from serialized data coming from different services. The method includes generating a tree structure from serialized data. The serialized data includes responses to queries from the different services. The responses each identify a hierarchical relationship between a respective set of objects. Nodes of the tree structure each have a name corresponding to a respective one of the objects. The tree structure is traversed in a breadth first manner and, for each node in the tree structure, a respective pairwise similarity is computed with each of the other nodes of the tree structure. The computed pairwise similarity is compared with a threshold to identify co-referring nodes that refer to a same entity. The threshold is a function of a depth of the node in the tree structure.
    Type: Grant
    Filed: October 20, 2014
    Date of Patent: May 22, 2018
    Assignee: Conduent Business Services, LLC
    Inventors: Matthias Gallé, Nikolaos Lagos
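The traversal and comparison above can be sketched as follows, with `difflib`'s ratio as the pairwise similarity and a threshold that loosens linearly with depth. The particular similarity measure and threshold schedule are assumptions; the abstract only requires the threshold to be a function of node depth.

```python
from collections import deque
from difflib import SequenceMatcher

def bfs_nodes(tree, depth=0):
    """Flatten a nested-dict tree (e.g. parsed service responses) into
    (name, depth) pairs in breadth-first order."""
    queue = deque((name, child, depth) for name, child in tree.items())
    nodes = []
    while queue:
        name, child, d = queue.popleft()
        nodes.append((name, d))
        if isinstance(child, dict):
            queue.extend((n, c, d + 1) for n, c in child.items())
    return nodes

def coreferring_pairs(tree, base=0.9, relax=0.05):
    """Pairs of node names judged to refer to the same entity, using a
    depth-dependent threshold (looser for deeper nodes)."""
    nodes = bfs_nodes(tree)
    pairs = []
    for i, (a, da) in enumerate(nodes):
        for b, db in nodes[i + 1:]:
            threshold = max(0.5, base - relax * min(da, db))
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                pairs.append((a, b))
    return pairs
```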
  • Publication number: 20180011839
    Abstract: A symbol prediction method includes storing a statistic for each of a set of symbols w in at least one context, each context including a string of k preceding symbols and a string of l subsequent symbols, the statistic being based on observations of a string kwl in training data. For an input sequence of symbols, a prediction is computed for at least one symbol in the input sequence, based on the stored statistics. The computing includes, where the symbol is in a context in the sequence not having a stored statistic, computing the prediction for the symbol in that context based on a stored statistic for the symbol in a more general context.
    Type: Application
    Filed: July 7, 2016
    Publication date: January 11, 2018
    Applicant: Xerox Corporation
    Inventors: Matthias Gallé, Matías Hunicken
  • Publication number: 20170351786
    Abstract: A method for modeling a sparse function over sequences is described. The method includes inputting a set of sequences that support a function. A set of prefixes and a set of suffixes for the set of sequences are identified. A sub-block of a full matrix is identified which has the same structural rank as the full matrix. The full matrix includes an entry for each pair of a prefix and a suffix from the sets of prefixes and suffixes. A matrix for the sub-block is computed. A minimal non-deterministic weighted automaton which models the function is computed, based on the sub-block matrix. Information based on the identified minimal non-deterministic weighted automaton is output.
    Type: Application
    Filed: June 2, 2016
    Publication date: December 7, 2017
    Applicant: Xerox Corporation
    Inventors: Ariadna Julieta Quattoni, Xavier Carreras, Matthias Gallé
  • Patent number: 9760546
    Abstract: A system and method are disclosed for identifying, in an input sequence, repeated subsequences that occur with at least x different left contexts and at least y different right contexts. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order, comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix represent occurrences of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat.
    Type: Grant
    Filed: May 24, 2013
    Date of Patent: September 12, 2017
    Assignee: XEROX CORPORATION
    Inventor: Matthias Galle
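A compact sketch of the repeat criterion above: candidate repeats are read off the suffix and LCP arrays, and each is kept only if it occurs with at least x distinct left contexts and y distinct right contexts. The naive quadratic construction and occurrence scan replace the patent's single lexicographic traversal.

```python
def suffix_array(s):
    """Lexicographically sorted suffix start positions (naive build)."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """lcp[i] = length of the longest common prefix of the suffixes
    starting at sa[i - 1] and sa[i]."""
    def lcp(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n
    return [0] + [lcp(sa[i - 1], sa[i]) for i in range(1, len(sa))]

def context_diverse_repeats(s, x=2, y=2):
    """Repeats with >= x distinct left contexts and >= y distinct right
    contexts ('^' and '$' mark the sequence boundaries)."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    candidates = {s[sa[i]:sa[i] + lcp[i]] for i in range(1, len(sa)) if lcp[i] > 0}
    result = set()
    for r in candidates:
        occs = [i for i in range(len(s) - len(r) + 1) if s.startswith(r, i)]
        lefts = {s[i - 1] if i > 0 else "^" for i in occs}
        rights = {s[i + len(r)] if i + len(r) < len(s) else "$" for i in occs}
        if len(lefts) >= x and len(rights) >= y:
            result.add(r)
    return result
```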
  • Patent number: 9740679
    Abstract: A method for generating an output sequence includes receiving an input sequence of symbols. An output sequence is generated from a reduced directed graph derived from n-gram statistics for a corpus sequence of symbols. The graph includes nodes connected by edges that are labeled with a sequence of symbols and associated with a multiplicity representing a number of occurrences of the sequence of symbols in the corpus sequence. Each path through the graph where each edge is traversed its multiplicity of times reconstructs the corpus sequence. The sequences of symbols in the reduced graph vary in number of symbols. The output sequence from the first iteration, and optionally also output sequences from one or more subsequent iterations, are output. The output sequence may be proposed to an author to assist in generating a document.
    Type: Grant
    Filed: December 8, 2015
    Date of Patent: August 22, 2017
    Assignee: XEROX CORPORATION
    Inventor: Matthias Gallé
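The property in the abstract above, that every path traversing each edge its multiplicity of times reconstructs the corpus, is the defining property of an Eulerian path, so the generation step can be sketched on a character-bigram de Bruijn graph with Hierholzer's algorithm. Fixed-length bigrams are a simplification: the patent's reduced graph carries variable-length sequences.

```python
from collections import Counter, defaultdict

def debruijn_edges(corpus, n=2):
    """One edge per n-gram occurrence, from its (n-1)-gram prefix to
    its (n-1)-gram suffix, with multiplicities."""
    edges = Counter()
    for i in range(len(corpus) - n + 1):
        gram = corpus[i:i + n]
        edges[(gram[:-1], gram[1:])] += 1
    return edges

def eulerian_path(edges, start):
    """Hierholzer's algorithm: traverse every edge exactly its
    multiplicity of times and spell out the reconstructed sequence."""
    out = defaultdict(list)
    for (a, b), m in edges.items():
        out[a].extend([b] * m)
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if out[v]:
            stack.append(out[v].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])
```

With a reduced graph, different edge orderings at branch points would yield different corpus-consistent output sequences to propose to an author.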
  • Publication number: 20170161254
    Abstract: A method for generating an output sequence includes receiving an input sequence of symbols. An output sequence is generated from a reduced directed graph derived from n-gram statistics for a corpus sequence of symbols. The graph includes nodes connected by edges that are labeled with a sequence of symbols and associated with a multiplicity representing a number of occurrences of the sequence of symbols in the corpus sequence. Each path through the graph where each edge is traversed its multiplicity of times reconstructs the corpus sequence. The sequences of symbols in the reduced graph vary in number of symbols. The output sequence from the first iteration, and optionally also output sequences from one or more subsequent iterations, are output. The output sequence may be proposed to an author to assist in generating a document.
    Type: Application
    Filed: December 8, 2015
    Publication date: June 8, 2017
    Applicant: Xerox Corporation
    Inventor: Matthias Gallé
  • Patent number: 9645995
    Abstract: A method for language prediction of a social network post includes generating a social network graph which includes nodes connected by edges. Some of the nodes are user nodes representing users of a social network and some of the nodes are social network post nodes representing social network posts. At least some of the users are authors of social network posts represented by respective social network post nodes. Edges of the graph are associated with respective weights. At least one of the social network post nodes is unlabeled. Language labels are predicted for the at least one unlabeled social network post node which includes propagating language labels through the graph. A language of the social network post is predicted based on the predicted language labels for the social network post node representing that social network post and optionally also based on content-based features.
    Type: Grant
    Filed: March 24, 2015
    Date of Patent: May 9, 2017
    Assignee: CONDUENT BUSINESS SERVICES, LLC
    Inventors: Matthias Gallé, William Radford
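A simplified version of the propagation in the last abstract: users and posts are nodes of a weighted graph, and unlabeled nodes repeatedly adopt the language label with the greatest total edge weight among their labeled neighbours. The fixed round count and hard (non-probabilistic) labels are simplifications of the graph propagation the abstract describes.

```python
from collections import defaultdict

def propagate_labels(edges, seeds, rounds=5):
    """Propagate language labels over a weighted user/post graph.
    `edges` are (node, node, weight) triples; `seeds` maps nodes with
    known labels (e.g. posts of known language) to those labels."""
    neighbours = defaultdict(list)
    for a, b, w in edges:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))
    labels = dict(seeds)
    for _ in range(rounds):
        updates = {}
        for node in neighbours:
            if node in seeds:
                continue  # seed labels stay clamped
            votes = defaultdict(float)
            for nb, w in neighbours[node]:
                if nb in labels:
                    votes[labels[nb]] += w
            if votes:
                updates[node] = max(votes, key=votes.get)
        labels.update(updates)
    return labels
```

In the patented method the propagated labels would be combined with content-based features for the final language prediction.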