Patents by Inventor Hady ELSAHAR

Hady ELSAHAR has filed for patents covering the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907663
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 20, 2024
    Assignee: NAVER FRANCE
    Inventors: Matthias Galle, Hady Elsahar
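The claimed loop — compute a domain-shift metric on the incoming data, map it to a predicted accuracy drop, and retrain only when the drop is too large — can be sketched as follows. The Jensen-Shannon divergence over unigram counts and the linear drop model are illustrative assumptions only; the patent does not prescribe a particular metric or mapping.

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two unigram count
    distributions -- one possible domain-shift metric; the patent does
    not prescribe a particular one."""
    tp, tq = sum(p.values()), sum(q.values())
    def half_kl(a, ta, b, tb):
        s = 0.0
        for w in set(a) | set(b):
            pa = a[w] / ta
            m = 0.5 * (a[w] / ta + b[w] / tb)
            if pa > 0:
                s += pa * math.log2(pa / m)
        return s
    return 0.5 * half_kl(p, tp, q, tq) + 0.5 * half_kl(q, tq, p, tp)

def predicted_accuracy_drop(shift, slope=0.30):
    # Hypothetical linear shift-to-accuracy-loss mapping, standing in for
    # a regressor fitted on held-out (domain pair, accuracy) data.
    return slope * shift

def should_retrain(train_tokens, input_tokens, max_drop=0.05):
    """Selectively trigger retraining when the predicted accuracy
    decrease attributable to domain shift exceeds a tolerance."""
    shift = js_divergence(Counter(train_tokens), Counter(input_tokens))
    return predicted_accuracy_drop(shift) > max_drop
```

Identical token distributions give a shift of 0 (no retraining); fully disjoint vocabularies give the maximum shift of 1 bit and trigger retraining.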
  • Publication number: 20240054338
    Abstract: A processor-implemented method for fine-tuning a pre-trained neural conditional language model to perform a downstream task. A pre-trained conditional language model and at least one target constraint for satisfying a task-related control objective are received. A neural model is trained to approximate a target conditional model that optimally reconciles a distance from the pre-trained conditional language model and the control objective across multiple contexts.
    Type: Application
    Filed: October 3, 2022
    Publication date: February 15, 2024
    Inventors: Tomasz KORBAK, Hady ELSAHAR, German KRUSZEWSKI, Marc DYMETMAN
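On a toy discrete space, the target conditional model this abstract describes — the distribution closest in divergence to the pretrained model a(x|c) that also satisfies the constraint — has a closed form for a pointwise constraint. The contexts, outputs, and constraint below are invented for illustration; the patented method then trains a neural model to approximate this target across many contexts.

```python
# Pretrained conditional model a(x|c) over three outputs per context
# (all probabilities invented for illustration).
pretrained = {
    "c1": {"good": 0.6, "ok": 0.3, "bad": 0.1},
    "c2": {"good": 0.2, "ok": 0.3, "bad": 0.5},
}

def target_conditional(context, constraint):
    """Optimal reconciliation for a pointwise constraint: zero out
    disallowed outputs and renormalize -- the distribution closest in
    KL divergence to a(.|context) whose support satisfies the constraint."""
    a = pretrained[context]
    unnorm = {x: p * constraint(x) for x, p in a.items()}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

no_bad = lambda x: 0.0 if x == "bad" else 1.0  # hypothetical constraint
```

For context "c2", the target puts 0 on "bad" and renormalizes "good"/"ok" to 0.4/0.6; the fine-tuning described in the abstract amounts to training one neural model to approximate such targets across all contexts at once.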
  • Publication number: 20240037184
    Abstract: A sampling system includes: an energy-based model (EBM) configured to generate non-negative scores of an input having discrete classifications, respectively; and a sampling module configured to: generate a sample from a probability distribution of the EBM using a proposal distribution; set a probability of acceptance of the sample based on a minimum of (a) 1 and (b) an acceptance value determined based on the sample, a score of the sample from the EBM, the proposal distribution, and an upper boundary value; determine a distribution value between 0 and 1 using a uniform distribution; and discard the sample when the distribution value is greater than the probability of acceptance of the sample.
    Type: Application
    Filed: July 29, 2022
    Publication date: February 1, 2024
    Applicant: NAVER CORPORATION
    Inventors: Bryan EIKEMA, German KRUSZEWSKI, Hady ELSAHAR, Stéphane CLINCHANT, Marc DYMETMAN
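The accept/reject loop in this abstract is a quasi-rejection sampling scheme and is concrete enough to sketch directly. The toy score table, proposal distribution, and β value below are invented; in the patent's setting the EBM scores text sequences and the proposal is typically an autoregressive language model.

```python
import random

# Toy energy-based model: non-negative scores over a small discrete space
# (invented values; in the patent the EBM scores text sequences).
ebm_score = {"a": 3.0, "b": 1.0, "c": 0.5}

# Proposal distribution q(x) that we can sample from directly.
proposal = {"a": 0.5, "b": 0.3, "c": 0.2}

def sample_proposal(rng):
    r, cum = rng.random(), 0.0
    for x, q in proposal.items():
        cum += q
        if r < cum:
            return x
    return x  # guard against floating-point rounding

def quasi_rejection_sample(beta, n, seed=0):
    """Accept x with probability min(1, P(x) / (beta * q(x))).
    When beta upper-bounds max P(x)/q(x), this is exact rejection
    sampling from the EBM's normalized distribution; a smaller beta
    trades fidelity for a higher acceptance rate."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        x = sample_proposal(rng)
        p_accept = min(1.0, ebm_score[x] / (beta * proposal[x]))
        u = rng.random()      # uniform draw on (0, 1)
        if u <= p_accept:     # discard the sample when u > p_accept
            accepted.append(x)
    return accepted
```

Here max P(x)/q(x) = 3.0/0.5 = 6, so with beta = 6 the accepted samples follow the EBM's normalized distribution: roughly 2/3 "a", 2/9 "b", 1/9 "c".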
  • Patent number: 11797591
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: October 24, 2023
    Assignee: NAVER CORPORATION
    Inventors: Matthias Galle, Maximin Coavoux, Hady Elsahar
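A minimal sketch of the enrichment step, under the assumption that each text in a set carries a categorical label from which the control code is derived. The label names and the `<label>` token format are hypothetical; only the hold-one-out structure and code-prefixing follow the abstract.

```python
import random

def make_enriched_training_data(text_sets, k=2, seed=0):
    """For each input text set, hold one member out as the reference and
    take a k-sized subset of the rest as the model input (k is the
    predetermined subset size); derive a control code from the
    reference's categorical label and prepend it to each input text."""
    rng = random.Random(seed)
    points = []
    for texts in text_sets:              # each item: list of (text, label)
        ref_text, ref_label = ref = rng.choice(texts)
        pool = [t for t in texts if t is not ref]
        subset = rng.sample(pool, k)
        code = f"<{ref_label}>"          # control code from reference feature
        enriched = " ".join(f"{code} {text}" for text, _ in subset)
        points.append({"input": enriched, "reference": ref_text})
    return points
```

Each resulting training point pairs a control-code-enriched multi-source input with the reference text the summarizer should learn to produce.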
  • Publication number: 20230196018
    Abstract: A method for a language model applies in-context learning to detect problematic text and reformulate it by: (a) receiving, in the language model, a user-generated text example; (b) determining whether the text example is problematic text having a determined classification; (c) reformulating the text example if it is problematic text having the determined classification; (d) outputting the text example unchanged if it is determined not to be problematic text having the determined classification; and (e) outputting the reformulated text example if it is determined to be problematic text having the determined classification.
    Type: Application
    Filed: October 21, 2022
    Publication date: June 22, 2023
    Applicant: Naver Corporation
    Inventors: Jos Rozen, Hady Elsahar
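The five steps map onto a small dispatch function. The keyword-based classifier and rewriter below are toy stand-ins for the few-shot in-context prompts the abstract implies; only the control flow mirrors the claim.

```python
def moderate(text, classify, reformulate):
    """Classify the user-generated text; if it falls into the problematic
    class, output a reformulation, otherwise output it unchanged."""
    if classify(text):            # steps (a)-(b): detect problematic text
        return reformulate(text)  # steps (c), (e): rewrite and output
    return text                   # step (d): pass through unchanged

# Stub "model calls" (hypothetical keyword heuristics, for illustration).
is_toxic = lambda t: "stupid" in t.lower()
soften = lambda t: t.lower().replace("stupid", "questionable")
```

In the patented method, both `classify` and `reformulate` would be calls into the same language model, steered by in-context examples of the target classification.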
  • Publication number: 20230109734
    Abstract: There is disclosed a computer-implemented method for detecting machine-generated documents in a collection of documents including machine-generated and human-authored documents. The computer-implemented method includes computing a set of long-repeated substrings (such as super-maximal repeats) with respect to the collection of documents and using a subset of the long-repeated substrings to designate documents containing the subset of the repeated substrings as machine-generated. The documents designated as machine-generated serve as positive examples of machine-generated documents and a set of documents including at least one human-authored document serves as negative examples of machine-generated documents. A plurality of classifiers are trained with a dataset including both the positive and negative examples of machine-generated documents. Classified output of the classifiers is then used to detect an extent to which a given document of the dataset is machine-generated.
    Type: Application
    Filed: August 5, 2022
    Publication date: April 13, 2023
    Applicant: Naver Corporation
    Inventors: Matthias GALLE, Hady ELSAHAR, Joseph ROZEN, German KRUSZEWSKI
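A simplified sketch of the weak-labeling idea, substituting cross-document repeated word n-grams for the super-maximal repeats the method actually computes (true super-maximal repeats require suffix-array machinery). The intuition is the same: machine-generated text tends to reuse long substrings verbatim across documents.

```python
from collections import Counter

def long_repeated_ngrams(docs, n=8, min_docs=2):
    """Find word n-grams that occur in at least min_docs documents --
    a stand-in for the long-repeated substrings of the abstract."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(seen)     # count each n-gram once per document
    return {g for g, c in counts.items() if c >= min_docs}

def weak_labels(docs, repeats, n=8):
    """Designate documents containing any long repeat as positive
    (machine-generated) examples for classifier training."""
    labels = []
    for doc in docs:
        words = doc.split()
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        labels.append(1 if grams & repeats else 0)
    return labels
```

Documents labeled 1 serve as the positive examples described in the abstract; a set known to contain human-authored text supplies the negatives.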
  • Patent number: 11494564
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: November 8, 2022
    Assignee: NAVER CORPORATION
    Inventors: Hady Elsahar, Maximin Coavoux, Matthias Galle
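The encode → group-by-aspect → aggregate → generate pipeline can be sketched with toy components. The hashing encoder, keyword aspect lexicon, and template "decoder" below are placeholders for the patent's trained first and second models.

```python
def encode(sentence, dim=16):
    """Toy sentence encoder: hashed bag-of-words vector (stands in for
    the first model's learned representations)."""
    vec = [0.0] * dim
    for word in sentence.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def mean_vec(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

ASPECTS = ["battery", "screen"]  # hypothetical aspect lexicon

def summarize_by_aspect(sentences):
    # Group sentences by the aspect they mention.
    groups = {}
    for s in sentences:
        for aspect in ASPECTS:
            if aspect in s.lower():
                groups.setdefault(aspect, []).append(s)
    summary = []
    for aspect, members in groups.items():
        rep = mean_vec([encode(s) for s in members])  # group representation
        # The second model would decode `rep` into a sentence; a template
        # stands in for it here.
        summary.append(f"Summary of {len(members)} sentences about {aspect}.")
    return summary
```

The output — one generated sentence per aspect group — is what the system stores as the multi-document summary for the subject.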
  • Publication number: 20220108081
    Abstract: A method for generating a language model for text generation by receiving a pre-trained language model having attributes with existing probability distributions over the pre-trained language model; receiving at least one target constraint; the target constraint specifying an expectation of a target attribute over a language model that approximates the pre-trained language model; computing parameters of an energy based model by applying the target constraint to the pre-trained language model; obtaining samples from a reference policy; updating parameters of a target policy using the obtained samples and the energy based model; updating the reference policy with the target policy if the target policy is superior to the reference policy; and outputting the target policy as a target language model. The target language model is adapted to generate text with the target attribute over a probability distribution that approximates the desired probability distribution specified by the target constraint.
    Type: Application
    Filed: August 2, 2021
    Publication date: April 7, 2022
    Inventors: Marc Dymetman, Hady Elsahar, Muhammad Khalifa
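"Computing parameters of an energy based model by applying the target constraint" corresponds, in a distributional-control formulation, to choosing an exponential-family coefficient λ so that the model's expected feature value matches the constraint: P(x) ∝ a(x)·exp(λ·φ(x)). A toy version with one binary feature and a target moment of 0.5 (all values invented; the subsequent policy-training loop of the abstract is not shown):

```python
import math

# Toy pretrained LM a(x) over four "sentences"; phi(x) = 1 marks the
# target attribute (invented values for illustration).
a = {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1}
phi = {"s1": 0, "s2": 1, "s3": 1, "s4": 0}

def ebm_distribution(lam):
    """Normalized EBM P(x) proportional to a(x) * exp(lam * phi(x))."""
    unnorm = {x: a[x] * math.exp(lam * phi[x]) for x in a}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

def fit_lambda(target=0.5, lo=-20.0, hi=20.0, iters=60):
    """Bisection on E_P[phi], which is monotone increasing in lambda."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        moment = sum(p * phi[x] for x, p in ebm_distribution(mid).items())
        lo, hi = (mid, hi) if moment < target else (lo, mid)
    return (lo + hi) / 2
```

With these numbers the solution is λ = ln 4, lifting the attribute's probability mass from 0.2 under a(x) to the required 0.5; the remainder of the claim then trains and iteratively updates a sampling policy toward this EBM.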
  • Publication number: 20210342377
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Application
    Filed: March 5, 2021
    Publication date: November 4, 2021
    Inventors: Matthias GALLE, Maximin COAVOUX, Hady ELSAHAR
  • Publication number: 20210342544
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Application
    Filed: April 26, 2021
    Publication date: November 4, 2021
    Applicant: NAVER FRANCE
    Inventors: Matthias GALLE, Hady ELSAHAR
  • Publication number: 20210303796
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 30, 2021
    Applicant: NAVER CORPORATION
    Inventors: Hady ELSAHAR, Maximin COAVOUX, Matthias GALLE