Patents by Inventor Hady ELSAHAR

Hady ELSAHAR has filed for patents covering the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907663
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 20, 2024
    Assignee: NAVER FRANCE
    Inventors: Matthias Galle, Hady Elsahar
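The claimed loop — compute a domain-shift metric on the incoming data, map it to a predicted accuracy drop, and retrain only when the drop is too large — can be sketched as follows. The Jensen-Shannon divergence over unigram counts and the linear drop model are illustrative assumptions only; the patent does not prescribe a particular metric or mapping.

```python
import math
from collections import Counter

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two unigram count
    distributions -- one possible domain-shift metric; the patent does
    not prescribe a particular one."""
    tp, tq = sum(p.values()), sum(q.values())
    def half_kl(a, ta, b, tb):
        s = 0.0
        for w in set(a) | set(b):
            pa = a[w] / ta
            m = 0.5 * (a[w] / ta + b[w] / tb)
            if pa > 0:
                s += pa * math.log2(pa / m)
        return s
    return 0.5 * half_kl(p, tp, q, tq) + 0.5 * half_kl(q, tq, p, tp)

def predicted_accuracy_drop(shift, slope=0.30):
    # Hypothetical linear shift-to-accuracy-loss mapping, standing in for
    # a regressor fitted on held-out (domain pair, accuracy) data.
    return slope * shift

def should_retrain(train_tokens, input_tokens, max_drop=0.05):
    """Selectively trigger retraining when the predicted accuracy
    decrease attributable to domain shift exceeds a tolerance."""
    shift = js_divergence(Counter(train_tokens), Counter(input_tokens))
    return predicted_accuracy_drop(shift) > max_drop
```

Identical token distributions give a shift of 0 (no retraining); fully disjoint vocabularies give the maximum shift of 1 bit and trigger retraining.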
  • Publication number: 20240054338
    Abstract: A processor-implemented method for fine-tuning a pre-trained neural conditional language model to perform a downstream task. A pre-trained conditional language model and at least one target constraint for satisfying a task-related control objective are received. A neural model is trained to approximate a target conditional model that optimally reconciles a distance from the pre-trained conditional language model and the control objective across multiple contexts.
    Type: Application
    Filed: October 3, 2022
    Publication date: February 15, 2024
    Inventors: Tomasz KORBAK, Hady ELSAHAR, German KRUSZEWSKI, Marc DYMETMAN
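On a toy discrete space, the target conditional model this abstract describes — the distribution closest in divergence to the pretrained model a(x|c) that also satisfies the constraint — has a closed form for a pointwise constraint. The contexts, outputs, and constraint below are invented for illustration; the patented method then trains a neural model to approximate this target across many contexts.

```python
# Pretrained conditional model a(x|c) over three outputs per context
# (all probabilities invented for illustration).
pretrained = {
    "c1": {"good": 0.6, "ok": 0.3, "bad": 0.1},
    "c2": {"good": 0.2, "ok": 0.3, "bad": 0.5},
}

def target_conditional(context, constraint):
    """Optimal reconciliation for a pointwise constraint: zero out
    disallowed outputs and renormalize -- the distribution closest in
    KL divergence to a(.|context) whose support satisfies the constraint."""
    a = pretrained[context]
    unnorm = {x: p * constraint(x) for x, p in a.items()}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

no_bad = lambda x: 0.0 if x == "bad" else 1.0  # hypothetical constraint
```

For context "c2", the target puts 0 on "bad" and renormalizes "good"/"ok" to 0.4/0.6; the fine-tuning described in the abstract amounts to training one neural model to approximate such targets across all contexts at once.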
  • Publication number: 20240037184
    Abstract: A sampling system includes: an energy-based model (EBM) configured to generate non-negative scores of an input having discrete classifications, respectively; and a sampling module configured to: generate a sample from a probability distribution of the EBM using a proposal distribution; set a probability of acceptance of the sample based on a minimum of (a) 1 and (b) an acceptance value determined based on the sample, a score of the sample from the EBM, the proposal distribution, and an upper boundary value; determine a distribution value between 0 and 1 using a uniform distribution; and discard the sample when the distribution value is greater than the probability of acceptance of the sample.
    Type: Application
    Filed: July 29, 2022
    Publication date: February 1, 2024
    Applicant: NAVER CORPORATION
    Inventors: Bryan EIKEMA, German KRUSZEWSKI, Hady ELSAHAR, Stéphane CLINCHANT, Marc DYMETMAN
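The accept/reject loop in this abstract is a quasi-rejection sampling scheme and is concrete enough to sketch directly. The toy score table, proposal distribution, and β value below are invented; in the patent's setting the EBM scores text sequences and the proposal is typically an autoregressive language model.

```python
import random

# Toy energy-based model: non-negative scores over a small discrete space
# (invented values; in the patent the EBM scores text sequences).
ebm_score = {"a": 3.0, "b": 1.0, "c": 0.5}

# Proposal distribution q(x) that we can sample from directly.
proposal = {"a": 0.5, "b": 0.3, "c": 0.2}

def sample_proposal(rng):
    r, cum = rng.random(), 0.0
    for x, q in proposal.items():
        cum += q
        if r < cum:
            return x
    return x  # guard against floating-point rounding

def quasi_rejection_sample(beta, n, seed=0):
    """Accept x with probability min(1, P(x) / (beta * q(x))).
    When beta upper-bounds max P(x)/q(x), this is exact rejection
    sampling from the EBM's normalized distribution; a smaller beta
    trades fidelity for a higher acceptance rate."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        x = sample_proposal(rng)
        p_accept = min(1.0, ebm_score[x] / (beta * proposal[x]))
        u = rng.random()      # uniform draw on (0, 1)
        if u <= p_accept:     # discard the sample when u > p_accept
            accepted.append(x)
    return accepted
```

Here max P(x)/q(x) = 3.0/0.5 = 6, so with beta = 6 the accepted samples follow the EBM's normalized distribution: roughly 2/3 "a", 2/9 "b", 1/9 "c".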
  • Patent number: 11797591
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: October 24, 2023
    Assignee: NAVER CORPORATION
    Inventors: Matthias Galle, Maximin Coavoux, Hady Elsahar
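A minimal sketch of the enrichment step, under the assumption that each text in a set carries a categorical label from which the control code is derived. The label names and the `<label>` token format are hypothetical; only the hold-one-out structure and code-prefixing follow the abstract.

```python
import random

def make_enriched_training_data(text_sets, k=2, seed=0):
    """For each input text set, hold one member out as the reference and
    take a k-sized subset of the rest as the model input (k is the
    predetermined subset size); derive a control code from the
    reference's categorical label and prepend it to each input text."""
    rng = random.Random(seed)
    points = []
    for texts in text_sets:              # each item: list of (text, label)
        ref_text, ref_label = ref = rng.choice(texts)
        pool = [t for t in texts if t is not ref]
        subset = rng.sample(pool, k)
        code = f"<{ref_label}>"          # control code from reference feature
        enriched = " ".join(f"{code} {text}" for text, _ in subset)
        points.append({"input": enriched, "reference": ref_text})
    return points
```

Each resulting training point pairs a control-code-enriched multi-source input with the reference text the summarizer should learn to produce.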
  • Publication number: 20230196018
    Abstract: A method for a language model applies in-context learning to detect problematic text and reformulate it by: (a) receiving, in the language model, a user-generated text example; (b) determining whether the text example is problematic text having a determined classification; (c) reformulating the text example if it is problematic text having the determined classification; (d) outputting the text example unchanged if it is determined not to be problematic text having the determined classification; and (e) outputting the reformulated text example if it is determined to be problematic text having the determined classification.
    Type: Application
    Filed: October 21, 2022
    Publication date: June 22, 2023
    Applicant: Naver Corporation
    Inventors: Jos Rozen, Hady Elsahar
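The five steps map onto a small dispatch function. The keyword-based classifier and rewriter below are toy stand-ins for the few-shot in-context prompts the abstract implies; only the control flow mirrors the claim.

```python
def moderate(text, classify, reformulate):
    """Classify the user-generated text; if it falls into the problematic
    class, output a reformulation, otherwise output it unchanged."""
    if classify(text):            # steps (a)-(b): detect problematic text
        return reformulate(text)  # steps (c), (e): rewrite and output
    return text                   # step (d): pass through unchanged

# Stub "model calls" (hypothetical keyword heuristics, for illustration).
is_toxic = lambda t: "stupid" in t.lower()
soften = lambda t: t.lower().replace("stupid", "questionable")
```

In the patented method, both `classify` and `reformulate` would be calls into the same language model, steered by in-context examples of the target classification.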
  • Publication number: 20230109734
    Abstract: There is disclosed a computer-implemented method for detecting machine-generated documents in a collection of documents including machine-generated and human-authored documents. The computer-implemented method includes computing a set of long-repeated substrings (such as super-maximal repeats) with respect to the collection of documents and using a subset of the long-repeated substrings to designate documents containing the subset of the repeated substrings as machine-generated. The documents designated as machine-generated serve as positive examples of machine-generated documents and a set of documents including at least one human-authored document serves as negative examples of machine-generated documents. A plurality of classifiers are trained with a dataset including both the positive and negative examples of machine-generated documents. Classified output of the classifiers is then used to detect an extent to which a given document of the dataset is machine-generated.
    Type: Application
    Filed: August 5, 2022
    Publication date: April 13, 2023
    Applicant: Naver Corporation
    Inventors: Matthias GALLE, Hady ELSAHAR, Joseph ROZEN, German KRUSZEWSKI
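A simplified sketch of the weak-labeling idea, substituting cross-document repeated word n-grams for the super-maximal repeats the method actually computes (true super-maximal repeats require suffix-array machinery). The intuition is the same: machine-generated text tends to reuse long substrings verbatim across documents.

```python
from collections import Counter

def long_repeated_ngrams(docs, n=8, min_docs=2):
    """Find word n-grams that occur in at least min_docs documents --
    a stand-in for the long-repeated substrings of the abstract."""
    counts = Counter()
    for doc in docs:
        words = doc.split()
        seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(seen)     # count each n-gram once per document
    return {g for g, c in counts.items() if c >= min_docs}

def weak_labels(docs, repeats, n=8):
    """Designate documents containing any long repeat as positive
    (machine-generated) examples for classifier training."""
    labels = []
    for doc in docs:
        words = doc.split()
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        labels.append(1 if grams & repeats else 0)
    return labels
```

Documents labeled 1 serve as the positive examples described in the abstract; a set known to contain human-authored text supplies the negatives.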
  • Patent number: 11494564
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: November 8, 2022
    Assignee: NAVER CORPORATION
    Inventors: Hady Elsahar, Maximin Coavoux, Matthias Galle
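The encode → group-by-aspect → aggregate → generate pipeline can be sketched with toy components. The hashing encoder, keyword aspect lexicon, and template "decoder" below are placeholders for the patent's trained first and second models.

```python
def encode(sentence, dim=16):
    """Toy sentence encoder: hashed bag-of-words vector (stands in for
    the first model's learned representations)."""
    vec = [0.0] * dim
    for word in sentence.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def mean_vec(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

ASPECTS = ["battery", "screen"]  # hypothetical aspect lexicon

def summarize_by_aspect(sentences):
    # Group sentences by the aspect they mention.
    groups = {}
    for s in sentences:
        for aspect in ASPECTS:
            if aspect in s.lower():
                groups.setdefault(aspect, []).append(s)
    summary = []
    for aspect, members in groups.items():
        rep = mean_vec([encode(s) for s in members])  # group representation
        # The second model would decode `rep` into a sentence; a template
        # stands in for it here.
        summary.append(f"Summary of {len(members)} sentences about {aspect}.")
    return summary
```

The output — one generated sentence per aspect group — is what the system stores as the multi-document summary for the subject.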
  • Publication number: 20220108081
    Abstract: A method for generating a language model for text generation by receiving a pre-trained language model having attributes with existing probability distributions over the pre-trained language model; receiving at least one target constraint; the target constraint specifying an expectation of a target attribute over a language model that approximates the pre-trained language model; computing parameters of an energy based model by applying the target constraint to the pre-trained language model; obtaining samples from a reference policy; updating parameters of a target policy using the obtained samples and the energy based model; updating the reference policy with the target policy if the target policy is superior to the reference policy; and outputting the target policy as a target language model. The target language model is adapted to generate text with the target attribute over a probability distribution that approximates the desired probability distribution specified by the target constraint.
    Type: Application
    Filed: August 2, 2021
    Publication date: April 7, 2022
    Inventors: Marc Dymetman, Hady Elsahar, Muhammad Khalifa
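"Computing parameters of an energy based model by applying the target constraint" corresponds, in a distributional-control formulation, to choosing an exponential-family coefficient λ so that the model's expected feature value matches the constraint: P(x) ∝ a(x)·exp(λ·φ(x)). A toy version with one binary feature and a target moment of 0.5 (all values invented; the subsequent policy-training loop of the abstract is not shown):

```python
import math

# Toy pretrained LM a(x) over four "sentences"; phi(x) = 1 marks the
# target attribute (invented values for illustration).
a = {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1}
phi = {"s1": 0, "s2": 1, "s3": 1, "s4": 0}

def ebm_distribution(lam):
    """Normalized EBM P(x) proportional to a(x) * exp(lam * phi(x))."""
    unnorm = {x: a[x] * math.exp(lam * phi[x]) for x in a}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}

def fit_lambda(target=0.5, lo=-20.0, hi=20.0, iters=60):
    """Bisection on E_P[phi], which is monotone increasing in lambda."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        moment = sum(p * phi[x] for x, p in ebm_distribution(mid).items())
        lo, hi = (mid, hi) if moment < target else (lo, mid)
    return (lo + hi) / 2
```

With these numbers the solution is λ = ln 4, lifting the attribute's probability mass from 0.2 under a(x) to the required 0.5; the remainder of the claim then trains and iteratively updates a sampling policy toward this EBM.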
  • Publication number: 20210342377
    Abstract: A method for generating enriched training data for a multi-source transformer neural network for generation of a summary of one or more passages of input text comprises creating, from a plurality of input text sets, training points each comprising an input text subset of the input text set and a corresponding reference input text from the input text set, wherein the size of the input text subset is a predetermined number. Control codes are selected based on reference features corresponding to categorical labels of reference texts in the created training points. The input text is enriched with the selected control codes to generate enriched training data.
    Type: Application
    Filed: March 5, 2021
    Publication date: November 4, 2021
    Inventors: Matthias GALLE, Maximin COAVOUX, Hady ELSAHAR
  • Publication number: 20210342544
    Abstract: A system includes: a natural language processing (NLP) model trained in a training domain and configured to perform natural language processing on an input dataset; an accuracy module configured to: calculate a domain shift metric based on the input dataset; and calculate a predicted decrease in accuracy of the NLP model attributable to domain shift relative to the training domain based on the domain shift metric; and a retraining module configured to selectively trigger a retraining of the NLP model based on the predicted decrease in accuracy of the NLP model.
    Type: Application
    Filed: April 26, 2021
    Publication date: November 4, 2021
    Applicant: NAVER FRANCE
    Inventors: Matthias GALLE, Hady ELSAHAR
  • Publication number: 20210303796
    Abstract: A multi-document summarization system includes: an encoding module configured to receive multiple documents associated with a subject and to, using a first model, generate vector representations for sentences, respectively, of the documents; a grouping module configured to group first and second ones of the sentences associated with first and second aspects into first and second groups, respectively; a group representation module configured to generate a first vector representation based on the first ones of the sentences and a second vector representation based on the second ones of the sentences; a summary module configured to: using a second model: generate a first sentence regarding the first aspect based on the first vector representation; and generate a second sentence regarding the second aspect based on the second vector representation; and store a summary including the first and second sentences in memory in association with the subject.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 30, 2021
    Applicant: NAVER CORPORATION
    Inventors: Hady ELSAHAR, Maximin COAVOUX, Matthias GALLE