Patents by Inventor Shachar Mirkin

Shachar Mirkin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for predicting an optimal machine translation system for a user based on an updated user profile

Patent number: 10025779

Abstract: A system and method predict an optimal machine translation system for a first of a set of users. The method includes, for each of the users, providing a respective user profile which includes rankings for at least some machine translation systems from a set of machine translation systems. The user profile of the first user is updated, based on the user profiles of at least a subset of the other users. The updating includes generating at least one missing ranking. An optimal translation system for the first user from the set of machine translation systems is predicted, based on the updated user profile computed for the first user.

Type: Grant

Filed: August 13, 2015

Date of Patent: July 17, 2018

Assignee: XEROX Corporation

Inventors: Shachar Mirkin, Jean-Luc Meunier
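Illustration for patent 10025779: the abstract describes filling in a user's missing rankings from the profiles of other users and then predicting the best system from the completed profile. The Python sketch below shows one way this could look, assuming each profile is a dictionary of preference scores (higher is better) and that missing entries are estimated as a similarity-weighted average. The function names, the score representation, and the similarity measure are all assumptions made for illustration and are not taken from the patent.

```python
def similarity(a, b):
    """Hypothetical similarity: 1 / (1 + mean absolute score difference)
    over the systems that both users have scored."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[s] - b[s]) for s in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def complete_profile(target, others, systems):
    """Return a copy of `target` in which missing scores are estimated
    from the other users' profiles (similarity-weighted average)."""
    updated = dict(target)
    for system in systems:
        if system in updated:
            continue  # keep the user's own score untouched
        weighted = [(similarity(target, o), o[system])
                    for o in others if system in o]
        total = sum(w for w, _ in weighted)
        if total > 0:
            updated[system] = sum(w * r for w, r in weighted) / total
    return updated


def predict_best_system(target, others, systems):
    """Pick the highest-scoring system from the completed profile.
    Assumes the completed profile contains at least one score."""
    updated = complete_profile(target, others, systems)
    return max(updated, key=updated.get)


# Toy data: the first user has never scored system "C".
first_user = {"A": 0.2, "B": 0.5}
other_users = [
    {"A": 0.2, "B": 0.5, "C": 0.9},  # identical taste on A and B
    {"A": 0.9, "B": 0.1, "C": 0.1},  # rather different taste
]
best = predict_best_system(first_user, other_users, ["A", "B", "C"])
```

In this toy case the estimate for "C" is dominated by the user whose known scores match the first user's, so "C" becomes the predicted system.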
PREDICTING TRANSLATIONAL PREFERENCES

Publication number: 20170046333

Abstract: A system and method predict an optimal machine translation system for a first of a set of users. The method includes, for each of the users, providing a respective user profile which includes rankings for at least some machine translation systems from a set of machine translation systems. The user profile of the first user is updated, based on the user profiles of at least a subset of the other users. The updating includes generating at least one missing ranking. An optimal translation system for the first user from the set of machine translation systems is predicted, based on the updated user profile computed for the first user.

Type: Application

Filed: August 13, 2015

Publication date: February 16, 2017

Applicant: Xerox Corporation

Inventors: Shachar Mirkin, Jean-Luc Meunier
Learning generation templates from dialog transcripts

Patent number: 9473637

Abstract: Agent utterances are generated for implementing dialog acts recommended by a dialog manager of a call center. To this end, a set of word lattices, each represented as a weighted finite state automaton (WFSA), is constructed from training dialogs between call center agents and second parties (e.g. customers). The word lattices are assigned conditional probabilities over dialog act type. For each dialog act received from the dialog manager, the word lattices are ranked by the conditional probabilities for the dialog act type. At least one word lattice is chosen from the ranking, and is instantiated to generate a recommended agent utterance for implementing the recommended dialog act. The word lattices may be constructed by clustering agent utterances of training dialogs using context features from preceding second party utterances and grammatical dependency link features between words within agent utterances. Path variations of the word lattices may define slots or paraphrases.

Type: Grant

Filed: July 28, 2015

Date of Patent: October 18, 2016

Assignee: XEROX CORPORATION

Inventors: Sriram Venkatapathy, Shachar Mirkin, Marc Dymetman
METHOD AND SYSTEM FOR SUMMARIZING A DOCUMENT

Publication number: 20160299881

Abstract: The disclosed embodiments illustrate methods and systems for summarizing an electronic document. The method includes extracting, by a natural language processor, one or more sentences from said electronic document. The method further includes creating a graph, comprising one or more nodes and one or more edges connecting said one or more nodes, each node being representative of a sentence. An edge is placed between a pair of sentences based on a threshold value and a first score. The first score corresponds to a measure of an entailment between said pair of sentences. Thereafter, the method includes identifying a set of nodes from said one or more nodes by applying a minimum vertex cover algorithm on said graph. The sentences associated with said identified set of nodes are utilizable to create a summary of said electronic document. The method is performed by one or more microprocessors.

Type: Application

Filed: April 7, 2015

Publication date: October 13, 2016

Inventors: Anand Gupta, Manpreet Kaur, Shachar Mirkin
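Illustration for publication 20160299881: the abstract describes a graph whose nodes are sentences, with edges placed according to an entailment score and a threshold, followed by a minimum vertex cover step. The Python sketch below is a rough approximation under stated assumptions: `entailment_score` is a hypothetical word-overlap stand-in for a real entailment measure, an edge is added when the score reaches the threshold (the direction of the comparison is assumed), and the cover is found with a greedy heuristic rather than an exact minimum vertex cover algorithm.

```python
def entailment_score(premise, hypothesis):
    """Hypothetical stand-in: fraction of hypothesis words found in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0


def build_graph(sentences, threshold):
    """Return undirected edges (i, j), i < j, between sentence indices whose
    entailment score in either direction reaches the threshold."""
    edges = set()
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            score = max(entailment_score(sentences[i], sentences[j]),
                        entailment_score(sentences[j], sentences[i]))
            if score >= threshold:
                edges.add((i, j))
    return edges


def greedy_vertex_cover(edges):
    """Greedy heuristic: repeatedly take the node touching the most
    uncovered edges (ties go to the lowest index). Not guaranteed minimal."""
    remaining = set(edges)
    cover = set()
    while remaining:
        degree = {}
        for u, v in remaining:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        best = max(sorted(degree), key=degree.get)
        cover.add(best)
        remaining = {e for e in remaining if best not in e}
    return cover


def summarize(sentences, threshold=0.8):
    """Return the sentences whose nodes form the (approximate) cover."""
    cover = greedy_vertex_cover(build_graph(sentences, threshold))
    return [sentences[i] for i in sorted(cover)]


sentences = ["the cat sat on the mat", "the cat sat", "the mat", "dogs bark"]
summary = summarize(sentences, threshold=0.8)
```

Here the first sentence is linked to the second and third, so it alone covers every edge and is the only sentence selected.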
System and method for incrementally updating a reordering model for a statistical machine translation system

Patent number: 9442922

Abstract: A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, generating of the new reordering file and the updating the reordering model is reiterated, based on the new training data received at the second time.

Type: Grant

Filed: November 18, 2014

Date of Patent: September 13, 2016

Assignee: XEROX CORPORATION

Inventor: Shachar Mirkin
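Illustration for patent 9442922: the abstract describes repeatedly folding a newly generated reordering file into an existing reordering table as new training data arrives. The Python sketch below shows the general idea under assumptions that the abstract does not state: the table is taken to hold orientation counts (monotone, swap, discontinuous) per phrase pair, the "reordering file" is modeled as a list of already extracted observations, and phrase-pair extraction from sentence pairs (which needs word alignment) is left out. All names are hypothetical.

```python
from collections import Counter

ORIENTATIONS = ("monotone", "swap", "discontinuous")


def extract_reordering_counts(observations):
    """Build a 'new reordering file' as counts per phrase pair.
    `observations` is an iterable of (source_phrase, target_phrase, orientation)."""
    counts = {}
    for src, tgt, orientation in observations:
        counts.setdefault((src, tgt), Counter())[orientation] += 1
    return counts


def update_reordering_table(table, new_counts):
    """Merge the new counts into the existing table in place,
    without reprocessing the older training data."""
    for pair, counts in new_counts.items():
        table.setdefault(pair, Counter()).update(counts)
    return table


def orientation_probabilities(table, pair, smoothing=0.5):
    """Smoothed orientation distribution for one phrase pair."""
    counts = table.get(pair, Counter())
    total = sum(counts[o] for o in ORIENTATIONS) + smoothing * len(ORIENTATIONS)
    return {o: (counts[o] + smoothing) / total for o in ORIENTATIONS}


table = {}
# First time: initial batch of new training data.
batch1 = [("la maison", "the house", "monotone"),
          ("la maison", "the house", "swap")]
update_reordering_table(table, extract_reordering_counts(batch1))
# Second time: the same steps are repeated on the later batch.
batch2 = [("la maison", "the house", "monotone"),
          ("le chat", "the cat", "monotone")]
update_reordering_table(table, extract_reordering_counts(batch2))
probs = orientation_probabilities(table, ("la maison", "the house"))
```

Keeping raw counts rather than probabilities is a design choice of this sketch: it lets each later batch be merged by simple addition.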
SYSTEM AND METHOD FOR INCREMENTALLY UPDATING A REORDERING MODEL FOR A STATISTICAL MACHINE TRANSLATION SYSTEM

Publication number: 20160140111

Abstract: A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, generating of the new reordering file and the updating the reordering model is reiterated, based on the new training data received at the second time.

Type: Application

Filed: November 18, 2014

Publication date: May 19, 2016

Inventor: Shachar Mirkin
SEMANTIC REFINING OF CROSS-LINGUAL INFORMATION RETRIEVAL RESULTS

Publication number: 20150199339

Abstract: A method for cross language information retrieval includes receiving an input query which includes at least one word in a source language and translating the input query from the source language to a target language to provide a set of translated queries. A set of documents is retrieved from a document collection based on the translated queries. The retrieved documents are translated back into the source language to generate a set of translated documents. An entailment relationship between each of the translated documents and the input query is assessed. The set of translated documents is refined, based on the assessment of the entailment relationship. A subset (or all) of the refined set of translated documents, and/or the target documents to which the translated documents in the subset correspond, is output.

Type: Application

Filed: May 13, 2014

Publication date: July 16, 2015

Applicant: XEROX CORPORATION

Inventors: Shachar MIRKIN, Nikolaos LAGOS, Ioan CALAPODESCU
Machine translation-driven authoring system and method

Patent number: 9047274

Abstract: An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text and the SMT model can then be used for its translation.

Type: Grant

Filed: January 21, 2013

Date of Patent: June 2, 2015

Assignee: XEROX CORPORATION

Inventors: Sriram Venkatapathy, Shachar Mirkin
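Illustration for patent 9047274: the abstract describes selecting source phrases whose prefix matches tokens already typed and proposing their suffixes, ranked by a translatability score. The Python sketch below is a minimal version of that step. It assumes a hypothetical dictionary `phrase_scores` mapping known source phrases to translatability scores, matches the prefix against the end of the typed text, and ignores the optional fluency and semantic relatedness scores.

```python
def propose_suffixes(source_text, phrase_scores, max_candidates=3):
    """Return up to `max_candidates` suffixes to append to `source_text`.

    A phrase qualifies when some split into prefix + suffix has a prefix
    equal to the last tokens the user typed. Candidates are ranked by
    score (highest first), with ties broken alphabetically.
    """
    tokens = source_text.split()
    candidates = {}
    for phrase, score in phrase_scores.items():
        words = phrase.split()
        for k in range(1, len(words)):  # prefix and suffix are both non-empty
            prefix, suffix = words[:k], words[k:]
            if tokens[-k:] == prefix:
                text = " ".join(suffix)
                candidates[text] = max(score, candidates.get(text, float("-inf")))
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [text for text, _ in ranked[:max_candidates]]


# Hypothetical phrases with made-up translatability scores.
phrase_scores = {
    "the printer is jammed": 0.9,
    "the printer driver": 0.6,
    "printer is offline": 0.7,
    "a scanner": 0.8,
}
suggestions = propose_suffixes("restart the printer", phrase_scores)
```

A user who accepts the first suggestion would end up with "restart the printer is jammed", which shows why the abstract also mentions fluency scoring; this sketch does not attempt it.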
REFINING INFERENCE RULES WITH TEMPORAL EVENT CLUSTERING

Publication number: 20150127323

Abstract: A method for computing similarity between paths includes extracting corpus statistics for triples from a corpus of text documents, each triple comprising a predicate and respective first and second arguments of the predicate. Documents in the corpus are clustered to form a set of clusters based on textual similarity and temporal similarity. An event-based path similarity is computed between first and second paths, the first path comprising a first predicate and first and second argument slots, the second path comprising a second predicate and first and second argument slots, the event-based path similarity being computed as a function of a corpus statistics-based similarity score which is a function of the corpus statistics for the extracted triples which are instances of the first and second paths, and a cluster-based similarity score which is a function of occurrences of the first and second predicates in the clusters.

Type: Application

Filed: November 4, 2013

Publication date: May 7, 2015

Applicant: Xerox Corporation

Inventors: Guillaume Jacquet, Shachar Mirkin
Confidence-driven rewriting of source texts for improved translation

Publication number: 20140358519

Abstract: A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.

Type: Application

Filed: June 3, 2013

Publication date: December 4, 2014

Inventors: Shachar Mirkin, Sriram Venkatapathy, Marc Dymetman
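Illustration for publication 20140358519: the abstract describes translating the source string, scoring the translation confidence, doing the same for automatically rewritten alternatives, and possibly proposing one alternative as a replacement. The Python sketch below captures that control flow only. The `translate`, `confidence`, and `rewrite` callables are hypothetical placeholders supplied by the caller, and the rule of proposing an alternative only when it beats the original's confidence is an assumption.

```python
def propose_rewrite(source, translate, confidence, rewrite):
    """Return (candidate, confidence) for the best alternative, or
    (None, baseline_confidence) when no alternative beats the source.

    translate(text) -> target text
    confidence(source_text, target_text) -> float, higher is better
    rewrite(text) -> iterable of alternative source strings (may be empty)
    """
    baseline = confidence(source, translate(source))
    best_text, best_conf = None, baseline
    for alternative in rewrite(source):
        conf = confidence(alternative, translate(alternative))
        if conf > best_conf:
            best_text, best_conf = alternative, conf
    return best_text, best_conf


# Toy stand-ins: "translation" is upper-casing and shorter output counts
# as more confident. Real systems would plug in an MT engine and a
# confidence estimator here.
toy_translate = lambda text: text.upper()
toy_confidence = lambda src, tgt: 1.0 / len(tgt.split())
toy_rewrite = lambda text: ["short text",
                            "a considerably longer alternative text string"]

candidate, score = propose_rewrite("some rather long text",
                                   toy_translate, toy_confidence, toy_rewrite)
```

Passing the three components in as functions keeps the selection logic independent of any particular translation system.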
MACHINE TRANSLATION-DRIVEN AUTHORING SYSTEM AND METHOD

Publication number: 20140207439

Abstract: An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text and the SMT model can then be used for its translation.

Type: Application

Filed: January 21, 2013

Publication date: July 24, 2014

Applicant: XEROX CORPORATION

Inventors: Sriram Venkatapathy, Shachar Mirkin