Patents by Inventor Marc Dymetman

Marc Dymetman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD AND SYSTEM FOR ASSISTING CONTACT CENTER AGENTS IN COMPOSING ELECTRONIC MAIL REPLIES

Publication number: 20160330144

Abstract: A system and method are disclosed which enable more effective email response authoring by contact center agents, for example by automatically suggesting prototypical (entire) email responses to the human agent and by interactively suggesting next-sentence candidates during the writing process. In one method, a customer inquiry is received and a latent topic prediction is generated, based on a word-based representation of the customer inquiry. A latent topic prediction is generated for the agent's entire reply to the customer inquiry as a function of the latent topic prediction generated for the customer inquiry. A further latent topic prediction is generated for a next sentence of the agent's reply as a function of a topic prediction for the next sentence which is generated with a prediction model that has been trained on annotated sentences of agent replies. Information is output to assist the agent, based on the topic predictions.

Type: Application

Filed: May 4, 2015

Publication date: November 10, 2016

Inventors: Marc Dymetman, Jean-Michel Renders, Sriram Venkatapathy, Spandana Gella

Learning generation templates from dialog transcripts

Patent number: 9473637

Abstract: Agent utterances are generated for implementing dialog acts recommended by a dialog manager of a call center. To this end, a set of word lattices, each represented as a weighted finite state automaton (WFSA), is constructed from training dialogs between call center agents and second parties (e.g. customers). The word lattices are assigned conditional probabilities over dialog act type. For each dialog act received from the dialog manager, the word lattices are ranked by the conditional probabilities for the dialog act type. At least one word lattice is chosen from the ranking, and is instantiated to generate a recommended agent utterance for implementing the recommended dialog act. The word lattices may be constructed by clustering agent utterances of training dialogs using context features from preceding second party utterances and grammatical dependency link features between words within agent utterances. Path variations of the word lattices may define slots or paraphrases.
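
The ranking and instantiation steps can be pictured with a minimal Python sketch. It assumes each lattice has already been reduced to a single best path with named slots (the hypothetical `best_path` field) and carries a table of conditional probabilities over dialog act types (the hypothetical `act_probs` field); a real WFSA would instead be searched for its best path and slot fillers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordLattice:
    # Hypothetical simplification: the lattice's best path, with slots written as {name}.
    best_path: str
    # Hypothetical: conditional probability of each dialog act type for this lattice.
    act_probs: Dict[str, float]


def rank_lattices(lattices: List[WordLattice], act_type: str) -> List[WordLattice]:
    """Sort lattices by their conditional probability for the requested dialog act type."""
    return sorted(lattices, key=lambda lat: lat.act_probs.get(act_type, 0.0), reverse=True)


def recommend_utterance(lattices: List[WordLattice], act_type: str,
                        slots: Dict[str, str]) -> str:
    """Choose the top-ranked lattice and fill its slots to form an agent utterance."""
    best = rank_lattices(lattices, act_type)[0]
    return best.best_path.format(**slots)


lattices = [
    WordLattice("Hello {name}, how can I help you?", {"greeting": 0.9, "closing": 0.1}),
    WordLattice("Thanks for calling, {name}.", {"greeting": 0.2, "closing": 0.8}),
]
print(recommend_utterance(lattices, "closing", {"name": "Ann"}))
```

Sorting the whole list, rather than taking only the maximum, keeps the runner-up lattices available in case the top choice cannot be instantiated.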

Type: Grant

Filed: July 28, 2015

Date of Patent: October 18, 2016

Assignee: XEROX CORPORATION

Inventors: Sriram Venkatapathy, Shachar Mirkin, Marc Dymetman

Procedure for building a max-ARPA table in order to compute optimistic back-offs in a language model

Patent number: 9400783

Abstract: Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w=max_h p(z|hA) which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.
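
To make the max-backoff quantity Az.w = max_h p(z|hA) concrete, here is a brute-force Python sketch that evaluates it directly from the back-off definition. It is not the patented descending-order procedure and does not compute Az.m. It assumes a toy table that stores plain probabilities (a real ARPA file stores log10 values) with made-up numbers, and it relies on the usual ARPA property that every prefix of a listed n-gram is itself listed. At best it could serve as a reference against which an efficient implementation is checked.

```python
from typing import Dict, Tuple

Ngram = Tuple[str, ...]
# Toy ARPA-style table: n-gram -> (probability p, backoff weight b).
# Plain probabilities and invented values, for readability only.
Table = Dict[Ngram, Tuple[float, float]]


def prob(table: Table, context: Ngram, z: str) -> float:
    """Back-off estimate of p(z | context)."""
    if context + (z,) in table:
        return table[context + (z,)][0]
    if not context:
        raise KeyError(f"unknown symbol {z!r}")
    # Back off: multiply by the context's weight (1.0 if the context is unlisted)
    # and drop the context's first symbol.
    weight = table[context][1] if context in table else 1.0
    return weight * prob(table, context[1:], z)


def ends_with(seq: Ngram, suffix: Ngram) -> bool:
    return len(seq) >= len(suffix) and seq[len(seq) - len(suffix):] == suffix


def max_backoff(table: Table, ngram: Ngram) -> float:
    """Brute-force Az.w: the maximum of p(z | hA) over all heads h."""
    A, z = ngram[:-1], ngram[-1]
    # A head h with hA unlisted backs off (weight 1.0) to a shorter head, so it
    # suffices to scan A itself plus every listed n-gram ending in A.
    contexts = {A} | {c for c in table if ends_with(c, A)}
    return max(prob(table, c, z) for c in contexts)


table: Table = {
    ("a",): (0.5, 0.6), ("b",): (0.3, 0.8), ("c",): (0.2, 1.0),
    ("a", "b"): (0.7, 1.0), ("b", "a"): (0.4, 1.0),
}
print(max_backoff(table, ("b",)))  # optimistic value exceeds the unigram p(b) = 0.3
```

The example shows why the value is "optimistic": with an unknown head, the best any history can do for "b" here is the bigram value p(b|a), not the unigram estimate.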

Type: Grant

Filed: November 26, 2013

Date of Patent: July 26, 2016

Assignee: XEROX CORPORATION

Inventor: Marc Dymetman

Terminological adaptation of statistical machine translation system through automatic generation of phrasal contexts for bilingual terms

Patent number: 9367541

Abstract: A method for terminological adaptation includes receiving a vocabulary pair including source and target language terms. Each term is in a class which includes a set of sequences. Contextual phrase pairs are extracted from a bilingual training corpus, each including source and target phrases. The phrases each include a sequence of the same class as the respective source and target terms as well as some associated context. Templates are generated, based on the contextual phrase pairs. In each template the source and target sequences of a contextual phrase pair are replaced with respective placeholders, each denoting the respective class of the sequence. Candidate phrase pairs are generated from these templates. In each candidate phrase pair, the placeholders of one of the templates are replaced with respective terms of a vocabulary pair of the same class. Some candidate phrase pairs are incorporated into a phrase table of a machine translation system.
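
A minimal Python sketch of the template idea: a class member found in a contextual phrase pair is replaced by a placeholder on both sides, and the resulting template is then instantiated with a new vocabulary pair of the same class. The class sets, the `<LABEL>` placeholder syntax, and the restriction to single-token class members are all illustrative assumptions, not details taken from the patent.

```python
from typing import Optional, Set, Tuple

Template = Tuple[str, str]


def _abstract(phrase: str, members: Set[str], placeholder: str) -> Optional[str]:
    """Replace the first token that belongs to the class with the placeholder."""
    tokens = phrase.split()
    for i, tok in enumerate(tokens):
        if tok in members:
            return " ".join(tokens[:i] + [placeholder] + tokens[i + 1:])
    return None


def make_template(src_phrase: str, tgt_phrase: str, src_members: Set[str],
                  tgt_members: Set[str], label: str) -> Optional[Template]:
    """Turn a contextual phrase pair into a template, or None if a side lacks a class member."""
    placeholder = f"<{label}>"
    src = _abstract(src_phrase, src_members, placeholder)
    tgt = _abstract(tgt_phrase, tgt_members, placeholder)
    return (src, tgt) if src is not None and tgt is not None else None


def instantiate(template: Template, term_pair: Tuple[str, str], label: str) -> Tuple[str, str]:
    """Fill both placeholders with the source and target terms of a vocabulary pair."""
    placeholder = f"<{label}>"
    return (template[0].replace(placeholder, term_pair[0]),
            template[1].replace(placeholder, term_pair[1]))


template = make_template("click the Save button", "cliquez sur le bouton Enregistrer",
                         {"Save", "Open"}, {"Enregistrer", "Ouvrir"}, "LABEL")
print(instantiate(template, ("Print", "Imprimer"), "LABEL"))
```

Each instantiated pair is only a candidate; as the abstract notes, only some candidates would be added to the phrase table.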

Type: Grant

Filed: January 20, 2015

Date of Patent: June 14, 2016

Assignee: XEROX CORPORATION

Inventors: Christophe Servan, Marc Dymetman

Methods and systems for predicting learning curve for statistical machine translation system

Patent number: 9164961

Abstract: The disclosed embodiments relate to a system and method for predicting the learning curve of an SMT system. A set of anchor points is selected, each anchor point corresponding to a corpus size. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the set of anchor points, a confidence score is computed.
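
As a rough illustration of curve fitting from anchor points, the Python sketch below fits BLEU ≈ a + b·log10(corpus size) by ordinary least squares and extrapolates to a larger corpus. The log-linear curve family is an assumption chosen for simplicity (the abstract does not name one), the anchor values are invented, and no confidence score is computed.

```python
import math
from typing import List, Tuple


def fit_log_curve(sizes: List[float], bleus: List[float]) -> Tuple[float, float]:
    """Least-squares fit of bleu = a + b * log10(size) to the anchor points."""
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(bleus) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, bleus))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_bleu(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted curve to a (larger) corpus size."""
    return a + b * math.log10(size)


# Invented anchor points: BLEU measured on 1k, 10k and 100k sentence pairs.
a, b = fit_log_curve([1e3, 1e4, 1e5], [20.0, 25.0, 30.0])
print(round(predict_bleu(a, b, 1e6), 2))
```

A curve with saturation (for example a power law approaching an asymptote) would usually be more realistic than an unbounded logarithm; the linear-in-log form is used here only because it has a closed-form fit.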

Type: Grant

Filed: November 30, 2012

Date of Patent: October 20, 2015

Assignee: Xerox Corporation

Inventors: Prasanth Kolachina, Nicola Cancedda, Marc Dymetman, Sriram Venkatapathy

PROCEDURE FOR BUILDING A MAX-ARPA TABLE IN ORDER TO COMPUTE OPTIMISTIC BACK-OFFS IN A LANGUAGE MODEL

Publication number: 20150149151

Abstract: Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w=max_h p(z|hA) which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.

Type: Application

Filed: November 26, 2013

Publication date: May 28, 2015

Applicant: Xerox Corporation

Inventor: Marc Dymetman

Probabilistic sampling using search trees constrained by heuristic bounds

Patent number: 8983887

Abstract: Markov Chain Monte Carlo (MCMC) sampling of elements of a domain to be sampled is performed to generate a set of samples. The MCMC sampling is performed over a search tree of decision sequences representing the domain to be sampled and having terminal nodes corresponding to elements of the domain. In some embodiments the MCMC sampling is performed by Metropolis-Hastings (MH) sampling. The MCMC sampling is constrained using a bound on nodes of the search tree. The constraint may entail detecting a node whose bound value ensures that an acceptable element cannot be identified by continuing traversal of the tree past that node, and terminating the traversal in response. The constraint may entail selecting a node to serve as a starting node for a sampling attempt in accordance with a statistical promise distribution indicating likelihood that following a decision sequence rooted at the node will identify an acceptable element.
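
The bound-based termination step alone can be illustrated with a small Python sketch. The domain is a hypothetical subset-selection problem: each decision includes or excludes one item, and an element is acceptable when the chosen positive weights sum exactly to a target. This is a plain randomized descent with early termination, not the Metropolis-Hastings sampler or the promise distribution described in the abstract.

```python
import random
from typing import List, Optional, Tuple


def random_descent(weights: List[int], target: int,
                   rng: random.Random) -> Optional[Tuple[int, ...]]:
    """One attempt: walk down the include/exclude tree, stopping early when a bound rules out success."""
    chosen: List[int] = []
    total = 0
    for i, w in enumerate(weights):
        remaining = sum(weights[i:])  # the most the undecided items could still add
        # Bound check at this node (weights assumed positive): if the target is already
        # exceeded, or unreachable even by taking every remaining item, no acceptable
        # element lies below this node, so the traversal is terminated.
        if total > target or total + remaining < target:
            return None
        if rng.random() < 0.5:  # decision i: include item i
            chosen.append(i)
            total += w
    return tuple(chosen) if total == target else None


weights = [3, 5, 2, 7]
rng = random.Random(0)
attempts = [random_descent(weights, 10, rng) for _ in range(500)]
found = [a for a in attempts if a is not None]
print(len(found), "acceptable elements found in 500 attempts")
```

Pruned attempts cost only the work done down to the pruning node, which is the point of checking the bound at every node rather than only at the leaves.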

Type: Grant

Filed: April 11, 2011

Date of Patent: March 17, 2015

Assignee: Xerox Corporation

Inventor: Marc Dymetman

Sampling and optimization in phrase-based machine translation using an enriched language model representation

Patent number: 8972244

Abstract: Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q(n+1) for use in a next iteration of the rejection sampling. The refined WFSA q(n+1) is selected to satisfy the criteria p(x)≤q(n+1)(x)≤q(n)(x) for all x∈X and q(n+1)(x*)<q(n)(x*) where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.

Type: Grant

Filed: January 25, 2013

Date of Patent: March 3, 2015

Assignee: Xerox Corporation

Inventors: Marc Dymetman, Wilker Ferreira Aziz, Sriram Venkatapathy

TERMINOLOGY VERIFICATION SYSTEMS AND METHODS FOR MACHINE TRANSLATION SERVICES FOR DOMAIN-SPECIFIC TEXTS

Publication number: 20150039286

Abstract: A system and method for automated validation of a translation provided by a generic machine translation engine comprises a generic machine translation (GMT) engine, a domain-specific multilingual terminology dictionary (DS) and a translation processor. The translation processor makes a first call to the GMT engine to translate an input text into a first translation text, verifies the presence of a domain-specific terminology item in the first translation text, and assesses the correctness of the first translation text using the DS; upon an assessment of incorrect translation, it replaces the domain-specific terminology item with a domain-specific translation text item from the DS.
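
A minimal Python sketch of the verify-and-replace step, applied to a translation that the GMT engine has already produced. The dictionary layout is a hypothetical assumption: each source term maps to its domain-specific translation plus a list of generic renderings the engine is known to produce. Matching is naive substring search, which a real system would replace with tokenization and morphological matching.

```python
from typing import Dict, List, Tuple

# Hypothetical DS layout: source term -> (domain-specific translation, generic renderings).
TermDict = Dict[str, Tuple[str, List[str]]]


def verify_and_fix(source: str, translation: str, ds: TermDict) -> str:
    """Replace generic renderings of domain terms with the DS translation."""
    fixed = translation
    for src_term, (domain_tgt, generic_variants) in ds.items():
        if src_term not in source:
            continue  # the term does not occur in the input: nothing to verify
        if domain_tgt in fixed:
            continue  # the domain-specific translation is present: assessed as correct
        for variant in generic_variants:
            if variant in fixed:  # incorrect (generic) rendering found: replace it
                fixed = fixed.replace(variant, domain_tgt)
                break
    return fixed


ds: TermDict = {"driver": ("pilote", ["conducteur"])}  # IT sense of "driver"
print(verify_and_fix("install the driver", "installez le conducteur", ds))
```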

Type: Application

Filed: July 31, 2013

Publication date: February 5, 2015

Applicant: Xerox Corporation

Inventors: Vassilina Nikoulina, Marc Dymetman

Confidence-driven rewriting of source texts for improved translation

Publication number: 20140358519

Abstract: A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.
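
The selection logic can be sketched in Python with the translation system and the confidence estimator passed in as functions. Both toy stand-ins below are hypothetical: a "translator" that upper-cases its input and a confidence function that simply prefers shorter sentences. Real components would be an MT system and a trained confidence-estimation model.

```python
from typing import Callable, List, Tuple


def choose_rewrite(source: str, rewrites: List[str],
                   translate: Callable[[str], str],
                   confidence: Callable[[str, str], float]) -> Tuple[str, float]:
    """Return the source-side string (original or rewrite) whose translation scores highest."""
    best_src = source
    best_conf = confidence(source, translate(source))
    for alt in rewrites:
        conf = confidence(alt, translate(alt))
        if conf > best_conf:  # strictly better only: ties keep the user's original text
            best_src, best_conf = alt, conf
    return best_src, best_conf


# Toy stand-ins (hypothetical): an "MT system" that upper-cases its input and a
# confidence estimator that simply prefers shorter source strings.
toy_translate = str.upper


def toy_confidence(src: str, tgt: str) -> float:
    return 1.0 / len(src.split())


print(choose_rewrite("we would like to kindly ask you", ["please", "we ask you"],
                     toy_translate, toy_confidence))
```

Returning the original when no rewrite scores strictly higher matches the abstract's "may be selected": the candidate is only proposed to the user, never forced.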

Type: Application

Filed: June 3, 2013

Publication date: December 4, 2014

Inventors: Shachar Mirkin, Sriram Venkatapathy, Marc Dymetman

Rejection sampling of a complex distribution including bound and proposal distribution refinement

Patent number: 8838415

Abstract: Iterative rejection sampling is performed on a domain in accordance with a target distribution. The domain is partitioned to define a partition comprising partition elements, and each iteration of the rejection sampling includes selecting a partition element from the partition in accordance with partition element selection probabilities. A sample of the domain is acquired in the selected partition element according to a normalized proposal distribution that is associated with and normalized over the selected partition element. The acquired sample is accepted or rejected based on the target distribution and a bound associated with the selected partition element. During the iterative rejection sampling, the partition is adapted by replacing a partition element of the partition with two or more split partition elements, associating bounds with the split partition elements, and computing partition element selection probabilities for the split partition elements.
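
The loop described above can be sketched in Python on a finite integer domain. The assumptions here are that partition elements are half-open intervals, the normalized proposal within an element is uniform, and the caller supplies a function returning a valid upper bound of the target on any interval. The toy target is a decreasing function, so its value at the left endpoint bounds each interval.

```python
import random
from typing import Callable, List, Tuple

Part = Tuple[int, int, float]  # half-open interval [lo, hi) and its upper bound


def adaptive_rejection_sample(p: Callable[[int], float],
                              bound: Callable[[int, int], float],
                              lo: int, hi: int, n: int,
                              rng: random.Random) -> Tuple[List[int], List[Part], int]:
    parts: List[Part] = [(lo, hi, bound(lo, hi))]
    samples: List[int] = []
    rejections = 0
    while len(samples) < n:
        # Select a partition element with probability proportional to bound * size,
        # i.e. to its mass under the piecewise-constant envelope.
        weights = [b * (h - l) for l, h, b in parts]
        k = rng.choices(range(len(parts)), weights=weights)[0]
        l, h, b = parts[k]
        x = rng.randrange(l, h)  # normalized (uniform) proposal over the element
        if rng.random() < p(x) / b:
            samples.append(x)
        else:
            rejections += 1
            if h - l > 1:  # adapt: split the element and bound each half separately
                mid = (l + h) // 2
                parts[k:k + 1] = [(l, mid, bound(l, mid)), (mid, h, bound(mid, h))]
    return samples, parts, rejections


def target(x: int) -> float:
    """Toy unnormalized target on {0, ..., 63}: decreasing in x."""
    return 1.0 / (1 + x)


def interval_bound(l: int, h: int) -> float:
    return target(l)  # valid because target is decreasing


samples, parts, rejections = adaptive_rejection_sample(
    target, interval_bound, 0, 64, 200, random.Random(1))
print(len(parts), "partition elements after", rejections, "rejections")
```

Splitting only on rejection concentrates refinement where the envelope is loosest, so the acceptance rate tends to improve as sampling proceeds while every accepted sample remains exact.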

Type: Grant

Filed: October 14, 2011

Date of Patent: September 16, 2014

Assignee: Xerox Corporation

Inventors: Marc Dymetman, Guillaume M. Bouchard

SAMPLING AND OPTIMIZATION IN PHRASE-BASED MACHINE TRANSLATION USING AN ENRICHED LANGUAGE MODEL REPRESENTATION

Publication number: 20140214397

Abstract: Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q(n+1) for use in a next iteration of the rejection sampling. The refined WFSA q(n+1) is selected to satisfy the criteria p(x)≤q(n+1)(x)≤q(n)(x) for all x∈X and q(n+1)(x*)<q(n)(x*) where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.

Type: Application

Filed: January 25, 2013

Publication date: July 31, 2014

Applicant: Xerox Corporation

Inventors: Marc Dymetman, Wilker Ferreira Aziz, Sriram Venkatapathy

Machine translation using overlapping biphrase alignments and sampling

Patent number: 8775155

Abstract: A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and each translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
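
The decoding loop (generate neighbors, score, rank, continue from one of them) can be sketched as a greedy local search in Python. The neighborhood (adjacent word swaps) and the bigram-count scoring function are hypothetical toys; the patented system scores neighbors with a learned model over phrase alignments and need not always move to the top-ranked neighbor.

```python
from typing import Callable, Iterator, List


def neighbors(words: List[str]) -> Iterator[List[str]]:
    """Hypothetical neighborhood: every sentence obtained by swapping two adjacent words."""
    for i in range(len(words) - 1):
        w = list(words)
        w[i], w[i + 1] = w[i + 1], w[i]
        yield w


def local_search(words: List[str], score: Callable[[List[str]], float],
                 max_iters: int = 100) -> List[str]:
    """Repeatedly move to the best-scoring neighbor until none improves the score."""
    current = list(words)
    for _ in range(max_iters):
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            break  # no neighbor improves on the current sentence: output it
        current = best
    return current


GOOD_BIGRAMS = {("the", "blue"), ("blue", "house")}


def toy_score(words: List[str]) -> float:
    """Toy scoring model: count adjacent word pairs found in GOOD_BIGRAMS."""
    return sum((a, b) in GOOD_BIGRAMS for a, b in zip(words, words[1:]))


print(local_search(["blue", "the", "house"], toy_score))
```

Pure greedy search stops at local optima; sampling among highly ranked neighbors instead of always taking the best is one common way to escape them.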

Type: Grant

Filed: October 25, 2010

Date of Patent: July 8, 2014

Assignee: Xerox Corporation

Inventors: Benjamin Roth, Andrew R. McCallum, Marc Dymetman, Nicola Cancedda

METHODS AND SYSTEMS FOR PREDICTING LEARNING CURVE FOR STATISTICAL MACHINE TRANSLATION SYSTEM

Publication number: 20140156565

Abstract: The disclosed embodiments relate to a system and method for predicting the learning curve of an SMT system. A set of anchor points is selected, each anchor point corresponding to a corpus size. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the set of anchor points, a confidence score is computed.

Type: Application

Filed: November 30, 2012

Publication date: June 5, 2014

Applicant: XEROX CORPORATION

Inventors: Prasanth Kolachina, Nicola Cancedda, Marc Dymetman, Sriram Venkatapathy

SOLUTION FOR MAX-STRING PROBLEM AND TRANSLATION AND TRANSCRIPTION SYSTEMS USING SAME

Publication number: 20140046651

Abstract: An unweighted automaton B is generated from a weighted finite state automaton (WFSA) A, having the same states as the WFSA and having unweighted transitions corresponding only to the transitions of the WFSA having strictly positive weights. A powerset construction on the unweighted automaton generates a deterministic automaton B′ having states Q. For each state Q′, a set of points LQ′ is defined representing all vectors w′=w·aQQ′ where aQQ′ is a transition label of a dominator of a predecessor state Q connecting Q with state Q′ and w is a prefix of the transition label aQQ′ in Q, and a set of dominators SQ′ in LQ′ are determined such that LQ′ is included in hull(SQ′). The dominant vector is identified in final state Qf such that LQf is included in hull(wf). Backpointers from the dominant vector wf to the initial state Q0 are followed to generate the max-string result.

Type: Application

Filed: August 13, 2012

Publication date: February 13, 2014

Applicant: Xerox Corporation

Inventor: Marc Dymetman

JOINT ALGORITHM FOR SAMPLING AND OPTIMIZATION AND NATURAL LANGUAGE PROCESSING APPLICATIONS OF SAME

Publication number: 20130338999

Abstract: In rejection sampling of a function or distribution p over a space X, a proposal distribution q(n) is refined responsive to rejection of a sample x*∈X to generate a refined proposal distribution q(n+1) selected to satisfy the criteria p(x)≤q(n+1)(x)≤q(n)(x) and q(n+1)(x*)<q(n)(x*). In a sampling mode, the sample x* is obtained by random sampling of the space X, the rejection sampling accepts or rejects x* based on comparison of a ratio p(x*)/q(x*) with a random draw, and the refined proposal distribution q(n+1) is selected to minimize a norm ‖q(n+1)‖α where α<∞. In an optimization mode, the sample x* is obtained such that q*=q(n)(x*) maximizes q(n) over the space X, the rejection sampling accepts or rejects x* based on a difference between or ratio of q* and p(x*), and the refined proposal distribution q(n+1) is selected to minimize a norm ‖q(n+1)‖∞=max{q(n+1)(x)}.
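
Both modes can be tried on a small finite space with a minimal Python sketch. The proposal is stored as a plain table, and refinement is pointwise: q(n+1) equals q(n) except that q(n+1)(x*) is lowered to p(x*). Whenever p(x*) < q(n)(x*), this satisfies the two criteria p(x) ≤ q(n+1)(x) ≤ q(n)(x) and q(n+1)(x*) < q(n)(x*). The pointwise refinement is an illustrative stand-in, not the structured refinement family the application describes.

```python
import random
from typing import Callable, Dict, List, Tuple

Proposal = Dict[str, float]


def refine(q: Proposal, x_star: str, p: Callable[[str], float]) -> Proposal:
    """Pointwise refinement: lower q at x_star to p(x_star), leave the rest unchanged."""
    q_new = dict(q)
    q_new[x_star] = p(x_star)
    return q_new


def optimize_mode(p: Callable[[str], float], q: Proposal,
                  tol: float = 0.0, max_iters: int = 10_000) -> Tuple[str, Proposal]:
    for _ in range(max_iters):
        x_star = max(q, key=q.get)  # maximizer of the current upper bound q
        if q[x_star] - p(x_star) <= tol:
            return x_star, q  # q* bounds max p, so x_star is optimal up to tol
        q = refine(q, x_star, p)
    raise RuntimeError("no convergence within max_iters")


def sampling_mode(p: Callable[[str], float], q: Proposal, n: int,
                  rng: random.Random) -> Tuple[List[str], Proposal]:
    samples: List[str] = []
    while len(samples) < n:
        xs = list(q)
        x = rng.choices(xs, weights=[q[v] for v in xs])[0]  # draw from normalized q
        if rng.random() < p(x) / q[x]:  # compare the ratio p/q with a random draw
            samples.append(x)
        else:
            q = refine(q, x, p)
    return samples, q


p_table = {"a": 0.2, "b": 0.5, "c": 0.1}
p = p_table.__getitem__
q0 = {x: 1.0 for x in p_table}  # initial proposal: a loose upper bound on p
best, _ = optimize_mode(p, q0)
draws, q_final = sampling_mode(p, q0, 50, random.Random(0))
print(best, draws[:10])
```

The same upper-bounding proposal serves both modes, which is the "joint" aspect: refinements triggered while sampling also tighten the bound used for optimization.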

Type: Application

Filed: June 18, 2012

Publication date: December 19, 2013

Applicant: Xerox Corporation

Inventors: Marc Dymetman, Guillaume Bouchard

Word alignment method and system for improved vocabulary coverage in statistical machine translation

Patent number: 8612205

Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
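
A minimal Python sketch of the selective modification, assuming one simple rule: links of frequent source words are kept as they are, while links of source words whose corpus count falls below a hypothetical `min_count` threshold survive only if the second alignment also contains them. The abstract does not fix the exact rule, so treat this only as an illustration. Links are (source index, target index) pairs.

```python
from collections import Counter
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def refine_alignment(src: List[str], first: Set[Link], second: Set[Link],
                     freq: Counter, min_count: int) -> Set[Link]:
    """Keep links of frequent source words; keep links of rare ones only if `second` agrees."""
    return {(i, j) for (i, j) in first
            if freq[src[i]] >= min_count or (i, j) in second}


src = ["the", "zyxt", "house"]
first = {(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)}  # rare "zyxt" linked to every target word
second = {(0, 0), (1, 1), (2, 2)}
freq = Counter({"the": 100, "house": 50, "zyxt": 1})
print(sorted(refine_alignment(src, first, second, freq, min_count=2)))
```

Dropping the spurious links of the rare word is what lets shorter, cleaner bi-phrases be extracted around it.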

Type: Grant

Filed: June 14, 2010

Date of Patent: December 17, 2013

Assignee: Xerox Corporation

Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman

Translation system combining hierarchical and phrase-based models

Patent number: 8543374

Abstract: A method comprises: receiving or generating bi-content including source content in a source language or format and corresponding target content in a target language or format, wherein the target language or format is different from the source language or format; generating a source weighted finite state automaton representing the source content of the bi-content; generating a target weighted finite state automaton representing the target content of the bi-content; and computing a bilateral intersection between (i) the source weighted finite state automaton, (ii) a synchronous weighted context-free grammar comprising synchronized grammars for the source language or format and the target language or format, and (iii) the target weighted finite state automaton to generate an enriched synchronous weighted context free grammar.

Type: Grant

Filed: August 12, 2010

Date of Patent: September 24, 2013

Assignee: Xerox Corporation

Inventor: Marc Dymetman

Phrase-based statistical machine translation as a generalized traveling salesman problem

Patent number: 8504353

Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
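
To make the GTSP formulation concrete, here is a tiny brute-force Python solver for a generic GTSP: nodes are grouped into clusters, a tour must visit exactly one node from each cluster, starting and ending at a fixed start node, and the tour cost is the sum of its edge costs. The node names, clusters, and costs are invented, and exhaustive search is exponential. It only illustrates the problem, not the patent's mapping from bi-phrases to nodes or a practical solver.

```python
from itertools import permutations, product
from typing import Callable, List, Sequence, Tuple


def solve_gtsp(clusters: Sequence[Sequence[str]], cost: Callable[[str, str], float],
               start: str) -> Tuple[float, Tuple[str, ...]]:
    """Exhaustively pick one node per cluster and an order, minimizing total edge cost."""
    best_cost, best_tour = float("inf"), ()
    for choice in product(*clusters):     # one node from each cluster
        for order in permutations(choice):  # every visiting order
            tour = (start, *order, start)
            total = sum(cost(a, b) for a, b in zip(tour, tour[1:]))
            if total < best_cost:
                best_cost, best_tour = total, tour
    return best_cost, best_tour


# Invented instance: two clusters, cheap edges along one particular tour.
edge_costs = {("$", "a1"): 1.0, ("a1", "b1"): 1.0, ("b1", "$"): 1.0, ("a2", "b1"): 1.0}


def toy_cost(a: str, b: str) -> float:
    return edge_costs.get((a, b), 10.0)  # unlisted edges are expensive


print(solve_gtsp([["a1", "a2"], ["b1"]], toy_cost, "$"))
```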

Type: Grant

Filed: July 27, 2009

Date of Patent: August 6, 2013

Assignee: Xerox Corporation

Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda

REJECTION SAMPLING OF A COMPLEX DISTRIBUTION INCLUDING BOUND AND PROPOSAL DISTRIBUTION REFINEMENT

Publication number: 20130096877

Abstract: Iterative rejection sampling is performed on a domain in accordance with a target distribution. The domain is partitioned to define a partition comprising partition elements, and each iteration of the rejection sampling includes selecting a partition element from the partition in accordance with partition element selection probabilities. A sample of the domain is acquired in the selected partition element according to a normalized proposal distribution that is associated with and normalized over the selected partition element. The acquired sample is accepted or rejected based on the target distribution and a bound associated with the selected partition element. During the iterative rejection sampling, the partition is adapted by replacing a partition element of the partition with two or more split partition elements, associating bounds with the split partition elements, and computing partition element selection probabilities for the split partition elements.

Type: Application

Filed: October 14, 2011

Publication date: April 18, 2013

Applicant: Xerox Corporation

Inventors: Marc Dymetman, Guillaume M. Bouchard
