Patents by Inventor Marc Dymetman

Marc Dymetman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160330144
    Abstract: A system and method are disclosed which enable more effective email response authoring by contact center agents, for example, by automatically suggesting prototypical (entire) email responses to the human agent and interactive suggestion of next sentence candidates during the writing process. In one method, a customer inquiry is received and a latent topic prediction is generated, based on a word-based representation of the customer inquiry. A latent topic prediction is generated for an entire agent's reply to the customer inquiry as a function of the latent topic prediction generated for the customer inquiry. A further latent topic prediction is generated for a next sentence of the agent's reply as a function of a topic prediction for the next sentence which is generated with a prediction model that has been trained on annotated sentences of agent replies. Information is output to assist the agent, based on the topic predictions.
    Type: Application
    Filed: May 4, 2015
    Publication date: November 10, 2016
    Inventors: Marc Dymetman, Jean-Michel Renders, Sriram Venkatapathy, Spandana Gella
  • Patent number: 9473637
    Abstract: Agent utterances are generated for implementing dialog acts recommended by a dialog manager of a call center. To this end, a set of word lattices, each represented as a weighted finite state automaton (WFSA), is constructed from training dialogs between call center agents and second parties (e.g. customers). The word lattices are assigned conditional probabilities over dialog act type. For each dialog act received from the dialog manager, the word lattices are ranked by the conditional probabilities for the dialog act type. At least one word lattice is chosen from the ranking, and is instantiated to generate a recommended agent utterance for implementing the recommended dialog act. The word lattices may be constructed by clustering agent utterances of training dialogs using context features from preceding second party utterances and grammatical dependency link features between words within agent utterances. Path variations of the word lattices may define slots or paraphrases.
    Type: Grant
    Filed: July 28, 2015
    Date of Patent: October 18, 2016
    Assignee: Xerox Corporation
    Inventors: Sriram Venkatapathy, Shachar Mirkin, Marc Dymetman
  • Patent number: 9400783
    Abstract: Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w = max_h p(z|hA) which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: July 26, 2016
    Assignee: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 9367541
    Abstract: A method for terminological adaptation includes receiving a vocabulary pair including source and target language terms. Each term is in a class which includes a set of sequences. Contextual phrase pairs are extracted from a bilingual training corpus, each including source and target phrases. The phrases each include a sequence of the same class as the respective source and target terms as well as some associated context. Templates are generated, based on the contextual phrase pairs. In each template the source and target sequences of a contextual phrase pair are replaced with respective placeholders, each denoting the respective class of the sequence. Candidate phrase pairs are generated from these templates. In each candidate phrase pair, the placeholders of one of the templates are replaced with respective terms of a vocabulary pair of the same class. Some candidate phrase pairs are incorporated into a phrase table of a machine translation system.
    Type: Grant
    Filed: January 20, 2015
    Date of Patent: June 14, 2016
    Assignee: Xerox Corporation
    Inventors: Christophe Servan, Marc Dymetman
  • Patent number: 9164961
    Abstract: The disclosed embodiments relate to a system and method for predicting the learning curve of an SMT system. A set of anchor points are selected. The set of anchor points correspond to a size of a corpus. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the first set of anchor points, a confidence score is computed.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: October 20, 2015
    Assignee: Xerox Corporation
    Inventors: Prasanth Kolachina, Nicola Cancedda, Marc Dymetman, Sriram Venkatapathy
  • Publication number: 20150149151
    Abstract: Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w = max_h p(z|hA) which is the maximum backoff value for any head h preceding the context A of the n-gram Az; and (3) extending the ARPA table by adding a column storing the associated maximum backoff weight product values Az.m and a column storing the associated max-backoff values Az.w.
    Type: Application
    Filed: November 26, 2013
    Publication date: May 28, 2015
    Applicant: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 8983887
    Abstract: Markov Chain Monte Carlo (MCMC) sampling of elements of a domain to be sampled is performed to generate a set of samples. The MCMC sampling is performed over a search tree of decision sequences representing the domain to be sampled and having terminal nodes corresponding to elements of the domain. In some embodiments the MCMC sampling is performed by Metropolis-Hastings (MH) sampling. The MCMC sampling is constrained using a bound on nodes of the search tree. The constraint may entail detecting a node whose bound value ensures that an acceptable element cannot be identified by continuing traversal of the tree past that node, and terminating the traversal in response. The constraint may entail selecting a node to serve as a starting node for a sampling attempt in accordance with a statistical promise distribution indicating likelihood that following a decision sequence rooted at the node will identify an acceptable element.
    Type: Grant
    Filed: April 11, 2011
    Date of Patent: March 17, 2015
    Assignee: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 8972244
    Abstract: Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q(n+1) for use in a next iteration of the rejection sampling. The refined WFSA q(n+1) is selected to satisfy the criteria p(x) ≤ q(n+1)(x) ≤ q(n)(x) for all x ∈ X and q(n+1)(x*) < q(n)(x*) where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.
    Type: Grant
    Filed: January 25, 2013
    Date of Patent: March 3, 2015
    Assignee: Xerox Corporation
    Inventors: Marc Dymetman, Wilker Ferreira Aziz, Sriram Venkatapathy
  • Publication number: 20150039286
    Abstract: A system and method for automated validation of a translation provided by a generic machine translation engine comprises a generic machine translation (GMT) engine, a domain-specific multilingual terminology dictionary (DS) and a translation processor. The translation processor effects the first call to the GMT engine to obtain a first translation of an input text to a first translation text; a verification of a presence of a domain-specific terminology item in the first translation text; correctness assessment of the first translation text using the DS wherein upon an assessment of incorrect translation, a replacement of the domain-specific terminology item with a domain-specific translation text item from the DS.
    Type: Application
    Filed: July 31, 2013
    Publication date: February 5, 2015
    Applicant: Xerox Corporation
    Inventors: Vassilina Nikoulina, Marc Dymetman
  • Publication number: 20140358519
    Abstract: A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.
    Type: Application
    Filed: June 3, 2013
    Publication date: December 4, 2014
    Inventors: Shachar Mirkin, Sriram Venkatapathy, Marc Dymetman
  • Patent number: 8838415
    Abstract: Iterative rejection sampling is performed on a domain in accordance with a target distribution. The domain is partitioned to define a partition comprising partition elements, and each iteration of the rejection sampling includes selecting a partition element from the partition in accordance with partition element selection probabilities. A sample of the domain is acquired in the selected partition element according to a normalized proposal distribution that is associated with and normalized over the selected partition element. The acquired sample is accepted or rejected based on the target distribution and a bound associated with the selected partition element. During the iterative rejection sampling, the partition is adapted by replacing a partition element of the partition with two or more split partition elements, associating bounds with the split partition elements, and computing partition element selection probabilities for the split partition elements.
    Type: Grant
    Filed: October 14, 2011
    Date of Patent: September 16, 2014
    Assignee: Xerox Corporation
    Inventors: Marc Dymetman, Guillaume M. Bouchard
  • Publication number: 20140214397
    Abstract: Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q(n) that is refined responsive to rejection of a sample x* obtained in a current iteration of the rejection sampling to generate a refined WFSA q(n+1) for use in a next iteration of the rejection sampling. The refined WFSA q(n+1) is selected to satisfy the criteria p(x) ≤ q(n+1)(x) ≤ q(n)(x) for all x ∈ X and q(n+1)(x*) < q(n)(x*) where the space X is the set of sequences x corresponding to candidate alignments a that generate candidate translations t for the source language string s.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: Xerox Corporation
    Inventors: Marc Dymetman, Wilker Ferreira Aziz, Sriram Venkatapathy
  • Patent number: 8775155
    Abstract: A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and the translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
    Type: Grant
    Filed: October 25, 2010
    Date of Patent: July 8, 2014
    Assignee: Xerox Corporation
    Inventors: Benjamin Roth, Andrew R. McCallum, Marc Dymetman, Nicola Cancedda
  • Publication number: 20140156565
    Abstract: The disclosed embodiments relate to a system and method for predicting the learning curve of an SMT system. A set of anchor points are selected. The set of anchor points correspond to a size of a corpus. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the first set of anchor points, a confidence score is computed.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Applicant: Xerox Corporation
    Inventors: Prasanth Kolachina, Nicola Cancedda, Marc Dymetman, Sriram Venkatapathy
  • Publication number: 20140046651
    Abstract: An unweighted automaton B is generated from a weighted finite state automaton (WFSA) A, having the same states as the WFSA and having unweighted transitions corresponding only to the transitions of the WFSA having strictly positive weights. A powerset construction on the unweighted automaton generates a deterministic automaton B′ having states Q. For each state Q′, a set of points L_{Q′} is defined representing all vectors w′ = w·a_{QQ′} where a_{QQ′} is a transition label of a dominator of a predecessor state Q connecting Q with state Q′ and w is a prefix of the transition label a_{QQ′} in Q, and a set of dominators S_{Q′} in L_{Q′} are determined such that L_{Q′} is included in hull(S_{Q′}). The dominant vector is identified in final state Q_f such that L_{Q_f} is included in hull(w_f). Backpointers from the dominant vector w_f to the initial state Q_0 are followed to generate the max-string result.
    Type: Application
    Filed: August 13, 2012
    Publication date: February 13, 2014
    Applicant: Xerox Corporation
    Inventor: Marc Dymetman
  • Publication number: 20130338999
    Abstract: In rejection sampling of a function or distribution p over a space X, a proposal distribution q(n) is refined responsive to rejection of a sample x* ∈ X to generate a refined proposal distribution q(n+1) selected to satisfy the criteria p(x) ≤ q(n+1)(x) ≤ q(n)(x) and q(n+1)(x*) < q(n)(x*). In a sampling mode, the sample x* is obtained by random sampling of the space X, the rejection sampling accepts or rejects x* based on comparison of a ratio p(x*)/q(x*) with a random draw, and the refined proposal distribution q(n+1) is selected to minimize a norm ‖q(n+1)‖_α where α < ∞. In an optimization mode, the sample x* is obtained such that q* = q(n)(x*) maximizes q(n) over the space X, the rejection sampling accepts or rejects x* based on a difference between or ratio of q* and p(x*), and the refined proposal distribution q(n+1) is selected to minimize a norm ‖q(n+1)‖_∞ = max{q(n+1)(x)}.
    Type: Application
    Filed: June 18, 2012
    Publication date: December 19, 2013
    Applicant: Xerox Corporation
    Inventors: Marc Dymetman, Guillaume Bouchard
  • Patent number: 8612205
    Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
    Type: Grant
    Filed: June 14, 2010
    Date of Patent: December 17, 2013
    Assignee: Xerox Corporation
    Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman
  • Patent number: 8543374
    Abstract: A method comprises: receiving or generating bi-content including source content in a source language or format and corresponding target content in a target language or format, wherein the target language or format is different from the source language or format; generating a source weighted finite state automaton representing the source content of the bi-content; generating a target weighted finite state automaton representing the target content of the bi-content; and computing a bilateral intersection between (i) the source weighted finite state automaton, (ii) a synchronous weighted context-free grammar comprising synchronized grammars for the source language or format and the target language or format, and (iii) the target weighted finite state automaton to generate an enriched synchronous weighted context free grammar.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: September 24, 2013
    Assignee: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 8504353
    Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
    Type: Grant
    Filed: July 27, 2009
    Date of Patent: August 6, 2013
    Assignee: Xerox Corporation
    Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
  • Publication number: 20130096877
    Abstract: Iterative rejection sampling is performed on a domain in accordance with a target distribution. The domain is partitioned to define a partition comprising partition elements, and each iteration of the rejection sampling includes selecting a partition element from the partition in accordance with partition element selection probabilities. A sample of the domain is acquired in the selected partition element according to a normalized proposal distribution that is associated with and normalized over the selected partition element. The acquired sample is accepted or rejected based on the target distribution and a bound associated with the selected partition element. During the iterative rejection sampling, the partition is adapted by replacing a partition element of the partition with two or more split partition elements, associating bounds with the split partition elements, and computing partition element selection probabilities for the split partition elements.
    Type: Application
    Filed: October 14, 2011
    Publication date: April 18, 2013
    Applicant: Xerox Corporation
    Inventors: Marc Dymetman, Guillaume M. Bouchard
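
The partition-adapting rejection sampling summarized in the abstract of publication 20130096877 (granted as patent 8838415) can be illustrated with a small sketch. This is not the patented implementation: the one-dimensional target on [0, 1], the uniform per-element proposals, the mode-based bounds, and all names (`target`, `upper_bound`, `adaptive_rejection_sample`) are assumptions chosen only to make the loop concrete. Each iteration selects a partition element with probability proportional to its bound times its width, draws a sample from the normalized (here uniform) proposal on that element, accepts or rejects it against the element's bound, and on rejection replaces the element with two split elements carrying their own bounds.

```python
import math
import random

# Hypothetical example target: an unnormalized, unimodal density on [0, 1].
MODE = 0.3

def target(x):
    """Unnormalized target density (an assumption made for this sketch)."""
    return math.exp(-40.0 * (x - MODE) ** 2)

def upper_bound(a, b):
    """Upper bound of the target on [a, b].

    For a unimodal target the maximum over an interval is attained at the
    point of the interval closest to the mode.
    """
    return target(min(max(MODE, a), b))

def adaptive_rejection_sample(n_samples, rng):
    """Draw n_samples from the target, refining the partition on rejections."""
    # Each partition element is (left, right, bound).
    partition = [(0.0, 1.0, upper_bound(0.0, 1.0))]
    samples = []
    n_proposals = 0
    while len(samples) < n_samples:
        # Selection probability of an element is proportional to the mass
        # of its constant envelope: bound * width.
        weights = [(b - a) * bound for a, b, bound in partition]
        idx = rng.choices(range(len(partition)), weights=weights)[0]
        a, b, bound = partition[idx]
        # Normalized proposal on the selected element: uniform on [a, b].
        x = rng.uniform(a, b)
        n_proposals += 1
        if rng.random() * bound <= target(x):
            samples.append(x)
        elif a < x < b:
            # Rejection: split the element at x and attach tighter bounds.
            partition[idx:idx + 1] = [
                (a, x, upper_bound(a, x)),
                (x, b, upper_bound(x, b)),
            ]
    return samples, partition, n_proposals

if __name__ == "__main__":
    rng = random.Random(0)
    samples, partition, n_proposals = adaptive_rejection_sample(2000, rng)
    print("partition elements:", len(partition))
    print("acceptance rate:", len(samples) / n_proposals)
```

Because every split tightens the envelope around the target, the acceptance rate tends to rise as sampling proceeds, while each accepted sample remains an exact draw from the target as long as every bound is valid on its element.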