Patents by Inventor Marc Dymetman

Marc Dymetman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8326599
    Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system, are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
    Type: Grant
    Filed: April 21, 2009
    Date of Patent: December 4, 2012
    Assignee: Xerox Corporation
    Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
  • Publication number: 20120259807
    Abstract: Markov Chain Monte Carlo (MCMC) sampling of elements of a domain to be sampled is performed to generate a set of samples. The MCMC sampling is performed over a search tree of decision sequences representing the domain to be sampled and having terminal nodes corresponding to elements of the domain. In some embodiments the MCMC sampling is performed by Metropolis-Hastings (MH) sampling. The MCMC sampling is constrained using a bound on nodes of the search tree. The constraint may entail detecting a node whose bound value ensures that an acceptable element cannot be identified by continuing traversal of the tree past that node, and terminating the traversal in response. The constraint may entail selecting a node to serve as a starting node for a sampling attempt in accordance with a statistical promise distribution indicating likelihood that following a decision sequence rooted at the node will identify an acceptable element.
    Type: Application
    Filed: April 11, 2011
    Publication date: October 11, 2012
    Applicant: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 8244519
    Abstract: A translation method comprises: retrieving a fuzzy match text segment translation pair from a translation memory (TM) for an input source language text segment, the fuzzy match text segment translation pair comprising a fuzzy source language text segment having a fuzzy match to the input source language text segment and a corresponding translated target language text segment; extracting from the fuzzy match text segment translation pair an exact match phrase pair comprising a source language phrase that exactly matches a phrase of the input source language text segment and a corresponding translated target language phrase; and invoking a statistical machine translation (SMT) system to generate a proposed translation of the input source language text segment based on a statistical translation model that is enriched by the exact match phrase pair with the exact match phrase pair assigned a high statistical probability.
    Type: Grant
    Filed: December 3, 2008
    Date of Patent: August 14, 2012
    Assignee: Xerox Corporation
    Inventors: Ergun Bicici, Marc Dymetman
  • Publication number: 20120101804
    Abstract: A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and the translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
    Type: Application
    Filed: October 25, 2010
    Publication date: April 26, 2012
    Applicant: Xerox Corporation
    Inventors: Benjamin Roth, Andrew R. McCallum, Marc Dymetman, Nicola Cancedda
  • Publication number: 20120041753
    Abstract: A method comprises: receiving or generating bi-content including source content in a source language or format and corresponding target content in a target language or format, wherein the target language or format is different from the source language or format; generating a source weighted finite state automaton representing the source content of the bi-content; generating a target weighted finite state automaton representing the target content of the bi-content; and computing a bilateral intersection between (i) the source weighted finite state automaton, (ii) a synchronous weighted context-free grammar comprising synchronized grammars for the source language or format and the target language or format, and (iii) the target weighted finite state automaton to generate an enriched synchronous weighted context free grammar.
    Type: Application
    Filed: August 12, 2010
    Publication date: February 16, 2012
    Applicant: Xerox Corporation
    Inventor: Marc Dymetman
  • Publication number: 20110307245
    Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
    Type: Application
    Filed: June 14, 2010
    Publication date: December 15, 2011
    Applicant: Xerox Corporation
    Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman
  • Publication number: 20110288852
    Abstract: A system and a method for phrase-based translation are disclosed. The method includes receiving source language text to be translated into target language text. One or more dynamic bi-phrases are generated, based on the source text and the application of one or more rules, which may be based on user descriptions. A dynamic feature value is associated with each of the dynamic bi-phrases. For a sentence of the source text, static bi-phrases are retrieved from a bi-phrase table, each of the static bi-phrases being associated with one or more values of static features. Any of the dynamic bi-phrases which each cover at least one word of the source text are also retrieved, which together form a set of active bi-phrases. Translation hypotheses are generated using active bi-phrases from the set and scored with a translation scoring model which takes into account the static and dynamic feature values of the bi-phrases used in the respective hypothesis. A translation, based on the hypothesis scores, is then output.
    Type: Application
    Filed: May 20, 2010
    Publication date: November 24, 2011
    Applicant: Xerox Corporation
    Inventors: Marc Dymetman, Wilker Ferreira Aziz, Nicola Cancedda, Jean-Marc Coursimault, Vassilina Nikoulina, Lucia Specia
  • Publication number: 20110022380
    Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
    Type: Application
    Filed: July 27, 2009
    Publication date: January 27, 2011
    Applicant: Xerox Corporation
    Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
  • Patent number: 7831587
    Abstract: From a corpus of segments, a hierarchical index is derived that indexes high frequency events of a selected event type occurring in segments of the corpus at a frequency higher than or equal to a threshold frequency, and also indexes at least some low frequency events that occur in segments of the corpus at a frequency lower than the threshold frequency. The hierarchy relates events by an order structure in which coarser events subsume finer events. A query is processed respective to a queried event. The processing references information stored in the index relating to either (i) the queried event if the queried event is indexed or (ii) a coarser event that is indexed and that subsumes the queried event if the queried event is not indexed.
    Type: Grant
    Filed: May 10, 2007
    Date of Patent: November 9, 2010
    Assignee: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 7827026
    Abstract: A bilingual authoring apparatus includes a user interface (20) for inputting partially translated text including a text portion in a source language and surrounding or adjacent text in a target language. A bilingual dictionary (34) associates words and phrases in the target language and words and phrases in a source language. A context sensitive translation tool (30, 32, 38) communicates with the user interface, receives the partially translated text, and provides at least one proposed translation in the target language of the text portion in the source language. The at least one proposed translation in the target language is derived from the bilingual dictionary based on contextual analysis of at least a portion of the partially translated text.
    Type: Grant
    Filed: December 21, 2004
    Date of Patent: November 2, 2010
    Assignee: Xerox Corporation
    Inventors: Caroline Brun, Marc Dymetman, Frederique Segond
  • Publication number: 20100268527
    Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system, are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
    Type: Application
    Filed: April 21, 2009
    Publication date: October 21, 2010
    Applicant: Xerox Corporation
    Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
  • Publication number: 20100138213
    Abstract: A translation method comprises: retrieving a fuzzy match text segment translation pair from a translation memory (TM) for an input source language text segment, the fuzzy match text segment translation pair comprising a fuzzy source language text segment having a fuzzy match to the input source language text segment and a corresponding translated target language text segment; extracting from the fuzzy match text segment translation pair an exact match phrase pair comprising a source language phrase that exactly matches a phrase of the input source language text segment and a corresponding translated target language phrase; and invoking a statistical machine translation (SMT) system to generate a proposed translation of the input source language text segment based on a statistical translation model that is enriched by the exact match phrase pair with the exact match phrase pair assigned a high statistical probability.
    Type: Application
    Filed: December 3, 2008
    Publication date: June 3, 2010
    Applicant: Xerox Corporation
    Inventors: Ergun Bicici, Marc Dymetman
  • Patent number: 7717712
    Abstract: A method for testing a language learner's ability to create semantically coherent grammatical text in a language, comprising generating text having at least one active region and inactive regions; displaying the text in a graphical user interface on a display unit, wherein at least one active region comprises a key word or phrase; identifying at least one active region in the graphical user interface; selecting at least one active region to display a menu of linguistic choices comprised of at least one grammatically correct linguistic choice and at least one grammatically incorrect linguistic choice; selecting one of the linguistic choices; and displaying an error message when at least one grammatically incorrect linguistic choice is selected.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: May 18, 2010
    Assignee: Xerox Corporation
    Inventors: Caroline Brun, Marc Dymetman
  • Patent number: 7664629
    Abstract: A writing advisor program (20) receives a proposed text in an author's second language (L2) and determines at least one candidate replacement word for a selected word based on a determined language model (p(c)) and a determined corruption model (p(r|c)). The determined language model reflects correct usage of the text in the second language, independent of the native or first language (L1) of the author, based on (L2) corpora. The determined corruption model is based on some a priori knowledge about probable corruption paths leading the author to realize some inadequate expression in the second language instead of the correct, intended expression. Different types of corruption paths may be used that include bidirectional translations, false-friends, synonyms, common semantic features, second language internal cognates, preposition alternatives, and first language inserts.
    Type: Grant
    Filed: July 19, 2005
    Date of Patent: February 16, 2010
    Assignee: Xerox Corporation
    Inventors: Marc Dymetman, Pierre Isabelle
  • Patent number: 7593144
    Abstract: A scanning machine (10) includes a scan surface (12) for receiving a page or pages to be scanned. A motion detector (20) monitors the copy surface to detect movement of the pages. In response to detecting a cessation of motion on the scan surface, the motion detector generates a trigger signal which causes a capture circuit (30) to capture an image of a page on the scan surface. If, during the capture of the page image, the motion detector detects motion of the page, the motion detector generates an abort signal which terminates capture of the image and empties a buffer memory (32). Once cessation of motion of the page is again detected, another trigger signal is issued and another attempt is made to capture an image. In this manner, scanning is started and stopped by natural, simple gestures without the operator pushing buttons.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: September 22, 2009
    Assignee: Xerox Corporation
    Inventor: Marc Dymetman
  • Patent number: 7543758
    Abstract: A system for document internal localization includes an optical input device for retrieving an observed region of a hardcopy document and a processing component which provides a location identifier for actionable observed regions by matching the retrieved region with a region of an electronically stored version of the hardcopy document. The hardcopy document and electronically stored version of the document include machine readable disambiguating information which has been added to actionable regions of the document which are indistinguishable from other actionable regions, based on their original markings, whereby a region is distinguishable from similar regions of the document. The disambiguating information is localized in regions of the document whose markings are not distinguishable without the disambiguating information.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: June 9, 2009
    Assignee: Xerox Corporation
    Inventors: Marc Dymetman, Gabriela Csurka
  • Patent number: 7542893
    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
    Type: Grant
    Filed: May 10, 2006
    Date of Patent: June 2, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
  • Patent number: 7536295
    Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
    Type: Grant
    Filed: December 22, 2005
    Date of Patent: May 19, 2009
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
  • Patent number: 7495792
    Abstract: A programmable document includes a physical document having at least one sheet of material and information recorded thereon and a computer attached to the physical document. The computer includes an input/output device, a processor and a memory storing the recorded information in digital form and metadata pertaining to the physical document. By attaching small, inexpensive, computing devices to paper documents, various electronic information associated with the physical document can be retained, updated and modified.
    Type: Grant
    Filed: December 21, 2000
    Date of Patent: February 24, 2009
    Assignee: Xerox Corporation
    Inventors: Dave Snowdon, Christer Fernstrom, Marc Dymetman, Natalie S. Glance
  • Publication number: 20080281857
    Abstract: From a corpus of segments, a hierarchical index is derived that indexes high frequency events of a selected event type occurring in segments of the corpus at a frequency higher than or equal to a threshold frequency, and also indexes at least some low frequency events that occur in segments of the corpus at a frequency lower than the threshold frequency. The hierarchy relates events by an order structure in which coarser events subsume finer events. A query is processed respective to a queried event. The processing references information stored in the index relating to either (i) the queried event if the queried event is indexed or (ii) a coarser event that is indexed and that subsumes the queried event if the queried event is not indexed.
    Type: Application
    Filed: May 10, 2007
    Publication date: November 13, 2008
    Inventor: Marc Dymetman
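
The hierarchical index described in the last abstract above (publication 20080281857, granted as 7831587) can be illustrated with a small Python sketch. It is only a reading of the abstract under stated assumptions, not the method claimed in the patent: events are assumed to be contiguous word n-grams, an n-gram's prefix is assumed to be the "coarser event" that subsumes it, and frequency is assumed to mean the number of segments containing the event. The names `ngrams`, `build_index`, and `query` are hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, max_n):
    """Yield every contiguous n-gram (as a tuple) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_index(segments, threshold, max_n=3):
    """Map each indexed event to the set of segment ids that contain it.

    Assumption: an event is kept when it occurs in at least `threshold`
    segments. All unigrams are kept as well, so that some low-frequency
    events are indexed and a back-off target usually exists.
    """
    postings = defaultdict(set)
    for seg_id, text in enumerate(segments):
        for gram in ngrams(text.split(), max_n):
            postings[gram].add(seg_id)
    return {
        gram: ids
        for gram, ids in postings.items()
        if len(ids) >= threshold or len(gram) == 1
    }


def query(index, event):
    """Return (matched_event, segment_ids) for the queried event.

    If the event itself is indexed, its postings are returned. Otherwise
    the query backs off to the longest indexed prefix, a coarser event
    that subsumes the queried one; its postings are then a superset of
    the segments that could contain the queried event.
    """
    event = tuple(event)
    while event:
        if event in index:
            return event, index[event]
        event = event[:-1]  # drop the last word to get a coarser event
    return (), set()
```

With the segments "the cat sat", "the cat ran", and "a dog sat" and a threshold of 2, the bigram ("the", "cat") is indexed directly, while a query for ("the", "cat", "sat") is answered from the coarser ("the", "cat") entry. A caller would still have to scan the returned segments to confirm that the finer event actually occurs.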