Patents by Inventor Marc Dymetman
Marc Dymetman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8326599
Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
Type: Grant
Filed: April 21, 2009
Date of Patent: December 4, 2012
Assignee: Xerox Corporation
Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
-
Publication number: 20120259807
Abstract: Markov Chain Monte Carlo (MCMC) sampling of elements of a domain to be sampled is performed to generate a set of samples. The MCMC sampling is performed over a search tree of decision sequences representing the domain to be sampled and having terminal nodes corresponding to elements of the domain. In some embodiments the MCMC sampling is performed by Metropolis-Hastings (MH) sampling. The MCMC sampling is constrained using a bound on nodes of the search tree. The constraint may entail detecting a node whose bound value ensures that an acceptable element cannot be identified by continuing traversal of the tree past that node, and terminating the traversal in response. The constraint may entail selecting a node to serve as a starting node for a sampling attempt in accordance with a statistical promise distribution indicating likelihood that following a decision sequence rooted at the node will identify an acceptable element.
Type: Application
Filed: April 11, 2011
Publication date: October 11, 2012
Applicant: Xerox Corporation
Inventor: Marc Dymetman
-
Patent number: 8244519
Abstract: A translation method comprises: retrieving a fuzzy match text segment translation pair from a translation memory (TM) for an input source language text segment, the fuzzy match text segment translation pair comprising a fuzzy source language text segment having a fuzzy match to the input source language text segment and a corresponding translated target language text segment; extracting from the fuzzy match text segment translation pair an exact match phrase pair comprising a source language phrase that exactly matches a phrase of the input source language text segment and a corresponding translated target language phrase; and invoking a statistical machine translation (SMT) system to generate a proposed translation of the input source language text segment based on a statistical translation model that is enriched by the exact match phrase pair with the exact match phrase pair assigned a high statistical probability.
Type: Grant
Filed: December 3, 2008
Date of Patent: August 14, 2012
Assignee: Xerox Corporation
Inventors: Ergun Bicici, Marc Dymetman
-
Publication number: 20120101804
Abstract: A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and the translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
Type: Application
Filed: October 25, 2010
Publication date: April 26, 2012
Applicant: Xerox Corporation
Inventors: Benjamin Roth, Andrew R. McCallum, Marc Dymetman, Nicola Cancedda
-
Publication number: 20120041753
Abstract: A method comprises: receiving or generating bi-content including source content in a source language or format and corresponding target content in a target language or format, wherein the target language or format is different from the source language or format; generating a source weighted finite state automaton representing the source content of the bi-content; generating a target weighted finite state automaton representing the target content of the bi-content; and computing a bilateral intersection between (i) the source weighted finite state automaton, (ii) a synchronous weighted context-free grammar comprising synchronized grammars for the source language or format and the target language or format, and (iii) the target weighted finite state automaton to generate an enriched synchronous weighted context free grammar.
Type: Application
Filed: August 12, 2010
Publication date: February 16, 2012
Applicant: Xerox Corporation
Inventor: Marc Dymetman
-
WORD ALIGNMENT METHOD AND SYSTEM FOR IMPROVED VOCABULARY COVERAGE IN STATISTICAL MACHINE TRANSLATION
Publication number: 20110307245
Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
Type: Application
Filed: June 14, 2010
Publication date: December 15, 2011
Applicant: Xerox Corporation
Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman
-
Publication number: 20110288852
Abstract: A system and a method for phrase-based translation are disclosed. The method includes receiving source language text to be translated into target language text. One or more dynamic bi-phrases are generated, based on the source text and the application of one or more rules, which may be based on user descriptions. A dynamic feature value is associated with each of the dynamic bi-phrases. For a sentence of the source text, static bi-phrases are retrieved from a bi-phrase table, each of the static bi-phrases being associated with one or more values of static features. Any of the dynamic bi-phrases which each cover at least one word of the source text are also retrieved, which together form a set of active bi-phrases. Translation hypotheses are generated using active bi-phrases from the set and scored with a translation scoring model which takes into account the static and dynamic feature values of the bi-phrases used in the respective hypothesis. A translation, based on the hypothesis scores, is then output.
Type: Application
Filed: May 20, 2010
Publication date: November 24, 2011
Applicant: Xerox Corporation
Inventors: Marc Dymetman, Wilker Ferreira Aziz, Nicola Cancedda, Jean-Marc Coursimault, Vassilina Nikoulina, Lucia Specia
-
Publication number: 20110022380
Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
Type: Application
Filed: July 27, 2009
Publication date: January 27, 2011
Applicant: Xerox Corporation
Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
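Illustrative sketch (not taken from the patent application): the abstract states that the cost of a tour is the sum of the costs of the edges it traverses. The Python below shows only that additive cost, plus a brute-force search over a handful of nodes. The function names, the `"$"` start node, the closed-tour convention, and the toy costs used in the note are all assumptions made for illustration; the brute-force search is a plain TSP stand-in and does not model the cluster constraint of a true GTSP.

```python
from itertools import permutations
from math import inf


def tour_cost(tour, edge_cost):
    """Sum the costs of consecutive edges, including the edge back to the start.

    `tour` is a list of node labels; `edge_cost` maps (from, to) pairs to costs.
    A missing edge is treated as infinitely expensive (an assumption).
    """
    closed = zip(tour, tour[1:] + tour[:1])
    return sum(edge_cost.get(edge, inf) for edge in closed)


def best_tour(start, nodes, edge_cost):
    """Try every ordering of `nodes` after `start` and keep the cheapest tour.

    Exhaustive search is only workable for a few nodes; a real system would
    hand the graph to a dedicated TSP solver instead.
    """
    candidates = ([start] + list(order) for order in permutations(nodes))
    return min(candidates, key=lambda tour: tour_cost(tour, edge_cost))
```

With hypothetical costs such as `{("$", "A"): 1.0, ("A", "B"): 2.0, ("B", "$"): 0.5}`, calling `tour_cost(["$", "A", "B"], costs)` simply adds the three edge costs.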
-
Patent number: 7831587
Abstract: From a corpus of segments, a hierarchical index is derived that indexes high frequency events of a selected event type occurring in segments of the corpus at a frequency higher than or equal to a threshold frequency, and also indexes at least some low frequency events that occur in segments of the corpus at a frequency lower than the threshold frequency. The hierarchy relates events by an order structure in which coarser events subsume finer events. A query is processed respective to a queried event. The processing references information stored in the index relating to either (i) the queried event if the queried event is indexed or (ii) a coarser event that is indexed and that subsumes the queried event if the queried event is not indexed.
Type: Grant
Filed: May 10, 2007
Date of Patent: November 9, 2010
Assignee: Xerox Corporation
Inventor: Marc Dymetman
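Illustrative sketch (not the patented implementation): the back-off behaviour described in this abstract can be pictured with word n-grams as events, where a shorter prefix is assumed to be the "coarser" event that subsumes a longer n-gram. The Python below indexes only events that reach the threshold, which is a simplification, since the abstract also indexes some low frequency events. The names `build_index` and `lookup`, the n-gram event type, and the prefix ordering are all assumptions.

```python
from collections import defaultdict


def build_index(segments, threshold, max_len=3):
    """Map each n-gram (up to max_len words) to the ids of segments containing it.

    Only events found in at least `threshold` segments are kept in the index.
    """
    postings = defaultdict(set)
    for seg_id, words in enumerate(segments):
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                postings[tuple(words[i:i + n])].add(seg_id)
    return {event: ids for event, ids in postings.items() if len(ids) >= threshold}


def lookup(index, event):
    """Return (indexed_event, segment_ids) for the event or its nearest coarser prefix.

    If the queried event is not indexed, drop its last word and try again,
    so the answer comes from an indexed event that subsumes the query.
    """
    event = tuple(event)
    while event:
        if event in index:
            return event, index[event]
        event = event[:-1]
    return (), set()
```

A query for an unindexed trigram therefore falls back to its indexed bigram or unigram prefix, whose segment list is a superset of the segments that contain the full trigram.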
-
Patent number: 7827026
Abstract: A bilingual authoring apparatus includes a user interface (20) for inputting partially translated text including a text portion in a source language and surrounding or adjacent text in a target language. A bilingual dictionary (34) associates words and phrases in the target language and words and phrases in a source language. A context sensitive translation tool (30, 32, 38) communicates with the user interface, receives the partially translated text, and provides at least one proposed translation in the target language of the text portion in the source language. The at least one proposed translation in the target language is derived from the bilingual dictionary based on contextual analysis of at least a portion of the partially translated text.
Type: Grant
Filed: December 21, 2004
Date of Patent: November 2, 2010
Assignee: Xerox Corporation
Inventors: Caroline Brun, Marc Dymetman, Frederique Segond
-
Publication number: 20100268527
Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
Type: Application
Filed: April 21, 2009
Publication date: October 21, 2010
Applicant: Xerox Corporation
Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
-
Publication number: 20100138213
Abstract: A translation method comprises: retrieving a fuzzy match text segment translation pair from a translation memory (TM) for an input source language text segment, the fuzzy match text segment translation pair comprising a fuzzy source language text segment having a fuzzy match to the input source language text segment and a corresponding translated target language text segment; extracting from the fuzzy match text segment translation pair an exact match phrase pair comprising a source language phrase that exactly matches a phrase of the input source language text segment and a corresponding translated target language phrase; and invoking a statistical machine translation (SMT) system to generate a proposed translation of the input source language text segment based on a statistical translation model that is enriched by the exact match phrase pair with the exact match phrase pair assigned a high statistical probability.
Type: Application
Filed: December 3, 2008
Publication date: June 3, 2010
Applicant: Xerox Corporation
Inventors: Ergun Bicici, Marc Dymetman
-
Patent number: 7717712
Abstract: A method for testing a language learner's ability to create semantically coherent grammatical text in a language, comprising generating text having at least one active region and inactive regions; displaying the text in a graphical user interface on a display unit, wherein at least one active region comprises a key word or phrase; identifying at least one active region in the graphical user interface; selecting at least one active region to display a menu of linguistic choices comprised of at least one grammatically correct linguistic choice and at least one grammatically incorrect linguistic choice; selecting one of the linguistic choices; and displaying an error message when at least one grammatically incorrect linguistic choice is selected.
Type: Grant
Filed: December 19, 2003
Date of Patent: May 18, 2010
Assignee: Xerox Corporation
Inventors: Caroline Brun, Marc Dymetman
-
Patent number: 7664629
Abstract: A writing advisor program (20) receives a proposed text in an author's second language (L2) and determines at least one candidate replacement word for a selected word based on a determined language model (p(c)) and a determined corruption model (p(r|c)). The determined language model reflects correct usage of the text in the second language, independent of the native or first language (L1) of the author, based on (L2) corpora. The determined corruption model is based on some a priori knowledge about probable corruption paths leading the author to realize some inadequate expression in the second language instead of the correct, intended expression. Different types of corruption paths may be used that include bidirectional translations, false-friends, synonyms, common semantic features, second language internal cognates, preposition alternatives, and first language inserts.
Type: Grant
Filed: July 19, 2005
Date of Patent: February 16, 2010
Assignee: Xerox Corporation
Inventors: Marc Dymetman, Pierre Isabelle
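Illustrative sketch (not the patented implementation): the abstract names a language model p(c) and a corruption model p(r|c) but does not spell out how they are combined. The Python below assumes the usual noisy-channel reading, ranking each candidate correction c for the realized word r by the product p(c) * p(r|c). The function name, the dictionary-based models, and any probabilities passed in are hypothetical.

```python
def rank_replacements(realized, candidates, language_model, corruption_model):
    """Rank candidate words c for the realized word r by p(c) * p(r | c).

    `language_model` maps a candidate c to p(c).
    `corruption_model` maps (r, c) to p(r | c), the chance that an author
    who intended c actually wrote r. Missing entries count as probability 0.
    """
    scored = []
    for c in candidates:
        score = language_model.get(c, 0.0) * corruption_model.get((realized, c), 0.0)
        if score > 0.0:
            scored.append((score, c))
    # Highest combined probability first.
    scored.sort(reverse=True)
    return [c for _, c in scored]
```

For a false-friend case, a French-speaking author might write "actually" while intending "currently"; with suitable toy probabilities the intended word ranks first even though the realized word is itself valid English.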
-
Patent number: 7593144
Abstract: A scanning machine (10) includes a scan surface (12) for receiving a page or pages to be scanned. A motion detector (20) monitors the copy surface to detect movement of the pages. In response to detecting a cessation of motion on the scan surface, the motion detector generates a trigger signal which causes a capture circuit (30) to capture an image of a page on the scan surface. If, during the capture of the page image, the motion detector detects motion of the page, the motion detector generates an abort signal which terminates capture of the image and empties a buffer memory (32). Once cessation of motion of the page is again detected, another trigger signal is issued and another attempt is made to capture an image. In this manner, scanning is started and stopped by natural, simple gestures without the operator pushing buttons.
Type: Grant
Filed: June 28, 2005
Date of Patent: September 22, 2009
Assignee: Xerox Corporation
Inventor: Marc Dymetman
-
Patent number: 7543758
Abstract: A system for document internal localization includes an optical input device for retrieving an observed region of a hardcopy document and a processing component which provides a location identifier for actionable observed regions by matching the retrieved region with a region of an electronically stored version of the hardcopy document. The hardcopy document and electronically stored version of the document include machine readable disambiguating information which has been added to actionable regions of the document which are indistinguishable from other actionable regions, based on their original markings, whereby a region is distinguishable from similar regions of the document. The disambiguating information is localized in regions of the document whose markings are not distinguishable without the disambiguating information.
Type: Grant
Filed: December 20, 2005
Date of Patent: June 9, 2009
Assignee: Xerox Corporation
Inventors: Marc Dymetman, Gabriela Csurka
-
Patent number: 7542893
Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
Type: Grant
Filed: May 10, 2006
Date of Patent: June 2, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Marc Dymetman, Eric Gaussier, Cyril Goutte
-
Patent number: 7536295
Abstract: A machine translation method for translating source text from a first language to target text in a second language includes receiving the source text in the first language and accessing a library of bi-fragments, each of the bi-fragments including a text fragment from the first language and a text fragment from the second language, at least some of the bi-fragments comprising non-contiguous bi-fragments in which at least one of the text fragment from the first language and the text fragment from the second language comprises a non-contiguous fragment.
Type: Grant
Filed: December 22, 2005
Date of Patent: May 19, 2009
Assignee: Xerox Corporation
Inventors: Nicola Cancedda, Bruno Cavestro, Marc Dymetman, Eric Gaussier, Cyril Goutte, Michel Simard, Kenji Yamada
-
Patent number: 7495792
Abstract: A programmable document includes a physical document having at least one sheet of material and information recorded thereon and a computer attached to the physical document. The computer includes an input/output device, a processor and a memory storing the recorded information in digital form and metadata pertaining to the physical document. By attaching small, inexpensive, computing devices to paper documents, various electronic information associated with the physical document can be retained, updated and modified.
Type: Grant
Filed: December 21, 2000
Date of Patent: February 24, 2009
Assignee: Xerox Corporation
Inventors: Dave Snowdon, Christer Fernstrom, Marc Dymetman, Natalie S. Glance
-
Publication number: 20080281857
Abstract: From a corpus of segments, a hierarchical index is derived that indexes high frequency events of a selected event type occurring in segments of the corpus at a frequency higher than or equal to a threshold frequency, and also indexes at least some low frequency events that occur in segments of the corpus at a frequency lower than the threshold frequency. The hierarchy relates events by an order structure in which coarser events subsume finer events. A query is processed respective to a queried event. The processing references information stored in the index relating to either (i) the queried event if the queried event is indexed or (ii) a coarser event that is indexed and that subsumes the queried event if the queried event is not indexed.
Type: Application
Filed: May 10, 2007
Publication date: November 13, 2008
Inventor: Marc Dymetman