Patents by Inventor Nicola Cancedda

Nicola Cancedda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8612205
    Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
    Type: Grant
    Filed: June 14, 2010
    Date of Patent: December 17, 2013
    Assignee: Xerox Corporation
    Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman
  • Publication number: 20130301920
    Abstract: A method, a system, and a computer program product for processing the output of an OCR are disclosed. The system receives a first character sequence from the OCR. A first set of characters from the first character sequence are converted to a corresponding second set of characters to generate a second character sequence based on a look-up table and language scores.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Applicant: Xerox Corporation
    Inventors: Sriram Venkatapathy, Nicola Cancedda
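
    Illustration (not part of the filing): the abstract above describes replacing characters in OCR output using a look-up table and language scores. A minimal sketch of that idea follows, assuming a hypothetical confusion table (`CONFUSIONS`) and a toy vocabulary-based `language_score`; neither the names nor the scoring scheme come from the application.

```python
from itertools import product

# Hypothetical look-up table of OCR confusions (an assumption for illustration).
CONFUSIONS = {"0": ["o"], "1": ["l", "i"], "5": ["s"]}

# Toy stand-in for a language model: known words score 1.0, everything else 0.0.
VOCAB = {"hello", "world", "solid"}


def language_score(text):
    """Return a toy language score for a candidate character sequence."""
    return 1.0 if text in VOCAB else 0.0


def correct(ocr_text):
    """Pick the highest-scoring sequence reachable through the look-up table."""
    # Each position may keep its character or take one of its replacements.
    options = [[ch] + CONFUSIONS.get(ch, []) for ch in ocr_text]
    candidates = ("".join(chars) for chars in product(*options))
    # max() returns the first maximal item, so the unchanged text wins ties.
    return max(candidates, key=language_score)


if __name__ == "__main__":
    print(correct("he1l0"))
```

    The exhaustive enumeration grows exponentially with the number of confusable characters, so it only suits short strings; a practical system would presumably search the candidates more selectively.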
  • Patent number: 8572379
    Abstract: A server and a client mutually exclusively execute server-side and client-side commutative cryptographic processes and server-side and client-side commutative permutation processes. The server has access to a hash table, while the client does not. The server and client perform a method including: encrypting and reordering the hash table using the server; communicating the encrypted and reordered hash table to the client; further encrypting and further reordering the hash table using the client; communicating the further encrypted and further reordered hash table back to the server; and partially decrypting and partially undoing the reordering using the server to generate a double-blind hash table. To read an entry, the client hashes and permutes an index key and communicates it to the server, which retrieves an item from the double-blind hash table using the hashed and permuted index key and sends it back to the client, which decrypts the retrieved item.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: October 29, 2013
    Assignee: Xerox Corporation
    Inventor: Nicola Cancedda
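
    Illustration (not part of the patent): the abstract above relies on cryptographic processes that commute, so that the server can remove its own layer after the client has added a second one. One well-known way to obtain that property is exponentiation modulo a prime. The sketch below assumes that construction, a toy modulus, and hypothetical helper names (`make_key`, `encrypt`, `decrypt`); it does not cover the permutation steps and is not a secure implementation.

```python
from math import gcd

P = 2**127 - 1  # a Mersenne prime, used here as a toy modulus (assumption)


def make_key(seed):
    """Return an exponent coprime to P - 1, starting the search at `seed`."""
    k = seed | 1  # force the candidate to be odd
    while gcd(k, P - 1) != 1:
        k += 2
    return k


def encrypt(x, k):
    """Encrypt x (1 <= x < P) by modular exponentiation."""
    return pow(x, k, P)


def decrypt(y, k):
    """Undo encrypt() using the inverse exponent modulo P - 1 (Python 3.8+)."""
    return pow(y, pow(k, -1, P - 1), P)


if __name__ == "__main__":
    server_key = make_key(65537)
    client_key = make_key(1234567)
    item = 123456789

    # The server encrypts first, then the client adds its own layer.
    double = encrypt(encrypt(item, server_key), client_key)
    # Because the two layers commute, the server can strip its layer from
    # underneath the client's, leaving an item only the client can read.
    client_only = decrypt(double, server_key)
    print(decrypt(client_only, client_key) == item)
```

    In this sketch the value left after the server removes its layer equals the item encrypted under the client key alone, which mirrors the "partially decrypting" step in the abstract.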
  • Patent number: 8548796
    Abstract: A translation system and method for translating source text from a first language to target text in a second language are disclosed. A library of bi-phrases is accessed to retrieve bi-phrases which each match a part of the source text. Each of the bi-phrases includes respective text fragments from the first and second language. Words of some (or all) of the bi-phrases are tagged with restricted part of speech (RPOS) tags. At least one of the RPOS tags is configured for identifying a word from the second language as being one which also forms a part of a closed compound word in the library. At least one target hypothesis is generated from the bi-phrases, which includes text fragments in the second language. The target hypothesis or hypotheses are evaluated, based at least in part on combinations of the restricted part of speech tags. Based on the evaluation, one of the at least one target hypothesis is output as the optimal hypothesis for forming the translation.
    Type: Grant
    Filed: January 20, 2010
    Date of Patent: October 1, 2013
    Assignee: Xerox Corporation
    Inventors: Sara Stymne, Nicola Cancedda, Tamás Gaál
  • Patent number: 8504353
    Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
    Type: Grant
    Filed: July 27, 2009
    Date of Patent: August 6, 2013
    Assignee: Xerox Corporation
    Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
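
    Illustration (not part of the patent): the abstract above states that the cost of a tour is the sum of the costs of the edges traversed, and that the translation is found by solving a generalized traveling salesman problem (GTSP), in which a tour visits exactly one node from each cluster. The sketch below shows only that generic objective on a toy graph with made-up node names and costs, solved by brute force; how bi-phrases and language model scores are mapped to nodes and edge costs is not shown.

```python
from itertools import permutations, product


def tour_cost(tour, cost):
    """Sum the edge costs of a closed tour (the last node links to the first)."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def solve_gtsp(clusters, cost):
    """Brute-force GTSP: visit exactly one node per cluster at minimum cost."""
    first, rest = clusters[0], clusters[1:]
    best = None
    # Fix the first cluster as the starting point to skip rotated duplicates.
    for order in permutations(rest):
        for choice in product(first, *order):
            tour = list(choice)
            c = tour_cost(tour, cost)
            if best is None or c < best[0]:
                best = (c, tour)
    return best


# Hypothetical toy instance: three clusters, one of which offers two nodes.
CLUSTERS = [["s"], ["a1", "a2"], ["b"]]
COST = {
    "s": {"a1": 1, "a2": 5, "b": 2},
    "a1": {"s": 4, "b": 1},
    "a2": {"s": 1, "b": 1},
    "b": {"s": 1, "a1": 3, "a2": 1},
}

if __name__ == "__main__":
    print(solve_gtsp(CLUSTERS, COST))
```

    Brute force is only workable for a handful of clusters; the abstract's point is that converting the problem to a standard TSP makes existing TSP solvers applicable.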
  • Patent number: 8407042
    Abstract: A cross-language question answering system includes a server which hosts a plurality of community question answering (CQA) websites for different countries. The websites can generate a graphical user interface on an associated user terminal. A machine translation system translates a user question from a first language into a second language. The system may alert the user to similar questions posted on other CQA websites in other languages. The system may post the translated question on another CQA website and notify the user of answers to the translated question that are posted on the website by other users. The system may include memory which stores a plurality of archives, each including questions and answers posted on a corresponding one of the CQA websites. The system may allow a user to enter a query in the user's language and receive responses to the queries retrieved from the archives of other CQA websites.
    Type: Grant
    Filed: December 9, 2008
    Date of Patent: March 26, 2013
    Assignee: Xerox Corporation
    Inventor: Nicola Cancedda
  • Publication number: 20130042108
    Abstract: A server and a client mutually exclusively execute server-side and client-side commutative cryptographic processes and server-side and client-side commutative permutation processes. The server has access to a hash table, while the client does not. The server and client perform a method including: encrypting and reordering the hash table using the server; communicating the encrypted and reordered hash table to the client; further encrypting and further reordering the hash table using the client; communicating the further encrypted and further reordered hash table back to the server; and partially decrypting and partially undoing the reordering using the server to generate a double-blind hash table. To read an entry, the client hashes and permutes an index key and communicates it to the server, which retrieves an item from the double-blind hash table using the hashed and permuted index key and sends it back to the client, which decrypts the retrieved item.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Applicant: Xerox Corporation
    Inventor: Nicola Cancedda
  • Publication number: 20130030787
    Abstract: A method and a system for making merging decisions for a translation are disclosed which are suited to use where the target language is a productive compounding one. The method includes outputting decisions on merging of pairs of words in a translated text string with a merging system. The merging system can include a set of stored heuristics and/or a merging model. In the case of heuristics, these can include a heuristic by which two consecutive words in the string are considered for merging if the first word of the two consecutive words is recognized as a compound modifier and their observed frequency f1 as a closed compound word is larger than an observed frequency f2 of the two consecutive words as a bigram. In the case of a merging model, it can be one that is trained on features associated with pairs of consecutive tokens of text strings in a training set and predetermined merging decisions for the pairs.
    Type: Application
    Filed: July 25, 2011
    Publication date: January 31, 2013
    Applicant: Xerox Corporation
    Inventors: Nicola Cancedda, Sara Stymne
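
    Illustration (not part of the filing): the heuristic described in the abstract above merges two consecutive words when the first is a recognized compound modifier and their frequency f1 as a closed compound exceeds their frequency f2 as a bigram. A minimal sketch follows, assuming plain concatenation as the merge operation and hypothetical frequency tables; the function names and data are illustrative only.

```python
def should_merge(first, second, modifiers, compound_freq, bigram_freq):
    """Apply the frequency heuristic to one pair of consecutive words."""
    if first not in modifiers:
        return False
    f1 = compound_freq.get(first + second, 0)  # seen as a closed compound
    f2 = bigram_freq.get((first, second), 0)   # seen as two separate words
    return f1 > f2


def merge_tokens(tokens, modifiers, compound_freq, bigram_freq):
    """Scan left to right, merging each pair the heuristic accepts."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and should_merge(
            tokens[i], tokens[i + 1], modifiers, compound_freq, bigram_freq
        ):
            # Assumption: merging is simple concatenation of the two words.
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical toy data for a German-like target language.
MODIFIERS = {"regen"}
COMPOUND_FREQ = {"regenschirm": 12}
BIGRAM_FREQ = {("regen", "schirm"): 1}

if __name__ == "__main__":
    tokens = ["der", "regen", "schirm", "ist", "nass"]
    print(merge_tokens(tokens, MODIFIERS, COMPOUND_FREQ, BIGRAM_FREQ))
```

    This sketch merges only pairs and ignores linking morphemes; the abstract also mentions a trained merging model as an alternative to heuristics, which is not shown here.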
  • Patent number: 8326599
    Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system, are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
    Type: Grant
    Filed: April 21, 2009
    Date of Patent: December 4, 2012
    Assignee: Xerox Corporation
    Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
  • Publication number: 20120278060
    Abstract: A system and method for building a language model for a translation system are provided. The method includes providing a first relative ranking of first and second translations in a target language of a same source string in a source language, determining a second relative ranking of the first and second translations using weights of a language model, the language model including a weight for each of a set of n-gram features, and comparing the first and second relative rankings to determine whether they are in agreement. The method further includes, when the rankings are not in agreement, updating one or more of the weights in the language model as a function of a measure of confidence in the weight, the confidence being a function of previous observations of the n-gram feature in the method.
    Type: Application
    Filed: April 27, 2011
    Publication date: November 1, 2012
    Applicant: Xerox Corporation
    Inventors: Nicola Cancedda, Viet Ha Thuc
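
    Illustration (not part of the filing): the abstract above describes comparing a given ranking of two translations with the ranking implied by n-gram weights, and updating the weights when the two disagree, scaled by a confidence that depends on how often each n-gram has been observed. The sketch below assumes bigram features and a step size of 1 / (1 + observations), so that rarely seen features move more; that formula and the class name `RankingLM` are assumptions, not the application's method.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n=2):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class RankingLM:
    """Toy n-gram scorer trained from pairwise ranking feedback."""

    def __init__(self):
        self.weights = defaultdict(float)  # one weight per n-gram feature
        self.seen = defaultdict(int)       # observations per n-gram feature

    def score(self, tokens):
        return sum(self.weights.get(g, 0.0) * c for g, c in ngrams(tokens).items())

    def update(self, better, worse):
        """Process one pair; return True if the model already agreed."""
        fb, fw = ngrams(better), ngrams(worse)
        agrees = self.score(better) > self.score(worse)
        for g in set(fb) | set(fw):
            diff = fb[g] - fw[g]  # Counter yields 0 for absent n-grams
            if not agrees and diff != 0:
                # Assumed confidence rule: well-observed weights change less.
                self.weights[g] += diff / (1 + self.seen[g])
            self.seen[g] += 1
        return agrees


if __name__ == "__main__":
    model = RankingLM()
    better = "the cat sat".split()
    worse = "cat the sat".split()
    print(model.update(better, worse), model.update(better, worse))
```

    After the first disagreement the weights are adjusted so that the preferred translation scores higher, and the second call leaves them unchanged.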
  • Patent number: 8265923
    Abstract: A statistical machine translation (SMT) system employs a conditional translation probability conditioned on the source language content. A model parameters optimization engine is configured to optimize values of parameters of the conditional translation probability using a translation pool comprising candidate aligned translations for source language sentences having reference translations. The model parameters optimization engine adds candidate aligned translations to the translation pool by sampling available candidate aligned translations in accordance with the conditional translation probability.
    Type: Grant
    Filed: May 11, 2010
    Date of Patent: September 11, 2012
    Assignee: Xerox Corporation
    Inventors: Samidh Chatterjee, Nicola Cancedda
  • Publication number: 20120101804
    Abstract: A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and the translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
    Type: Application
    Filed: October 25, 2010
    Publication date: April 26, 2012
    Applicant: Xerox Corporation
    Inventors: Benjamin Roth, Andrew R. McCallum, Marc Dymetman, Nicola Cancedda
  • Publication number: 20110307245
    Abstract: A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system.
    Type: Application
    Filed: June 14, 2010
    Publication date: December 15, 2011
    Applicant: Xerox Corporation
    Inventors: Gregory Alan Hanneman, Nicola Cancedda, Marc Dymetman
  • Patent number: 8077984
    Abstract: A computer implemented method and an apparatus for comparing spans of text are disclosed. The method includes computing a similarity measure between a first sequence of symbols representing a first text span and a second sequence of symbols representing a second text span as a function of the occurrences of optionally noncontiguous subsequences of symbols shared by the two sequences of symbols. Each of the symbols comprises at least one consecutive word and is defined according to a set of linguistic factors. Pairs of symbols in the first and second sequences that form a shared subsequence of symbols are each matched according to at least one of the factors.
    Type: Grant
    Filed: January 4, 2008
    Date of Patent: December 13, 2011
    Assignee: Xerox Corporation
    Inventors: Nicola Cancedda, Pierre Mahé
  • Publication number: 20110288852
    Abstract: A system and a method for phrase-based translation are disclosed. The method includes receiving source language text to be translated into target language text. One or more dynamic bi-phrases are generated, based on the source text and the application of one or more rules, which may be based on user descriptions. A dynamic feature value is associated with each of the dynamic bi-phrases. For a sentence of the source text, static bi-phrases are retrieved from a bi-phrase table, each of the static bi-phrases being associated with one or more values of static features. Any of the dynamic bi-phrases which each cover at least one word of the source text are also retrieved, which together form a set of active bi-phrases. Translation hypotheses are generated using active bi-phrases from the set and scored with a translation scoring model which takes into account the static and dynamic feature values of the bi-phrases used in the respective hypothesis. A translation, based on the hypothesis scores, is then output.
    Type: Application
    Filed: May 20, 2010
    Publication date: November 24, 2011
    Applicant: Xerox Corporation
    Inventors: Marc Dymetman, Wilker Ferreira Aziz, Nicola Cancedda, Jean-Marc Coursimault, Vassilina Nikoulina, Lucia Specia
  • Publication number: 20110282643
    Abstract: A statistical machine translation (SMT) system employs a conditional translation probability conditioned on the source language content. A model parameters optimization engine is configured to optimize values of parameters of the conditional translation probability using a translation pool comprising candidate aligned translations for source language sentences having reference translations. The model parameters optimization engine adds candidate aligned translations to the translation pool by sampling available candidate aligned translations in accordance with the conditional translation probability.
    Type: Application
    Filed: May 11, 2010
    Publication date: November 17, 2011
    Applicant: Xerox Corporation
    Inventors: Samidh Chatterjee, Nicola Cancedda
  • Publication number: 20110178791
    Abstract: A translation system and method for translating source text from a first language to target text in a second language are disclosed. A library of bi-phrases is accessed to retrieve bi-phrases which each match a part of the source text. Each of the bi-phrases includes respective text fragments from the first and second language. Words of some (or all) of the bi-phrases are tagged with restricted part of speech (RPOS) tags. At least one of the RPOS tags is configured for identifying a word from the second language as being one which also forms a part of a closed compound word in the library. At least one target hypothesis is generated from the bi-phrases, which includes text fragments in the second language. The target hypothesis or hypotheses are evaluated, based at least in part on combinations of the restricted part of speech tags. Based on the evaluation, one of the at least one target hypothesis is output as the optimal hypothesis for forming the translation.
    Type: Application
    Filed: January 20, 2010
    Publication date: July 21, 2011
    Applicant: Xerox Corporation
    Inventors: Sara Stymne, Nicola Cancedda, Tamás Gaál
  • Publication number: 20110022380
    Abstract: Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem.
    Type: Application
    Filed: July 27, 2009
    Publication date: January 27, 2011
    Applicant: Xerox Corporation
    Inventors: Mikhail Zaslavskiy, Marc Dymetman, Nicola Cancedda
  • Publication number: 20100268527
    Abstract: A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system, are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases.
    Type: Application
    Filed: April 21, 2009
    Publication date: October 21, 2010
    Applicant: Xerox Corporation
    Inventors: Nadi Tomeh, Nicola Cancedda, Marc Dymetman
  • Publication number: 20100145673
    Abstract: A cross-language question answering system includes a server which hosts a plurality of community question answering (CQA) websites for different countries. The websites can generate a graphical user interface on an associated user terminal. A machine translation system translates a user question from a first language into a second language. The system may alert the user to similar questions posted on other CQA websites in other languages. The system may post the translated question on another CQA website and notify the user of answers to the translated question that are posted on the website by other users. The system may include memory which stores a plurality of archives, each including questions and answers posted on a corresponding one of the CQA websites. The system may allow a user to enter a query in the user's language and receive responses to the queries retrieved from the archives of other CQA websites.
    Type: Application
    Filed: December 9, 2008
    Publication date: June 10, 2010
    Applicant: Xerox Corporation
    Inventor: Nicola Cancedda