Patents by Inventor Andre Kempe

Andre Kempe has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8095356
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Grant
    Filed: November 9, 2009
    Date of Patent: January 10, 2012
    Assignee: Xerox Corporation
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Patent number: 7827484
    Abstract: To correct at least one extraneous or missing space in a document, weights are assigned to tokens contained in a dictionary. Each token is defined by an ordered sequence of non-space symbols. The weights are assigned based on at least one of a token's length and its frequency of occurrence in the document. Corrected text is generated from text of the document by applying an ordered sequence of symbol-level transformations selected from a group including at least (i) deleting a space, (ii) inserting a space, and (iii) copying a symbol. The ordered sequence of symbol-level transformations is optimized with respect to an objective function that depends on the weights of the tokens in the corrected text.
    Type: Grant
    Filed: September 2, 2005
    Date of Patent: November 2, 2010
    Assignee: Xerox Corporation
    Inventors: Hervé Déjean, André Kempe
  • Publication number: 20100049503
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Application
    Filed: November 9, 2009
    Publication date: February 25, 2010
    Applicant: Xerox Corporation
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Patent number: 7617091
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Grant
    Filed: May 21, 2004
    Date of Patent: November 10, 2009
    Assignee: Xerox Corporation
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Patent number: 7386441
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Grant
    Filed: May 21, 2004
    Date of Patent: June 10, 2008
    Assignee: Xerox Corporation
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Publication number: 20070055933
    Abstract: To correct at least one extraneous or missing space in a document, weights are assigned to tokens contained in a dictionary. Each token is defined by an ordered sequence of non-space symbols. The weights are assigned based on at least one of a token's length and its frequency of occurrence in the document. Corrected text is generated from text of the document by applying an ordered sequence of symbol-level transformations selected from a group including at least (i) deleting a space, (ii) inserting a space, and (iii) copying a symbol. The ordered sequence of symbol-level transformations is optimized with respect to an objective function that depends on the weights of the tokens in the corrected text.
    Type: Application
    Filed: September 2, 2005
    Publication date: March 8, 2007
    Inventors: Herve Dejean, Andre Kempe
  • Patent number: 7107205
    Abstract: A method prepares a functional finite-state transducer (FST) with an epsilon (i.e., empty string) on the input side for factorization into a bimachine. The method creates a left-deterministic input finite-state automaton (FSA) by extracting and left-determinizing the input side of the functional FST. Subsequently, for each arc in the left-deterministic FSA, the corresponding sub-paths in the FST are identified and aligned.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: September 12, 2006
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6965858
    Abstract: A method reduces the number of diacritics and other intermediate symbols occurring between the two factors that result from any factorization, such as extraction of infinite ambiguity, factorization of a finitely ambiguous finite-state transducer, or bimachine factorization. The method removes all redundant intermediate symbols a posteriori. The method can be used with any two finite-state transducers (FSTs) that operate in a cascade. With longer cascades, the method can be applied pairwise to all FSTs, preferably starting from the last pair.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: November 15, 2005
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6961693
    Abstract: A method factors an ambiguous finite state transducer (FST) into two finite state transducers. The first FST is functional (i.e., unambiguous). The second FST retains the ambiguity of the original FST but is fail-safe (i.e., no failing paths) when applied to the output of the first FST. That is, the application of the second FST to an input string never leads to a state that does not provide a transition for the next symbol in the input. Subsequently, the first FST can be factorized into a left-sequential FST and a right-sequential FST that jointly represent a bimachine.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: November 1, 2005
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6959273
    Abstract: A method factors an input finite state transducer (FST) with unknown symbols into a left-sequential FST and a right-sequential FST while avoiding direct factorization of the unknown symbols. The left-sequential FST is formed by replacing each occurrence of the unknown symbol in the input FST with a sequence of the unknown symbol and a diacritic. The right-sequential FST is formed by replacing each occurrence of the diacritic with a symbol representative of an empty string and an output symbol.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: October 25, 2005
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6952667
    Abstract: A method extracts all infinite ambiguity from an input finite-state transducer (FST). The input FST is factorized into a first factor and a second factor such that the first factor is finitely ambiguous, and the second factor retains all infinite ambiguity of the original FST. The first factor is defined so that it replaces every loop where the input symbol of every arc is an ε (i.e., epsilon, empty string) by a single arc with ε on the input side and a diacritic on the output side. The second factor is defined so that it maps every diacritic to one or more ε-loops.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: October 4, 2005
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6944588
    Abstract: A method factors a functional (i.e., unambiguous) finite state transducer (FST) into a bimachine with a reduced intermediate alphabet. Initially, the method determines an emission matrix corresponding to a factorization of the functional FST. Subsequently, to reduce the intermediate alphabet, the emission matrix is split into as many emission sub-matrices as there are input symbols. Equal rows of each emission sub-matrix are assigned an identical index value in its corresponding factorization matrix before the bimachine is created.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: September 13, 2005
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20050108000
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Application
    Filed: May 21, 2004
    Publication date: May 19, 2005
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Publication number: 20050107999
    Abstract: Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.
    Type: Application
    Filed: May 21, 2004
    Publication date: May 19, 2005
    Inventors: Andre Kempe, Franck Guingne, Florent Nicart
  • Patent number: 6816830
    Abstract: A finite state data structure includes paths that represent pairs of strings, with a first string that is a string of tag combinations and a second string that is a string of tags for tokens in a language. The second strings of a set of paths with the same first string include only highly probable strings of tags for the first string. The data structure can be an FST or a bimachine, and can be used for mapping strings of tag combinations to strings of tags. The tags can, for example, indicate parts of speech of words, and the tag combinations can be ambiguity classes or, in a bimachine, reduced ambiguity classes. An FST can be obtained by approximating a Hidden Markov Model. A bimachine can include left-to-right and right-to-left sequential FSTs obtained based on frequencies of tokens in a training corpus.
    Type: Grant
    Filed: October 15, 1999
    Date of Patent: November 9, 2004
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Patent number: 6760636
    Abstract: A method extracts all “short” ambiguity from an input FST (i.e., ambiguities of one arc in length). The method factors the input FST into a first factor and a second factor such that the second factor contains all ambiguity that is one arc long, and the first factor contains all other parts of the input FST. The method a priori prevents the creation of some redundant intermediate symbols.
    Type: Grant
    Filed: December 18, 2000
    Date of Patent: July 6, 2004
    Assignee: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20030046055
    Abstract: A method factors a functional (i.e., unambiguous) finite state transducer (FST) into a bimachine with a reduced intermediate alphabet. Initially, the method determines an emission matrix corresponding to a factorization of the functional FST. Subsequently, to reduce the intermediate alphabet, the emission matrix is split into as many emission sub-matrices as there are input symbols. Equal rows of each emission sub-matrix are assigned an identical index value in its corresponding factorization matrix before the bimachine is created.
    Type: Application
    Filed: December 18, 2000
    Publication date: March 6, 2003
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20030033135
    Abstract: A method extracts all infinite ambiguity from an input finite-state transducer (FST). The input FST is factorized into a first factor and a second factor such that the first factor is finitely ambiguous, and the second factor retains all infinite ambiguity of the original FST. The first factor is defined so that it replaces every loop where the input symbol of every arc is an ε (i.e., epsilon, empty string) by a single arc with ε on the input side and a diacritic on the output side. The second factor is defined so that it maps every diacritic to one or more ε-loops.
    Type: Application
    Filed: December 18, 2000
    Publication date: February 13, 2003
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20030004705
    Abstract: A method factors an ambiguous finite state transducer (FST) into two finite state transducers. The first FST is functional (i.e., unambiguous). The second FST retains the ambiguity of the original FST but is fail-safe (i.e., no failing paths) when applied to the output of the first FST. That is, the application of the second FST to an input string never leads to a state that does not provide a transition for the next symbol in the input. Subsequently, the first FST can be factorized into a left-sequential FST and a right-sequential FST that jointly represent a bimachine.
    Type: Application
    Filed: December 18, 2000
    Publication date: January 2, 2003
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20020198702
    Abstract: A method factors an input finite state transducer (FST) with unknown symbols into a left-sequential FST and a right-sequential FST while avoiding direct factorization of the unknown symbols. The left-sequential FST is formed by replacing each occurrence of the unknown symbol in the input FST with a sequence of the unknown symbol and a diacritic. The right-sequential FST is formed by replacing each occurrence of the diacritic with a symbol representative of an empty string and an output symbol.
    Type: Application
    Filed: December 18, 2000
    Publication date: December 26, 2002
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
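
The space-correction idea in patent 7827484 (application 20070055933) can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the patented procedure. It treats the text as a sequence of non-space symbols and lets every gap between symbols copy, delete, or insert a space. It then picks the segmentation into dictionary tokens that maximizes total token weight minus a penalty per space edit. The weighting `len(token) ** 2`, the `edit_penalty` parameter, and the helper names `original_gaps`, `token_weights`, and `correct_spaces` are all hypothetical.

```python
import math


def original_gaps(text):
    """gaps[i] is True if a space preceded the i-th non-space symbol."""
    gaps, saw_space = [], False
    for c in text:
        if c == " ":
            saw_space = True
        else:
            gaps.append(saw_space)
            saw_space = False
    return gaps


def token_weights(dictionary):
    # Hypothetical length-based weighting: squaring the length makes one
    # long token outscore several short tokens covering the same symbols.
    return {tok: len(tok) ** 2 for tok in dictionary}


def correct_spaces(text, weights, edit_penalty=1.0):
    chars = [c for c in text if c != " "]
    gaps = original_gaps(text)
    n = len(chars)
    best = [-math.inf] * (n + 1)  # best[j]: best score for chars[:j]
    back = [0] * (n + 1)          # back[j]: start index of the last token
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            token = "".join(chars[i:j])
            if best[i] == -math.inf or token not in weights:
                continue
            edits = sum(gaps[i + 1:j])   # spaces deleted inside the token
            if i > 0 and not gaps[i]:
                edits += 1               # space inserted before the token
            score = best[i] + weights[token] - edit_penalty * edits
            if score > best[j]:
                best[j], back[j] = score, i
    if best[n] == -math.inf:
        return text  # no full segmentation found: leave the text unchanged
    tokens, j = [], n
    while j > 0:
        tokens.append("".join(chars[back[j]:j]))
        j = back[j]
    return " ".join(reversed(tokens))
```

For example, with the dictionary {"in", "put", "input"}, `correct_spaces("in put", token_weights(...))` returns "input". The single long token scores 25 minus one deleted space, which beats the two short tokens at 4 + 9. A frequency-based weight could be swapped into `token_weights` without changing the search.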
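
Several entries above (7107205, 6961693, 6959273, 6944588) factor a transducer into a bimachine, that is, a left-sequential and a right-sequential machine. The code below is a rough sketch of what such a result computes. It applies a bimachine in a common textbook formulation:

- a deterministic left automaton reads the input left to right;
- a deterministic right automaton reads it right to left;
- an emission function maps (left state, symbol, right state) to an output string.

It performs none of the patented factorizations. The class name `Bimachine` and the example rule are hypothetical. The rule uppercases an `a` only if a `#` occurs somewhere to its left and a `!` somewhere to its right.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

State = Hashable


@dataclass
class Bimachine:
    """Hypothetical minimal bimachine: two deterministic automata plus an emission function."""
    left_start: State
    left_delta: Callable[[State, str], State]   # reads the input left to right
    right_start: State
    right_delta: Callable[[State, str], State]  # reads the input right to left
    emit: Callable[[State, str, State], str]    # (left state, symbol, right state) -> output

    def apply(self, s: str) -> str:
        # left[i] is the left automaton's state after reading s[:i]
        left = [self.left_start]
        for a in s:
            left.append(self.left_delta(left[-1], a))
        # right[i] is the right automaton's state after reading s[i:] backwards
        right = [self.right_start] * (len(s) + 1)
        for i in range(len(s) - 1, -1, -1):
            right[i] = self.right_delta(right[i + 1], s[i])
        # The output for s[i] depends on its left context s[:i] and right context s[i+1:]
        return "".join(self.emit(left[i], s[i], right[i + 1]) for i in range(len(s)))


# Hypothetical rule: uppercase "a" only if "#" occurs to its left and "!" to its right.
upper_between = Bimachine(
    left_start=False,
    left_delta=lambda seen_hash, a: seen_hash or a == "#",
    right_start=False,
    right_delta=lambda seen_bang, a: seen_bang or a == "!",
    emit=lambda l, a, r: a.upper() if a == "a" and l and r else a,
)

print(upper_between.apply("a#a!a"))  # a#A!a
```

The example rule depends on unbounded context on both sides of each `a`. That kind of mapping is beyond a single left-to-right sequential transducer but is easy to express once the work is split between a left and a right machine.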
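
Patent 6952667 (application 20030033135) targets infinite ambiguity. Such ambiguity comes from loops whose arcs all have ε as input symbol: a loop like that can be traversed any number of times without consuming input, so one input string receives infinitely many outputs.

The sketch below only locates those loops, which are the structures the first factor replaces by a single arc with a diacritic; it builds neither factor. It relies on two assumptions:

- a hypothetical arc-list encoding `(source, input, output, target)`, with the empty string standing for ε;
- a trimmed FST, where every state is accessible and co-accessible, since a loop that lies on no successful path causes no ambiguity.

```python
from collections import defaultdict, deque

EPS = ""  # hypothetical encoding of epsilon (the empty string) on an arc's input side


def eps_loop_states(arcs):
    """Return the states lying on a cycle whose arcs all have epsilon as input symbol."""
    succ = defaultdict(set)  # successor graph restricted to epsilon-input arcs
    for src, inp, _out, dst in arcs:
        if inp == EPS:
            succ[src].add(dst)
    on_loop = set()
    for start in list(succ):
        # Breadth-first search over epsilon-input arcs: can we get back to start?
        seen = set()
        queue = deque(succ[start])
        while queue:
            q = queue.popleft()
            if q == start:
                on_loop.add(start)
                break
            if q in seen:
                continue
            seen.add(q)
            queue.extend(succ.get(q, ()))
    return on_loop


arcs = [
    (0, "a", "x", 1),
    (1, EPS, "y", 2),
    (2, EPS, "z", 1),  # closes the epsilon-input loop 1 -> 2 -> 1
    (2, "b", "b", 3),
]
print(sorted(eps_loop_states(arcs)))  # [1, 2]
```

If state 3 is final, the input ab maps to x y (z y)^k b for every k ≥ 0. This is the kind of infinite ambiguity the abstract assigns to the second factor. One breadth-first search per state keeps the sketch short; a strongly-connected-components pass would find the same states in linear time.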