Patents by Inventor Thierry Lehoux

Thierry Lehoux has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9613267
    Abstract: This disclosure provides an exemplary method and system for extracting structured label and value pairwise textual data from a textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the textual elements of interest. Next, textual elements are tagged as including a label term, a value term or a label and value term. Finally, a sequence-based method is applied to the tagged elements to generate one or more sequence listings representative of the label and value pairwise data structure(s) and label:value pairwise data is extracted.
    Type: Grant
    Filed: September 3, 2014
    Date of Patent: April 4, 2017
    Assignee: Xerox Corporation
    Inventors: Hervé Déjean, Thierry Lehoux, Eric H. Cheminot
  • Publication number: 20160063322
    Abstract: This disclosure provides an exemplary method and system for extracting structured label and value pairwise textual data from a textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the textual elements of interest. Next, textual elements are tagged as including a label term, a value term or a label and value term. Finally, a sequence-based method is applied to the tagged elements to generate one or more sequence listings representative of the label and value pairwise data structure(s) and label:value pairwise data is extracted.
    Type: Application
    Filed: September 3, 2014
    Publication date: March 3, 2016
    Inventors: Hervé Déjean, Thierry Lehoux, Eric H. Cheminot
  • Patent number: 8566349
    Abstract: A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
    Type: Grant
    Filed: September 28, 2009
    Date of Patent: October 22, 2013
    Assignee: Xerox Corporation
    Inventors: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
  • Patent number: 8509537
    Abstract: A wordspotting system and method are disclosed. The method includes receiving a keyword and, for each of a set of typographical fonts, synthesizing a word image based on the keyword. A keyword model is trained based on the synthesized word images and the respective weights for each of the set of typographical fonts. Using the trained keyword model, handwritten word images of a collection of handwritten word images which match the keyword are identified. The weights allow a large set of fonts to be considered, with the weights indicating the relative relevance of each font for modeling a set of handwritten word images.
    Type: Grant
    Filed: August 5, 2010
    Date of Patent: August 13, 2013
    Assignee: Xerox Corporation
    Inventors: Florent C. Perronnin, Thierry Lehoux, Francois Ragnet
  • Patent number: 8453922
    Abstract: A method for separating and categorizing documents includes receiving a scanned batch of documents. The batch includes scanned documents to which document separator stamps have been applied before scanning. Each stamp includes machine recognizable patterns applied on a same page of a document, spaced by a designated field for receiving a user-applied category code. The scanned batch of documents is processed to identify pages that contain a document separator, including identifying at least one of two spaced patterns. For a document page for which a document separator is identified, the corresponding designated field is located and the category code associated with the designated field is identified. The document containing the page is separated from other documents in the batch based on the identified separator and a document category is assigned to the document, based on the identified category code.
    Type: Grant
    Filed: February 9, 2010
    Date of Patent: June 4, 2013
    Assignee: Xerox Corporation
    Inventors: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
  • Publication number: 20120033874
    Abstract: A wordspotting system and method are disclosed. The method includes receiving a keyword and, for each of a set of typographical fonts, synthesizing a word image based on the keyword. A keyword model is trained based on the synthesized word images and the respective weights for each of the set of typographical fonts. Using the trained keyword model, handwritten word images of a collection of handwritten word images which match the keyword are identified. The weights allow a large set of fonts to be considered, with the weights indicating the relative relevance of each font for modeling a set of handwritten word images.
    Type: Application
    Filed: August 5, 2010
    Publication date: February 9, 2012
    Applicant: Xerox Corporation
    Inventors: Florent Perronnin, Thierry Lehoux, Francois Ragnet
  • Publication number: 20110192894
    Abstract: A method, apparatus, and hardcopy document are provided. The method provides for separating and categorizing documents and includes receiving a scanned batch of documents. The batch includes a plurality of scanned documents to which document separator stamps have been applied before scanning. Each document separator stamp includes first and second machine recognizable patterns applied on a same page of a document, the first and second patterns being spaced by a designated field for receiving a user-applied category code. The scanned batch of documents is processed to identify pages that contain a document separator, the processing including identifying at least one of the first and second spaced patterns. For each of a plurality of document pages for which a document separator is identified, the method includes locating the corresponding designated field and identifying the category code associated with the designated field.
    Type: Application
    Filed: February 9, 2010
    Publication date: August 11, 2011
    Applicant: Xerox Corporation
    Inventors: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
  • Publication number: 20110078191
    Abstract: A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
    Type: Application
    Filed: September 28, 2009
    Publication date: March 31, 2011
    Applicant: Xerox Corporation
    Inventors: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
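
The label and value extraction summarized in the abstract of patent 9613267 above (tag each textual element, then apply a sequence-based pass to pair labels with values) can be illustrated with a minimal sketch. This is not the patented method: the layout analysis step is omitted and the elements are assumed to arrive already in reading order. The label vocabulary `LABEL_TERMS` and the functions `tag` and `extract_pairs` are hypothetical names introduced only for this illustration.

```python
# Hypothetical label vocabulary; the abstract does not specify one.
LABEL_TERMS = {"invoice number", "date", "total"}


def tag(element: str) -> str:
    """Tag an element as 'L' (label), 'V' (value) or 'LV' (label and value)."""
    head, sep, tail = element.partition(":")
    if sep and head.strip().lower() in LABEL_TERMS:
        # "Total: 42.00" carries both parts; "Total:" is a bare label.
        return "LV" if tail.strip() else "L"
    if element.strip().lower() in LABEL_TERMS:
        return "L"
    return "V"


def extract_pairs(elements):
    """Pair each label with the value that directly follows it.

    Assumes the elements are already in reading order, which in the
    abstract is the job of the preceding layout analysis.
    """
    pairs = []
    pending_label = None
    for element in elements:
        kind = tag(element)
        if kind == "LV":
            label, _, value = element.partition(":")
            pairs.append((label.strip(), value.strip()))
            pending_label = None
        elif kind == "L":
            # Drop a trailing colon so "Total:" and "Total" look the same.
            pending_label = element.rstrip(": ").strip()
        elif pending_label is not None:
            pairs.append((pending_label, element.strip()))
            pending_label = None
        # A value with no pending label is ignored.
    return pairs


if __name__ == "__main__":
    sample = ["Invoice number: 12345", "Date", "September 3, 2014",
              "Thank you", "Total:", "42.00"]
    for label, value in extract_pairs(sample):
        print(f"{label}: {value}")
```

A fixed vocabulary and a strict label-then-value pattern keep the sketch short; the abstract itself describes generating one or more candidate sequence listings, which this simplification does not attempt.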