Patents by Inventor Hervé Déjean

Hervé Déjean has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250209100
    Abstract: In a method for training a first-stage neural retriever, adapter layers are inserted into one or more transformer layers of a pretrained language model (PLM) in an encoder configured to receive one or more documents and generate a sparse representation for each of the documents. The first-stage retriever is trained on a downstream task to update one or more parameters of the inserted adapter layers.
    Type: Application
    Filed: January 19, 2024
    Publication date: June 26, 2025
    Inventors: Stéphane Clinchant, Carlos Lassance, Hervé Déjean, Vaishali Pal
  • Patent number: 10803233
    Abstract: This disclosure provides an exemplary method and system for extracting structured data from an unstructured textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the page elements of interest. Next, the content of these page elements is tagged based on application-specific heuristics. Finally, a sequence-based method is applied to the tags for identifying repetitive contiguous patterns.
    Type: Grant
    Filed: December 16, 2013
    Date of Patent: October 13, 2020
    Assignee: Conduent Business Services LLC
    Inventors: Hervé Déjean, Darren S. Schroeder
  • Publication number: 20180129944
    Abstract: A multi-page document is represented as a graph in which extracted page objects of the document, such as text blocks, are represented by nodes that are connected by intra-page edges and/or cross-page edges. The nodes and edges of the graph are associated with respective sets of features, the edge features distinguishing between intra-page and cross-page edges. A trained first model jointly predicts class labels for page objects, based on node and edge features. Page labels for the pages may be predicted, based on the page object predictions, optionally enforcing a constraint, such as a maximum of one class label for a given class, per page. The pages can be assigned a respective category, based on the predicted classes of the page objects and respective features. Information based on the predictions is output, such as one or more of the page object class labels, the page labels, and information based thereon.
    Type: Application
    Filed: November 7, 2016
    Publication date: May 10, 2018
    Applicant: Xerox Corporation
    Inventors: Jean-Luc Meunier, Hervé Déjean
  • Patent number: 9965809
    Abstract: Disclosed is a method and system for extracting a mathematical structure associated with a financial table. According to an exemplary embodiment, the method uses an LR (Left-to-Right) parser reducing stack and an LR parser nonreducing stack to generate a final reducing stack representative of the mathematical structure.
    Type: Grant
    Filed: July 25, 2016
    Date of Patent: May 8, 2018
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Publication number: 20180025436
    Abstract: Disclosed is a method and system for extracting a mathematical structure associated with a financial table. According to an exemplary embodiment, the method uses an LR (Left-to-Right) parser reducing stack and an LR parser nonreducing stack to generate a final reducing stack representative of the mathematical structure.
    Type: Application
    Filed: July 25, 2016
    Publication date: January 25, 2018
    Applicant: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 9798711
    Abstract: This disclosure provides a method and system of generating a graphical organization of a document page. According to an exemplary embodiment, the method includes identifying grid-based structures represented by graphical lines of a document page. The exemplary method includes a sequence of steps where a rectangular zone associated with the page is analyzed by looking for lines that entirely cross the zone, either horizontally or vertically. A hierarchy of grid-based structures is then identified, which can be used for analysis of the document and/or data extraction.
    Type: Grant
    Filed: December 1, 2015
    Date of Patent: October 24, 2017
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 9672195
    Abstract: Disclosed is a method and system that generates a page construct structure associated with a sequentially-ordered set of pages, each being characterized by a set of page construct features. N-grams, i.e., sequences of n features, are computed from a set of page construct features for n contiguous pages, and n-grams which are repetitive are selected. Pages matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is iteratively applied to this new sequence. The output is an ordered set of trees.
    Type: Grant
    Filed: December 24, 2013
    Date of Patent: June 6, 2017
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Publication number: 20170154025
    Abstract: This disclosure provides a method and system of generating a graphical organization of a document page. According to an exemplary embodiment, the method includes identifying grid-based structures represented by graphical lines of a document page. The exemplary method includes a sequence of steps where a rectangular zone associated with the page is analyzed by looking for lines that entirely cross the zone, either horizontally or vertically. A hierarchy of grid-based structures is then identified, which can be used for analysis of the document and/or data extraction.
    Type: Application
    Filed: December 1, 2015
    Publication date: June 1, 2017
    Applicant: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 9613267
    Abstract: This disclosure provides an exemplary method and system for extracting structured label and value pairwise textual data from a textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the textual elements of interest. Next, textual elements are tagged as including a label term, a value term or a label and value term. Finally, a sequence-based method is applied to the tagged elements to generate one or more sequence listings representative of the label and value pairwise data structure(s) and label:value pairwise data is extracted.
    Type: Grant
    Filed: September 3, 2014
    Date of Patent: April 4, 2017
    Assignee: Xerox Corporation
    Inventors: Hervé Déjean, Thierry Lehoux, Eric H. Cheminot
  • Patent number: 9524274
    Abstract: Disclosed is a method that structures a sequentially-ordered set of elements, each being characterized by a set of features. N-grams (sequence of n features) are computed from a set for n contiguous elements, and n-grams which are repetitive (Kleene cross) are selected. Elements matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is iteratively applied to this new sequence. The output is an ordered set of trees.
    Type: Grant
    Filed: June 6, 2013
    Date of Patent: December 20, 2016
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Publication number: 20160063322
    Abstract: This disclosure provides an exemplary method and system for extracting structured label and value pairwise textual data from a textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the textual elements of interest. Next, textual elements are tagged as including a label term, a value term or a label and value term. Finally, a sequence-based method is applied to the tagged elements to generate one or more sequence listings representative of the label and value pairwise data structure(s) and label:value pairwise data is extracted.
    Type: Application
    Filed: September 3, 2014
    Publication date: March 3, 2016
    Inventors: Hervé Déjean, Thierry Lehoux, Eric H. Cheminot
  • Patent number: 9189461
    Abstract: Disclosed is a method that generates a page frame structure associated with a sequentially-ordered set of pages, each being characterized by a set of page frame features. N-grams (sequence of n features) are computed from a set for n contiguous pages, and n-grams which are repetitive (Kleene cross) are selected. Pages matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is iteratively applied to this new sequence. The output is an ordered set of trees.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: November 17, 2015
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 9110868
    Abstract: A system, method, and computer program product for determining the structure of a document are provided. The method includes receiving a set of document pages for a document and linking one page frame to each of a plurality of document pages in the set. For each document page linked to a page frame, a content bounding box surrounding the content on the document page is identified, and the document page categorized, based at least in part on the geometrical relationship between the page frame and the content bounding box of the document page. The document page can then be identified as a logical cut based at least in part on the categorization of the document page. Information, such as a table of contents or updated table of contents, can then be output, based on the determined logical unit(s) of the document.
    Type: Grant
    Filed: December 21, 2010
    Date of Patent: August 18, 2015
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Publication number: 20150169510
    Abstract: This disclosure provides an exemplary method and system for extracting structured data from an unstructured textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the page elements of interest. Next, the content of these page elements is tagged based on application-specific heuristics. Finally, a sequence-based method is applied to the tags for identifying repetitive contiguous patterns.
    Type: Application
    Filed: December 16, 2013
    Publication date: June 18, 2015
    Applicant: Xerox Corporation
    Inventors: Hervé Déjean, Darren S. Schroeder
  • Patent number: 9008443
    Abstract: A system and method for identifying regular geometric structures in a document page are disclosed. In the method, for a document page for which a set of page elements have been identified, the method includes identifying, where present, geometric relations among a subset of the page elements, from a predefined set of geometric relations, and a geometric structure comprising regular rows and regular columns, based on the identified geometric relations. Constraints of a definition of a regular geometric structure are applied to the identified geometric structure and, where the subset of page elements includes regular rows and regular columns forming a geometric structure which meets the constraints of the definition of a regular geometric structure, the subset of the page elements is identified as forming a regular geometric structure and may be labeled or tested to determine if it can be expanded by adding one or more rows or columns.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: April 14, 2015
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Publication number: 20150026558
    Abstract: Disclosed is a method that generates a page frame structure associated with a sequentially-ordered set of pages, each being characterized by a set of page frame features. N-grams (sequence of n features) are computed from a set for n contiguous pages, and n-grams which are repetitive (Kleene cross) are selected. Pages matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is iteratively applied to this new sequence. The output is an ordered set of trees.
    Type: Application
    Filed: July 16, 2013
    Publication date: January 22, 2015
    Inventor: Hervé Déjean
  • Publication number: 20140365872
    Abstract: Disclosed is a method that structures a sequentially-ordered set of elements, each being characterized by a set of features. N-grams (sequence of n features) are computed from a set for n contiguous elements, and n-grams which are repetitive (Kleene cross) are selected. Elements matching the most frequent repetitive n-gram are grouped together under a new node, and a new sequence is created. The method is iteratively applied to this new sequence. The output is an ordered set of trees.
    Type: Application
    Filed: June 6, 2013
    Publication date: December 11, 2014
    Inventor: Hervé Déjean
  • Patent number: 8719700
    Abstract: A computer-implemented method and system for generation of page templates are provided. The method includes providing a document in computer memory. Using a computer processor, page elements within the document are identified and labeled. For each page of the document, a set of geometric relations between pairs of page elements co-occurring on the page is computed, and the set of geometric relations is associated with the page. The method also includes generating a set of page template candidates based at least in part on the computed geometric relations, selecting page templates from the set of page template candidates, and outputting the selected page templates.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: May 6, 2014
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 8645819
    Abstract: A method and a system for detecting and extracting images in an electronic document are disclosed. The method includes receiving an electronic document and identifying elements of a page. The identified elements include a set of graphical elements and a set of text elements. The method may include identifying and excluding elements which serve as graphical page constructs and/or text formatting elements. The page can then be segmented, based on (remaining) graphical elements and identified white spaces, to generate a set of image blocks. Text elements that are associated with a respective image block are identified as captions. Overlapping candidate images are then grouped to form a new image. The new image can thus include candidate images which would, without the identification of their caption(s), each be treated as a respective image.
    Type: Grant
    Filed: June 17, 2011
    Date of Patent: February 4, 2014
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
  • Patent number: 8645821
    Abstract: A system and method for page frame detection for pages of a document are disclosed. The method includes receiving a set of document pages for a document, each page having at least one detected object. For each page in the set, the method includes determining dimensions of a bounding box which encompasses the detected objects of the page and determining margin dimensions, based on a position of the bounding box on the page. A page frame is computed as a combination of bounding box dimensions and margin dimensions, based on frequencies of the bounding box dimensions and margin dimensions computed for the set of pages. The computed page frame is matched to pages of the document. Information based on the matching, such as content of text objects within the matched page frame, can be output.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: February 4, 2014
    Assignee: Xerox Corporation
    Inventor: Hervé Déjean
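
The page frame detection abstract of patent 8645821 above describes a frequency-based computation that can be illustrated with a short example. The following is a minimal sketch only, not the patented implementation: it assumes each detected object is an (x0, y0, x1, y1) box, treats the left and top offsets of the per-page bounding box as the margin dimensions, and takes the most frequent value of each dimension across pages. The function names, the box format, and the tolerance parameter are hypothetical and are not taken from the patent.

```python
from collections import Counter


def bounding_box(objects):
    """Smallest (x0, y0, x1, y1) box enclosing every object box on a page."""
    return (
        min(o[0] for o in objects),
        min(o[1] for o in objects),
        max(o[2] for o in objects),
        max(o[3] for o in objects),
    )


def page_frame(pages, tolerance=5):
    """Estimate one page frame for a document by majority vote.

    Each page is a list of object boxes. For every non-empty page the
    left margin, top margin, width, and height of its bounding box are
    computed, snapped to a grid of `tolerance` units (an assumption made
    here so that near-identical values are counted together), and the
    most frequent value of each dimension is kept.
    """
    votes = [Counter() for _ in range(4)]
    for objects in pages:
        if not objects:
            continue  # a page without detected objects casts no vote
        x0, y0, x1, y1 = bounding_box(objects)
        for i, value in enumerate((x0, y0, x1 - x0, y1 - y0)):
            votes[i][round(value / tolerance) * tolerance] += 1
    if not votes[0]:
        raise ValueError("no page contains a detected object")
    left, top, width, height = (v.most_common(1)[0][0] for v in votes)
    return (left, top, left + width, top + height)


# Two full pages and one short final page; the full-page frame wins the vote.
pages = [
    [(50, 60, 300, 200), (100, 250, 550, 740)],
    [(50, 60, 550, 740)],
    [(50, 60, 550, 300)],
]
print(page_frame(pages))  # (50, 60, 550, 740)
```

Voting on each dimension separately means a short or sparse page cannot shrink the estimated frame, which is one plausible reading of "based on frequencies" in the abstract. Matching the computed frame back to individual pages is not covered by this sketch.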