Patents by Inventor Michele Dolfi

Michele Dolfi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11017498
    Abstract: A plurality of electronic documents comprising one or more document pages are received. First position markers, second position markers and page identifiers are inserted into the pages. The plurality of electronic documents are printed, thereby generating a printed corpus comprising a plurality of printed documents. The plurality of printed documents are scanned, thereby generating a scanned corpus comprising a plurality of scanned images. Scanning frame positions of the first and the second position markers are detected, and the detected scanning frame positions and the page positions are used to define affine transformations between the plurality of scanned images and the corresponding document pages. The affine transformations are applied to the plurality of scanned images to align the plurality of scanned images with the corresponding document pages of the plurality of electronic documents.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: May 25, 2021
    Assignee: International Business Machines Corporation
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Konstantinos Bekas
  • Patent number: 10885323
    Abstract: A computer-implemented method for digitizing a document, wherein the document has an assigned classification scheme, may be provided. A digital image and an identifier of the classification scheme may be received, the image representing a portion of the document. A segmentation of the image into one or more image segments may be determined; for each of the image segments, content information may be captured from the image segment and a category may be assigned to the image segment, the category being selected from the classification scheme. One or more digitization segments may be selected from the segmentation. A graph model of the document may be populated, wherein each of the digitization segments is represented by a segment node of the graph model.
    Type: Grant
    Filed: February 28, 2019
    Date of Patent: January 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Konstantinos Bekas
  • Publication number: 20200401590
    Abstract: A computer-implemented method for generating ground-truth for natural language querying may include providing a knowledge graph as a data model, receiving a natural language query from a user and translating the natural language query into a formal data query. The method can also include visualizing the formal data query to the user and receiving a feedback response from the user. The feedback response can include a verified and/or edited formal data query. The method can also include storing the natural language query and the corresponding feedback response as a ground-truth pair. A corresponding system and a related computer program product may be provided.
    Type: Application
    Filed: June 20, 2019
    Publication date: December 24, 2020
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Aleksandros Sobczyk, Tim Jan Baccaert, Konstantinos Bekas
  • Patent number: 10824788
    Abstract: A method of collecting training data of a document component may be provided. The documents have a structure and are coded in the typesetting language TeX. The method comprises receiving a TeX source file, compiling it into a PDF file and a related sync file, and analyzing the PDF file, thereby determining a non-text-only document component. The method also comprises determining first coordinates of the non-text-only document component and a corresponding page number, determining a typesetting command relating to a non-text-only document component and determining second coordinates of a bounding box and a corresponding page number from the sync file, determining text elements in the non-text-only document component of the PDF file for which the first coordinates and the second coordinates overlap, and combining the determined text elements and linking them to a type of a non-text document component determined in the non-text-only document component in the TeX source file.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: November 3, 2020
    Assignee: International Business Machines Corporation
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Aleksandros Sobczyk, Konstantinos Bekas
  • Publication number: 20200302307
    Abstract: Embodiments of the invention disclose a computer-implemented method for the automatic generation of a hypothesis from a graph. The method includes receiving an initial graph, wherein the initial graph includes a plurality of nodes and a plurality of edges between the plurality of nodes. A predefined property of the initial graph is computed, and one or more of the plurality of edges of the initial graph are amended, thereby creating an amended graph that includes a plurality of original edges and one or more amended edges. The predefined property of the amended graph is computed, and the predefined property of the initial graph is compared with the predefined property of the amended graph. The one or more amended edges are marked as a hypothesis if a predefined measure of difference between the predefined property of the initial graph and the predefined property of the amended graph exceeds a predefined threshold.
    Type: Application
    Filed: March 21, 2019
    Publication date: September 24, 2020
    Inventors: Konstantinos Bekas, Peter Staar, Christoph Auer, Michele Dolfi, Alessandro Curioni
  • Publication number: 20200294187
    Abstract: A plurality of electronic documents comprising one or more document pages are received. First position markers, second position markers and page identifiers are inserted into the pages. The plurality of electronic documents are printed, thereby generating a printed corpus comprising a plurality of printed documents. The plurality of printed documents are scanned, thereby generating a scanned corpus comprising a plurality of scanned images. Scanning frame positions of the first and the second position markers are detected, and the detected scanning frame positions and the page positions are used to define affine transformations between the plurality of scanned images and the corresponding document pages. The affine transformations are applied to the plurality of scanned images to align the plurality of scanned images with the corresponding document pages of the plurality of electronic documents.
    Type: Application
    Filed: March 14, 2019
    Publication date: September 17, 2020
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Konstantinos Bekas
  • Publication number: 20200279107
    Abstract: A computer-implemented method for digitizing a document, wherein the document has an assigned classification scheme, may be provided. A digital image and an identifier of the classification scheme may be received, the image representing a portion of the document. A segmentation of the image into one or more image segments may be determined; for each of the image segments, content information may be captured from the image segment and a category may be assigned to the image segment, the category being selected from the classification scheme. One or more digitization segments may be selected from the segmentation. A graph model of the document may be populated, wherein each of the digitization segments is represented by a segment node of the graph model.
    Type: Application
    Filed: February 28, 2019
    Publication date: September 3, 2020
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Konstantinos Bekas
  • Publication number: 20200257755
    Abstract: A method of collecting training data of a document component may be provided. The documents have a structure and are coded in the typesetting language TeX. The method comprises receiving a TeX source file, compiling it into a PDF file and a related sync file, and analyzing the PDF file, thereby determining a non-text-only document component. The method also comprises determining first coordinates of the non-text-only document component and a corresponding page number, determining a typesetting command relating to a non-text-only document component and determining second coordinates of a bounding box and a corresponding page number from the sync file, determining text elements in the non-text-only document component of the PDF file for which the first coordinates and the second coordinates overlap, and combining the determined text elements and linking them to a type of a non-text document component determined in the non-text-only document component in the TeX source file.
    Type: Application
    Filed: February 8, 2019
    Publication date: August 13, 2020
    Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Aleksandros Sobczyk, Konstantinos Bekas
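
For readers who want a concrete picture of the hypothesis-generation abstract listed above (publication number 20200302307), the following is a minimal illustrative sketch and not the patented implementation. It assumes, purely for illustration, that the "predefined property" is the number of connected components, that an "amendment" means adding one missing edge, and that the measure of difference is an absolute difference; the function names `count_components` and `mark_hypotheses` are hypothetical.

```python
from itertools import combinations


def count_components(nodes, edges):
    """Assumed 'predefined property': the number of connected components."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        # Depth-first traversal of everything reachable from `start`.
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def mark_hypotheses(nodes, edges, threshold=0):
    """Return candidate edges whose addition changes the property by more than `threshold`."""
    baseline = count_components(nodes, edges)
    present = {frozenset(edge) for edge in edges}
    hypotheses = []
    for u, v in combinations(sorted(nodes), 2):
        if frozenset((u, v)) in present:
            continue  # only edges missing from the initial graph are tried
        # Amended graph: all original edges plus one amended (added) edge.
        amended = list(edges) + [(u, v)]
        difference = abs(baseline - count_components(nodes, amended))
        if difference > threshold:
            hypotheses.append((u, v))
    return hypotheses


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("c", "d")]
    print(mark_hypotheses(nodes, edges))
```

With these assumptions, any added edge that joins two previously separate components is marked as a hypothesis. A different property (for example, a centrality score) or a different kind of amendment (removing or reweighting edges) would fit the same compare-against-threshold loop.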