Abstract: Embodiments are directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image. Some embodiments utilize breakpoints to enable the system to match different documents with internal variations to a common template. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image.
Type: Application
Filed: August 17, 2022
Publication date: December 8, 2022
Applicant: SAS Institute Inc.
Inventors: David James Wheaton, Stuart Dakari Cooke, III, William Robert Nadolski

Abstract: Various embodiments are generally directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. For example, the contents of cells may be extracted from a table image along with structural context, including the corresponding row and column information. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image.
Type: Application
Filed: August 9, 2021
Publication date: November 25, 2021
Applicant: SAS Institute Inc.
Inventors: Yi Liao, Charles Franklin Board, William Robert Nadolski, David James Wheaton, Heather Michelle Goodykoontz, Adheesha Sanjuaya Arangala, Karthik Nakkeeran

Abstract: Various embodiments are generally directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. For example, the contents of cells may be extracted from a table image along with structural context, including the corresponding row and column information. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image.
Type: Grant
Filed: August 9, 2021
Date of Patent: September 13, 2022
Assignee: SAS INSTITUTE INC.
Inventors: Yi Liao, Charles Franklin Board, William Robert Nadolski, David James Wheaton, Heather Michelle Goodykoontz, Adheesha Sanjuaya Arangala, Karthik Nakkeeran

Abstract: Embodiments are directed to techniques for image content extraction. Some embodiments include extracting contextually structured data from document images, such as by automatically identifying document layout, document data, document metadata, and/or correlations therebetween in a document image. Some embodiments utilize breakpoints to enable the system to match different documents with internal variations to a common template. Several embodiments include extracting contextually structured data from table images, such as gridded and non-gridded tables. Many embodiments are directed to generating and utilizing a document template database for automatically extracting document image contents into a contextually structured format. Several embodiments are directed to automatically identifying and associating document metadata with corresponding document data in a document image to generate a machine-facilitated annotation of the document image.
Type: Grant
Filed: August 17, 2022
Date of Patent: July 18, 2023
Assignee: SAS INSTITUTE INC.
Inventors: David James Wheaton, Stuart Dakari Cooke, III, William Robert Nadolski