Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.
Type:
Grant
Filed:
October 28, 2013
Date of Patent:
November 25, 2014
Assignee:
Gruntworx, LLC
Inventors:
Girish Welling, Nirupam Sarkar, Tushar Mahata, Vartika Singh, Depankar Neogi, Steven K. Ladd
Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.
Type:
Grant
Filed:
January 14, 2011
Date of Patent:
October 29, 2013
Assignee:
Gruntworx, LLC
Inventors:
Girish Welling, Nirupam Sarkar, Tushar Mahata, Vartika Singh, Depankar Neogi, Steven K. Ladd
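The partition, transform and select steps described in the two abstracts above can be illustrated with a minimal sketch. It is not the patented implementation: the helper names (`partition`, `threshold`, `toy_extract`, `best_variation`), the tile size, and the confidence score are all assumptions made for illustration, and `toy_extract` merely stands in for a real OCR or data-extraction engine that reports a confidence value.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def partition(page: Image, tile_h: int, tile_w: int) -> List[Image]:
    """Split a page into rectangular pieces (edge tiles may be smaller)."""
    pieces = []
    for top in range(0, len(page), tile_h):
        for left in range(0, len(page[0]), tile_w):
            pieces.append([row[left:left + tile_w]
                           for row in page[top:top + tile_h]])
    return pieces


def identity(img: Image) -> Image:
    return [row[:] for row in img]


def threshold(level: int) -> Callable[[Image], Image]:
    """Build a simple global-threshold binarizer (one possible pre-processing step)."""
    def apply(img: Image) -> Image:
        return [[255 if px >= level else 0 for px in row] for row in img]
    return apply


def toy_extract(img: Image) -> Tuple[str, float]:
    """Hypothetical stand-in for an extraction engine: returns (text, confidence).

    Confidence here is just the fraction of pixels that are pure black or white.
    """
    pixels = [px for row in img for px in row]
    dark = sum(1 for px in pixels if px == 0)
    crisp = sum(1 for px in pixels if px in (0, 255))
    return f"{dark} dark px", crisp / len(pixels)


def best_variation(piece: Image,
                   transforms: Dict[str, Callable[[Image], Image]],
                   extract: Callable[[Image], Tuple[str, float]]) -> Tuple[str, str]:
    """Run every transform on one piece and keep the highest-confidence output."""
    results = {name: extract(fn(piece)) for name, fn in transforms.items()}
    winner = max(results, key=lambda name: results[name][1])
    return winner, results[winner][0]


page = [[10, 200, 30, 220],
        [120, 130, 40, 210],
        [5, 250, 60, 190],
        [15, 240, 70, 180]]
transforms = {"identity": identity, "threshold_128": threshold(128)}
choices = [best_variation(p, transforms, toy_extract)
           for p in partition(page, 2, 2)]
```

Selecting the winner per piece rather than per page lets different regions of the same page use different pre-processing, which is the point the abstracts emphasize.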
Abstract: A method of enhancing electronic documents received from a plurality of users by a document analysis system for improving automatic recognition and classification of the received electronic documents is provided. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts resulting from the binarization of the original grayscale or color image source document and which reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the extracted features from the filtered document to automatically recognize and classify a document into a document category.
Type:
Grant
Filed:
November 6, 2008
Date of Patent:
September 17, 2013
Assignee:
Gruntworx, LLC
Inventors:
Depankar Neogi, Steven K. Ladd, Dilnawaj Ahmed, Lohith Chiprawada Shesharam
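The abstract above does not spell out how the background artifacts are inferred. As a loosely related illustration only, the sketch below uses connected-component size filtering, a common despeckling technique that is swapped in here and is not claimed to be the patented filter. The function name `remove_small_components`, the 4-connectivity choice, and the `min_size` parameter are assumptions.

```python
from collections import deque
from typing import List

Binary = List[List[int]]  # 1 = ink (foreground), 0 = background


def remove_small_components(binary: Binary, min_size: int) -> Binary:
    """Clear 4-connected foreground blobs smaller than min_size pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in binary]

    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Treat tiny blobs as binarization speckle and erase them.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


noisy = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [0, 0, 0, 0]]
cleaned = remove_small_components(noisy, min_size=2)
```

A size threshold alone cannot tell a speck from a period or a dot on an "i"; a filter that considers proximity to text, as the abstract describes, would need additional context.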
Abstract: The present invention includes a method of secure data entry that enables complex data entry work to be performed by unskilled workers that results in data entry with higher productivity, higher quality and higher security than data entry performed by highly skilled workers. The invention identifies data fields on an electronic image of an identified input page, sequences identified data field images, and individually displays data field images for manual data entry. The invention also provides for extracting data from a data field image and displaying extracted data along with the corresponding data field image for approval or correction. Sequenced data field images are optionally reordered or randomized for display and manual entry.
Type:
Grant
Filed:
February 20, 2007
Date of Patent:
September 18, 2012
Assignee:
Gruntworx, LLC
Inventors:
Steven Kif Ladd, Mark Andrew Robinson, Depankar Neogi, Dilnawaj Ahmed
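The sequencing and optional randomization of data field images described in the abstract above can be sketched as follows. This is an illustrative assumption rather than the patented design: `FieldImage`, `display_order`, and `reassemble` are hypothetical names, and the crop coordinates and field names are made-up sample values.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FieldImage:
    page_id: str
    field_name: str
    crop: Tuple[int, int, int, int]  # (left, top, right, bottom) on the page image


def display_order(fields: List[FieldImage], seed: Optional[int] = None) -> List[int]:
    """Return a shuffled list of indices giving the order in which fields are shown."""
    order = list(range(len(fields)))
    random.Random(seed).shuffle(order)
    return order


def reassemble(fields: List[FieldImage], order: List[int],
               typed_values: List[str]) -> Dict[Tuple[str, str], str]:
    """Map values keyed in display order back to their (page, field) slots."""
    record = {}
    for index, value in zip(order, typed_values):
        field = fields[index]
        record[(field.page_id, field.field_name)] = value
    return record


fields = [
    FieldImage("page-1", "name", (10, 10, 200, 40)),
    FieldImage("page-1", "id_number", (10, 50, 200, 80)),
    FieldImage("page-1", "amount", (10, 90, 200, 120)),
]
order = display_order(fields, seed=7)
# Simulate an operator keying each cropped image in the order it was shown.
typed = [fields[i].field_name.upper() for i in order]
record = reassemble(fields, order, typed)
```

Because the operator sees only isolated crops in a shuffled order, no single display reveals how the fields relate to one another, while the saved `order` lets the system rebuild the full record afterwards.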