Patents Assigned to COPANION, INC.

SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS USING EXTERNAL DATA

Publication number: 20110255788

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document at least in part using data external to the electronic document but associated with the job containing the document is provided. The method includes: analyzing each electronic document in a job to automatically extract images and text features; and, if any of the images and text features extracted from the electronic document is not recognized, using data external to said document but associated with said job to identify the unrecognized feature, wherein the external source may be one of at least one other document in the job and a database having known values associated with the job.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: COPANION, INC.

Inventors: Matthew DUGGAN, Janice O'NEIL, Girish WELLING, Depankar NEOGI, Steven K. LADD
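The fallback described in the abstract of publication 20110255788 can be illustrated with a minimal sketch. It assumes that an unrecognized field is resolved by fuzzy string matching against values already known for the job; the function name, the cutoff, and the sample values are hypothetical and do not come from the application.

```python
from difflib import get_close_matches

def resolve_with_job_data(raw_value, known_values, cutoff=0.75):
    """Return the closest job-level value for an unrecognized field, or None."""
    # known_values stands in for "data external to the document": values taken
    # from other documents in the same job or from a database tied to the job.
    matches = get_close_matches(raw_value, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical job data and a garbled OCR reading of a name field.
job_values = ["Jonathan Smith", "123-45-6789"]
resolved = resolve_with_job_data("Jonathon Srnith", job_values)
```

Fuzzy matching is only one way to use the external data; the application itself does not say which comparison is used.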
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA BY NARROWING DATA SEARCH SCOPE USING CONTOUR MATCHING

Publication number: 20110255794

Abstract: A method of extracting data by narrowing a scope of data search using contour matching of select elements in a document is provided. The method includes: analyzing each document to automatically extract images and text features wherein said analyzing compares extracted features with a first search space of candidate features to try and recognize the extracted features; automatically processing each unrecognized feature using a contour recognition engine to generate a contour of the unrecognized feature; automatically selecting a second search space of candidate features through contour matching using the contour of the unrecognized feature, wherein the second search space of candidate features is narrower than the first search space of candidate features; and comparing the unrecognized feature with said second search space to identify the previously unrecognized feature.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Vartika Singh, Girish Welling, Steven K. Ladd, Xujun Peng
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS USING MULTIPLE CHARACTER RECOGNITION ENGINES

Publication number: 20110255784

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document using a plurality of character recognition engines is provided. The method includes: automatically processing each received electronic document page using each of a plurality of recognition engines to extract data; comparing quality of data extracted from each of the recognition engines to assign a confidence score to the extracted data; and selecting extracted data having highest confidence score as the correct extracted data.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: COPANION, INC.

Inventors: Girish WELLING, Vartika SINGH, Gopal KRISHNA, Tushar MAHATA, Nirupam SARKAR, Depankar NEOGI, Steven K. LADD
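As a rough sketch of the selection step in the abstract of publication 20110255784, the code below assumes that confidence is derived from agreement between engines: the output produced by the most engines wins, and its confidence is the share of engines that produced it. This scoring rule and the function name are assumptions for illustration, not the scoring described in the application.

```python
from collections import Counter

def select_by_agreement(outputs):
    """outputs: one extracted string per recognition engine.

    Returns (text, confidence), where confidence is the fraction of
    engines that produced the winning text.
    """
    if not outputs:
        raise ValueError("at least one engine output is required")
    counts = Counter(outputs)
    text, votes = counts.most_common(1)[0]
    return text, votes / len(outputs)

# Three hypothetical engines read the same amount field; one confuses 0 with O.
text, confidence = select_by_agreement(["1,204.50", "1,204.50", "1,2O4.50"])
```

A real system would likely combine agreement with per-engine quality measures; this sketch shows only the compare-then-select shape of the method.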
SYSTEMS AND METHODS FOR AUTOMATICALLY PROCESSING ELECTRONIC DOCUMENTS USING MULTIPLE IMAGE TRANSFORMATION ALGORITHMS

Publication number: 20110255782

Abstract: In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from said document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the received electronic document page using each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction, on each of the image variations of the pieces to determine which output is best, from the plurality of outputs for each piece.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: Copanion, Inc.

Inventors: Girish Welling, Nirupam Sarkar, Tushar Mahata, Vartika Singh, Depankar Neogi, Steven K. Ladd
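The per-piece comparison in the abstract of publication 20110255782 can be sketched as follows. The sketch assumes each pre-processing algorithm is a function from image to image and that the downstream extraction step returns a text plus a quality score; `toy_extract` and both transforms are hypothetical stand-ins that operate on strings so that the example runs without an imaging library.

```python
def best_variation(piece, transforms, extract):
    """Run every transform on one page piece and keep the best-scoring output.

    transforms: dict mapping a name to a function image -> image.
    extract: function image -> (text, score); higher score means better.
    """
    candidates = []
    for name, transform in transforms.items():
        text, score = extract(transform(piece))
        candidates.append((name, text, score))
    return max(candidates, key=lambda c: c[2])

def toy_extract(image):
    # Stand-in for OCR: the "image" is already text, and the score is the
    # share of alphanumeric characters in it.
    score = sum(ch.isalnum() for ch in image) / len(image) if image else 0.0
    return image, score

transforms = {
    "identity": lambda img: img,
    "despeckle": lambda img: img.replace("~", ""),  # pretend noise removal
}
name, text, score = best_variation("To~tal~~ 42", transforms, toy_extract)
# name == "despeckle", text == "Total 42", score == 0.875
```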
SYSTEMS AND METHODS FOR TRAINING DOCUMENT ANALYSIS SYSTEM FOR AUTOMATICALLY EXTRACTING DATA FROM DOCUMENTS

Publication number: 20110258150

Abstract: A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: COPANION, INC.

Inventors: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA
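The store-or-train decision in the abstract of publication 20110258150 reduces to a membership check. The sketch below assumes the category's text features are held in a plain set; the function name and the returned dictionary shape are hypothetical.

```python
def triage_features(extracted, category_vocabulary):
    """Store the features if all are known for the category, else flag them."""
    unknown = [f for f in extracted if f not in category_vocabulary]
    if unknown:
        # At least one feature falls outside the category's set:
        # hand the unrecognized ones to a training phase.
        return {"status": "needs_training", "unknown": unknown}
    return {"status": "stored", "data": list(extracted)}

vocabulary = {"Wages", "Employer", "Federal income tax withheld"}
result = triage_features(["Wages", "Emp1oyer"], vocabulary)
# result == {"status": "needs_training", "unknown": ["Emp1oyer"]}
```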
SYSTEMS AND METHODS FOR AUTOMATICALLY GROUPING ELECTRONIC DOCUMENT PAGES

Publication number: 20110255790

Abstract: A method of grouping electronic document pages of a job that belong together is provided. The method includes: automatically analyzing images and text features extracted from each received electronic document page to associate the electronic document page with a corresponding document category; automatically identifying features extracted from the electronic document page that potentially indicate to which document group the electronic document page belongs; comparing the identified features with a set of group identifying features associated with corresponding document group, in which the set of group identifying features includes at least a set of page numbers and account numbers; and, if the identified features are found to include a set of a page number and an account number belonging to the set of group identifying features associated with the corresponding document group, grouping the electronic document page into the corresponding document group.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: Copanion, Inc.

Inventors: Matthew DUGGAN, Girish WELLING, Janice O'NEIL, Depankar NEOGI, Steven K. LADD
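A minimal sketch of the grouping step in the abstract of publication 20110255790, assuming each page has already been reduced to a dictionary with a category, an account number, and a page number. The key names are hypothetical; pages missing either identifying feature are returned separately rather than guessed at.

```python
from collections import defaultdict

def group_pages(pages):
    """Group pages by (category, account) and order each group by page number."""
    groups = defaultdict(list)
    ungrouped = []
    for page in pages:
        if page.get("account") and page.get("page_no") is not None:
            groups[(page["category"], page["account"])].append(page)
        else:
            ungrouped.append(page)  # no usable group-identifying features
    for members in groups.values():
        members.sort(key=lambda p: p["page_no"])
    return dict(groups), ungrouped

pages = [
    {"category": "statement", "account": "A-17", "page_no": 2},
    {"category": "statement", "account": "A-17", "page_no": 1},
    {"category": "statement", "account": None, "page_no": 1},
]
groups, ungrouped = group_pages(pages)
```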
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS CONTAINING MULTIPLE LAYOUT FEATURES

Publication number: 20110255789

Abstract: A method of automatically extracting data from an electronic document containing a plurality of layout features through progressive refinement is provided. The method includes: analyzing each document to automatically extract images and text features wherein each document includes at least two features that are related to each other, and wherein said analyzing compares extracted features with a first search space of candidate features to try and recognize the extracted features; if one of the at least two related features is not recognized and at least one feature is recognized, selecting a second search space of candidate features in response thereto and in response to predefined rules about the relationship between the two features; and comparing the unrecognized feature with said selected second search space.

Type: Application

Filed: January 14, 2011

Publication date: October 20, 2011

Applicant: COPANION, INC.

Inventors: Depankar NEOGI, Steven K. LADD, Girish WELLING, Arjun KUMAR, Vartika SINGH, Matthew DUGGAN, Tushar MAHATA, Xiaobin YANG, Jian-Wu XU, Janice O'NEIL, Nirupam SARKAR, Gopal KRISHNA
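The progressive refinement in the abstract of publication 20110255789 can be illustrated with two related features, a state and a city. In this sketch the predefined rule is assumed to be "a recognized state limits the candidate cities"; the lookup table, the fuzzy matching, and the cutoff are all hypothetical choices made for the example.

```python
from difflib import get_close_matches

# Hypothetical rule table: the second search space for each recognized state.
CITIES_BY_STATE = {
    "MA": ["Boston", "Worcester", "Lowell"],
    "NY": ["Buffalo", "Albany"],
}

def refine_city(raw_city, recognized_state, cutoff=0.6):
    """Match an unrecognized city against the cities of the recognized state."""
    second_space = CITIES_BY_STATE.get(recognized_state, [])
    matches = get_close_matches(raw_city, second_space, n=1, cutoff=cutoff)
    return matches[0] if matches else None

city = refine_city("Bost0n", "MA")  # matched within the narrowed space
```

Searching a handful of cities instead of every known city is what makes a loose cutoff safe here: the related feature has already removed most of the wrong answers.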
SYSTEMS AND METHODS FOR AUTOMATICALLY EXTRACTING DATA FROM ELECTRONIC DOCUMENTS INCLUDING TABLES

Publication number: 20110249905

Abstract: A method of automatically extracting data from an electronic document including tables is provided. The method includes: automatically identifying rows of the table using gaps in horizontal projections of the plurality of image sections, wherein at least some of the identified rows in close proximity are collected to form table formations; and automatically identifying columns of the table using at least some of the plurality of image sections that are vertically aligned, wherein the identified columns are grown in each of the table formations using gaps in vertical projections of the plurality of image sections until an obstruction is reached. The method further includes automatically identifying labels in the plurality of corresponding image sections to associate the identified labels with at least one of the identified columns and the identified rows; and automatically extracting data from cells of the table formed by the identified rows and columns.

Type: Application

Filed: June 23, 2011

Publication date: October 13, 2011

Applicant: Copanion, Inc.

Inventors: Vartika Singh, Girish Welling, Depankar Neogi, Steven K. Ladd
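The row-finding idea in the abstract of publication 20110249905 rests on a horizontal projection profile: sum the ink in each pixel row and treat runs of empty rows as gaps between table rows. The sketch below assumes a binarized image given as nested lists of 0 and 1; it covers only the gap detection, not the grouping into table formations or the column growing described in the abstract.

```python
def row_bands(image):
    """image: list of pixel rows, each a list of 0/1 values (1 = ink).

    Returns a list of (top, bottom) index pairs, one per band of inked rows.
    """
    profile = [sum(row) for row in image]  # horizontal projection
    bands, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:
            start = y                      # a band begins
        elif not ink and start is not None:
            bands.append((start, y - 1))   # a gap closes the band
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
bands = row_bands(image)  # [(1, 2), (4, 4)]
```

The same function applied to the transposed image, for example `row_bands([list(col) for col in zip(*image)])`, yields the vertical projection gaps used for columns.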
SYSTEM FOR OPTIMAL DOCUMENT SCANNING

Publication number: 20090201541

Abstract: A method of controlling a scanner to improve automatic recognition and classification of scanned physical documents for a document analysis system, which receives and processes jobs containing at least one electronic document from a plurality of users to automatically recognize and classify the job documents into document categories, is disclosed. The method comprises, using a scan control system, obtaining the capability of, and existing scanner settings for, the scanner upon receiving a command to initiate scanning of physical documents; saving the existing scanner settings of the scanner; automatically commanding the scanner to use new scanner settings, wherein the new scanner settings are selected in accordance with the capability of the recognition system; commanding the scanner to begin scanning operation with the new scanner settings; and automatically resetting the scanner settings of the scanner back to the saved existing scanner settings upon completion of the scanning operation.

Type: Application

Filed: January 9, 2009

Publication date: August 13, 2009

Applicant: COPANION, INC.

Inventors: Depankar NEOGI, Steven K. LADD, Arjun KUMAR, Matthew DUGGAN
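The save, override, and restore sequence in the abstract of publication 20090201541 maps naturally onto a context manager. In this sketch `FakeScanner` is a hypothetical stand-in, not a real scanner driver API, and capping the requested resolution at the device maximum is an assumed way of respecting the scanner's capability.

```python
from contextlib import contextmanager

class FakeScanner:
    """Hypothetical scanner object used only to make the sketch runnable."""
    def __init__(self):
        self.settings = {"dpi": 150, "mode": "color"}
        self.capabilities = {"max_dpi": 600}

@contextmanager
def recognition_settings(scanner, wanted):
    saved = dict(scanner.settings)  # save the existing settings
    new = dict(wanted)
    # Never request more resolution than the device reports it supports.
    new["dpi"] = min(new.get("dpi", saved["dpi"]), scanner.capabilities["max_dpi"])
    scanner.settings.update(new)
    try:
        yield scanner               # the scanning operation runs here
    finally:
        scanner.settings = saved    # restore even if scanning fails

scanner = FakeScanner()
with recognition_settings(scanner, {"dpi": 300, "mode": "gray"}):
    during = dict(scanner.settings)  # {"dpi": 300, "mode": "gray"}
after = dict(scanner.settings)       # {"dpi": 150, "mode": "color"}
```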
SYSTEMS AND METHODS FOR CLASSIFYING ELECTRONIC DOCUMENTS BY EXTRACTING AND RECOGNIZING TEXT AND IMAGE FEATURES INDICATIVE OF DOCUMENT CATEGORIES

Publication number: 20090116757

Abstract: A method in a document analysis system automatically extracts from each received electronic document image and text features, in which the image features are indicative of how the document is laid out or textually-organized and therefore indicative of a corresponding document category, next compares the extracted image and text features with feature sets associated with each document category, and then classifies each document to a document category, the feature set of which best matches the extracted features of the document.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Venugopal Govindaraju
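The best-match classification in the abstract of publication 20090116757 can be sketched with set overlap. Jaccard similarity is an assumed matching measure chosen for brevity; the application does not name one, and the feature labels below are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(features, category_features):
    """Return the category whose feature set best matches the document."""
    return max(category_features,
               key=lambda cat: jaccard(features, category_features[cat]))

category_features = {
    "W-2": {"wages", "employer", "box-grid-layout"},
    "1099": {"payer", "recipient", "two-column-layout"},
}
category = classify({"wages", "box-grid-layout", "ssn"}, category_features)
# category == "W-2"
```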
SYSTEMS AND METHODS FOR ENABLING MANUAL CLASSIFICATION OF UNRECOGNIZED DOCUMENTS TO COMPLETE WORKFLOW FOR ELECTRONIC JOBS AND TO ASSIST MACHINE LEARNING OF A RECOGNITION SYSTEM USING AUTOMATICALLY EXTRACTED FEATURES OF UNRECOGNIZED DOCUMENTS

Publication number: 20090116755

Abstract: A method in a document analysis system automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each category of document to determine whether the document is recognizable as belonging to a document category. If an electronic document is recognized as belonging to one of the document categories, the method classifies the electronic document as belonging to that document category. If, however, an electronic document is unrecognized, the method submits the unrecognized document to a learning phase, in which the unrecognized document is presented to a human trainer for manual classification of the unrecognized electronic document into a document category, and automatically modifies at least one of the features and the weights of the feature set of the document category corresponding to the manually-classified electronic document using the automatically extracted features of the manually-classified document.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Arjun Kumar, Dilnawaj Ahmed
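The learning phase in the abstract of publication 20090116755 can be sketched as classify-or-ask. The sketch assumes each category's feature set is a dictionary of feature weights, that a document is recognized when its summed weights reach a threshold, and that learning adds a fixed increment per feature; `ask_trainer`, the threshold, and the increment are all hypothetical.

```python
def score(features, weights):
    """Sum the weights of the document's features under one category."""
    return sum(weights.get(f, 0.0) for f in features)

def classify_or_learn(features, feature_sets, ask_trainer, threshold=1.0, rate=0.5):
    best = max(feature_sets,
               key=lambda cat: score(features, feature_sets[cat]),
               default=None)
    if best is not None and score(features, feature_sets[best]) >= threshold:
        return best                          # recognized automatically
    label = ask_trainer(features)            # manual classification by a human
    weights = feature_sets.setdefault(label, {})
    for f in features:                       # reinforce the extracted features
        weights[f] = weights.get(f, 0.0) + rate
    return label

feature_sets = {"W-2": {"wages": 1.0}}
first = classify_or_learn({"payer", "1099"}, feature_sets, lambda f: "1099")
# The trainer was consulted; feature_sets now has a "1099" entry, so a second
# document with the same features is recognized without asking again.
second = classify_or_learn({"payer", "1099"}, feature_sets, lambda f: "unused")
```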
SYSTEMS AND METHODS FOR HANDLING AND DISTINGUISHING BINARIZED, BACKGROUND ARTIFACTS IN THE VICINITY OF DOCUMENT TEXT AND IMAGE FEATURES INDICATIVE OF A DOCUMENT CATEGORY

Publication number: 20090119296

Abstract: A method of enhancing electronic documents received from a plurality of users by a document analysis system for improving automatic recognition and classification of the received electronic documents is provided. For each page of a received electronic document, the method filters the page to infer binarized-background artifacts resulting from the binarization of the original grayscale or color image source document and which reside in the vicinity of binarized text and binarized image features in the page, so that the binarized text and binarized images may be distinguished from the binarized-background artifacts and extracted from the document. The method then uses the extracted features from the filtered document to automatically recognize and classify a document into a document category.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Dilnawaj Ahmed, Lohith C. Shesharam
SYSTEMS AND METHODS FOR PARALLEL PROCESSING OF DOCUMENT RECOGNITION AND CLASSIFICATION USING EXTRACTED IMAGE AND TEXT FEATURES

Publication number: 20090116746

Abstract: A method of parallel processing jobs received from a plurality of users by a document analysis system that automatically classifies documents to organize each job automatically separates each job into its constituent electronic documents and automatically separates each document into subsets of electronic pages. For each page of each subset, the method automatically extracts image features that are indicative of how the document is laid out or textually-organized. For each subset, the method automatically compares the extracted features with feature sets associated with each document category to determine a comparison score for the subset. The method then classifies the electronic document as being one of the categories of documents using the comparison score for each of the subsets and organizes the job according to the categories of documents the job contains.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Dilnawaj Ahmed, Girish Welling
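The subset-level scoring in the abstract of publication 20090116746 can be sketched with a thread pool. The sketch assumes each page is already a set of extracted features, that a subset's comparison score per category is the number of shared features, and that subset scores are combined by summing; the scoring rule, the subset size, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def score_subset(pages, category_features):
    """Comparison score of one subset of pages against every category."""
    feats = set().union(*pages) if pages else set()
    return {cat: len(feats & fs) for cat, fs in category_features.items()}

def classify_parallel(pages, category_features, subset_size=2):
    # Separate the document into subsets of pages and score them concurrently.
    subsets = [pages[i:i + subset_size] for i in range(0, len(pages), subset_size)]
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda s: score_subset(s, category_features), subsets))
    # Combine the per-subset scores and pick the best category.
    totals = {cat: sum(s[cat] for s in scores) for cat in category_features}
    return max(totals, key=totals.get)

pages = [{"w-2", "wages"}, {"employer"}, {"wages", "ein"}]
categories = {
    "W-2": {"w-2", "wages", "employer", "ein"},
    "1099": {"1099", "payer"},
}
category = classify_parallel(pages, categories)  # "W-2"
```

Threads keep the example dependency-free; CPU-bound feature extraction on real images would more plausibly use separate processes or machines.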
SYSTEMS AND METHODS FOR TRAINING A DOCUMENT CLASSIFICATION SYSTEM USING DOCUMENTS FROM A PLURALITY OF USERS

Publication number: 20090116756

Abstract: A method of training a document analysis system that automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each document category is provided. If an electronic document is recognized as belonging to one of the document categories with predetermined confidence, the method classifies the electronic document as being of that one document category. If an electronic document is not recognized as belonging to one of the document categories with predetermined confidence, however, the method submits the unrecognized document to a training phase in which the document is recognized as belonging to a document category and automatically modifies at least one of the features and the weights of the features of the feature set for the document category for the now-recognized document.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Arjun Kumar, Dilnawaj Ahmed
SYSTEMS AND METHODS TO AUTOMATICALLY CLASSIFY ELECTRONIC DOCUMENTS USING EXTRACTED IMAGE AND TEXT FEATURES AND USING A MACHINE LEARNING SUBSYSTEM

Publication number: 20090116736

Abstract: A document analysis system that automatically classifies documents by recognizing in each document distinctive features comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, wherein each job contains at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, wherein the training system, using extracted features of unrecognized documents, automatically modifies the feature set for a document category.

Type: Application

Filed: November 6, 2008

Publication date: May 7, 2009

Applicant: Copanion, Inc.

Inventors: Depankar Neogi, Steven K. Ladd, Dilnawaj Ahmed, Arjun Kumar, Tushar Mahata