Abstract: A system and/or method for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising: obtaining OCR results of OCR scanning from at least one OCR module; creating at least one OCR seed using at least a portion of the OCR results; creating at least one OCR learn set using at least a portion of the OCR seed; and applying the OCR learn set to the at least one item to obtain additional optical character recognition (OCR) results.
Type:
Application
Filed:
November 2, 2009
Publication date:
May 5, 2011
Inventors:
Harry Urbschat, Ralph Meier, Thorsten Wanschura, Johannes Hausmann
Abstract: A system and/or method for increasing the accuracy of optical character recognition (OCR) for at least one item, comprising: obtaining OCR results of OCR scanning from at least one OCR module; creating at least one OCR seed using at least a portion of the OCR results; creating at least one OCR learn set using at least a portion of the OCR seed; and applying the OCR learn set to the at least one item to obtain additional optical character recognition (OCR) results.
Type:
Grant
Filed:
November 2, 2009
Date of Patent:
October 6, 2015
Inventors:
Harry Urbschat, Ralph Meier, Thorsten Wanschura, Johannes Hausmann
Abstract: Systems and methods for triage of passages of text output from an OCR system by use of trainable models of the accuracy of the OCR system based on attributes of individual characters. The systems and methods according to this invention automatically triage an OCR-output text passage by determining at least one OCR-output character attribute for each OCR-output character, determining an error rate for the OCR-output text passage using a triage model and the determined at least one OCR-output character attribute, and comparing the determined error rate for the OCR-output text passage with an OCR-output text passage threshold error rate to perform an OCR-output text passage triage decision. Triage decisions include, for example, sending the OCR results directly to an end user without any post-OCR processing, sending the OCR results through a post-OCR inspection and processing stage, sending the original document image to be completely keyed in manually, or a combination thereof.
Type:
Grant
Filed:
July 12, 2002
Date of Patent:
January 30, 2007
Assignee:
Xerox Corporation
Inventors:
Prateek Sarkar, Henry S. Baird, John R. Henderson
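The triage flow described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the per-character `confidence` attribute, the averaging "model" in `estimate_error_rate`, and both threshold values are assumptions chosen for illustration, whereas the abstract describes a trainable model.

```python
from dataclasses import dataclass


@dataclass
class OcrChar:
    char: str
    confidence: float  # hypothetical per-character attribute in [0, 1]


def estimate_error_rate(passage):
    """Toy stand-in for a trained triage model: mean of (1 - confidence)."""
    if not passage:
        return 0.0
    return sum(1.0 - c.confidence for c in passage) / len(passage)


def triage(passage, review_threshold=0.02, rekey_threshold=0.20):
    """Compare the estimated error rate with thresholds (assumed values)."""
    rate = estimate_error_rate(passage)
    if rate <= review_threshold:
        return "send_to_user"          # no post-OCR processing
    if rate <= rekey_threshold:
        return "post_ocr_inspection"   # inspect and correct the OCR output
    return "manual_keying"             # key in the original image by hand
```

A passage whose characters all carry high confidence is routed straight to the end user, while a passage with many low-confidence characters is sent for manual keying.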
Abstract: The present disclosure discloses methods and systems for creating a multi-layered Optical Character Recognition (OCR) document that facilitates selection of the desired text. The method includes receiving a scanned image corresponding to a document that includes text information. A binary image is generated from the scanned image. Then, a morphological dilation operation is performed to create one or more text groups, using a horizontal structuring element and a vertical structuring element. Thereafter, an OCR operation is applied to each text group to generate a corresponding OCR layer. The one or more OCR layers are then combined to create a multi-layered OCR document. Finally, the combined OCR layers are superimposed as invisible text layers over the scanned image to create the multi-layered OCR document.
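The grouping step in the abstract above can be sketched in plain Python. This is a sketch under stated assumptions, not the disclosed implementation: the image is a list of rows of 0/1 pixels, the structuring elements are rectangles of odd size, horizontal dilation is applied before vertical dilation, and groups are extracted as 4-connected components. The function names `dilate` and `text_groups` are hypothetical.

```python
def dilate(image, width, height):
    """Binary dilation with a width x height rectangular structuring element."""
    rows, cols = len(image), len(image[0])
    rx, ry = width // 2, height // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                # Switch on every pixel covered by the element centred here.
                for rr in range(max(0, r - ry), min(rows, r + ry + 1)):
                    for cc in range(max(0, c - rx), min(cols, c + rx + 1)):
                        out[rr][cc] = 1
    return out


def text_groups(image, h_width=3, v_height=1):
    """Return bounding boxes (top, left, bottom, right) of merged text groups."""
    # Horizontal element first, then vertical element (assumed order).
    merged = dilate(dilate(image, h_width, 1), 1, v_height)
    rows, cols = len(merged), len(merged[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if merged[r][c] and not seen[r][c]:
                top, left, bottom, right = r, c, r, c
                stack = [(r, c)]
                seen[r][c] = True
                while stack:  # flood fill over 4-connected neighbours
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and merged[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes
```

Each returned box would then be passed to the OCR engine separately to produce one OCR layer per text group. The boxes describe the dilated regions, so they are slightly larger than the original glyphs.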
Abstract: Systems and methods for triage of passages of text output from an OCR system by use of trainable models of the accuracy of the OCR system based on attributes of individual characters.
Type:
Application
Filed:
July 12, 2002
Publication date:
January 15, 2004
Inventors:
Prateek Sarkar, Henry S. Baird, John R. Henderson
Abstract: An OCR system which acquires character data from a form (50) through OCR processing is characterized by: managing an OCR information table (34e) in which an issuer name of an issuer on the form (50) is associated with a font name of a font used in the OCR processing; and, when the OCR processing is performed on an issuer-recorded content reading target area in the form (50), performing the OCR processing (S156) in the font indicated by the font name associated in the OCR information table with the issuer name of the issuer of the form (50).
Abstract: A document OCR implementing device includes a reading part configured to read a document and form a recognition image; an obtaining part configured to perform image processing of the recognition image and obtain a state of the recognition image; a plurality of OCR engines configured to perform a character recognition process of the recognition image; and a designating part configured to designate the OCR engine by combining the recognition image and the OCR engine; wherein the character recognition process is implemented by using the OCR engine designated by the designating part.
Abstract: An image of a known text sample having a text type is generated. The image of the known text sample is input into each OCR engine of a number of OCR engines. Output text corresponding to the image of the known text sample is received from each OCR engine. For each OCR engine, the output text received from the OCR engine is compared with the known text sample, to determine a confidence value of the OCR engine for the text type of the known text sample.
Abstract: The present disclosure is directed to systems, methods, and devices that enable the revising of Optical Character Recognition (OCR) data by indexing and displaying potential error locations within the OCR data. The primary method for revising the OCR data includes a terminal device indexing, displaying, receiving editing operations for, and editing the OCR data. The terminal device is configured to revise OCR data and includes an OCR review element, which, in some embodiments, is software stored on a non-transitory, computer-readable medium and executed by a processing unit to cause the terminal device to index, display, receive editing operations for, and edit the OCR data.
Abstract: A computer-implemented method is provided of identifying an optical character recognition (OCR) font to assist an operator in setting up a bank remittance coupon application. The computer-implemented method comprises electronically on a processor reading an OCR font in a zone of the bank remittance coupon, electronically on a processor comparing the read OCR font with an OCR font stored in a look-up table to determine if the read OCR font meets predetermined criteria, and storing the read OCR font into a configuration file to setup at least a portion of the bank remittance coupon application when the read OCR font meets the predetermined criteria.
Abstract: Embodiments of the present disclosure disclose a method for performing Optical Character Recognition (OCR) of an article. The method comprises acquiring an image of the article. The image of the article is scanned using predetermined scan settings. Then, textual regions of the scanned image of the article are identified. OCR of at least one of the textual regions is performed using predetermined OCR settings. One or more of the textual regions are marked upon determining an error in performing the OCR of those regions. After marking, the OCR of the marked textual regions is iterated as per one or more predefined OCR scanning parameters, based on an OCR quality of those regions.
Type:
Application
Filed:
June 22, 2015
Publication date:
September 8, 2016
Applicant:
Wipro Limited
Inventors:
Tomson Ganapathiplackal George, Sudheesh Joseph
Abstract: Dynamically configuring OCR processing may include determining a device type and determining whether to perform optical character recognition (OCR) processing of the received image locally based on one or more OCR parameters. Example OCR parameters may include the device type, the image type, the size of the received image, the available amount of memory, the measured/benchmarked throughput of OCR processing on the device relative to an OCR server throughput and network throughput, and/or the current level of network connectivity. If it is determined that OCR processing of the received image should be performed locally, the device may compute one or more name-value pairs corresponding to the received image and transmit the name-value pairs to a remote data server for processing.
Type:
Application
Filed:
August 13, 2013
Publication date:
February 19, 2015
Applicant:
Bank of America Corporation
Inventors:
Georgios Katsaros, Donald Werner Schoppe, Bryan Anthony VonCannon, Pavan Chayanam
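The local-versus-remote decision described in the abstract above can be sketched as a simple cost comparison. Everything here is an assumption for illustration: the `OcrContext` fields, the `memory_factor` heuristic, and the rule of comparing estimated processing times are not taken from the application, which only lists the kinds of parameters that may be considered.

```python
from dataclasses import dataclass


@dataclass
class OcrContext:
    image_bytes: int
    free_memory_bytes: int
    local_bytes_per_s: float     # benchmarked on-device OCR throughput
    server_bytes_per_s: float    # OCR server throughput
    network_bytes_per_s: float   # 0 means no connectivity


def should_ocr_locally(ctx, memory_factor=4):
    """Return True when on-device OCR looks preferable (assumed heuristic)."""
    if ctx.network_bytes_per_s <= 0:
        return True  # offline: the image cannot be sent to a server
    if ctx.free_memory_bytes < memory_factor * ctx.image_bytes:
        return False  # not enough memory headroom for local processing
    local_time = ctx.image_bytes / ctx.local_bytes_per_s
    remote_time = (ctx.image_bytes / ctx.network_bytes_per_s
                   + ctx.image_bytes / ctx.server_bytes_per_s)
    return local_time <= remote_time
```

A device with a slow processor and a fast connection would defer to the server, while a device with no connectivity would always process the image itself.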
Abstract: An image of a known text sample having a text type is generated. The image of the known text sample is input into each OCR engine of a number of OCR engines. Output text corresponding to the image of the known text sample is received from each OCR engine. For each OCR engine, the output text received from the OCR engine is compared with the known text sample, to determine a confidence value of the OCR engine for the text type of the known text sample.
Type:
Grant
Filed:
November 27, 2010
Date of Patent:
May 28, 2013
Assignee:
Hewlett-Packard Development Company, L.P.
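The comparison step in the abstract above can be sketched with Python's standard `difflib` module. Using `SequenceMatcher.ratio()` as the confidence value is an assumption made for this sketch; the abstract does not say how the output text and the known text sample are compared. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def engine_confidence(known_text, output_text):
    """Similarity in [0, 1] between the known sample and an engine's output."""
    return SequenceMatcher(None, known_text, output_text).ratio()


def rank_engines(known_text, outputs):
    """Score each engine's output and sort engines from best to worst.

    `outputs` maps an engine name to the text that engine produced for the
    image of the known text sample.
    """
    scores = {name: engine_confidence(known_text, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Running this once per text type (for example, one known sample per font or language) would give a per-engine, per-text-type table of confidence values.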
Abstract: Embodiments of the present disclosure disclose a method for performing Optical Character Recognition (OCR) of an article. The method comprises acquiring an image of the article. The image of the article is scanned using predetermined scan settings. Then, textual regions of the scanned image of the article are identified. OCR of at least one of the textual regions is performed using predetermined OCR settings. One or more of the textual regions are marked upon determining an error in performing the OCR of those regions. After marking, the OCR of the marked textual regions is iterated as per one or more predefined OCR scanning parameters, based on an OCR quality of those regions.
Type:
Grant
Filed:
June 22, 2015
Date of Patent:
May 29, 2018
Assignee:
Wipro Limited
Inventors:
Tomson Ganapathiplackal George, Sudheesh Joseph
Abstract: Disclosed herein are computer-implemented methods, computer-implemented systems, and non-transitory, computer-readable media for automatic Optical Character Recognition (OCR) correction. One computer-implemented method includes evaluating an OCR result using a trained long short-term memory (LSTM) neural network language model to determine whether correction to the OCR result is required. If correction to the OCR result is required, a most similar text relative to the OCR result is determined from a name and address corpus using a modified edit distance technique. The OCR result is corrected with the determined most similar text.
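The correction step in the abstract above can be sketched as a nearest-match lookup. Two substitutions are made here and should be read as assumptions: the standard Levenshtein distance replaces the "modified edit distance technique", whose details the abstract does not give, and the caller-supplied `needs_correction` predicate is a hypothetical stand-in for the trained LSTM language model.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(ocr_text, corpus, needs_correction):
    """Replace ocr_text with the closest corpus entry when flagged."""
    if not corpus or not needs_correction(ocr_text):
        return ocr_text
    return min(corpus, key=lambda candidate: edit_distance(ocr_text, candidate))
```

For example, with a corpus of known addresses and a predicate that flags any text not found in the corpus, an OCR result such as "12 Ma1n Street" would be replaced by "12 Main Street".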
Abstract: Dynamically configuring OCR processing may include determining a device type and determining whether to perform optical character recognition (OCR) processing of the received image locally based on one or more OCR parameters. Example OCR parameters may include the device type, the image type, the size of the received image, the available amount of memory, the measured/benchmarked throughput of OCR processing on the device relative to an OCR server throughput and network throughput, and/or the current level of network connectivity. If it is determined that OCR processing of the received image should be performed locally, the device may compute one or more name-value pairs corresponding to the received image and transmit the name-value pairs to a remote data server for processing.
Type:
Grant
Filed:
August 13, 2013
Date of Patent:
March 17, 2015
Assignee:
Bank of America Corporation
Inventors:
Georgios Katsaros, Donald Werner Schoppe, Bryan Anthony VonCannon, Pavan Chayanam