Abstract: A number of different tags are input in a fax cover sheet that tell an OCR system not only the identity of the supplier, but also to which client the document should be routed. The OCR system identifies a number of these tags and compares them to stored supplier data to validate to which supplier the document belongs. If the system cannot validate the document, it is routed to a GUI for manual sorting. If there is no coversheet, the system relies upon the OCR system to locate keywords on the document and caller ID information to suggest a correct supplier. The OCR system also clips a separate, horizontal slice of the document (‘snippet’) that corresponds to the display of any line item and places it in a data base for future reference and reporting. The application collects and associates all corresponding snippets to their originating line items.
Type:
Application
Filed:
March 10, 2011
Publication date:
September 1, 2011
Inventors:
Joseph FLYNN, Kerry Edward Koitzsch, Wassim G. Jraige
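The validation step in this abstract (compare recognized tags to stored supplier data, fall back to manual sorting) can be sketched roughly as follows. This is only an illustration: the `Supplier` record, the `route_document` function, the `min_matches` threshold, and the `"MANUAL_REVIEW"` sentinel are all hypothetical names, and the abstract does not say how many tags must match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Supplier:
    """Hypothetical stored supplier record: an id plus the tags expected on its cover sheet."""
    supplier_id: str
    tags: frozenset


def route_document(ocr_tags, suppliers, min_matches=2):
    """Return the id of the supplier whose stored tags best match the OCR tags.

    If no supplier reaches min_matches (an assumed threshold), return
    "MANUAL_REVIEW" to stand in for routing the document to the sorting GUI.
    """
    found = set(ocr_tags)
    best_id, best_score = None, 0
    for supplier in suppliers:
        score = len(supplier.tags & found)  # number of tags in common
        if score > best_score:
            best_id, best_score = supplier.supplier_id, score
    if best_id is not None and best_score >= min_matches:
        return best_id
    return "MANUAL_REVIEW"


suppliers = [
    Supplier("acme", frozenset({"ACME", "PO-77", "NET30"})),
    Supplier("bolt", frozenset({"BOLT", "PO-12"})),
]
validated = route_document(["ACME", "PO-77"], suppliers)
unvalidated = route_document(["BOLT"], suppliers)
```

With these sample records, the first document matches two stored tags and is routed to a supplier, while the second matches only one tag and falls through to manual review.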
Abstract: A number of different tags are input in a fax cover sheet that tell an OCR system not only the identity of the supplier, but also to which client the document should be routed. The OCR system identifies a number of these tags and compares them to stored supplier data to validate to which supplier the document belongs. If the system cannot validate the document, it is routed to a GUI for manual sorting. If there is no coversheet, the system relies upon the OCR system to locate keywords on the document and caller ID information to suggest a correct supplier. The OCR system also clips a separate, horizontal slice of the document (‘snippet’) that corresponds to the display of any line item and places it in a data base for future reference and reporting. The application collects and associates all corresponding snippets to their originating line items.
Type:
Application
Filed:
February 27, 2015
Publication date:
June 25, 2015
Inventors:
Joseph FLYNN, Kerry Edward KOITZSCH, Wassim G. JRAIGE
Abstract: The technology of the present disclosure includes computer-implemented methods, computer program products, and systems to filter images before transmitting to a system for optical character recognition (“OCR”). A user computing device obtains a first image of the card from the digital scan of a physical card and analyzes features of the first image, the analysis being sufficient to determine if the first image is likely to be usable by an OCR algorithm. If the user computing device determines that the first image is likely to be usable, then the first image is transmitted to an OCR system associated with the OCR algorithm. Upon a determination that the first image is unlikely to be usable, a second image of the card from the digital scan of the physical card is analyzed. The optical character recognition system performs an optical character recognition algorithm on the filtered card.
Type:
Application
Filed:
October 27, 2014
Publication date:
May 21, 2015
Inventors:
Xiaohang Wang, Alessandro Bissacco, Glen Berntson, Marria Nazif, Justin Scheiner, Sam Shih, Mark Leslie Snyder, Daniel Talavera
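The on-device filtering described in this abstract (analyze the first image, transmit it if it looks usable, otherwise analyze the next one) might look something like the sketch below. The abstract does not say which image features are analyzed; the `sharpness` measure here (mean absolute difference between horizontally adjacent grayscale pixels) and the `threshold` value are assumptions chosen only to make the example self-contained.

```python
def sharpness(image):
    """Crude sharpness proxy: mean absolute difference between horizontal neighbours.

    `image` is a list of rows of grayscale values (0-255). This metric is an
    assumption for illustration, not the analysis used in the patent.
    """
    diffs = [
        abs(row[i + 1] - row[i])
        for row in image
        for i in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def first_usable_frame(frames, threshold=20.0):
    """Return the first frame whose sharpness meets the threshold, else None.

    Frames are checked in order, so a later frame is analyzed only when the
    earlier ones are judged unlikely to be usable.
    """
    for frame in frames:
        if sharpness(frame) >= threshold:
            return frame
    return None


blurry = [[100, 101, 102], [100, 101, 102]]
crisp = [[0, 255, 0], [255, 0, 255]]
chosen = first_usable_frame([blurry, crisp])
```

Only the frame returned by `first_usable_frame` would be sent on to the OCR system; when it returns `None`, nothing is transmitted.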
Abstract: There is disclosed a method of analyzing a digital image of a document (to determine, as example, a document suitability for server-based OCR processing) in a computer system that includes a user electronic device (for acquiring or storing a digital image of a document) connectable to a server (for executing the server-based OCR processing of the digital image to create a recognized-text document). The method is executable by the user electronic device and comprises: acquiring the digital image of the document; analyzing an OCR quality parameter associated with a compressed digital image to be created from the digital image using a compression algorithm and a compression parameter; in response to the OCR quality parameter being above or equal to a pre-determined threshold: transmitting the compressed digital image to the server.
Abstract: A method to improve the efficacy of optical character recognition (OCR) includes scanning an electronically stored representation of a whole or partial document, identifying an image having text in the electronically stored representation of a whole or partial document, identifying the text within the image, and generating a plurality of bounding boxes around the identified text using blob detection. The method also includes grouping together certain text bounding boxes of the plurality of text bounding boxes that are vertically aligned with each other to generate a plurality of aligned text bounding boxes and performing OCR on the aligned text bounding boxes to generate a plurality of OCR groups of text. In addition, the method includes generating a resultant representation of a whole or partial document electronically using the plurality of OCR groups of text and saving the resultant representation of a whole or partial document electronically.
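As a rough illustration of the grouping step, the sketch below clusters text bounding boxes whose left edges agree within a tolerance. Treating "vertically aligned" as "sharing a left edge" is one possible reading, not something the abstract states; the `Box` type, the `tol` parameter, and the function name are hypothetical, and blob detection itself is assumed to have already produced the boxes.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned text bounding box (pixels), assumed to come from blob detection."""
    x: int
    y: int
    w: int
    h: int


def group_vertically_aligned(boxes, tol=5):
    """Group boxes whose left edge is within `tol` pixels of a group's first box.

    Each returned group is sorted top to bottom so it can be passed to OCR
    as one column of text.
    """
    groups = []
    for box in sorted(boxes, key=lambda b: (b.x, b.y)):
        for group in groups:
            if abs(group[0].x - box.x) <= tol:
                group.append(box)
                break
        else:
            groups.append([box])  # no existing group is close enough
    return [sorted(group, key=lambda b: b.y) for group in groups]


boxes = [
    Box(10, 50, 80, 12),
    Box(200, 10, 60, 12),
    Box(12, 10, 90, 12),
    Box(203, 30, 40, 12),
]
groups = group_vertically_aligned(boxes)
```

Here the four boxes fall into two columns, one near x = 10 and one near x = 200, each ordered by its y coordinate.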
Abstract: Disclosed is a method, system and computer readable recording medium for correcting an OCR result. According to an exemplary embodiment of the present invention, there is provided a method for correcting an OCR result, the method including performing character recognition on content including character information using an OCR technique, removing extra carriage return information from the content, outputting the character recognition result, and correcting word spacing on the outputted result.
Type:
Application
Filed:
December 30, 2009
Publication date:
July 1, 2010
Applicant:
NHN Corporation
Inventors:
Byoung Seok YANG, Hee Cheol Seo, Do Gil Lee, Ki Joon Sung
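A minimal sketch of the two post-processing steps named in this abstract follows. It assumes that a blank line marks a real paragraph break and that any other line break is "extra"; the abstract does not define either rule. The spacing step here only collapses repeated spaces, which is a much simpler stand-in for the word-spacing correction the abstract refers to. Both function names are hypothetical.

```python
import re


def remove_extra_line_breaks(text):
    """Join lines within a paragraph; keep blank-line paragraph breaks."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in joined if p)


def normalize_spacing(text):
    """Collapse runs of spaces or tabs into a single space (simplified stand-in)."""
    return re.sub(r"[ \t]{2,}", " ", text)


raw = "OCR output is\r\nbroken across\r\nlines.\r\n\r\nNext  paragraph."
cleaned = normalize_spacing(remove_extra_line_breaks(raw))
```

Running the line-break step first matters in this sketch, because joining lines can itself create the doubled spaces that the second step removes.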
Abstract: The present disclosure includes techniques for selecting a candidate presentation style for individual documents for inclusion in an aggregate training data set for a document type that may be used to train an OCR processing engine prior to identifying text in an image of a document of the document type. In one embodiment, text input corresponding to a text sample in a document is received, and an image of the text sample in the document is received. For each of a plurality of candidate presentation styles, an OCR processing engine is trained using a training data set corresponding to the given candidate presentation style, and the OCR processing engine is used, as trained, to identify text in the received image. The OCR processing results for each candidate presentation style are compared to the received text input. A candidate presentation style for the document is selected based on the comparisons.
Type:
Grant
Filed:
September 21, 2016
Date of Patent:
October 23, 2018
Assignee:
Intuit Inc.
Inventors:
Eugene Krivopaltsev, Sreeneel K. Maddika, Vijay S. Yellapragada
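The selection loop in this abstract can be illustrated with the sketch below. Training is abstracted away: `engines_by_style` is a hypothetical mapping from each candidate presentation style to an already trained recognizer function, and similarity is measured with `difflib.SequenceMatcher`, which is an assumption since the abstract does not say how results are compared to the received text input.

```python
from difflib import SequenceMatcher


def select_style(expected_text, image, engines_by_style):
    """Pick the presentation style whose trained engine best reproduces the text.

    engines_by_style maps a style name to a callable that takes an image and
    returns recognized text (a stand-in for an OCR engine trained on that style).
    Returns the best style and the per-style similarity scores.
    """
    scores = {
        style: SequenceMatcher(None, expected_text, recognize(image)).ratio()
        for style, recognize in engines_by_style.items()
    }
    best_style = max(scores, key=scores.get)
    return best_style, scores


# Stub recognizers that ignore the image and return fixed text.
engines = {
    "serif": lambda image: "Wages 1O0.00",
    "sans": lambda image: "Wages 100.00",
}
best, scores = select_style("Wages 100.00", image=None, engines_by_style=engines)
```

In this toy run the "sans" engine reproduces the expected text exactly, so that style would be the one added to the aggregate training data set for the document type.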
Abstract: A character based system and method for correcting low confidence characters from an OCR system facilitates operator review, editing and correction of character and field level data generated by an OCR system without the need for an application that is installed at the operator workstation. The system creates a data structure of OCR information and provides that information to an operator through an HTML interface that is rendered using HTML and JavaScript. The data structure includes an OCR confidence level for each character and/or field and the operator is prompted to review only those characters/fields that meet a predetermined threshold for the confidence level. The operator can use an input key (e.g., TAB or ENTER) to navigate to each character/field with a low confidence level and thereby correct or validate each low confidence character/field as appropriate.
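One way to picture the data structure and review flow in this abstract is the sketch below. It assumes that "meeting the threshold" means a confidence below a cutoff, and the `OcrChar` record, the `threshold` default, and both function names are hypothetical. The HTML and JavaScript interface is not modeled; `review_queue` only yields the positions an operator would TAB through.

```python
from dataclasses import dataclass


@dataclass
class OcrChar:
    """One recognized character with the OCR engine's confidence (0.0 to 1.0)."""
    char: str
    confidence: float


def review_queue(chars, threshold=0.80):
    """Return indices of characters whose confidence falls below the threshold."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]


def apply_corrections(chars, corrections):
    """Build the final string, replacing characters at corrected indices."""
    return "".join(corrections.get(i, c.char) for i, c in enumerate(chars))


field = [OcrChar("1", 0.99), OcrChar("O", 0.42), OcrChar("0", 0.95)]
to_review = review_queue(field)
final_text = apply_corrections(field, {1: "0"})
```

Only the middle character is queued for review here, so the operator makes a single correction instead of re-keying the whole field.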
Abstract: The technology of the present disclosure includes computer-implemented methods, computer program products, and systems to filter images before transmitting to a system for optical character recognition (“OCR”). A user computing device obtains a first image of the card from the digital scan of a physical card and analyzes features of the first image, the analysis being sufficient to determine if the first image is likely to be usable by an OCR algorithm. If the user computing device determines that the first image is likely to be usable, then the first image is transmitted to an OCR system associated with the OCR algorithm. Upon a determination that the first image is unlikely to be usable, a second image of the card from the digital scan of the physical card is analyzed. The optical character recognition system performs an optical character recognition algorithm on the filtered card.
Type:
Application
Filed:
March 13, 2017
Publication date:
June 29, 2017
Inventors:
Xiaohang Wang, Alessandro Bissacco, Glenn Merlind Berntson, Marria Nazif, Justin Scheiner, Sam Shih, Mark Leslie Snyder, Daniel Talavera
Abstract: A number of different tags are input in a fax cover sheet that tell an OCR system not only the identity of the supplier, but also to which client the document should be routed. The OCR system identifies a number of these tags and compares them to stored supplier data to validate to which supplier the document belongs. If the system cannot validate the document, it is routed to a GUI for manual sorting. If there is no coversheet, the system relies upon the OCR system to locate keywords on the document and caller ID information to suggest a correct supplier. The OCR system also clips a separate, horizontal slice of the document (‘snippet’) that corresponds to the display of any line item and places it in a data base for future reference and reporting. The application collects and associates all corresponding snippets to their originating line items.
Type:
Grant
Filed:
March 10, 2011
Date of Patent:
March 31, 2015
Assignee:
Lavante, Inc.
Inventors:
Joseph Flynn, Kerry Edward Koitzsch, Wassim G. Jraige
Abstract: Disclosed is a method, system and computer readable recording medium for correcting an OCR result. According to an exemplary embodiment of the present invention, there is provided a method for correcting an OCR result, the method including performing character recognition on content including character information using an OCR technique, removing extra carriage return information from the content, outputting the character recognition result, and correcting word spacing on the outputted result.
Type:
Grant
Filed:
December 30, 2009
Date of Patent:
June 18, 2013
Assignee:
NHN Corporation
Inventors:
Byoung Seok Yang, Hee Cheol Seo, Do Gil Lee, Ki Joon Sung
Abstract: The technology of the present disclosure includes computer-implemented methods, computer program products, and systems to filter images before transmitting to a system for optical character recognition (“OCR”). A user computing device obtains a first image of the card from the digital scan of a physical card and analyzes features of the first image, the analysis being sufficient to determine if the first image is likely to be usable by an OCR algorithm. If the user computing device determines that the first image is likely to be usable, then the first image is transmitted to an OCR system associated with the OCR algorithm. Upon a determination that the first image is unlikely to be usable, a second image of the card from the digital scan of the physical card is analyzed. The optical character recognition system performs an optical character recognition algorithm on the filtered card.
Type:
Grant
Filed:
March 13, 2017
Date of Patent:
August 22, 2017
Assignee:
GOOGLE INC.
Inventors:
Xiaohang Wang, Alessandro Bissacco, Glenn Merlind Berntson, Marria Nazif, Justin Scheiner, Sam Shih, Mark Leslie Snyder, Daniel Talavera
Abstract: There is disclosed a method of analyzing a digital image of a document (to determine, as example, a document suitability for server-based OCR processing) in a computer system that includes a user electronic device (for acquiring or storing a digital image of a document) connectable to a server (for executing the server-based OCR processing of the digital image to create a recognized-text document). The method is executable by the user electronic device and comprises: acquiring the digital image of the document; analyzing an OCR quality parameter associated with a compressed digital image to be created from the digital image using a compression algorithm and a compression parameter; in response to the OCR quality parameter being above or equal to a pre-determined threshold: transmitting the compressed digital image to the server.
Abstract: The technology of the present disclosure includes computer-implemented methods, computer program products, and systems to filter images before transmitting to a system for optical character recognition (“OCR”). A user computing device obtains a first image of the card from the digital scan of a physical card and analyzes features of the first image, the analysis being sufficient to determine if the first image is likely to be usable by an OCR algorithm. If the user computing device determines that the first image is likely to be usable, then the first image is transmitted to an OCR system associated with the OCR algorithm. Upon a determination that the first image is unlikely to be usable, a second image of the card from the digital scan of the physical card is analyzed. The optical character recognition system performs an optical character recognition algorithm on the filtered card.
Type:
Grant
Filed:
December 18, 2013
Date of Patent:
December 2, 2014
Assignee:
Google Inc.
Inventors:
Xiaohang Wang, Alessandro Bissacco, Glen Berntson, Marria Nazif, Justin Scheiner, Sam Shih, Mark Leslie Snyder, Daniel Talavera
Abstract: A system is presented for scanning entire books or documents all at once using an adaptive process where the book or document has known fonts and unknown fonts. The known fonts are processed through a verification system where sure words and error words are determined. Both the sure words and error words are sent to OCR training where they are re-OCR'ed and repeatedly verified until they meet a predetermined quality criteria. Characters or words not meeting the predetermined quality criteria receive additional OCR training until all the characters and words pass the predetermined quality criteria. Unknown fonts are scanned and clustered together by shape. Outliers in the shapes are manually keyed in. Those symbols that are manually classified go to OCR training and then to the known type optimization process.
Type:
Grant
Filed:
November 24, 2008
Date of Patent:
December 1, 2009
Assignee:
International Business Machines Corporation
Abstract: A camera system with dual embedded optical character recognition (OCR) engines. The camera system includes a camera module for capturing an image of a vehicle, the image including a license plate with a license plate number containing characters; a first OCR engine that produces a first read and first confidence level by extracting the characters from the license plate; and a second OCR engine, different from the first OCR engine, that produces a second read and second confidence level extracting the characters from the license plate. The camera system further includes a comparator for comparing the first read to the second read. If the first read and the second read match, the system produces the matching read as a final read. If the first read and the second read do not match, a fusion module produces a final read using the first read, the first confidence level, the second read, and the second confidence level.
Type:
Application
Filed:
April 6, 2016
Publication date:
April 19, 2018
Inventors:
Peter ISTENES, Stephanie R. SCHUMACHER, Benjamin W. WATSON
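The comparator and fusion logic in this abstract can be sketched as below. The abstract does not describe how the fusion module combines the two reads, so the rule used here (keep the read with the higher confidence, preferring the first engine on a tie) is purely an assumption, as is the function name.

```python
def fuse_reads(read1, conf1, read2, conf2):
    """Produce a final plate read from two OCR engines.

    If the reads match, the matching read is final. Otherwise fall back to an
    assumed fusion rule: keep the read with the higher confidence level.
    """
    if read1 == read2:
        return read1
    return read1 if conf1 >= conf2 else read2


matching = fuse_reads("ABC123", 0.91, "ABC123", 0.62)
disputed = fuse_reads("ABC123", 0.55, "A8C123", 0.80)
```

A production fusion module would more plausibly work character by character, but the whole-string rule is enough to show where the two confidence levels enter the decision.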
Abstract: A system/method is presented for scanning entire books or documents all at once using an adaptive process where the book or document has known fonts and unknown fonts. The known fonts are processed through a verification system where sure words and error words are determined. Both the sure words and error words are sent to OCR training where they are re-OCR'ed and repeatedly verified until they meet a predetermined quality criteria. Characters or words not meeting the predetermined quality criteria receive additional OCR training until all the characters and words pass the predetermined quality criteria. Unknown fonts are scanned and clustered together by shape. Outliers in the shapes are manually keyed in. Those symbols that are manually classified go to OCR training and then to the known type optimization process.
Type:
Grant
Filed:
March 3, 2008
Date of Patent:
January 20, 2009
Assignee:
International Business Machines Corporation
Abstract: The technology of the present disclosure includes computer-implemented methods, computer program products, and systems to filter images before transmitting to a system for optical character recognition (“OCR”). A user computing device obtains a first image of the card from the digital scan of a physical card and analyzes features of the first image, the analysis being sufficient to determine if the first image is likely to be usable by an OCR algorithm. If the user computing device determines that the first image is likely to be usable, then the first image is transmitted to an OCR system associated with the OCR algorithm. Upon a determination that the first image is unlikely to be usable, a second image of the card from the digital scan of the physical card is analyzed. The optical character recognition system performs an optical character recognition algorithm on the filtered card.
Type:
Grant
Filed:
October 27, 2014
Date of Patent:
April 18, 2017
Assignee:
GOOGLE INC.
Inventors:
Xiaohang Wang, Alessandro Bissacco, Glenn Merlind Berntson, Marria Nazif, Justin Scheiner, Sam Shih, Mark Leslie Snyder, Daniel Talavera
Abstract: Systems and methods for computer-implemented pre-optimization of input data before further processing thereof by a computer-implemented analyzation process, such as optical character recognition (OCR). A cooperative model is employed that combines one or more supervised-learning based inspector sub-models, and one or more filter sub-models that operate in series with the inspector sub-model(s). The inspectors first receive the input data and calculate one or more predicted transformation parameters, which are then used to perform transformations on the input data. The inspector-transformed data is then passed to the filters, which derive respective convolution kernels and apply same to the inspector-transformed data before passing same to the OCR or other analyzation process. The inspectors may be pretrained with different training data.
Abstract: A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from a storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters.
Type:
Grant
Filed:
July 30, 1975
Date of Patent:
July 13, 1976
Assignee:
International Business Machines Corporation
Inventors:
Ellen Willis Bollinger, Anne Marie Chaires, Jean Marie Ciconte, Allen Harold Ett, John Joseph Hilliard, Walter Steven Rosenbaum