Abstract: An information processing methodology gives rise to an application program interface which includes an automated digitizing unit, such as a scanner, which inputs information from a diversity of hard copy documents and stores information from the hard copy documents into a memory as stored document information. Portions of the stored document information are selected in accordance with content instructions which designate portions of the stored document information required by a particular application program. The selected stored document information is then placed into the transmission format required by a particular application program in accordance with transmission format instructions. After the information has been transmission formatted, the information is transmitted to the application program. In one operational mode, the interface interactively prompts the user to identify, on a display, portions of the hard copy documents containing information used in application programs or for storage.
Type:
Grant
Filed:
November 28, 1994
Date of Patent:
April 29, 1997
Assignee:
International Patent Holdings Ltd.
Inventors:
Robert Lech, Mitchell A. Medina, Catherine B. Elias
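The select-then-format flow in the abstract above can be sketched in a few lines. This is only an illustration: the field names, the two example formats, and the helper names `select_fields` and `format_for_transmission` are assumptions, not terms from the patent.

```python
import json

def select_fields(stored, content_instructions):
    """Keep only the stored fields that the application asks for (hypothetical helper)."""
    return {name: stored[name] for name in content_instructions if name in stored}

def format_for_transmission(selected, fmt):
    """Render the selected fields in the format an application expects.

    The "json" and "csv" formats are assumed examples of transmission formats.
    """
    if fmt == "json":
        return json.dumps(selected, sort_keys=True)
    if fmt == "csv":
        keys = sorted(selected)
        header = ",".join(keys)
        row = ",".join(str(selected[k]) for k in keys)
        return header + "\n" + row
    raise ValueError("unknown transmission format: " + fmt)

# Stored document information, as it might look after scanning and recognition.
stored = {"vendor": "Acme", "total": "12.50", "date": "1994-11-28"}
selected = select_fields(stored, ["vendor", "total"])
as_json = format_for_transmission(selected, "json")
as_csv = format_for_transmission(selected, "csv")
```

Keeping selection and formatting as separate steps mirrors the abstract's split between content instructions and transmission format instructions, so one scanned document can feed several applications.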
Abstract: A method and apparatus for detection of highlighted regions of a document. A document containing highlighted regions is scanned using a gray scale scanner. Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document. Having separated the highlighted and non-highlighted portions, optical character recognition (OCR) techniques can then be used to extract text from the highlighted regions.
Type:
Grant
Filed:
June 7, 1995
Date of Patent:
April 8, 1997
Assignee:
Xerox Corporation
Inventors:
Dan S. Bloomberg, Henry W. Sang, Jr., Lakshmi Dasari
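As a rough illustration of the thresholding idea in the abstract above, the sketch below assumes that highlighter ink scans as mid-gray, between dark text and white paper, and then applies a 2x threshold reduction to suppress isolated pixels. The gray-level band, the reduction level, and the function names are assumptions; the abstract does not specify the actual morphological operations.

```python
def band_mask(gray, lo, hi):
    """Mark pixels whose gray value lies in [lo, hi] (assumed highlight band)."""
    return [[1 if lo <= p <= hi else 0 for p in row] for row in gray]

def threshold_reduce(mask, level):
    """Halve the mask in each dimension.

    An output pixel is 1 when at least `level` of the four pixels
    in the corresponding 2x2 input block are 1.
    """
    h, w = len(mask) // 2, len(mask[0]) // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            count = (mask[2 * y][2 * x] + mask[2 * y][2 * x + 1]
                     + mask[2 * y + 1][2 * x] + mask[2 * y + 1][2 * x + 1])
            row.append(1 if count >= level else 0)
        out.append(row)
    return out

# 255 = paper, 180 = highlighter, 30 = text ink (illustrative values).
gray = [
    [255, 255, 180, 180],
    [255,  30, 180,  30],
    [255, 255, 255, 255],
    [255, 180, 255, 255],
]
mask = band_mask(gray, 100, 220)
reduced = threshold_reduce(mask, 3)
```

With a level of 3, the top-right block stays marked even though one of its pixels is dark text, while the single stray gray pixel in the bottom-left block is dropped.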
Abstract: This invention is a mail address reading apparatus including a circuit for detecting image data of a mail piece, a circuit for detecting marks indicating the position of an address area from the image data, and a circuit for specifying the address area of the image data based on the position indicated by the marks and reading the address in that area. In this apparatus, the address area can be specified from the detected mark position, so the address can be read correctly even if image information other than the address is present.
Abstract: An optical character recognition system which can extract information from documents into machine readable form for selective inclusion in a data base uses human classification through the use of translucent ink pens of colors which correlate to field designations. The ink pens, commonly known as highlighters, are used to mark the selected text. An optical scanner reads the marked document and converts it to electronic data which is stored in data base fields according to the color-marked regions.
Abstract: A robust technique for determining whether a field (43, 45, 47a-d) on a form (40'), which has been converted to a binary input image, contains a mark utilizes an approach of making an initial determination of the approximate location of the field, and then refining such determination. The form is assumed to have registration marks (fiducials) with the field at a known location relative to the fiducials. The fiducials are identified (50), and the approximate location of the field is determined (55) from the fiducial positions and the known relation between the fiducials and the field. At this point, a portion of the image (referred to as the subimage) is extracted (57). The subimage is typically somewhat larger than the field so that it can be assumed that the field is within the subimage. The field has machine-printed lines along at least part of the field perimeter.
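The first step described above, estimating where a field lies from the fiducial positions, might look like the following sketch. It assumes a pure translation between the form template and the scanned image and a single fiducial; the function name, the `(x, y, width, height)` rectangle layout, and the margin value are illustrative, not taken from the patent.

```python
def locate_subimage(fiducial_found, fiducial_template, field_template, margin):
    """Estimate the subimage that should contain a field.

    fiducial_found:    (x, y) of the fiducial detected in the scanned image
    fiducial_template: (x, y) of the same fiducial on the blank form
    field_template:    (x, y, width, height) of the field on the blank form
    margin:            extra pixels on every side, so the field is
                       assumed to lie inside the returned rectangle
    """
    dx = fiducial_found[0] - fiducial_template[0]
    dy = fiducial_found[1] - fiducial_template[1]
    x, y, w, h = field_template
    return (x + dx - margin, y + dy - margin, w + 2 * margin, h + 2 * margin)

# The scan is shifted 2 px right and 2 px up relative to the template.
subimage = locate_subimage((12, 8), (10, 10), (100, 50, 20, 20), 5)
```

The refinement step would then search inside this subimage, for example for the machine-printed lines along the field perimeter; that part is not sketched here.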
Abstract: A multi-color marker editing system for editing a color image by reading a designated marker. The multi-color marker editing system includes an image reading unit for reading color image data, a color-coordinate converting unit for converting the read image data into color data in a color coordinate system defined by optical density, hue and saturation, a color detecting unit for detecting a designated marker color from the read color image data, an image density converting unit for converting a density of the detected marker color image data, and a marker editing unit for performing marker color editing, for each color, on the density-converted marker color image data.
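Detecting a marker color after a color-coordinate conversion can be sketched with Python's standard `colorsys` module. HSV is used here as a stand-in for the density, hue and saturation coordinates named in the abstract, and the hue ranges, the saturation cutoff, and the name `classify_marker` are assumptions chosen for illustration.

```python
import colorsys

# Assumed hue ranges in degrees for three common marker colors.
MARKER_HUES = {"yellow": (45, 75), "green": (90, 150), "pink": (300, 345)}

def classify_marker(rgb, min_saturation=0.3):
    """Return the marker color name for an 8-bit RGB pixel, or None.

    Pixels with low saturation (paper, black text, gray) are rejected
    before the hue test, since their hue is not meaningful.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    if s < min_saturation:
        return None
    hue_deg = h * 360.0
    for name, (lo, hi) in MARKER_HUES.items():
        if lo <= hue_deg <= hi:
            return name
    return None

yellow = classify_marker((255, 255, 0))
pink = classify_marker((255, 105, 180))
paper = classify_marker((255, 255, 255))
text = classify_marker((0, 0, 0))
```

Testing hue only after a saturation check keeps white paper and black ink from being misread as a marker color.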
Abstract: Tabular documents have column structures that can be determined without decoding the bitmap. The method searches for separation intervals that separate word fragments in the table. These separation intervals are processed by intersecting them with other intervals and ranking the resulting intervals. A structured closure of separation intervals is maintained in bins. The intervals in the bins are sorted and used to determine new intersections when the next separation interval is processed. The intervals with the highest ranking are selected as the column separation intervals. The columns are easily identified with the method without first decoding the bitmap.
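A much simplified version of the interval-intersection idea is sketched below. It intersects the whitespace gaps of every text line and keeps what survives, rather than ranking partial intersections in bins as the abstract describes; the input layout (one list of `(start, end)` gaps per line) and the function names are assumptions.

```python
def intersect(a, b):
    """Return the overlap of two (start, end) intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def column_gaps(line_gaps):
    """Intersect the gaps of all lines; surviving intervals separate columns."""
    current = list(line_gaps[0])
    for gaps in line_gaps[1:]:
        survivors = []
        for a in current:
            for b in gaps:
                overlap = intersect(a, b)
                if overlap is not None:
                    survivors.append(overlap)
        current = survivors
    return sorted(current)

# Horizontal whitespace gaps between word fragments, one list per text line.
lines = [
    [(10, 20), (40, 55)],
    [(12, 22), (42, 50)],
    [(5, 18), (30, 33), (44, 60)],
]
separators = column_gaps(lines)
```

Requiring a gap in every line is stricter than a ranking scheme: a single line with text spanning two columns would remove a separator here, which is one reason to rank intervals by support instead.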