Patents by Inventor Dan S. Bloomberg
Dan S. Bloomberg has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 5943679
Abstract: A document display system arranges images of a document ordered in a linear array of pages on a display screen. One page of the document is defined as a focus page which is displayed at the center of the display screen. Images of pages preceding the focus page in the linear array of pages are presented to a user using a first recursive block that is located to the left of the focus page on the display screen. Images of pages following the focus page in the linear array of pages are presented to a user using a second recursive block that is located to the right of the focus page on the display screen. Each recursive block is initially filled with images that are arranged proximate to the focus page in the array of pages. This arrangement of the pages of a document on the display screen provides a context within which to view the selected focus page of a document.
Type: Grant
Filed: October 30, 1996
Date of Patent: August 24, 1999
Assignee: Xerox Corporation
Inventors: Leslie T. Niles, Dan S. Bloomberg
-
Patent number: 5892842
Abstract: A method of automatically identifying sentence boundaries in a document image without performing character recognition to generate an ASCII representation of the document text. The identification process begins by selecting a connected component from the multiplicity of connected components of a text line. Next, it is determined whether the selected connected component might represent a period based upon its shape. If the selected connected component is dot shaped, then it is determined whether the selected connected component might represent a colon. Finally, if the selected connected component is dot shaped and not part of a colon, the selected connected component is labeled as a sentence boundary.
Type: Grant
Filed: December 14, 1995
Date of Patent: April 6, 1999
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
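The decision sequence in this abstract (dot shaped, then not part of a colon, then sentence boundary) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `Box` bounding-box representation, the size and aspect-ratio thresholds, and the "vertically stacked dots" colon test are not taken from the patent. The sketch also ignores cases the abstract does not discuss, such as the dot of an "i".

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box of one connected component (hypothetical representation)."""
    x: int  # left edge
    y: int  # top edge (y grows downward)
    w: int  # width
    h: int  # height


def is_dot_shaped(box: Box, x_height: int) -> bool:
    # Assumed shape test: small relative to the x-height and roughly square.
    small = box.w <= 0.4 * x_height and box.h <= 0.4 * x_height
    squarish = 0.5 <= box.w / box.h <= 2.0
    return small and squarish


def is_part_of_colon(box: Box, line: List[Box], x_height: int) -> bool:
    # Assumed colon test: another dot-shaped component is stacked above or below.
    for other in line:
        if other is box or not is_dot_shaped(other, x_height):
            continue
        overlaps_horizontally = other.x < box.x + box.w and box.x < other.x + other.w
        close_vertically = abs(other.y - box.y) <= x_height
        if overlaps_horizontally and close_vertically:
            return True
    return False


def sentence_boundaries(line: List[Box], x_height: int) -> List[Box]:
    """Return components that are dot shaped and not part of a colon."""
    return [
        b for b in line
        if is_dot_shaped(b, x_height) and not is_part_of_colon(b, line, x_height)
    ]
```

For a text line holding one letter, one isolated dot, and two stacked dots, only the isolated dot is returned.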
-
Patent number: 5862255
Abstract: The glyphs of self-clocking glyph codes are written on regular hexagonal or pseudo-hexagonal lattice-like patterns of centers to reduce the risk of interglyph interference during the read process while also enabling the glyphs to be packed more densely while maintaining a given center-to-center spacing between them.
Type: Grant
Filed: June 18, 1996
Date of Patent: January 19, 1999
Assignee: Xerox Corporation
Inventors: Daniel Davies, Dan S. Bloomberg, Robert E. Weltman
-
Patent number: 5848191
Abstract: A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks and text lines. Using the median x-height of text blocks, the main body of text is identified. Afterward, word image equivalence classes and sentence boundaries within the blocks of the main body of text are determined. The word image equivalence classes are used to identify thematic words. These, in turn, are used to score the sentences within the main body of text, and the highest scoring sentences are selected for extraction.
Type: Grant
Filed: December 14, 1995
Date of Patent: December 8, 1998
Assignee: Xerox Corporation
Inventors: Francine R. Chen, Dan S. Bloomberg, John W. Tukey
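The final scoring step of this abstract can be sketched in Python as follows. The input format (each sentence as a list of word-image equivalence-class IDs) and the rule that the most frequent classes count as thematic are assumptions for illustration only; the abstract does not say how thematic words are chosen, and a practical system would presumably need to exclude very common function words.

```python
from collections import Counter
from typing import List


def select_summary(
    sentences: List[List[int]],
    num_thematic: int = 2,
    num_sentences: int = 1,
) -> List[int]:
    """Return indices of the highest-scoring sentences, in document order.

    Each sentence is a list of word-image equivalence-class IDs
    (hypothetical input; no character recognition is involved).
    """
    counts = Counter(cls for sentence in sentences for cls in sentence)
    # Assumption: the most frequent equivalence classes act as thematic words.
    thematic = {cls for cls, _ in counts.most_common(num_thematic)}
    # Score each sentence by the number of thematic word images it contains.
    scores = [sum(1 for cls in s if cls in thematic) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_sentences])
```

Returning indices rather than text keeps the sketch in the image domain: the selected sentences would be extracted as image regions.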
-
Patent number: 5828771
Abstract: An efficient image processing technique automatically analyzes an image scanned at 300 or greater dpi and measures an image characteristic of the input image from which it is possible to determine whether the image has ever been previously scanned or printed at low resolution at some time in its history. The technique is effective in classifying an image that was at one time embodied in paper form and scanned at a vertical resolution of 100 dpi or less, such as a facsimile document scanned in standard mode, or at 200 pixels/inch (referred to as "fine fax mode"). The technique performs measurements on the pixels included in the vertical or horizontal edges of symbols contained in the input image, and produces a distribution of the measurements. A numerical interpretation of the measurement distribution data is used to classify the image. The invention is computationally efficient because it may be applied to only a small percentage (e.g.
Type: Grant
Filed: December 15, 1995
Date of Patent: October 27, 1998
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
-
Patent number: 5825919
Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.
Type: Grant
Filed: September 20, 1994
Date of Patent: October 20, 1998
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, Lynn D. Wilcox, Francine R. Chen
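The last sentence of this abstract (reduce the image, then apply vertical and horizontal closings so that the characters of a word merge into one blob) can be sketched in plain Python. The images are lists of 0/1 rows. The "ON if any pixel in the 2×2 tile is ON" reduction rule, the structuring-element sizes, and all function names are assumptions for illustration, not details from the patent.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ON


def reduce_2x(img: Image) -> Image:
    """Halve each dimension; assumed rule: a 2x2 tile maps to ON if any pixel is ON."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            int(img[2 * r][2 * c] or img[2 * r][2 * c + 1]
                or img[2 * r + 1][2 * c] or img[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def close_row(row: List[int], k: int) -> List[int]:
    """Closing (dilation, then erosion) of one row with a 1 x k structuring element."""
    n = len(row)
    # Dilation: pixels beyond the left edge count as OFF.
    dilated = [int(any(row[c - j] for j in range(k) if c - j >= 0)) for c in range(n)]
    # Erosion: only in-range pixels are tested, so original ON pixels survive.
    return [int(all(dilated[c + j] for j in range(k) if c + j < n)) for c in range(n)]


def close_h(img: Image, k: int) -> Image:
    return [close_row(row, k) for row in img]


def close_v(img: Image, k: int) -> Image:
    # Transpose, close each former column as a row, then transpose back.
    transposed = [list(col) for col in zip(*img)]
    return [list(row) for row in zip(*close_h(transposed, k))]


def word_blobs(img: Image, k_h: int = 3, k_v: int = 2) -> Image:
    """Reduce by 2x, then apply vertical and horizontal closings."""
    return close_h(close_v(reduce_2x(img), k_v), k_h)
```

A closing with a 1 × k element fills horizontal gaps narrower than k pixels, so the width of the element decides whether inter-character gaps close while inter-word gaps stay open.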
-
Performing document image management tasks using an iconic image having embedded encoded information
Patent number: 5765176
Abstract: Encoded data embedded in an iconic, or reduced size, version of an original text image is decoded and used in a variety of document image management applications to provide input to, or to control the functionality of, an application. The iconic image may be printed in a suitable place (e.g., the margin or other background region) in the original text image so that a text image so annotated will then always carry the embedded data in subsequent copies made from the annotated original. The iconic image may also be used as part of a graphical user interface as a surrogate for the original text image. An encoding operation encodes the data unobtrusively in the form of rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace.
Type: Grant
Filed: September 6, 1996
Date of Patent: June 9, 1998
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
-
Patent number: 5761686
Abstract: An encoding operation encodes binary data that is then embedded in an iconic, or size-reduced, version of an original text image, in a position in the iconic image that replaces a text portion in the original text image. The encoding operation produces rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace. Exemplary encoding operations are described, including operations based on run-length limited encoding. A second message may be encoded in the background color regions that separate the blocks. The message carried by the binary data may be any information suitable for a particular application, and need not be restricted to information about or related to the original image.
Type: Grant
Filed: June 27, 1996
Date of Patent: June 2, 1998
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
-
Patent number: 5745600
Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.
Type: Grant
Filed: November 9, 1994
Date of Patent: April 28, 1998
Assignee: Xerox Corporation
Inventors: Francine R. Chen, Lynn D. Wilcox, Dan S. Bloomberg
-
Patent number: 5740285
Abstract: In brief, a method of reducing an M × N input binary image (M rows of N pixels each) by a factor of m vertically and n horizontally includes the steps of performing at least one logical operation between bits in consecutive groups of m adjacent rows to provide a resultant single row for each group of m rows, and performing at least one logical operation between bits in consecutive groups of n adjacent columns to provide a resultant single column for each group of n columns. For certain types of reductions, the resulting reduced image will be the desired output image, while for other types, the resultant image will be one of a required plurality of intermediate images, which are then combined to provide the desired output image.
Type: Grant
Filed: January 29, 1993
Date of Patent: April 14, 1998
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, Daniel Davies
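The two-step reduction in this abstract (combine groups of m rows, then groups of n columns) can be sketched in Python for the simple case where one logical operation gives the final image. The list-of-lists image, the choice of OR or AND as the operation, and the dropping of leftover rows and columns are assumptions for illustration. The multi-image case mentioned at the end of the abstract is not covered.

```python
from functools import reduce
from operator import and_, or_
from typing import Callable, List

Image = List[List[int]]  # M rows of N pixels, each 0 or 1


def reduce_image(
    img: Image, m: int, n: int, op: Callable[[int, int], int] = or_
) -> Image:
    """Reduce a binary image by m vertically and n horizontally.

    `op` is the logical operation applied within each group (OR or AND here).
    Rows or columns that do not fill a complete group are dropped (assumption).
    """
    width = len(img[0])
    # Step 1: combine each group of m adjacent rows, bit by bit, into one row.
    row_reduced = [
        [reduce(op, (img[r + i][c] for i in range(m))) for c in range(width)]
        for r in range(0, len(img) - m + 1, m)
    ]
    # Step 2: combine each group of n adjacent columns into one column.
    return [
        [reduce(op, row[c:c + n]) for c in range(0, width - n + 1, n)]
        for row in row_reduced
    ]
```

With OR, an output pixel is ON if any pixel in its m × n block is ON; with AND, only if all of them are. On packed bitmaps the row step becomes word-wide logical operations, which is where the speed would come from.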
-
Patent number: 5689585
Abstract: A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined.
Type: Grant
Filed: April 28, 1995
Date of Patent: November 18, 1997
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, Leslie T. Niles, Gary E. Kopec, Philip Andrew Chou
-
Patent number: 5619592
Abstract: A method and apparatus for detection of highlighted regions of a document. A document containing highlighted regions is scanned using a gray scale scanner. Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document. Having separated the highlighted and non-highlighted portions, optical character recognition (OCR) techniques can then be used to extract text from the highlighted regions.
Type: Grant
Filed: June 7, 1995
Date of Patent: April 8, 1997
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, Henry W. Sang, Jr., Lakshmi Dasari
-
Patent number: 5572601
Abstract: A robust technique for determining whether a field (43, 45, 47a-d) on a form (40'), which has been converted to a binary input image, contains a mark utilizes an approach of making an initial determination of the approximate location of the field, and then refining such determination. The form is assumed to have registration marks (fiducials) with the field at a known location relative to the fiducials. The fiducials are identified (50), and the approximate location of the field is determined (55) from the fiducial positions and the known relation between the fiducials and the field. At this point, a portion of the image (referred to as the subimage) is extracted (57). The subimage is typically somewhat larger than the field so that it can be assumed that the field is within the subimage. The field has machine-printed lines along at least part of the field perimeter.
Type: Grant
Filed: October 19, 1994
Date of Patent: November 5, 1996
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
-
Patent number: 5570435
Abstract: A method and apparatus for differentiating and extracting handwritten annotations and machine printed text in an image. The method provides for the use of morphological operations, preferably at reduced scale, to eliminate, for example, the handwritten annotations from an image. A separation mask is produced that, for example, covers all the image pixels corresponding to machine printed text, and none of the image pixels corresponding to handwritten or handprinted annotations. The separation mask is used in conjunction with the original image to produce separate handwritten annotations and machine printed text images. The invention also provides a method and apparatus for identifying the location of specialized type styles such as bold and italic. The method erodes a binary image utilizing structuring elements which provide a relatively large number of hits in regions containing the specialized type styles.
Type: Grant
Filed: December 28, 1994
Date of Patent: October 29, 1996
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, M. Margaret Withgott
-
Patent number: 5491760
Abstract: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases and graphics in the document image using automatic or interactive morphological image recognition techniques. Document summaries or indices are produced based on the identified significant portions of the document image. The disclosed method is particularly suited to improving reading machines for the blind.
Type: Grant
Filed: May 9, 1994
Date of Patent: February 13, 1996
Assignee: Xerox Corporation
Inventors: M. Margaret Withgott, Steven C. Bagley, Dan S. Bloomberg, Per-Kristian Halvorsen, Daniel P. Huttenlocher, Todd A. Cass, Ronald M. Kaplan, Ramana R. Rao
-
Patent number: 5486686
Abstract: Machine readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents are encoded in codes that are printed on such documents, thereby permitting the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.
Type: Grant
Filed: May 18, 1992
Date of Patent: January 23, 1996
Assignee: Xerox Corporation
Inventors: Frank Zdybel, Jr., Henry W. Sang, Jr., Jan O. Pedersen, Z. E. Smith, III, D. A. Henderson, Jr., David L. Hecht, Dan S. Bloomberg
-
Patent number: 5467410
Abstract: The present invention provides a robust technique for quickly determining whether a binary input image originated as a blank page. The technique provides reliable sensing in the presence of various image and scanner noise in the input image. In broad terms, the invention contemplates reducing the input image with a low threshold, labeling (by size) connected components (8-connected or 4-connected), and performing a threshold analysis. The threshold analysis typically entails size and numerical thresholds, taking into account the characteristic dimensions of expected types of noise. In specific embodiments, the reduction is performed as a textured reduction wherein the image is divided into tiles, and a single row of pixels in each tile is checked to see whether there are any ON pixels. If there are, the corresponding pixel in the reduced image is ON, otherwise it is OFF. Optional morphological operations are performed to remove expected sources of noise (e.g., pepper noise and thin horizontal lines).
Type: Grant
Filed: March 20, 1992
Date of Patent: November 14, 1995
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
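The pipeline in this abstract (textured reduction, connected-component sizing, threshold analysis) can be sketched in Python. The tile size, the choice of each tile's first row as the row that is checked, the use of 4-connectivity, and the two thresholds `min_size` and `max_components` are all assumed values for illustration. The optional morphological noise removal is omitted.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ON


def textured_reduce(img: Image, tile: int) -> Image:
    """One output pixel per tile: ON if the tile's first row has any ON pixel."""
    return [
        [int(any(img[r][c:c + tile])) for c in range(0, len(img[0]), tile)]
        for r in range(0, len(img), tile)
    ]


def component_sizes(img: Image) -> List[int]:
    """Sizes of the 4-connected components, found by iterative flood fill."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not img[r0][c0] or seen[r0][c0]:
                continue
            stack, size = [(r0, c0)], 0
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            sizes.append(size)
    return sizes


def is_blank(img: Image, tile: int = 4, min_size: int = 2,
             max_components: int = 0) -> bool:
    """Blank if at most max_components components reach min_size (assumed rule)."""
    reduced = textured_reduce(img, tile)
    significant = [s for s in component_sizes(reduced) if s >= min_size]
    return len(significant) <= max_components
```

Because only one row per tile is inspected, the reduction touches a fraction of the input pixels, and isolated noise pixels either fall outside the inspected rows or produce components too small to count.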
-
Patent number: 5455871
Abstract: A method and apparatus detects function words in a first image of a scanned document without first converting the image to character codes. Function words include determiners, prepositions, articles, and other words that play a largely grammatical role, as opposed to words such as nouns and verbs that convey topic information. Non-content based morphological characteristics of image units are predetermined as well as the presence or omission of character ascenders and descenders in image units. Predetermined characteristics of function word image units are compared with the image units of an image and when a match occurs, the image unit is identified as a function word. Conversely when no matching characteristics occur, the image unit is identified as a non-function word. Additionally, image units are classified and identified as containing only upper case characters, only lower case characters, only digits, and mixed character types.
Type: Grant
Filed: May 16, 1994
Date of Patent: October 3, 1995
Assignee: Xerox Corporation
Inventors: Dan S. Bloomberg, John W. Tukey, M. Margaret Withgott
-
Patent number: 5438630
Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.
Type: Grant
Filed: December 17, 1992
Date of Patent: August 1, 1995
Assignee: Xerox Corporation
Inventors: Francine R. Chen, Lynn D. Wilcox, Dan S. Bloomberg
-
Patent number: 5434953
Abstract: A technique for reducing images that provides useful information about the image and allows fast computation. Using threshold values near the extreme possible values for the convolution window size and using large subsampling tiles nevertheless allows extraction of the information about the typical textures that exist in the document image: text words, text lines, rules, and halftones. In a particular embodiment, 16×16 tiles are used for subsampling, 16×1 and 1×16 windows are used for the convolution, and threshold values of 1 and 16 are used. If the horizontal windows in tiles are aligned with 16-bit boundaries in the computer, the implementation is particularly efficient. For the 16×1 horizontal window, a threshold convolution with T=1 can be done on any of the sixteen 16-bit words in the tile by checking whether the word is zero or non-zero. For a 1×
Type: Grant
Filed: March 20, 1992
Date of Patent: July 18, 1995
Assignee: Xerox Corporation
Inventor: Dan S. Bloomberg
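The word-level test described for the 16×1 horizontal window can be illustrated with a short Python sketch. The tile layout (a list of sixteen 16-bit integers, one per row) and the function name are hypothetical. The T=1 branch follows the abstract directly. The T=16 branch, which compares the word against all ones, is an inference from the stated threshold rather than something the truncated abstract spells out, and the bit-counting fallback is added only for completeness.

```python
from typing import List

FULL_WORD = 0xFFFF  # all sixteen pixels of a 16x1 window ON


def horizontal_threshold(tile_words: List[int], row: int, threshold: int) -> int:
    """Threshold convolution of a 16x1 window over one 16-bit word of a tile.

    tile_words holds the sixteen 16-bit rows of one 16x16 tile
    (hypothetical layout, aligned to 16-bit boundaries).
    """
    word = tile_words[row] & FULL_WORD
    if threshold == 1:
        # At least one ON pixel: the word is simply non-zero.
        return int(word != 0)
    if threshold == 16:
        # Every pixel ON (inferred extreme case): the word is all ones.
        return int(word == FULL_WORD)
    # General fallback, not from the abstract: count the ON bits.
    return int(bin(word).count("1") >= threshold)
```

Both extreme thresholds reduce to a single integer comparison per tile, which is why aligning the windows with machine words makes the reduction cheap.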