Patents by Inventor Dan S. Bloomberg

Dan S. Bloomberg has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multi-page document viewer having a focus image and recursively nested images of varying resolutions less than the resolution of the focus image

Patent number: 5943679

Abstract: A document display system arranges images of a document ordered in a linear array of pages on a display screen. One page of the document is defined as a focus page which is displayed at the center of the display screen. Images of pages preceding the focus page in the linear array of pages are presented to a user using a first recursive block that is located to the left of the focus page on the display screen. Images of pages following the focus page in the linear array of pages are presented to a user using a second recursive block that is located to the right of the focus page on the display screen. Each recursive block is initially filled with images that are arranged proximate to the focus page in the array of pages. This arrangement of the pages of a document on the display screen provides a context within which to view the selected focus page of a document.

Type: Grant

Filed: October 30, 1996

Date of Patent: August 24, 1999

Assignee: Xerox Corporation

Inventors: Leslie T. Niles, Dan S. Bloomberg
Automatic method of identifying sentence boundaries in a document image

Patent number: 5892842

Abstract: A method of automatically identifying sentence boundaries in a document image without performing character recognition to generate an ASCII representation of the document text. The identification process begins by selecting a connected component from the multiplicity of connected components of a text line. Next, it is determined whether the selected connected component might represent a period based upon its shape. If the selected connected component is dot shaped, then it is determined whether the selected connected component might represent a colon. Finally, if the selected connected component is dot shaped and not part of a colon, the selected connected component is labeled as a sentence boundary.

Type: Grant

Filed: December 14, 1995

Date of Patent: April 6, 1999

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
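
The decision sequence in the abstract of patent 5892842 (dot-shaped? part of a colon? otherwise a sentence boundary) can be illustrated with a minimal sketch. The `Box` type, the size thresholds, and the vertical-stacking test for colons are all assumptions made for illustration; the abstract does not give the actual criteria. The sketch also ignores cases the abstract does not address, such as the dot over an "i".

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Bounding box of one connected component (hypothetical layout)."""
    x: int  # left edge
    y: int  # top edge, with y growing downward
    w: int  # width
    h: int  # height


def is_dot_shaped(box: Box, x_height: int) -> bool:
    """Small and roughly square relative to the x-height (assumed thresholds)."""
    if box.w <= 0 or box.h <= 0:
        return False
    small = box.w <= 0.4 * x_height and box.h <= 0.4 * x_height
    squarish = 0.5 <= box.w / box.h <= 2.0
    return small and squarish


def is_part_of_colon(box: Box, line: List[Box], x_height: int) -> bool:
    """True if another dot-shaped component is stacked vertically with this one."""
    for other in line:
        if other is box or not is_dot_shaped(other, x_height):
            continue
        overlap = min(box.x + box.w, other.x + other.w) - max(box.x, other.x)
        if overlap > 0 and abs(other.y - box.y) <= x_height:
            return True
    return False


def sentence_boundaries(line: List[Box], x_height: int) -> List[Box]:
    """Components that are dot shaped and not part of a colon."""
    return [
        box for box in line
        if is_dot_shaped(box, x_height)
        and not is_part_of_colon(box, line, x_height)
    ]


# One text line: letter, period, letter, then the two dots of a colon.
line = [Box(0, 0, 8, 10), Box(10, 8, 2, 2), Box(20, 0, 8, 10),
        Box(30, 2, 2, 2), Box(30, 8, 2, 2)]
boundaries = sentence_boundaries(line, x_height=10)
```

Only the isolated dot at x = 10 is kept here; both colon dots are rejected because each finds the other stacked above or below it.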
Broad bandwidth image domain communication channel with symbol interference suppression

Patent number: 5862255

Abstract: The glyphs of self-clocking glyph codes are written on regular hexagonal or pseudo-hexagonal lattice-like patterns of centers to reduce the risk of interglyph interference during the read process while also enabling the glyphs to be packed more densely while maintaining a given center-to-center spacing between them.

Type: Grant

Filed: June 18, 1996

Date of Patent: January 19, 1999

Assignee: Xerox Corporation

Inventors: Daniel Davies, Dan S. Bloomberg, Robert E. Weltman
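
The packing claim in the abstract of patent 5862255 follows from lattice geometry, and a short sketch can make it concrete. The function names below are hypothetical; the code only generates centers on a regular hexagonal lattice and compares its density with a square lattice that has the same center-to-center spacing. It does not model glyph writing or reading.

```python
import math


def hex_lattice_centers(rows, cols, spacing):
    """Centers on a regular hexagonal lattice.

    Odd rows are shifted by half a spacing, and rows are separated by
    spacing * sqrt(3) / 2, so every nearest neighbor is `spacing` away.
    """
    row_pitch = spacing * math.sqrt(3) / 2
    centers = []
    for r in range(rows):
        x_offset = spacing / 2 if r % 2 else 0.0
        for c in range(cols):
            centers.append((c * spacing + x_offset, r * row_pitch))
    return centers


def min_center_distance(centers):
    """Smallest distance between any two centers (brute force)."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )


def density_gain_over_square():
    """Centers per unit area on a hexagonal lattice relative to a square
    lattice with the same center-to-center spacing: 1 / (sqrt(3) / 2)."""
    return 2 / math.sqrt(3)


centers = hex_lattice_centers(rows=4, cols=4, spacing=10.0)
```

With the spacing held fixed, the hexagonal arrangement fits about 15% more centers into the same area, because each row sits only sqrt(3)/2 of a spacing below the previous one.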
Automatic method of generating thematic summaries from a document image without performing character recognition

Patent number: 5848191

Abstract: A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks and text lines. Using the median x-height of text blocks, the main body of text is identified. Afterward, word image equivalence classes and sentence boundaries within the blocks of the main body of text are determined. The word image equivalence classes are used to identify thematic words. These, in turn, are used to score the sentences within the main body of text, and the highest scoring sentences are selected for extraction.

Type: Grant

Filed: December 14, 1995

Date of Patent: December 8, 1998

Assignee: Xerox Corporation

Inventors: Francine R. Chen, Dan S. Bloomberg, John W. Tukey
Method and article of manufacture for determining whether a scanned image is an original image or fax image

Patent number: 5828771

Abstract: An efficient image processing technique automatically analyzes an image scanned at 300 or greater dpi and measures an image characteristic of the input image from which it is possible to determine whether the image has ever been previously scanned or printed at low resolution at some time in its history. The technique is effective in classifying an image that was at one time embodied in paper form and scanned at a vertical resolution of 100 dpi or less, such as a facsimile document scanned in standard mode, or at 200 pixels/inch (referred to as "fine fax mode"). The technique performs measurements on the pixels included in the vertical or horizontal edges of symbols contained in the input image, and produces a distribution of the measurements. A numerical interpretation of the measurement distribution data is used to classify the image. The invention is computationally efficient because it may be applied to only a small percentage (e.g.

Type: Grant

Filed: December 15, 1995

Date of Patent: October 27, 1998

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
Technique for generating bounding boxes for word spotting in bitmap images

Patent number: 5825919

Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.

Type: Grant

Filed: September 20, 1994

Date of Patent: October 20, 1998

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Lynn D. Wilcox, Francine R. Chen
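
The last sentence of the abstract of patent 5825919 (reduce the image, then close it vertically and horizontally) can be illustrated on a small binary image stored as lists of 0/1 values. The 2×2 OR reduction rule, the structuring-element lengths, and the border handling are assumptions chosen for the sketch; the abstract does not specify them.

```python
def reduce_2x(image):
    """2x reduction: an output pixel is ON if any pixel of its 2x2 block
    is ON (assumed rule; other threshold rules are possible)."""
    return [
        [int(any(image[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)))
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


def _window(i, k, n):
    """Indices covered by a centered window of odd length k, clipped to the row."""
    return range(max(0, i - k // 2), min(n, i + k // 2 + 1))


def close_row(row, k):
    """1-D closing (dilation followed by erosion) with a length-k element."""
    n = len(row)
    dilated = [int(any(row[j] for j in _window(i, k, n))) for i in range(n)]
    return [int(all(dilated[j] for j in _window(i, k, n))) for i in range(n)]


def close_horizontal(image, k):
    """Close every row: fills horizontal gaps shorter than k pixels."""
    return [close_row(row, k) for row in image]


def close_vertical(image, k):
    """Close every column by transposing, closing rows, and transposing back."""
    columns = [list(col) for col in zip(*image)]
    closed = [close_row(col, k) for col in columns]
    return [list(row) for row in zip(*closed)]


# A two-pixel gap between letters is filled; the four-pixel word gap is kept.
line = [[1, 0, 0, 1, 0, 0, 0, 0, 1]]
merged = close_horizontal(line, 3)
```

After closing, the characters of a word merge into one blob while inter-word gaps survive, so the bounding box of each blob approximates a word bounding box.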
Performing document image management tasks using an iconic image having embedded encoded information

Patent number: 5765176

Abstract: Encoded data embedded in an iconic, or reduced size, version of an original text image is decoded and used in a variety of document image management applications to provide input to, or to control the functionality of, an application. The iconic image may be printed in a suitable place (e.g., the margin or other background region) in the original text image so that a text image so annotated will then always carry the embedded data in subsequent copies made from the annotated original. The iconic image may also be used as part of a graphical user interface as a surrogate for the original text image. An encoding operation encodes the data unobtrusively in the form of rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace.

Type: Grant

Filed: September 6, 1996

Date of Patent: June 9, 1998

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
Embedding encoded information in an iconic version of a text image

Patent number: 5761686

Abstract: An encoding operation encodes binary data that is then embedded in an iconic, or size-reduced, version of an original text image, in a position in the iconic image that replaces a text portion in the original text image. The encoding operation produces rectangular blocks that have a foreground color and size dimensions proportional to the iconic image so that when placed in the iconic image in horizontal lines, the blocks appear to a viewer to be representative of the text portion of the original image that they replace. Exemplary encoding operations are described, including operations based on run-length limited encoding. A second message may be encoded in the background color regions that separate the blocks. The message carried by the binary data may be any information suitable for a particular application, and need not be restricted to information about or related to the original image.

Type: Grant

Filed: June 27, 1996

Date of Patent: June 2, 1998

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
Word spotting in bitmap images using text line bounding boxes and hidden Markov models

Patent number: 5745600

Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.

Type: Grant

Filed: November 9, 1994

Date of Patent: April 28, 1998

Assignee: Xerox Corporation

Inventors: Francine R. Chen, Lynn D. Wilcox, Dan S. Bloomberg
Image reduction/enlargement technique

Patent number: 5740285

Abstract: In brief, a method of reducing an M × N input binary image (M rows of N pixels each) by a factor of m vertically and n horizontally includes the steps of performing at least one logical operation between bits in consecutive groups of m adjacent rows to provide a resultant single row for each group of m rows, and performing at least one logical operation between bits in consecutive groups of n adjacent columns to provide a resultant single column for each group of n columns. For certain types of reductions, the resulting reduced image will be the desired output image, while for other types, the resultant image will be one of a required plurality of intermediate images, which are then combined to provide the desired output image.

Type: Grant

Filed: January 29, 1993

Date of Patent: April 14, 1998

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Daniel Davies
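
The row-then-column reduction described in the abstract of patent 5740285 can be sketched as follows. The function name and the restriction to a single operation (OR or AND) applied in both directions are assumptions made to keep the example short. The abstract allows other logical operations and the combination of several intermediate images, which this sketch does not cover.

```python
def reduce_binary(image, m, n, op="or"):
    """Reduce a binary image by m vertically and n horizontally.

    Rows are combined in consecutive groups of m, then columns in
    consecutive groups of n, using OR (`any`) or AND (`all`).
    Trailing rows or columns that do not fill a group are dropped.
    """
    combine = any if op == "or" else all
    width = len(image[0])

    # Step 1: one resultant row per group of m adjacent rows.
    rows = [
        [int(combine(image[r + i][c] for i in range(m))) for c in range(width)]
        for r in range(0, len(image) - m + 1, m)
    ]

    # Step 2: one resultant column per group of n adjacent columns.
    return [
        [int(combine(row[c + j] for j in range(n)))
         for c in range(0, width - n + 1, n)]
        for row in rows
    ]


image = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
reduced_or = reduce_binary(image, 2, 2, "or")    # ON if any pixel in the block is ON
reduced_and = reduce_binary(image, 2, 2, "and")  # ON only if the whole block is ON
```

On packed bitmaps the row step maps onto word-wide logical instructions, which is where a real implementation would gain its speed; the list-based version above only shows the order of operations.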
Method for aligning a text image to a transcription of the image

Patent number: 5689585

Abstract: A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined.

Type: Grant

Filed: April 28, 1995

Date of Patent: November 18, 1997

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Leslie T. Niles, Gary E. Kopec, Philip Andrew Chou
Detection of highlighted regions

Patent number: 5619592

Abstract: A method and apparatus for detection of highlighted regions of a document. A document containing highlighted regions is scanned using a gray scale scanner. Morphology and threshold reduction techniques are used to separate highlighted and non-highlighted portions of the document. Having separated the highlighted and non-highlighted portions, optical character recognition (OCR) techniques can then be used to extract text from the highlighted regions.

Type: Grant

Filed: June 7, 1995

Date of Patent: April 8, 1997

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Henry W. Sang, Jr., Lakshmi Dasari
Mark sensing on a form

Patent number: 5572601

Abstract: A robust technique for determining whether a field (43, 45, 47a-d) on a form (40'), which has been converted to a binary input image, contains a mark utilizes an approach of making an initial determination of the approximate location of the field, and then refining such determination. The form is assumed to have registration marks (fiducials) with the field at a known location relative to the fiducials. The fiducials are identified (50), and the approximate location of the field is determined (55) from the fiducial positions and the known relation between the fiducials and the field. At this point, a portion of the image (referred to as the subimage) is extracted (57). The subimage is typically somewhat larger than the field so that it can be assumed that the field is within the subimage. The field has machine-printed lines along at least part of the field perimeter.

Type: Grant

Filed: October 19, 1994

Date of Patent: November 5, 1996

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
Segmentation of text styles

Patent number: 5570435

Abstract: A method and apparatus for differentiating and extracting handwritten annotations and machine printed text in an image. The method provides for the use of morphological operations, preferably at reduced scale, to eliminate, for example, the handwritten annotations from an image. A separation mask is produced that, for example, covers all the image pixels corresponding to machine printed text, and none of the image pixels corresponding to handwritten or handprinted annotations. The separation mask is used in conjunction with the original image to produce separate handwritten annotations and machine printed text images. The invention also provides a method and apparatus for identifying the location of specialized type styles such as bold and italic. The method erodes a binary image utilizing structuring elements which provide a relatively large number of hits in regions containing the specialized type styles.

Type: Grant

Filed: December 28, 1994

Date of Patent: October 29, 1996

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, M. Margaret Withgott
Method and apparatus for summarizing a document without document image decoding

Patent number: 5491760

Abstract: A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases and graphics in the document image using automatic or interactive morphological image recognition techniques. Document summaries or indices are produced based on the identified significant portions of the document image. The disclosed method is particularly suited to improving reading machines for the blind.

Type: Grant

Filed: May 9, 1994

Date of Patent: February 13, 1996

Assignee: Xerox Corporation

Inventors: M. Margaret Withgott, Steven C. Bagley, Dan S. Bloomberg, Per-Kristian Halvorsen, Daniel P. Huttenlocher, Todd A. Cass, Ronald M. Kaplan, Ramana R. Rao
Hardcopy lossless data storage and communications for electronic document processing systems

Patent number: 5486686

Abstract: Machine readable electronic domain definitions of part or all of the electronic domain descriptions of hardcopy documents and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents are encoded in codes that are printed on such documents, thereby permitting the electronic domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.

Type: Grant

Filed: May 18, 1992

Date of Patent: January 23, 1996

Assignee: Xerox Corporation

Inventors: Frank Zdybel, Jr., Henry W. Sang, Jr., Jan O. Pedersen, Z. E. Smith, III, D. A. Henderson, Jr., David L. Hecht, Dan S. Bloomberg
Identification of a blank page in an image processing system

Patent number: 5467410

Abstract: The present invention provides a robust technique for quickly determining whether a binary input image originated as a blank page. The technique provides reliable sensing in the presence of various image and scanner noise in the input image. In broad terms, the invention contemplates reducing the input image with a low threshold, labeling (by size) connected components (8-connected or 4-connected), and performing a threshold analysis. The threshold analysis typically entails size and numerical thresholds, taking into account the characteristic dimensions of expected types of noise. In specific embodiments, the reduction is performed as a textured reduction wherein the image is divided into tiles, and a single row of pixels in each tile is checked to see whether there are any ON pixels. If there are, the corresponding pixel in the reduced image is ON, otherwise it is OFF. Optional morphological operations are performed to remove expected sources of noise (e.g., pepper noise and thin horizontal lines).

Type: Grant

Filed: March 20, 1992

Date of Patent: November 14, 1995

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
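
The pipeline in the abstract of patent 5467410 (textured reduction, connected-component labeling, threshold analysis) can be sketched on a small binary image. The tile size, the probed row, and the two thresholds below are placeholder values chosen for illustration, and the optional morphological noise removal is omitted.

```python
def textured_reduce(image, tile=4, probe_row=0):
    """One output pixel per tile: ON if the probed row of the tile
    contains any ON pixel, otherwise OFF."""
    reduced = []
    for top in range(0, len(image) - tile + 1, tile):
        row = image[top + probe_row]
        reduced.append([
            int(any(row[left:left + tile]))
            for left in range(0, len(row) - tile + 1, tile)
        ])
    return reduced


def component_sizes(image):
    """Sizes of the 8-connected components, found by iterative flood fill."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    sizes = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            sizes.append(size)
    return sizes


def is_blank(image, tile=4, min_size=2, max_large=0):
    """Blank if at most `max_large` components reach `min_size` pixels
    in the reduced image (assumed size and count thresholds)."""
    reduced = textured_reduce(image, tile)
    large = [s for s in component_sizes(reduced) if s >= min_size]
    return len(large) <= max_large


# An 8x8 page with one noise pixel, and one with two full lines of "text".
noisy = [[0] * 8 for _ in range(8)]
noisy[0][0] = 1
text = [[1] * 8 if r in (0, 4) else [0] * 8 for r in range(8)]
```

The noise pixel produces a single one-pixel component in the reduced image, which falls below the size threshold, while the two text lines merge into one four-pixel component and mark the page as non-blank.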
Detecting function words without converting a scanned document to character codes

Patent number: 5455871

Abstract: A method and apparatus detects function words in a first image of a scanned document without first converting the image to character codes. Function words include determiners, prepositions, articles, and other words that play a largely grammatical role, as opposed to words such as nouns and verbs that convey topic information. Non-content based morphological characteristics of image units are predetermined as well as the presence or omission of character ascenders and descenders in image units. Predetermined characteristics of function word image units are compared with the image units of an image and, when a match occurs, the image unit is identified as a function word. Conversely, when no matching characteristics occur, the image unit is identified as a non-function word. Additionally, image units are classified and identified as containing only upper case characters, only lower case characters, only digits, and mixed character types.

Type: Grant

Filed: May 16, 1994

Date of Patent: October 3, 1995

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, John W. Tukey, M. Margaret Withgott
Word spotting in bitmap images using word bounding boxes and hidden Markov models

Patent number: 5438630

Abstract: Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word without the need for segmentation or OCR, and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs) and arbitrary keyword models are built from the character HMM components. Word or text line bounding boxes are extracted from the image, a set of features based on the word shape (and preferably also the word internal structure) within each bounding box is extracted, this set of features is applied to a network that includes one or more keyword HMMs, and a determination is made. The identification of word bounding boxes for potential keywords includes the steps of reducing the image (say by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations.

Type: Grant

Filed: December 17, 1992

Date of Patent: August 1, 1995

Assignee: Xerox Corporation

Inventors: Francine R. Chen, Lynn D. Wilcox, Dan S. Bloomberg
Use of fast textured reduction for discrimination of document image components

Patent number: 5434953

Abstract: A technique for reducing images that provides useful information about the image and allows fast computation. Using threshold values near the extreme possible values for the convolution window size and using large subsampling tiles nevertheless allows extraction of the information about the typical textures that exist in the document image: text words, text lines, rules, and halftones. In a particular embodiment, 16×16 tiles are used for subsampling, 16×1 and 1×16 windows are used for the convolution, and threshold values of 1 and 16 are used. If the horizontal windows in tiles are aligned with 16-bit boundaries in the computer, the implementation is particularly efficient. For the 16×1 horizontal window, a threshold convolution with T=1 can be done on any of the sixteen 16-bit words in the tile by checking whether the word is zero or non-zero. For a 1×

Type: Grant

Filed: March 20, 1992

Date of Patent: July 18, 1995

Assignee: Xerox Corporation

Inventor: Dan S. Bloomberg
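
The word-aligned case in the abstract of patent 5434953 can be sketched for the 16×1 horizontal window. The image is assumed to be stored as rows of 16-bit integers, one word per tile column, and the function name is hypothetical. The T=1 test (word is non-zero) comes from the abstract. Treating T=16 as "all sixteen bits set" is an inference, and the 1×16 vertical window is not covered because the abstract is truncated at that point.

```python
def textured_reduce_words(rows, threshold=1, probe_row=0):
    """Reduce an image stored as rows of 16-bit words using 16x16 tiles.

    For each tile, one word (row `probe_row` of the tile) is tested
    against the threshold on its count of ON pixels.
    """
    reduced = []
    for top in range(0, len(rows) - 15, 16):
        probe = rows[top + probe_row]
        if threshold == 1:
            # T=1: at least one ON pixel, i.e. the word is non-zero.
            reduced.append([int(word != 0) for word in probe])
        elif threshold == 16:
            # T=16 (assumed meaning): all sixteen pixels are ON.
            reduced.append([int(word == 0xFFFF) for word in probe])
        else:
            # General case: count the ON bits explicitly.
            reduced.append([int(bin(word).count("1") >= threshold) for word in probe])
    return reduced


# One band of 16 rows, three tiles wide: empty, one pixel ON, fully ON.
rows = [[0x0000, 0x0001, 0xFFFF] for _ in range(16)]
low = textured_reduce_words(rows, threshold=1)
high = textured_reduce_words(rows, threshold=16)
```

Both extreme thresholds reduce to a single integer comparison per tile, which is why the aligned case is cheap: no individual pixel is ever unpacked.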
