Patents by Inventor Gary E. Kopec

Gary E. Kopec has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Automatic training of layout parameters in a 2D image model

Patent number: 6687404

Abstract: A two-dimensional (2D) image model models the layout structure of a class of document images as an image grammar and includes production rules having explicit layout parameters as data items that indicate information about the spatial relationships among image constituents occurring in images included in the class. The parameters are explicitly represented in the grammar rules in a manner that permits them to be automatically trained by a training operation that makes use of sample document images from the class of modeled documents. After each sample image is aligned with the 2D grammar, document-specific measurements about the spatial relationships between image constituents are taken from the image. Optimal values for the layout parameters are then computed from the measurement data collected from all samples.
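As a rough illustration of the final step described above, the sketch below pools one kind of measurement across all aligned samples and computes a single parameter value from it. The abstract does not say which optimality criterion is used; the squared-error criterion (whose minimizer for one scalar parameter is the sample mean), the function name, and the sample numbers are all assumptions made for illustration.

```python
def train_layout_parameter(measurements_per_sample):
    """Pool measurements from every sample image and return the value that
    minimizes the total squared error (for a scalar, the mean)."""
    pooled = [m for sample in measurements_per_sample for m in sample]
    if not pooled:
        raise ValueError("no measurements collected")
    return sum(pooled) / len(pooled)


# Hypothetical gap measurements (in pixels) between two image constituents,
# taken from three sample images after alignment with the grammar.
samples = [[12.0, 11.5], [12.5], [12.0, 12.0, 11.0]]
gap = train_layout_parameter(samples)
```

A different criterion (for example, a median for robustness to outliers) would change only the last line of the function.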

Type: Grant

Filed: June 20, 1997

Date of Patent: February 3, 2004

Assignee: Xerox Corporation

Inventors: Jesse Hull, Philip A. Chou, Gary E. Kopec, Dennis S. Arnon

Unsupervised training of character templates using unsegmented samples

Patent number: 5956419

Abstract: A method for operating a machine to perform unsupervised training of a set of character templates uses as the source of training samples an image source of character images, called glyphs, that need not be manually or automatically segmented or isolated prior to training. A recognition operation performed on the image source of character images produces a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an estimated image location of the glyph in the image source with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data and the image source are then used to determine sample image regions in the image source; each sample image region is large enough to contain at least a single glyph but need not be restricted in size to only contain a single glyph.

Type: Grant

Filed: April 28, 1995

Date of Patent: September 21, 1999

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip Andrew Chou

Method and system for automatic transcription correction

Patent number: 5883986

Abstract: A method and system for automatically modifying an original transcription produced as the output of a recognition operation produces a second, modified transcription, such as, for example, automatically correcting an errorful transcription produced by an OCR operation. The invention uses information in an input text image of character images and in an original transcription associated with the input text image to modify aspects of a formal image source model that models as a grammar the spatial image structure of a set of text images. A recognition operation is then performed on the input text image using the modified formal image source model to produce a second, modified transcription. When the original transcription is errorful, the second transcription is a corrected transcription. Several aspects of the formal image source model may be modified; in particular, character templates to be used in the recognition operation are trained in the font of the glyphs occurring in the input text image.

Type: Grant

Filed: June 2, 1995

Date of Patent: March 16, 1999

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip A. Chou, Leslie T. Niles

Method of producing character templates using unsegmented samples

Patent number: 5706364

Abstract: A method for producing, or training, a set of character templates uses as the source of training samples an image source of character images, called glyphs, that are not previously segmented or isolated for training. Also used is a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an image location of the glyph with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data is used to identify a collection of glyph sample image regions in the image source for each character in the character set; each glyph sample image region is large enough to contain a glyph and typically contains adjacent glyphs for other characters.

Type: Grant

Filed: April 28, 1995

Date of Patent: January 6, 1998

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip Andrew Chou

Automatic training of character templates using a transcription and a two-dimensional image source model

Patent number: 5689620

Abstract: A technique for automatically training a set of character templates using unsegmented training samples uses as input a two-dimensional (2D) image of characters, called glyphs, as the source of training samples, a transcription associated with the 2D image as a source of labels for the glyph samples, and an explicit, formal 2D image source model that models as a grammar the structural and functional features of a set of 2D images that may be used as the source of training data. The input transcription may be a literal transcription associated with the 2D input image, or it may be nonliteral, for example containing logical structure tags for document formatting, such as found in markup languages. The technique uses spatial positioning information about the 2D image modeled by the 2D image source model and uses labels in the transcription to determine labeled glyph positions in the 2D image that identify locations of glyph samples.

Type: Grant

Filed: April 28, 1995

Date of Patent: November 18, 1997

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip Andrew Chou, Leslie T. Niles

Method for aligning a text image to a transcription of the image

Patent number: 5689585

Abstract: A method for establishing a relationship between a text image and a transcription associated with the text image uses conventional image processing techniques to identify one or more geometric attributes, or image parameters, of each of a sequence of regions of the text image. The transcription labels in the transcription are analyzed to determine a comparable set of parameters in transcription label sequence. A matching operation then matches the respective parameters of the two sequences to identify image regions that match with transcription regions. The result is an output data structure that minimally identifies image locations of interest to a subsequent operation that processes the text image. The output data structure may also pair each of the image locations of interest to a transcription location, in effect producing a set of labeled image locations. In one embodiment, the sequence of locations of words and their observed lengths in the text image are determined.
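The word-length embodiment can be pictured with a small sequence alignment. The sketch below compares observed word widths from the image with widths predicted from the transcription words and pairs them by dynamic programming. The abstract does not specify the matching operation, so the dynamic-programming formulation, the relative-mismatch cost, the `gap_cost` penalty, and the fixed `px_per_char` scale are all assumptions.

```python
def align_words(image_widths, transcript_words, px_per_char, gap_cost=1.0):
    """Pair image word regions with transcription words by comparing lengths.
    Returns (image_index, transcript_index) pairs for the matched items."""
    n, m = len(image_widths), len(transcript_words)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Predicted width of the transcription word, in pixels.
                expected = max(len(transcript_words[j - 1]), 1) * px_per_char
                mismatch = abs(image_widths[i - 1] - expected) / expected
                if cost[i - 1][j - 1] + mismatch < cost[i][j]:
                    cost[i][j] = cost[i - 1][j - 1] + mismatch
                    back[i][j] = "match"
            if i > 0 and cost[i - 1][j] + gap_cost < cost[i][j]:
                cost[i][j] = cost[i - 1][j] + gap_cost
                back[i][j] = "skip_image"  # image region with no label
            if j > 0 and cost[i][j - 1] + gap_cost < cost[i][j]:
                cost[i][j] = cost[i][j - 1] + gap_cost
                back[i][j] = "skip_text"   # label with no image region
    # Trace the cheapest path back from the end of both sequences.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_image":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Hypothetical data: four image regions (the third is a small noise speck)
# and a three-word transcription, assuming about 10 pixels per character.
widths = [31, 79, 9, 52]
words = ["the", "sidebear", "model"]
pairs = align_words(widths, words, px_per_char=10)
```

Each returned pair is a labeled image location in the sense of the abstract: an image region index tied to a transcription position. Regions that match nothing, such as the speck, are simply left unlabeled.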

Type: Grant

Filed: April 28, 1995

Date of Patent: November 18, 1997

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Leslie T. Niles, Gary E. Kopec, Philip Andrew Chou

Automatic training of character templates using a text line image, a text line transcription and a line image source model

Patent number: 5594809

Abstract: A technique for automatically producing, or training, a set of bitmapped character templates defined according to the sidebearing model of character image positioning uses as input a text line image of unsegmented characters, called glyphs, as the source of training samples. The training process also uses a transcription associated with the text line image, and an explicit, grammar-based text line image source model that describes the structural and functional features of a set of possible text line images that may be used as the source of training samples. The transcription may be a literal transcription of the line image, or it may be nonliteral, for example containing logical structure tags for document formatting and layout, such as found in markup languages.

Type: Grant

Filed: April 28, 1995

Date of Patent: January 14, 1997

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip A. Chou, Leslie T. Niles

Editing text in an image

Patent number: 5548700

Abstract: Character level text editing is performed on an image without recognizing characters, by operating on a character-size array obtained from a two-dimensional array defining an image region. A processor, in response to a request for a text editing operation, accesses an edit data structure that includes the image region array and performs the operation. The character-size array is obtained by dividing the image region array when necessary. An image region array that includes more than one line is divided along interline spaces. An image region array that includes one line is divided along intercharacter spaces. Character-size arrays are divided out of larger arrays by finding connected component bounding boxes, and then determining from the bounding boxes whether the connected components are likely to form a character. If so, the connected components are used to obtain the character-size array and spatial data about position, size, and shape of the character.
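The two division steps (along interline spaces, then along intercharacter spaces) can be sketched on a small binary array. The code below uses row and column projection profiles to find the blank gaps. That is a simpler substitute for the connected-component bounding-box analysis the abstract mentions, and the function names and the toy image are assumptions.

```python
def runs_of_ink(profile):
    """Return (start, end) ranges, end exclusive, where the profile is nonzero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def split_lines(image):
    """Divide a multi-line region along blank rows (interline spaces)."""
    row_profile = [sum(row) for row in image]
    return [image[a:b] for a, b in runs_of_ink(row_profile)]


def split_characters(line):
    """Divide a one-line region along blank columns (intercharacter spaces)."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[a:b] for row in line] for a, b in runs_of_ink(col_profile)]


# Toy image region: '#' is an ON pixel, '.' is background.
rows = [
    "##.#..",
    "#..#..",
    "......",
    ".##..#",
    ".##..#",
]
image = [[1 if ch == "#" else 0 for ch in r] for r in rows]
lines = split_lines(image)
chars = [split_characters(line) for line in lines]
```

Projection profiles fail when adjacent characters touch or overlap horizontally, which is one reason a bounding-box test on connected components is the more general approach.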

Type: Grant

Filed: March 29, 1993

Date of Patent: August 20, 1996

Assignee: Xerox Corporation

Inventors: Steven C. Bagley, Gary E. Kopec

Document image decoding using modified branch-and-bound methods

Patent number: 5526444

Abstract: An image decoding and recognition system and method comprising a fast heuristic algorithm using hidden Markov models (HMM). The new search algorithm, called an "iterative complete path" (ICP) algorithm, patterned after well-known branch-and-bound (B&B) methods, significantly reduces the complexity and improves the speed of HMM image decoding without sacrificing the optimality of the straightforward procedure. An advantageous form of the heuristic functions which is useful in applying the ICP algorithm to text-like images is described. The ICP algorithm is directly applicable to the separable type of finite-state source models. Also disclosed is a technique for transforming more general source models into such a separable form.

Type: Grant

Filed: May 7, 1993

Date of Patent: June 11, 1996

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Anthony C. Kam, Philip A. Chou

Method and apparatus for identification of document skew

Patent number: 5355420

Abstract: A method and apparatus for identifying and correcting for document skew. Lines of a bitmap are scanned and a variance in the number of ON pixels as a function of skew angle is calculated. Skew of the original document occurs when the variance is a maximum.

Type: Grant

Filed: October 19, 1992

Date of Patent: October 11, 1994

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Gary E. Kopec

Image recognition method using finite state networks

Patent number: 5321773

Abstract: An image recognition system, in particular for document image recognition, using an imaging model employing a 2-dimensional finite state automaton corresponding to a regular string grammar. This approach is not only less computationally intensive than previous grammar-based approaches to document image recognition, but also can handle a wider variety of image types. Features of the imaging model include a sidebearing model of glyph positioning, an image decoder based on linear scheduling theory for regular iterative algorithms, the combining of overlapping image sub-regions, and a least-squares estimation procedure for measuring character parameters from character samples in the image.

Type: Grant

Filed: December 10, 1991

Date of Patent: June 14, 1994

Assignee: Xerox Corporation

Inventors: Gary E. Kopec, Philip A. Chou

Method and apparatus for identification and correction of document skew

Patent number: 5187753

Abstract: A method and apparatus for identifying and correcting for document skew. Lines of a bitmap are scanned and a variance in the number of ON pixels as a function of skew angle is calculated. Skew of the original document occurs when the variance is a maximum. Once the skew has been identified, the document is deskewed accordingly.
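The identification step can be sketched in a few lines: for each candidate angle, count ON pixels along scan lines tilted at that angle, then keep the angle whose counts have the largest variance. The sketch below covers identification only. The candidate grid, the sign convention (a line drifts by tan(angle) rows per column), the fixed bin layout, and the synthetic test image are assumptions, not details taken from the patent.

```python
import math


def projection_variance(on_pixels, height, width, angle_deg):
    """Variance of ON-pixel counts per scan line, with scan lines tilted
    by angle_deg. Valid for angles strictly between -45 and 45 degrees."""
    slope = math.tan(math.radians(angle_deg))
    offset = width  # keeps every bin index non-negative
    counts = [0] * (height + 2 * width + 1)
    for x, y in on_pixels:
        counts[int(round(y - x * slope)) + offset] += 1
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


def estimate_skew(on_pixels, height, width, candidates):
    """Return the candidate angle that maximizes the count variance."""
    return max(
        candidates,
        key=lambda a: projection_variance(on_pixels, height, width, a),
    )


# Synthetic bitmap: four one-pixel-thick "text lines" skewed by 2 degrees.
width, height, true_angle = 200, 100, 2.0
true_slope = math.tan(math.radians(true_angle))
pixels = [
    (x, int(round(y0 + x * true_slope)))
    for y0 in (20, 40, 60, 80)
    for x in range(width)
]
candidates = [i * 0.5 for i in range(-10, 11)]  # -5.0 to 5.0 degrees
skew = estimate_skew(pixels, height, width, candidates)
```

When the scan lines are parallel to the text, each text line falls into very few bins, so the counts alternate between large values and zeros and the variance peaks. At other angles the ink is smeared across many bins and the variance drops.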

Type: Grant

Filed: December 8, 1989

Date of Patent: February 16, 1993

Assignee: Xerox Corporation

Inventors: Dan S. Bloomberg, Gary E. Kopec