Segmenting Individual Characters Or Words Patents (Class 382/177)

Compression of digital images of scanned documents

Patent number: 8068684

Abstract: A first aspect of the invention relates to a method for creating a binary mask image from an inputted digital image of a scanned document, comprising the steps of creating a binarized image by binarizing the inputted digital image, detecting first text regions representing light text on a dark background, and inverting the first text regions, such that the inverted first text regions are interpretable in the same way as dark text on a light background. A second aspect of the invention relates to a method for comparing in a binary image a first pixel blob with a second pixel blob to determine whether they represent matching symbols, comprising the steps of detecting a line in one blob not present in the other and/or determining if one of the blobs represents an italicized symbol where the other does not.

Type: Grant

Filed: May 4, 2007

Date of Patent: November 29, 2011

Assignee: I.R.I.S.

Inventors: Michel Dauw, Pierre Demuelenaere
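
The first aspect of the abstract above (binarize, detect light-on-dark text regions, invert them) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the fixed threshold, the "more than half the pixels are ink" test, and the function names are hypothetical and are not taken from the patent.

```python
def binarize(gray, threshold=128):
    """Return a binary image: 1 for dark (ink) pixels, 0 for light pixels."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def is_light_on_dark(binary, top, left, bottom, right):
    """Assumed heuristic: a region holds light text on a dark background
    when more than half of its pixels are ink."""
    total = (bottom - top) * (right - left)
    ink = sum(binary[y][x] for y in range(top, bottom) for x in range(left, right))
    return ink * 2 > total


def invert_region(binary, top, left, bottom, right):
    """Return a copy with pixels flipped inside the half-open box."""
    out = [row[:] for row in binary]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 1 - out[y][x]
    return out


def normalize_regions(binary, regions):
    """Invert every region judged light-on-dark so all text reads as ink."""
    out = binary
    for box in regions:
        if is_light_on_dark(out, *box):
            out = invert_region(out, *box)
    return out


gray = [
    [0, 0, 0, 255, 255, 255],
    [0, 255, 0, 255, 0, 255],
    [0, 0, 0, 255, 255, 255],
]
# Boxes are (top, left, bottom, right); the left box is light-on-dark.
mask = normalize_regions(binarize(gray), [(0, 0, 3, 3), (0, 3, 3, 6)])
```

After normalization both regions carry a single ink pixel at their centre, so downstream steps can treat them the same way.
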
Fine stereoscopic image matching and dedicated instrument having a low stereoscopic coefficient

Patent number: 8064687

Abstract: The invention relates to a method and system for the acquisition and correlation matching of points belonging to a stereoscopic pair of images, whereby the pair is formed by a first image and a second image representing a scene. According to the invention, the two images of the pair are acquired with a single acquisition instrument (30) comprising two CCD sensors (31, 32) in the optical focal plane. The matching of the acquired stereoscopic pair consists in determining, by means of correlation, the point in the second image that is homologous to a point in the first image. Said correlation is performed for a point from the first image using an optimally-sized correlation window. When the homologous point of a point from the first image has been determined, the position deviation between the point from the first image and the homologous point thereof is entered in a table. Once all of the homologous points of the points from the first image have been found, the results table is reset barycentrically.

Type: Grant

Filed: March 3, 2010

Date of Patent: November 22, 2011

Assignee: Centre National d'etudes Spatiales

Inventors: Bernard Rouge, Hélène Vadon, Alain Giros
USER CORRECTION OF ERRORS ARISING IN A TEXTUAL DOCUMENT UNDERGOING OPTICAL CHARACTER RECOGNITION (OCR) PROCESS

Publication number: 20110280481

Abstract: An electronic model of the image document is created by undergoing an OCR process. The electronic model includes elements (e.g., words, text lines, paragraphs, images) of the image document that have been determined by each of a plurality of sequentially executed stages in the OCR process. The electronic model serves as input information which is supplied to each of the stages by a previous stage that processed the image document. A graphical user interface is presented to the user so that the user can provide user input data correcting a mischaracterized item appearing in the document. Based on the user input data, the processing stage which produced the initial error that gave rise to the mischaracterized item corrects the initial error. Stages of the OCR process subsequent to this stage then correct any consequential errors arising in their respective stages as a result of the initial error.

Type: Application

Filed: May 17, 2010

Publication date: November 17, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Bogdan Radakovic, Milan Vugdelija, Nikola Todic, Aleksandar Uzelac, Bodin Dresevic
Character recognition processing system and computer readable medium storing program for character recognition processing

Patent number: 8059896

Abstract: A character recognition processing system includes a character recognition confidence evaluating unit that evaluates whether confidence of character recognition of a plurality of areas are low or high, a character area classification unit that classifies a first area evaluated low by the character recognition confidence evaluating unit into a plurality of components, a character separation unit that separates the components classified by the character area classification unit into a character component and non-character components, according to information relating to a second area evaluated high by the character recognition confidence evaluating unit, and a first character recognition unit that performs character recognition processing for the character component separated by the character separation unit.

Type: Grant

Filed: February 23, 2007

Date of Patent: November 15, 2011

Assignee: Fuji Xerox Co., Ltd.

Inventor: Etsuko Ito
SEGMENTATION OF A WORD BITMAP INTO INDIVIDUAL CHARACTERS OR GLYPHS DURING AN OCR PROCESS

Publication number: 20110274354

Abstract: An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines which divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking a place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is filtered afterwards using various heuristics, in order to preserve those lines that indeed do separate a word's glyphs and minimize the number of those that do not.

Type: Application

Filed: May 10, 2010

Publication date: November 10, 2011

Applicant: MICROSOFT CORPORATION

Inventor: Djordje Nijemcevic
WORD RECOGNITION OF TEXT UNDERGOING AN OCR PROCESS

Publication number: 20110268360

Abstract: A method for identifying words in a textual image undergoing optical character recognition includes receiving a bitmap of an input image which includes textual lines that have been segmented by a plurality of chop lines. The chop lines are each associated with a confidence level reflecting a degree to which the respective chop line properly segments the textual line into individual characters. One or more words are identified in one of the textual lines based at least in part on the textual lines and a first subset of the plurality of chop lines which have a chop line confidence level above a first threshold value. If the first word is not associated with a sufficiently high word confidence level, at least a second word in the textual line is identified based at least in part on a second subset of the plurality of chop lines which have a confidence level above a second threshold value lower than the first threshold value.

Type: Application

Filed: May 3, 2010

Publication date: November 3, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Aleksandar Antonijevic, Ivan Mitic, Mircea Cimpoi, Djordje Nijemcevic
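
The two-threshold fallback described in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: chop lines are modeled as `(x_position, confidence)` pairs, the `recognize` callback stands in for a word recognizer and is hypothetical, and the threshold values are arbitrary.

```python
def select_chops(chops, threshold):
    """Keep the positions of chop lines whose confidence exceeds the threshold."""
    return sorted(pos for pos, conf in chops if conf > threshold)


def split_at(width, positions):
    """Turn chop positions into half-open (start, end) character segments."""
    edges = [0] + list(positions) + [width]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]


def recognize_line(width, chops, recognize,
                   first_threshold=0.8, second_threshold=0.5, min_word_conf=0.7):
    """Try high-confidence chop lines first; if the word confidence is too
    low, retry with the larger set admitted by the lower threshold."""
    word, conf = recognize(split_at(width, select_chops(chops, first_threshold)))
    if conf >= min_word_conf:
        return word, conf
    return recognize(split_at(width, select_chops(chops, second_threshold)))


def toy_recognizer(segments):
    """Hypothetical recognizer: only confident when it sees three segments."""
    return "x" * len(segments), (0.9 if len(segments) == 3 else 0.4)


result = recognize_line(30, [(10, 0.9), (20, 0.6)], toy_recognizer)
```

With the toy recognizer, the first pass uses only the chop line at 10 and is rejected, so the second pass adds the chop line at 20 and succeeds.
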
CHARACTER RECOGNITION

Publication number: 20110249897

Abstract: Systems and methods for character recognition by performing lateral view-based analysis on the character data and generating a feature vector based on the lateral view-based analysis.

Type: Application

Filed: April 8, 2010

Publication date: October 13, 2011

Applicant: UNIVERSITY OF CALCUTTA

Inventors: Nabendu CHAKI, Soharab Hossain Shaikh
Character extracting apparatus, method, and program

Patent number: 8036463

Abstract: The present invention provides a technique of accurately extracting areas of characters included in a captured image. A character extracting device of the present invention extracts each character in an image with compensated pixel values. In more detail, the character extracting device integrates pixel values at each coordinate position in the image along a character extracting direction. Then, the character extracting device predicts the background area in the image based on the integrated pixel value. The compensated pixel values are compensated based on integrated pixel values at the predicted background area from integrated pixel values at each coordinate position.

Type: Grant

Filed: September 13, 2007

Date of Patent: October 11, 2011

Assignee: Keyence Corporation

Inventor: Masato Shimodaira
System and method for automatic segmentation of ASR transcripts

Patent number: 8036464

Abstract: Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.

Type: Grant

Filed: September 7, 2007

Date of Patent: October 11, 2011

Assignee: Satyam Computer Services Limited

Inventors: Varadarajan Sridhar, Mohamed Abdul Karim Sadiq, K. Kalyana Rao
DETECTING POSITION OF WORD BREAKS IN A TEXTUAL LINE IMAGE

Publication number: 20110243445

Abstract: Line segmentation in an OCR process is performed to detect the positions of words within an input textual line image by extracting features from the input to locate breaks and then classifying the breaks into one of two break classes which include inter-word breaks and inter-character breaks. An output including the bounding boxes of the detected words and a probability that a given break belongs to the identified class can then be provided to downstream OCR or other components for post-processing. Advantageously, by reducing line segmentation to the extraction of features, including the position of each break and the number of break features, and break classification, the task of line segmentation is made less complex but with no loss of generality.

Type: Application

Filed: March 30, 2010

Publication date: October 6, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Aleksandar Uzelac, Bodin Dresevic, Sasa Galic, Bogdan Radakovic
Apparatus and method of scanning and/or printing an image

Patent number: 8027054

Abstract: A scanning apparatus and a method thereof include a scanning unit scanning a document and outputting a scanned result, at least one external storage unit detachably attached to the apparatus, at least one internal storage unit, and a controller detecting an attachment state of the external storage unit and storing the scanned result in one of the external storage unit and the internal storage unit according to the attachment state of the external storage unit. The scanning unit of the scanning apparatus is combined with a user scanning unit and a user printing unit into a combination apparatus, and the scanned result is printed in a printing apparatus spaced-apart from the scanning apparatus by a distance, thereby removing cables between the scanning or printing apparatus and a personal computer.

Type: Grant

Filed: September 30, 2003

Date of Patent: September 27, 2011

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hyung-jong Kang, Jung-soo Seo
CHARACTER RECOGNITION PREPROCESSING METHOD AND APPARATUS

Publication number: 20110228124

Abstract: Disclosed is a character recognition preprocessing method and apparatus for correcting a nonlinear character string into a linear character string. A binarized character string region is divided into character regions on a character-by-character basis. Upper and lower feature points of each character region are derived, and an upper boundary line, which is a curve connecting the upper feature points of the character regions, and a lower boundary line, which is a curve connecting the lower feature points of the character regions, are generated by applying cubic spline interpolation. Nonlinearity is corrected through adaptive region enlargement by using the maximum horizontal length and the maximum height of the divided character regions.

Type: Application

Filed: March 21, 2011

Publication date: September 22, 2011

Applicants: Samsung Electronics Co., Ltd., Industry Foundation of Chonnam National University

Inventors: Hee-Bum AHN, Jong-Hyun Park, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee
Method and computer program product for recognition error correction data

Patent number: 8019158

Abstract: A method for altering a recognition error correction data structure, the method includes: altering at least one key out of a set of semantically similar keys in response to text appearance probabilities of keys of the set of semantically similar keys to provide an at least one altered key; and replacing the at least one key by the at least one altered key.

Type: Grant

Filed: January 2, 2008

Date of Patent: September 13, 2011

Assignee: International Business Machines Corporation

Inventors: Ella Barkan, Tal Drory, André Heilper
OCR of books by word recognition

Patent number: 8014604

Abstract: Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.

Type: Grant

Filed: April 16, 2008

Date of Patent: September 6, 2011

Assignee: International Business Machines Corporation

Inventors: Asaf Tzadok, Eugeniusz Walach
System and method for characterizing handwritten or typed words in a document

Patent number: 8014603

Abstract: A method of characterizing a word image includes traversing the word image stepwise with a window to provide a plurality of window images. For each of the plurality of window images, the method includes splitting the window image to provide a plurality of cells. A feature, such as a gradient direction histogram, is extracted from each of the plurality of cells. The word image can then be characterized based on the features extracted from the plurality of window images.

Type: Grant

Filed: August 30, 2007

Date of Patent: September 6, 2011

Assignee: Xerox Corporation

Inventors: José A. Rodriguez Serrano, Florent C. Perronnin
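
The window-and-cell feature extraction described in the abstract above can be sketched in Python. The details are assumptions for illustration only: the window width, step, cell grid, number of histogram bins, and the clamped central-difference gradient are hypothetical choices, not parameters from the patent.

```python
import math


def sliding_windows(image, width, step):
    """Yield fixed-width column slices of a grayscale image (list of rows)."""
    total = len(image[0])
    for x in range(0, total - width + 1, step):
        yield [row[x:x + width] for row in image]


def split_cells(window, rows, cols):
    """Split a window into rows x cols equally sized cells (remainders dropped)."""
    ch, cw = len(window) // rows, len(window[0]) // cols
    return [
        [r[cx * cw:(cx + 1) * cw] for r in window[cy * ch:(cy + 1) * ch]]
        for cy in range(rows) for cx in range(cols)
    ]


def gradient_histogram(cell, bins=4):
    """Histogram of gradient directions, weighted by gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            # Differences with indices clamped at the cell border.
            gx = cell[y][min(x + 1, w - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, h - 1)][x] - cell[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist


def describe_word(image, width=4, step=2, rows=2, cols=2, bins=4):
    """Return one feature vector per window: the concatenated cell histograms."""
    return [
        [v for cell in split_cells(win, rows, cols)
         for v in gradient_histogram(cell, bins)]
        for win in sliding_windows(image, width, step)
    ]


image = [[0, 0, 1, 1, 0, 0, 1, 1] for _ in range(4)]
features = describe_word(image)
```

The resulting sequence of per-window vectors is the kind of representation that a sequence model or a distance measure could then consume.
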
Method and system for detecting and recognizing text in images

Patent number: 8009928

Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.

Type: Grant

Filed: September 19, 2008

Date of Patent: August 30, 2011

Assignee: A9.com, Inc.

Inventors: Raghavan Manmatha, Mark A. Ruzon
Logical structure analyzing apparatus, method, and computer product

Patent number: 8010564

Abstract: A logical structure analyzing apparatus includes an extracting unit that extracts word candidates from a form, a first generating unit that classifies each of the word candidates into a group of heading candidates or a group of data candidates to generate, based on positions of the word candidates on the form, first candidate sets each including one heading candidate and one data candidate identifiable by the heading candidate, and a second generating unit that combines the first candidate sets to generate second candidate sets that each include plural heading candidates that differ and one data candidate. The apparatus also includes a removing unit that, based on positions of the heading candidates and the data word candidate in each second candidate set, removes from among the second candidate sets, a determined set including a data item and headings identifying the data item, and an output unit that outputs the determined set.

Type: Grant

Filed: July 25, 2008

Date of Patent: August 30, 2011

Assignee: Fujitsu Limited

Inventors: Akihiro Minagawa, Yoshinobu Hotta, Yusaku Fujii, Katsuhito Fujimoto
Image forming apparatus

Patent number: 8004731

Abstract: An image forming apparatus is provided which includes: an image acquisition section (110) which reads an original and acquires an original image; a specific-pattern storage section (141) which stores a specific pattern which expresses, using a dot pattern, apparatus identification information for identifying an apparatus that prints the original image on a sheet of recording paper; an extraction section (132) which extracts an actual image area except a blank area in the original image, and based on the extracted actual image area, extracts a specific area corresponding to an area for printing the specific pattern; and a print section (150) which prints the specific pattern within the actual image area, using a yellow toner.

Type: Grant

Filed: February 14, 2008

Date of Patent: August 23, 2011

Assignee: Kyocera Mita Corporation

Inventor: Kunihiko Tanaka
Image processing apparatus and method

Patent number: 8004712

Abstract: It is desired that only necessary document pages be picked up from an enormous quantity of documents and copied by controlling copying operation on the basis of information designated by a user. For this purpose, a plurality of images are input, each image is segmented into objects, and an object as a search key is set. It is then determined, with respect to each of the plurality of images, whether the objects segmented from the image include the object as the search key. Images containing the object as the search key are selectively copied out of the plurality of images.

Type: Grant

Filed: January 31, 2006

Date of Patent: August 23, 2011

Assignee: Canon Kabushiki Kaisha

Inventors: Noboru Hamada, Masakazu Kitora
Method and apparatus for authenticating printed documents using multi-level image comparison based on document characteristics

Patent number: 8000528

Abstract: A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.

Type: Grant

Filed: December 29, 2009

Date of Patent: August 16, 2011

Assignee: Konica Minolta Systems Laboratory, Inc.

Inventors: Wei Ming, Yibin Tian
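
The top-down comparison with early stopping described in the abstract above can be sketched in Python. The data model is an assumption made for illustration: each document is a dictionary mapping a level name to a list of `(x, y, width, height)` boxes produced by some earlier segmentation step, and the pixel tolerance is a hypothetical parameter.

```python
LEVELS = ("block", "line", "word", "character")


def boxes_match(a, b, tol=2):
    """Two boxes match when every coordinate differs by at most tol pixels."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def first_altered_level(original, target, tol=2):
    """Compare level by level, coarsest first, and stop at the first level
    whose unit count, sizes, or locations differ. Return None if all match."""
    for level in LEVELS:
        orig_boxes = original.get(level, [])
        targ_boxes = target.get(level, [])
        if len(orig_boxes) != len(targ_boxes):
            return level
        if not all(boxes_match(a, b, tol) for a, b in zip(orig_boxes, targ_boxes)):
            return level
    return None


original = {
    "block": [(0, 0, 100, 50)],
    "line": [(0, 0, 100, 10), (0, 12, 100, 10)],
}
shifted = {
    "block": [(0, 0, 101, 50)],
    "line": [(0, 0, 100, 10), (0, 30, 100, 10)],
}
altered_at = first_altered_level(original, shifted)
```

Because the loop returns as soon as a level fails, the finer and more expensive comparisons are skipped once an alteration has been found.
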
Non-rigidly coupled, overlapping, non-feedback, optical systems for spatial filtering of fourier transform optical patterns and image shape content characterization

Patent number: 7988297

Abstract: Non-rigidly coupled, overlapping, non-feedback optical systems for spatial filtering of Fourier transform optical patterns and image shape characterization comprises a first optical subsystem that includes a lens for focusing a polarized, coherent beam to a focal point, an image input device that spatially modulates phase positioned between the lens and the focal point, and a spatial filter at the Fourier transform pattern, and a second optical subsystem overlapping the first optical subsystem includes a projection lens and a detector. The second optical subsystem is optically coupled to the first optical subsystem.

Type: Grant

Filed: October 19, 2007

Date of Patent: August 2, 2011

Assignee: Look Dynamics, Inc.

Inventor: Rikk Crill
WORD-BASED DOCUMENT IMAGE COMPRESSION

Publication number: 20110182513

Abstract: Locations of word images corresponding to words in a document image are ascertained. The word images are grouped into clusters. For each of multiple of the clusters, a respective compressed word image cluster is determined based on a joint compression of respective ones of the word images that are grouped into the cluster. The positions of the word images in the document image are associated with the respective ones of the compressed word image clusters corresponding to the clusters respectively containing the word images.

Type: Application

Filed: January 26, 2010

Publication date: July 28, 2011

Inventors: Kave Eshghi, George Forman, Prakash Reddy
Image processing apparatus and image processing method for confirming electronic data character quality, and computer program therefor

Patent number: 7982922

Abstract: According to the present invention, an image processing apparatus comprises a scanning unit that converts an original image into image data; an extraction unit that extracts an area that contains characters of every character size from the image data scanned by the scanning unit; and a display unit that displays images of the area that contains characters extracted by the extraction unit at a plurality of resolutions.

Type: Grant

Filed: August 16, 2005

Date of Patent: July 19, 2011

Assignee: Canon Kabushiki Kaisha

Inventor: Junichi Takano
TIME-SERIES ANALYSIS OF KEYWORDS

Publication number: 20110170777

Abstract: Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. Frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis.

Type: Application

Filed: December 31, 2010

Publication date: July 14, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Takeshi Inagaki
Triggering Actions in Response to Optically or Acoustically Capturing Keywords from a Rendered Document

Publication number: 20110150335

Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.

Type: Application

Filed: February 21, 2011

Publication date: June 23, 2011

Applicant: GOOGLE INC.

Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
Image processing apparatus, image processing method, and storage medium

Patent number: 7949155

Abstract: An image processing apparatus includes an image memory that stores an image; a character recognition rate acquisition unit that segments the image stored in the image memory into a plurality of partial images and acquires a character recognition rate for each partial image; an image quality assessment unit that calculates a parameter showing the image quality of the image based on the character recognition rates of the plural partial images acquired by the character recognition rate acquisition unit; and an output unit that outputs assessment results obtained by the image quality assessment unit.

Type: Grant

Filed: September 12, 2005

Date of Patent: May 24, 2011

Assignee: Fuji Xerox Co., Ltd.

Inventor: Shunichi Kimura
Character string recognition method and device

Patent number: 7949187

Abstract: A character string recognition method for recognizing a character string may include a first step in which a first projection data of image data are calculated in a direction of the character string and a second step in which a position of the character string is detected on the basis of the first projection data. In the first step, the image data are divided into a plurality of segments in the direction of the character string and projection in the segment is calculated. The method may further include a third step in which a second projection data in the segment are calculated on the basis of the position of the character string and a fourth step in which a position where the second projection data exceeds a threshold value is detected as a boundary position of a character, and the threshold value may be changed according to pixel number between both ends of the character string.

Type: Grant

Filed: March 29, 2007

Date of Patent: May 24, 2011

Assignee: NIDEC Sankyo Corporation

Inventor: Hiroshi Nakamura
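
The projection-based boundary detection described in the abstract above can be sketched in Python. The sketch makes several assumptions: it projects ink counts rather than brightness, so a character column is one whose count is above the threshold and a boundary is where that condition changes; the rule that raises the threshold for long strings (`long_string`, `bonus`) is hypothetical and only illustrates the idea of a threshold that depends on the string's pixel extent.

```python
def vertical_projection(binary):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def string_extent(profile):
    """Return (first, last) column containing ink, or None if blank."""
    cols = [x for x, v in enumerate(profile) if v > 0]
    return (cols[0], cols[-1]) if cols else None


def length_dependent_threshold(profile, base=0, long_string=100, bonus=1):
    """Assumed rule: use a higher threshold when the string spans many pixels."""
    extent = string_extent(profile)
    if extent is None:
        return base
    length = extent[1] - extent[0] + 1
    return base + bonus if length >= long_string else base


def char_spans(profile, threshold):
    """Return half-open (start, end) column spans where the profile exceeds
    the threshold; the span edges are the detected character boundaries."""
    spans, start = [], None
    for x, v in enumerate(profile):
        if v > threshold and start is None:
            start = x
        elif v <= threshold and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


binary = [
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1, 1],
]
profile = vertical_projection(binary)
spans = char_spans(profile, length_dependent_threshold(profile))
```

Raising the threshold splits characters that are joined by thin strokes, at the cost of possibly cutting through faint character parts.
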
Computer-Implemented System And Method For Recognizing Patterns In A Digital Image Through Document Image Decomposition

Publication number: 20110116715

Abstract: A computer-implemented system and method for retrieving a digital image through document image decomposition is provided. A stored digital image is retrieved. Generic visual features are extracted. The features are grouped into a primitive layer including word-graphs that each include words and features. The words are grouped into a layout layer including zone hypotheses that each include one or more of the words. Causal dependencies between the word-graphs and the zone hypotheses are expressed through zone models that include a joint probability defining a pair of probabilistic models generated through a learned binary edge classifier. Each pair of probabilistic models is expressed as an optimal set selection problem including a set of cost functions and constraints. The optimal set selection problem is evaluated through a heuristic search of the cost functions and constraints and a non-overlapping optimal set of the zone hypotheses is provided that characterize the stored digital image.

Type: Application

Filed: January 24, 2011

Publication date: May 19, 2011

Applicant: PALO ALTO RESEARCH CENTER INCORPORATED

Inventors: Yizhou Wang, Dashan Gao, Haitham Hindi, Minh Binh Do
Landmark-based form reading with declarative language

Patent number: 7916972

Abstract: A form reader includes a landmarks extractor configured to select textboxes of a converted document as form landmarks based on textual characteristics. A set of positional constraints constrain the form entries relative to the identified form landmarks. A constraints solver selects textboxes of the converted document as form entries by solving the set of positional constraints respective to a set of facts including the selected form landmarks and converted document. In some embodiments, the constraints solver includes a query engine configured to (i) construct a query in a logic programming language setting forth the set of positional constraints and the set of facts and to (ii) input said query to a logic programming language query solving engine and to (iii) receive a response from the query solving engine responsive to the input.

Type: Grant

Filed: July 31, 2006

Date of Patent: March 29, 2011

Assignee: Xerox Corporation

Inventor: Jean-Luc Meunier
Image processing apparatus and method of image processing capable of effective labeling

Patent number: 7912286

Abstract: A method of labeling of image data includes reading the image data sequentially with units of two successive pixels and providing one label to a target unit of two successive pixels in the image data when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit. And an image processing apparatus includes a memory configured to store image data, a processor configured to process the image data with units of two successive pixels and to provide one label to a target unit of two successive pixels when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit and a memory controller arranged between the memory and the processor and configured to control reading and writing the image data.

Type: Grant

Filed: May 10, 2006

Date of Patent: March 22, 2011

Assignee: Ricoh Company, Ltd.

Inventors: Tomoaki Ozaki, Shinichi Yamaura
Image processing device

Patent number: 7903881

Abstract: An image processing device is structured such that an appropriate judgement of an image, at which blurring or disappearance or the like will occur, is possible. When pixels, which form a line image at which there is the possibility that blurring or disappearance will occur at the time of printing by using a printing plate, are extracted, a line image warning function gives notice by displaying a warning message on a monitor of a client terminal. Thereafter, image converting and print setting are carried out such that an extracted line image is clarified. In this way, when a proof is prepared, an image, at which there is the possibility that blurring or disappearance will occur on a printed matter obtained by using a printing plate, is clarified, and appropriate proofing is possible.

Type: Grant

Filed: October 9, 2008

Date of Patent: March 8, 2011

Assignee: Fuji Xerox Co., Ltd.

Inventors: Ryuichi Ishizuka, Mari Kodama, Yasushi Nishide
Triggering actions in response to optically or acoustically capturing keywords from a rendered document

Patent number: 7894670

Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.

Type: Grant

Filed: August 10, 2009

Date of Patent: February 22, 2011

Assignee: Exbiblio B.V.

Inventors: Martin Towle King, Dale L. Grover, Clifford A. Kushler, James Quentin Stafford-Fraser
Distortion correction of a scanned image

Patent number: 7873216

Abstract: Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. In embodiments, the image is segmented into foreground and background pixels. Foreground pixels may be grouped into “letters.” Using index-based searching, “letters” may be grouped into “words” and “words” may be grouped into baselines. One or more dominant baselines may be selected and the characteristics of the dominant baseline or baselines may be used to unwarp the image.

Type: Grant

Filed: February 27, 2007

Date of Patent: January 18, 2011

Assignee: Seiko Epson Corporation

Inventors: Ali Zandifar, Anoop K. Bhattacharjya
SYSTEM AND METHOD FOR CLASSIFYING CONNECTED GROUPS OF FOREGROUND PIXELS IN SCANNED DOCUMENT IMAGES ACCORDING TO THE TYPE OF MARKING

Publication number: 20110007366

Abstract: Methods and systems for classifying markings on images in a document are undertaken according to marking types. The document containing the images is supplied to a segmenter which breaks the images into fragments of foreground pixel structures that are identified as being likely to be of the same marking type by finding connected components, extracting near-horizontal or -vertical rule lines and subdividing some connected components to obtain the fragments. The fragments are then supplied to a classifier, where the classifier provides a category score for each fragment, wherein the classifier is trained from the groundtruth images whose pixels are labeled according to known marking types. Thereafter, a same label is assigned to all pixels in a particular fragment, when the fragment is classified by the classifier.

Type: Application

Filed: July 10, 2009

Publication date: January 13, 2011

Applicant: Palo Alto Research Center Incorporated

Inventors: Prateek Sarkar, Eric Saund
METHOD OF SCANNING

Publication number: 20100321714

Abstract: A computer-implemented method of scanning a document (e.g. a newspaper or a book) is provided where the text may be legally protected from unauthorized copying, comprising the steps of: acquiring to a memory at least one recording confined to a field that covers a delimited area of a document; processing the at least one recording to perform character recognition; when a character is recognized, registering it in a memory, and performing the above steps repeatedly while recording at shifted positions so as to progressively obtain a string of characters; and evaluating the string against a predefined condition; if the condition is not satisfied, determining whether to clear from the memory at least a portion of the at least one recording; if the condition is satisfied, providing an output and clearing from the memory at least a portion of the string and at least a portion of the at least one recording.

Type: Application

Filed: March 5, 2009

Publication date: December 23, 2010

Applicant: Jala ApS

Inventors: Lars Stig Nielsen, Jacob Meibom
Image processing apparatus, image processing method, computer program

Patent number: 7848572

Abstract: An image processing method according to the present invention includes extracting from a document image an area to be determined, calculating the number of closed loops within the extracted area, and making a determination based on the calculated number of closed loops, whether the area is a character area. This invention makes it possible to determine with a high accuracy whether an area to be determined is a character area.

Type: Grant

Filed: April 19, 2006

Date of Patent: December 7, 2010

Assignee: Canon Kabushiki Kaisha

Inventor: Reiji Misawa
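
Counting closed loops, as the abstract above describes, can be sketched in Python. The sketch assumes the extracted area is a non-empty binary grid (1 = foreground, 0 = background) and treats a closed loop as a background region fully enclosed by foreground pixels. The function name `count_closed_loops` is hypothetical, and the rule that turns the count into a character/non-character decision is not given in the abstract, so it is left out.

```python
from collections import deque
from typing import List


def count_closed_loops(area: List[List[int]]) -> int:
    """Count background regions fully enclosed by foreground (1) pixels."""
    rows, cols = len(area), len(area[0])
    # Pad with a one-pixel background frame so that all background
    # outside the shapes forms a single connected region.
    grid = [[0] * (cols + 2)]
    grid += [[0] + list(row) + [0] for row in area]
    grid += [[0] * (cols + 2)]
    seen = [[False] * (cols + 2) for _ in range(rows + 2)]
    regions = 0
    for r in range(rows + 2):
        for c in range(cols + 2):
            if grid[r][c] == 0 and not seen[r][c]:
                regions += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                # Breadth-first flood fill over 4-connected background.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows + 2 and 0 <= nx < cols + 2
                                and grid[ny][nx] == 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    # Every background region except the outside one is an enclosed loop.
    return regions - 1
```

A ring of foreground pixels around a single background pixel (a crude letter "O") yields a count of 1, while a shape with no enclosed background yields 0.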
METHOD AND SYSTEM FOR PROCESSING TEXT

Publication number: 20100278427

Abstract: The present invention provides a method and system for text processing. The method comprises determining at least a part of characters in a text; dividing the text into a plurality of text segments by using the at least a part of characters as separators; and decoding the plurality of text segments respectively.

Type: Application

Filed: April 29, 2010

Publication date: November 4, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: BIN LI, LI QUN PANG, ZHI QIANG SHA, ZHI BO ZUO
APPARATUS, METHOD AND PROGRAM FOR TEXT SEGMENTATION

Publication number: 20100278428

Abstract: There is provided an apparatus including a model based topic segmentation section that segments a text using a topic model representing semantic coherence, a parameter estimation section that estimates a control parameter used in segmenting the text based on detection of a change point of word distribution in the text, using the result of segmentation by the model based topic segmentation unit as training data, and a change point detection topic segmentation section that segments the text, based on detection of the change point of word distribution in the text, using the parameter estimated by the parameter estimation section (FIG. 1).

Type: Application

Filed: December 25, 2008

Publication date: November 4, 2010

Inventors: Makoto Terao, Takafumi Koshinaka
Image processing method, image processing program, and storage medium with a prescribed data format to delete information not desired

Patent number: 7813550

Abstract: The object of this invention is to reduce the effort of deleting an information symbol from a read image. To accomplish this, an image of a document with an information symbol is read (S100), and the information symbol is identified in the read image (S130). The identified information symbol is decoded (S150), and it is determined on the basis of the decoding result whether the data format of the information symbol is a desired one (S160). On the basis of the determination, if the data format is the desired one, the information symbol is deleted from the read image (S170).

Type: Grant

Filed: August 24, 2006

Date of Patent: October 12, 2010

Assignee: Canon Kabushiki Kaisha

Inventor: Yasuo Komada
Optimizing typographical content for transmission and display

Patent number: 7810026

Abstract: A method for optimizing a source document comprising a plurality of pages of content, comprising each of the following, is presented. A source document is obtained. An optimized document is created corresponding to the source document. Thereafter, for each page in the source document, the following are applied. A page record is created for the page. Each page record comprises a word table comprising a list of the page's words in the order that they appear in the page's content. Each page record further comprises a paragraph entry list for the page including a paragraph entry for each paragraph in the page. Each paragraph entry includes a reference to the first and last word of that paragraph in the word table. The page record is compressed using a compression technique. Thereafter, the compressed page record is stored in the optimized document.

Type: Grant

Filed: September 29, 2006

Date of Patent: October 5, 2010

Assignee: Amazon Technologies, Inc.

Inventors: Joshua Shagam, Robert L Goodwin
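
The page-record structure described in the abstract above (a word table plus paragraph entries that reference the first and last word of each paragraph) can be sketched in Python. This is a minimal illustration under stated assumptions: paragraphs are separated by blank lines, words are whitespace-delimited, the record is serialized as JSON, and `zlib` stands in for the unspecified compression technique. All function and field names are hypothetical.

```python
import json
import zlib
from typing import Dict, List


def build_page_record(page_text: str) -> Dict[str, list]:
    """Build a word table and paragraph entries that index into it."""
    words: List[str] = []
    paragraphs: List[Dict[str, int]] = []
    for paragraph in page_text.split("\n\n"):
        tokens = paragraph.split()
        if not tokens:
            continue  # skip empty paragraphs
        first = len(words)
        words.extend(tokens)
        # Each entry references its first and last word in the word table.
        paragraphs.append({"first": first, "last": len(words) - 1})
    return {"words": words, "paragraphs": paragraphs}


def compress_page_record(record: Dict[str, list]) -> bytes:
    """Serialize the record as JSON and compress it (zlib is an assumption)."""
    return zlib.compress(json.dumps(record).encode("utf-8"))


def optimize_document(pages: List[str]) -> List[bytes]:
    """Create one compressed page record per page of the source document."""
    return [compress_page_record(build_page_record(page)) for page in pages]
```

Storing paragraph boundaries as indices into the word table means the words are kept only once, in reading order, and a reader can reflow any paragraph by slicing `words[first:last + 1]`.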
Image processing apparatus, image processing method and computer program

Patent number: 7805022

Abstract: The present invention allows a thumbnail display representing the outline of input images in a digital image printer to be made, in which it is determined whether an image is a first kind of image or a second kind of image, and if it is determined that the image is the first kind of image, a feature part of the first kind of image is enlarged in the thumbnail display to make the contents of image more understandable. Also, the invention allows a thumbnail display representing the outline of input images in a digital image printer to be made, in which it is determined whether an image is a character image or a gradation image, and if it is determined that the image is the character image, a part of the character image is enlarged in the thumbnail display to make the characters more understandable.

Type: Grant

Filed: August 24, 2004

Date of Patent: September 28, 2010

Assignee: Canon Kabushiki Kaisha

Inventor: Mamoru Tanaka
Image forming system having reprint function

Patent number: 7796281

Abstract: In an image forming system in which a printing device is communicably connected to a server and a terminal, an automatic determination is performed to determine whether or not print data created by the terminal needs to be stored in a memory for placing the data in a reprintable condition. The print data stored in the memory can be reprinted without need for resending the same print data from the terminal or server to the printing device. The automatic determination is, for example, performed by referring to the header of the print data and determining whether the print data is from the terminal or the server.

Type: Grant

Filed: January 21, 2005

Date of Patent: September 14, 2010

Assignee: Brother Kogyo Kabushiki Kaisha

Inventor: Toru Tsuzuki
System and Methods for Automatically Accessing a Web Site on Behalf of a Client

Publication number: 20100215270

Abstract: A system for performing an automated network-based login procedure on an interactive keypad image includes a software agent executable from a digital medium connected to the network for navigating to a login page, accessing the keypad image, and performing an automated login, and an automated login support application executable from the same or a different digital medium connected to the network, the support application including at least an image processor, an optical character recognizer, and an image data encoder and decoder. The software agent performs a login at the virtual keypad image based on character image matching and location information acquisition for each character of a client's specific set of credential characters included in the image of the keypad.

Type: Application

Filed: February 26, 2009

Publication date: August 26, 2010

Inventors: Pradheesh Manohar, Prashant Nalwaya, Prashant Kumar Agrawal
DEVICES AND METHODS FOR RESTORING LOW-RESOLUTION TEXT IMAGES

Publication number: 20100208996

Abstract: A system that extracts text from an image includes a capture device that captures the image having a low resolution. An image segmentation subsystem partitions the image into image segments. An image restoration subsystem generates a resolution-expanded image from the image segments and negates degradation effects of the low-resolution image by transforming the image segments from a first domain to a second domain and deconvolving the transformed image segments in the second domain to determine parameters of the low-resolution image. A text recognition subsystem transforms the restored image data into computer readable text data based on the determined parameters.

Type: Application

Filed: October 6, 2008

Publication date: August 19, 2010

Applicant: TUFTS UNIVERSITY

Inventors: Joseph P. Noonan, Prabahan Basu
Classifying an Input Character

Publication number: 20100189352

Abstract: A method for classifying an input character is disclosed. Character models are used. Each character model is associated with an output character and defines a model specific segmentation scheme for that output character and an associated segment model. The model specific segmentation scheme defines a minimum length corresponding to a number of points in a stroke of the output character and a minimum length threshold. Using each of the character models, the input character is decomposed into segments and the segments are evaluated against the segment model of the respective character model to produce a score indicative of the conformity of the segments with the segment model. The character model that produced the highest score is selected and the input character is classified as the output character associated with the character model that produces the highest score.

Type: Application

Filed: March 30, 2010

Publication date: July 29, 2010

Inventor: Jonathon Leigh Napper
Data segmentation algorithm using a decrementing sliding walker chunking approach

Patent number: 7765170

Abstract: A method for segmenting a data set is disclosed. The method consists of setting a maximum walker size and setting a walker size. Then, a first segment of data from the data set is obtained, wherein the first segment of data is the size of the walker. Then, a second segment of data from the data set is obtained, wherein the second segment of data is not greater than the maximum walker size.

Type: Grant

Filed: July 11, 2006

Date of Patent: July 27, 2010

Assignee: Samsung Electronics Co., Ltd.

Inventor: Michael David Hall
TRIGGERING ACTIONS IN RESPONSE TO OPTICALLY OR ACOUSTICALLY CAPTURING KEYWORDS FROM A RENDERED DOCUMENT

Publication number: 20100177964

Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.

Type: Application

Filed: August 10, 2009

Publication date: July 15, 2010

Applicant: Exbiblio B.V.

Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
Method and apparatus for identifying and/or removing combs from scanned images

Patent number: 7756340

Abstract: Methods and apparatus for detecting the presence of combs, determining their shape and removing the combs from a scanned form in an automated manner are described. Horizontal and vertical line feature analysis is combined with knowledge of the usual size, shape, and spacing characteristics of lines which form a comb. Vertical and horizontal lines failing to meet certain characteristics, e.g., size or shape characteristics, are eliminated from consideration. Vertical lines which do not intersect a horizontal line are also eliminated from consideration. Confidence measures for different possible comb shapes are generated and the most probable comb shapes as indicated by the confidence measures are included in a comb list. The comb list may be output for use in further processing, e.g., comb removal and/or data extraction processing.

Type: Grant

Filed: July 11, 2006

Date of Patent: July 13, 2010

Assignee: Pegasus Imaging Corporation

Inventor: M. Scot Alexander
Automatic colorization of monochromatic printed documents

Patent number: 7751087

Abstract: Embodiments herein include a method of adding color to a monochrome (single color printing) document that begins by inputting/creating colorization rules relating to the previously printed monochromatic document and scanning the previously printed monochromatic document to locate rasterized data. After the scanning, the method performs optical character recognition on the rasterized data to search for text corresponding to the previously printed monochromatic document. After the rules are input and the rasterized data is produced, the method automatically colorizes portions of rasterized content according to the colorization rules and this generates a colorized electronic document.

Type: Grant

Filed: April 3, 2007

Date of Patent: July 6, 2010

Assignee: Xerox Corporation

Inventors: Javier A. Morales, Arlene Buck, Michael E. Farrell
Method and Apparatus for Removing Noise from a Digital Image

Publication number: 20100166307

Abstract: One embodiment of the present invention provides a system that removes noise from an image. During operation, the system first identifies blobs in the image, wherein a blob is a set of contiguous pixels which possibly represents a character or a portion of a character in the image. Next, the system analyzes the blobs to dynamically determine a “noise threshold” for the blobs. The system then removes blobs from the image which are below the noise threshold.

Type: Application

Filed: December 28, 2009

Publication date: July 1, 2010

Inventor: Dennis G. Nicholson
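
The blob-based noise removal in the abstract above can be sketched in Python. The abstract does not say how the noise threshold is derived, so this sketch assumes one plausible rule: a blob is noise if its pixel count is below a fraction of the median blob size. That rule, the 8-connectivity choice, and the names `find_blobs`, `remove_noise`, and `fraction` are assumptions for illustration.

```python
from collections import deque
from statistics import median
from typing import List, Tuple

Pixel = Tuple[int, int]


def find_blobs(image: List[List[int]]) -> List[List[Pixel]]:
    """Group foreground (1) pixels into 8-connected blobs."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                blob, queue = [], deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] == 1
                                    and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                blobs.append(blob)
    return blobs


def remove_noise(image: List[List[int]], fraction: float = 0.5) -> List[List[int]]:
    """Return a copy of the image with blobs below the dynamic threshold cleared."""
    blobs = find_blobs(image)
    cleaned = [row[:] for row in image]
    if not blobs:
        return cleaned
    # Assumed rule: threshold is a fraction of the median blob size.
    threshold = fraction * median(len(blob) for blob in blobs)
    for blob in blobs:
        if len(blob) < threshold:
            for y, x in blob:
                cleaned[y][x] = 0
    return cleaned
```

Because the threshold is computed from the blobs found in each image, a page of large print and a page of small print get different cut-offs, which is the point of determining the threshold dynamically rather than fixing it in advance.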
