Segmenting Individual Characters Or Words Patents (Class 382/177)
  • Patent number: 8068684
    Abstract: A first aspect of the invention relates to a method for creating a binary mask image from an inputted digital image of a scanned document, comprising the steps of creating a binarized image by binarizing the inputted digital image, detecting first text regions representing light text on a dark background, and inverting the first text regions, such that the inverted first text regions are interpretable in the same way as dark text on a light background. A second aspect of the invention relates to a method for comparing in a binary image a first pixel blob with a second pixel blob to determine whether they represent matching symbols, comprising the steps of detecting a line in one blob not present in the other and/or determining if one of the blobs represents an italicized symbol where the other does not.
    Type: Grant
    Filed: May 4, 2007
    Date of Patent: November 29, 2011
    Assignee: I.R.I.S.
    Inventors: Michel Dauw, Pierre Demuelenaere
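The first aspect of the abstract above (binarize, detect light-on-dark text regions, invert them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fixed threshold, the `Box` layout, the function names, and in particular the "mostly dark pixels" test used to detect a light-on-dark region. The patent's actual detection method is not described in the abstract.

```python
from typing import List, Tuple

# Hypothetical box layout: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Return a mask where 1 marks a dark pixel and 0 a light pixel."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def is_light_on_dark(mask: List[List[int]], box: Box,
                     min_dark_fraction: float = 0.5) -> bool:
    """Assumed heuristic: a region that is mostly dark holds light text."""
    top, left, bottom, right = box
    area = (bottom - top) * (right - left)
    dark = sum(mask[y][x] for y in range(top, bottom) for x in range(left, right))
    return area > 0 and dark / area > min_dark_fraction


def invert_light_text_regions(mask: List[List[int]],
                              boxes: List[Box]) -> List[List[int]]:
    """Invert every light-on-dark region so its text reads as dark-on-light."""
    out = [row[:] for row in mask]
    for box in boxes:
        if is_light_on_dark(mask, box):
            top, left, bottom, right = box
            for y in range(top, bottom):
                for x in range(left, right):
                    out[y][x] = 1 - mask[y][x]
    return out
```

After inversion, every text region in the mask uses 1 for text pixels, so a downstream step can treat all regions the same way.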
  • Patent number: 8064687
    Abstract: The invention relates to a method and system for the acquisition and correlation matching of points belonging to a stereoscopic pair of images, whereby the pair is formed by a first image and a second image representing a scene. According to the invention, the two images of the pair are acquired with a single acquisition instrument (30) comprising two CCD sensors (31, 32) in the optical focal plane. The matching of the acquired stereoscopic pair consists in determining, by means of correlation, the point in the second image that is homologous to a point in the first image. Said correlation is performed for a point from the first image using an optimally-sized correlation window. When the homologous point of a point from the first image has been determined, the position deviation between the point from the first image and the homologous point thereof is entered in a table. Once all of the homologous points of the points from the first image have been found, the results table is reset barycentrically.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: November 22, 2011
    Assignee: Centre National d'etudes Spatiales
    Inventors: Bernard Rouge, Hélène Vadon, Alain Giros
  • Publication number: 20110280481
    Abstract: An electronic model of the image document is created by undergoing an OCR process. The electronic model includes elements (e.g., words, text lines, paragraphs, images) of the image document that have been determined by each of a plurality of sequentially executed stages in the OCR process. The electronic model serves as input information which is supplied to each of the stages by a previous stage that processed the image document. A graphical user interface is presented to the user so that the user can provide user input data correcting a mischaracterized item appearing in the document. Based on the user input data, the processing stage which produced the initial error that gave rise to the mischaracterized item corrects the initial error. Stages of the OCR process subsequent to this stage then correct any consequential errors arising in their respective stages as a result of the initial error.
    Type: Application
    Filed: May 17, 2010
    Publication date: November 17, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Bogdan Radakovic, Milan Vugdelija, Nikola Todic, Aleksandar Uzelac, Bodin Dresevic
  • Patent number: 8059896
    Abstract: A character recognition processing system includes a character recognition confidence evaluating unit that evaluates whether the confidence of character recognition of each of a plurality of areas is low or high, a character area classification unit that classifies a first area evaluated low by the character recognition confidence evaluating unit into a plurality of components, a character separation unit that separates the components classified by the character area classification unit into a character component and non-character components, according to information relating to a second area evaluated high by the character recognition confidence evaluating unit, and a first character recognition unit that performs character recognition processing for the character component separated by the character separation unit.
    Type: Grant
    Filed: February 23, 2007
    Date of Patent: November 15, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Etsuko Ito
  • Publication number: 20110274354
    Abstract: An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines which divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking a place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is filtered afterwards using various heuristics, in order to preserve those lines that indeed do separate a word's glyphs and minimize the number of those that do not.
    Type: Application
    Filed: May 10, 2010
    Publication date: November 10, 2011
    Applicant: MICROSOFT CORPORATION
    Inventor: Djordje Nijemcevic
  • Publication number: 20110268360
    Abstract: A method for identifying words in a textual image undergoing optical character recognition includes receiving a bitmap of an input image which includes textual lines that have been segmented by a plurality of chop lines. The chop lines are each associated with a confidence level reflecting a degree to which the respective chop line properly segments the textual line into individual characters. One or more words are identified in one of the textual lines based at least in part on the textual lines and a first subset of the plurality of chop lines which have a chop line confidence level above a first threshold value. If the first word is not associated with a sufficiently high word confidence level, at least a second word in the textual line is identified based at least in part on a second subset of the plurality of chop lines which have a confidence level above a second threshold value lower than the first threshold value.
    Type: Application
    Filed: May 3, 2010
    Publication date: November 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Aleksandar Antonijevic, Ivan Mitic, Mircea Cimpoi, Djordje Nijemcevic
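The two-threshold fallback described in the abstract above might look roughly like the following sketch. The `recognize` callback, the threshold values, and the rule of keeping the best-scoring candidate are all hypothetical choices for illustration; the abstract only states that a lower chop-line threshold is tried when the word confidence is not high enough.

```python
from typing import Callable, List, Sequence, Tuple

ChopLine = Tuple[int, float]  # (x position, chop-line confidence)
# Hypothetical recognizer: takes sorted cut positions, returns (word, confidence).
Recognizer = Callable[[List[int]], Tuple[str, float]]


def identify_word(chop_lines: Sequence[ChopLine],
                  recognize: Recognizer,
                  thresholds: Sequence[float] = (0.8, 0.5),
                  min_word_conf: float = 0.7) -> Tuple[str, float]:
    """Try progressively lower chop-line thresholds until a word is trusted."""
    best: Tuple[str, float] = ("", 0.0)
    for t in thresholds:  # ordered from strictest to most permissive
        # Keep only the chop lines whose confidence is above this threshold.
        cuts = sorted(x for x, conf in chop_lines if conf > t)
        candidate = recognize(cuts)
        if candidate[1] > best[1]:
            best = candidate
        if candidate[1] >= min_word_conf:
            break  # confident enough, no need to admit weaker chop lines
    return best
```

Starting with the strictest threshold means that weak chop lines, which are more likely to split a glyph in the wrong place, are only admitted when the confident ones fail to produce a trusted word.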
  • Publication number: 20110249897
    Abstract: Systems and methods for character recognition by performing lateral view-based analysis on the character data and generating a feature vector based on the lateral view-based analysis.
    Type: Application
    Filed: April 8, 2010
    Publication date: October 13, 2011
    Applicant: UNIVERSITY OF CALCUTTA
    Inventors: Nabendu CHAKI, Soharab Hossain Shaikh
  • Patent number: 8036463
    Abstract: The present invention provides a technique of accurately extracting areas of characters included in a captured image. A character extracting device of the present invention extracts each character in an image with compensated pixel values. In more detail, the character extracting device integrates pixel values at each coordinate position in the image along a character extracting direction. Then, the character extracting device predicts the background area in the image based on the integrated pixel value. The compensated pixel values are compensated based on integrated pixel values at the predicted background area from integrated pixel values at each coordinate position.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: October 11, 2011
    Assignee: Keyence Corporation
    Inventor: Masato Shimodaira
  • Patent number: 8036464
    Abstract: Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.
    Type: Grant
    Filed: September 7, 2007
    Date of Patent: October 11, 2011
    Assignee: Satyam Computer Services Limited
    Inventors: Varadarajan Sridhar, Mohamed Abdul Karim Sadiq, K. Kalyana Rao
  • Publication number: 20110243445
    Abstract: Line segmentation in an OCR process is performed to detect the positions of words within an input textual line image by extracting features from the input to locate breaks and then classifying the breaks into one of two break classes which include inter-word breaks and inter-character breaks. An output including the bounding boxes of the detected words and a probability that a given break belongs to the identified class can then be provided to downstream OCR or other components for post-processing. Advantageously, by reducing line segmentation to the extraction of features, including the position of each break and the number of break features, and break classification, the task of line segmentation is made less complex but with no loss of generality.
    Type: Application
    Filed: March 30, 2010
    Publication date: October 6, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Aleksandar Uzelac, Bodin Dresevic, Sasa Galic, Bogdan Radakovic
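As a rough illustration of the break-classification idea in the abstract above, the sketch below locates blank runs in a text line and labels each as inter-word or inter-character. It assumes a single feature (break width) and a simple median-ratio rule in place of the feature set and trained classifier the abstract refers to; the function names and the `ratio` parameter are invented for the example.

```python
from statistics import median
from typing import List, Tuple

Span = Tuple[int, int]  # (start column, end column), end exclusive


def find_breaks(ink_columns: List[int]) -> List[Span]:
    """Return blank column runs that lie strictly between inked columns."""
    breaks: List[Span] = []
    start = None
    seen_ink = False
    for x, ink in enumerate(ink_columns):
        if ink > 0:
            if start is not None and seen_ink:
                breaks.append((start, x))
            start = None
            seen_ink = True
        elif start is None:
            start = x
    return breaks  # leading and trailing margins are never reported


def classify_breaks(breaks: List[Span], ratio: float = 1.5) -> List[str]:
    """Assumed rule: a break much wider than the median is an inter-word break."""
    if not breaks:
        return []
    typical = median(end - start for start, end in breaks)
    return ["inter-word" if (end - start) > ratio * typical else "inter-character"
            for start, end in breaks]


def word_spans(ink_columns: List[int], ratio: float = 1.5) -> List[Span]:
    """Split the line into word column ranges at the inter-word breaks."""
    inked = [x for x, ink in enumerate(ink_columns) if ink > 0]
    if not inked:
        return []
    breaks = find_breaks(ink_columns)
    spans: List[Span] = []
    word_start = inked[0]
    for (start, end), label in zip(breaks, classify_breaks(breaks, ratio)):
        if label == "inter-word":
            spans.append((word_start, start))
            word_start = end
    spans.append((word_start, inked[-1] + 1))
    return spans
```

Here `ink_columns` is the number of ink pixels in each column of the line image. A real system would also emit a probability per break, as the abstract describes, rather than a hard label.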
  • Patent number: 8027054
    Abstract: A scanning apparatus and a method thereof include a scanning unit scanning a document and outputting a scanned result, at least one external storage unit detachably attached to the apparatus, at least one internal storage unit, and a controller detecting an attachment state of the external storage unit and storing the scanned result in one of the external storage unit and the internal storage unit according to the attachment state of the external storage unit. The scanning unit of the scanning apparatus is combined with a user scanning unit and a user printing unit into a combination apparatus, and the scanned result is printed in a printing apparatus spaced-apart from the scanning apparatus by a distance, thereby removing cables between the scanning or printing apparatus and a personal computer.
    Type: Grant
    Filed: September 30, 2003
    Date of Patent: September 27, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyung-jong Kang, Jung-soo Seo
  • Publication number: 20110228124
    Abstract: Disclosed is a character recognition preprocessing method and apparatus for correcting a nonlinear character string into a linear character string. A binarized character string region is divided into character regions on a character-by-character basis. Upper and lower feature points of each character region are derived, and an upper boundary line, which is a curve connecting the upper feature points of the character regions, and a lower boundary line, which is a curve connecting the lower feature points of the character regions, are generated by applying cubic spline interpolation. Nonlinearity is corrected through adaptive region enlargement by using the maximum horizontal length and the maximum height of the divided character regions.
    Type: Application
    Filed: March 21, 2011
    Publication date: September 22, 2011
    Applicants: Samsung Electronics Co., Ltd., Industry Foundation of Chonnam National University
    Inventors: Hee-Bum AHN, Jong-Hyun Park, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee
  • Patent number: 8019158
    Abstract: A method for altering a recognition error correction data structure, the method includes: altering at least one key out of a set of semantically similar keys in response to text appearance probabilities of keys of the set of semantically similar keys to provide an at least one altered key; and replacing the at least one key by the at least one altered key.
    Type: Grant
    Filed: January 2, 2008
    Date of Patent: September 13, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ella Barkan, Tal Drory, André Heilper
  • Patent number: 8014604
    Abstract: Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.
    Type: Grant
    Filed: April 16, 2008
    Date of Patent: September 6, 2011
    Assignee: International Business Machines Corporation
    Inventors: Asaf Tzadok, Eugeniusz Walach
  • Patent number: 8014603
    Abstract: A method of characterizing a word image includes traversing the word image stepwise with a window to provide a plurality of window images. For each of the plurality of window images, the method includes splitting the window image to provide a plurality of cells. A feature, such as a gradient direction histogram, is extracted from each of the plurality of cells. The word image can then be characterized based on the features extracted from the plurality of window images.
    Type: Grant
    Filed: August 30, 2007
    Date of Patent: September 6, 2011
    Assignee: Xerox Corporation
    Inventors: José A. Rodriguez Serrano, Florent C. Perronnin
  • Patent number: 8009928
    Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined into a single result, wherein the single result is detected text.
    Type: Grant
    Filed: September 19, 2008
    Date of Patent: August 30, 2011
    Assignee: A9.com, Inc.
    Inventors: Raghavan Manmatha, Mark A. Ruzon
  • Patent number: 8010564
    Abstract: A logical structure analyzing apparatus includes an extracting unit that extracts word candidates from a form, a first generating unit that classifies each of the word candidates into a group of heading candidates or a group of data candidates to generate, based on positions of the word candidates on the form, first candidate sets each including one heading candidate and one data candidate identifiable by the heading candidate, and a second generating unit that combines the first candidate sets to generate second candidate sets that each include plural heading candidates that differ and one data candidate. The apparatus also includes a removing unit that, based on positions of the heading candidates and the data word candidate in each second candidate set, removes, from among the second candidate sets, a determined set including a data item and headings identifying the data item, and an output unit that outputs the determined set.
    Type: Grant
    Filed: July 25, 2008
    Date of Patent: August 30, 2011
    Assignee: Fujitsu Limited
    Inventors: Akihiro Minagawa, Yoshinobu Hotta, Yusaku Fujii, Katsuhito Fujimoto
  • Patent number: 8004731
    Abstract: An image forming apparatus is provided which includes: an image acquisition section (110) which reads an original and acquires an original image; a specific-pattern storage section (141) which stores a specific pattern which expresses, using a dot pattern, apparatus identification information for identifying an apparatus that prints the original image on a sheet of recording paper; an extraction section (132) which extracts an actual image area except a blank area in the original image, and based on the extracted actual image area, extracts a specific area corresponding to an area for printing the specific pattern; and a print section (150) which prints the specific pattern within the actual image area, using a yellow toner.
    Type: Grant
    Filed: February 14, 2008
    Date of Patent: August 23, 2011
    Assignee: Kyocera Mita Corporation
    Inventor: Kunihiko Tanaka
  • Patent number: 8004712
    Abstract: It is desired that only necessary document pages be picked up from an enormous quantity of documents and copied by controlling copying operation on the basis of information designated by a user. For this purpose, a plurality of images are input, each image is segmented into objects, and an object as a search key is set. It is then determined, with respect to each of the plurality of images, whether the objects segmented from the image include the object as the search key. Images containing the object as the search key are selectively copied out of the plurality of images.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: August 23, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventors: Noboru Hamada, Masakazu Kitora
  • Patent number: 8000528
    Abstract: A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: August 16, 2011
    Assignee: Konica Minolta Systems Laboratory, Inc.
    Inventors: Wei Ming, Yibin Tian
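The coarse-to-fine comparison with early stopping described in the abstract above can be sketched as follows. This assumes segmentation has already produced bounding boxes per level; the dictionary layout, the pixel `tolerance`, and the function names are hypothetical and only illustrate the control flow.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)
LEVELS = ("paragraph", "line", "word", "character")  # coarse to fine


def boxes_match(a: Box, b: Box, tolerance: int = 2) -> bool:
    """Two units match if size and location agree within a pixel tolerance."""
    return all(abs(p - q) <= tolerance for p, q in zip(a, b))


def first_altered_level(original: Dict[str, List[Box]],
                        target: Dict[str, List[Box]],
                        tolerance: int = 2) -> Optional[str]:
    """Return the coarsest level at which the documents differ, or None."""
    for level in LEVELS:
        orig_units = original.get(level, [])
        targ_units = target.get(level, [])
        # A different number of units is already an alteration.
        if len(orig_units) != len(targ_units):
            return level
        if any(not boxes_match(o, t, tolerance)
               for o, t in zip(orig_units, targ_units)):
            return level  # stop here; finer levels are not compared
    return None
```

Because the loop returns at the first mismatch, the more expensive word and character comparisons run only when the paragraph and line layouts already agree.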
  • Patent number: 7988297
    Abstract: Non-rigidly coupled, overlapping, non-feedback optical systems for spatial filtering of Fourier transform optical patterns and image shape characterization comprise a first optical subsystem that includes a lens for focusing a polarized, coherent beam to a focal point, an image input device that spatially modulates phase positioned between the lens and the focal point, and a spatial filter at the Fourier transform pattern, and a second optical subsystem, overlapping the first optical subsystem, that includes a projection lens and a detector. The second optical subsystem is optically coupled to the first optical subsystem.
    Type: Grant
    Filed: October 19, 2007
    Date of Patent: August 2, 2011
    Assignee: Look Dynamics, Inc.
    Inventor: Rikk Crill
  • Publication number: 20110182513
    Abstract: Locations of word images corresponding to words in a document image are ascertained. The word images are grouped into clusters. For each of multiple of the clusters, a respective compressed word image cluster is determined based on a joint compression of respective ones of the word images that are grouped into the cluster. The positions of the word images in the document image are associated with the respective ones of the compressed word image clusters corresponding to the clusters respectively containing the word images.
    Type: Application
    Filed: January 26, 2010
    Publication date: July 28, 2011
    Inventors: Kave Eshghi, George Forman, Prakash Reddy
  • Patent number: 7982922
    Abstract: According to the present invention, an image processing apparatus comprises a scanning unit that converts an original image into image data; an extraction unit that extracts an area that contains characters of every character size from the image data scanned by the scanning unit; and a display unit that displays images of the area that contains characters extracted by the extraction unit at a plurality of resolutions.
    Type: Grant
    Filed: August 16, 2005
    Date of Patent: July 19, 2011
    Assignee: Canon Kabushiki Kaisha
    Inventor: Junichi Takano
  • Publication number: 20110170777
    Abstract: Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is a description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. A frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis.
    Type: Application
    Filed: December 31, 2010
    Publication date: July 14, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Takeshi Inagaki
  • Publication number: 20110150335
    Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
    Type: Application
    Filed: February 21, 2011
    Publication date: June 23, 2011
    Applicant: GOOGLE INC.
    Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
  • Patent number: 7949155
    Abstract: An image processing apparatus includes an image memory that stores an image; a character recognition rate acquisition unit that segments the image stored in the image memory into a plurality of partial images and acquires a character recognition rate for each partial image; an image quality assessment unit that calculates a parameter showing the image quality of the image based on the character recognition rates of the plural partial images acquired by the character recognition rate acquisition unit; and an output unit that outputs assessment results obtained by the image quality assessment unit.
    Type: Grant
    Filed: September 12, 2005
    Date of Patent: May 24, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Shunichi Kimura
  • Patent number: 7949187
    Abstract: A character string recognition method for recognizing a character string may include a first step in which first projection data of image data are calculated in a direction of the character string and a second step in which a position of the character string is detected on the basis of the first projection data. In the first step, the image data are divided into a plurality of segments in the direction of the character string and projection in the segment is calculated. The method may further include a third step in which second projection data in the segment are calculated on the basis of the position of the character string and a fourth step in which a position where the second projection data exceeds a threshold value is detected as a boundary position of a character, and the threshold value may be changed according to the number of pixels between both ends of the character string.
    Type: Grant
    Filed: March 29, 2007
    Date of Patent: May 24, 2011
    Assignee: NIDEC Sankyo Corporation
    Inventor: Hiroshi Nakamura
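The projection steps in the abstract above can be approximated by the sketch below. It assumes a grayscale image with light background and dark ink, so that gaps between characters give high column sums. Both thresholds are passed in as fixed values, and the per-segment projection and the adaptive threshold mentioned in the abstract are left out, so this is a simplified illustration with invented function names rather than the patented method.

```python
from typing import List, Optional, Tuple


def row_projection(img: List[List[int]]) -> List[int]:
    """First projection: sum of luminance along each row."""
    return [sum(row) for row in img]


def find_string_rows(img: List[List[int]],
                     row_threshold: int) -> Optional[Tuple[int, int]]:
    """Rows whose sum falls below the threshold contain dark ink."""
    rows = [y for y, s in enumerate(row_projection(img)) if s < row_threshold]
    if not rows:
        return None
    return rows[0], rows[-1] + 1  # (top, bottom), bottom exclusive


def column_projection(img: List[List[int]], top: int, bottom: int) -> List[int]:
    """Second projection: luminance per column, restricted to the string rows."""
    width = len(img[0])
    return [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]


def character_boundaries(img: List[List[int]], row_threshold: int,
                         col_threshold: int) -> List[Tuple[int, int]]:
    """Return column runs where the second projection exceeds the threshold."""
    rows = find_string_rows(img, row_threshold)
    if rows is None:
        return []
    profile = column_projection(img, *rows)
    runs: List[Tuple[int, int]] = []
    start = None
    for x, value in enumerate(profile):
        if value > col_threshold:
            if start is None:
                start = x
        elif start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

Restricting the column projection to the detected string rows keeps blank margins above and below the text from washing out the difference between character columns and gap columns.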
  • Publication number: 20110116715
    Abstract: A computer-implemented system and method for retrieving a digital image through document image decomposition is provided. A stored digital image is retrieved. Generic visual features are extracted. The features are grouped into a primitive layer including word-graphs that each include words and features. The words are grouped into a layout layer including zone hypotheses that each include one or more of the words. Causal dependencies between the word-graphs and the zone hypotheses are expressed through zone models that include a joint probability defining a pair of probabilistic models generated through a learned binary edge classifier. Each pair of probabilistic models is expressed as an optimal set selection problem including a set of cost functions and constraints. The optimal set selection problem is evaluated through a heuristic search of the cost functions and constraints and a non-overlapping optimal set of the zone hypotheses is provided that characterize the stored digital image.
    Type: Application
    Filed: January 24, 2011
    Publication date: May 19, 2011
    Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
    Inventors: Yizhou Wang, Dashan Gao, Haitham Hindi, Minh Binh Do
  • Patent number: 7916972
    Abstract: A form reader includes a landmarks extractor configured to select textboxes of a converted document as form landmarks based on textual characteristics. A set of positional constraints constrain the form entries relative to the identified form landmarks. A constraints solver selects textboxes of the converted document as form entries by solving the set of positional constraints respective to a set of facts including the selected form landmarks and converted document. In some embodiments, the constraints solver includes a query engine configured to (i) construct a query in a logic programming language setting forth the set of positional constraints and the set of facts and to (ii) input said query to a logic programming language query solving engine and to (iii) receive a response from the query solving engine responsive to the input.
    Type: Grant
    Filed: July 31, 2006
    Date of Patent: March 29, 2011
    Assignee: Xerox Corporation
    Inventor: Jean-Luc Meunier
  • Patent number: 7912286
    Abstract: A method of labeling image data includes reading the image data sequentially with units of two successive pixels and providing one label to a target unit of two successive pixels in the image data when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit. An image processing apparatus includes a memory configured to store image data, a processor configured to process the image data with units of two successive pixels and to provide one label to a target unit of two successive pixels when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit, and a memory controller arranged between the memory and the processor and configured to control reading and writing the image data.
    Type: Grant
    Filed: May 10, 2006
    Date of Patent: March 22, 2011
    Assignee: Ricoh Company, Ltd.
    Inventors: Tomoaki Ozaki, Shinichi Yamaura
  • Patent number: 7903881
    Abstract: An image processing device is structured such that an appropriate judgement of an image, at which blurring or disappearance or the like will occur, is possible. When pixels, which form a line image at which there is the possibility that blurring or disappearance will occur at the time of printing by using a printing plate, are extracted, a line image warning function gives notice by displaying a warning message on a monitor of a client terminal. Thereafter, image converting and print setting are carried out such that an extracted line image is clarified. In this way, when a proof is prepared, an image, at which there is the possibility that blurring or disappearance will occur on a printed matter obtained by using a printing plate, is clarified, and appropriate proofing is possible.
    Type: Grant
    Filed: October 9, 2008
    Date of Patent: March 8, 2011
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Ryuichi Ishizuka, Mari Kodama, Yasushi Nishide
  • Patent number: 7894670
    Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
    Type: Grant
    Filed: August 10, 2009
    Date of Patent: February 22, 2011
    Assignee: Exbiblio B.V.
    Inventors: Martin Towle King, Dale L. Grover, Clifford A. Kushler, James Quentin Stafford-Fraser
  • Patent number: 7873216
    Abstract: Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. In embodiments, the image is segmented into foreground and background pixels. Foreground pixels may be grouped into “letters.” Using index-based searching, “letters” may be grouped into “words” and “words” may be grouped into baselines. One or more dominant baselines may be selected and the characteristics of the dominant baseline or baselines may be used to unwarp the image.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: January 18, 2011
    Assignee: Seiko Epson Corporation
    Inventors: Ali Zandifar, Anoop K. Bhattacharjya
  • Publication number: 20110007366
    Abstract: Methods and systems for classifying markings on images in a document are undertaken according to marking types. The document containing the images is supplied to a segmenter which breaks the images into fragments of foreground pixel structures that are identified as being likely to be of the same marking type by finding connected components, extracting near-horizontal or -vertical rule lines and subdividing some connected components to obtain the fragments. The fragments are then supplied to a classifier, where the classifier provides a category score for each fragment, wherein the classifier is trained from the groundtruth images whose pixels are labeled according to known marking types. Thereafter, a same label is assigned to all pixels in a particular fragment, when the fragment is classified by the classifier.
    Type: Application
    Filed: July 10, 2009
    Publication date: January 13, 2011
    Applicant: Palo Alto Research Center Incorporated
    Inventors: Prateek Sarkar, Eric Saund
  • Publication number: 20100321714
    Abstract: A computer-implemented method of scanning a document (e.g. a newspaper or a book) is provided where the text may be legally protected from unauthorized copying, comprising the steps of: acquiring to a memory at least one recording confined to a field that covers a delimited area of a document; processing the at least one recording to perform character recognition; when a character is recognized, registering it in a memory, and performing the above steps repeatedly while recording at shifted positions so as to progressively obtain a string of characters; and evaluating the string against a predefined condition; if the condition is not satisfied, determining whether to clear from the memory at least a portion of the at least one recording; if the condition is satisfied, providing an output and clearing from the memory at least a portion of the string and at least a portion of the at least one recording.
    Type: Application
    Filed: March 5, 2009
    Publication date: December 23, 2010
    Applicant: Jala ApS
    Inventors: Lars Stig Nielsen, Jacob Meibom
  • Patent number: 7848572
    Abstract: An image processing method according to the present invention includes extracting from a document image an area to be determined, calculating the number of closed loops within the extracted area, and making a determination based on the calculated number of closed loops, whether the area is a character area. This invention makes it possible to determine with a high accuracy whether an area to be determined is a character area.
    Type: Grant
    Filed: April 19, 2006
    Date of Patent: December 7, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Reiji Misawa
  • Publication number: 20100278427
    Abstract: The present invention provides a method and system for text processing. The method comprises determining at least a part of characters in a text; dividing the text into a plurality of text segments by using the at least a part of characters as separators; and decoding the plurality of text segments respectively.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: BIN LI, LI QUN PANG, ZHI QIANG SHA, ZHI BO ZUO
  • Publication number: 20100278428
    Abstract: There is provided an apparatus including a model based topic segmentation section that segments a text using a topic model representing semantic coherence, a parameter estimation section that estimates a control parameter used in segmenting the text based on detection of a change point of word distribution in the text, using the result of segmentation by the model based topic segmentation unit as training data, and a change point detection topic segmentation section that segments the text, based on detection of the change point of word distribution in the text, using the parameter estimated by the parameter estimation section (FIG. 1).
    Type: Application
    Filed: December 25, 2008
    Publication date: November 4, 2010
    Inventors: Makoto Terao, Takafumi Koshinaka
  • Patent number: 7813550
    Abstract: The object of this invention is to reduce the effort of deleting an information symbol from a read image. To accomplish this, an image of a document with an information symbol is read (S100), and the information symbol is identified in the read image (S130). The identified information symbol is decoded (S150), and it is determined on the basis of the decoding result whether the data format of the information symbol is a desired one (S160). On the basis of the determination, if the data format is the desired one, the information symbol is deleted from the read image (S170).
    Type: Grant
    Filed: August 24, 2006
    Date of Patent: October 12, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Yasuo Komada
  • Patent number: 7810026
    Abstract: A method for optimizing a source document comprising a plurality of pages of content, comprising each of the following, is presented. A source document is obtained. An optimized document is created corresponding to the source document. Thereafter, for each page in the source document, the following are applied. A page record is created for the page. Each page record comprises a word table comprising a list of the page's words in the order that they appear in the page's content. Each page record further comprises a paragraph entry list for the page including a paragraph entry for each paragraph in the page. Each paragraph entry includes a reference to the first and last word of that paragraph in the word table. The page record is compressed using a compression technique. Thereafter, the compressed page record is stored in the optimized document.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: October 5, 2010
    Assignee: Amazon Technologies, Inc.
    Inventors: Joshua Shagam, Robert L Goodwin
  • Patent number: 7805022
    Abstract: The present invention allows a thumbnail display representing the outline of input images in a digital image printer to be made, in which it is determined whether an image is a first kind of image or a second kind of image, and if it is determined that the image is the first kind of image, a feature part of the first kind of image is enlarged in the thumbnail display to make the contents of the image more understandable. Also, the invention allows a thumbnail display representing the outline of input images in a digital image printer to be made, in which it is determined whether an image is a character image or a gradation image, and if it is determined that the image is the character image, a part of the character image is enlarged in the thumbnail display to make the characters more understandable.
    Type: Grant
    Filed: August 24, 2004
    Date of Patent: September 28, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Mamoru Tanaka
  • Patent number: 7796281
    Abstract: In an image forming system in which a printing device is communicably connected to a server and a terminal, an automatic determination is performed to determine whether or not print data created by the terminal needs to be stored in a memory for placing the data in a reprintable condition. The print data stored in the memory can be reprinted without need for resending the same print data from the terminal or server to the printing device. The automatic determination is, for example, performed by referring to the header of the print data and determining whether the print data is from the terminal or the server.
    Type: Grant
    Filed: January 21, 2005
    Date of Patent: September 14, 2010
    Assignee: Brother Kogyo Kabushiki Kaisha
    Inventor: Toru Tsuzuki
  • Publication number: 20100215270
    Abstract: A system for performing an automated network-based login procedure on an interactive keypad image includes a software agent executable from a digital medium connected to the network for navigating to a login page, accessing the keypad image, and performing an automated login, and an automated login support application executable from the same or a different digital medium connected to the network, the support application including at least an image processor, an optical character recognizer, and an image data encoder and decoder. The software agent performs a login at the virtual keypad image based on character image matching and location information acquisition for each character of a client's specific set of credential characters included in the image of the keypad.
    Type: Application
    Filed: February 26, 2009
    Publication date: August 26, 2010
    Inventors: Pradheesh Manohar, Prashant Nalwaya, Prashant Kumar Agrawal
  • Publication number: 20100208996
    Abstract: A system that extracts text from an image includes a capture device that captures the image having a low resolution. An image segmentation subsystem partitions the image into image segments. An image restoration subsystem generates a resolution-expanded image from the image segments and negates degradation effects of the low-resolution image by transforming the image segments from a first domain to a second domain and deconvolving the transformed image segments in the second domain to determine parameters of the low-resolution image. A text recognition subsystem transforms the restored image data into computer readable text data based on the determined parameters.
    Type: Application
    Filed: October 6, 2008
    Publication date: August 19, 2010
    Applicant: TUFTS UNIVERSITY
    Inventors: Joseph P. Noonan, Prabahan Basu
  • Publication number: 20100189352
    Abstract: A method for classifying an input character is disclosed. Character models are used. Each character model is associated with an output character and defines a model specific segmentation scheme for that output character and an associated segment model. The model specific segmentation scheme defines a minimum length corresponding to a number of points in a stroke of the output character and a minimum length threshold. Using each of the character models, the input character is decomposed into segments and the segments are evaluated against the segment model of the respective character model to produce a score indicative of the conformity of the segments with the segment model. The character model that produced the highest score is selected and the input character is classified as the output character associated with the character model that produces the highest score.
    Type: Application
    Filed: March 30, 2010
    Publication date: July 29, 2010
    Inventor: Jonathon Leigh Napper
  • Patent number: 7765170
    Abstract: A method for segmenting a data set is disclosed. The method consists of setting a maximum walker size and setting a walker size. Then, a first segment of data from the data set is obtained, wherein the first segment of data is the size of the walker. Then, a second segment of data from the data set is obtained, wherein the second segment of data is not greater than the maximum walker size.
    Type: Grant
    Filed: July 11, 2006
    Date of Patent: July 27, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Michael David Hall
  • Publication number: 20100177964
    Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
    Type: Application
    Filed: August 10, 2009
    Publication date: July 15, 2010
    Applicant: Exbiblio B.V.
    Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
  • Patent number: 7756340
    Abstract: Methods and apparatus for detecting the presence of combs, determining their shape and removing the combs from a scanned form in an automated manner are described. Horizontal and vertical line feature analysis is combined with knowledge of the usual size, shape, and spacing characteristics of lines which form a comb. Vertical and horizontal lines failing to meet certain characteristics, e.g., size or shape characteristics, are eliminated from consideration. Vertical lines which do not intersect a horizontal line are also eliminated from consideration. Confidence measures for different possible comb shapes are generated and the most probable comb shapes as indicated by the confidence measures are included in a comb list. The comb list may be output for use in further processing, e.g., comb removal and/or data extraction processing.
    Type: Grant
    Filed: July 11, 2006
    Date of Patent: July 13, 2010
    Assignee: Pegasus Imaging Corporation
    Inventor: M. Scot Alexander
  • Patent number: 7751087
    Abstract: Embodiments herein include a method of adding color to a monochrome (single color printing) document that begins by inputting/creating colorization rules relating to the previously printed monochromatic document and scanning the previously printed monochromatic document to locate rasterized data. After the scanning, the method performs optical character recognition on the rasterized data to search for text corresponding to the previously printed monochromatic document. After the rules are input and the rasterized data is produced, the method automatically colorizes portions of rasterized content according to the colorization rules and this generates a colorized electronic document.
    Type: Grant
    Filed: April 3, 2007
    Date of Patent: July 6, 2010
    Assignee: Xerox Corporation
    Inventors: Javier A. Morales, Arlene Buck, Michael E. Farrell
  • Publication number: 20100166307
    Abstract: One embodiment of the present invention provides a system that removes noise from an image. During operation, the system first identifies blobs in the image, wherein a blob is a set of contiguous pixels which possibly represents a character or a portion of a character in the image. Next, the system analyzes the blobs to dynamically determine a “noise threshold” for the blobs. The system then removes blobs from the image which are below the noise threshold.
    Type: Application
    Filed: December 28, 2009
    Publication date: July 1, 2010
    Inventor: Dennis G. Nicholson
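
The blob-based noise removal described in the abstract of publication 20100166307 can be illustrated with a minimal sketch. This is not the patented method: the abstract does not say how the noise threshold is derived, so the sketch assumes a threshold equal to a fraction of the median blob size. The function names `find_blobs` and `remove_noise`, the `fraction` parameter, the use of 8-connectivity, and the 0/1 list-of-lists image format are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import median


def find_blobs(image):
    """Return blobs as lists of (row, col) pixels, using 8-connectivity.

    `image` is a list of rows holding 0 (background) or 1 (foreground).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one set of contiguous pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def remove_noise(image, fraction=0.25):
    """Return a copy of `image` with small blobs erased.

    Assumption (not from the abstract): the threshold is computed from the
    image itself as `fraction` times the median blob size in pixels.
    """
    cleaned = [row[:] for row in image]
    blobs = find_blobs(image)
    if not blobs:
        return cleaned
    threshold = fraction * median(len(blob) for blob in blobs)
    for blob in blobs:
        if len(blob) < threshold:
            for y, x in blob:
                cleaned[y][x] = 0
    return cleaned


if __name__ == "__main__":
    page = [
        [1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 1, 0, 0],  # the lone pixel at column 5 is a speck
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 1, 1, 1],
    ]
    for row in remove_noise(page):
        print("".join("#" if v else "." for v in row))
```

Deriving the threshold from the blob statistics of each image, rather than hard-coding a pixel count, is one simple way to make it "dynamic" in the sense the abstract uses; a real system would likely use a more robust statistic.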