Segmenting Individual Characters Or Words Patents (Class 382/177)
-
Patent number: 8068684
Abstract: A first aspect of the invention relates to a method for creating a binary mask image from an inputted digital image of a scanned document, comprising the steps of creating a binarized image by binarizing the inputted digital image, detecting first text regions representing light text on a dark background, and inverting the first text regions, such that the inverted first text regions are interpretable in the same way as dark text on a light background. A second aspect of the invention relates to a method for comparing in a binary image a first pixel blob with a second pixel blob to determine whether they represent matching symbols, comprising the steps of detecting a line in one blob not present in the other and/or determining if one of the blobs represents an italicized symbol where the other does not.
Type: Grant
Filed: May 4, 2007
Date of Patent: November 29, 2011
Assignee: I.R.I.S.
Inventors: Michel Dauw, Pierre Demuelenaere
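The inversion step in the first aspect can be illustrated with a minimal Python sketch. It assumes the candidate text regions are already known as boxes (region detection is not shown) and uses a hypothetical dark-pixel-ratio test to decide that a region holds light text on a dark background; this is not the patented detection method.

```python
def invert_light_text_regions(binary, regions, dark_ratio=0.5):
    """Return a copy of `binary` with light-on-dark regions inverted.

    binary: list of equal-length rows; 1 = dark pixel, 0 = light pixel (assumed).
    regions: candidate text boxes as (top, left, bottom, right), bottom/right
        exclusive (hypothetical format; detecting the regions is not shown).
    dark_ratio: assumed heuristic; a region counts as light text on a dark
        background when more than this fraction of its pixels is dark.
    """
    out = [row[:] for row in binary]
    for top, left, bottom, right in regions:
        pixels = [binary[r][c] for r in range(top, bottom) for c in range(left, right)]
        if pixels and sum(pixels) / len(pixels) > dark_ratio:
            # Flip every pixel so the region reads like dark text on a light background.
            for r in range(top, bottom):
                for c in range(left, right):
                    out[r][c] = 1 - binary[r][c]
    return out


page = [[1, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 0]]
print(invert_light_text_regions(page, [(0, 0, 3, 3)]))
# [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

After inversion, the same downstream logic used for ordinary dark-on-light text can process the region unchanged, which is the point the abstract makes.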
-
Patent number: 8064687
Abstract: The invention relates to a method and system for the acquisition and correlation matching of points belonging to a stereoscopic pair of images, whereby the pair is formed by a first image and a second image representing a scene. According to the invention, the two images of the pair are acquired with a single acquisition instrument (30) comprising two CCD sensors (31, 32) in the optical focal plane. The matching of the acquired stereoscopic pair consists in determining, by means of correlation, the point in the second image that is homologous to a point in the first image. Said correlation is performed for a point from the first image using an optimally-sized correlation window. When the homologous point of a point from the first image has been determined, the position deviation between the point from the first image and the homologous point thereof is entered in a table. Once all of the homologous points of the points from the first image have been found, the results table is reset barycentrically.
Type: Grant
Filed: March 3, 2010
Date of Patent: November 22, 2011
Assignee: Centre National d'etudes Spatiales
Inventors: Bernard Rouge, Hélène Vadon, Alain Giros
-
Publication number: 20110280481
Abstract: An electronic model of an image document is created as the document undergoes an OCR process. The electronic model includes elements (e.g., words, text lines, paragraphs, images) of the image document that have been determined by each of a plurality of sequentially executed stages in the OCR process. The electronic model serves as input information that is supplied to each stage by the previous stage that processed the image document. A graphical user interface is presented so that the user can provide input data correcting a mischaracterized item appearing in the document. Based on the user input data, the processing stage that produced the initial error giving rise to the mischaracterized item corrects that error. Stages of the OCR process subsequent to this stage then correct any consequential errors arising in their respective stages as a result of the initial error.
Type: Application
Filed: May 17, 2010
Publication date: November 17, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Bogdan Radakovic, Milan Vugdelija, Nikola Todic, Aleksandar Uzelac, Bodin Dresevic
-
Patent number: 8059896
Abstract: A character recognition processing system includes a character recognition confidence evaluating unit that evaluates whether confidence of character recognition of a plurality of areas are low or high, a character area classification unit that classifies a first area evaluated low by the character recognition confidence evaluating unit into a plurality of components, a character separation unit that separates the components classified by the character area classification unit into a character component and non-character components, according to information relating to a second area evaluated high by the character recognition confidence evaluating unit, and a first character recognition unit that performs character recognition processing for the character component separated by the character separation unit.
Type: Grant
Filed: February 23, 2007
Date of Patent: November 15, 2011
Assignee: Fuji Xerox Co., Ltd.
Inventor: Etsuko Ito
-
Publication number: 20110274354
Abstract: An image processing apparatus is provided that includes a character chopper component that segments words into individual characters in a bitmap of a textual image undergoing an OCR process. The character chopper component is configured to produce a set of (possibly curved) chop-lines which divide a bitmap of any given word into its individual character or glyph candidates. Cases where an input bitmap contains two separate words are handled by marking a place where those words should be split. The character segmentation algorithm computes the set of vertically oriented, curved chop-lines by considering glyph and background colors in a given word bitmap. The set is filtered afterwards using various heuristics, in order to preserve those lines that indeed do separate a word's glyphs and minimize the number of those that do not.
Type: Application
Filed: May 10, 2010
Publication date: November 10, 2011
Applicant: MICROSOFT CORPORATION
Inventor: Djordje Nijemcevic
-
Publication number: 20110268360
Abstract: A method for identifying words in a textual image undergoing optical character recognition includes receiving a bitmap of an input image which includes textual lines that have been segmented by a plurality of chop lines. The chop lines are each associated with a confidence level reflecting a degree to which the respective chop line properly segments the textual line into individual characters. One or more words are identified in one of the textual lines based at least in part on the textual lines and a first subset of the plurality of chop lines which have a chop line confidence level above a first threshold value. If the first word is not associated with a sufficiently high word confidence level, at least a second word in the textual line is identified based at least in part on a second subset of the plurality of chop lines which have a confidence level above a second threshold value lower than the first threshold value.
Type: Application
Filed: May 3, 2010
Publication date: November 3, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Aleksandar Antonijevic, Ivan Mitic, Mircea Cimpoi, Djordje Nijemcevic
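The two-threshold fallback described above can be sketched in Python. This is a simplified illustration that works on a whole line rather than on individual words; the `(x_position, confidence)` chop representation, the `word_confidence` callback, and all threshold defaults are hypothetical stand-ins, not the application's actual interfaces.

```python
def segment(width, chops, threshold):
    """Split a text line at chop lines whose confidence exceeds `threshold`.

    chops: list of (x_position, confidence) pairs (hypothetical representation).
    Returns a list of (start, end) column spans covering the line.
    """
    cuts = sorted(x for x, conf in chops if conf > threshold)
    bounds = [0] + cuts + [width]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def identify_words(width, chops, word_confidence, high=0.8, low=0.5, min_word_conf=0.6):
    """Use only high-confidence chop lines first; fall back to a lower threshold.

    word_confidence: hypothetical callback scoring a segmentation (for example by
    running a recognizer on each span); the thresholds are illustrative values.
    """
    spans = segment(width, chops, high)
    if word_confidence(spans) >= min_word_conf:
        return spans
    # Result not trusted: admit more (lower-confidence) chop lines and retry.
    return segment(width, chops, low)


chops = [(10, 0.9), (20, 0.6)]
print(identify_words(30, chops, lambda spans: 0.9))  # [(0, 10), (10, 30)]
print(identify_words(30, chops, lambda spans: 0.1))  # [(0, 10), (10, 20), (20, 30)]
```

Trying the strict threshold first keeps the common case cheap; the looser subset of chop lines is consulted only when the recognizer is not confident in the first result.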
-
Publication number: 20110249897
Abstract: Systems and methods for character recognition by performing lateral view-based analysis on the character data and generating a feature vector based on the lateral view-based analysis.
Type: Application
Filed: April 8, 2010
Publication date: October 13, 2011
Applicant: UNIVERSITY OF CALCUTTA
Inventors: Nabendu CHAKI, Soharab Hossain Shaikh
-
Patent number: 8036463
Abstract: The present invention provides a technique for accurately extracting areas of characters included in a captured image. A character extracting device of the present invention extracts each character in an image with compensated pixel values. In more detail, the character extracting device integrates pixel values at each coordinate position in the image along a character extracting direction. Then, the character extracting device predicts the background area in the image based on the integrated pixel value. The compensated pixel values are compensated based on integrated pixel values at the predicted background area from integrated pixel values at each coordinate position.
Type: Grant
Filed: September 13, 2007
Date of Patent: October 11, 2011
Assignee: Keyence Corporation
Inventor: Masato Shimodaira
-
Patent number: 8036464
Abstract: Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.
Type: Grant
Filed: September 7, 2007
Date of Patent: October 11, 2011
Assignee: Satyam Computer Services Limited
Inventors: Varadarajan Sridhar, Mohamed Abdul Karim Sadiq, K. Kalyana Rao
-
Publication number: 20110243445
Abstract: Line segmentation in an OCR process is performed to detect the positions of words within an input textual line image by extracting features from the input to locate breaks and then classifying the breaks into one of two break classes which include inter-word breaks and inter-character breaks. An output including the bounding boxes of the detected words and a probability that a given break belongs to the identified class can then be provided to downstream OCR or other components for post-processing. Advantageously, by reducing line segmentation to the extraction of features, including the position of each break and the number of break features, and break classification, the task of line segmentation is made less complex but with no loss of generality.
Type: Application
Filed: March 30, 2010
Publication date: October 6, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Aleksandar Uzelac, Bodin Dresevic, Sasa Galic, Bogdan Radakovic
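A toy version of break location and two-class break classification might look like the Python sketch below. It assumes the line image has already been reduced to a per-column ink flag, and it replaces the application's feature-based classifier with a hypothetical rule (a break is inter-word when it is wider than twice the median break width); it also omits the per-break probabilities.

```python
from statistics import median


def find_breaks(ink):
    """Return (start, end) column spans of blank runs lying between inked columns.

    ink: list of booleans, True where a column of the line image contains ink.
    """
    breaks, start, seen_ink = [], None, False
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is not None:
                breaks.append((start, x))
            start, seen_ink = None, True
        elif seen_ink and start is None:
            start = x
    return breaks  # blank columns after the last ink are not a break


def split_words(ink, factor=2.0):
    """Classify each break as inter-word or inter-character and return word spans.

    Stand-in classifier (assumption): a break is inter-word when it is wider
    than `factor` times the median break width on the line.
    """
    columns = [x for x, has_ink in enumerate(ink) if has_ink]
    if not columns:
        return []
    breaks = find_breaks(ink)
    typical = median([end - start for start, end in breaks]) if breaks else 0
    words, current = [], columns[0]
    for start, end in breaks:
        if end - start > factor * typical:  # inter-word break closes a word
            words.append((current, start))
            current = end
    words.append((current, columns[-1] + 1))
    return words


line = [c == "#" for c in "##.##.##....###"]
print(split_words(line))  # [(0, 8), (12, 15)]
```

A trained classifier over richer break features would replace the single width rule, but the overall shape (locate breaks, label each one, derive word boxes) stays the same.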
-
Patent number: 8027054
Abstract: A scanning apparatus and a method thereof include a scanning unit scanning a document and outputting a scanned result, at least one external storage unit detachably attached to the apparatus, at least one internal storage unit, and a controller detecting an attachment state of the external storage unit and storing the scanned result in one of the external storage unit and the internal storage unit according to the attachment state of the external storage unit. The scanning unit of the scanning apparatus is combined with a user scanning unit and a user printing unit into a combination apparatus, and the scanned result is printed in a printing apparatus spaced-apart from the scanning apparatus by a distance, thereby removing cables between the scanning or printing apparatus and a personal computer.
Type: Grant
Filed: September 30, 2003
Date of Patent: September 27, 2011
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyung-jong Kang, Jung-soo Seo
-
Publication number: 20110228124
Abstract: Disclosed is a character recognition preprocessing method and apparatus for correcting a nonlinear character string into a linear character string. A binarized character string region is divided into character regions on a character-by-character basis. Upper and lower feature points of each character region are derived, and an upper boundary line, which is a curve connecting the upper feature points of the character regions, and a lower boundary line, which is a curve connecting the lower feature points of the character regions, are generated by applying cubic spline interpolation. Nonlinearity is corrected through adaptive region enlargement by using the maximum horizontal length and the maximum height of the divided character regions.
Type: Application
Filed: March 21, 2011
Publication date: September 22, 2011
Applicants: Samsung Electronics Co., Ltd., Industry Foundation of Chonnam National University
Inventors: Hee-Bum AHN, Jong-Hyun Park, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee
-
Patent number: 8019158
Abstract: A method for altering a recognition error correction data structure, the method includes: altering at least one key out of a set of semantically similar keys in response to text appearance probabilities of keys of the set of semantically similar keys to provide at least one altered key; and replacing the at least one key by the at least one altered key.
Type: Grant
Filed: January 2, 2008
Date of Patent: September 13, 2011
Assignee: International Business Machines Corporation
Inventors: Ella Barkan, Tal Drory, André Heilper
-
Patent number: 8014604
Abstract: Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.
Type: Grant
Filed: April 16, 2008
Date of Patent: September 6, 2011
Assignee: International Business Machines Corporation
Inventors: Asaf Tzadok, Eugeniusz Walach
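The document-specific word database can be pictured as a cache keyed by font and word image. In the minimal Python sketch below, exact byte hashing stands in for the image-processing match the abstract mentions, and `recognize` is a hypothetical OCR callback; both are assumptions for illustration only.

```python
import hashlib


class WordImageCache:
    """Document-specific map from (font, word image) to recognized text.

    Exact hashing of the image bytes is an assumed stand-in for image matching;
    `recognize` is a hypothetical OCR callback invoked only on first occurrences.
    """

    def __init__(self, recognize):
        self._recognize = recognize
        self._entries = {}
        self.ocr_calls = 0  # how many times full OCR was actually needed

    def lookup(self, font, image_bytes):
        key = (font, hashlib.sha256(image_bytes).hexdigest())
        if key not in self._entries:
            # First instance of this word in this font: run OCR and remember it.
            self.ocr_calls += 1
            self._entries[key] = self._recognize(image_bytes)
        return self._entries[key]


cache = WordImageCache(recognize=lambda image: image.decode("ascii"))  # fake OCR for the demo
for word in (b"the", b"cat", b"the"):
    cache.lookup("Times-Roman", word)
print(cache.ocr_calls)  # 2
```

Real scans of the same word never produce identical bytes, so a practical system would need a tolerant image comparison in place of the hash; the sketch only shows the bookkeeping.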
-
Patent number: 8014603
Abstract: A method of characterizing a word image includes traversing the word image stepwise with a window to provide a plurality of window images. For each of the plurality of window images, the method includes splitting the window image to provide a plurality of cells. A feature, such as a gradient direction histogram, is extracted from each of the plurality of cells. The word image can then be characterized based on the features extracted from the plurality of window images.
Type: Grant
Filed: August 30, 2007
Date of Patent: September 6, 2011
Assignee: Xerox Corporation
Inventors: José A. Rodriguez Serrano, Florent C. Perronnin
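The sliding-window, cell-splitting, gradient-histogram pipeline can be sketched in plain Python. The window width, step, cell grid, bin count, and the central-difference gradient are all illustrative choices, not values taken from the patent.

```python
import math


def cell_histogram(img, top, left, h, w, bins):
    """Magnitude-weighted histogram of gradient directions over one cell."""
    hist = [0.0] * bins
    rows, cols = len(img), len(img[0])
    for y in range(top, top + h):
        for x in range(left, left + w):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, cols - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, rows - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    return hist


def word_features(img, win_w=4, step=2, cells_x=2, cells_y=2, bins=8):
    """Slide a full-height window across the word image in steps of `step`
    columns and describe each window by its concatenated cell histograms."""
    rows, cols = len(img), len(img[0])
    cell_h, cell_w = rows // cells_y, win_w // cells_x
    features = []
    for left in range(0, cols - win_w + 1, step):
        vec = []
        for cy in range(cells_y):
            for cx in range(cells_x):
                vec += cell_histogram(img, cy * cell_h, left + cx * cell_w, cell_h, cell_w, bins)
        features.append(vec)
    return features


edge = [[0, 0, 0, 0, 1, 1, 1, 1]] * 4  # 4x8 image with one vertical edge
feats = word_features(edge)
print(len(feats), len(feats[0]))  # 3 32
```

The result is one fixed-length vector per window position, so a word image becomes a sequence of feature vectors whose length grows with the word width.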
-
Patent number: 8009928
Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined into a single result, which is the detected text.
Type: Grant
Filed: September 19, 2008
Date of Patent: August 30, 2011
Assignee: A9.com, Inc.
Inventors: Raghavan Manmatha, Mark A. Ruzon
-
Patent number: 8010564
Abstract: A logical structure analyzing apparatus includes an extracting unit that extracts word candidates from a form; a first generating unit that classifies each of the word candidates into a group of heading candidates or a group of data candidates and generates, based on positions of the word candidates on the form, first candidate sets, each including one heading candidate and one data candidate identifiable by the heading candidate; and a second generating unit that combines the first candidate sets to generate second candidate sets, each including plural different heading candidates and one data candidate. The apparatus also includes a removing unit that, based on positions of the heading candidates and the data candidate in each second candidate set, removes from among the second candidate sets a determined set including a data item and headings identifying the data item, and an output unit that outputs the determined set.
Type: Grant
Filed: July 25, 2008
Date of Patent: August 30, 2011
Assignee: Fujitsu Limited
Inventors: Akihiro Minagawa, Yoshinobu Hotta, Yusaku Fujii, Katsuhito Fujimoto
-
Patent number: 8004731
Abstract: An image forming apparatus is provided which includes: an image acquisition section (110) which reads an original and acquires an original image; a specific-pattern storage section (141) which stores a specific pattern which expresses, using a dot pattern, apparatus identification information for identifying an apparatus that prints the original image on a sheet of recording paper; an extraction section (132) which extracts an actual image area, excluding a blank area, in the original image and, based on the extracted actual image area, extracts a specific area corresponding to an area for printing the specific pattern; and a print section (150) which prints the specific pattern within the actual image area, using a yellow toner.
Type: Grant
Filed: February 14, 2008
Date of Patent: August 23, 2011
Assignee: Kyocera Mita Corporation
Inventor: Kunihiko Tanaka
-
Patent number: 8004712
Abstract: It is desired that only necessary document pages be picked up from an enormous quantity of documents and copied by controlling copying operation on the basis of information designated by a user. For this purpose, a plurality of images are input, each image is segmented into objects, and an object as a search key is set. It is then determined, with respect to each of the plurality of images, whether the objects segmented from the image include the object as the search key. Images containing the object as the search key are selectively copied out of the plurality of images.
Type: Grant
Filed: January 31, 2006
Date of Patent: August 23, 2011
Assignee: Canon Kabushiki Kaisha
Inventors: Noboru Hamada, Masakazu Kitora
-
Patent number: 8000528
Abstract: A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.
Type: Grant
Filed: December 29, 2009
Date of Patent: August 16, 2011
Assignee: Konica Minolta Systems Laboratory, Inc.
Inventors: Wei Ming, Yibin Tian
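The coarse-to-fine comparison with early stopping can be sketched as follows. The sketch assumes each document has already been segmented into boxes `(x, y, w, h)` per level, listed in the same reading order for both documents; the level names and the pixel tolerance are illustrative assumptions, and the segmenter itself is not shown.

```python
LEVELS = ("block", "line", "word", "character")  # coarsest level first


def boxes_match(a, b, tol):
    """True when two (x, y, w, h) boxes agree within `tol` pixels per coordinate."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


def first_altered_level(original, target, tol=2):
    """Compare segmentation results level by level and stop at the first difference.

    original, target: dicts mapping a level name to a list of (x, y, w, h) boxes
    in matching reading order (assumed; produced by some document segmenter).
    Returns the first level at which the documents differ, or None.
    """
    for level in LEVELS:
        a, b = original.get(level, []), target.get(level, [])
        if len(a) != len(b) or not all(boxes_match(p, q, tol) for p, q in zip(a, b)):
            return level  # alteration found; finer levels are not compared
    return None


original = {"block": [(0, 0, 100, 40)], "line": [(0, 0, 100, 10), (0, 20, 100, 10)]}
altered = {"block": [(0, 0, 100, 40)], "line": [(0, 0, 100, 10), (0, 26, 100, 10)]}
print(first_altered_level(original, original))  # None
print(first_altered_level(original, altered))   # line
```

Returning at the first mismatch mirrors the abstract's point that lower-level comparisons can be skipped once a higher-level alteration is found.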
-
Patent number: 7988297
Abstract: Non-rigidly coupled, overlapping, non-feedback optical systems for spatial filtering of Fourier transform optical patterns and image shape characterization comprises a first optical subsystem that includes a lens for focusing a polarized, coherent beam to a focal point, an image input device that spatially modulates phase positioned between the lens and the focal point, and a spatial filter at the Fourier transform pattern, and a second optical subsystem overlapping the first optical subsystem includes a projection lens and a detector. The second optical subsystem is optically coupled to the first optical subsystem.
Type: Grant
Filed: October 19, 2007
Date of Patent: August 2, 2011
Assignee: Look Dynamics, Inc.
Inventor: Rikk Crill
-
Publication number: 20110182513
Abstract: Locations of word images corresponding to words in a document image are ascertained. The word images are grouped into clusters. For each of multiple clusters, a respective compressed word image cluster is determined based on a joint compression of respective ones of the word images that are grouped into the cluster. The positions of the word images in the document image are associated with the respective ones of the compressed word image clusters corresponding to the clusters respectively containing the word images.
Type: Application
Filed: January 26, 2010
Publication date: July 28, 2011
Inventors: Kave Eshghi, George Forman, Prakash Reddy
-
Patent number: 7982922
Abstract: According to the present invention, an image processing apparatus comprises a scanning unit that converts an original image into image data; an extraction unit that extracts an area that contains characters of every character size from the image data scanned by the scanning unit; and a display unit that displays images of the area that contains characters extracted by the extraction unit at a plurality of resolutions.
Type: Grant
Filed: August 16, 2005
Date of Patent: July 19, 2011
Assignee: Canon Kabushiki Kaisha
Inventor: Junichi Takano
-
Publication number: 20110170777
Abstract: Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. Frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis.
Type: Application
Filed: December 31, 2010
Publication date: July 14, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Takeshi Inagaki
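Only the final step, counting per time period how many documents contain each keyword, is sketched below in Python; the clustering or classification stage is omitted. The `(period, text)` input format, lowercase keywords, and whitespace tokenization are assumptions for illustration.

```python
from collections import Counter, defaultdict


def keyword_time_series(documents, keywords):
    """Count, per time period, how many documents mention each keyword.

    documents: iterable of (period, text) pairs; period is any sortable label
        such as "2010-05" (assumed format).
    keywords: lowercase keywords matched against whitespace-separated tokens.
    Returns {keyword: [(period, count), ...]} ordered by period.
    """
    counts = defaultdict(Counter)
    periods = set()
    for period, text in documents:
        periods.add(period)
        words = set(text.lower().split())  # each document counts at most once
        for kw in keywords:
            if kw in words:
                counts[kw][period] += 1
    ordered = sorted(periods)
    return {kw: [(p, counts[kw][p]) for p in ordered] for kw in keywords}


docs = [("2010-01", "Brake noise reported"),
        ("2010-01", "engine stall after start"),
        ("2010-02", "brake failure on hill")]
print(keyword_time_series(docs, ["brake", "engine"]))
# {'brake': [('2010-01', 1), ('2010-02', 1)], 'engine': [('2010-01', 1), ('2010-02', 0)]}
```

Periods in which a keyword never appears still get an explicit zero, which keeps the series aligned across keywords for later trend comparison.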
-
Publication number: 20110150335
Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
Type: Application
Filed: February 21, 2011
Publication date: June 23, 2011
Applicant: GOOGLE INC.
Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
-
Patent number: 7949155
Abstract: An image processing apparatus includes an image memory that stores an image; a character recognition rate acquisition unit that segments the image stored in the image memory into a plurality of partial images and acquires a character recognition rate for each partial image; an image quality assessment unit that calculates a parameter indicating the image quality of the image based on the character recognition rates of the plural partial images acquired by the character recognition rate acquisition unit; and an output unit that outputs assessment results obtained by the image quality assessment unit.
Type: Grant
Filed: September 12, 2005
Date of Patent: May 24, 2011
Assignee: Fuji Xerox Co., Ltd.
Inventor: Shunichi Kimura
-
Patent number: 7949187
Abstract: A character string recognition method for recognizing a character string may include a first step in which first projection data of image data are calculated in the direction of the character string, and a second step in which the position of the character string is detected on the basis of the first projection data. In the first step, the image data are divided into a plurality of segments in the direction of the character string and the projection is calculated for each segment. The method may further include a third step in which second projection data are calculated within the segment on the basis of the position of the character string, and a fourth step in which a position where the second projection data exceed a threshold value is detected as a boundary position of a character; the threshold value may be changed according to the number of pixels between both ends of the character string.
Type: Grant
Filed: March 29, 2007
Date of Patent: May 24, 2011
Assignee: NIDEC Sankyo Corporation
Inventor: Hiroshi Nakamura
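The two projection steps can be illustrated with a minimal Python sketch over a binary image (1 = ink). It makes simplifying assumptions: the first projection is taken over the whole image rather than per segment, the second projection counts background pixels per column so that "exceeds the threshold" marks a gap, and the default threshold (only fully blank columns qualify) is a hypothetical choice instead of the length-dependent rule the abstract mentions.

```python
def string_band(image):
    """First projection: ink count per row; the string spans rows that contain ink."""
    rows = [i for i, row in enumerate(image) if sum(row) > 0]
    return (rows[0], rows[-1] + 1) if rows else None


def character_boundaries(image, threshold=None):
    """Second projection: count background pixels per column inside the band and
    report columns where that count exceeds `threshold` as boundary columns."""
    band = string_band(image)
    if band is None:
        return []
    top, bottom = band
    if threshold is None:
        threshold = (bottom - top) - 1  # assumed default: only fully blank columns
    boundaries = []
    for x in range(len(image[0])):
        background = sum(1 for y in range(top, bottom) if image[y][x] == 0)
        if background > threshold:
            boundaries.append(x)
    return boundaries


img = [[0, 0, 0, 0, 0],
       [1, 0, 1, 1, 0],
       [1, 0, 0, 1, 0],
       [0, 0, 0, 0, 0]]
print(string_band(img))           # (1, 3)
print(character_boundaries(img))  # [1, 4]
```

Restricting the second projection to the detected band keeps noise above and below the string from hiding the gaps between characters.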
-
Publication number: 20110116715
Abstract: A computer-implemented system and method for retrieving a digital image through document image decomposition is provided. A stored digital image is retrieved. Generic visual features are extracted. The features are grouped into a primitive layer including word-graphs that each include words and features. The words are grouped into a layout layer including zone hypotheses that each include one or more of the words. Causal dependencies between the word-graphs and the zone hypotheses are expressed through zone models that include a joint probability defining a pair of probabilistic models generated through a learned binary edge classifier. Each pair of probabilistic models is expressed as an optimal set selection problem including a set of cost functions and constraints. The optimal set selection problem is evaluated through a heuristic search of the cost functions and constraints and a non-overlapping optimal set of the zone hypotheses is provided that characterize the stored digital image.
Type: Application
Filed: January 24, 2011
Publication date: May 19, 2011
Applicant: PALO ALTO RESEARCH CENTER INCORPORATED
Inventors: Yizhou Wang, Dashan Gao, Haitham Hindi, Minh Binh Do
-
Patent number: 7916972
Abstract: A form reader includes a landmarks extractor configured to select textboxes of a converted document as form landmarks based on textual characteristics. A set of positional constraints constrain the form entries relative to the identified form landmarks. A constraints solver selects textboxes of the converted document as form entries by solving the set of positional constraints respective to a set of facts including the selected form landmarks and converted document. In some embodiments, the constraints solver includes a query engine configured to (i) construct a query in a logic programming language setting forth the set of positional constraints and the set of facts and to (ii) input said query to a logic programming language query solving engine and to (iii) receive a response from the query solving engine responsive to the input.
Type: Grant
Filed: July 31, 2006
Date of Patent: March 29, 2011
Assignee: Xerox Corporation
Inventor: Jean-Luc Meunier
-
Patent number: 7912286
Abstract: A method of labeling image data includes reading the image data sequentially in units of two successive pixels and providing one label to a target unit of two successive pixels in the image data when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit. An image processing apparatus includes a memory configured to store image data, a processor configured to process the image data in units of two successive pixels and to provide one label to a target unit of two successive pixels when a preliminary label is to be assigned to at least one of the two successive pixels of the target unit, and a memory controller arranged between the memory and the processor and configured to control reading and writing of the image data.
Type: Grant
Filed: May 10, 2006
Date of Patent: March 22, 2011
Assignee: Ricoh Company, Ltd.
Inventors: Tomoaki Ozaki, Shinichi Yamaura
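A software analogue of labeling in two-pixel units is sketched below. It implements ordinary two-pass, 8-connected labeling with union-find, but scans each row in units of two successive pixels and gives one provisional label to the whole unit. This is an illustrative sketch, not the patent's hardware pipeline or memory-controller design. Giving a single label to a unit is safe here because, when both pixels of a unit are foreground, they are adjacent and must belong to the same component.

```python
def label_pairs(image):
    """Two-pass 8-connected labeling that scans each row in units of two pixels.

    image: list of equal-length rows of 0/1 values. Returns a label grid in which
    0 is background and components are numbered 1, 2, ... in scan order.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # union-find over provisional labels; index 0 unused

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for y in range(height):
        for x0 in range(0, width, 2):
            unit = [x for x in (x0, x0 + 1) if x < width and image[y][x]]
            if not unit:
                continue
            neighbours = set()
            for x in unit:  # already-scanned 8-neighbours: left and the row above
                for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                    ny, nx = y + dy, x + dx
                    if ny >= 0 and 0 <= nx < width and labels[ny][nx]:
                        neighbours.add(find(labels[ny][nx]))
            if neighbours:
                label = min(neighbours)
                for root in neighbours:  # merge every touching component
                    parent[root] = label
            else:
                label = len(parent)
                parent.append(label)
            for x in unit:
                labels[y][x] = label  # one provisional label for the whole unit

    remap = {}  # second pass: resolve merges and renumber consecutively
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = remap.setdefault(root, len(remap) + 1)
    return labels


img = [[1, 0, 1],
       [1, 1, 1],
       [0, 0, 0],
       [0, 1, 1]]
print(label_pairs(img))  # [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 2, 2]]
```

Processing two pixels per step halves the number of label decisions per row, which is the kind of throughput gain a two-pixel scheme targets.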
-
Patent number: 7903881
Abstract: An image processing device is structured such that an appropriate judgement of an image, at which blurring or disappearance or the like will occur, is possible. When pixels, which form a line image at which there is the possibility that blurring or disappearance will occur at the time of printing by using a printing plate, are extracted, a line image warning function gives notice by displaying a warning message on a monitor of a client terminal. Thereafter, image converting and print setting are carried out such that an extracted line image is clarified. In this way, when a proof is prepared, an image, at which there is the possibility that blurring or disappearance will occur on a printed matter obtained by using a printing plate, is clarified, and appropriate proofing is possible.
Type: Grant
Filed: October 9, 2008
Date of Patent: March 8, 2011
Assignee: Fuji Xerox Co., Ltd.
Inventors: Ryuichi Ishizuka, Mari Kodama, Yasushi Nishide
-
Patent number: 7894670
Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
Type: Grant
Filed: August 10, 2009
Date of Patent: February 22, 2011
Assignee: Exbiblio B.V.
Inventors: Martin Towle King, Dale L. Grover, Clifford A. Kushler, James Quentin Stafford-Fraser
-
Patent number: 7873216
Abstract: Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. In embodiments, the image is segmented into foreground and background pixels. Foreground pixels may be grouped into “letters.” Using index-based searching, “letters” may be grouped into “words” and “words” may be grouped into baselines. One or more dominant baselines may be selected and the characteristics of the dominant baseline or baselines may be used to unwarp the image.
Type: Grant
Filed: February 27, 2007
Date of Patent: January 18, 2011
Assignee: Seiko Epson Corporation
Inventors: Ali Zandifar, Anoop K. Bhattacharjya
-
Publication number: 20110007366
Abstract: Methods and systems classify markings on images in a document according to marking types. The document containing the images is supplied to a segmenter, which breaks the images into fragments of foreground pixel structures that are identified as likely to be of the same marking type by finding connected components, extracting near-horizontal or near-vertical rule lines, and subdividing some connected components to obtain the fragments. The fragments are then supplied to a classifier, which provides a category score for each fragment; the classifier is trained from groundtruth images whose pixels are labeled according to known marking types. Thereafter, the same label is assigned to all pixels in a particular fragment when the fragment is classified by the classifier.
Type: Application
Filed: July 10, 2009
Publication date: January 13, 2011
Applicant: Palo Alto Research Center Incorporated
Inventors: Prateek Sarkar, Eric Saund
-
Publication number: 20100321714
Abstract: A computer-implemented method of scanning a document (e.g. a newspaper or a book) is provided where the text may be legally protected from unauthorized copying, comprising the steps of: acquiring to a memory at least one recording confined to a field that covers a delimited area of a document; processing the at least one recording to perform character recognition; when a character is recognized, registering it in a memory; performing the above steps repeatedly while recording at shifted positions so as to progressively obtain a string of characters; and evaluating the string against a predefined condition. If the condition is not satisfied, the method determines whether to clear from the memory at least a portion of the at least one recording; if the condition is satisfied, it provides an output and clears from the memory at least a portion of the string and at least a portion of the at least one recording.
Type: Application
Filed: March 5, 2009
Publication date: December 23, 2010
Applicant: Jala ApS
Inventors: Lars Stig Nielsen, Jacob Meibom
-
Patent number: 7848572
Abstract: An image processing method according to the present invention includes extracting from a document image an area to be determined, calculating the number of closed loops within the extracted area, and making a determination based on the calculated number of closed loops, whether the area is a character area. This invention makes it possible to determine with a high accuracy whether an area to be determined is a character area.
Type: Grant
Filed: April 19, 2006
Date of Patent: December 7, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Reiji Misawa
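One common way to count closed loops in a binary area is to count background components that do not touch the area's border. The Python sketch below does that with a breadth-first flood fill, using 4-connectivity for background (the usual pairing with 8-connected ink). How the count is turned into a character/non-character decision is not stated in the abstract, so no decision rule is shown.

```python
from collections import deque


def count_closed_loops(region):
    """Count holes: 4-connected background components that do not touch the border.

    region: list of equal-length rows, 1 = ink, 0 = background.
    """
    h, w = len(region), len(region[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sy, sx):
        """Mark the background component containing (sy, sx); report border contact."""
        queue = deque([(sy, sx)])
        seen[sy][sx] = True
        touches_border = False
        while queue:
            y, x = queue.popleft()
            if y in (0, h - 1) or x in (0, w - 1):
                touches_border = True
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and region[ny][nx] == 0:
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return touches_border

    loops = 0
    for y in range(h):
        for x in range(w):
            if region[y][x] == 0 and not seen[y][x] and not flood(y, x):
                loops += 1  # enclosed background: one closed loop
    return loops


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
eight = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
print(count_closed_loops(ring), count_closed_loops(eight))  # 1 2
```

Characters such as "o", "a", or "8" contribute small, regular loop counts, whereas photographs and halftone areas tend to produce many irregular enclosed regions, which is the intuition behind using loop counts to identify character areas.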
-
Publication number: 20100278427
Abstract: The present invention provides a method and system for text processing. The method comprises determining at least some of the characters in a text; dividing the text into a plurality of text segments, using those characters as separators; and decoding each of the text segments separately.
Type: Application
Filed: April 29, 2010
Publication date: November 4, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: BIN LI, LI QUN PANG, ZHI QIANG SHA, ZHI BO ZUO
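The split-then-decode flow can be sketched as follows. This is only an illustration under stated assumptions: the separators are single characters that are dropped from the output, empty pieces are discarded, and `decode` is a hypothetical caller-supplied function, since the abstract does not specify what kind of decoding is applied.

```python
def split_on_separators(text, separators):
    """Split text at any character in `separators`, dropping the separators and empty pieces."""
    segments, current = [], []
    for ch in text:
        if ch in separators:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def decode_segments(text, separators, decode):
    """Apply a caller-supplied decode() to each segment independently."""
    return [decode(segment) for segment in split_on_separators(text, separators)]


print(decode_segments("ab,cd;;ef", {",", ";"}, str.upper))  # ['AB', 'CD', 'EF']
```

Decoding segments independently means an error in one segment cannot propagate into its neighbors.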
-
Publication number: 20100278428
Abstract: There is provided an apparatus including a model-based topic segmentation section that segments a text using a topic model representing semantic coherence; a parameter estimation section that, using the result of the model-based topic segmentation section as training data, estimates a control parameter for segmenting the text based on detection of change points in the word distribution of the text; and a change-point-detection topic segmentation section that segments the text based on detection of change points in the word distribution, using the parameter estimated by the parameter estimation section (FIG. 1).
Type: Application
Filed: December 25, 2008
Publication date: November 4, 2010
Inventors: Makoto Terao, Takafumi Koshinaka
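The following is a rough sketch of the second and third stages. It uses a swapped-in technique: a sliding-window lexical-cohesion detector in the spirit of TextTiling, not the application's method. A boundary is placed where the cosine similarity of word counts in adjacent windows drops below a threshold, and that threshold plays the role of the control parameter. It is chosen by agreement with `reference` boundaries, which stand in for the output of a topic-model segmenter (not implemented here). The window size, scoring rule, and candidate list are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def change_points(sentences, window, threshold):
    """Indices i such that a segment boundary is placed before sentences[i]."""
    bounds = []
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in s.split())
        right = Counter(w for s in sentences[i:i + window] for w in s.split())
        if cosine(left, right) < threshold:
            bounds.append(i)
    return bounds


def estimate_threshold(sentences, window, reference, candidates):
    """Choose the candidate threshold whose boundaries best agree with reference
    boundaries (e.g. the output of a topic-model segmenter used as training data)."""
    ref = set(reference)

    def agreement(t):
        found = set(change_points(sentences, window, t))
        return len(found & ref) - len(found - ref)  # hits minus false alarms

    return max(candidates, key=agreement)


sentences = ["cats purr", "cats meow", "stocks fell", "stocks rose"]
t = estimate_threshold(sentences, 1, reference=[2], candidates=[0.3, 0.6])
print(t, change_points(sentences, 1, t))  # 0.3 [2]
```

The appeal of this split is that the slow model-based segmenter is only needed to tune a parameter, while the cheap change-point detector does the actual segmentation.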
-
Patent number: 7813550
Abstract: The object of this invention is to reduce the effort of deleting an information symbol from a read image. To accomplish this, an image of a document containing an information symbol is read (S100), and the information symbol is identified in the read image (S130). The identified information symbol is decoded (S150), and it is determined from the decoding result whether the data format of the information symbol is the desired one (S160). If it is, the information symbol is deleted from the read image (S170).
Type: Grant
Filed: August 24, 2006
Date of Patent: October 12, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Yasuo Komada
-
Patent number: 7810026
Abstract: A method is presented for optimizing a source document that comprises a plurality of pages of content. A source document is obtained, and an optimized document corresponding to it is created. Then, for each page in the source document, the following steps are applied. A page record is created for the page. Each page record comprises a word table listing the page's words in the order in which they appear in the page's content, and a paragraph entry list containing one paragraph entry for each paragraph on the page. Each paragraph entry references the first and last words of that paragraph in the word table. The page record is compressed using a compression technique, and the compressed page record is stored in the optimized document.
Type: Grant
Filed: September 29, 2006
Date of Patent: October 5, 2010
Assignee: Amazon Technologies, Inc.
Inventors: Joshua Shagam, Robert L Goodwin
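The page-record structure lends itself to a small sketch. The word table is a flat list, and each paragraph entry stores inclusive first/last indices into it. The serialization (JSON) and compression technique (zlib) are assumptions, since the abstract names neither, and the input format (a page as a list of paragraph strings) is hypothetical.

```python
import json
import zlib
from dataclasses import dataclass, asdict


@dataclass
class PageRecord:
    words: list       # word table: the page's words in reading order
    paragraphs: list  # paragraph entries: (first_word_index, last_word_index), inclusive


def build_page_record(paragraph_texts):
    """Build a page record from a page given as a list of paragraph strings."""
    words, paragraphs = [], []
    for text in paragraph_texts:
        tokens = text.split()
        if not tokens:
            continue  # skip empty paragraphs
        first = len(words)
        words.extend(tokens)
        paragraphs.append((first, len(words) - 1))
    return PageRecord(words, paragraphs)


def compress_record(record):
    """Serialize to JSON, then compress with zlib (assumed technique)."""
    return zlib.compress(json.dumps(asdict(record)).encode("utf-8"))


def decompress_record(blob):
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return PageRecord(data["words"], [tuple(p) for p in data["paragraphs"]])


def optimize_document(pages):
    """Return the optimized document as a list of compressed page records."""
    return [compress_record(build_page_record(page)) for page in pages]


rec = build_page_record(["Hello world", "Second para here"])
print(rec.paragraphs)  # [(0, 1), (2, 4)]
```

Storing index pairs instead of paragraph text avoids duplicating words and makes it cheap to recover a paragraph as a slice of the word table.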
-
Patent number: 7805022
Abstract: The present invention enables a digital image printer to produce a thumbnail display representing the outline of input images. It is determined whether an image is of a first kind or a second kind, and if it is of the first kind, a characteristic part of the image is enlarged in the thumbnail display to make its contents more understandable. Likewise, it can be determined whether an image is a character image or a gradation image; if it is a character image, part of it is enlarged in the thumbnail display to make the characters more legible.
Type: Grant
Filed: August 24, 2004
Date of Patent: September 28, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Mamoru Tanaka
-
Patent number: 7796281
Abstract: In an image forming system in which a printing device is communicably connected to a server and a terminal, the system automatically determines whether print data created by the terminal needs to be stored in a memory so that it can be reprinted. Print data stored in the memory can be reprinted without resending it from the terminal or server to the printing device. The determination is made, for example, by referring to the header of the print data and determining whether the data came from the terminal or from the server.
Type: Grant
Filed: January 21, 2005
Date of Patent: September 14, 2010
Assignee: Brother Kogyo Kabushiki Kaisha
Inventor: Toru Tsuzuki
-
Publication number: 20100215270
Abstract: A system for performing an automated network-based login procedure on an interactive keypad image includes two parts. The first is a software agent, executable from a digital medium connected to the network, for navigating to a login page, accessing the keypad image, and performing an automated login. The second is an automated login support application, executable from the same or a different digital medium connected to the network, which includes at least an image processor, an optical character recognizer, and an image data encoder and decoder. The software agent performs the login on the virtual keypad image based on character image matching and on location information acquired for each character of the client's specific set of credential characters shown in the keypad image.
Type: Application
Filed: February 26, 2009
Publication date: August 26, 2010
Inventors: Pradheesh Manohar, Prashant Nalwaya, Prashant Kumar Agrawal
-
Publication number: 20100208996
Abstract: A system that extracts text from an image includes a capture device that captures a low-resolution image. An image segmentation subsystem partitions the image into image segments. An image restoration subsystem generates a resolution-expanded image from the image segments and counteracts the degradation of the low-resolution image by transforming the image segments from a first domain to a second domain and deconvolving the transformed segments in the second domain to determine parameters of the low-resolution image. A text recognition subsystem transforms the restored image data into computer-readable text based on the determined parameters.
Type: Application
Filed: October 6, 2008
Publication date: August 19, 2010
Applicant: TUFTS UNIVERSITY
Inventors: Joseph P. Noonan, Prabahan Basu
-
Publication number: 20100189352
Abstract: A method for classifying an input character is disclosed. It uses character models, each of which is associated with an output character and defines a model-specific segmentation scheme for that output character and an associated segment model. The model-specific segmentation scheme defines a minimum length, corresponding to a number of points in a stroke of the output character, and a minimum length threshold. Using each of the character models, the input character is decomposed into segments, and the segments are evaluated against that model's segment model to produce a score indicating how well the segments conform to it. The character model that produces the highest score is selected, and the input character is classified as the output character associated with that model.
Type: Application
Filed: March 30, 2010
Publication date: July 29, 2010
Inventor: Jonathon Leigh Napper
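The "segment with every model, score, keep the best" loop can be sketched as below. Everything model-specific here is hypothetical. Each model cuts the stroke into chunks of `min_points` points, and its segment model is simply a list of expected compass directions. The score is the fraction of segments whose direction matches. The abstract's minimum length threshold is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class CharacterModel:
    output_char: str
    min_points: int   # hypothetical model-specific segmentation parameter (>= 2)
    directions: list  # hypothetical segment model: expected direction per segment


def direction(p, q):
    """Quantize the vector from p to q into 'R', 'U', 'L' or 'D' (y axis pointing up)."""
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360
    return "RULD"[int(((angle + 45) % 360) // 90)]


def segment(stroke, min_points):
    """Cut a stroke into consecutive chunks of min_points points; the last chunk absorbs the rest."""
    n = len(stroke) // min_points
    if n == 0:
        return []
    head = [stroke[i * min_points:(i + 1) * min_points] for i in range(n - 1)]
    return head + [stroke[(n - 1) * min_points:]]


def score(stroke, model):
    """Fraction of segments whose direction matches the model's expected direction."""
    dirs = [direction(seg[0], seg[-1]) for seg in segment(stroke, model.min_points)]
    expected = model.directions
    if not dirs or not expected:
        return 0.0
    matches = sum(d == e for d, e in zip(dirs, expected))
    return matches / max(len(dirs), len(expected))


def classify(stroke, models):
    """Return the output character of the highest-scoring model."""
    return max(models, key=lambda m: score(stroke, m)).output_char


stroke_L = [(0, 4), (0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (3, 0)]
models = [CharacterModel("7", 4, ["R", "D"]),
          CharacterModel("L", 4, ["D", "R"]),
          CharacterModel("I", 4, ["D"])]
print(classify(stroke_L, models))  # L
```

Because each model segments the input its own way, a model is judged on the decomposition that suits it, rather than on one shared segmentation.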
-
Patent number: 7765170
Abstract: A method for segmenting a data set is disclosed. The method consists of setting a maximum walker size and setting a walker size. Then, a first segment of data from the data set is obtained, wherein the first segment of data is the size of the walker. Then, a second segment of data from the data set is obtained, wherein the second segment of data is not greater than the maximum walker size.
Type: Grant
Filed: July 11, 2006
Date of Patent: July 27, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventor: Michael David Hall
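One plausible reading of the walker scheme is sketched below. The first segment is exactly `walker_size` items, and every later segment is capped at `max_walker_size`. How the walker size evolves between segments is not stated in the abstract, so the doubling used here is a hypothetical growth policy chosen only for illustration.

```python
def walk_segments(data, walker_size, max_walker_size):
    """Yield consecutive segments of `data`.

    The first segment has walker_size items (fewer only if data is shorter);
    every later segment has at most max_walker_size items.
    """
    if not 0 < walker_size <= max_walker_size:
        raise ValueError("need 0 < walker_size <= max_walker_size")
    pos, size = 0, walker_size
    while pos < len(data):
        yield data[pos:pos + size]
        pos += size
        size = min(size * 2, max_walker_size)  # hypothetical growth policy


print(list(walk_segments(list(range(10)), 2, 3)))  # [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]]
```

As a generator, it works on any sliceable sequence (lists, bytes, strings) without copying the whole data set up front.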
-
Publication number: 20100177964
Abstract: A system for processing text captured from rendered documents is described. The system receives a sequence of one or more words optically or acoustically captured from a rendered document by a user. The system identifies among words of the sequence a word with which an action has been associated. The system then performs the associated action with respect to the user.
Type: Application
Filed: August 10, 2009
Publication date: July 15, 2010
Applicant: Exbiblio B.V.
Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushler, James Q. Stafford-Fraser
-
Patent number: 7756340
Abstract: Methods and apparatus for detecting the presence of combs, determining their shape and removing the combs from a scanned form in an automated manner are described. Horizontal and vertical line feature analysis is combined with knowledge of the usual size, shape, and spacing characteristics of lines which form a comb. Vertical and horizontal lines failing to meet certain characteristics, e.g., size or shape characteristics, are eliminated from consideration. Vertical lines which do not intersect a horizontal line are also eliminated from consideration. Confidence measures for different possible comb shapes are generated and the most probable comb shapes as indicated by the confidence measures are included in a comb list. The comb list may be output for use in further processing, e.g., comb removal and/or data extraction processing.
Type: Grant
Filed: July 11, 2006
Date of Patent: July 13, 2010
Assignee: Pegasus Imaging Corporation
Inventor: M. Scot Alexander
-
Patent number: 7751087
Abstract: Embodiments herein include a method of adding color to a monochrome (single-color printing) document. The method begins by inputting or creating colorization rules relating to the previously printed monochromatic document and scanning that document to locate rasterized data. After scanning, the method performs optical character recognition on the rasterized data to search for text corresponding to the previously printed monochromatic document. Once the rules are input and the rasterized data is produced, the method automatically colorizes portions of the rasterized content according to the colorization rules, generating a colorized electronic document.
Type: Grant
Filed: April 3, 2007
Date of Patent: July 6, 2010
Assignee: Xerox Corporation
Inventors: Javier A. Morales, Arlene Buck, Michael E. Farrell
-
Publication number: 20100166307
Abstract: One embodiment of the present invention provides a system that removes noise from an image. During operation, the system first identifies blobs in the image, where a blob is a set of contiguous pixels that may represent a character or a portion of a character. Next, the system analyzes the blobs to dynamically determine a “noise threshold” for them. The system then removes from the image the blobs that fall below the noise threshold.
Type: Application
Filed: December 28, 2009
Publication date: July 1, 2010
Inventor: Dennis G. Nicholson
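Below is a minimal sketch of blob-based despeckling on a 0/1 grid. Blobs are 8-connected foreground regions. The "dynamic" threshold here is a fixed fraction of the median blob size, which is an assumption for illustration; the abstract does not say how the application derives its noise threshold.

```python
from statistics import median


def find_blobs(img):
    """Return a list of (row, col) pixel lists, one per 8-connected foreground blob."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                stack, blob = [(y, x)], []
                while stack:  # depth-first flood fill
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                        for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                blobs.append(blob)
    return blobs


def remove_noise(img, fraction=0.5):
    """Erase blobs smaller than fraction * median blob size; return (cleaned image, threshold)."""
    cleaned = [row[:] for row in img]  # leave the input image untouched
    blobs = find_blobs(img)
    if not blobs:
        return cleaned, 0.0
    threshold = fraction * median([len(b) for b in blobs])  # assumed threshold rule
    for blob in blobs:
        if len(blob) < threshold:
            for y, x in blob:
                cleaned[y][x] = 0
    return cleaned, threshold


img = [[1, 1, 0, 0, 1, 1],
       [1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0]]
cleaned, threshold = remove_noise(img)
print(threshold, cleaned[3])  # 2.0 [0, 0, 0, 0, 0, 0]
```

Tying the threshold to the blob-size distribution of the page itself, instead of using a fixed pixel count, lets the same code adapt to different scan resolutions and font sizes.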