Segmenting Individual Characters Or Words Patents (Class 382/177)
  • Patent number: 7047014
    Abstract: Methods, apparatuses and systems directed to optimizing vector models for use in modeling RF propagation in desired physical environments. In one embodiment, the present invention can operate on pre-existing vector models. In other implementations, the present invention facilitates the conversion of raster images of buildings and other physical locations to vector formats for use in connection with the computational modeling of radio-frequency (RF) propagation. According to certain embodiments, the present invention is implemented within the context of a location diagram editing application that supports a line recognition filter, a snap filter and a merge filter which a user may individually select and configure. As discussed more fully below, the line recognition filter operates on the vector objects to adjust near-vertical lines (as defined by a configurable threshold angle) to vertical, and near-horizontal lines to horizontal.
    Type: Grant
    Filed: November 5, 2004
    Date of Patent: May 16, 2006
    Assignee: Airespace, Inc.
    Inventors: Robert J. Friday, Paul F. Dietrich, Gregg Scott Davi
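A minimal sketch of the line recognition filter described in the abstract above: a segment whose angle lies within a configurable threshold of vertical or horizontal is snapped to exact alignment. The function name `snap_segment`, the default threshold of 5 degrees, and the choice to average the two endpoint coordinates are illustrative assumptions, not details taken from the patent.

```python
import math


def snap_segment(p0, p1, threshold_deg=5.0):
    """Snap a near-vertical or near-horizontal segment to exact alignment.

    p0 and p1 are (x, y) tuples. threshold_deg is a hypothetical
    configurable threshold angle, in degrees.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return p0, p1  # degenerate segment, nothing to adjust

    # Angle measured from the horizontal axis, folded into [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))

    if angle <= threshold_deg:
        # Near-horizontal: give both endpoints the mean y (an assumption).
        y = (y0 + y1) / 2
        return (x0, y), (x1, y)
    if 90.0 - angle <= threshold_deg:
        # Near-vertical: give both endpoints the mean x (an assumption).
        x = (x0 + x1) / 2
        return (x, y0), (x, y1)
    return p0, p1  # neither case applies; leave the segment unchanged
```

For example, `snap_segment((0, 0), (10, 0.2))` yields a horizontal segment, while a 45-degree diagonal is returned unchanged.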
  • Patent number: 7035463
    Abstract: A document image processing device and method for extracting a title region and a mark attached by the user from a document image to use them as document tag information. A region with a region average character size larger than a predetermined extraction judging value is extracted as a title region by title region extracting means. As a result, title regions can be extracted from one document image. A mark that the user makes on an input image is extracted by mark extracting means, and a characteristic value of the mark is found by calculating means. Document tag information to be imparted to the input image is selected from reference tag information according to the characteristic value and the attribute value of the reference tag information imparting means. Thus, document tag information is automatically imparted to a document image.
    Type: Grant
    Filed: February 29, 2000
    Date of Patent: April 25, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yusuke Monobe, Atsushi Hirose, Akito Umebayashi
  • Patent number: 6999618
    Abstract: An object extraction device. In an exemplary embodiment, a first object extraction calculating device finds an object extraction image by employing object extraction calculations for extraction of an object by using a predetermined first calculation parameter on photographed images having a parallax with respect to the object. An incorrect outline extraction processor extracts an outline from the object extraction image and extracts an incorrect outline segment from the extracted outline. A recalculated region determining device determines as a recalculated region a partial region that includes the incorrect outline segment.
    Type: Grant
    Filed: February 13, 2001
    Date of Patent: February 14, 2006
    Assignee: NEC Corporation
    Inventor: Hiroshi Ohta
  • Patent number: 6993184
    Abstract: This invention provides an object extraction method for performing processing for extracting and cutting out a specific object from a sensed image at high speed, and an image sensing apparatus using the method. In this invention, in a method of extracting an object by comparing a sensed image and a standard image, a focusing signal, focal length data, visual axis direction data, and illumination conditions are detected, and the initial size, initial position, or initial color of the standard image is changed on the basis of the detection results, and extraction is started under optimal conditions. In a method of extracting a specific object from the background image, the background image is converted into an image having the same conditions as those of the object image. From a plurality of images obtained under different image sensing conditions, the contour of the object is accurately obtained at high speed.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: January 31, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masakazu Matsugu
  • Patent number: 6983071
    Abstract: Image size converter 4 converts the size of the image data stored in image input part 1 to an arbitrary size and stores the converted data. Image enhancer 5 uses the character frame design data stored in character frame information memory 3 to extract, from the image stored in image size converter 4, an image of a region containing character frames, and enhances and stores this extracted image. Image outline detector 6 forms an outline image from the image obtained by image enhancer 5. Character frame center detector 7 uses the outline image to detect the coordinates of the centers of the character frames of the input image data. Character frame remover 8 uses the character frame center coordinates and the character frame design data to remove the character frames, and outputs the result from character image output part 9.
    Type: Grant
    Filed: May 14, 2002
    Date of Patent: January 3, 2006
    Assignee: NEC Corporation
    Inventor: Daisuke Nishiwaki
  • Patent number: 6961464
    Abstract: Character (or letter) information is extracted from source information, word information is extracted from the character information, and a database is created of the word information. Thereby, the created database is adapted for the technical field of the user or a field of interest to the user.
    Type: Grant
    Filed: October 22, 2001
    Date of Patent: November 1, 2005
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Hidetaka Magoshi, Nobuo Sasaki
  • Patent number: 6927774
    Abstract: A character display device and method therefor are adapted to obtain a proximal reference point of each character comprising a character series and calculate display coordinates of each character from said proximal reference point and the display angle and display reference position of the character series.
    Type: Grant
    Filed: December 8, 2000
    Date of Patent: August 9, 2005
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Fumiko Yano
  • Patent number: 6920247
    Abstract: The present invention is a method for recognizing non-English alpha characters that contain diacritics. An image analysis separates the character into its constituent components. The one or more diacritic components are then distinguished and isolated from the base portion of the character. Optical recognition is performed separately on the base portion. The diacritic is recognized through a special image analysis and pattern recognition algorithms. The image analysis extracts geometric information from the one or more diacritic components. The extracted information is used as input for the pattern recognition algorithms. The output is a code that corresponds to a particular diacritic. The recognized base portion and diacritic are combined and a check is performed for acceptable combinations in a chosen language. By separately recognizing the base portion and diacritic, the character sets used by the recognizer can be narrowed, resulting in greater recognition.
    Type: Grant
    Filed: November 1, 2000
    Date of Patent: July 19, 2005
    Assignee: Cardiff Software, Inc.
    Inventors: Isaac Mayzlin, Emily Ann Deere
  • Patent number: 6882743
    Abstract: There is provided a method for automatically segmenting lung nodules in a three-dimensional (3D) Computed Tomography (CT) volume dataset. An input is received corresponding to a user-selected point near a boundary of a nodule. A model is constructed of the nodule from the user-selected point, the model being a deformable circle having a set of parameters that represent a shape of the nodule. Continuous parts of the boundary and discontinuities of the boundary are estimated until the set of parameters converges, using dynamic programming and Expectation Maximization (EM). The nodule is segmented, based on estimates of the continuous parts of the boundary and the discontinuities of the boundary.
    Type: Grant
    Filed: November 29, 2001
    Date of Patent: April 19, 2005
    Assignee: Siemens Corporate Research, Inc.
    Inventors: Ravi Bansal, Ning Xu
  • Patent number: 6867875
    Abstract: The user circles the fax number or name of the receiving party and the system then uses the circled number or name to dial the receiving party's fax machine to initiate the transmission. The system scans the first page or cover page of the document to be faxed and uses a computer-implemented algorithm to detect the user-circled region. The image within the user-circled region is then extracted and optical character recognition performed to ascertain the fax number. Alternatively, the user-circled region may contain a name or other information to access a database where the fax number is stored.
    Type: Grant
    Filed: December 6, 1999
    Date of Patent: March 15, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Junichi Kanai, Terry J. Nelson
  • Patent number: 6856697
    Abstract: A character reading technique recognizes character strings in grayscale images where characters within such strings have poor contrast, are variable in position or rotation with respect to other characters in the string, or where portions of characters in the string are partially obscured. The method improves classification accuracy by improving the robustness of the underlying correlation operation or the character design. Characters are divided into regions before performing correlations. Based upon the relative individual region results, region results are combined into a whole character result. Using the characters that are read, a running checksum is computed and, based upon the checksum result, characters are replaced to produce a valid result.
    Type: Grant
    Filed: December 29, 2003
    Date of Patent: February 15, 2005
    Inventors: Shih-Jong J. Lee, Louis Piloco
  • Patent number: 6853749
    Abstract: A character recognition section generates character recognition result information resulting from character recognition of image information. An image information cutout section cuts out character recognition image information, corresponding to an area as to which the character recognition is performed, from the image information. A recognition result generation section generates recognition result information which is composed of the character recognition result information and the character recognition image information. A recognition result transmission section transmits the recognition result information to other terminals using electronic mail. As a result, an information communications apparatus of the invention can make transmissions of information to the wide area, without increasing the network load, which is to be used for the determination of whether or not a character recognition has been accurately performed.
    Type: Grant
    Filed: November 29, 2001
    Date of Patent: February 8, 2005
    Assignee: Panasonic Communications Co. Ltd.
    Inventors: Shinichi Watanabe, Hideki Honma
  • Patent number: 6847734
    Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, a probability at which characteristics obtained as the result of character recognition are generated by conditioning characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which characteristics obtained as the result of character recognition are generated, and each of the division results obtained relevant to the characters of the words contained in the word dictionary is multiplied relevant to all the characters. The recognition results of the above words are obtained based on the multiplication results.
    Type: Grant
    Filed: January 26, 2001
    Date of Patent: January 25, 2005
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Tomoyuki Hamamura
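The scoring rule in the abstract above can be read as follows: for each dictionary word, the per-character ratio P(result | character) / P(result) is multiplied over all character positions, and the word with the largest product is chosen. The sketch below works in log space to avoid underflow. The table names `likelihood` and `evidence`, and the use of plain dictionaries for the probabilities, are assumptions made for illustration.

```python
import math


def word_score(results, word, likelihood, evidence):
    """Log of the product over positions of P(result | char) / P(result).

    likelihood maps (result, char) to P(result | char); evidence maps
    result to P(result). Both tables are hypothetical inputs.
    """
    if len(results) != len(word):
        return float("-inf")  # this sketch only compares equal-length strings
    score = 0.0
    for result, char in zip(results, word):
        p_cond = likelihood.get((result, char), 0.0)
        if p_cond == 0.0:
            return float("-inf")  # the word cannot produce this result
        score += math.log(p_cond / evidence[result])
    return score


def recognize(results, dictionary, likelihood, evidence):
    """Return the dictionary word with the highest score."""
    return max(dictionary,
               key=lambda w: word_score(results, w, likelihood, evidence))
```

Summing logarithms is equivalent to ranking by the product of the ratios, so the ordering of candidate words is the same as with direct multiplication.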
  • Patent number: 6798895
    Abstract: A character string extraction apparatus comprises: a connected component (CC) detector for detecting, in a binary image, connected components (CC) comprising black pixels; a character-sized connected component (CharCC) extraction unit for extracting character-sized connected components (CharCC) having an appropriate size from the detected connected components; a horizontal extension unit and a vertical extension unit for extending the extracted character-sized connected components in an assumed character string direction, and for reducing the character-sized connected components in a direction perpendicular to the assumed character string direction; long connected component (LongCC) extraction units for connecting a plurality of the thus obtained connected components in the assumed character string direction, and for extracting a long connected component; and a character string selector for employing the extracted long connected component to determine a character string for image recognition.
    Type: Grant
    Filed: October 5, 2000
    Date of Patent: September 28, 2004
    Assignee: International Business Machines Corporation
    Inventor: Hiroyasu Takahashi
  • Patent number: 6798906
    Abstract: The present invention provides an image processing apparatus and method that enables extraction of line segments of an arbitrary width from multi-valued images not uniform in background. To extract line segment data constituting a line segment, image data is scanned using a line segment basic element to extract line segment data from the image data. In other words, pixel data included in the line segment basic element is used as one unit and it is judged for each unit whether the pixel data corresponds to line segment data. Thereby, even if the densities of pixel data corresponding to, e.g., backgrounds are not uniform, by judging the line segment basic element as one unit, line segment data of a line segment width to be extracted can be extracted free of the influence of the densities being not uniform.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: September 28, 2004
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Masahiro Kato
  • Publication number: 20040179734
    Abstract: A labeling process unit groups a continuous black pixel area as one group in the binary image data read by an image input device, and extracts the group bounding rectangle information about the group. A row extracting process unit extracts row rectangle information from the position information about the extracted group bounding rectangle. An overlap integrating process unit determines the overlap between the group bounding rectangles contained in the extracted row rectangle, and performs an overlap integrating process of integrating overlapping groups into one group. The ratio of the number of group bounding rectangles contained in the row rectangle before performing the overlap integrating process to the number of the group bounding rectangles contained in the row rectangle after performing the overlap integrating process is obtained, and the language of the characters written in the original is determined based on the difference in ratio.
    Type: Application
    Filed: March 4, 2004
    Publication date: September 16, 2004
    Applicant: PFU Limited
    Inventor: Nobuyuki Okubo
  • Patent number: 6788814
    Abstract: Methods and apparatus for creating a skeletal representation (400A) of a pixel image (100) composed of connected components (110 and 120). The skeletal representation (400A) is obtained by dividing each connected component (110) into a line segment having plural slices, calculating a minimal bounding rectangle (MBR) of each line segment, replacing each line segment with a thin line approximately formed by centroid pixels of the slices (112S) inside the MBR, and connecting the resulting thin lines (410 and 420). One of the many benefits of using the disclosed methods and apparatus is that the resulting thin lined graph (400A), i.e., the skeletal representation, is isomorphic to the original pixel image (100).
    Type: Grant
    Filed: October 2, 2000
    Date of Patent: September 7, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Radovan V. Krtolica
  • Patent number: 6782509
    Abstract: A method and a system for embedding information in document data that include text written in a page description language. First, an analysis is made of the layout of the document data in which information is to be embedded. Then, based on the analysis of the layout, a sequence of locations is generated whereat the information is to be embedded. A page description of the text at a determined location is changed in accordance with the embedded information. As a result, the information is embedded in document data that include text written in a page description language. The sequence of locations is generated by producing a string of sequential pseudo-random numbers.
    Type: Grant
    Filed: September 15, 1999
    Date of Patent: August 24, 2004
    Assignee: International Business Machines Corporation
    Inventors: Yuki Hirayama, Tomio Amano, Shuichi Shimizu, Norishige Morimoto
  • Patent number: 6778690
    Abstract: A fast semi-automatic prostate contouring method is provided using model-based initialization and an efficient Discrete Dynamic Contour (DDC) for boundary refinement. The user initiates the process of the preferred embodiment by identifying four (4) points on the prostate boundary, thereby scaling and shaping a prostate model, and then the final prostate contour is refined with a DDC. The method of the present invention has particular application during the pre-implant planning phase of a brachytherapy procedure. However, this method also has uses in any phase of dose planning in the brachytherapy procedure or any other therapy approach.
    Type: Grant
    Filed: August 13, 1999
    Date of Patent: August 17, 2004
    Inventors: Hanif M. Ladak, Aaron Fenster, Donal B. Downey, David A. Steinman
  • Patent number: 6776542
    Abstract: A ticket issuing system for facilitating processing and printing of citations. The ticket issuing system includes a housing being designed for being held in a hand of a user. An information assembly is positioned in the housing. The information assembly is designed for gathering information about a vehicle and the violation. A printing assembly designed for printing the citation to be issued to the motorist. The printing assembly is operationally coupled to the information assembly whereby the printing assembly gathers information from the information assembly to print the citation.
    Type: Grant
    Filed: March 28, 2003
    Date of Patent: August 17, 2004
    Inventor: Keith Kearney
  • Publication number: 20040146200
    Abstract: A method and computer program product are provided for classifying a character string. A plurality of candidate segmentations are determined for the character string, each ranked according to an associated score. At least two of the candidate segmentations are provided to a pattern recognition classifier. The character string is classified according to the highest-ranked candidate segmentation to obtain a first classified character string. An acceptor determines if the first classified character string is a valid character string. The classifier iteratively reclassifies the character string according to the ranked candidate segmentations until a valid character string is obtained if the first classified character string is not a valid character string.
    Type: Application
    Filed: January 29, 2003
    Publication date: July 29, 2004
    Applicant: Lockheed Martin Corporation
    Inventors: Richard S. Andel, Edward G. Ovando
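The control flow in the abstract above (classify the highest-ranked candidate segmentation, then fall back to lower-ranked ones until an acceptor reports a valid string) can be sketched as below. The callables `classify` and `is_valid` stand in for the pattern recognition classifier and the acceptor; their signatures, and the `(score, segmentation)` tuple layout, are assumptions made for illustration.

```python
def classify_string(candidates, classify, is_valid):
    """Try candidate segmentations in descending score order.

    candidates: list of (score, segmentation) pairs.
    classify:   hypothetical callable mapping a segmentation to a string.
    is_valid:   hypothetical acceptor returning True for valid strings.
    Returns the first valid classified string, or None if none is valid.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for _score, segmentation in ranked:
        text = classify(segmentation)
        if is_valid(text):
            return text
    return None
```

Returning `None` when every candidate is rejected is a design choice of this sketch; the abstract does not say what happens in that case.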
  • Patent number: 6754391
    Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
    Type: Grant
    Filed: June 25, 2002
    Date of Patent: June 22, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Frank P Carau, Sr.
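The line-fitting rule in the abstract above (register a word block on the active line only if the entire block fits, otherwise advance to the next line) resembles a greedy word wrap. The sketch below assumes each word block is reduced to a pixel width; the function name `layout_word_blocks` and the width-only representation are illustrative assumptions.

```python
def layout_word_blocks(block_widths, line_width):
    """Assign word blocks to lines, never splitting a block across lines.

    block_widths: widths of the word blocks, in display order.
    line_width:   available width of each line.
    Returns a list of lines, each a list of block indices.
    """
    lines, current, used = [], [], 0
    for index, width in enumerate(block_widths):
        # Advance the active line if the whole block would not fit on it.
        if current and used + width > line_width:
            lines.append(current)
            current, used = [], 0
        current.append(index)
        used += width
    if current:
        lines.append(current)
    return lines
```

A block wider than the line is placed alone on its own line in this sketch, since there is no smaller unit to break it into.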
  • Patent number: 6754385
    Abstract: A ruled line extracting apparatus obtains circumscribed rectangles of pixel concatenation regions included in an input pattern, and calculates the most frequent value of their heights. Additionally, the apparatus integrates segments by ignoring a wild card segment, and calculates the most frequent value of height/width of extracted straight lines and segments structuring the straight line. Next, it performs a process for integrating/deleting straight lines using each threshold value based on the highest frequency value. Then, it checks/deletes a straight line according to a distribution of black pixels around the straight line, and recognizes the remaining straight lines as ruled line candidates.
    Type: Grant
    Filed: January 8, 2001
    Date of Patent: June 22, 2004
    Assignee: Fujitsu Limited
    Inventor: Yutaka Katsuyama
  • Patent number: 6754386
    Abstract: In a computing device that receives handwritten data, a method and system that corrects for parser segmentation errors by sending an entire line of ink to a recognizer, and then comparing, on a word-by-word basis, the initial segmentation guesses of the parser with the more-thoroughly recognized segmentation results of the handwriting recognition engine. In the correction process, the ink words are efficiently adjusted with relatively little data manipulation. As the recognizer is fed a series of strokes on a line, the recognizer returns segmentation information. For ink word breaks that are the same for any given set of data, the existing ink word is unchanged. For ink words that are recognized differently relative to their initial segmentation, one or more new ink words are created and the handwriting (including stroke) data of the parser's ink word is manipulated to create a new ink processor word (or words) to match the recognizer output.
    Type: Grant
    Filed: August 22, 2000
    Date of Patent: June 22, 2004
    Assignee: Microsoft Corporation
    Inventors: Peter H. Williamson, Charlton E. Lui, Dan W. Altman
  • Patent number: 6735337
    Abstract: A character reading technique recognizes character strings in grayscale images where characters within such strings have poor contrast, are variable in position or rotation with respect to other characters in the string, or where portions of characters in the string are partially obscured. The method improves classification accuracy by improving the robustness of the underlying correlation operation. Characters are divided into regions before performing correlations. Based upon the relative individual region results, region results are combined into a whole character result. Using the characters that are read, a running checksum is computed and, based upon the checksum result, characters are replaced to produce a valid result.
    Type: Grant
    Filed: February 2, 2001
    Date of Patent: May 11, 2004
    Inventors: Shih-Jong J. Lee, Louis Piloco
  • Publication number: 20040086179
    Abstract: A method of post-processing character data from an optical character recognition (OCR) engine and apparatus to perform the method. This exemplary method includes segmenting the character data into a set of initial words. The set of initial words is word level processed to determine at least one candidate word corresponding to each initial word. The set of initial words is segmented into a set of sentences. Each sentence in the set of sentences includes a plurality of initial words and candidate words corresponding to the initial words. A sentence is selected from the set of sentences. The selected sentence is word disambiguity processed to determine a plurality of final words. A final word is selected from the at least one candidate word corresponding to a matching initial word. The plurality of final words is then assembled as post-processed OCR data.
    Type: Application
    Filed: November 4, 2002
    Publication date: May 6, 2004
    Inventors: Yue Ma, Jinhong Katherine Guo, Mu Li, Yu-kun Tong, Tian-shun Yao, Jing-bo Zhu
  • Patent number: 6721451
    Abstract: A character line is identified with a keyword included in the relevant logical element of a keyword dictionary, and the identification result is output to a “keyword identification part” to store the result. All the character lines processed through a “division process part” are output to a “tagging modification part”, which replaces tags when the tags given to adjacent character lines are logically inconsistent as a combination, so that no logical inconsistency remains between adjacent character lines. A character line to which an appropriate tag may not be given in the “tagging modification part” is output to an “indecisive tag estimation part”, where an appropriate tag is estimated and given with reference to the tags of adjacent character lines. The process in the “indecisive tag estimation part” is applied to all the character lines, and repeated until no tags are replaced.
    Type: Grant
    Filed: August 30, 2000
    Date of Patent: April 13, 2004
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Yasuto Ishitani
  • Patent number: 6718059
    Abstract: An image processing system includes input of image data, performance of block selection processing on the input image data to determine types of pixel data within the image data, a first determining step of determining, based on the block selection processing, if subject pixel data represents a text pixel, a second determining step of determining if the subject pixel data represents an edge pixel, performance of a first processing on the subject pixel data in a case that the subject pixel data is determined to represent a text pixel and an edge pixel, and performance of a second processing on the subject pixel data in a case that the subject pixel data is not determined to represent a text pixel and is not determined to represent an edge pixel.
    Type: Grant
    Filed: December 10, 1999
    Date of Patent: April 6, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Yoshiki Uchida
  • Patent number: 6701015
    Abstract: A character string extraction apparatus extracts an aggregate of basic components from a document image, such as a binary image, gray scale image, color image, etc., and judges whether each basic component is a character component using an inclusion relationship between the basic components. Then, the character string extraction apparatus extracts an aggregate of character components based on the judgment result and extracts a character string from the aggregate of character components.
    Type: Grant
    Filed: September 25, 2001
    Date of Patent: March 2, 2004
    Assignee: Fujitsu Limited
    Inventors: Katsuhito Fujimoto, Hiroshi Kamada
  • Patent number: 6694055
    Abstract: A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided on a computer readable medium to implement the method.
    Type: Grant
    Filed: July 15, 1998
    Date of Patent: February 17, 2004
    Assignee: Microsoft Corporation
    Inventor: Andi Wu
  • Patent number: 6687612
    Abstract: A data collection system including a data glove is used by a researcher to safely and efficiently collect and input data for a geographic database. The data collection system is transported by the researcher along roads in a geographic area. As the data collection system is being transported, the positions of the data collection system are determined. The researcher inputs data into the data collection system through the glove by hand and/or finger gestures. The data collection system stores data indicating a determined position with a type of data associated with the gestures. The data obtained by the data collection system is displayed so that the researcher can add or modify data records in the geographic database. The data obtained by the data collection system is used to add data to or modify data in the geographic database.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: February 3, 2004
    Assignee: Navigation Technologies Corp.
    Inventor: Kevin Cherveny
  • Patent number: 6681047
    Abstract: The current edge pair determination system and method determine whether or not an edge pair and certain image characteristics of pixels between the edge pair, taken as a whole or a unit, are a part of a character based upon a change in pixel values. The change is compared to a predetermined set of conditions that are exclusionary in nature. Since the edge pair as well as the image characteristics of pixels between the edge pair are considered, the accuracy of the character edge determination is improved.
    Type: Grant
    Filed: May 31, 2000
    Date of Patent: January 20, 2004
    Assignee: Ricoh Co., Ltd.
    Inventor: Takashi Saito
  • Patent number: 6678409
    Abstract: The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.
    Type: Grant
    Filed: January 14, 2000
    Date of Patent: January 13, 2004
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, Zixin Jiang
  • Patent number: 6665436
    Abstract: This invention discloses a method for automatically segmenting and recognizing Chinese character strings continuously written by a user in a handwritten Chinese character processing system, comprising the steps of: creating a geometry model and a language mode; finding out all of potential segmentation schemes in the Chinese character strings continuously written by a user based on the associated timing information and said geometry model; recognizing the groups of strokes as defined by each of potential segmentation schemes and computing the probability characterizing the exactness of recognition results; correcting the probability characterizing the exactness of recognition results by said language model; and, selecting the recognition result and the corresponding segmentation scheme having the maximum probability value.
    Type: Grant
    Filed: January 23, 2003
    Date of Patent: December 16, 2003
    Assignee: International Business Machines Corporation
    Inventors: Hui Su, Donald T. Tang, Qian Ying Wang
  • Patent number: 6661919
    Abstract: A system for producing a raster image derived from coded and non-coded portions of a hybrid data structure from an input bitmap including (1) a data processing apparatus, (2) a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, (3) a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and non-coded data derived from portions of the input bitmap which do not correspond to the identifiable objects, and (4) an output device capable of developing a visually perceptible raster image derived from the hybrid data structure. The raster image includes raster images of the identifiable objects and raster images derived from portions of the input bitmap that do not correspond to the identifiable objects.
    Type: Grant
    Filed: January 25, 2002
    Date of Patent: December 9, 2003
    Assignee: Adobe Systems Incorporated
    Inventors: Dennis G. Nicholson, James C. King
  • Patent number: 6658151
    Abstract: A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
    Type: Grant
    Filed: April 8, 1999
    Date of Patent: December 2, 2003
    Assignee: Ricoh Co., Ltd.
    Inventors: Dar-Shyang Lee, Jonathan J. Hull
  • Patent number: 6640006
    Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.
    Type: Grant
    Filed: May 29, 1998
    Date of Patent: October 28, 2003
    Assignee: Microsoft Corporation
    Inventors: Andi Wu, Stephen D. Richardson, Zixin Jiang
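    Illustrative sketch (not part of the patent record): the two-stage check described in the abstract above can be approximated in a few lines of Python. The function names, the `max_len` cutoff, and the choice to model "position" as a 0-based index are assumptions made for illustration; the patent may define positions differently.

```python
from collections import defaultdict


def build_indications(lexicon):
    """Derive the two per-character indications from a word list."""
    second = defaultdict(set)     # first char -> chars seen in second position
    positions = defaultdict(set)  # char -> 0-based positions seen in words
    for word in lexicon:
        if len(word) >= 2:
            second[word[0]].add(word[1])
        for i, ch in enumerate(word):
            positions[ch].add(i)
    return second, positions


def candidate_words(text, second, positions, max_len=4):
    """Return contiguous combinations of characters that may be words."""
    out = []
    for start in range(len(text)):
        for end in range(start + 2, min(len(text), start + max_len) + 1):
            combo = text[start:end]
            # Stage 1: is the second character allowed after the first?
            if combo[1] not in second.get(combo[0], ()):
                break  # longer combinations from this start share the same pair
            # Stage 2: does every character occur at its position in some word?
            if all(i in positions.get(ch, ()) for i, ch in enumerate(combo)):
                out.append(combo)
    return out
```

    Both checks are only necessary conditions, so the result is a list of combinations that may be words, which matches the hedged wording of the abstract.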
  • Patent number: 6625335
    Abstract: A keyword assignment system is provided to assign keywords when a digitized image of a document is created. The keyword assignment system includes a digitizer to generate the digitized image from the input document. A keyword entry system determines a keyword to be associated with the digitized image. A linker generates linking information that associates the keyword with the digitized image. A database is provided to store the digitized image and linking information.
    Type: Grant
    Filed: May 11, 2000
    Date of Patent: September 23, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Junichi Kanai
  • Patent number: 6621930
    Abstract: An electronic device automatically classifies documents based upon textual content. Documents may be classified into document categories. Statistical characteristics are gathered for each document category and these statistical characteristics are used as a frame of reference in determining how to classify the document. The document categories may be intersecting or non-intersecting. A neutral category is used to represent documents that do not fit into any of the other specified categories. The statistical characteristics for an input document are compared with those for the document category and for the neutral category in making a determination on how to categorize the document. This approach is extensible, generalizable and efficient.
    Type: Grant
    Filed: August 9, 2000
    Date of Patent: September 16, 2003
    Assignee: Elron Software, Inc.
    Inventor: Frank Smadja
  • Patent number: 6600834
    Abstract: This invention discloses a handwriting information processing system comprising handwriting information input means and handwriting information recognition means, which co-operate to accept and recognize a user's handwriting input. The handwriting information processing system is characterized by further comprising a character segmentation user interface for accepting the definitions of text/picture areas, handwriting lines and character boundaries from a user.
    Type: Grant
    Filed: January 12, 2000
    Date of Patent: July 29, 2003
    Assignee: International Business Machines Corporation
    Inventors: Hui Su, Donald T. Tang, Qian Ying Wang
  • Publication number: 20030128875
    Abstract: An image capture device electronically captures and selects a desired text portion. A visible light image is projected onto text on a document to enable a user to position the light image relative to the desired text portion. At least some of the text is captured in accordance with a position of the light image. A text selector selects the desired text portion from the captured text.
    Type: Application
    Filed: December 6, 2002
    Publication date: July 10, 2003
    Inventors: Maurizio Pilu, Guy de Warrenne Bruce Adams
  • Publication number: 20030123730
    Abstract: In a document recognition system, a document structure analysis unit extracts a character image region from an input document image. A character string extraction unit extracts a character string image from the character image region. A character extraction unit extracts an individual character image from the character string image expressed in vertical lines by vertical line adjacency graphs by changing a pixel representation of the extracted character string image into a vertical line representation thereof. A character recognition unit recognizes each character in the individual character image and converts the recognized character into a corresponding character code.
    Type: Application
    Filed: December 27, 2002
    Publication date: July 3, 2003
    Inventors: Doo Sik Kim, Ho Yon Kim, Kil Taek Lim, Jae Gwan Song, Yun Seok Nam, Hye Kyu Kim
  • Publication number: 20030108239
    Abstract: This invention discloses a method for automatically segmenting and recognizing Chinese character strings continuously written by a user in a handwritten Chinese character processing system, comprising the steps of: creating a geometry model and a language model; finding all potential segmentation schemes in the Chinese character strings continuously written by a user based on the associated timing information and said geometry model; recognizing the groups of strokes as defined by each of the potential segmentation schemes and computing the probability characterizing the exactness of the recognition results; correcting the probability characterizing the exactness of the recognition results by said language model; and selecting the recognition result and the corresponding segmentation scheme having the maximum probability value.
    Type: Application
    Filed: January 23, 2003
    Publication date: June 12, 2003
    Applicant: International Business Machines Corporation
    Inventors: Hui Su, Donald T. Tang, Qian Ying Wang
  • Patent number: 6567545
    Abstract: Disclosed is a format recognition method, apparatus and its storage medium for automatically recognizing the format of a form, whereby the format is automatically determined by examining the arrangement of the smallest rectangles. According to the present invention, the smallest rectangles are extracted from a form, and the positional relationship of these rectangles is obtained. The attribute of the smallest rectangle is determined from the positional relationship. In accordance with the attribute, the smallest rectangles are sorted into a headline portion and a data portion, and a character string in the data portion is recognized.
    Type: Grant
    Filed: October 20, 1999
    Date of Patent: May 20, 2003
    Assignee: Fujitsu Limited
    Inventors: Katsutoshi Kobara, Shinichi Eguchi, Koichi Chiba, Kouichi Kanamoto, Maki Yabuki, Yutaka Katsumata
  • Patent number: 6563949
    Abstract: The connected elements of an input image are obtained and grouped based on the relative positions of the connected elements and the similarity in thickness. Then, the character recognition level of a group is obtained by performing a character recognizing process. The obtained character recognition level is weighted by the area of a rectangular area. Using a total of the weighted values as an evaluation value of the group, the evaluation value is obtained for all combinations in all groups. The combination of the groups having the highest evaluation value is extracted as a character string.
    Type: Grant
    Filed: November 24, 1998
    Date of Patent: May 13, 2003
    Assignee: Fujitsu Limited
    Inventor: Hiroaki Takebe
  • Patent number: 6564144
    Abstract: A data collection system including a data glove is used by a researcher to safely and efficiently collect and input data for a geographic database. The data collection system is transported by the researcher along roads in a geographic area. As the data collection system is being transported, the positions of the data collection system are determined. The researcher inputs data into the data collection system through the glove by hand and/or finger gestures. The data collection system stores data indicating a determined position with a type of data associated with the gestures. The data obtained by the data collection system is displayed so that the researcher can add or modify data records in the geographic database. The data obtained by the data collection system is used to add data to or modify data in the geographic database.
    Type: Grant
    Filed: January 10, 2002
    Date of Patent: May 13, 2003
    Assignee: Navigation Technologies Corporation
    Inventor: Kevin Cherveny
  • Publication number: 20030086610
    Abstract: A method for dividing a character image into lines, comprising the following steps: a segment-dividing step for dividing, in terms of pixels, a character image into a plurality of character image segments arranged side by side, each segment having a predetermined width; a pixel distribution statistic step for obtaining the pixel distribution statistic in each image segment, namely the number of black pixels in each pixel-row of the segment, and obtaining the pixel distribution statistic in the whole image, namely the number of black pixels in each pixel-row of the whole image; a segment block forming step for dividing the image segments into segment blocks according to the pixel distribution statistic of the image segments and the pixel distribution statistic of the whole image obtained in the pixel distribution statistic step; and a line image forming step for integrating the divided segment blocks into line images.
    Type: Application
    Filed: September 5, 2002
    Publication date: May 8, 2003
    Inventors: Zhaohai Luo, Yi Li
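    Illustrative sketch (not part of the patent record): the segment-dividing and pixel-distribution steps in the abstract above can be written in plain Python for a binary image given as a list of rows (1 = black pixel). The abstract does not say how segment blocks are derived from the statistics, so `text_runs` below, which simply groups consecutive rows whose count exceeds a threshold, is an assumption, as are all function names.

```python
def split_segments(image, width):
    """Cut the image into side-by-side vertical segments of the given width."""
    n_cols = len(image[0])
    return [[row[c:c + width] for row in image] for c in range(0, n_cols, width)]


def row_profile(image):
    """Number of black pixels in each pixel-row."""
    return [sum(row) for row in image]


def text_runs(profile, threshold=0):
    """Group consecutive rows above the threshold into (start, end) runs."""
    runs, start = [], None
    for y, count in enumerate(profile):
        if count > threshold and start is None:
            start = y                 # a run of inked rows begins
        elif count <= threshold and start is not None:
            runs.append((start, y))   # the run ended on the previous row
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

    Computing the profile per segment as well as for the whole image lets skewed or unevenly spaced lines show up as runs at different heights in different segments, which is presumably why the method collects both statistics.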
  • Patent number: 6542635
    Abstract: Document type comparison and classification using layout classification is accomplished by first segmenting a document page into blocks of text and white space. A grid of rows and columns, forming bins, is created on the page to intersect the blocks. Layout information is identified using a unique fixed length interval vector, to represent each row on the segmented document. By computing the Manhattan distance between interval vectors of all rows of two document pages and performing a warping function to determine the row to row correspondence, two documents may be compared by their layout. Furthermore, interval vectors may be grouped into N clusters with a cluster center, defined as the median of the interval vectors of the cluster, replacing each interval vector in its cluster. Using Hidden Markov Models, documents can be compared to document type models comprising rows represented by cluster centers and identified as belonging to one or more document types.
    Type: Grant
    Filed: September 8, 1999
    Date of Patent: April 1, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Jianying Hu, Ramanujan S. Kashi, Gordon Thomas Wilfong
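    Illustrative sketch (not part of the patent record): comparing two pages by the Manhattan distance between their row interval vectors, with a warping step to align rows, might look like the following. Standard dynamic time warping is used here as a stand-in for the "warping function"; the abstract does not specify which warping is applied, and the function names are hypothetical.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length interval vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def warped_layout_distance(page_a, page_b):
    """Minimum total row-to-row distance under a monotonic row alignment."""
    n, m = len(page_a), len(page_b)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i rows of A with the first j of B
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = manhattan(page_a[i - 1], page_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # row of A repeats a row of B
                                 cost[i][j - 1],      # row of B repeats a row of A
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]
```

    Because the alignment may map one row to several, a page whose rows are stretched or duplicated relative to another can still score a small distance.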
  • Patent number: 6539117
    Abstract: A communications system for rendering image based data includes a data interface, a display device, and a data manager. The data interface receives image based data that is used by the display device to display an image. The data manager identifies word blocks defined by the received data. The data manager uses the word blocks to define a first row of the image. In this regard, the data manager determines whether images respectively defined by each of the word blocks would be visible if the word blocks are rendered to the first row of the display screen. In response to a determination that an image associated with one of the word blocks would not be visible if the one word block is rendered to the first row of the display screen, the data manager defines a second row and renders the one word block to the second row.
    Type: Grant
    Filed: April 12, 1999
    Date of Patent: March 25, 2003
    Assignee: Hewlett-Packard Company
    Inventor: Frank P Carau, Sr.
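    Illustrative sketch (not part of the patent record): the row-definition behavior in the abstract above resembles greedy wrapping of word blocks. The sketch assumes visibility is decided purely by whether the block fits within the remaining screen width, which the abstract does not state; `layout_rows` and its parameters are hypothetical.

```python
def layout_rows(block_widths, screen_width):
    """Assign word-block indices to rows, starting a new row when a block would not fit."""
    rows, current, used = [], [], 0
    for index, width in enumerate(block_widths):
        # A block that would run past the screen edge goes to a new row.
        if current and used + width > screen_width:
            rows.append(current)
            current, used = [], 0
        current.append(index)
        used += width
    if current:
        rows.append(current)
    return rows
```

    A block wider than the whole screen is still placed alone on its own row here; how the patented system handles that case is not covered by the abstract.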
  • Patent number: 6539116
    Abstract: The structure of entered document image data is analyzed and a character string in a text block that has been analyzed is subjected to pattern recognition. Synonyms and equivalents of words obtained as results of language analysis are extracted and words obtained as results of language analysis are converted to words of another language. A character string in a text block that has been analyzed is translated to another language. At least results of analyzing the structure of document image data, results of character recognition and results of language analysis are stored, and at least one of the results of extraction, results of conversion and results of translation are stored in a RAM in association with the results of character recognition.
    Type: Grant
    Filed: October 2, 1998
    Date of Patent: March 25, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Makoto Takaoka