Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
  • Patent number: 6658151
    Abstract: A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
    Type: Grant
    Filed: April 8, 1999
    Date of Patent: December 2, 2003
    Assignee: Ricoh Co., Ltd.
    Inventors: Dar-Shyang Lee, Jonathan J. Hull
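A rough sketch of the comparison described in this abstract, assuming the two text strings have already been deciphered from the compressed images. The predicate (`no_space`) and the use of cosine similarity are illustrative assumptions; the abstract names neither the predicate nor the similarity measure.

```python
from collections import Counter
from math import sqrt

def conditional_ngrams(text, n, predicate):
    """Count the character n-grams of `text` that satisfy `predicate`."""
    grams = (text[i:i + n] for i in range(len(text) - n + 1))
    return Counter(g for g in grams if predicate(g))

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def no_space(gram):
    # Hypothetical predicate: keep only n-grams that lie inside a single word.
    return " " not in gram

first = conditional_ngrams("the quick brown fox", 3, no_space)
second = conditional_ngrams("the quick brown dog", 3, no_space)
print(cosine_similarity(first, second))
```

Any other predicate (for example, one keyed on the character that precedes the n-gram) can be passed in without changing the comparison step.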
  • Patent number: 6650362
    Abstract: In connection with the detection of an amount of movement of an image such as a document image having a small gradient of brightness, it has been difficult to detect the amount by a representative point method, and a block matching method requires much time. An image (F1) at a time (T1), which has been imaged by moving an imaging section (1), is taken into a memory (2) and a feature point extracting section (3), where feature points of the image (F1) are extracted, and an image (F2) at a start time (T2) of the subsequent frame is taken in the memory (2) and the feature point extracting section (3). A correlation operating section (5) operates feature points of the image (F2) and the image (F1) in an area designated by a search area deciding section (4) to output an amount of movement.
    Type: Grant
    Filed: March 18, 1999
    Date of Patent: November 18, 2003
    Assignee: Sharp Kabushiki Kaisha
    Inventors: Yasuhisa Nakamura, Yoshihiro Kitamura, Hiroshi Akagi
  • Patent number: 6643401
    Abstract: A character pattern is extracted from image data read from a document, listing, etc., and discriminated between a hand-written character and a typed character by a hand-written/typed character discrimination unit. The hand-written/typed character discrimination unit obtains, from the character pattern, N feature vectors containing a feature indicating at least the complexity and the linearity of the character pattern, and discriminates the character pattern between a hand-written character and a typed character using the feature vectors. A character recognition unit performs a character recognizing process based on the result of discriminating whether the character data is a hand-written character or a typed character. As a feature of the above described character pattern, the variance of line widths, the variance of character positions, etc. can also be used.
    Type: Grant
    Filed: June 24, 1999
    Date of Patent: November 4, 2003
    Assignee: Fujitsu Limited
    Inventors: Junji Kashioka, Satoshi Naoi
  • Patent number: 6640010
    Abstract: An image processing technique for selecting a text region from an image is described. Character and formatting information for each word in the image is used to determine an active region for each word in the image. For a preferred embodiment of the present invention, the character and formatting information is derived during optical character recognition (OCR). A first and last word within a selected text region is identified based on at least one active region associated with at least one word within the selected text region. Using the first and last words within the selected text region, all words within the selected text region are identified. An image of the selected text region may be displayed. Text contained within the selected text region may be copied to an application program.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: October 28, 2003
    Assignee: Xerox Corporation
    Inventors: Mauritius Seeger, Christopher R. Dance, Stuart A. Taylor, William M. Newman
  • Publication number: 20030190077
    Abstract: Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (126) that embodies information about the sibling entities and the sibling ranking information and provides a method for reading out the ranked parent candidates from the rollup matrix in order of their parent confidences (141). Parent confidences are based on the sibling ranking information, either alone or in combination with n-gram dictionary ranking or other ranking information.
    Type: Application
    Filed: April 8, 2003
    Publication date: October 9, 2003
    Applicant: RAF Technology, Inc.
    Inventors: David Justin Ross, Stephen E.M. Billester, Brent R. Smith
  • Publication number: 20030185448
    Abstract: An image processing technique for selecting a text region from an image is described. Character and formatting information for each word in the image is used to determine an active region for each word in the image. For a preferred embodiment of the present invention, the character and formatting information is derived during optical character recognition (OCR). A first and last word within a selected text region is identified based on at least one active region associated with at least one word within the selected text region. Using the first and last words within the selected text region, all words within the selected text region are identified. An image of the selected text region may be displayed. Text contained within the selected text region may be copied to an application program.
    Type: Application
    Filed: November 12, 1999
    Publication date: October 2, 2003
    Inventors: Mauritius Seeger, Christopher R. Dance, Stuart A. Taylor, William M. Newman
  • Patent number: 6626960
    Abstract: Disclosed is a system, method, and program for generating a table for use by a computer in determining a location of a boundary, such as a word boundary, between two characters in text. A first table indicates a boundary between characters when processing text in a first direction, such as the forward direction. A second table is generated based on the content of the first table. The second table can be used to determine whether one boundary is located between any two consecutive characters processed in a second direction, such as the backward direction.
    Type: Grant
    Filed: September 1, 1999
    Date of Patent: September 30, 2003
    Assignee: International Business Machines Corporation
    Inventor: Richard Theodore Gillam
  • Patent number: 6621930
    Abstract: An electronic device automatically classifies documents based upon textual content. Documents may be classified into document categories. Statistical characteristics are gathered for each document category and these statistical characteristics are used as a frame of reference in determining how to classify the document. The document categories may be intersecting or non-intersecting. A neutral category is used to represent documents that do not fit into many of the other specified categories. The statistical characteristics for an input document are compared with those for the document category and for the neutral category in making a determination on how to categorize the document. This approach is extensible, generalizable and efficient.
    Type: Grant
    Filed: August 9, 2000
    Date of Patent: September 16, 2003
    Assignee: Elron Software, Inc.
    Inventor: Frank Smadja
  • Publication number: 20030152277
    Abstract: A method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image operates by segmenting a document image into a set of image objects, and linking the image objects with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets. The image objects are grouped according to characteristics suggesting that the image objects may have common ground-truthed metadata. By grouping the image objects into subsets, the image objects may be indexed to facilitate the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in a table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.
    Type: Application
    Filed: June 13, 2002
    Publication date: August 14, 2003
    Applicant: Convey Corporation
    Inventors: Floyd Steven Hall, Cameron Telfer Howie
  • Patent number: 6597809
    Abstract: Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (126) that embodies information about the sibling entities and the sibling ranking information and provides a method for reading out the ranked parent candidates from the rollup matrix in order of their parent confidences (141). Parent confidences are based on the sibling ranking information, either alone or in combination with n-gram dictionary ranking or other ranking information.
    Type: Grant
    Filed: March 20, 2000
    Date of Patent: July 22, 2003
    Assignee: RAF Technology, Inc.
    Inventors: David Justin Ross, Stephen E. M. Billester, Brent R. Smith
  • Patent number: 6594393
    Abstract: In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must perform iteratively on a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode.
    Type: Grant
    Filed: May 12, 2000
    Date of Patent: July 15, 2003
    Inventors: Thomas P. Minka, Dan S. Bloomberg, Ashok C. Popat
  • Publication number: 20030103675
    Abstract: Paired image information and text information correlated to each other are retrieved as information sets. Frequency information on words used in text is extracted from text information in a group of information sets, and text information features are extracted based on frequency information. Text features are used to lay out information sets in a virtual space such that similar pieces of text are located close to each other, and images are displayed at those positions. Further, important words are extracted from those words extracted from text information in a group of information sets, and those words are laid out in the virtual space in the same manner as with information sets and displayed as labels.
    Type: Application
    Filed: November 27, 2002
    Publication date: June 5, 2003
    Applicant: Fujitsu Limited
    Inventors: Susumu Endo, Yuusuke Uehara, Daiki Masumoto, Syuuichi Shiitani
  • Patent number: 6573844
    Abstract: Predictive keyboards, such as predictive soft keyboards, are disclosed. In one embodiment, a computer-implemented method predicts at least one key to be entered next within a sequence of keys. The method displays a soft keyboard where the predicted keys are displayed on the soft keyboard differently than the other keys on the keyboard. For example, the predicted keys may be larger in size on the soft keyboard as compared to the other keys. This makes the predicted keys more easily typed by a user as compared to the other keys.
    Type: Grant
    Filed: January 18, 2000
    Date of Patent: June 3, 2003
    Assignee: Microsoft Corporation
    Inventors: Daniel Venolia, Joshua Goodman, Xuedong Huang, Hsiao-Wuen Hon
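The prediction step in this abstract could be approximated as follows. The character-bigram model, the function names, and the 1.5 scale factor are all assumptions made for illustration; the abstract does not say how keys are predicted or how much larger they are drawn.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which character follows each character in `corpus`."""
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1
    return following

def predict_keys(following, typed, top=3):
    """Return up to `top` keys most likely to follow the last typed key."""
    if not typed:
        return []
    counts = following.get(typed[-1], Counter())
    return [key for key, _ in counts.most_common(top)]

def key_scales(all_keys, predicted, enlarged=1.5):
    """Map each key to a display scale; predicted keys are drawn larger."""
    return {key: (enlarged if key in predicted else 1.0) for key in all_keys}

model = train_bigrams("the then there")
predicted = predict_keys(model, "t", top=1)
print(key_scales("abcdefgh", predicted))
```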
  • Publication number: 20030099402
    Abstract: A method of analyzing a verbatim text comprising the steps of storing the verbatim text in an electronic memory device and identifying at least one concept in said verbatim text and linking said concept to a code.
    Type: Application
    Filed: March 11, 2002
    Publication date: May 29, 2003
    Inventor: Charles M. Baylis
  • Patent number: 6563956
    Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
    Type: Grant
    Filed: July 7, 1999
    Date of Patent: May 13, 2003
    Assignee: Fujitsu Limited
    Inventors: Noriko Satoh, Shigeru Yoshida
  • Publication number: 20030086618
    Abstract: Evaluation based on sensitivity of an image that was performed by using the sensibility and the manual work of a person is automatically performed.
    Type: Application
    Filed: July 11, 2002
    Publication date: May 8, 2003
    Applicant: Seiko Epson Corporation
    Inventor: Michihiro Nagaishi
  • Publication number: 20030086619
    Abstract: The layout of an image that was performed with the help of the sensibility and the manual work of a person is automatically optimized.
    Type: Application
    Filed: July 11, 2002
    Publication date: May 8, 2003
    Applicant: Seiko Epson Corporation
    Inventor: Michihiro Nagaishi
  • Patent number: 6560360
    Abstract: A recognition system is disclosed, including a representation of an object in terms of its constituent parts that is translationally invariant, and which provides scale invariant recognition. The system further provides effective recognition of patterns that are partially present in the input signal, or that are partially occluded, and also provides an effective representation for sequences within the input signal. The system utilizes dynamically determined, context based expectations, for identifying individual features/parts of an object to be recognized. The system is computationally efficient, and capable of highly parallel implementation, and further includes a mechanism for improving the preprocessing of individual sections of an input pattern, either by applying one or more preprocessors selected from a set of several preprocessors, or by changing the parameters within a single preprocessor.
    Type: Grant
    Filed: January 27, 2000
    Date of Patent: May 6, 2003
    Assignees: Nestor, Inc., Brown University Research Foundation
    Inventors: Predrag Neskovic, Douglas L. Reilly, Leon N Cooper
  • Patent number: 6556713
    Abstract: A search result of a search target object is displayed at a high speed. By dividing an image into a plurality of areas and allocating attribute information to each area, only the area including the attribute information showing the search target object is searched and is displayed or transmitted, so that a part of a desired image can be extracted at a high speed.
    Type: Grant
    Filed: July 30, 1998
    Date of Patent: April 29, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yuji Kobayashi, Kentaro Matsumoto
  • Patent number: 6549662
    Abstract: Characters of data on a document are recognized by automatically determining the definitions of characters of the data from the arrangement of character strings of the data. Character strings on the document are extracted by reading the document, and headers and data on the document are distinguished from each other by determining the positional relationship between the character strings. Character attributes of the data are determined by recognizing characters of the character strings of the headers using a header recognition dictionary. Characters of the character strings of the data are recognized according to the determined character attributes of the data. Since character attributes of the data are determined from recognized characters of the headers after the headers and the data are distinguished from each other from the layout on the document, it is possible to enter automatically the character attributes of the data.
    Type: Grant
    Filed: May 27, 1998
    Date of Patent: April 15, 2003
    Assignee: Fujitsu Limited
    Inventors: Katsutoshi Kobara, Shinichi Eguchi, Yoshihiro Nagano, Hideki Matsuno, Koichi Chiba, Yutaka Katsumata
  • Publication number: 20030068088
    Abstract: A mechanism is provided for magnifying information with contextual information. The user may configure the magnification mechanism to present some contextual information along with the focus being magnified. Particularly, a user may set “look ahead” and “look behind” parameters to specify a number of words or characters to include before and after the magnified word or words. The actual magnified word or words may be distinguished from the contextual information. For example, the word or words being magnified may be magnified to a size that is larger than that of the contextual information. The magnification mechanism may also present a magnified display of image information.
    Type: Application
    Filed: October 4, 2001
    Publication date: April 10, 2003
    Applicant: International Business Machines Corporation
    Inventors: Janani Janakiraman, Rabindranath Dutta
  • Patent number: 6542640
    Abstract: A dictionary in which a character train serving as a processing unit upon compression has been registered is stored into a character train dictionary storing unit. In a character train comparing unit, the registration character train in the character train dictionary storing unit and a partial character train in non-compression data are compared, thereby detecting the coincident partial character train. A code output unit allocates a predetermined code to every partial character train detected by the character train comparing unit and outputs it. The character train dictionary storing unit allocates character train codes of a fixed length of 17 bits to about 130,000 words and substantially compresses a data amount to half or less irrespective of an amount of document data.
    Type: Grant
    Filed: June 18, 1998
    Date of Patent: April 1, 2003
    Assignee: Fujitsu Limited
    Inventors: Takashi Morihara, Yahagi Hironori, Satoh Noriko
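The fixed-length coding in this abstract follows from 2^17 = 131,072, which is enough to give each of about 130,000 words its own 17-bit code. A minimal sketch of that packing, assuming the input is already split into dictionary words (the abstract's matching of partial character trains is not reproduced, and the three-word dictionary is hypothetical):

```python
CODE_BITS = 17  # 2**17 = 131072 distinct codes

def compress(words, dictionary):
    """Pack the 17-bit code of each word into a byte string."""
    value = 0
    for word in words:
        value = (value << CODE_BITS) | dictionary[word]
    total_bits = len(words) * CODE_BITS
    return value.to_bytes((total_bits + 7) // 8, "big"), len(words)

def decompress(packed, count, dictionary):
    """Recover the word sequence from the packed 17-bit codes."""
    reverse = {code: word for word, code in dictionary.items()}
    value = int.from_bytes(packed, "big")
    mask = (1 << CODE_BITS) - 1
    return [reverse[(value >> (CODE_BITS * (count - 1 - i))) & mask]
            for i in range(count)]

# Hypothetical three-word dictionary; a real one would hold ~130,000 entries.
dictionary = {"data": 0, "compression": 1, "method": 2}
words = ["data", "compression", "data", "method"]
packed, count = compress(words, dictionary)
print(len(" ".join(words)), "bytes of text ->", len(packed), "bytes packed")
```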
  • Patent number: 6539116
    Abstract: The structure of entered document image data is analyzed and a character string in a text block that has been analyzed is subjected to pattern recognition. Synonyms and equivalents of words obtained as results of language analysis are extracted and words obtained as results of language analysis are converted to words of another language. A character string in a text block that has been analyzed is translated to another language. At least results of analyzing the structure of document image data, results of character recognition and results of language analysis are stored, and at least one of the results of extraction, results of conversion and results of translation are stored in a RAM in association with the results of character recognition.
    Type: Grant
    Filed: October 2, 1998
    Date of Patent: March 25, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Makoto Takaoka
  • Patent number: 6539117
    Abstract: A communications system for rendering image based data includes a data interface, a display device, and a data manager. The data interface receives image based data that is used by the display device to display an image. The data manager identifies word blocks defined by the received data. The data manager uses the word blocks to define a first row of the image. In this regard, the data manager determines whether images respectively defined by each of the word blocks would be visible if the word blocks are rendered to the first row of the display screen. In response to a determination that an image associated with one of the word blocks would not be visible if the one word block is rendered to the first row of the display screen, the data manager defines a second row and renders the one word block to the second row.
    Type: Grant
    Filed: April 12, 1999
    Date of Patent: March 25, 2003
    Assignee: Hewlett-Packard Company
    Inventor: Frank P Carau, Sr.
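The row-assignment rule in this abstract resembles a greedy line wrap over word blocks. A minimal sketch, assuming each word block is reduced to a pixel width and "visible" means the block fits within the screen width (both are simplifying assumptions, as is the function name):

```python
def layout_rows(block_widths, screen_width):
    """Assign word blocks to rows, starting a new row when a block would not fit."""
    rows, current, used = [], [], 0
    for index, width in enumerate(block_widths):
        # A block that would extend past the screen edge moves to a new row.
        if current and used + width > screen_width:
            rows.append(current)
            current, used = [], 0
        current.append(index)
        used += width
    if current:
        rows.append(current)
    return rows

print(layout_rows([40, 50, 30, 60, 20], 100))  # [[0, 1], [2, 3], [4]]
```

A block wider than the screen still gets a row of its own in this sketch; the abstract does not say how that case is handled.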
  • Patent number: 6539113
    Abstract: The system described herein automatically defines a set of radicals to be used in a Kanji character handwriting recognition system and automatically creates a dictionary of the Kanji characters that are recognized by the system. In performing its functionality, the system described herein first obtains representative handwriting samples for each Kanji character that is to be recognized by the system. The system described herein then evaluates the samples to identify a set of subparts (“radicals”) that are common to at least two of the Kanji characters. These radicals represent component roots from which the characters are formed. Each Kanji character is formed by one or more of these radicals. The radicals that are identified by the system described herein are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries).
    Type: Grant
    Filed: December 29, 1999
    Date of Patent: March 25, 2003
    Assignee: Microsoft Corporation
    Inventor: Michael Van Kleeck
  • Patent number: 6539118
    Abstract: An evaluator system accepts input textual messages in unknown languages and assesses which character sets, corresponding to languages, match that message. Textual messages whose individual characters are encoded in 16-bit Unicode or other universal format are parsed, and the character sets which can express each character, together with the accumulated correspondence, are logged. When the character sets against which the message is being tested only provide partial matches, the invention can determine which offers the best fit, including by way of a weighting function. The evaluation technology of the invention can be applied to multipart documents, and to search engines and indices.
    Type: Grant
    Filed: August 27, 1999
    Date of Patent: March 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: Brendan P. Murray, Kuniaki Takizawa
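One way to picture the evaluation in this abstract is to test each character against each candidate character set and tally the matches. The sketch below uses Python's built-in codecs as stand-ins for the character sets and an unweighted coverage ratio as the fit score; both choices are assumptions, and the weighting function the abstract mentions is not modeled.

```python
def charset_coverage(message, charsets):
    """Fraction of characters in `message` that each character set can express."""
    coverage = {}
    for name in charsets:
        hits = 0
        for ch in message:
            try:
                ch.encode(name)
                hits += 1
            except UnicodeEncodeError:
                pass  # this character set cannot express the character
        coverage[name] = hits / len(message) if message else 0.0
    return coverage

def best_charset(message, charsets):
    """Return the candidate with the highest coverage (first one wins ties)."""
    coverage = charset_coverage(message, charsets)
    return max(charsets, key=lambda name: coverage[name])

candidates = ["ascii", "latin-1", "shift_jis"]
print(charset_coverage("café", candidates))
print(best_charset("日本語", candidates))
```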
  • Publication number: 20030044068
    Abstract: The invention relates to a mobile device with a built-in image capture device, and a character recognition function to present the information gathered with the character recognition result. With the mobile device, the character line extraction process is displayed whenever necessary, and the resolution of an image to be inputted for recognition processing is enhanced. Accordingly, it is possible for the operator to select the target character line with ease. In addition, the mobile device has a character recognition ratio improved by the enhancement in resolution.
    Type: Application
    Filed: July 23, 2002
    Publication date: March 6, 2003
    Applicant: Hitachi, Ltd.
    Inventors: Tatsuhiko Kagehiro, Minenobu Seki, Hiroshi Sako
  • Patent number: 6526170
    Abstract: A character recognition system is disclosed. In a feature extraction parameter storage section 22 a transformation matrix for reducing a number of dimensions of feature parameters and a codebook for quantization are stored. In an HMM storage section 23 a constitution and parameters of Hidden Markov Model (HMM) for character string expression are stored. A feature extraction section 32 scans a word image given from an image storage means from left to right in a predetermined cycle with a slit having a width sufficiently smaller than the character width and thus outputs a feature symbol at each predetermined timing. A matching section 33 matches a feature symbol row and a probability maximization HMM state, thereby recognizing the character string.
    Type: Grant
    Filed: December 13, 1994
    Date of Patent: February 25, 2003
    Assignee: NEC Corporation
    Inventor: Shinji Matsumoto
  • Publication number: 20030026459
    Abstract: A system and a method for drawing a patent map using a technical field word are disclosed. In the system and the method, a word to be used for drawing a patent map is extracted by calculating weight values of significant words which are gotten by removing unnecessary words from patent data, and this extracted word is matched with a patent to draw the patent map.
    Type: Application
    Filed: November 29, 2001
    Publication date: February 6, 2003
    Inventors: Jeong Wook Won, Hyoung Bok Lee, Jai Sang Koh
  • Patent number: 6512851
    Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.
    Type: Grant
    Filed: October 9, 2001
    Date of Patent: January 28, 2003
    Assignee: STMicroelectronics S.r.l.
    Inventors: Loris Navoni, Roberto Canegallo, Mauro Chinosi, Giovanni Gozzini, Alan Kramer, Pierluigi Rolandi
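The distance calculation in this abstract can be sketched in software, with the caveat that the patent performs it in parallel inside an associative memory. The alphabet-position weights, the use of absolute differences, and the restriction to same-length words are assumptions made here for illustration.

```python
def encode(word, weights):
    """Replace each character with its assigned weight."""
    return [weights[ch] for ch in word]

def distance(a, b):
    """Sum of per-character weight differences (absolute value assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def recognize(candidate, dictionary, weights):
    """Return the stored word at minimum distance from `candidate`."""
    target = encode(candidate, weights)
    same_length = [w for w in dictionary if len(w) == len(candidate)]
    return min(same_length,
               key=lambda w: distance(target, encode(w, weights)),
               default=None)

# Hypothetical weights: position in the alphabet, so equal characters get equal weights.
weights = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
dictionary = ["cat", "dog", "cot", "bird"]
print(recognize("cbt", dictionary, weights))  # cat
```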
  • Publication number: 20030016874
    Abstract: An image analysis method designed to identify images in a sequence of images that are statistically different in a pre-selected region of interest. The method is suitable when there is no a priori knowledge of the nature of the interesting images. A reference image is used to identify specific regions of the image that may contain interesting changes (Detect Zone), that will not have interesting changes, but can be used to assess image quality (Veto zone), and an unanalyzed region (Ignore zone). To improve the spatial sensitivity, the Detect and Veto zones can be divided into specific cells. The analysis may also be performed on compressed data and another method automatically classifies a cell as either in the Detect zone or Ignore zone. The sensitivity can be further improved by removing periodic feature variation prior to the statistics calculation.
    Type: Application
    Filed: May 31, 2001
    Publication date: January 23, 2003
    Inventors: Kenneth A. Lefler, Wayne L. Kilmer, Yi Zhang
  • Patent number: 6507678
    Abstract: A character string retrieval apparatus classifies a plurality of characters following a prefix of a registration character string into a plurality of groups, and registers those following characters in an array structure using a different displacement amount for each group. The character string retrieval apparatus retrieves a given character string based on the displacement amount of a group corresponding to an input character.
    Type: Grant
    Filed: February 8, 1999
    Date of Patent: January 14, 2003
    Assignee: Fujitsu Limited
    Inventor: Hironori Yahagi
  • Publication number: 20020196166
    Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
    Type: Application
    Filed: August 29, 2002
    Publication date: December 26, 2002
    Applicant: Fujitsu Limited
    Inventors: Noriko Satoh, Shigeru Yoshida
  • Publication number: 20020176628
    Abstract: A document digitizing method digitizes and automatically indexes documents in printed form. The method includes optically scanning the document, forming and storing a digitized image file from the optically scanned document, optically recognizing characters in the optically scanned document, and forming and storing a text file of the optically recognized characters in the document. A retrieval method for retrieving the digitized image file for a document includes searching the text files to identify any having a selected text string and providing access to the digitized image files that correspond to those text files. The digital image file and the text file together represent a digitized document data structure that combines a digital image of a document with a text file of optically recognized characters in the digital image.
    Type: Application
    Filed: May 22, 2001
    Publication date: November 28, 2002
    Inventor: Gary K. Starkweather
  • Publication number: 20020172425
    Abstract: Described herein is a technology for recognizing the content of text documents. The technology determines one or more hash values for the content of a text document. Alternatively, the technology may generate a “sifted text” version of a document. In one implementation described herein, document recognition is used to determine whether the content of one document is copied (i.e., plagiarized) from another document. This is done by comparing hash values of documents (or alternatively their sifted text). In another implementation described herein, document recognition is used to categorize the content of a document so that it may be grouped with other documents in the same category. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appended claims.
    Type: Application
    Filed: April 24, 2001
    Publication date: November 21, 2002
    Inventors: Ramarathnam Venkatesan, Michael Malkin
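A simple illustration of comparing documents by hash values, as this abstract outlines. The specific scheme here (SHA-256 over lowercase three-word shingles, compared with Jaccard overlap) is a common stand-in chosen for the sketch; the abstract does not disclose the hash function or how "sifted text" is produced.

```python
import hashlib

def shingle_hashes(text, k=3):
    """Hash every run of `k` consecutive words in the normalized text."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + k])
                for i in range(max(len(words) - k + 1, 1)))
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}

def overlap(a, b):
    """Jaccard overlap of the two documents' hash sets (assumed measure)."""
    ha, hb = shingle_hashes(a), shingle_hashes(b)
    return len(ha & hb) / len(ha | hb)

original = "the quick brown fox jumps over the lazy dog"
suspect = "The quick brown fox jumps over the lazy cat"
print(overlap(original, suspect))  # 0.75
```

A high overlap suggests copied content; the same hash sets could also be used to group documents by category, as the abstract's second implementation describes.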
  • Publication number: 20020164079
    Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
    Type: Application
    Filed: June 25, 2002
    Publication date: November 7, 2002
    Inventor: Frank P. Carau
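The line-filling rule in the abstract above (place a word block on the active line only if the whole block fits, otherwise advance to the next line) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `layout_word_blocks`, the use of plain integer widths, and the choice to ignore inter-block spacing are all assumptions.

```python
def layout_word_blocks(widths, line_width):
    """Greedily assign word blocks (given by width) to lines.

    Returns a list of lines, each a list of block indices.
    Hypothetical helper: spacing between blocks is ignored.
    """
    lines = [[]]   # lines[-1] is the active line
    used = 0       # width already consumed on the active line
    for index, width in enumerate(widths):
        # Advance the active line if the entire block would not fit.
        # A block wider than the line is still placed on an empty line.
        if used + width > line_width and lines[-1]:
            lines.append([])
            used = 0
        lines[-1].append(index)
        used += width
    return lines
```

For example, `layout_word_blocks([3, 4, 2, 5, 1], 7)` groups the five blocks into three lines.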
  • Publication number: 20020154817
    Abstract: A document image search apparatus generates a text by performing the character recognition of a document image and determines a re-process scope. Then, the apparatus generates a candidate character lattice from the re-recognition result of the re-process scope, generates character strings from the candidate character lattice and adds the character strings to the text. Then, the apparatus performs index search using the text with the character strings added.
    Type: Application
    Filed: September 12, 2001
    Publication date: October 24, 2002
    Applicant: Fujitsu Limited
    Inventors: Yutaka Katsuyama, Satoshi Naoi, Fumihito Nishino
  • Patent number: 6470336
    Abstract: A document search device searches for a keyword in a recognition result obtained by character recognition performed on a document image. The keyword includes at least one first character, and a character code is assigned to each of the at least one first character. The recognition result includes at least one second character, and a character code and a partial area of the document image are assigned to each of the at least one second character.
    Type: Grant
    Filed: August 23, 2000
    Date of Patent: October 22, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yoshihiko Matsukawa, Taro Imagawa, Kenji Kondo, Tsuyoshi Mekata
  • Publication number: 20020150300
    Abstract: A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
    Type: Application
    Filed: April 8, 1999
    Publication date: October 17, 2002
    Inventors: Dar-Shyang Lee, Jonathan J. Hull
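The conditional n-gram comparison described in the abstract above can be sketched roughly as follows. The abstract does not say what the predicate condition is or which similarity measure is used, so both are assumptions here: the predicate is a caller-supplied function, and Jaccard overlap stands in for the unspecified measure. The names `conditional_ngrams` and `jaccard` are hypothetical.

```python
def conditional_ngrams(text, n, predicate=lambda gram: True):
    """Return the set of character n-grams of `text` that satisfy `predicate`."""
    return {
        text[i:i + n]
        for i in range(len(text) - n + 1)
        if predicate(text[i:i + n])
    }


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not from the patent)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Usage: compare two deciphered strings on bigrams that start with a vowel.
starts_with_vowel = lambda gram: gram[0] in "aeiou"
first = conditional_ngrams("document image", 2, starts_with_vowel)
second = conditional_ngrams("document images", 2, starts_with_vowel)
similarity = jaccard(first, second)
```

Because the measure works on the deciphered text strings only, it never needs to decompress the underlying images.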
  • Publication number: 20020141644
    Abstract: A system for conducting fortune telling and character analysis over Internet based on an entered name of any language is provided. The system includes a host, a database, one or more terminals, and a communicating system connecting the host, the database and the terminal to one another. The database is capable of analyzing implied meanings of names of any language entered at the terminal in accordance with traditional Chinese fortune telling theories and thereby providing judgments on good or bad signs possibly represented by the entered names. The implied meanings of names are obtained either from numbers of strokes of the entered names or from meanings or origins of words constituting the entered names.
    Type: Application
    Filed: March 29, 2001
    Publication date: October 3, 2002
    Inventor: Tawei Lin
  • Patent number: 6459810
    Abstract: An exemplary embodiment of the invention is a method for forming variant search strings. The method includes receiving a search string and parsing the search string to locate a mistaken search string character. A mistaken search string character is a character which is confused with other characters. A variant search string is formed in response to a presence of a mistaken search string character in the search string. The search string and variant search string may then be used to search a database. Another exemplary embodiment of the invention is a system for forming variant search strings. The system includes a user interface for receiving a search string. A variant search string generator parses the search string to locate a mistaken search string character. The mistaken search string character is a character which is confused with other characters. The variant search string generator forms a variant search string in response to a presence of a mistaken search string character in the search string.
    Type: Grant
    Filed: September 3, 1999
    Date of Patent: October 1, 2002
    Assignee: International Business Machines Corporation
    Inventor: Christopher T. Cring
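As a rough illustration of the variant-string idea in the abstract above, the sketch below substitutes commonly confused characters to produce alternative search strings. The confusion table `CONFUSABLE` and the function name `variant_search_strings` are assumptions made for this example; the abstract does not list which characters are treated as mistaken, nor whether all combinations are generated.

```python
from itertools import product

# Hypothetical confusion table: each key is a character that is easily
# mistaken for the mapped character.
CONFUSABLE = {"0": "O", "O": "0", "1": "l", "l": "1"}


def variant_search_strings(search, confusable=CONFUSABLE):
    """Return all variants of `search` formed by swapping confusable characters."""
    # For each position, list the characters that may appear there.
    options = [
        (ch, confusable[ch]) if ch in confusable else (ch,)
        for ch in search
    ]
    variants = {"".join(choice) for choice in product(*options)}
    variants.discard(search)  # the original string is searched separately
    return sorted(variants)
```

For the input `"A10"` this yields `["A1O", "Al0", "AlO"]`; a string with no confusable characters yields an empty list, so only the original string is searched.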
  • Patent number: 6453070
    Abstract: Handwritten ink is scanned to identify potential diacriticals. A list of diacriticals (19) is generated by traversing the ink. Potential diacritical-containing characters are processed by scoring them with and without a diacritical to generate a first and second score. The first score is compared to the second score in order to make a decision as to which variant of the potential diacritical-containing character produced a highest score. The highest score is used as a score for a theory and the decision is recorded. A data structure (50) is added to the theory. Each data unit in the data structure (50) corresponds to an entry in the list of diacriticals (19). As a new theory is created by propagation, contents of the data structure (50) are copied into the new theory. Thus, the data structure (50) is used to ensure that all handwritten ink is used and is used only once.
    Type: Grant
    Filed: March 17, 1998
    Date of Patent: September 17, 2002
    Assignee: Motorola, Inc.
    Inventors: Giovanni Seni, John Seybold
  • Publication number: 20020126904
    Abstract: A character-recognition pre-processing apparatus includes extraction means for extracting an image of a character string to be subjected to character recognition; setting means for setting the smallest rectangle that surrounds the character string image extracted; specifying means for specifying the position of each character within the smallest rectangle set by the setting means; detection means for detecting, at each character position specified, the shortest distance between a character region and the lower edge of the smallest rectangle, and the shortest distance between the character region and the upper edge of the smallest rectangle; and judgment means for judging whether the character string extracted is in an upright state or an inverted state, on the basis of variations in the two shortest distances detected.
    Type: Application
    Filed: October 3, 2001
    Publication date: September 12, 2002
    Inventors: Hiroshi Kakutani, Yasuharu Inami
  • Publication number: 20020126903
    Abstract: A word recognizing apparatus extracts the feature amount from a given image, and dynamically composes the feature amount of a candidate word to be recognized which is registered in a word list, using feature amounts of characters registered in an individual character dictionary. Then, the apparatus collates the composed feature amount of the word with the feature amount extracted from the image, calculates the degree of similarity between the two feature amounts, and outputs a recognition result.
    Type: Application
    Filed: May 11, 1999
    Publication date: September 12, 2002
    Inventors: Hiroaki Takebe, Yoshinobu Hotta, Satoshi Naoi
  • Publication number: 20020126905
    Abstract: A mathematical expression recognizing device comprises a character recognition unit which recognizes characters in a document image, a dictionary storing a pair of evaluation scores for each type of word, the score showing the possibility of belonging to the text and that of belonging to the mathematical expression, an evaluation unit which obtains the evaluation scores showing the possibility of belonging to the text and that of belonging to the mathematical expression for each of the words included in the recognized characters with reference to the dictionary, and a mathematical expression detecting unit which searches for an optimal path connecting words by selecting one of the text and the mathematical expression based on a formative grammar and the evaluation scores showing the possibility of belonging to the text and that of belonging to the mathematical expression for each of the words, thereby detecting characters belonging to the mathematical expression.
    Type: Application
    Filed: March 5, 2002
    Publication date: September 12, 2002
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masakazu Suzuki, Kazuaki Yokota, Yuko Eto
  • Publication number: 20020118866
    Abstract: An automatic quantitative analysis method is developed so as to analyze perfusion cardiovascular images. First the image registration per data set is performed so as to compensate for translation and rotation of the target region of interest over the acquisition time. Next a parameter, for example, a maximum intensity projection, is calculated in order to average out misalignments of the target region of interest within each data set. Finally, parameter registration is performed to calculate the co-ordinate translation matrix between the anatomically corresponding pixels within the target region of interest. The co-ordinate translation matrix can also be used to calculate local perfusion values.
    Type: Application
    Filed: January 29, 2002
    Publication date: August 29, 2002
    Inventors: Marcel Breeuwer, Marcel Johannes Quist
  • Patent number: 6442295
    Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.
    Type: Grant
    Filed: February 12, 1998
    Date of Patent: August 27, 2002
    Assignee: STMicroelectronics S.r.l.
    Inventors: Loris Navoni, Roberto Canegallo, Mauro Chinosi, Giovanni Gozzini, Alan Kramer, Pierluigi Rolandi
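The minimum-distance lookup in the abstract above can be imitated in software as follows. This is only a sequential sketch of what the device computes in parallel in an associative memory. Several details are assumptions: the per-character difference is taken as an absolute value, only stored words of the same length are compared, and the weight table (alphabet position) and the names `word_distance` and `recognize` are invented for the example.

```python
from string import ascii_lowercase

# Hypothetical coding: equal characters get equal weights (alphabet position).
WEIGHT = {ch: i for i, ch in enumerate(ascii_lowercase)}


def word_distance(a, b, weight=WEIGHT):
    """Sum of absolute weight differences between aligned characters."""
    return sum(abs(weight[x] - weight[y]) for x, y in zip(a, b))


def recognize(word, dictionary, weight=WEIGHT):
    """Return the stored word closest to `word`, or None if none is comparable."""
    candidates = [w for w in dictionary if len(w) == len(word)]
    if not candidates:
        return None
    return min(candidates, key=lambda w: word_distance(word, w, weight))
```

With the dictionary `["car", "dog", "bat"]`, the input `"cat"` is at distance 2 from `"car"`, 28 from `"dog"`, and 1 from `"bat"`, so `recognize` returns `"bat"`.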
  • Publication number: 20020114523
    Abstract: In a combined holistic and analytic recognition system, the holistic recognition module will recognize an input word or phrase image by matching an input string of character features for the whole word or phrase against a string of prototype features for a plurality of reference words in a lexicon. This will yield a holistic answer list of recognized word or phrase candidates for the input word or phrase along with a confidence value for each answer on the list. At the same time based on each answer in the answer list, the holistic recognition modules will generate a list of character features and segment the character features into sets for each character in an answer. The analytical recognition module uses segmentation hypotheses from the segmented character feature sets to cut the image of the input string of characters into individual character images.
    Type: Application
    Filed: February 16, 2001
    Publication date: August 22, 2002
    Inventors: Alexander Filatov, Igor Kil, Arseni Seregin
  • Publication number: 20020114524
    Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character sequences probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
    Type: Application
    Filed: June 29, 2001
    Publication date: August 22, 2002
    Applicant: International Business Machines Corporation
    Inventor: Hubert Crepy
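The filtering steps in the abstract above (divide, compute a regularity value VR, compare with a threshold VT, reject) can be sketched as follows. The abstract does not define how VR is computed, so this example assumes a mean log-probability under a character bigram model with add-one smoothing. The function names, the tiny reference text, and the threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def train_bigram_model(reference):
    """Build a smoothed character-bigram log-probability function."""
    pairs = Counter(zip(reference, reference[1:]))
    firsts = Counter(reference[:-1])
    vocab = len(set(reference))

    def log_prob(a, b):
        # Add-one smoothing so unseen bigrams get a small nonzero probability.
        return math.log((pairs[(a, b)] + 1) / (firsts[a] + vocab + 1))

    return log_prob


def regularity(portion, log_prob):
    """Mean bigram log-probability of a portion (assumed form of VR)."""
    if len(portion) < 2:
        return float("-inf")
    total = sum(log_prob(a, b) for a, b in zip(portion, portion[1:]))
    return total / (len(portion) - 1)


def filter_corpus(portions, log_prob, threshold):
    """Keep only portions whose regularity reaches the threshold (VT)."""
    return [p for p in portions if regularity(p, log_prob) >= threshold]


# Usage with a toy reference text; a real model would need far more data.
reference = "the cat sat on the mat and the rat sat on the hat " * 5
model = train_bigram_model(reference)
kept = filter_corpus(["the cat sat", "x9$#zq@@k!!"], model, threshold=-2.0)
```

In this toy setup the natural-language portion scores well above the threshold while the symbol-heavy portion falls below it and is removed.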
  • Patent number: 6430314
    Abstract: Described are methods for entering and editing data strings that are inputted into cellular telephones having a screen. In one method, all basic Hangul consonants and some of the compound Hangul consonants are included in a candidate consonant list and all basic Hangul vowels and some of the compound vowels are included in a candidate vowel list. The candidate consonant and vowel lists are alternatively displayed on a component display region (906) located on the screen. To form a Korean character, a user can select consonant(s) and vowel from the candidate consonant and vowel lists. To form a compound Hangul component that is not included in either the candidate consonant list or the candidate vowel list, the user selects a basic Hangul component as a first part of the compound Hangul component from either the candidate consonant list or the candidate vowel list.
    Type: Grant
    Filed: January 20, 1999
    Date of Patent: August 6, 2002
    Assignees: Sony Corporation, Sony Electronics, Inc.
    Inventor: Soon Ko