Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
-
Patent number: 6658151
Abstract: A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
Type: Grant
Filed: April 8, 1999
Date of Patent: December 2, 2003
Assignee: Ricoh Co., Ltd.
Inventors: Dar-Shyang Lee, Jonathan J. Hull
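As a rough illustration of the extraction and comparison steps this abstract describes, the Python sketch below keeps only the character n-grams that satisfy a predicate and scores two text strings by the overlap of those n-gram sets. The predicate (`starts_word`), the n-gram length, and the Jaccard measure are assumptions chosen for illustration; the abstract does not specify any of them.

```python
def conditional_ngrams(text, n, predicate):
    """Return the set of character n-grams of `text` that satisfy `predicate`.

    `predicate(text, i)` decides whether the n-gram starting at index i is kept.
    """
    return {text[i:i + n] for i in range(len(text) - n + 1) if predicate(text, i)}


def starts_word(text, i):
    # Hypothetical predicate: keep only n-grams that begin at the start of a word.
    return i == 0 or text[i - 1] == " "


def similarity(a, b, n=3, predicate=starts_word):
    """Jaccard overlap of the two conditional n-gram sets (an assumed measure)."""
    grams_a = conditional_ngrams(a, n, predicate)
    grams_b = conditional_ngrams(b, n, predicate)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

For example, `similarity("the cat sat", "the cat ran")` compares the sets {the, cat, sat} and {the, cat, ran}.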
-
Patent number: 6650362
Abstract: In connection with the detection of an amount of movement of an image such as a document image having a small gradient of brightness, it has been difficult to detect the amount by a representative point method, and a block matching method requires much time. An image (F1) at a time (T1), which has been imaged by moving an imaging section (1), is taken into a memory (2) and a feature point extracting section (3), where feature points of the image (F1) are extracted, and an image (F2) at a start time (T2) of the subsequent frame is taken in the memory (2) and the feature point extracting section (3). A correlation operating section (5) operates feature points of the image (F2) and the image (F1) in an area designated by a search area deciding section (4) to output an amount of movement.
Type: Grant
Filed: March 18, 1999
Date of Patent: November 18, 2003
Assignee: Sharp Kabushiki Kaisha
Inventors: Yasuhisa Nakamura, Yoshihiro Kitamura, Hiroshi Akagi
-
Patent number: 6643401
Abstract: A character pattern is extracted from image data read from a document, listing, etc., and discriminated between a hand-written character and a typed character by a hand-written/typed character discrimination unit. The hand-written/typed character discrimination unit obtains, from the character pattern, N feature vectors containing a feature indicating at least the complexity and the linearity of the character pattern, and discriminates the character pattern between a hand-written character and a typed character using the feature vectors. A character recognition unit performs a character recognizing process based on the result of discriminating whether the character data is a hand-written character or a typed character. As a feature of the above described character pattern, the variance of line widths, the variance of character positions, etc. can also be used.
Type: Grant
Filed: June 24, 1999
Date of Patent: November 4, 2003
Assignee: Fujitsu Limited
Inventors: Junji Kashioka, Satoshi Naoi
-
Patent number: 6640010
Abstract: An image processing technique for selecting a text region from an image is described. Character and formatting information for each word in the image is used to determine an active region for each word in the image. For a preferred embodiment of the present invention, the character and formatting information is derived during optical character recognition (OCR). A first and last word within a selected text region is identified based on at least one active region associated with at least one word within the selected text region. Using the first and last words within the selected text region, all words within the selected text region are identified. An image of the selected text region may be displayed. Text contained within the selected text region may be copied to an application program.
Type: Grant
Filed: November 12, 1999
Date of Patent: October 28, 2003
Assignee: Xerox Corporation
Inventors: Mauritius Seeger, Christopher R. Dance, Stuart A. Taylor, William M. Newman
-
Publication number: 20030190077
Abstract: Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (126) that embodies information about the sibling entities and the sibling ranking information and provides a method for reading out the ranked parent candidates from the rollup matrix in order of their parent confidences (141). Parent confidences are based on the sibling ranking information, either alone or in combination with n-gram dictionary ranking or other ranking information.
Type: Application
Filed: April 8, 2003
Publication date: October 9, 2003
Applicant: RAF Technology, Inc.
Inventors: David Justin Ross, Stephen E.M. Billester, Brent R. Smith
-
Publication number: 20030185448
Abstract: An image processing technique for selecting a text region from an image is described. Character and formatting information for each word in the image is used to determine an active region for each word in the image. For a preferred embodiment of the present invention, the character and formatting information is derived during optical character recognition (OCR). A first and last word within a selected text region is identified based on at least one active region associated with at least one word within the selected text region. Using the first and last words within the selected text region, all words within the selected text region are identified. An image of the selected text region may be displayed. Text contained within the selected text region may be copied to an application program.
Type: Application
Filed: November 12, 1999
Publication date: October 2, 2003
Inventors: Mauritius Seeger, Christopher R. Dance, Stuart A. Taylor, William M. Newman
-
Patent number: 6626960
Abstract: Disclosed is a system, method, and program for generating a table for use by a computer in determining a location of a boundary, such as a word boundary, between two characters in text. A first table indicates a boundary between characters when processing text in a first direction, such as the forward direction. A second table is generated based on the content of the first table. The second table can be used to determine whether one boundary is located between any two consecutive characters processed in a second direction, such as the backward direction.
Type: Grant
Filed: September 1, 1999
Date of Patent: September 30, 2003
Assignee: International Business Machines Corporation
Inventor: Richard Theodore Gillam
-
Patent number: 6621930
Abstract: An electronic device automatically classifies documents based upon textual content. Documents may be classified into document categories. Statistical characteristics are gathered for each document category and these statistical characteristics are used as a frame of reference in determining how to classify the document. The document categories may be intersecting or non-intersecting. A neutral category is used to represent documents that do not fit into many of the other specified categories. The statistical characteristics for an input document are compared with those for the document category and for the neutral category in making a determination on how to categorize the document. This approach is extensible, generalizable and efficient.
Type: Grant
Filed: August 9, 2000
Date of Patent: September 16, 2003
Assignee: Elron Software, Inc.
Inventor: Frank Smadja
-
Publication number: 20030152277
Abstract: A method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image operates by segmenting a document image into a set of image objects, and linking the image objects with fields that store metadata. Image objects identified by segmenting the document image are grouped into subsets. The image objects are grouped according to characteristics suggesting that the image objects may have common ground-truthed metadata. By grouping the image objects into subsets, the image objects may be indexed to facilitate the ground-truthing process. In some embodiments, the index of representative image objects is presented to the user in a table form. A database of image objects with ground-truthed metadata is formed. Interactive tools and processes facilitate ground-truthing based on paired image objects and metadata.
Type: Application
Filed: June 13, 2002
Publication date: August 14, 2003
Applicant: Convey Corporation
Inventors: Floyd Steven Hall, Cameron Telfer Howie
-
Patent number: 6597809
Abstract: Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (126) that embodies information about the sibling entities and the sibling ranking information and provides a method for reading out the ranked parent candidates from the rollup matrix in order of their parent confidences (141). Parent confidences are based on the sibling ranking information, either alone or in combination with n-gram dictionary ranking or other ranking information.
Type: Grant
Filed: March 20, 2000
Date of Patent: July 22, 2003
Assignee: RAF Technology, Inc.
Inventors: David Justin Ross, Stephen E. M. Billester, Brent R. Smith
-
Patent number: 6594393
Abstract: In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must perform iteratively on a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode.
Type: Grant
Filed: May 12, 2000
Date of Patent: July 15, 2003
Inventors: Thomas P. Minka, Dan S. Bloomberg, Ashok C. Popat
-
Publication number: 20030103675
Abstract: Paired image information and text information correlated to each other are retrieved as information sets. Frequency information on words used in text is extracted from text information in a group of information sets, and text information features are extracted based on frequency information. Text features are used to lay out information sets in a virtual space such that similar pieces of text are located close to each other, and images are displayed at those positions. Further, important words are extracted from those words extracted from text information in a group of information sets, and those words are laid out in the virtual space in the same manner as with information sets and displayed as labels.
Type: Application
Filed: November 27, 2002
Publication date: June 5, 2003
Applicant: Fujitsu Limited
Inventors: Susumu Endo, Yuusuke Uehara, Daiki Masumoto, Syuuichi Shiitani
-
Patent number: 6573844
Abstract: Predictive keyboards, such as predictive soft keyboards, are disclosed. In one embodiment, a computer-implemented method predicts at least one key to be entered next within a sequence of keys. The method displays a soft keyboard where the predicted keys are displayed on the soft keyboard differently than the other keys on the keyboard. For example, the predicted keys may be larger in size on the soft keyboard as compared to the other keys. This makes the predicted keys more easily typed by a user as compared to the other keys.
Type: Grant
Filed: January 18, 2000
Date of Patent: June 3, 2003
Assignee: Microsoft Corporation
Inventors: Daniel Venolia, Joshua Goodman, Xuedong Huang, Hsiao-Wuen Hon
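The prediction-then-display idea in this abstract can be sketched in a few lines of Python. The character bigram model, the function names, and the 1.5x enlargement factor below are all assumptions made for illustration; the abstract does not say how the next key is predicted or how much larger the predicted keys are drawn.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_keys(counts, typed, k=3):
    """Return up to k keys most likely to follow the last typed character."""
    if not typed or typed[-1] not in counts:
        return []
    return [key for key, _ in counts[typed[-1]].most_common(k)]


def key_scale(key, predicted, enlarged=1.5):
    # Hypothetical display rule: predicted keys are drawn larger than the rest.
    return enlarged if key in predicted else 1.0
```

A soft keyboard built on this sketch would call `predict_keys` after every keystroke and redraw each key at the size returned by `key_scale`.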
-
Publication number: 20030099402
Abstract: A method of analyzing a verbatim text comprising the steps of storing the verbatim text in an electronic memory device and identifying at least one concept in said verbatim text and linking said concept to a code.
Type: Application
Filed: March 11, 2002
Publication date: May 29, 2003
Inventor: Charles M. Baylis
-
Patent number: 6563956
Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
Type: Grant
Filed: July 7, 1999
Date of Patent: May 13, 2003
Assignee: Fujitsu Limited
Inventors: Noriko Satoh, Shigeru Yoshida
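The two-table lookup described in this abstract can be illustrated with a minimal Python sketch. It assumes 3-byte strings whose 24 bits are split into two 12-bit codewords, and tables that hold only the most recent position for each codeword; these sizes and the function names are illustrative choices, not details taken from the patent.

```python
def split_codewords(chunk):
    """Split a 3-byte string (24 bits) into two 12-bit codewords."""
    bits = int.from_bytes(chunk, "big")
    return bits >> 12, bits & 0xFFF


def find_repeats(data):
    """Return (position, earlier_position) pairs for 3-byte strings seen before."""
    table1 = [-1] * 4096  # last position, addressed by the first codeword
    table2 = [-1] * 4096  # last position, addressed by the second codeword
    repeats = []
    for pos in range(len(data) - 2):
        c1, c2 = split_codewords(data[pos:pos + 3])
        # If both tables report the same earlier position, the string at that
        # position had the same two codewords, so the full string matched.
        if table1[c1] != -1 and table1[c1] == table2[c2]:
            repeats.append((pos, table1[c1]))
        table1[c1] = table2[c2] = pos
    return repeats
```

Two 4096-entry tables stand in for the single 16,777,216-entry table that direct addressing by the full 24-bit string would need. The trade-off in this sketch is that a later string sharing only one codeword overwrites that table entry, so some earlier occurrences go undetected.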
-
Publication number: 20030086618
Abstract: Evaluation based on sensitivity of an image that was performed by using the sensibility and the manual work of a person is automatically performed.
Type: Application
Filed: July 11, 2002
Publication date: May 8, 2003
Applicant: Seiko Epson Corporation
Inventor: Michihiro Nagaishi
-
Publication number: 20030086619
Abstract: The layout of an image that was performed with the help of the sensibility and the manual work of a person is automatically optimized.
Type: Application
Filed: July 11, 2002
Publication date: May 8, 2003
Applicant: Seiko Epson Corporation
Inventor: Michihiro Nagaishi
-
Patent number: 6560360
Abstract: A recognition system is disclosed, including a representation of an object in terms of its constituent parts that is translationally invariant, and which provides scale invariant recognition. The system further provides effective recognition of patterns that are partially present in the input signal, or that are partially occluded, and also provides an effective representation for sequences within the input signal. The system utilizes dynamically determined, context based expectations, for identifying individual features/parts of an object to be recognized. The system is computationally efficient, and capable of highly parallel implementation, and further includes a mechanism for improving the preprocessing of individual sections of an input pattern, either by applying one or more preprocessors selected from a set of several preprocessors, or by changing the parameters within a single preprocessor.
Type: Grant
Filed: January 27, 2000
Date of Patent: May 6, 2003
Assignees: Nestor, Inc., Brown University Research Foundation
Inventors: Predrag Neskovic, Douglas L. Reilly, Leon N Cooper
-
Patent number: 6556713
Abstract: A search result of a search target object is displayed at a high speed. By dividing an image into a plurality of areas and allocating attribute information to each area, only the area including the attribute information showing the search target object is searched and is displayed or transmitted, so that a part of a desired image can be extracted at a high speed.
Type: Grant
Filed: July 30, 1998
Date of Patent: April 29, 2003
Assignee: Canon Kabushiki Kaisha
Inventors: Yuji Kobayashi, Kentaro Matsumoto
-
Patent number: 6549662
Abstract: Characters of data on a document are recognized by automatically determining the definitions of characters of the data from the arrangement of character strings of the data. Character strings on the document are extracted by reading the document, and headers and data on the document are distinguished from each other by determining the positional relationship between the character strings. Character attributes of the data are determined by recognizing characters of the character strings of the headers using a header recognition dictionary. Characters of the character strings of the data are recognized according to the determined character attributes of the data. Since character attributes of the data are determined from recognized characters of the headers after the headers and the data are distinguished from each other from the layout on the document, it is possible to enter automatically the character attributes of the data.
Type: Grant
Filed: May 27, 1998
Date of Patent: April 15, 2003
Assignee: Fujitsu Limited
Inventors: Katsutoshi Kobara, Shinichi Eguchi, Yoshihiro Nagano, Hideki Matsuno, Koichi Chiba, Yutaka Katsumata
-
Publication number: 20030068088
Abstract: A mechanism is provided for magnifying information with contextual information. The user may configure the magnification mechanism to present some contextual information along with the focus being magnified. Particularly, a user may set “look ahead” and “look behind” parameters to specify a number of words or characters to include before and after the magnified word or words. The actual magnified word or words may be distinguished from the contextual information. For example, the word or words being magnified may be magnified to a size that is larger than that of the contextual information. The magnification mechanism may also present a magnified display of image information.
Type: Application
Filed: October 4, 2001
Publication date: April 10, 2003
Applicant: International Business Machines Corporation
Inventors: Janani Janakiraman, Rabindranath Dutta
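The "look ahead" and "look behind" parameters in this abstract amount to slicing a window of words around the focus. The Python sketch below shows one way that selection could work; the function name, the word-level granularity, and the default values are assumptions for illustration only.

```python
def magnifier_view(words, focus, look_behind=1, look_ahead=1):
    """Return (before, focused, after) word lists around index `focus`.

    The caller is expected to render `focused` at a larger size than the
    contextual words in `before` and `after`.
    """
    start = max(focus - look_behind, 0)
    before = words[start:focus]
    after = words[focus + 1:focus + 1 + look_ahead]
    return before, [words[focus]], after
```

Clamping `start` at zero keeps the window valid when the focus is near the beginning of the text; slicing already handles the end of the text.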
-
Patent number: 6542640
Abstract: A dictionary in which a character train serving as a processing unit upon compression has been registered is stored into a character train dictionary storing unit. In a character train comparing unit, the registration character train in the character train dictionary storing unit and a partial character train in non-compression data are compared, thereby detecting the coincident partial character train. A code output unit allocates a predetermined code every partial character train detected by the character train comparing unit and outputs. The character train dictionary storing unit allocates character train codes of a fixed length of 17 bits to about 130,000 words and substantially compresses a data amount to the half or less irrespective of an amount of document data.
Type: Grant
Filed: June 18, 1998
Date of Patent: April 1, 2003
Assignee: Fujitsu Limited
Inventors: Takashi Morihara, Yahagi Hironori, Satoh Noriko
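The figures in this abstract fit together arithmetically: a 17-bit code can address 2^17 = 131,072 entries, which covers a dictionary of about 130,000 words. The Python sketch below packs dictionary indices into fixed 17-bit codes to show the idea. The tiny dictionary, the function names, and the choice to pack codes into a single integer are assumptions for illustration; words missing from the dictionary are not handled here.

```python
CODE_BITS = 17  # fixed code length stated in the abstract


def compress(words, dictionary):
    """Pack the dictionary index of each word into one integer, 17 bits per word."""
    if len(dictionary) > 2 ** CODE_BITS:
        raise ValueError("dictionary too large for 17-bit codes")
    index = {word: i for i, word in enumerate(dictionary)}
    packed = 0
    for word in words:
        packed = (packed << CODE_BITS) | index[word]
    return packed, len(words)


def decompress(packed, count, dictionary):
    """Recover the word sequence from the packed 17-bit codes."""
    mask = (1 << CODE_BITS) - 1
    out = []
    for _ in range(count):
        out.append(dictionary[packed & mask])
        packed >>= CODE_BITS
    return out[::-1]
```

At 17 bits per word, any word stored as three or more 8-bit characters takes less space as a code than as text, which is consistent with the abstract's claim of roughly halving the data.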
-
Patent number: 6539116
Abstract: The structure of entered document image data is analyzed and a character string in a text block that has been analyzed is subjected to pattern recognition. Synonyms and equivalents of words obtained as results of language analysis are extracted and words obtained as results of language analysis are converted to words of another language. A character string in a text block that has been analyzed is translated to another language. At least results of analyzing the structure of document image data, results of character recognition and results of language analysis are stored, and at least one of the results of extraction, results of conversion and results of translation are stored in a RAM in association with the results of character recognition.
Type: Grant
Filed: October 2, 1998
Date of Patent: March 25, 2003
Assignee: Canon Kabushiki Kaisha
Inventor: Makoto Takaoka
-
Patent number: 6539117
Abstract: A communications system for rendering image based data includes a data interface, a display device, and a data manager. The data interface receives image based data that is used by the display device to display an image. The data manager identifies word blocks defined by the received data. The data manager uses the word blocks to define a first row of the image. In this regard, the data manager determines whether images respectively defined by each of the word blocks would be visible if the word blocks are rendered to the first row of the display screen. In response to a determination that an image associated with one of the word blocks would not be visible if the one word block is rendered to the first row of the display screen, the data manager defines a second row and renders the one word block to the second row.
Type: Grant
Filed: April 12, 1999
Date of Patent: March 25, 2003
Assignee: Hewlett-Packard Company
Inventor: Frank P Carau, Sr.
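The row-definition logic in this abstract resembles greedy line wrapping applied to word blocks. The Python sketch below illustrates that reading; representing each word block by its pixel width and testing visibility against a single screen width are simplifying assumptions, and the function name is hypothetical.

```python
def layout_rows(block_widths, screen_width):
    """Assign word blocks to rows so that no block runs past the screen edge."""
    rows, current, used = [], [], 0
    for width in block_widths:
        # Start a new row when this block would not be fully visible on the current one.
        if current and used + width > screen_width:
            rows.append(current)
            current, used = [], 0
        current.append(width)
        used += width
    if current:
        rows.append(current)
    return rows
```

A block wider than the screen still gets a row of its own in this sketch; how the described system handles that case is not stated in the abstract.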
-
Patent number: 6539113
Abstract: The system described herein automatically defines a set of radicals to be used in a Kanji character handwriting recognition system and automatically creates a dictionary of the Kanji characters that are recognized by the system. In performing its functionality, the system described herein first obtains representative handwriting samples for each Kanji character that is to be recognized by the system. The system described herein then evaluates the samples to identify a set of subparts (“radicals”) that are common to at least two of the Kanji characters. These radicals represent component roots from which the characters are formed. Each Kanji character is formed by one or more of these radicals. The radicals that are identified by the system described herein are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries).
Type: Grant
Filed: December 29, 1999
Date of Patent: March 25, 2003
Assignee: Microsoft Corporation
Inventor: Michael Van Kleeck
-
Patent number: 6539118
Abstract: An evaluator system accepts input textual messages in unknown languages and assesses which character sets, corresponding to languages, match that message. Textual messages whose individual characters are encoded in 16 bit Unicode or other universal format are parsed, and character sets which can express each character and the accumulated correspondence is logged. When the character sets against which the message is being tested only provide partial matches, the invention can determine which offers the best fit, including by way of a weighting function. The evaluation technology of the invention can be applied to multipart documents, and to search engines and indices.
Type: Grant
Filed: August 27, 1999
Date of Patent: March 25, 2003
Assignee: International Business Machines Corporation
Inventors: Brendan P. Murray, Kuniaki Takizawa
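A simple version of the per-character evaluation in this abstract can be written with Python's built-in codecs. The sketch below scores each candidate character set by the fraction of the message's characters it can encode and picks the highest score. Weighting every character equally is an assumption; the abstract mentions a weighting function but does not define it, and the function names here are hypothetical.

```python
def score_charsets(message, charsets):
    """Return, for each charset name, the fraction of characters it can encode."""
    scores = {}
    for name in charsets:
        encodable = 0
        for ch in message:
            try:
                ch.encode(name)
                encodable += 1
            except UnicodeEncodeError:
                pass  # this charset cannot express the character
        scores[name] = encodable / len(message) if message else 0.0
    return scores


def best_charset(message, charsets):
    """Pick the charset with the best fit; ties go to the earliest candidate."""
    scores = score_charsets(message, charsets)
    return max(charsets, key=lambda name: scores[name])
```

For the message "caf\u00e9", ASCII covers three of the four characters while Latin-1 covers all four, so Latin-1 is the better fit.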
-
Publication number: 20030044068
Abstract: The invention relates to a mobile device with a built-in image capture device, and a character recognition function to present the information gathered with the character recognition result. With the mobile device, the character line extraction process is displayed whenever necessary, and the resolution of an image to be inputted for recognition processing is enhanced. Accordingly, it is possible for the operator to select the target character line with ease. In addition, the mobile device has a character recognition ratio improved by the enhancement in resolution.
Type: Application
Filed: July 23, 2002
Publication date: March 6, 2003
Applicant: Hitachi, Ltd.
Inventors: Tatsuhiko Kagehiro, Minenobu Seki, Hiroshi Sako
-
Patent number: 6526170
Abstract: A character recognition system is disclosed. In a feature extraction parameter storage section 22 a transformation matrix for reducing a number of dimensions of feature parameters and a codebook for quantization are stored. In an HMM storage section 23 a constitution and parameters of Hidden Markov Model (HMM) for character string expression are stored. A feature extraction section 32 scans a word image given from an image storage means from left to right in a predetermined cycle with a slit having a width sufficiently smaller than the character width and thus outputs a feature symbol at each predetermined timing. A matching section 33 matches a feature symbol row and a probability maximization HMM state, thereby recognizing the character string.
Type: Grant
Filed: December 13, 1994
Date of Patent: February 25, 2003
Assignee: NEC Corporation
Inventor: Shinji Matsumoto
-
Publication number: 20030026459
Abstract: A system and a method for drawing a patent map using a technical field word are disclosed. In the system and the method, a word to be used for drawing a patent map is extracted by calculating weight values of significant words which are gotten by removing unnecessary words from patent data, and this extracted word is matched with a patent to draw the patent map.
Type: Application
Filed: November 29, 2001
Publication date: February 6, 2003
Inventors: Jeong Wook Won, Hyoung Bok Lee, Jai Sang Koh
-
Patent number: 6512851
Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.
Type: Grant
Filed: October 9, 2001
Date of Patent: January 28, 2003
Assignee: STMicroelectronics S.r.l.
Inventors: Loris Navoni, Roberto Canegallo, Mauro Chinosi, Giovanni Gozzini, Alan Kramer, Pierluigi Rolandi
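The distance calculation in this abstract can be mimicked in software, with a sequential loop standing in for the parallel associative memory. In the Python sketch below, the weight table, the use of absolute differences, and the restriction to stored words of the same length as the input are assumptions made for illustration; the abstract does not specify them.

```python
def word_weights(word, weights):
    """Map each character of a word to its assigned weight."""
    return [weights[ch] for ch in word]


def recognize(candidate, dictionary, weights):
    """Return the stored word whose weight sequence is closest to the candidate's."""
    target = word_weights(candidate, weights)

    def distance(word):
        stored = word_weights(word, weights)
        # Sum of per-character weight differences (absolute value assumed).
        return sum(abs(a - b) for a, b in zip(target, stored))

    same_length = [w for w in dictionary if len(w) == len(candidate)]
    return min(same_length, key=distance) if same_length else None
```

With weights equal to alphabet position, the misread string "cat" is at distance 1 from "bat" and distance 2 from "car", so `recognize` returns "bat". This shows how much the chosen weight coding shapes which stored word counts as nearest.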
-
Publication number: 20030016874
Abstract: An image analysis method designed to identify images in a sequence of images that are statistically different in a pre-selected region of interest. The method is suitable when there is no a priori knowledge of the nature of the interesting images. A reference image is used to identify specific regions of the image that may contain interesting changes (Detect Zone), that will not have interesting changes, but can be used to assess image quality (Veto zone), and an unanalyzed region (Ignore zone). To improve the spatial sensitivity, the Detect and Veto zones can be divided into specific cells. The analysis may also be performed on compressed data and another method automatically classifies a cell as either in the Detect zone or Ignore zone. The sensitivity can be further improved by removing periodic feature variation prior to the statistics calculation.
Type: Application
Filed: May 31, 2001
Publication date: January 23, 2003
Inventors: Kenneth A. Lefler, Wayne L. Kilmer, Yi Zhang
-
Patent number: 6507678
Abstract: A character string retrieval apparatus classifies a plurality of characters following a prefix of a registration character string into a plurality of groups, and registers those following characters in an array structure using a different displacement amount for each group. The character string retrieval apparatus retrieves a given character string based on the displacement amount of a group corresponding to an input character.
Type: Grant
Filed: February 8, 1999
Date of Patent: January 14, 2003
Assignee: Fujitsu Limited
Inventor: Hironori Yahagi
-
Publication number: 20020196166
Abstract: The present invention provides a data compression method in which a plurality of consecutive characters of a data string to be compressed are set as a character string to be searched for. Bits of a bit string representing the set character string are allocated to at least two codewords. Thus, first and second searching codewords are generated. These first and second codewords are used as array addresses. First and second array tables are prepared, in which information on the past occurrence positions of the set character string is previously entered as the contents thereof. When the first and second codewords are generated from the character string to be compressed, the first and second array tables are looked up by using these codewords as the addresses of the arrays. When results of looking up these tables match with each other, it is found that the set character string occurred in the past.
Type: Application
Filed: August 29, 2002
Publication date: December 26, 2002
Applicant: Fujitsu Limited
Inventors: Noriko Satoh, Shigeru Yoshida
-
Publication number: 20020176628
Abstract: A document digitizing method digitizes and automatically indexes documents in printed form. The method includes optically scanning the document, forming and storing a digitized image file from the optically scanned document, optically recognizing characters in the optically scanned document, and forming and storing a text file of the optically recognized characters in the document. A retrieval method for retrieving the digitized image file for a document includes searching the text files to identify any having a selected text string and providing access to the digitized image files that correspond to those text files. The digital image file and the text file together represent a digitized document data structure that combines a digital image of a document with a text file of optically recognized characters in the digital image.
Type: Application
Filed: May 22, 2001
Publication date: November 28, 2002
Inventor: Gary K. Starkweather
-
Publication number: 20020172425
Abstract: Described herein is a technology for recognizing the content of text documents. The technology determines one or more hash values for the content of a text document. Alternatively, the technology may generate a “sifted text” version of a document. In one implementation described herein, document recognition is used to determine whether the content of one document is copied (i.e., plagiarized) from another document. This is done by comparing hash values of documents (or alternatively their sifted text). In another implementation described herein, document recognition is used to categorize the content of a document so that it may be grouped with other documents in the same category. This abstract itself is not intended to limit the scope of this patent. The scope of the present invention is pointed out in the appending claims.
Type: Application
Filed: April 24, 2001
Publication date: November 21, 2002
Inventors: Ramarathnam Venkatesan, Michael Malkin
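To make the idea of comparing documents by hash values concrete, the Python sketch below hashes overlapping runs of words and measures how many hashes two documents share. This is a generic shingling technique using SHA-256 from the standard library, swapped in purely for illustration; it is not the hashing scheme of the described technology, which the abstract does not detail. The function names and the shingle size are hypothetical.

```python
import hashlib


def shingle_hashes(text, size=4):
    """Hash every run of `size` consecutive words after simple normalisation."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - size + 1, 0))
    }


def overlap(doc_a, doc_b, size=4):
    """Fraction of doc_a's shingle hashes that also occur in doc_b."""
    hashes_a = shingle_hashes(doc_a, size)
    hashes_b = shingle_hashes(doc_b, size)
    return len(hashes_a & hashes_b) / len(hashes_a) if hashes_a else 0.0
```

A high `overlap` value suggests that one document reuses passages from the other; a threshold for flagging copied content would have to be chosen empirically.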
-
Publication number: 20020164079
Abstract: Systems and methods for rendering image-based data are disclosed. A representative system includes a data interface that receives a remotely-generated data stream; a data manager coupled to the data interface, the data manager configured to translate the remotely-generated data stream into a plurality of word blocks, wherein the data manager determines for each word block of interest whether an active line can accommodate an entire word block of interest prior to registering the word block with the active line and wherein the data manager increments the active line in response to a determination that the word block of interest would not be accommodated on the active line; and a display device coupled to the data manager, the display device configured to render the plurality of word blocks.
Type: Application
Filed: June 25, 2002
Publication date: November 7, 2002
Inventor: Frank P. Carau
-
Publication number: 20020154817
Abstract: A document image search apparatus generates a text by performing the character recognition of a document image and determines a re-process scope. Then, the apparatus generates a candidate character lattice from the re-recognition result of the re-process scope, generates character strings from the candidate character lattice and adds the character strings to the text. Then, the apparatus performs index search using the text with the character strings added.
Type: Application
Filed: September 12, 2001
Publication date: October 24, 2002
Applicant: Fujitsu Limited
Inventors: Yutaka Katsuyama, Satoshi Naoi, Fumihito Nishino
-
Patent number: 6470336Abstract: A document search device searches for a keyword in a recognition result obtained by character recognition performed on a document image. The keyword includes at least one first character, and a character code is assigned to each of the at least one first character. The recognition result includes at least one second character, and a character code and a partial area of the document image are assigned to each of the at least one second character.Type: GrantFiled: August 23, 2000Date of Patent: October 22, 2002Assignee: Matsushita Electric Industrial Co., Ltd.Inventors: Yoshihiko Matsukawa, Taro Imagawa, Kenji Kondo, Tsuyoshi Mekata
-
Publication number: 20020150300Abstract: A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.Type: ApplicationFiled: April 8, 1999Publication date: October 17, 2002Inventors: Dar-Shyang Lee, Jonathan J. Hull
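As a minimal sketch of conditional n-gram extraction and comparison, the code below keeps only the characters that satisfy a predicate, forms n-grams from the kept characters, and compares two documents by cosine similarity of the n-gram counts. The abstract names neither the predicate nor the similarity measure, so the `follows_space` predicate and the cosine measure are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_ngrams(text, n, predicate):
    # Keep only characters whose position satisfies the predicate,
    # then count the n-grams formed from the kept characters.
    kept = [ch for i, ch in enumerate(text) if predicate(text, i)]
    return Counter("".join(kept[i:i + n]) for i in range(len(kept) - n + 1))


def follows_space(text, i):
    # Hypothetical predicate: the character starts the text or follows a space.
    return i == 0 or text[i - 1] == " "


def cosine_similarity(a, b):
    # Cosine of the angle between two n-gram count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

With this predicate, `conditional_ngrams("the cat sat", 2, follows_space)` counts the bigrams `tc` and `cs`, which are built from the first letter of each word.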
-
Publication number: 20020141644Abstract: A system for conducting fortune telling and character analysis over Internet based on an entered name of any language is provided. The system includes a host, a database, one or more terminals, and a communicating system connecting the host, the database and the terminal to one another. The database is capable of analyzing implied meanings of names of any language entered at the terminal in accordance with traditional Chinese fortune telling theories and thereby providing judgments on good or bad signs possibly represented by the entered names. The implied meanings of names are obtained either from numbers of strokes of the entered names or from meanings or origins of words constituting the entered names.Type: ApplicationFiled: March 29, 2001Publication date: October 3, 2002Inventor: Tawei Lin
-
Patent number: 6459810Abstract: An exemplary embodiment of the invention is a method for forming variant search strings. The method includes receiving a search string and parsing the search string to locate a mistaken search string character. A mistaken search string character is a character which is confused with other characters. A variant search string is formed in response to a presence of a mistaken search string character in the search string. The search string and variant search string may then be used to search a database. Another exemplary embodiment of the invention is a system for forming variant search strings. The system includes a user interface for receiving a search string. A variant search string generator parses the search string to locate a mistaken search string character. The mistaken search string character is a character which is confused with other characters. The variant search string generator forms a variant search string in response to a presence of a mistaken search string character in the search string.Type: GrantFiled: September 3, 1999Date of Patent: October 1, 2002Assignee: International Business Machines CorporationInventor: Christopher T. Cring
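The variant-forming step can be sketched as follows. The table of easily confused characters (`CONFUSABLE`) is a made-up example, since the abstract does not list which characters are treated as mistaken; the function name is likewise hypothetical.

```python
from itertools import product

# Hypothetical table: each character maps to itself followed by the
# characters it is commonly confused with.
CONFUSABLE = {
    "0": "0O",
    "O": "O0",
    "1": "1lI",
    "l": "l1I",
    "I": "I1l",
}


def variant_search_strings(search):
    # For every position, list the allowed substitutes (the character
    # itself when it is not in the table), then take every combination.
    options = [CONFUSABLE.get(ch, ch) for ch in search]
    return ["".join(combo) for combo in product(*options)]
```

The original search string is always the first entry of the result, and a string with no confusable characters yields only itself, so the caller can search the database with every returned string.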
-
Patent number: 6453070Abstract: Handwritten ink is scanned to identify potential diacriticals. A list of diacriticals (19) is generated by traversing the ink. Potential diacritical-containing characters are processed by scoring them with and without a diacritical to generate a first and second score. The first score is compared to the second score in order to make a decision as to which variant of the potential diacritical-containing character produced a highest score. The highest score is used as a score for a theory and the decision is recorded. A data structure (50) is added to the theory. Each data unit in the data structure (50) corresponds to an entry in the list of diacriticals (19). As a new theory is created by propagation, contents of the data structure (50) are copied into the new theory. Thus, the data structure (50) is used to ensure that all handwritten ink is used and is used only once.Type: GrantFiled: March 17, 1998Date of Patent: September 17, 2002Assignee: Motorola, Inc.Inventors: Giovanni Seni, John Seybold
-
Publication number: 20020126904Abstract: A character-recognition pre-processing apparatus includes extraction means for extracting an image of a character string to be subjected to character recognition; setting means for setting the smallest rectangle that surrounds the character string image extracted; specifying means for specifying the position of each character within the smallest rectangle set by the setting means; detection means for detecting, at each character position specified, the shortest distance between a character region and the lower edge of the smallest rectangle, and the shortest distance between the character region and the upper edge of the smallest rectangle; and judgment means for judging whether the character string extracted is in an upright state or an inverted state, on the basis of variations in the two shortest distances detected.Type: ApplicationFiled: October 3, 2001Publication date: September 12, 2002Inventors: Hiroshi Kakutani, Yasuharu Inami
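The abstract says the judgment rests on the variations in the two per-character distances but does not state the decision rule. The sketch below assumes one plausible rule for Latin text: most characters sit on the baseline, so in an upright string the bottom gaps vary less than the top gaps. The function name and this rule are assumptions, not the claimed method.

```python
from statistics import pvariance


def is_upright(top_gaps, bottom_gaps):
    """Guess orientation from per-character gaps to the bounding rectangle.

    top_gaps[i] is the shortest distance from character i to the upper
    edge of the rectangle, bottom_gaps[i] the distance to the lower edge.
    Assumed rule: upright text varies less along its bottom edge.
    """
    return pvariance(bottom_gaps) <= pvariance(top_gaps)
```

Swapping the two lists models the same string rotated by 180 degrees, which flips the result whenever the two variances differ.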
-
Publication number: 20020126903Abstract: A word recognizing apparatus extracts the feature amount from a given image, and dynamically composes the feature amount of a candidate word to be recognized which is registered in a word list, using feature amounts of characters registered in an individual character dictionary. Then, the apparatus collates the composed feature amount of the word with the feature amount extracted from the image, calculates the degree of similarity between the two feature amounts, and outputs a recognition result.Type: ApplicationFiled: May 11, 1999Publication date: September 12, 2002Inventors: Hiroaki Takebe, Yoshinobu Hotta, Satoshi Naoi
-
Publication number: 20020126905Abstract: A mathematical expression recognizing device comprises a character recognition unit which recognizes characters in a document image, a dictionary storing a pair of evaluation scores for each type of word, the score showing the possibility of belonging to the text and that of belonging to the mathematical expression, an evaluation unit which obtains the evaluation scores showing the possibility of belonging to the text and that of belonging to the mathematical expression for each of the words included in the recognized characters with reference to the dictionary, and a mathematical expression detecting unit which searches for an optimal path connecting words by selecting one of the text and the mathematical expression based on a formative grammar and the evaluation scores showing the possibility of belonging to the text and that of belonging to the mathematical expression for each of the words, thereby detecting characters belonging to the mathematical expression.Type: ApplicationFiled: March 5, 2002Publication date: September 12, 2002Applicant: KABUSHIKI KAISHA TOSHIBAInventors: Masakazu Suzuki, Kazuaki Yokota, Yuko Eto
-
Publication number: 20020118866Abstract: An automatic quantitative analysis method is developed so as to analyze perfusion cardiovascular images. First the image registration per data set is performed so as to compensate for translation and rotation of the target region of interest over the acquisition time. Next a parameter, for example, a maximum intensity projection, is calculated in order to average out misalignments of the target region of interest within each data set. Finally, parameter registration is performed to calculate the co-ordinate translation matrix between the anatomically corresponding pixels within the target region of interest. The co-ordinate translation matrix can also be used to calculate local perfusion values.Type: ApplicationFiled: January 29, 2002Publication date: August 29, 2002Inventors: Marcel Breeuwer, Marcel Johannes Quist
-
Patent number: 6442295Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.Type: GrantFiled: February 12, 1998Date of Patent: August 27, 2002Assignee: STMicroelectronics S.r.l.Inventors: Loris Navoni, Roberto Canegallo, Mauro Chinosi, Giovanni Gozzini, Alan Kramer, Pierluigi Rolandi
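The distance computation described here can be imitated in software. The sketch assumes a simple weight code (a=1 through z=26, with 0 for padding and unknown characters) and uses the absolute difference between weights; the patent's actual code and its parallel associative-memory hardware are not reproduced, and the names `WEIGHTS`, `encode`, and `nearest_word` are hypothetical.

```python
import string

# Assumed weight code: equal characters always receive equal weights.
WEIGHTS = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}


def encode(word, length):
    # Pad with spaces to a common length; padding and unknown
    # characters get weight 0.
    return [WEIGHTS.get(ch, 0) for ch in word.ljust(length)]


def nearest_word(query, dictionary):
    # Distance = sum over positions of the absolute weight difference.
    # The dictionary is assumed to be non-empty.
    length = max(len(w) for w in [query, *dictionary])
    q = encode(query, length)

    def distance(word):
        return sum(abs(a - b) for a, b in zip(q, encode(word, length)))

    # The device evaluates all stored words in parallel; this sketch
    # simply scans them one by one and keeps the minimum.
    return min(dictionary, key=distance)
```

With these weights, the misread string `cbt` is closer to `cat` (distance 1) than to `cot` (distance 13).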
-
Publication number: 20020114523Abstract: In a combined holistic and analytic recognition system, the holistic recognition module will recognize an input word or phrase image by matching an input string of character features for the whole word or phrase against a string of prototype features for a plurality of reference words in a lexicon. This will yield a holistic answer list of recognized word or phrase candidates for the input word or phrase along with a confidence value for each answer on the list. At the same time based on each answer in the answer list, the holistic recognition modules will generate a list of character features and segment the character features into sets for each character in an answer. The analytical recognition module uses segmentation hypotheses from the segmented character feature sets to cut the image of the input string of characters into individual character images.Type: ApplicationFiled: February 16, 2001Publication date: August 22, 2002Inventors: Alexander Filatov, Igor Kil, Arseni Seregin
-
Publication number: 20020114524Abstract: Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character sequences probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.Type: ApplicationFiled: June 29, 2001Publication date: August 22, 2002Applicant: International Business Machines CorporationInventor: Hubert Crepy
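The determine, compare, and reject steps can be sketched with a character bigram model. The abstract only says that character sequence probabilities are predetermined for the language, so the add-one smoothed bigram model, the average log-probability used as the regularity value, and all function names below are assumptions for illustration.

```python
import math
from collections import Counter


def train_bigrams(reference):
    # Estimate character bigram probabilities from reference text,
    # with add-one smoothing so unseen pairs get a small probability.
    pairs = Counter(zip(reference, reference[1:]))
    firsts = Counter(reference[:-1])
    alphabet = len(set(reference))

    def log_prob(a, b):
        return math.log((pairs[(a, b)] + 1) / (firsts[a] + alphabet))

    return log_prob


def regularity(portion, log_prob):
    # Regularity value (VR): average log-probability of the portion's bigrams.
    scores = [log_prob(a, b) for a, b in zip(portion, portion[1:])]
    return sum(scores) / len(scores) if scores else float("-inf")


def filter_corpus(portions, log_prob, threshold):
    # Keep portions whose regularity reaches the threshold (VT); reject the rest.
    return [p for p in portions if regularity(p, log_prob) >= threshold]
```

In practice the threshold would be tuned on sample data; portions shorter than two characters receive the lowest possible regularity in this sketch and are therefore rejected.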
-
Patent number: 6430314Abstract: Described are methods for entering and editing data strings that are inputted into cellular telephones having a screen. In one method, all basic Hangul consonants and some of the compound Hangul consonants are included in a candidate consonant list and all basic Hangul vowels and some of the compound vowels are included in a candidate vowel list. The candidate consonant and vowel lists are alternatively displayed on a component display region (906) located on the screen. To form a Korean character, a user can select consonant(s) and vowel from the candidate consonant and vowel lists. To form a compound Hangul component that is not included in either the candidate consonant list or the candidate vowel list, the user selects a basic Hangul component as a first part of the compound Hangul component from either the candidate consonant list or the candidate vowel list.Type: GrantFiled: January 20, 1999Date of Patent: August 6, 2002Assignees: Sony Corporation, Sony Electronics, Inc.Inventor: Soon Ko