Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
-
Patent number: 6137911
Abstract: Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
Type: Grant
Filed: June 16, 1997
Date of Patent: October 24, 2000
Assignee: The Dialog Corporation PLC
Inventor: Maxim Zhilyaev
-
Patent number: 6128412
Abstract: Provided is a probabilistic statistical data compression/restoration method for variable-length encoding a source character and restoring a variable-length code to a character using the probability of occurrence of the source character which appears following a character string (context) of n characters which immediately precede the source character. When variable-length encoding is performed, a context registration unit successively registers context based upon an entered character without fixing the length (degree) of context, and a compressing encoder selects registered context satisfying a predetermined condition, e.g., registered context for which the frequency of occurrence is greater than a fixed value, as context (encoding context) used in encoding, and variable-length encodes a source character by using this encoding context. A restoration unit restores a code to a character by executing processing which is the reverse of the processing executed by the compressing encoder.
Type: Grant
Filed: March 31, 1997
Date of Patent: October 3, 2000
Assignee: Fujitsu Limited
Inventor: Noriko Satoh
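The context-selection idea in patent 6128412 can be illustrated with a small sketch. It is not the patented encoder: the function names, the `max_order` limit, and the rule "pick the longest registered context whose total count exceeds `min_freq`" are assumptions chosen to mirror the abstract's example condition, and no actual variable-length coding is performed.

```python
from collections import defaultdict

def register_contexts(text, max_order):
    """Count which characters follow every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, ch in enumerate(text):
        for order in range(max_order + 1):
            if i - order < 0:
                break
            counts[text[i - order:i]][ch] += 1
    return counts

def select_encoding_context(counts, history, max_order, min_freq):
    """Return the longest registered context of `history` whose total
    frequency is greater than min_freq (hypothetical selection rule)."""
    for order in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - order:]
        total = sum(counts[ctx].values()) if ctx in counts else 0
        if total > min_freq:
            return ctx
    return ""  # fall back to the empty (order-0) context

counts = register_contexts("abracadabra", max_order=2)
ctx = select_encoding_context(counts, "ab", max_order=2, min_freq=1)
# Probability estimate an entropy coder would use for 'r' after this context.
p_r = counts[ctx]["r"] / sum(counts[ctx].values())
```

A decoder that maintains the same counts can repeat the same selection, which is what lets the restoration unit reverse the process.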
-
Patent number: 6122402
Abstract: Pattern encoding is carried out by 1) substituting an index data of a registered pattern for a position data in a library with respect to an index data peculiar to each of the extracted patterns, 2) taking a difference between an off-set position data of the extracted pattern and an off-set position data of the registered pattern whereby an off-set position difference data is provided, and 3) encoding the position data and the off-set position difference data and providing an encoded data. A pattern extracting unit obtains the extracted patterns from image data. An accumulating/checking unit accumulates the extracted patterns as accumulated patterns, assigns indexes specific to the accumulated patterns, and checks each extracted pattern by comparison with the accumulated patterns. When an accumulated pattern is found to match the extracted pattern, the accumulating/checking unit provides a position data within a library instead of the index data, and also provides the off-set position difference data.
Type: Grant
Filed: December 3, 1997
Date of Patent: September 19, 2000
Assignee: NEC Corporation
Inventors: Mitsutoshi Arai, Keiji Yamada, Toshihiko Okamura, Takahiro Hongu, Kouichirou Hirao
-
Patent number: 6111985
Abstract: A method and mechanism for displaying partial results of full context handwriting recognition. As handwritten characters are entered into a system, a shape matcher associates the character with a plurality of alternate code points, with each alternate code point having probability information associated therewith. The alternate code points are placed at the end of a queue, and a cost is determined from each alternate code point to any immediately preceding alternate in the queue. The cost is based on the probability information of the alternates and a transition cost therebetween. Then, the lowest cost path back from each of the alternates at the end of the queue to an alternate at the beginning of the queue is determined. If each lowest cost path back converges to a common alternate in the queue, the common alternate and any previous alternates on the path back are recognized as the code points for each of the handwritten characters associated therewith.
Type: Grant
Filed: June 6, 1997
Date of Patent: August 29, 2000
Assignee: Microsoft Corporation
Inventors: Gregory N. Hullender, Patrick M. Haluptzok
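The lowest-cost-path idea in patent 6111985 resembles a Viterbi-style search, and a minimal sketch may make the convergence test concrete. Everything here is an assumption for illustration: costs are taken as negative log probabilities plus a caller-supplied `transition_cost`, alternates at one queue position are assumed to have distinct code points, and the toy bigram table is invented.

```python
import math

def best_paths(queue, transition_cost):
    """queue: list of positions, each a list of (char, probability) alternates.
    Returns the lowest-cost path (list of chars) ending at each alternate
    of the last position."""
    paths = [(-math.log(p), [c]) for c, p in queue[0]]
    for alternates in queue[1:]:
        new_paths = []
        for c, p in alternates:
            # Cheapest predecessor for this alternate.
            cost, chars = min(
                ((pc + transition_cost(pchars[-1], c), pchars) for pc, pchars in paths),
                key=lambda t: t[0],
            )
            new_paths.append((cost - math.log(p), chars + [c]))
        paths = new_paths
    return [chars for _, chars in paths]

def committed_prefix(paths):
    """Characters on which every lowest-cost path agrees, from the queue start."""
    prefix = []
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return "".join(prefix)

# Toy transition model: known bigrams are free, all others cost 2.0.
KNOWN = {"ca", "at", "al"}
cost_fn = lambda a, b: 0.0 if a + b in KNOWN else 2.0

queue = [[("c", 0.6), ("e", 0.4)], [("a", 0.9), ("o", 0.1)], [("t", 0.5), ("l", 0.5)]]
paths = best_paths(queue, cost_fn)
stable = committed_prefix(paths)
```

In this toy input the last character is still ambiguous, so only the part of the queue on which all paths converge could be shown to the user as a partial result.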
-
Patent number: 6104500
Abstract: A processor-based fax routing method receives digital data representing a facsimile document. Without performing optical character recognition ("OCR"), the method identifies in the image data a keyword block of text, and an addressee-name block of text that is located near the keyword block of text. The fax routing method then performs OCR on the image data extracting therefrom texts for the keyword, the name of the addressee, and other text present in the facsimile. Using probabilities computed between the text of the name of the addressee and names in a list of possible addressees, and between the keyword and keywords in a list of keywords, the fax routing method determines an addressee for the document. The fax routing method then converts all text into email addressed to the fax's addressee, and stores the email onto an email server from which it may be retrieved.
Type: Grant
Filed: April 29, 1998
Date of Patent: August 15, 2000
Assignee: BCL, Computer Inc.
Inventors: Hassan Alam, Horace Dediu, Scot Tupaj
-
Patent number: 6097841
Abstract: A character recognition apparatus for inferring the entire character string solely from a user-input handwritten keyword and displaying the inferred result as a candidate character string. The apparatus of the invention comprises: a word dictionary storing word identification information and hierarchy information for layering a plurality of words into a hierarchy and for recognizing each of the words within the hierarchy; a character transition probability table storing probabilities of transitions from any one character to another, and those pieces of the word identification information which correspond to combinations of characters resulting from the transitions; and an optimization unit for using the character transition probability table in optimizing candidate character strings obtained by a recognition unit.
Type: Grant
Filed: May 20, 1997
Date of Patent: August 1, 2000
Assignee: Hitachi, Ltd.
Inventors: Keiko Gunji, Koyo Katsura, Soshiro Kuzunuki, Masaki Miura, Toshimi Yokota
-
Patent number: 6084985
Abstract: A method for on-line handwriting recognition is based on a hidden Markov model and implies the following steps: sensing real-time at least an instantaneous write position of the handwriting, deriving from the handwriting a time-conforming string of segments each associated to a handwriting feature vector, matching the time-conforming string to various example strings from a data base pertaining to the handwriting, and selecting from the example strings a best-matching recognition string through hidden-Markov processing, or rejecting the handwriting as unrecognized. In particular, the feature vectors are based on local observations derived from a single segment, as well as on compacted observations derived from time-sequential segments.
Type: Grant
Filed: October 3, 1997
Date of Patent: July 4, 2000
Assignee: U.S. Philips Corporation
Inventors: Jannes G. A. Dolfing, Reinhold Hab-Umbach
-
Patent number: 6075896
Abstract: When character codes and character patterns coexist in the retrieval character string and/or the retrieval objective data, they are standardized into character patterns, and a character string matching with or similar to the retrieval character string is retrieved from the retrieval objective data. A character code included in the retrieval character string and/or the retrieval objective data is converted into a character pattern by referring to a prepared character pattern dictionary, and by pattern matching between two character patterns, a character string matching with or similar to the retrieval character string is picked up from the retrieval objective data and produced as the result of retrieval.
Type: Grant
Filed: August 10, 1998
Date of Patent: June 13, 2000
Assignee: Fujitsu Limited
Inventor: Hiroshi Tanaka
-
Patent number: 6047251
Abstract: The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
Type: Grant
Filed: September 15, 1997
Date of Patent: April 4, 2000
Assignee: Caere Corporation
Inventors: Leonard K. Pon, Tapas Kanungo, Jun Yang, Kenneth Chan Choy, Mindy R. Bokser
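The zone-then-region scheme in patent 6047251 can be sketched in a few lines. This is only an illustration under stated assumptions: the confidence factor is taken to be the fraction of tokens found in a language's dictionary, the `min_conf` threshold is hypothetical, and the tiny dictionaries are invented.

```python
def identify_language(tokens, dictionaries):
    """Return (language, confidence); confidence is the fraction of tokens
    found in that language's dictionary (an assumed confidence factor)."""
    best_lang, best_conf = None, 0.0
    for lang, words in dictionaries.items():
        hits = sum(1 for t in tokens if t.lower() in words)
        conf = hits / len(tokens) if tokens else 0.0
        if conf > best_conf:
            best_lang, best_conf = lang, conf
    return best_lang, best_conf

def identify_regions(regions, zone_language, dictionaries, min_conf=0.5):
    """Label each region; fall back to the zone language when confidence is low."""
    labels = []
    for tokens in regions:
        lang, conf = identify_language(tokens, dictionaries)
        labels.append(lang if conf >= min_conf else zone_language)
    return labels

dictionaries = {
    "en": {"the", "cat", "is", "black"},
    "fr": {"le", "chat", "est", "noir"},
}
zone_tokens = ["the", "cat", "le", "chat", "est", "noir"]
zone_lang, _ = identify_language(zone_tokens, dictionaries)
region_langs = identify_regions(
    [["the", "cat"], ["le", "chat"], ["xyz", "qq"]], zone_lang, dictionaries
)
```

The third region contains no dictionary words, so it inherits the language chosen for the enclosing zone.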
-
Patent number: 6041137
Abstract: The system described herein automatically defines a set of radicals to be used in a Kanji character handwriting recognition system and automatically creates a dictionary of the Kanji characters that are recognized by the system. In performing its functionality, the system described herein first obtains representative handwriting samples for each Kanji character that is to be recognized by the system. The system described herein then evaluates the samples to identify a set of subparts ("radicals") that are common to at least two of the Kanji characters. These radicals represent component roots from which the characters are formed. Each Kanji character is formed by one or more of these radicals. The radicals that are identified by the system described herein are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries).
Type: Grant
Filed: August 25, 1995
Date of Patent: March 21, 2000
Assignee: Microsoft Corporation
Inventor: Michael Van Kleeck
-
Patent number: 6041141
Abstract: There is disclosed a character recognition machine adapted to recognize Japanese characters such as kanjis and kanas. The machine comprises a character string storage portion, a character extraction portion, a character recognition portion, and a language processing portion. A character string to be recognized is stored as an image in the storage portion. The character extraction portion comprises a network consisting of a plurality of interconnected operators each of which has numerous inputs and outputs. An evaluation function which assumes its minimum value when a character extraction produces the best results is calculated by the operators simultaneously so as to minimize the value of the function. The character recognition portion calculates degrees of similarity of a character pattern to various character categories, the character pattern being applied from the character extraction portion.
Type: Grant
Filed: August 10, 1995
Date of Patent: March 21, 2000
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Hiroshi Yamamoto, Hisao Niwa, Yoshihiro Kojima, Susumu Maruno, Kazuhiro Kayashima, Toshiyuki Kouda, Hidetsugu Maekawa, Satoru Ito, Yasuharu Shimeki
-
Patent number: 6028970
Abstract: A method and apparatus for enhancing optical character recognition comprises a data processor and memory for maintaining an error detection and correction log. The data processor maintains a memory table of a plurality of rules for generating a rule base determined by recognition of a particular context type of an electronic bit-map portion. The appropriate rule base comprises rules and combinations of rules for application to bit-map portion data. A rule, a rule base or data may be selected and obtained from an internal or external memory. Upon application of the rule base, the error detection and correction log maintains a record of clear errors, corrected data, failed rules of the rule base and the original bit map. Possible errors are flagged and clear errors are automatically corrected provided a confidence level in the correction is reached or exceeded.
Type: Grant
Filed: October 14, 1997
Date of Patent: February 22, 2000
Assignee: AT&T Corp
Inventors: Philip Silvano DiPiazza, Thomas C. Redman
-
Patent number: 6023536
Abstract: A character string correction system corrects a spelling error in a character string input through the keyboard, OCR, etc. An error pattern representing frequent occurrences of errors is preliminarily set and stored in the memory, etc. A processor reads an input character string character by character, and compares the read character with the error pattern. If the input character string matches an error pattern, it is assumed that an error exists. The input character is replaced with one of the alternative characters. Using the input character string or the character string corrected with an alternative character, a dictionary (TRIE table) is searched. If a corresponding word is detected in the dictionary, the word is output as one of the recognition results.
Type: Grant
Filed: June 21, 1996
Date of Patent: February 8, 2000
Assignee: Fujitsu Limited
Inventor: Eric M. Visser
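The error-pattern-plus-trie lookup in patent 6023536 can be sketched as follows. This is a simplified illustration, not the patented system: error patterns are assumed to be single-character confusions (for example the digit 1 read in place of the letter l), the trie is a plain nested dictionary, and all names and the sample word list are hypothetical.

```python
END = None  # key marking the end of a word inside the trie

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def in_trie(trie, word):
    """Return True if `word` is stored in the trie."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node

def correct(word, error_patterns, trie):
    """Return dictionary words reachable from `word` as-is or by replacing
    one character that matches an error pattern with an alternative."""
    results = [word] if in_trie(trie, word) else []
    for i, ch in enumerate(word):
        for alt in error_patterns.get(ch, ()):
            candidate = word[:i] + alt + word[i + 1:]
            if in_trie(trie, candidate) and candidate not in results:
                results.append(candidate)
    return results

trie = build_trie(["clear", "dear", "form"])
patterns = {"1": ["l"], "0": ["o"]}  # assumed confusions
fixed = correct("c1ear", patterns, trie)
```

A fuller version would walk the trie while reading the input, so that substitutions leading to no dictionary prefix are abandoned early.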
-
Patent number: 6014460
Abstract: A character strings reading device for reading character strings from input image data comprises cut-out recognition means for cutting out a segment corresponding to one character from the image data to perform individual character recognition every segment, a recognition result buffer for storing a recognition result of the cut-out recognition means, word searching means for searching a word string candidate corresponding to a combination of character candidates in the recognition result buffer, a word string candidate buffer for storing a search result of the word searching means, check portion determining means for determining a check target portion and a presumed character string of the check target portion on the basis of the result in the word string candidate buffer, and check means for judging the possibility of existence of the presumed character string on the check portion.
Type: Grant
Filed: December 19, 1995
Date of Patent: January 11, 2000
Assignee: NEC Corporation
Inventors: Toshikazu Fukushima, Eiki Ishidera, Masahiko Hamanaka, Daisuke Nishiwaki
-
Patent number: 6005973
Abstract: In a handwriting recognition process, a list of candidate recognized words is identified (202) as a function of both comparison of dictionary entries to various combinations of recognized character combinations, and through a most likely character string and most likely string of digits analysis as developed without reference to the dictionary. The process selects (301) a word from the list and presents (302) this word to the user. The user then has the option of displaying (303) this list. When displaying the list, candidate words developed with reference to the dictionary are displayed in segregated manner from the most likely character string words and the most likely string of digits. The user can change the selected word by choosing from the list, or edit the selected word.
Type: Grant
Filed: July 22, 1997
Date of Patent: December 21, 1999
Assignee: Motorola, Inc.
Inventors: John L. C. Seybold, Chris A. Kortge
-
Patent number: 5995664
Abstract: The invention provides an information recognition apparatus for recognition of an address or the like which can recognize recognition object information, which is inputted in the form which does not have punctuations or element designations, at a high speed and with a high degree of accuracy. An element word recognition unit detects element word candidates of each information element of recognition element information and likelihoods of the element word candidates. A record number acquisition unit retrieves a record storage unit to acquire, for each element word candidate detected by the element word recognition unit, a record number of a record including the element word candidate. A likelihood calculation unit calculates likelihoods of the records using corresponding likelihood counters.
Type: Grant
Filed: June 23, 1997
Date of Patent: November 30, 1999
Assignee: NEC Corporation
Inventor: Hideki Shimomura
-
Patent number: 5987170
Abstract: There is disclosed a character recognition machine adapted to recognize Japanese characters such as kanjis and kanas. The machine comprises a character string storage portion, a character extraction portion, a character recognition portion, and a language processing portion. A character string to be recognized is stored as an image in the storage portion. The character extraction portion comprises a network consisting of a plurality of interconnected operators each of which has numerous inputs and outputs. An evaluation function which assumes its minimum value when a character extraction produces the best results is calculated by the operators simultaneously so as to minimize the value of the function. The character recognition portion calculates degrees of similarity of a character pattern to various character categories, the character pattern being applied from the character extraction portion.
Type: Grant
Filed: November 6, 1997
Date of Patent: November 16, 1999
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Hiroshi Yamamoto, Hisao Niwa, Yoshihiro Kojima, Susumu Maruno, Kazuhiro Kayashima, Yasuharu Shimeki, Toshiyuki Kouda, Hidetsugu Maekawa, Satoru Ito
-
Patent number: 5978801
Abstract: A character and/or character-string retrieving method which retrieves a plurality of patterns at a time by using a single deterministic finite automaton prepared from a plurality of different patterns. There is also a method for optimizing the number of states for the above-mentioned retrieving method, and a storage medium having records of programs and data necessary for executing the above-mentioned character and/or character-string retrieving and a state number optimizing method. A plurality of regular expressions r_1, r_2, ..., r_n to be simultaneously retrieved by pattern matching are prepared, and then augmented to form an augmented regular expression ((r_1)#_1)|((r_2)#_2)|...|((r_n)#_n). A deterministic finite automaton is constructed so that it treats states including positions corresponding to #_1, #_2, ..., #_n, thereby simultaneously retrieving a plurality of regular expression patterns by distinguishing matches from one another.
Type: Grant
Filed: November 18, 1997
Date of Patent: November 2, 1999
Assignee: Sharp Kabushiki Kaisha
Inventor: Natsuki Yuasa
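The idea in patent 5978801 of combining several regular expressions into one pattern while still telling the matches apart can be approximated with Python's standard `re` module. This is an analogy rather than the patented construction: Python's engine is a backtracking matcher, not a deterministic finite automaton, and named groups stand in for the distinct end markers. The pattern names and sample text are invented.

```python
import re

def compile_multi(patterns):
    """patterns: dict mapping a name to a regex source string.
    Builds one alternation with a named group per pattern, so the group
    name plays the role of a per-pattern end marker."""
    combined = "|".join(f"(?P<{name}>{src})" for name, src in patterns.items())
    return re.compile(combined)

def find_all(compiled, text):
    """Scan text once and report (pattern name, matched text) pairs."""
    return [(m.lastgroup, m.group()) for m in compiled.finditer(text)]

multi = compile_multi({"num": r"\d+", "word": r"[a-z]+"})
hits = find_all(multi, "abc 123 de")
```

Each hit carries the name of the sub-pattern that produced it, which is the distinction the end markers provide in the automaton.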
-
Patent number: 5970170
Abstract: A handwritten character recognition system that includes a document scanner for generating scanned images of a previously created document containing handwritten characters, and a pen and digitizing tablet for real time entry of handwritten characters by a user. The handwritten character recognition system includes an image processor connected from the document scanner for receiving the scanned image of a previously created document and generating one or more ordered cluster arrays. The ordered cluster arrays contain spatially ordered coordinate arrays of skeletal image arcs representing and corresponding to the strokes of the handwritten characters wherein the spatial order represents an induced time ordered sequence of creation of the strokes of the handwritten characters that emulates the sequence of creation of the character strokes.
Type: Grant
Filed: June 7, 1995
Date of Patent: October 19, 1999
Assignee: Kodak Limited
Inventors: A. Julie Kadashevich, Mary F. Harvey, Kenneth C. Knowlton, Alexander N. Jourjine
-
Patent number: 5960113
Abstract: An automatic language recognition method which comprises selecting a block of data from the received data and searching said block for elements that are "for" or "against" the presence of a particular language. Recognition is performed by searching for a plurality of known languages in a predetermined order, and by proceeding, for each language, with a search for at least one element characteristic of that language in the data block. It is possible to begin by searching for languages having a special signature, then for languages having special synchronization characters or keywords, and then for languages using mnemonics made up of a determined number of significant characters. The method is used for automatically selecting an interpreter module for decoding the received data, in particular the data received by a plotter. The method is also applicable to detecting a fault, a banner, or a switch of language in the received data.
Type: Grant
Filed: July 26, 1995
Date of Patent: September 28, 1999
Assignee: Oce-Nederland, B.V.
Inventors: Reneka Even, Luc Genetier, Robertus C. W. T. M. Van Den Tillaart
-
Patent number: 5960114
Abstract: A process for identifying and capturing text comprising the steps of identifying delimiters in the text, selecting delimiters from the identified delimiters to be delimiters to the left and right of the selected text, indicating only one character of the text between the left and right delimiters, and automatically blocking and capturing the text having the indicated character. In an alternate embodiment, the process comprises the steps of identifying delimiters in the text that are to the left and to the right of a cursor and identifying the position of the delimiters relative to the cursor, specifying at least one particular delimiter position relative to the cursor, indicating only one character of the text between the cursor and the specified delimiter position, and automatically blocking and capturing the text having the indicated character.
Type: Grant
Filed: October 28, 1996
Date of Patent: September 28, 1999
Assignee: International Business Machines Corporation
Inventors: Norman J. Dauerer, Donato O. Forlenza, Edward E. Kelley, Franco Motika
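The first embodiment of patent 5960114 (indicate one character, capture everything between the surrounding delimiters) can be sketched as a short function. The function name, the default delimiter set, and the choice to return an empty string when the indicated character is itself a delimiter are assumptions made for illustration.

```python
def capture_at(text, index, delimiters=" \t\n,;"):
    """Return the run of text containing text[index], bounded by the nearest
    delimiter (or the end of the text) on each side."""
    if text[index] in delimiters:
        return ""
    left = index
    while left > 0 and text[left - 1] not in delimiters:
        left -= 1
    right = index
    while right < len(text) - 1 and text[right + 1] not in delimiters:
        right += 1
    return text[left:right + 1]

line = "open /usr/lib/file.txt now"
whole_path = capture_at(line, 8)                  # whitespace-delimited capture
component = capture_at(line, 8, delimiters=" /")  # treat '/' as a delimiter too
```

Changing the delimiter set changes how much is captured from the same indicated character, which corresponds to selecting which delimiters bound the text.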
-
Patent number: 5956419
Abstract: A method for operating a machine to perform unsupervised training of a set of character templates uses as the source of training samples an image source of character images, called glyphs, that need not be manually or automatically segmented or isolated prior to training. A recognition operation performed on the image source of character images produces a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an estimated image location of the glyph in the image source with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data and the image source are then used to determine sample image regions in the image source; each sample image region is large enough to contain at least a single glyph but need not be restricted in size to only contain a single glyph.
Type: Grant
Filed: April 28, 1995
Date of Patent: September 21, 1999
Assignee: Xerox Corporation
Inventors: Gary E. Kopec, Philip Andrew Chou
-
Patent number: 5949906
Abstract: A character string region extracting apparatus comprises an extracting section for extracting a plurality of primitives from image information in which a character and a graphic pattern other than the character are mixedly present, a character string candidate region forming section for generating character candidate regions from the primitives and connecting the character candidate regions, thereby forming at least one character string candidate region, a character recognizing section for subjecting the character candidate regions included in the character string candidate region to character recognition, and a character string region extracting section for extracting a character string region from the character string candidate region by the character recognition.
Type: Grant
Filed: December 7, 1995
Date of Patent: September 7, 1999
Assignee: Kabushiki Kaisha Toshiba
Inventors: Hidekata Hontani, Shigeyoshi Shimotsuji
-
Patent number: 5943443
Abstract: The present invention provides a document processing apparatus, a document processing method, and a storage medium storing the method, with the aim of offering document filing in which documents can be registered at low computational cost and high speed, and retrieval can be performed with few omissions. In the document processing apparatus, a similar character classifying element classifies characters in a document image into similar character categories in advance and stores the classified categories together with their representative image features. When the document image is registered, a pseudo character recognizing element executes, without identifying each character in the text region, classification into character categories based on fewer image features than those used in ordinary character recognition, and stores the category strings generated by identifying each character with the inputted image.
Type: Grant
Filed: June 23, 1997
Date of Patent: August 24, 1999
Assignee: Fuji Xerox Co., Ltd.
Inventors: Katsuhiko Itonori, Masaharu Ozaki
-
Patent number: 5940533
Abstract: Information is extracted from a hand-written text by means of a graphics tablet (1). The curves thus obtained make it possible to recognize primitives, namely basic forms, representing a way of writing a part of a letter. More accomplished forms, called allographs, are constructed from primitives in order to construct a letter or even a group of two or three letters. When the series of codes corresponds to a known object from a dictionary of allographs (3), each defined by the sequence of codes of its primitives, the corresponding allograph is recognized. The genetic algorithm is used to improve the population of strings. "Descendants", obtained by combining two starting strings (7), are constructed from a limited-quantity selection of strings (9), from among which descendants the most appropriate are in turn chosen, this gradually optimizing the population. Applications: recognition of cursive writing.
Type: Grant
Filed: December 19, 1995
Date of Patent: August 17, 1999
Assignee: U.S. Philips Corporation
Inventor: Philippe Gentric
-
Patent number: 5933531
Abstract: An optical character recognition method and system are provided, employing context analysis and operator input, alternatively and in combination, on the same batch of documents. After automatic character recognition, the context analyzer processes the fields that are good enough to expect resolution. This will accept as many fields as possible without any operator intervention. For some other fields, the process uses operator input to certify the character-level OCR result of, or to enter, a certain percentage of the characters, so that context analysis may accept some of the remaining fields. If the context analyzer successfully identifies a small set of very close hypotheses, the process asks the operator to certify one or two characters to resolve the ambiguity between the hypotheses. For the fields that are still not resolved, the fields and the hypotheses are shown to the operator for acceptance, correction, or entry.
Type: Grant
Filed: August 23, 1996
Date of Patent: August 3, 1999
Assignee: International Business Machines Corporation
Inventor: Raymond Amand Lorie
-
Patent number: 5917944
Abstract: A study system of a character recognizing and translating system is provided with a character data base for storing character data representing characters contained in a sensed image; a character shape analysis unit for analyzing the shape of a character to extract the features of character constituting elements constituting the character; and, a mask learning unit for generating sample mask data of the character constituting elements on the basis of the analysis result of the character shape analysis unit. A recognition system of the character recognizing and translating system is provided with a collating unit for collating the character data of a character to be recognized with the sample mask data so as to recognize the character.
Type: Grant
Filed: November 15, 1996
Date of Patent: June 29, 1999
Assignee: Hitachi, Ltd.
Inventors: Shinji Wakisaka, Hiroko Sato
-
Patent number: 5909510
Abstract: A word shape token-based document classification system prepares a plurality of sets of training data degraded by image quality and selects the optimum training data set by examining scores from a relevance measurement. The system achieves high accuracy from a wide range of image quality.
Type: Grant
Filed: May 19, 1997
Date of Patent: June 1, 1999
Assignees: Xerox Corporation, Fuji Xerox Company, Ltd.
Inventor: Takehiro Nakayama
-
Patent number: 5905811
Abstract: When texts recognized by an OCR are registered and those texts are searched by a search word, a state in which the search cannot be performed depending on an error recognition at the time of the recognition by the OCR is eliminated. It is an object of the invention to realize a process such that no burden is exerted on an operator or an apparatus by the above state. There are provided an OCR processor for recognizing stored image information and outputting a recognition result while switching the number of candidate characters to be outputted as a recognition result in accordance with a degree of a likelihood; and a document searcher for forming character trains for search from the recognition result and for registering as a search file.
Type: Grant
Filed: June 15, 1995
Date of Patent: May 18, 1999
Assignee: Canon Kabushiki Kaisha
Inventors: Hirotaka Shiiyama, Katsumi Masaki
-
Patent number: 5894525
Abstract: A method and system for simultaneously recognizing contextually related images is disclosed. The image of two separate fields is captured to form two captured data images such as a word and numerical amount. Each captured image is cut to form a segmentation graph based on the cuts. The shortest path in each segmentation graph is found wherein the additive length corresponds to a score and is associated with each directed arc of the segmentation graph. The segmentation graphs are combined into a joint segmentation graph and the highest scoring mutually consistent interpretations are found.
Type: Grant
Filed: December 6, 1995
Date of Patent: April 13, 1999
Assignee: NCR Corporation
Inventors: Craig R. Nohl, Charles E. Stenard
-
Patent number: 5881169
Abstract: A method and apparatus for presenting and gathering text entries in a pen based input device. The apparatus and method allows for the entry of textual information into a computing device using either handwriting recognition, character selection, or expression selection. Individual entry and selection fields for each of these methods are provided to the user in a coordinated comprehensive method of displaying and gathering the textual information. Handwriting recognition functionality is used to facilitate character recognition. Furthermore, lists of expressions allow the text entry method and apparatus to anticipate the next character or expression to be entered by the user.
Type: Grant
Filed: September 13, 1996
Date of Patent: March 9, 1999
Assignee: Ericsson Inc.
Inventor: Raymond Charles Henry, Jr.
-
Patent number: 5875265
Abstract: An image analyzing and expression adding apparatus for presenting an operator with image impressions displayed in terms of sensitive language and for allowing the operator to use stored design know-how in adding expressions to the image on display. An image segmenting unit segments an input image into a plurality of areas. An image feature storing unit stores the physical feature quantities of each of the segmented areas. The physical feature quantities are processed by an image analyzing unit preparing visual feature quantities about the entire image. A sensitive influence quantity computing unit receives the visual feature quantities thus prepared and, based on information from a design know-how storing unit, computes factors of the terms representing sensitivity. The operator is presented with the factors of the sensitivity-expressing terms. In response, the operator instructs desired expressions using the sensitivity-expressing terms through an expression instructing unit.
Type: Grant
Filed: June 18, 1996
Date of Patent: February 23, 1999
Assignee: Fuji Xerox Co., Ltd.
Inventor: Atsushi Kasao
-
Patent number: 5867597Abstract: An improved document management system with high-speed retrieval by example retrieves a document matching a target document, in whole or part, by comparing descriptors of documents. A descriptor is derived from a pattern of labels, where each label is associated with a character, or more precisely, a character bounding box. A bounding box is found by examining contiguous pixels in an image. The particular label associated with a bounding box depends on the value of a metric measured from that bounding box. In one system, the metric is the spacing between the bounding box and an adjacent bounding box, in which the labels approximately reflect a pattern of word lengths. In other systems, where word lengths are not present, the metric might be pixel density and the pattern of labels approximately reflects a pattern of denser characters and sparser characters.Type: GrantFiled: September 5, 1995Date of Patent: February 2, 1999Assignee: Ricoh CorporationInventors: Mark Peairs, Jonathan Hull
-
Patent number: 5864635Abstract: Pre-recognition analysis on stroke characteristics such as the count, size and position of each stroke in real-time as it is drawn improves recognition accuracy. After each stroke, the set of strokes is weighted toward handwriting or gesture. The system uses a gesture threshold size to distinguish between gestures and handwriting. The system also uses the stroke count to distinguish between the two inputs, relying on the knowledge of the allowable number of strokes in a gesture. The count information may also be used in conjunction with the stroke size information to weight the set of strokes between gestures and handwriting. Once the stroke size crosses a gesture vs. text size threshold, the result is weighted toward gestures. By examining the `white space` between strokes and the juxtaposition of the strokes, a gesture vs. text determination can be made with high accuracy.Type: GrantFiled: June 14, 1996Date of Patent: January 26, 1999Assignee: International Business Machines CorporationInventors: John Mark Zetts, Maurice Roger Desrosiers
-
Patent number: 5862259Abstract: A pattern recognition system classifies images of patterns in which the definition of individual features of the pattern may have become blurred. The image is segmented into pieces of arbitrary size and shape, and various combinations are examined to determine those which represent the most likely segmentation of the pattern into its individual features. These individual features are then classified, according to known techniques. Through the use of a second order Markov model, not all possible combinations of pieces need to be examined, to determine the best ones. Rather, the examination of various combinations is limited in accordance with previously determined information, to thereby render the process more efficient. By combining multiple, independently determined probabilities, the accuracy of the overall operation is enhanced.Type: GrantFiled: March 27, 1996Date of Patent: January 19, 1999Assignee: Caere CorporationInventors: Mindy Bokser, Leonard Pon, Jun Yang, Kenneth Choy
-
Patent number: 5862256Abstract: To distinguish between gestures and handwriting, the pen subsystem examines the size of the user's writing. The user may set the gesture versus text size according to his/her handwriting style. In this manner, the user knows exactly how large to make the gestures. If the user declines to customize the setting, the pen subsystem can query other user settings that indicate approximate size of the user's handwriting. A third choice available is for the system to dynamically determine and track the size of handwriting. This would allow multiple users to serially use the computer without having to change settings.Type: GrantFiled: June 14, 1996Date of Patent: January 19, 1999Assignee: International Business Machines CorporationInventors: John Mark Zetts, Maurice Roger Desrosiers
-
Patent number: 5850480Abstract: The present invention includes methods of correcting optical character recognition errors occurring during recognition of alphanumeric character strings contained within one or more predetermined types of alphanumeric character fields. The methods may be practiced with a document processing system having (1) an optical character recognition device for scanning documents and outputting bit-map image data; (2) a recognition engine for converting the bit-map image data into possibly correct alphanumeric characters with associated confidence values; and (3) at least one lexicon of character strings consisting of a list of at least a portion of all of the possible character string values for each of the fields being processed. The present invention corrects OCR errors by performing a contextual comparison analysis between the alphanumeric characters outputted from the recognition engine and the lexicon of character strings.Type: GrantFiled: May 30, 1996Date of Patent: December 15, 1998Assignee: Scan-Optics, Inc.Inventor: Edward Francis Scanlon
-
Patent number: 5848191Abstract: A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks and text lines. Using the median x-height of text blocks, the main body of text is identified. Afterward, word image equivalence classes and sentence boundaries within the blocks of the main body of text are determined. The word image equivalence classes are used to identify thematic words. These, in turn, are used to score the sentences within the main body of text, and the highest scoring sentences are selected for extraction.Type: GrantFiled: December 14, 1995Date of Patent: December 8, 1998Assignee: Xerox CorporationInventors: Francine R. Chen, Dan S. Bloomberg, John W. Tukey
-
Patent number: 5835635Abstract: A method for providing an effective completion of characters required in inputting a partial character string.Type: GrantFiled: June 27, 1995Date of Patent: November 10, 1998Assignee: International Business Machines CorporationInventors: Hiroshi Nozaki, Nobuyasu Itoh
-
Patent number: 5825926Abstract: When character codes and character patterns coexist in the retrieval character string and/or the retrieval objective data, they are standardized into character patterns, and a character string matching with or similar to the retrieval character string is retrieved from the retrieval objective data. A character code included in the retrieval character string and/or the retrieval objective data is converted into a character pattern by referring to a prepared character pattern dictionary, and by pattern matching between two character patterns, a character string matching with or similar to the retrieval character string is picked up from the retrieval objective data and produced as the result of retrieval.Type: GrantFiled: April 12, 1995Date of Patent: October 20, 1998Assignee: Fujitsu LimitedInventor: Hiroshi Tanaka
-
Patent number: 5816717Abstract: A method of recalling stored labels is disclosed in which target data is provided by a user so that only labels having label data matching the target data are displayed.Type: GrantFiled: February 10, 1997Date of Patent: October 6, 1998Assignee: Esselte N.V.Inventors: Michael Andrew Beadman, Paul Robert Bridle
-
Patent number: 5812818Abstract: A translating facsimile machine includes a facsimile input receiver which feeds a facsimile input signal representative of a source language text to an optical character recognizer. The optical character recognizer converts the facsimile signal to a source natural language text signal which is then tested in a source language recognizer and, if recognized, is then processed by a translator which converts the source natural language signal to a target natural language signal. An output device such as a printer receives the target natural language signal and outputs it to a roll of paper or the like.Type: GrantFiled: November 17, 1994Date of Patent: September 22, 1998Assignee: Transfax Inc.Inventors: Richard Adler, Claude Richaud, Troy W. Livingston, Wayne D. Jung
-
Patent number: 5805730Abstract: A statistical classifier that can be used for pattern recognition is trained to recognize negative, or improper patterns as well as proper patterns that are positively associated with desired output classes. A set of training samples includes both the negative and positive patterns, and target output values for the negative patterns are set so that no recognized class is indicated. The negative patterns are selected for training with less frequency than the positive patterns, and their effect on training is also modified, so that training is focused more heavily on the positive patterns.Type: GrantFiled: August 8, 1995Date of Patent: September 8, 1998Assignee: Apple Computer, Inc.Inventors: Larry S. Yaeger, Richard F. Lyon
-
Patent number: 5787197Abstract: A dictionary based post-processing technique for an on-line handwriting recognition system is described. An input word has all punctuation removed, and the word is checked against a word processing dictionary. If the word matches against the dictionary, it is verified as a valid word. If it does not verify, a stroke match function and a spell-aid dictionary are used to construct a list of possible words. In some cases, the list is appended with possible words based on changing the first character of the originally recognized word. A character-match score, a substitution score and a word length are assigned to the items on the list. A word hypothesis is constructed from the list with each such word being assigned a score. The word with the best score is chosen as the output word for the processor.Type: GrantFiled: March 28, 1994Date of Patent: July 28, 1998Assignee: International Business Machines CorporationInventors: Homayoon Sadr Mohammad Beigi, Tetsunosuke Fujisaki, William David Modlin, Kenneth Steven Wenstrup
-
Patent number: 5774586Abstract: Groups of symbols to be recognized are standardized by fitting four flexible curves to the group of symbols. The curves are fitted by minimizing a cost or energy function that associates a cost with the curvature of the curves, the slant of the curves, the displacement in spacing between the curves and the distance of maxima and minima points from the curves. After the curves are fitted to the group of symbols, the symbols are standardized by transforming coordinate systems so that the fitted curves are placed in a predetermined configuration.Type: GrantFiled: May 4, 1994Date of Patent: June 30, 1998Assignee: NCR CorporationInventor: Yann Andre LeCun
-
Patent number: 5774588Abstract: A system and method for more efficiently comparing an unverified string to a lexicon, which filters the lexicon through multiple steps to reduce the number of entries to be directly compared with the unverified string. The method begins by preparing the lexicon with an n-gram encoding, partitioning and hashing process, which can be accomplished in advance of any processing of unverified strings. The unknown is compared first by partitioning and hashing it in the same way to reduce the lexicon in a computationally inexpensive manner. This is followed by an encoded vector comparison step, and finally by a direct string comparison step, which is the most computationally expensive. The reduction of the lexicon is accomplished without arbitrarily eliminating any large portions of the lexicon that might contain relevant candidates. At the same time, the method avoids the need to compare the unverified string directly or indirectly with all the entries in the lexicon.Type: GrantFiled: June 7, 1995Date of Patent: June 30, 1998Assignee: United Parcel Service of America, Inc.Inventor: Liang Li
-
Patent number: 5774587Abstract: Recognition of various images uses a presentation of reference (standard) objects transformed into corresponding discretized reference signals without repeating quantization levels. For a set of these signals, a dynamical system is formed of the kind of iterable mapping of d-dimensional cube into itself with limit cycles by the number of reference signals. Representations of an object under identification are transformed into information signals that are applied to the input of the dynamical system as initial conditions and by the results of its functioning a decision is made on recognition of the object under identification. Distinctive features of the method are the conditions of looking through the values of information at the input of the dynamical system and the conditions of determining the correspondence of the information signal to one of the reference signals by the phase trajectory of the dynamical system getting to one of the limit cycles in the process of its functioning.Type: GrantFiled: December 1, 1995Date of Patent: June 30, 1998Inventors: Alexander Dmitriev, Yury Andaeev, Yury Belsky, Dmitry Kuminov, Andzei Panas, Sergai Starkov
-
Patent number: 5764799Abstract: An OCR 300 stores signals representative of reference characters and scans a document 302 to generate a bit mapped digitized image of the document. After the characters and the words are recognized and candidate characters are identified, the initial results are post-processed to compare clusters of identical images to the candidates. Where the candidates of all equivalent images in a cluster are the same, the candidates are output as representative of the image on the document. Where the candidates are different, a majority of identical candidates determines the recognized candidates. Other post-processing operations include verification and re-recognition.Type: GrantFiled: June 26, 1995Date of Patent: June 9, 1998Assignee: Research Foundation of State University of New YorkInventors: Tao Hong, Jonathan J. Hull
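The majority rule described in the abstract above can be illustrated with a minimal Python sketch. It assumes the clustering of identical word images has already been done and is supplied as a list of cluster identifiers; the function name `resolve_clusters` and the data layout are hypothetical and are not taken from the patent.

```python
from collections import Counter

def resolve_clusters(cluster_ids, candidates):
    """Replace each candidate with the majority candidate of its image cluster.

    cluster_ids[i] identifies the cluster of identical images that occurrence i
    belongs to; candidates[i] is the OCR candidate for that occurrence.
    """
    votes = {}
    for cid, cand in zip(cluster_ids, candidates):
        votes.setdefault(cid, Counter())[cand] += 1
    # If every candidate in a cluster agrees, the winner is that candidate;
    # otherwise the most frequent candidate in the cluster wins.
    winners = {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}
    return [winners[cid] for cid in cluster_ids]

if __name__ == "__main__":
    ids = [0, 1, 0, 0, 1]
    ocr = ["the", "cat", "tbe", "the", "cat"]
    print(resolve_clusters(ids, ocr))
```

In this sketch the misread "tbe" is overruled by the two "the" readings in the same cluster. How ties are broken is left unspecified here, as the abstract does not say.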
-
Patent number: 5761538Abstract: An improved method of matching a query string against a plurality of candidate strings replaces a highly computationally intensive string edit distance calculation with a less computationally intensive lower bound estimate. The lower bound estimate of the string edit distance between the two strings is calculated by equalising the lengths of the two strings by adding padding elements to the shorter one. The elements of the strings are then sorted and the substitution costs between corresponding elements are summed.Type: GrantFiled: July 10, 1995Date of Patent: June 2, 1998Assignee: Hewlett-Packard CompanyInventor: Richard Hull
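The pad, sort and sum steps described in the abstract above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented method: string elements are taken to be integers, the padding element `PAD` is 0, the substitution cost is the absolute difference, and insertion or deletion costs the same as substituting against `PAD`. All function names are hypothetical.

```python
PAD = 0  # assumed padding element

def substitution_cost(a, b):
    # Assumed cost model: absolute difference between two elements.
    return abs(a - b)

def lower_bound(query, candidate):
    """Pad the shorter string, sort both, and sum element-wise costs."""
    n = max(len(query), len(candidate))
    q = sorted(list(query) + [PAD] * (n - len(query)))
    c = sorted(list(candidate) + [PAD] * (n - len(candidate)))
    return sum(substitution_cost(x, y) for x, y in zip(q, c))

def edit_distance(s, t):
    """Full dynamic-programming edit distance under the same cost model."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + substitution_cost(s[i - 1], PAD)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + substitution_cost(PAD, t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + substitution_cost(s[i - 1], PAD),      # delete
                d[i][j - 1] + substitution_cost(PAD, t[j - 1]),      # insert
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return d[m][n]

def best_match(query, candidates):
    """Visit candidates in order of lower bound; stop once the bound cannot win."""
    best, best_dist = None, float("inf")
    for cand in sorted(candidates, key=lambda c: lower_bound(query, c)):
        if lower_bound(query, cand) >= best_dist:
            break
        dist = edit_distance(query, cand)
        if dist < best_dist:
            best, best_dist = cand, dist
    return best, best_dist
```

The cheap estimate costs a sort per string, while the full calculation is quadratic, so `best_match` only runs the expensive step for candidates whose bound is still below the best distance found so far.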
-
Patent number: 5757983Abstract: A document retrieval method and system for retrieving, from a document database storing document data in the form of character codes, a document which contains given search terms and which meets a given search query condition. From documents loaded from the document database, a document containing terms which match the search terms is searched to generate document identification (ID) information including a document identifier of the searched document and containing match terms found to match with the search terms as well as term identifiers of the match terms and position information of the match terms in the searched document. A decision is then made as to whether or not the position information of the match terms satisfies a positional condition specified in the search query condition concerning a positional relation between the search terms, and match information is then generated indicating satisfaction of the search query condition when the positional condition is satisfied.Type: GrantFiled: August 21, 1995Date of Patent: May 26, 1998Assignee: Hitachi, Ltd.Inventors: Hisamitsu Kawaguchi, Mitsuru Akizawa, Kanji Kato, Atsushi Hatakeyama, Hiromichi Fujisawa
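The positional check described in the abstract above can be illustrated with a short Python sketch. It assumes one particular positional condition, namely that two search terms occur within a given number of characters of each other; the function names and the choice of condition are hypothetical and are not taken from the patent.

```python
def find_positions(text, term):
    """Return the character offsets of every occurrence of term in text."""
    positions = []
    start = text.find(term)
    while start != -1:
        positions.append(start)
        start = text.find(term, start + 1)
    return positions

def satisfies_proximity(text, term_a, term_b, max_distance):
    """True if some occurrence of term_a starts within max_distance
    characters of some occurrence of term_b."""
    pos_a = find_positions(text, term_a)
    pos_b = find_positions(text, term_b)
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)

if __name__ == "__main__":
    doc = "optical character recognition of printed character strings"
    print(satisfies_proximity(doc, "optical", "recognition", 20))
    print(satisfies_proximity(doc, "optical", "recognition", 10))
```

The sketch separates term matching (collecting positions) from the positional decision, mirroring the two stages in the abstract: a document first yields match positions, and only then is the positional relation evaluated.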