Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
  • Patent number: 6137911
    Abstract: Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
    Type: Grant
    Filed: June 16, 1997
    Date of Patent: October 24, 2000
    Assignee: The Dialog Corporation PLC
    Inventor: Maxim Zhilyaev
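
The significance test described in the abstract above can be illustrated with a rough sketch. The binomial model, the function names, and the 1.645 threshold (a one-sided 5% level) are assumptions made for illustration; the abstract only states that a standard normal distribution is assumed.

```python
import math

def term_z_score(term_in_cluster, cluster_size, term_in_corpus, corpus_size):
    """z-score of a term's count in a cluster against its corpus-wide rate.

    Hypothetical helper: uses a normal approximation to a binomial count.
    """
    p = term_in_corpus / corpus_size          # corpus-wide rate of the term
    expected = cluster_size * p               # count expected by chance alone
    variance = cluster_size * p * (1.0 - p)   # binomial variance
    if variance == 0.0:
        return 0.0
    return (term_in_cluster - expected) / math.sqrt(variance)

def is_significant(z, threshold=1.645):
    """A term counts as significant when chance is an unlikely explanation."""
    return z > threshold
```

A term seen 30 times in a 100-term cluster, against a corpus rate of 100 in 1000, scores well above the threshold, while a term seen exactly as often as chance predicts scores zero.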
  • Patent number: 6128412
    Abstract: Provided is a probabilistic statistical data compression/restoration method for variable-length encoding a source character and restoring a variable-length code to a character using the probability of occurrence of the source character which appears following a character string (context) of n characters which immediately precede the source character. When variable-length encoding is performed, a context registration unit successively registers context based upon an entered character without fixing the length (degree) of context, and a compressing encoder selects registered context satisfying a predetermined condition, e.g., registered context for which the frequency of occurrence is greater than a fixed value, as context (encoding context) used in encoding, and variable-length encodes a source character by using this encoding context. A restoration unit restores a code to a character by executing processing which is the reverse of the processing executed by the compressing encoder.
    Type: Grant
    Filed: March 31, 1997
    Date of Patent: October 3, 2000
    Assignee: Fujitsu Limited
    Inventor: Noriko Satoh
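
The idea of registering contexts of varying length and then choosing an encoding context by frequency can be sketched as follows. The class name, the `max_order` and `min_count` parameters, and the longest-context-first policy are assumptions for illustration; the variable-length coder itself is omitted.

```python
from collections import defaultdict

class ContextModel:
    """Hypothetical context registry for context-based compression."""

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order
        self.min_count = min_count
        # context string -> following character -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def register(self, text):
        """Register every context of order 0..max_order seen in text."""
        for i, ch in enumerate(text):
            for order in range(self.max_order + 1):
                if i - order < 0:
                    break
                self.counts[text[i - order:i]][ch] += 1

    def encoding_context(self, history):
        """Pick the longest registered context that occurred often enough."""
        for order in range(min(self.max_order, len(history)), -1, -1):
            ctx = history[len(history) - order:]
            followers = self.counts.get(ctx)
            if followers and sum(followers.values()) >= self.min_count:
                return ctx
        return ""

    def probability(self, context, ch):
        """Relative frequency of ch after context (0.0 if never seen)."""
        followers = self.counts.get(context)
        if not followers:
            return 0.0
        return followers.get(ch, 0) / sum(followers.values())
```

The probability returned for the chosen context is what an arithmetic or Huffman-style coder would consume; a decoder that registers the same contexts in the same order can reverse the process.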
  • Patent number: 6122402
    Abstract: Pattern encoding is carried out by 1) substituting an index data of a registered pattern for a position data in a library with respect to an index data peculiar to each of the extracted patterns, 2) taking a difference between an off-set position data of the extracted pattern and an off-set position data of the registered pattern whereby an off-set position difference data is provided, and 3) encoding the position data and the off-set position difference data and providing an encoded data. A pattern extracting unit obtains the extracted patterns from image data. An accumulating/checking unit accumulates the extracted patterns as accumulated patterns, assigns indexes specific to the accumulated patterns, and checks each extracted pattern by comparison with the accumulated patterns. When an accumulated pattern is found to match the extracted pattern, the accumulating/checking unit provides a position data within a library instead of the index data, and also provides the off-set position difference data.
    Type: Grant
    Filed: December 3, 1997
    Date of Patent: September 19, 2000
    Assignee: NEC Corporation
    Inventors: Mitsutoshi Arai, Keiji Yamada, Toshihiko Okamura, Takahiro Hongu, Kouichirou Hirao
  • Patent number: 6111985
    Abstract: A method and mechanism for displaying partial results of full context handwriting recognition. As handwritten characters are entered into a system, a shape matcher associates the character with a plurality of alternate code points, with each alternate code point having probability information associated therewith. The alternate code points are placed at the end of a queue, and a cost is determined from each alternate code point to any immediately preceding alternate in the queue. The cost is based on the probability information of the alternates and a transition cost therebetween. Then, the lowest cost path back from each of the alternates at the end of the queue to an alternate at the beginning of the queue is determined. If each lowest cost path back converges to a common alternate in the queue, the common alternate and any previous alternates on the path back are recognized as the code points for each of the handwritten characters associated therewith.
    Type: Grant
    Filed: June 6, 1997
    Date of Patent: August 29, 2000
    Assignee: Microsoft Corporation
    Inventors: Gregory N. Hullender, Patrick M. Haluptzok
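
The convergence test in the abstract above resembles a Viterbi-style search with back-pointers. The sketch below is one possible reading: the negative-log-probability cost, the function name, and the caller-supplied `transition_cost` function are assumptions, and all probabilities must be greater than zero.

```python
import math

def committed_prefix(queue, transition_cost):
    """Return the code points that every lowest-cost path back agrees on.

    queue: list of positions, each a list of (code_point, probability).
    transition_cost: hypothetical function (prev_code, code) -> float.
    """
    if not queue:
        return []
    best = [[-math.log(p) for _, p in queue[0]]]   # lowest cost per alternate
    back = [[None] * len(queue[0])]                # best predecessor index
    for i in range(1, len(queue)):
        row_cost, row_back = [], []
        for code, p in queue[i]:
            candidates = [
                best[i - 1][k] + transition_cost(queue[i - 1][k][0], code)
                for k in range(len(queue[i - 1]))
            ]
            k_min = min(range(len(candidates)), key=candidates.__getitem__)
            row_cost.append(candidates[k_min] - math.log(p))
            row_back.append(k_min)
        best.append(row_cost)
        back.append(row_back)
    # Follow every path back from the end until they share one alternate.
    frontier = set(range(len(queue[-1])))
    i = len(queue) - 1
    while i > 0 and len(frontier) > 1:
        frontier = {back[i][j] for j in frontier}
        i -= 1
    if len(frontier) != 1:
        return []                                  # no convergence yet
    j = next(iter(frontier))
    path = []
    while i >= 0:
        path.append(queue[i][j][0])
        j = back[i][j]
        i -= 1
    return path[::-1]
```

Characters in the returned prefix can be displayed as final, while positions after the convergence point stay tentative until more input arrives.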
  • Patent number: 6104500
    Abstract: A processor-based fax routing method receives digital data representing a facsimile document. Without performing optical character recognition ("OCR"), the method identifies in the image data a keyword block of text, and an addressee-name block of text that is located near the keyword block of text. The fax routing method then performs OCR on the image data extracting therefrom texts for the keyword, the name of the addressee, and other text present in the facsimile. Using probabilities computed between the text of the name of the addressee and names in a list of possible addressees, and between the keyword and keywords in a list of keywords, the fax routing method determines an addressee for the document. The fax routing method then converts all text into email addressed to the fax's addressee, and stores the email onto an email server from which it may be retrieved.
    Type: Grant
    Filed: April 29, 1998
    Date of Patent: August 15, 2000
    Assignee: BCL, Computer Inc.
    Inventors: Hassan Alam, Horace Dediu, Scot Tupaj
  • Patent number: 6097841
    Abstract: A character recognition apparatus for inferring the entire character string solely from a user-input handwritten keyword and displaying the inferred result as a candidate character string. The apparatus of the invention comprises: a word dictionary storing word identification information and hierarchy information for layering a plurality of words into a hierarchy and for recognizing each of the words within the hierarchy; a character transition probability table storing probabilities of transitions from any one character to another, and those pieces of the word identification information which correspond to combinations of characters resulting from the transitions; and an optimization unit for using the character transition probability table in optimizing candidate character strings obtained by a recognition unit.
    Type: Grant
    Filed: May 20, 1997
    Date of Patent: August 1, 2000
    Assignee: Hitachi, Ltd.
    Inventors: Keiko Gunji, Koyo Katsura, Soshiro Kuzunuki, Masaki Miura, Toshimi Yokota
  • Patent number: 6084985
    Abstract: A method for on-line handwriting recognition is based on a hidden Markov model and implies the following steps: sensing real-time at least an instantaneous write position of the handwriting, deriving from the handwriting a time-conforming string of segments each associated to a handwriting feature vector, matching the time-conforming string to various example strings from a data base pertaining to the handwriting, and selecting from the example strings a best-matching recognition string through hidden-Markov processing, or rejecting the handwriting as unrecognized. In particular, the feature vectors are based on local observations derived from a single segment, as well as on compacted observations derived from time-sequential segments.
    Type: Grant
    Filed: October 3, 1997
    Date of Patent: July 4, 2000
    Assignee: U.S. Philips Corporation
    Inventors: Jannes G. A. Dolfing, Reinhold Hab-Umbach
  • Patent number: 6075896
    Abstract: When character codes and character patterns coexist in the retrieval character string and/or the retrieval objective data, they are standardized into character patterns, and a character string matching with or similar to the retrieval character string is retrieved from the retrieval objective data. A character code included in the retrieval character string and/or the retrieval objective data is converted into a character pattern by referring to a prepared character pattern dictionary, and by pattern matching between two character patterns, a character string matching with or similar to the retrieval character string is picked up from the retrieval objective data and produced as the result of retrieval.
    Type: Grant
    Filed: August 10, 1998
    Date of Patent: June 13, 2000
    Assignee: Fujitsu Limited
    Inventor: Hiroshi Tanaka
  • Patent number: 6047251
    Abstract: The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
    Type: Grant
    Filed: September 15, 1997
    Date of Patent: April 4, 2000
    Assignee: Caere Corporation
    Inventors: Leonard K. Pon, Tapas Kanungo, Jun Yang, Kenneth Chan Choy, Mindy R. Bokser
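
The zone-then-region procedure described above can be sketched in a few lines. The confidence measure (fraction of tokens found in a dictionary), the 0.5 threshold, and all function names are assumptions for illustration, not details from the patent.

```python
def score_languages(tokens, dictionaries):
    """Fraction of tokens found in each candidate language's dictionary."""
    scores = {}
    for lang, words in dictionaries.items():
        hits = sum(1 for t in tokens if t.lower() in words)
        scores[lang] = hits / len(tokens) if tokens else 0.0
    return scores

def identify(tokens, dictionaries, fallback=None, min_confidence=0.5):
    """Best-scoring language, or the fallback when confidence is low."""
    scores = score_languages(tokens, dictionaries)
    best = max(scores, key=scores.get)
    if fallback is not None and scores[best] < min_confidence:
        return fallback
    return best

def identify_zone(regions, dictionaries, min_confidence=0.5):
    """Identify the zone language first, then each region within it."""
    zone_tokens = [t for region in regions for t in region]
    zone_lang = identify(zone_tokens, dictionaries)
    region_langs = [
        identify(region, dictionaries, fallback=zone_lang,
                 min_confidence=min_confidence)
        for region in regions
    ]
    return zone_lang, region_langs
```

A region whose words match no dictionary well inherits the language already determined for its zone, which mirrors the fallback step in the abstract.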
  • Patent number: 6041137
    Abstract: The system described herein automatically defines a set of radicals to be used in a Kanji character handwriting recognition system and automatically creates a dictionary of the Kanji characters that are recognized by the system. In performing its functionality, the system described herein first obtains representative handwriting samples for each Kanji character that is to be recognized by the system. The system described herein then evaluates the samples to identify a set of subparts ("radicals") that are common to at least two of the Kanji characters. These radicals represent component roots from which the characters are formed. Each Kanji character is formed by one or more of these radicals. The radicals that are identified by the system described herein are not constrained to any preset definition (e.g., the traditional set of radicals used to organize Japanese dictionaries).
    Type: Grant
    Filed: August 25, 1995
    Date of Patent: March 21, 2000
    Assignee: Microsoft Corporation
    Inventor: Michael Van Kleeck
  • Patent number: 6041141
    Abstract: There is disclosed a character recognition machine adapted to recognize Japanese characters such as kanjis and kanas. The machine comprises a character string storage portion, a character extraction portion, a character recognition portion, and a language processing portion. A character string to be recognized is stored as an image in the storage portion. The character extraction portion comprises a network consisting of a plurality of interconnected operators each of which has numerous inputs and outputs. An evaluation function which assumes its minimum value when a character extraction produces the best results is calculated by the operators simultaneously so as to minimize the value of the function. The character recognition portion calculates degrees of similarity of a character pattern to various character categories, the character pattern being applied from the character extraction portion.
    Type: Grant
    Filed: August 10, 1995
    Date of Patent: March 21, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hiroshi Yamamoto, Hisao Niwa, Yoshihiro Kojima, Susumu Maruno, Kazuhiro Kayashima, Toshiyuki Kouda, Hidetsugu Maekawa, Satoru Ito, Yasuharu Shimeki
  • Patent number: 6028970
    Abstract: A method and apparatus for enhancing optical character recognition comprises a data processor and memory for maintaining an error detection and correction log. The data processor maintains a memory table of a plurality of rules for generating a rule base determined by recognition of a particular context type of an electronic bit-map portion. The appropriate rule base comprises rules and combinations of rules for application to bit-map portion data. A rule, a rule base or data may be selected and obtained from an internal or external memory. Upon application of the rule base, the error detection and correction log maintains a record of clear errors, corrected data, failed rules of the rule base and the original bit map. Possible errors are flagged and clear errors are automatically corrected provided a confidence level in the correction is reached or exceeded.
    Type: Grant
    Filed: October 14, 1997
    Date of Patent: February 22, 2000
    Assignee: AT&T Corp
    Inventors: Philip Silvano DiPiazza, Thomas C. Redman
  • Patent number: 6023536
    Abstract: A character string correction system corrects a spelling error in a character string input through the keyboard, OCR, etc. An error pattern representing frequent occurrences of errors is preliminarily set and stored in the memory, etc. A processor reads an input character string character by character, and compares the read character with the error pattern. If the input character string matches an error pattern, it is assumed that an error exists. The input character is replaced with one of the alternative characters. Using the input character string or the character string corrected with an alternative character, a dictionary (TRIE table) is searched. If a corresponding word is detected in the dictionary, the word is output as one of the recognition results.
    Type: Grant
    Filed: June 21, 1996
    Date of Patent: February 8, 2000
    Assignee: Fujitsu Limited
    Inventor: Eric M. Visser
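
The combination of error patterns and a trie dictionary lookup can be sketched as below. The nested-dict trie, the single-character substitution table, and the function names are assumptions for illustration; the patent's actual TRIE table layout is not given in the abstract.

```python
_END = ""  # end-of-word marker; never collides with a one-character key

def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def in_trie(trie, word):
    """Return True when word is stored in the trie."""
    node = trie
    for ch in word:
        node = node.get(ch)
        if node is None:
            return False
    return _END in node

def correct(word, error_patterns, trie):
    """Dictionary words reachable from word by at most one substitution.

    error_patterns: hypothetical map from a character that is often wrong
    to the alternative characters it may stand for.
    """
    results = []
    if in_trie(trie, word):
        results.append(word)
    for i, ch in enumerate(word):
        for alt in error_patterns.get(ch, ()):
            candidate = word[:i] + alt + word[i + 1:]
            if in_trie(trie, candidate) and candidate not in results:
                results.append(candidate)
    return results
```

For example, with the pattern `{"1": ["l", "i"]}` the input `c1ear` yields `clear` when that word is in the dictionary. A fuller version would walk the trie while substituting, so that dead prefixes are abandoned early.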
  • Patent number: 6014460
    Abstract: A character strings reading device for reading character strings from input image data comprises cut-out recognition means for cutting out a segment corresponding to one character from the image data to perform individual character recognition every segment, a recognition result buffer for storing a recognition result of the cut-out recognition means, word searching means for searching a word string candidate corresponding to a combination of character candidates in the recognition result buffer, a word string candidate buffer for storing a search result of the word searching means, check portion determining means for determining a check target portion and a presumed character string of the check target portion on the basis of the result in the word string candidate buffer, and check means for judging the possibility of existence of the presumed character string on the check portion.
    Type: Grant
    Filed: December 19, 1995
    Date of Patent: January 11, 2000
    Assignee: NEC Corporation
    Inventors: Toshikazu Fukushima, Eiki Ishidera, Masahiko Hamanaka, Daisuke Nishiwaki
  • Patent number: 6005973
    Abstract: In a handwriting recognition process, a list of candidate recognized words is identified (202) as a function of both comparison of dictionary entries to various combinations of recognized character combinations, and through a most likely character string and most likely string of digits analysis as developed without reference to the dictionary. The process selects (301) a word from the list and presents (302) this word to the user. The user then has the option of displaying (303) this list. When displaying the list, candidate words developed with reference to the dictionary are displayed in a segregated manner from the most likely character string words and the most likely string of digits. The user can change the selected word by choosing from the list, or edit the selected word.
    Type: Grant
    Filed: July 22, 1997
    Date of Patent: December 21, 1999
    Assignee: Motorola, Inc.
    Inventors: John L. C. Seybold, Chris A. Kortge
  • Patent number: 5995664
    Abstract: The invention provides an information recognition apparatus for recognition of an address or the like which can recognize recognition object information, which is inputted in the form which does not have punctuations or element designations, at a high speed and with a high degree of accuracy. An element word recognition unit detects element word candidates of each information element of recognition element information and likelihoods of the element word candidates. A record number acquisition unit retrieves a record storage unit to acquire, for each element word candidate detected by the element word recognition unit, a record number of a record including the element word candidate. A likelihood calculation unit calculates likelihoods of the records using corresponding likelihood counters.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: November 30, 1999
    Assignee: NEC Corporation
    Inventor: Hideki Shimomura
  • Patent number: 5987170
    Abstract: There is disclosed a character recognition machine adapted to recognize Japanese characters such as kanjis and kanas. The machine comprises a character string storage portion, a character extraction portion, a character recognition portion, and a language processing portion. A character string to be recognized is stored as an image in the storage portion. The character extraction portion comprises a network consisting of a plurality of interconnected operators each of which has numerous inputs and outputs. An evaluation function which assumes its minimum value when a character extraction produces the best results is calculated by the operators simultaneously so as to minimize the value of the function. The character recognition portion calculates degrees of similarity of a character pattern to various character categories, the character pattern being applied from the character extraction portion.
    Type: Grant
    Filed: November 6, 1997
    Date of Patent: November 16, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hiroshi Yamamoto, Hisao Niwa, Yoshihiro Kojima, Susumu Maruno, Kazuhiro Kayashima, Yasuharu Shimeki, Toshiyuki Kouda, Hidetsugu Maekawa, Satoru Ito
  • Patent number: 5978801
    Abstract: A character and/or character-string retrieving method which retrieves a plurality of patterns at a time by using a single deterministic finite automaton prepared from a plurality of different patterns. There is also a method for optimizing the number of states for the above-mentioned retrieving method, and a storage medium having records of programs and data necessary for executing the above-mentioned character and/or character-string retrieving and a state number optimizing method. A plurality of regular expressions r_1, r_2, ..., r_n to be simultaneously retrieved by pattern matching are prepared, and then augmented to form an augmented regular expression ((r_1)#_1)|((r_2)#_2)|...|((r_n)#_n). A deterministic finite automaton is constructed so that it treats states including positions corresponding to #_1, #_2, ..., #_n, thereby simultaneously retrieving a plurality of regular expression patterns by distinguishing matches from one another.
    Type: Grant
    Filed: November 18, 1997
    Date of Patent: November 2, 1999
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Natsuki Yuasa
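
The idea of tagging each pattern with its own marker so that one combined matcher can report which pattern matched can be illustrated with Python's standard `re` module. This is a swapped-in technique: `re` is a backtracking engine, not the deterministic finite automaton the abstract describes, and the named groups here merely play the role of the per-pattern end markers. The function names are hypothetical, and the sketch assumes the individual patterns contain no capturing groups of their own.

```python
import re

def compile_tagged(patterns):
    """Combine several patterns into one alternation of named groups.

    patterns: dict mapping a tag (a valid Python identifier) to a regex.
    """
    combined = "|".join(f"(?P<{tag}>{rx})" for tag, rx in patterns.items())
    return re.compile(combined)

def find_tagged(compiled, text):
    """Scan text once and report (tag, matched_text) for every match."""
    return [(m.lastgroup, m.group()) for m in compiled.finditer(text)]
```

A single pass over `"abc 42 de"` with a `num` pattern `\d+` and a `word` pattern `[a-z]+` reports each match together with the tag of the pattern that produced it.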
  • Patent number: 5970170
    Abstract: A handwritten character recognition system that includes a document scanner for generating scanned images of a previously created document containing handwritten characters, and a pen and digitizing tablet for real time entry of handwritten characters by a user. The handwritten character recognition system includes an image processor connected from the document scanner for receiving the scanned image of a previously created document and generating one or more ordered cluster arrays. The ordered cluster arrays contain spatially ordered coordinate arrays of skeletal image arcs representing and corresponding to the strokes of the handwritten characters wherein the spatial order represents an induced time ordered sequence of creation of the strokes of the handwritten characters that emulates the sequence of creation of the character strokes.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: October 19, 1999
    Assignee: Kodak Limited
    Inventors: A. Julie Kadashevich, Mary F. Harvey, Kenneth C. Knowlton, Alexander N. Jourjine
  • Patent number: 5960113
    Abstract: An automatic language recognition method which comprises selecting a block of data from the received data and searching said block for elements that are "for" or "against" the presence of a particular language. Recognition is performed by searching for a plurality of known languages in a predetermined order, and by proceeding, for each language, with a search for at least one element characteristic of that language in the data block. It is possible to begin by searching for languages having a special signature, then for languages having special synchronization characters or keywords, and then for languages using mnemonics made up of a determined number of significant characters. The method is used for automatically selecting an interpreter module for decoding the received data, in particular the data received by a plotter. The method is also applicable to detecting a fault, a banner, or a switch of language in the received data.
    Type: Grant
    Filed: July 26, 1995
    Date of Patent: September 28, 1999
    Assignee: Oce-Nederland, B.V.
    Inventors: Reneka Even, Luc Genetier, Robertus C. W. T. M. Van Den Tillaart
  • Patent number: 5960114
    Abstract: A process for identifying and capturing text comprising the steps of identifying delimiters in the text, selecting delimiters from the identified delimiters to be delimiters to the left and right of the selected text, indicating only one character of the text between the left and right delimiters, and automatically blocking and capturing the text having the indicated character. In an alternate embodiment, the process comprises the steps of identifying delimiters in the text that are to the left and to the right of a cursor and identifying the position of the delimiters relative to the cursor, specifying at least one particular delimiter position relative to the cursor, indicating only one character of the text between the cursor and the specified delimiter position, and automatically blocking and capturing the text having the indicated character.
    Type: Grant
    Filed: October 28, 1996
    Date of Patent: September 28, 1999
    Assignee: International Business Machines Corporation
    Inventors: Norman J. Dauerer, Donato O. Forlenza, Edward E. Kelley, Franco Motika
  • Patent number: 5956419
    Abstract: A method for operating a machine to perform unsupervised training of a set of character templates uses as the source of training samples an image source of character images, called glyphs, that need not be manually or automatically segmented or isolated prior to training. A recognition operation performed on the image source of character images produces a labeled glyph position data structure that includes, for each glyph in the image source, a glyph image position in the image source associating an estimated image location of the glyph in the image source with a character label paired with the glyph image position that indicates the character in the character set being trained. The labeled glyph position data and the image source are then used to determine sample image regions in the image source; each sample image region is large enough to contain at least a single glyph but need not be restricted in size to only contain a single glyph.
    Type: Grant
    Filed: April 28, 1995
    Date of Patent: September 21, 1999
    Assignee: Xerox Corporation
    Inventors: Gary E. Kopec, Philip Andrew Chou
  • Patent number: 5949906
    Abstract: A character string region extracting apparatus comprises an extracting section for extracting a plurality of primitives from image information in which a character and a graphic pattern other than the character are mixedly present, a character string candidate region forming section for generating character candidate regions from the primitives and connecting the character candidate regions, thereby forming at least one character string candidate region, a character recognizing section for subjecting the character candidate regions included in the character string candidate region to character recognition, and a character string region extracting section for extracting a character string region from the character string candidate region by the character recognition.
    Type: Grant
    Filed: December 7, 1995
    Date of Patent: September 7, 1999
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hidekata Hontani, Shigeyoshi Shimotsuji
  • Patent number: 5943443
    Abstract: The present invention provides a document processing apparatus, a document processing method, and a storage medium storing the method, for the purpose of offering document filing in which a document can be registered at low computation cost and high speed, and retrieval can be performed with little oversight. In the document processing apparatus, a similar character classifying element classifies characters in a document image into similar character categories in advance and stores the classified categories together with their representative image features. When the document image is registered, a pseudo character recognizing element executes, without identifying each character in the text region, classification into character categories based on the image features less than those used in the ordinary character recognition and stores the category strings generated by identifying each character with the inputted image.
    Type: Grant
    Filed: June 23, 1997
    Date of Patent: August 24, 1999
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Katsuhiko Itonori, Masaharu Ozaki
  • Patent number: 5940533
    Abstract: Information is extracted from a hand-written text by means of a graphics tablet (1). The curves thus obtained make it possible to recognize primitives, namely basic forms, representing a way of writing a part of a letter. More accomplished forms, called allographs, are constructed from primitives in order to construct a letter or even a group of two or three letters. When the series of codes corresponds to a known object from a dictionary of allographs (3), each defined by the sequence of codes of its primitives, the corresponding allograph is recognized. The genetic algorithm is used to improve the population of strings. "Descendants", obtained by combining two starting strings (7), are constructed from a limited-quantity selection of strings (9), from among which descendants the most appropriate are in turn chosen, this gradually optimizing the population. Applications: recognition of cursive writing.
    Type: Grant
    Filed: December 19, 1995
    Date of Patent: August 17, 1999
    Assignee: U.S. Philips Corporation
    Inventor: Philippe Gentric
  • Patent number: 5933531
    Abstract: An optical character recognition method and system are provided, employing context analysis and operator input, alternatively and in combination, on the same batch of documents. After automatic character recognition, the context analyzer processes the fields that are good enough to expect resolution. This will accept as many fields as possible without any operator intervention. For some other fields, the process uses operator input to certify the character-level OCR result of, or to enter, a certain percentage of the characters, so that context analysis may accept some of the remaining fields. If the context analyzer successfully identifies a small set of very close hypotheses, the process asks the operator to certify one or two characters to resolve the ambiguity between the hypotheses. For the fields that are still not resolved, the fields and the hypotheses are shown to the operator for acceptance, correction, or entry.
    Type: Grant
    Filed: August 23, 1996
    Date of Patent: August 3, 1999
    Assignee: International Business Machines Corporation
    Inventor: Raymond Amand Lorie
  • Patent number: 5917944
    Abstract: A study system of a character recognizing and translating system is provided with a character data base for storing character data representing characters contained in a sensed image; a character shape analysis unit for analyzing the shape of a character to extract the features of character constituting elements constituting the character; and, a mask learning unit for generating sample mask data of the character constituting elements on the basis of the analysis result of the character shape analysis unit. A recognition system of the character recognizing and translating system is provided with a collating unit for collating the character data of a character to be recognized with the sample mask data so as to recognize the character.
    Type: Grant
    Filed: November 15, 1996
    Date of Patent: June 29, 1999
    Assignee: Hitachi, Ltd.
    Inventors: Shinji Wakisaka, Hiroko Sato
  • Patent number: 5909510
    Abstract: A word shape token-based document classification system prepares a plurality of sets of training data degraded by image quality and selects the optimum training data set by examining scores from a relevance measurement. The system achieves high accuracy from a wide range of image quality.
    Type: Grant
    Filed: May 19, 1997
    Date of Patent: June 1, 1999
    Assignees: Xerox Corporation, Fuji Xerox Company, Ltd.
    Inventor: Takehiro Nakayama
  • Patent number: 5905811
    Abstract: When texts recognized by an OCR are registered and those texts are searched by a search word, a state in which the search cannot be performed depending on an error recognition at the time of the recognition by the OCR is eliminated. It is an object of the invention to realize a process such that no burden is exerted on an operator or an apparatus by the above state. There are provided an OCR processor for recognizing stored image information and outputting a recognition result while switching the number of candidate characters to be outputted as a recognition result in accordance with a degree of a likelihood; and a document searcher for forming character trains for search from the recognition result and for registering as a search file.
    Type: Grant
    Filed: June 15, 1995
    Date of Patent: May 18, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hirotaka Shiiyama, Katsumi Masaki
  • Patent number: 5894525
    Abstract: A method and system for simultaneously recognizing contextually related images is disclosed. The image of two separate fields is captured to form two captured data images such as a word and numerical amount. Each captured image is cut to form a segmentation graph based on the cuts. The shortest path in each segmentation graph is found wherein the additive length corresponds to a score and is associated with each directed arc of the segmentation graph. The segmentation graphs are combined into a joint segmentation graph and the highest scoring mutually consistent interpretations are found.
    Type: Grant
    Filed: December 6, 1995
    Date of Patent: April 13, 1999
    Assignee: NCR Corporation
    Inventors: Craig R. Nohl, Charles E. Stenard
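
The final step in the abstract, finding the highest-scoring interpretations that are mutually consistent across two fields, can be sketched very roughly. The sketch assumes each field has already been reduced to a short list of candidate values with scores (higher is better) and that consistency means equal values; the segmentation graphs and their joint search are not modeled, and the function name is hypothetical.

```python
def best_consistent(word_candidates, digit_candidates):
    """Pick the value both fields agree on with the highest combined score.

    Each argument is a list of (value, score) pairs. Returns
    (value, combined_score), or None when the fields share no value.
    """
    digit_best = {}
    for value, score in digit_candidates:
        digit_best[value] = max(score, digit_best.get(value, float("-inf")))
    best = None
    for value, score in word_candidates:
        if value in digit_best:
            total = score + digit_best[value]
            if best is None or total > best[1]:
                best = (value, total)
    return best
```

On a check, for instance, the written amount and the numeric amount each produce candidates, and a reading that is only second best in one field can still win if the other field strongly supports it.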
  • Patent number: 5881169
    Abstract: A method and apparatus for presenting and gathering text entries in a pen based input device. The apparatus and method allows for the entry of textual information into a computing device using either handwriting recognition, character selection, or expression selection. Individual entry and selection fields for each of these methods are provided to the user in a coordinated comprehensive method of displaying and gathering the textual information. Handwriting recognition functionality is used to facilitate character recognition. Furthermore, lists of expressions allow the text entry method and apparatus to anticipate the next character or expression to be entered by the user.
    Type: Grant
    Filed: September 13, 1996
    Date of Patent: March 9, 1999
    Assignee: Ericsson Inc.
    Inventor: Raymond Charles Henry, Jr.
  • Patent number: 5875265
    Abstract: An image analyzing and expression adding apparatus for presenting an operator with image impressions displayed in terms of sensitive language and for allowing the operator to use stored design know-how in adding expressions to the image on display. An image segmenting unit segments an input image into a plurality of areas. An image feature storing unit stores the physical feature quantities of each of the segmented areas. The physical feature quantities are processed by an image analyzing unit preparing visual feature quantities about the entire image. A sensitive influence quantity computing unit receives the visual feature quantities thus prepared and, based on information from a design know-how storing unit, computes factors of the terms representing sensitivity. The operator is presented with the factors of the sensitivity-expressing terms. In response, the operator instructs desired expressions using the sensitivity-expressing terms through an expression instructing unit.
    Type: Grant
    Filed: June 18, 1996
    Date of Patent: February 23, 1999
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Atsushi Kasao
  • Patent number: 5867597
    Abstract: An improved document management system with high-speed retrieval by example retrieves a document matching a target document, in whole or part, by comparing descriptors of documents. A descriptor is derived from a pattern of labels, where each label is associated with a character, or more precisely, a character bounding box. A bounding box is found by examining contiguous pixels in an image. The particular label associated with a bounding box depends on the value of a metric measured from that bounding box. In one system, the metric is the spacing between the bounding box and an adjacent bounding box, in which the labels approximately reflect a pattern of word lengths. In other systems, where word lengths are not present, the metric might be pixel density and the pattern of labels approximately reflects a pattern of denser characters and sparser characters.
    Type: Grant
    Filed: September 5, 1995
    Date of Patent: February 2, 1999
    Assignee: Ricoh Corporation
    Inventors: Mark Peairs, Jonathan Hull
  • Patent number: 5864635
    Abstract: Pre-recognition analysis on stroke characteristics such as the count, size and position of each stroke in real-time as it is drawn improves recognition accuracy. After each stroke, the set of strokes is weighted toward handwriting or gesture. The system uses a gesture threshold size to distinguish between gestures and handwriting. The system also uses the stroke count to distinguish between the two inputs, relying on the knowledge of the allowable number of strokes in a gesture. The count information may also be used in conjunction with the stroke size information to weight the set of strokes between gestures and handwriting. Once the stroke size crosses a gesture vs. text size threshold, the result is weighted toward gestures. By examining the `white space` between strokes and the juxtaposition of the strokes, a gesture vs. text determination can be made with high accuracy.
    Type: Grant
    Filed: June 14, 1996
    Date of Patent: January 26, 1999
    Assignee: International Business Machines Corporation
    Inventors: John Mark Zetts, Maurice Roger Desrosiers
  • Patent number: 5862259
    Abstract: A pattern recognition system classifies images of patterns in which the definition of individual features of the pattern may have become blurred. The image is segmented into pieces of arbitrary size and shape, and various combinations are examined to determine those which represent the most likely segmentation of the pattern into its individual features. These individual features are then classified, according to known techniques. Through the use of a second order Markov model, not all possible combinations of pieces need to be examined, to determine the best ones. Rather, the examination of various combinations is limited in accordance with previously determined information, to thereby render the process more efficient. By combining multiple, independently determined probabilities, the accuracy of the overall operation is enhanced.
    Type: Grant
    Filed: March 27, 1996
    Date of Patent: January 19, 1999
    Assignee: Caere Corporation
    Inventors: Mindy Bokser, Leonard Pon, Jun Yang, Kenneth Choy
  • Patent number: 5862256
    Abstract: To distinguish between gestures and handwriting, the pen subsystem examines the size of the user's writing. The user may set the gesture versus text size according to his/her handwriting style. In this manner, the user knows exactly how large to make the gestures. If the user declines to customize the setting, the pen subsystem can query other user settings that indicate approximate size of the user's handwriting. A third choice available is for the system to dynamically determine and track the size of handwriting. This would allow multiple users to serially use the computer without having to change settings.
    Type: Grant
    Filed: June 14, 1996
    Date of Patent: January 19, 1999
    Assignee: International Business Machines Corporation
    Inventors: John Mark Zetts, Maurice Roger Desrosiers
  • Patent number: 5850480
    Abstract: The present invention includes methods of correcting optical character recognition errors occurring during recognition of alphanumeric character strings contained within one or more predetermined types of alphanumeric character fields. The methods may be practiced with a document processing system having (1) an optical character recognition device for scanning documents and outputting bit-map image data; (2) a recognition engine for converting the bit-map image data into possibly correct alphanumeric characters with associated confidence values; and (3) at least one lexicon of character strings consisting of a list of at least a portion of all of the possible character string values for each of the fields being processed. The present invention corrects OCR errors by performing a contextual comparison analysis between the alphanumeric characters outputted from the recognition engine and the lexicon of character strings.
    Type: Grant
    Filed: May 30, 1996
    Date of Patent: December 15, 1998
    Assignee: Scan-Optics, Inc.
    Inventor: Edward Francis Scanlon
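    The abstract above does not spell out its contextual comparison analysis, so the following is only a minimal sketch of one plausible reading: each recognized character carries a confidence value, and the lexicon entry that contradicts the least total confidence is chosen. The function name `correct_field`, the same-length restriction, and the confidence-weighted penalty are all assumptions made for illustration, not details taken from the patent.

```python
def correct_field(recognized, lexicon):
    """Pick the lexicon entry that best agrees with the OCR output.

    recognized: list of (character, confidence) pairs, confidence in [0, 1].
    lexicon:    iterable of valid strings for this field (hypothetical data).
    """
    def penalty(entry):
        # Overriding a confidently recognized character costs more than
        # overriding a doubtful one (assumed scoring rule).
        return sum(conf for (ch, conf), target in zip(recognized, entry)
                   if ch != target)

    # Simplification: only consider entries of the same length as the input.
    same_length = [entry for entry in lexicon if len(entry) == len(recognized)]
    if not same_length:
        # No usable lexicon entry: fall back to the raw OCR string.
        return "".join(ch for ch, _ in recognized)
    return min(same_length, key=penalty)


ocr_output = [("O", 0.4), ("H", 0.9), ("1", 0.3), ("O", 0.9)]
corrected = correct_field(ocr_output, ["OHIO", "IOWA", "UTAH"])
```

    Here the low-confidence "1" is the only character that disagrees with "OHIO", so that entry has the smallest penalty.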
  • Patent number: 5848191
    Abstract: A method of automatically generating a thematic summary from a document image without performing character recognition to generate an ASCII representation of the document text. The method begins with decomposition of the document image into text blocks and text lines. Using the median x-height of text blocks, the main body of text is identified. Afterward, word image equivalence classes and sentence boundaries within the blocks of the main body of text are determined. The word image equivalence classes are used to identify thematic words. These, in turn, are used to score the sentences within the main body of text, and the highest scoring sentences are selected for extraction.
    Type: Grant
    Filed: December 14, 1995
    Date of Patent: December 8, 1998
    Assignee: Xerox Corporation
    Inventors: Francine R. Chen, Dan S. Bloomberg, John W. Tukey
  • Patent number: 5835635
    Abstract: A method for providing an effective completion of characters required in inputting a partial character string.
    Type: Grant
    Filed: June 27, 1995
    Date of Patent: November 10, 1998
    Assignee: International Business Machines Corporation
    Inventors: Hiroshi Nozaki, Nobuyasu Itoh
  • Patent number: 5825926
    Abstract: When character codes and character patterns coexist in the retrieval character string and/or the retrieval objective data, they are standardized into character patterns, and a character string matching with or similar to the retrieval character string is retrieved from the retrieval objective data. A character code included in the retrieval character string and/or the retrieval objective data is converted into a character pattern by referring to a prepared character pattern dictionary, and by pattern matching between two character patterns, a character string matching with or similar to the retrieval character string is picked up from the retrieval objective data and produced as the result of retrieval.
    Type: Grant
    Filed: April 12, 1995
    Date of Patent: October 20, 1998
    Assignee: Fujitsu Limited
    Inventor: Hiroshi Tanaka
  • Patent number: 5816717
    Abstract: A method of recalling stored labels is disclosed in which target data is provided by a user so that only labels having label data matching the target data are displayed.
    Type: Grant
    Filed: February 10, 1997
    Date of Patent: October 6, 1998
    Assignee: Esselte N.V.
    Inventors: Michael Andrew Beadman, Paul Robert Bridle
  • Patent number: 5812818
    Abstract: A translating facsimile machine includes a facsimile input receiver which feeds a facsimile input signal representative of a source language text to an optical character recognizer. The optical character recognizer converts the facsimile signal to a source natural language text signal which is then tested in a source language recognizer and, if recognized, is then processed by a translator which converts the source natural language signal to a target natural language signal. An output device such as a printer receives the target natural language signal and outputs it to a roll of paper or the like.
    Type: Grant
    Filed: November 17, 1994
    Date of Patent: September 22, 1998
    Assignee: Transfax Inc.
    Inventors: Richard Adler, Claude Richaud, Troy W. Livingston, Wayne D. Jung
  • Patent number: 5805730
    Abstract: A statistical classifier that can be used for pattern recognition is trained to recognize negative, or improper, patterns as well as proper patterns that are positively associated with desired output classes. A set of training samples includes both the negative and positive patterns, and target output values for the negative patterns are set so that no recognized class is indicated. The negative patterns are selected for training with less frequency than the positive patterns, and their effect on training is also modified, so that training is focused more heavily on proper patterns.
    Type: Grant
    Filed: August 8, 1995
    Date of Patent: September 8, 1998
    Assignee: Apple Computer, Inc.
    Inventors: Larry S. Yaeger, Richard F. Lyon
  • Patent number: 5787197
    Abstract: A dictionary based post-processing technique for an on-line handwriting recognition system is described. An input word has all punctuation removed, and the word is checked against a word processing dictionary. If any word matches against the dictionary, it is verified as a valid word. If it does not verify, a stroke match function and a spell-aid dictionary are used to construct a list of possible words. In some cases, the list is appended with possible words based on changing the first character of the originally recognized word. A character-match score, a substitution score and a word length are assigned to the items on the list. A word hypothesis is constructed from the list with each such word being assigned a score. The word with the best score is chosen as the output word for the processor.
    Type: Grant
    Filed: March 28, 1994
    Date of Patent: July 28, 1998
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Tetsunosuke Fujisaki, William David Modlin, Kenneth Steven Wenstrup
  • Patent number: 5774586
    Abstract: Groups of symbols to be recognized are standardized by fitting four flexible curves to the group of symbols. The curves are fitted by minimizing a cost or energy function that associates a cost with the curvature of the curves, the slant of the curves, the displacement in spacing between the curves and the distance of maxima and minima points from the curves. After the curves are fitted to the group of symbols, the symbols are standardized by transforming coordinates systems so that the fitted curves are placed in a predetermined configuration.
    Type: Grant
    Filed: May 4, 1994
    Date of Patent: June 30, 1998
    Assignee: NCR Corporation
    Inventor: Yann Andre LeCun
  • Patent number: 5774588
    Abstract: A system and method for more efficiently comparing an unverified string to a lexicon, which filters the lexicon through multiple steps to reduce the number of entries to be directly compared with the unverified string. The method begins by preparing the lexicon with an n-gram encoding, partitioning and hashing process, which can be accomplished in advance of any processing of unverified strings. The unknown is compared first by partitioning and hashing it in the same way to reduce the lexicon in a computationally inexpensive manner. This is followed by an encoded vector comparison step, and finally by a direct string comparison step, which is the most computationally expensive. The reduction of the lexicon is accomplished without arbitrarily eliminating any large portions of the lexicon that might contain relevant candidates. At the same time, the method avoids the need to compare the unverified string directly or indirectly with all the entries in the lexicon.
    Type: Grant
    Filed: June 7, 1995
    Date of Patent: June 30, 1998
    Assignee: United Parcel Service of America, Inc.
    Inventor: Liang Li
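    The filter-then-compare idea in the abstract above can be sketched as follows. This is a simplified two-stage illustration under stated assumptions, not the patented three-stage method: a bigram index built in advance stands in for the n-gram encoding, partitioning and hashing step, and `difflib.SequenceMatcher` stands in for the final direct string comparison. The names `ngrams`, `build_index`, `best_match`, and the `min_shared` threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def ngrams(word, n=2):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Precompute a hash map from each n-gram to the entries containing it."""
    index = defaultdict(set)
    for entry in lexicon:
        for gram in ngrams(entry, n):
            index[gram].add(entry)
    return index


def best_match(unverified, index, n=2, min_shared=2):
    """Cheap n-gram filter first, expensive string comparison second."""
    shared = defaultdict(int)
    for gram in ngrams(unverified, n):
        for entry in index.get(gram, ()):
            shared[entry] += 1
    # Only entries sharing enough n-grams reach the costly comparison.
    candidates = [e for e, count in shared.items() if count >= min_shared]
    if not candidates:
        return None
    return max(candidates,
               key=lambda e: SequenceMatcher(None, unverified, e).ratio())


index = build_index(["ATLANTA", "ATLANTIC", "BOSTON", "AUSTIN"])
match = best_match("ATLANYA", index)
```

    In this run only "ATLANTA" and "ATLANTIC" survive the filter, so the direct comparison touches two entries instead of four.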
  • Patent number: 5774587
    Abstract: Recognition of various images uses a presentation of reference (standard) objects transformed into corresponding discretized reference signals without repeating quantization levels. For a set of these signals, a dynamical system is formed of the kind of iterable mapping of d-dimensional cube into itself with limit cycles by the number of reference signals. Representations of an object under identification are transformed into information signals that are applied to the input of the dynamical system as initial conditions and by the results of its functioning a decision is made on recognition of the object under identification. Distinctive features of the method are the conditions of looking through the values of information at the input of the dynamical system and the conditions of determining the correspondence of the information signal to one of the reference signals by the phase trajectory of the dynamical system getting to one of the limit cycles in the process of its functioning.
    Type: Grant
    Filed: December 1, 1995
    Date of Patent: June 30, 1998
    Inventors: Alexander Dmitriev, Yury Andaeev, Yury Belsky, Dmitry Kuminov, Andzei Panas, Sergai Starkov
  • Patent number: 5764799
    Abstract: An OCR 300 stores signals representative of reference characters and scans a document 302 to generate a bit mapped digitized image of the document. After the characters and the words are recognized and candidate characters are identified, the initial results are post-processed to compare clusters of identical images to the candidates. Where the candidates of all equivalent images in a cluster are the same, the candidates are output as representative of the image on the document. Where the candidates are different, a majority of identical candidates determines the recognized candidates. Other post-processing operations include verification and re-recognition.
    Type: Grant
    Filed: June 26, 1995
    Date of Patent: June 9, 1998
    Assignee: Research Foundation of State University of New York
    Inventors: Tao Hong, Jonathan J. Hull
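    The cluster-voting step in the abstract above can be illustrated with a short sketch. It assumes that image clustering has already been done and that each word image arrives as a (cluster id, candidate string) pair; the function name `vote_by_cluster` and the input layout are hypothetical. Note that `Counter.most_common` picks the most frequent candidate (a plurality), which coincides with a majority only when one candidate has more than half the votes.

```python
from collections import Counter, defaultdict


def vote_by_cluster(observations):
    """Resolve one recognized string per cluster of identical word images.

    observations: iterable of (cluster_id, candidate) pairs, one per image.
    Returns a dict mapping each cluster_id to its most frequent candidate.
    """
    by_cluster = defaultdict(list)
    for cluster_id, candidate in observations:
        by_cluster[cluster_id].append(candidate)

    resolved = {}
    for cluster_id, candidates in by_cluster.items():
        # If all candidates agree this is trivially that candidate;
        # otherwise the most frequent one wins.
        resolved[cluster_id] = Counter(candidates).most_common(1)[0][0]
    return resolved


observations = [("img7", "the"), ("img7", "tbe"), ("img7", "the"),
                ("img9", "cat")]
resolved = vote_by_cluster(observations)
```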
  • Patent number: 5761538
    Abstract: An improved method of matching a query string against a plurality of candidate strings replaces a highly computationally intensive string edit distance calculation with a less computationally intensive lower bound estimate. The lower bound estimate of the string edit distance between the two strings is calculated by equalising the lengths of the two strings by adding padding elements to the shorter one. The elements of the strings are then sorted and the substitution costs between corresponding elements are summed.
    Type: Grant
    Filed: July 10, 1995
    Date of Patent: June 2, 1998
    Assignee: Hewlett-Packard Company
    Inventor: Richard Hull
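    The pad, sort and sum procedure in the abstract above can be sketched as follows. The sketch assumes numeric string elements, a substitution cost of |x - y|, and a padding element of 0 so that inserting or deleting x costs |x|; under those assumptions the sum is a lower bound on the weighted edit distance. These cost choices and the name `edit_distance_lower_bound` are illustrative assumptions, not details from the patent.

```python
def edit_distance_lower_bound(a, b, pad=0.0):
    """Cheap lower-bound estimate of the edit distance between two sequences.

    a, b: sequences of numbers (assumed element type).
    pad:  padding element appended to the shorter sequence.
    """
    a, b = list(a), list(b)
    # Step 1: equalise the lengths by padding the shorter sequence.
    if len(a) < len(b):
        a += [pad] * (len(b) - len(a))
    else:
        b += [pad] * (len(a) - len(b))
    # Step 2: sort both sequences.
    # Step 3: sum the substitution costs between corresponding elements.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


bound = edit_distance_lower_bound([1, 5, 3], [3, 1])
```

    Sorting costs O(n log n) per pair, compared with the O(n * m) dynamic program of a full edit distance, so candidates whose bound already exceeds the best distance found so far can be skipped without running the expensive calculation.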
  • Patent number: 5757983
    Abstract: A document retrieval method and system for retrieving, from a document database storing document data in the form of character codes, a document which contains given search terms and which meets a given search query condition. From documents loaded from the document database, a document containing terms which match the search terms is searched to generate document identification (ID) information including a document identifier of the searched document and containing match terms found to match with the search terms as well as term identifiers of the match terms and position information of the match terms in the searched document. A decision is then made as to whether or not the position information of the match terms satisfies a positional condition specified in the search query condition concerning a positional relation between the search terms, and match information is then generated indicating satisfaction of the search query condition when the positional condition is satisfied.
    Type: Grant
    Filed: August 21, 1995
    Date of Patent: May 26, 1998
    Assignee: Hitachi, Ltd.
    Inventors: Hisamitsu Kawaguchi, Mitsuru Akizawa, Kanji Kato, Atsushi Hatakeyama, Hiromichi Fujisawa
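    The positional-condition check in the abstract above can be sketched for the simple case of two search terms that must appear within a given number of words of each other. Whitespace tokenisation, word-index positions, and the names `find_positions` and `satisfies_proximity` are assumptions for illustration; the patent's own position encoding is not given in the abstract.

```python
def find_positions(text, terms):
    """Record the word positions at which each search term occurs."""
    positions = {term: [] for term in terms}
    for i, token in enumerate(text.lower().split()):
        if token in positions:
            positions[token].append(i)
    return positions


def satisfies_proximity(positions, term_a, term_b, max_gap):
    """True if some occurrence of term_a lies within max_gap words of term_b."""
    return any(abs(i - j) <= max_gap
               for i in positions[term_a]
               for j in positions[term_b])


document = "the quick brown fox jumps over the lazy dog"
positions = find_positions(document, ["fox", "dog"])
near = satisfies_proximity(positions, "fox", "dog", max_gap=5)
```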