Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)
  • Publication number: 20020102025
    Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.
    Type: Application
    Filed: May 29, 1998
    Publication date: August 1, 2002
    Inventors: ANDI WU, STEPHEN D. RICHARDSON, ZIXIN JIANG
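    Sketch (not from the application): the two-stage filter described in the abstract can be illustrated roughly as follows. The helper names, the 0-based position convention, and the `max_len` cutoff are assumptions made for this illustration only.

```python
from collections import defaultdict

def build_tables(lexicon):
    """Build the two indications from a word list (hypothetical helper)."""
    second = defaultdict(set)     # first char -> chars seen in second position
    positions = defaultdict(set)  # char -> 0-based positions where it occurs
    for word in lexicon:
        if len(word) >= 2:
            second[word[0]].add(word[1])
        for i, ch in enumerate(word):
            positions[ch].add(i)
    return second, positions

def may_be_word(combo, second, positions):
    # Stage 1: the second character must follow the first in some known word.
    if len(combo) < 2 or combo[1] not in second.get(combo[0], ()):
        return False
    # Stage 2: every character must be known to occur at its position.
    return all(i in positions.get(ch, ()) for i, ch in enumerate(combo))

def candidate_words(sequence, second, positions, max_len=4):
    """Return every contiguous combination that passes both stages."""
    out = []
    for start in range(len(sequence)):
        for end in range(start + 2, min(len(sequence), start + max_len) + 1):
            combo = sequence[start:end]
            if may_be_word(combo, second, positions):
                out.append(combo)
    return out
```

    The filter is deliberately permissive: with the lexicon "cat", "at", "tab", the sequence "catab" also yields "ca" and "ta", which only "may be" words and would need a later dictionary check.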
  • Publication number: 20020097915
    Abstract: The capacity of a character feature dictionary is reduced, and stored as a feature dictionary. The capacity is reduced by clustering feature vectors in units of columns or rows for character features, by making m column vectors represent the column or row features, and by assigning 1 to m identification numbers. The capacity of the dictionary can be further reduced by representing a column or row feature with an addition sum of other column or row features, or differential features after clustering is performed, or by performing dimension compression for character features. Word recognition is performed by synthesizing a word feature for a comparison based on a word list to be recognized, and by making a comparison between a feature extracted from an input word and the synthesized feature. Or, a comparison between input word and input word features whose numbers of dimensions are different may be made with nonlinear elastic matching.
    Type: Application
    Filed: September 12, 2001
    Publication date: July 25, 2002
    Applicant: Fujitsu Limited
    Inventor: Yoshinobu Hotta
  • Publication number: 20020094133
    Abstract: A device and a method for recording of text by imaging the text on a light-sensitive sensor (8). The device converts the images (14-17, 47-49) into a set of characters (50, 51) each, using character recognition, and then assembles the sets of characters (50, 51) with the aid of the characters.
    Type: Application
    Filed: November 13, 2001
    Publication date: July 18, 2002
    Inventors: Markus Andreasson, Per Astrand
  • Patent number: 6418239
    Abstract: A method and mechanism for displaying partial results of full context handwriting recognition. As handwritten characters are entered into a system, a shape matcher associates the character with a plurality of alternate code points, with each alternate code point having probability information associated therewith. The alternate code points are placed at the end of a queue, and a cost is determined from each alternate code point to any immediately preceding alternate in the queue. The cost is based on the probability information of the alternates and a transition cost therebetween. Then, the lowest cost path back from each of the alternates at the end of the queue to an alternate at the beginning of the queue is determined. If each lowest cost path back converges to a common alternate in the queue, the common alternate and any previous alternates on the path back are recognized as the code points for each of the handwritten characters associated therewith.
    Type: Grant
    Filed: May 16, 2000
    Date of Patent: July 9, 2002
    Assignee: Microsoft Corporation
    Inventors: Gregory N. Hullender, Patrick M. Haluptzok
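    Sketch (not from the patent): the convergence test in the abstract resembles a lowest-cost-path search with traceback. The data layout (a list of columns of `(code_point, cost)` pairs) and the `transition_cost` callback are assumptions for this illustration.

```python
def committed_prefix(columns, transition_cost):
    """columns: list of lists of (code_point, local_cost) alternates.
    Returns the code points shared by every lowest-cost path back."""
    # best[i][j] = (total cost, back pointer) for alternate j of column i
    best = [[(cost, None) for _, cost in columns[0]]]
    for i in range(1, len(columns)):
        row = []
        for cp, cost in columns[i]:
            total, back = min(
                (best[i - 1][k][0] + transition_cost(prev_cp, cp) + cost, k)
                for k, (prev_cp, _) in enumerate(columns[i - 1])
            )
            row.append((total, back))
        best.append(row)
    # Trace the lowest-cost path back from every alternate in the last column.
    paths = []
    for j in range(len(columns[-1])):
        path, k = [], j
        for i in range(len(columns) - 1, -1, -1):
            path.append(k)
            k = best[i][k][1]
        paths.append(path[::-1])
    # Find the latest column at which all paths pass through one alternate.
    last_common = -1
    for i in range(len(columns)):
        if len({p[i] for p in paths}) == 1:
            last_common = i
    return [columns[i][paths[0][i]][0] for i in range(last_common + 1)]
```

    Because back pointers are deterministic, paths that share an alternate in one column also share everything before it, so the returned prefix can be shown to the user as a partial result while later characters remain open.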
  • Publication number: 20020076112
    Abstract: A method of program classification based on syntax of transcript information includes receiving transcript information associated with the program wherein the transcript information has a plurality of sentences, determining characteristics of at least one of the plurality of sentences of the transcript information to identify at least the type and subject of the sentence, comparing the characteristics of the at least one of the plurality of sentences with a list of sentence characteristics having associated therewith a plurality of program types, and based on the comparing step, selecting a classification of program which is most closely associated with the characteristics of the at least one of the plurality of sentences.
    Type: Application
    Filed: December 18, 2000
    Publication date: June 20, 2002
    Applicant: PHILIPS ELECTRONICS NORTH AMERICA CORPORATION
    Inventor: Kavitha Vallari Devara
  • Publication number: 20020076111
    Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).
    Type: Application
    Filed: December 18, 2000
    Publication date: June 20, 2002
    Applicant: Xerox Corporation
    Inventors: Christopher R. Dance, Mauritius Seeger
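    Sketch (not from the application): a much simplified stand-in for the cluster analysis, which picks the typeface whose per-word scaling factors have the smallest relative spread. The width tables are hypothetical inputs; a real system would obtain them by rendering each word in each candidate typeface.

```python
from statistics import mean, pstdev

def best_typeface(measured_widths, rendered_widths):
    """measured_widths: {word: width in the scanned image}
    rendered_widths: {typeface: {word: width at a reference size}}
    Returns (typeface, scale) with the tightest spread of scaling factors."""
    results = {}
    for face, widths in rendered_widths.items():
        # One scaling factor per word: scanned width / rendered width.
        factors = [measured_widths[w] / widths[w] for w in measured_widths]
        # Relative spread (coefficient of variation) and mean scale.
        results[face] = (pstdev(factors) / mean(factors), mean(factors))
    face = min(results, key=lambda f: results[f][0])
    return face, results[face][1]
```

    A typeface that fits well gives nearly the same factor for every word, which is why a tight spread is used here as the fit criterion.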
  • Publication number: 20020076110
    Abstract: The present invention provides a method and system for efficient information storage and retrieval of information. The method includes the steps of: scanning/selecting/capturing a selected portion of text of the information wherein the selected portion of text scanned is typically a close-to-unique identifier of the text from which the portion was excerpted and serves as a key when the information is accessed electronically; and placing the key in an electronically available index/directory to facilitate retrieval of the information. The method may further include retrieving and storing the information associated with the key and using it to index, organize, and make available for search and retrieval the full information originally viewed by the user.
    Type: Application
    Filed: December 15, 2000
    Publication date: June 20, 2002
    Inventor: Pieter J. van Zee
  • Publication number: 20020076109
    Abstract: A method and apparatus for providing an interpreter that is operable to recognize whether text displayed on a display device belongs to a pre-defined type of text, and to present the user with an option for performing a context sensitive operation when an interpreter recognizes the input text as belonging to the pre-defined type of text. The input text is generated based on a user selecting text, a portion of text, an object, or a portion of an object displayed on a display device that is part of a computer system. Alternatively, an interpreter automatically recognizes when displayed text belongs to a pre-defined type of text. A wide variety of interpreters may be included, wherein each interpreter has at least one corresponding pre-defined type of text and at least one corresponding context sensitive operation.
    Type: Application
    Filed: January 25, 1999
    Publication date: June 20, 2002
    Inventors: ANDY HERTZFELD, DARIN BENJAMIN ADLER
  • Publication number: 20020073029
    Abstract: A system and method of authorizing an electronic commerce transaction between a purchaser using a credit card, an on-line merchant, and a credit card company. A server associated with the merchant receives a purchase request from the purchaser that includes a purchase amount and the purchaser's credit card information. A SIP multi-party conference is established, and information is shared among the three parties through a multicast procedure. A Web camera may take an image of the purchaser when the purchaser sends the purchase request to the merchant. A whiteboard application is used to capture an image of the purchaser's signature. The credit card company verifies the credit card information, authorizes the purchase amount, and validates the purchaser's image and signature utilizing an image recognition program and a database of valid cardholder images and signatures.
    Type: Application
    Filed: December 12, 2000
    Publication date: June 13, 2002
    Applicant: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Daniela Cheaib, Roch Glitho
  • Publication number: 20020067859
    Abstract: A system for producing a raster image derived from coded and non-coded portions of a hybrid data structure from an input bitmap including (1) a data processing apparatus, (2) a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, (3) a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and non-coded data derived from portions of the input bitmap which do not correspond to the identifiable objects, and (4) an output device capable of developing a visually perceptible raster image derived from the hybrid data structure. The raster image includes raster images of the identifiable objects and raster images derived from portions of the input bitmap that do not correspond to the identifiable objects.
    Type: Application
    Filed: January 25, 2002
    Publication date: June 6, 2002
    Applicant: Adobe Systems, Inc., a California corporation
    Inventors: Dennis G. Nicholson, James C. King
  • Patent number: 6396951
    Abstract: To obtain a query for use in information retrieval, a document is scanned. The resulting text image data define an image of a segment of text in a first language. Automatic recognition is then performed on at least part of the text image data to obtain text code data including a series of element codes. Each element code indicates an element that occurs in the first language, and the series of element codes defines a set of expressions that also occur in the first language. Automatic translation is then performed on a version of the text code data to obtain translation data indicating a set of counterpart expressions in a second language. The counterpart expressions are used to automatically obtain query data defining the query. The query can then be provided to an information retrieval engine.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: May 28, 2002
    Assignee: Xerox Corporation
    Inventor: Gregory Grefenstette
  • Publication number: 20020057842
    Abstract: A method of handwriting recognition encourages the entry of an entire word, and presents the “most likely” word or words. A “look-ahead” mode of operation is implemented, wherein the most probable word or words corresponding to the entered letters are identified in a dictionary and presented to the user in such a way that the user may discontinue the entry of further letters if one of the words identified in the dictionary matches the desired word. The determination of the most likely word or words may be based on a combination of one or more criteria, including the characters themselves, the length of the word, the relative placement of the recognized characters within the word, and so forth. The result may also be presented in various ways singly or in combination according to the invention. In addition to a presentation of the highest probable word, the ‘n’ highest probable words may be presented.
    Type: Application
    Filed: June 1, 2001
    Publication date: May 16, 2002
    Inventor: Henry C. Yuen
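    Sketch (not from the application): a minimal look-ahead lookup, assuming the dictionary maps each word to a frequency-like score. Ranking by that score alone is an assumption; the abstract lists several other possible criteria.

```python
def look_ahead(prefix, dictionary, n=3):
    """Return up to n dictionary words starting with prefix, best first.
    dictionary: {word: score} (hypothetical scoring)."""
    matches = [w for w in dictionary if w.startswith(prefix)]
    # Higher score first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda w: (-dictionary[w], w))
    return matches[:n]
```

    Calling `look_ahead("th", words, 2)` after two entered letters would let the user pick a word and stop writing.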
  • Publication number: 20020054693
    Abstract: The present invention encompasses a self-orthogonal character recognition engine for executing an iterative method employing a database of predetermined character strings. The method receives a digital representation of a character string. It then generates a proposed result string by applying to the captured digital image a predetermined recognition routine including one or more recognition subroutines. Each recognition subroutine employs an initial parameter setting. Next, if the proposed result string does not match any of the predetermined character strings in the database, the initial parameter setting of a recognition subroutine is changed to a next setting. The recognition process is then repeated using the next parameter setting to generate and test a next result string. The process can be repeated iteratively until a result string is verified or the process times out.
    Type: Application
    Filed: July 30, 2001
    Publication date: May 9, 2002
    Inventor: Brian J. Elmenhurst
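    Sketch (not from the application): the retry loop in the abstract, assuming a caller-supplied `recognize(image, setting)` function and a finite list of parameter settings that stands in for the time-out.

```python
def recognize_with_retries(image, recognize, settings, known_strings):
    """Try each parameter setting until the result matches a known string."""
    for setting in settings:
        result = recognize(image, setting)
        if result in known_strings:
            return result, setting  # verified against the database
    return None, None  # no setting produced a verifiable string
```

    The database of predetermined strings acts as the verifier, so the recognizer itself needs no confidence measure.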
  • Patent number: 6385350
    Abstract: A system for producing a raster image derived from a data structure including a data processing apparatus, a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and to non-identifiable objects and the input bitmap, and an output device capable of developing a visually perceptible raster image derived from the input bitmap in the hybrid data structure. The raster image is derived from the input bitmap and thus includes no misrecognition errors.
    Type: Grant
    Filed: January 6, 1998
    Date of Patent: May 7, 2002
    Assignee: Adobe Systems Incorporated
    Inventors: Dennis G. Nicholson, James C. King, David M. Emmett
  • Patent number: 6373985
    Abstract: A technique for analyzing loosely constrained text blocks, such as e-mail signature blocks, by performing a two-dimensional geometrical analysis and a one-dimensional language analysis in order to classify sub-blocks of the loosely constrained text block into particular functional classes. The present technique may also be utilized to identify a personal name from a user name in a loosely constrained text block, such as an e-mail signature block.
    Type: Grant
    Filed: August 12, 1998
    Date of Patent: April 16, 2002
    Assignee: Lucent Technologies, Inc.
    Inventors: Jianying Hu, Richard W. Sproat, Hao Chen
  • Patent number: 6373982
    Abstract: Apparatus and method for improving recognition of patterns such as alphanumeric characters. A known recognition system is expanded to further include a complementary recognition system which is linked with the primary recognition system. An image that cannot be positively recognized by the primary recognition system is passed on to the complementary recognition system, and any characters not positively recognized by the complementary recognition system are in turn passed on to a correction system. At the correction system, an operator classifies unrecognized characters, which are then used to teach the complementary recognition system. Thus, the classified data of the correction system provide the training data for a continuous training process which is coupled with the correction system by a pattern adaptation system.
    Type: Grant
    Filed: May 7, 1999
    Date of Patent: April 16, 2002
    Assignee: International Business Machines Corporation
    Inventors: Udo Maier, Werner Ruppert
  • Publication number: 20020041713
    Abstract: A search apparatus searches for a keyword from a character recognition result using an index table, the character recognition result being obtained as a result of character recognition of characters in an original document. The index table includes an index character string; a position of a portion, in the character recognition result, which matches the index character string; and a credibility which is defined for each character included in the index character string and indicates a probability of the character existing in a portion, in the original document, which corresponds to a portion, in the character recognition result, which matches the character.
    Type: Application
    Filed: June 6, 2001
    Publication date: April 11, 2002
    Inventors: Taro Imagawa, Kenji Kondo, Yoshihiko Matsukawa, Tsuyoshi Mekata
  • Patent number: 6366698
    Abstract: A user of a portable terminal writes sentences to be transmitted as an e-mail, a mail address, and information indicating that the service a host device is requested to provide is “mail transmission” on paper as a memo. By using an image input unit installed in the portable terminal, the paper or the like is imaged. The portable terminal transmits the image data thus taken in to the host device. The host device analyzes received image data by using an image recognition unit. Upon recognizing from the characters that the service requested by the portable terminal is “mail transmission”, the host device starts a mail transmitting/receiving unit. The mail transmitting/receiving unit transmits the content of the written memo included in the image data to a terminal specified by the mail address included in the image data.
    Type: Grant
    Filed: March 6, 1998
    Date of Patent: April 2, 2002
    Assignee: Casio Computer Co., Ltd.
    Inventor: Tooru Yamakita
  • Publication number: 20020031269
    Abstract: A named entity discriminating system capable of discriminating named entities such as location names, personal names, and organization names in text with a high degree of accuracy is provided. A reading means reads text from a hypertext database. A single text analyzing means analyzes each text read by the reading means and detects candidates for the named entity in the text. A complex text analyzing means estimates the likelihood of the candidate named entity detected by the single text analyzing means by an analysis with reference to referring link text or linked text of the text in which the candidate named entity appears.
    Type: Application
    Filed: September 7, 2001
    Publication date: March 14, 2002
    Applicant: NEC CORPORATION
    Inventor: Toshikazu Fukushima
  • Publication number: 20020031270
    Abstract: An image processing apparatus for changing a layout of a character string and/or a drawing contained in image data is disclosed. The apparatus includes a first detection means, a second detection means, a change means, a recognition means, and a replacing means. The first detection means detects a directive word, which is a character string that indicates a drawing position. The second detection means detects a drawing whose position is indicated by the directive word. The change means changes a layout of the character string and/or the drawing position. The recognition means recognizes positional relation between the directive word and the drawing after a layout change. The replacing means replaces the directive word based on the positional relation.
    Type: Application
    Filed: August 30, 2001
    Publication date: March 14, 2002
    Inventor: Tsutomu Yamazaki
  • Patent number: 6349147
    Abstract: A method of finding a Chinese character in an electronic dictionary. The method includes sorting the characters in the dictionary into three groups according to stroke type: horizontal, vertical and slant, identifying which group a character belongs to based on the first writing stroke of the character, locating an original root of the Chinese character from the identified group based on a first three writing strokes of the Chinese character and finding the Chinese character in the dictionary based on the first three writing strokes of the Chinese character that immediately follow the strokes of the located original root.
    Type: Grant
    Filed: January 31, 2000
    Date of Patent: February 19, 2002
    Inventors: Gim Yee Pong, Wai Jean Pong
  • Publication number: 20020012468
    Abstract: This invention provides a camera image recognition apparatus capable of moving a camera to read a wide region of a document at a high precision and easily correcting an erroneously recognized portion. The shift amount of the character string image of a document image to be compared is calculated for each sensed document image from the character string image of a specific document image among a plurality of sensed document images. When the calculated shift amount reaches a predetermined amount, a new character image in the character string image of a document image whose shift amount reaches the predetermined amount is composited to the character string image of the specific document image, thereby generating a document image.
    Type: Application
    Filed: June 28, 2001
    Publication date: January 31, 2002
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yuuichi Togashi, Takayasu Tsuchiuchi
  • Patent number: 6341176
    Abstract: A character recognizing apparatus has a post-processing unit which makes character strings including a plurality of conversion candidates, respectively, made by a character recognizing unit, and a full text searching unit performs a full text search for the character strings in a plurality of documents having been converted into text data, whereby the post-processing unit determines a correct character on the basis of results of the search to correct misrecognition.
    Type: Grant
    Filed: November 13, 1997
    Date of Patent: January 22, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yasuyo Shirasaki, Tomoko Tanabe, Chuichi Kikuchi
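    Sketch (not from the patent): a toy version of the post-processing step, assuming the recognizer returns a list of candidate characters per position and that plain substring counting stands in for the full text search unit.

```python
from itertools import product

def correct_by_search(candidates, documents):
    """candidates: list of candidate characters for each position.
    Returns the candidate string found most often in documents, or None."""
    best, best_hits = None, 0
    for chars in product(*candidates):
        s = "".join(chars)
        # Count occurrences of this candidate string across all documents.
        hits = sum(doc.count(s) for doc in documents)
        if hits > best_hits:
            best, best_hits = s, hits
    return best
```

    The number of candidate strings grows multiplicatively with the number of alternatives per position, so a practical system would limit the window length or the alternatives considered.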
  • Patent number: 6333999
    Abstract: A method, apparatus, article of manufacture, and a memory structure for generating a sequence of character strings is disclosed. This is accomplished by making multiple consecutive variations of a single character-string pattern where each variation differs from the previous one in a systematic way. The pattern is not restricted in form or in content, and the variations can include adding, deleting, and modifying lines and strings in the pattern, and making no change. This is implemented by executing production rules on a template. In one embodiment, the method herein described begins by defining a template comprising a plurality of template character strings, and defining an ordered series of production rules, each with a comparison field, an operator, and a substitution field comprising a list of elements defining an initial value and a current value. The current value of each production rule is set to the initial value, and the production rules are applied to the template one at a time in execution order.
    Type: Grant
    Filed: November 6, 1998
    Date of Patent: December 25, 2001
    Assignee: International Business Machines Corporation
    Inventor: Joseph D. Brownsmith
  • Patent number: 6320983
    Abstract: A character recognition apparatus, in which a type of each of accounts, a type of each of marks, and characteristic data indicating a string of characters recognized when any of the marks is written in each account are previously registered for each group of the types of accounts in correlation to each other in an account-type database; a mark is estimated according to a result of verification between the account-type database and a result of ordinary character recognition according to a character recognizing program; and a type of account having the estimated mark is determined as a selected account name according to a result of recognizing characters in accounts other than the account having the mark.
    Type: Grant
    Filed: November 3, 1998
    Date of Patent: November 20, 2001
    Assignee: Fujitsu Limited
    Inventors: Hideki Matsuno, Shinichi Eguchi, Yoshihiro Nagano, Koichi Chiba, Katsutoshi Kobara
  • Publication number: 20010033694
    Abstract: A method of automatically recognizing text. The text is divided into whole words which are each recognized. Each whole word is characterized according to its silhouette. The silhouette is characterized by features in the silhouette such as upwardly extending “polls” and downwardly extending “holes”. The silhouette may also be characterized by its first syllable blends. Numbers are assigned to each of the different characteristics, and numbers may also be assigned based on analysis of a database of different kinds of cursive words. Recognition may be automatically carried out by a recognizing system which recognizes in this way.
    Type: Application
    Filed: January 19, 2001
    Publication date: October 25, 2001
    Inventors: Rodney M. Goodman, Donald J. Woods, Patricia A. Keaton, Joseph Chen
  • Publication number: 20010031088
    Abstract: In a word string collating method for collating an input word string and address data in an address dictionary when a word string using part of a plurality of words of a word string such as address information is extracted from the result of character recognition for the word string including the plurality of words, words of the input word string and words used as the address data in the address dictionary are set to correspond to each other, distances between the words are derived based on similarities between the words which are set into the correspondence relations, the positional relation of each word of the input word string which is set into the correspondence relation is derived, an evaluated value is derived based on the thus derived positional relation and the distance between the words which are set in the correspondence relation, and a partial word string extracted from the input word string is determined based on the evaluated value.
    Type: Application
    Filed: April 4, 2001
    Publication date: October 18, 2001
    Inventor: Naotake Natori
  • Publication number: 20010031087
    Abstract: A communications system for rendering image based data includes a data interface, a display device, and a data manager. The data interface receives image based data that is used by the display device to display an image. The data manager identifies word blocks defined by the received data. The data manager uses the word blocks to define a first row of the image. In this regard, the data manager determines whether images respectively defined by each of the word blocks would be visible if the word blocks are rendered to the first row of the display screen. In response to a determination that an image associated with one of the word blocks would not be visible if the one word block is rendered to the first row of the display screen, the data manager defines a second row and renders the one word block to the second row.
    Type: Application
    Filed: April 12, 1999
    Publication date: October 18, 2001
    Inventor: FRANK P. CARAU
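    Sketch (not from the application): the row-building rule in the abstract behaves like greedy wrapping of word blocks. Block widths and the screen width are assumed to be in the same units, and visibility is reduced here to a simple width check.

```python
def layout_rows(block_widths, screen_width):
    """Assign word-block indices to rows so each row fits the screen."""
    rows, current, used = [], [], 0
    for i, w in enumerate(block_widths):
        # Start a new row when this block would not be fully visible.
        if current and used + w > screen_width:
            rows.append(current)
            current, used = [], 0
        current.append(i)
        used += w
    if current:
        rows.append(current)
    return rows
```

    A block wider than the screen still gets a row of its own in this sketch; how such a block should be handled is not stated in the abstract.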
  • Publication number: 20010028742
    Abstract: An object of the present invention is to provide a character recognition apparatus for inferring the entire character string solely from a user-input handwritten keyword and displaying the inferred result as a candidate character string.
    Type: Application
    Filed: February 22, 2001
    Publication date: October 11, 2001
    Inventors: Keiko Gunji, Koyo Katsura, Soshiro Kuzunuki, Masaki Miura, Toshimi Yokota
  • Publication number: 20010026640
    Abstract: An image processing device and a computer program product capable of accurately determining a user-desired region even when a region has been only roughly marked by a user, wherein a specific region within an image to be processed is detected; the image to be processed is allocated into a plurality of blocks; text included in the image to be processed is recognized; it is determined, based on a result of text recognition, whether relevance is present or absent between a first block which is partially included in the specific region and a second block which is entirely included in the specific region among the allocated blocks; and it is determined whether or not an image of the first block should be treated as an image belonging to the specific region in accordance with a result of determination as to the relevance.
    Type: Application
    Filed: March 16, 2001
    Publication date: October 4, 2001
    Inventor: Hideyuki Toriyama
  • Patent number: 6298159
    Abstract: There are provided a method and device for forming a character string image in a predetermined image area based on a plurality of character images each occupying an area for one character, each of the character images being formed of an actual character image and blank images arranged on horizontally opposite sides of the actual character image in a manner immediately adjacent thereto. Actual character images are taken out from the character images, respectively. The thus taken-out actual character images are arranged in the predetermined image area according to a desired sequence, to thereby form the character string image. The thus formed character string image is handled as an equivalent to an image of one character.
    Type: Grant
    Filed: November 9, 1998
    Date of Patent: October 2, 2001
    Assignees: Seiko Epson Corporation, King Jim Co., Ltd.
    Inventors: Shinichi Tukagoshii, Kenji Watanabe, Tomoyuki Shimmura
  • Patent number: 6298158
    Abstract: The invention comprises a method and system of recognition and translation, stored on a digital storage device with an operating system and running computer applications, such as a personal computer, which recognizes input by the human computer user and transmits output to the human user, which performs non-optical and optical character recognition of characters displayed on the output device of the digital storage device, which automatically recognizes and translates phrases contiguous to and including the phrase upon which the System is activated and which translates words from one written phrase set to a second written phrase set.
    Type: Grant
    Filed: September 25, 1997
    Date of Patent: October 2, 2001
    Assignee: Babylon, Ltd.
    Inventors: Ofer Egozi, Ovadia Amnon
  • Publication number: 20010019629
    Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.
    Type: Application
    Filed: February 12, 1998
    Publication date: September 6, 2001
    Inventors: LORIS NAVONI, ROBERTO CANEGALLO, MAURO CHINOSI, GIOVANNI GOZZINI, ALAN KRAMER, PIERLUIGI ROLANDI
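    Sketch (not from the application): a sequential software analogue of the minimum-distance lookup. Using `ord` as the character weight, taking absolute differences, and comparing only words of equal length are assumptions; the device described computes the distances in parallel in an associative memory.

```python
def nearest_word(query, dictionary, weight=ord):
    """Return the stored word with the smallest summed weight difference."""
    def distance(a, b):
        # Sum of per-character weight differences (absolute value assumed).
        return sum(abs(weight(x) - weight(y)) for x, y in zip(a, b))
    same_len = [w for w in dictionary if len(w) == len(query)]
    return min(same_len, key=lambda w: distance(query, w), default=None)
```

    With a weight table in which visually similar characters have close weights, a misread character moves the query only a short distance from the intended word.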
  • Publication number: 20010016074
    Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, and a probability at which the characteristics obtained as the result of character recognition are generated is obtained, conditioned on the characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which the characteristics obtained as the result of character recognition are generated, and the division results obtained for the characters of each word contained in the word dictionary are multiplied over all the characters. The recognition results of the above words are obtained based on the multiplication results.
    Type: Application
    Filed: January 26, 2001
    Publication date: August 23, 2001
    Inventor: Tomoyuki Hamamura
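    Illustration: The scoring rule described here, a per-character ratio P(result | character) / P(result) multiplied over all characters of a candidate word, might look like the sketch below. The probability tables, the floor value for unseen pairs, and the function names are hypothetical; the abstract does not say how the probabilities are estimated.

```python
import math


def word_score(word, observations, likelihood, evidence, floor=1e-6):
    """Product over characters of P(observation | character) / P(observation)."""
    return math.prod(
        likelihood.get((obs, ch), floor) / evidence[obs]
        for obs, ch in zip(observations, word)
    )


def recognize(observations, dictionary, likelihood, evidence):
    """Pick the dictionary word with the highest score (same length only)."""
    candidates = [w for w in dictionary if len(w) == len(observations)]
    return max(
        candidates,
        key=lambda w: word_score(w, observations, likelihood, evidence),
    )


# Toy tables, invented for illustration: keys are (observed, true) pairs.
LIKELIHOOD = {
    ("c", "c"): 0.8, ("c", "e"): 0.1,
    ("a", "a"): 0.9,
    ("t", "t"): 0.7, ("t", "r"): 0.2,
}
EVIDENCE = {"c": 0.3, "a": 0.3, "t": 0.2}
```

    With these toy tables, `recognize(["c", "a", "t"], ["cat", "car", "ear"], LIKELIHOOD, EVIDENCE)` prefers "cat" over "car" and "ear".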
  • Patent number: 6269189
    Abstract: Selected character strings are automatically found by performing an automatic search of a text to find character strings that match any of a list of selected strings. The automatic search includes a series of iterations, each with a starting point in the text. Each iteration determines whether its starting point is followed by a character string that matches any of the list of selected strings and that ends at a probable string ending. Each iteration also finds a starting point for the next iteration that is a probable string beginning. The selected strings can be words and multiple word expressions, in which case probable string endings and beginnings are word boundaries. A finite state lexicon, such as a finite state transducer or a finite state automaton, can be used to determine whether character strings match the list of selected strings. A tokenizing automaton can be used to find starting points.
    Type: Grant
    Filed: December 29, 1998
    Date of Patent: July 31, 2001
    Assignee: Xerox Corporation
    Inventor: Jean-Pierre Chanod
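    Illustration: The iteration described in this abstract can be approximated as below. This sketch swaps in a plain Python set and regular-expression word boundaries for the finite state lexicon and tokenizing automaton the abstract mentions; `find_selected` is a hypothetical name, and reporting only the longest match per starting point is an assumption.

```python
import re


def find_selected(text, selected):
    """Find selected strings that start and end on word boundaries.

    Returns (start_offset, matched_string) pairs, keeping the longest
    match at each starting point.
    """
    selected = set(selected)
    max_len = max(len(s) for s in selected)
    # Probable string beginnings: first character of each word.
    starts = [m.start() for m in re.finditer(r"\b\w", text)]
    # Probable string endings: offset just past the last character of a word.
    ends = {m.end() for m in re.finditer(r"\w\b", text)}

    matches = []
    for start in starts:  # each iteration begins at a word beginning
        best = None
        for end in sorted(e for e in ends if start < e <= start + max_len):
            if text[start:end] in selected:
                best = end  # later (longer) matches overwrite shorter ones
        if best is not None:
            matches.append((start, text[start:best]))
    return matches
```

    Because candidates must end on a word boundary, a selected string such as "fin" is not reported inside "finite", while multi-word expressions such as "finite state" are found as single matches.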
  • Patent number: 6269188
    Abstract: The present invention is a computer-implemented method for calculating word accuracy. Word grouping accuracy values (260) are calculated (212) by using the character accuracy values (250) calculated by an OCR program present in a computer system. The present invention preferably uses these character accuracy values (250) to create a word grouping accuracy value (260). Various methods are employed to calculate the word accuracy (260), including binarizing the character accuracy values (250), modified averaging of the character accuracy values (250), and creating fuzzy visual displays of word grouping accuracy values (260). The calculated word grouping accuracy values (260) are then adjusted based upon known OCR strengths and weaknesses, and based upon comparisons to stored word lists and the application of language rules. In a system with multiple character recognition techniques, the system can compare the accuracy values (260) of different versions of the word groupings to find the most accurate version.
    Type: Grant
    Filed: March 12, 1998
    Date of Patent: July 31, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hamadi Jamali
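    Illustration: Two of the approaches named in this abstract, binarizing character accuracy values and averaging them, can be sketched as follows. The threshold, the word-list bonus, and the function names are assumptions; in particular, the abstract does not define its "modified averaging", so a plain mean stands in for it here.

```python
def word_accuracy_binarized(char_accuracies, threshold=0.9):
    """1.0 if every character clears the (assumed) threshold, else 0.0."""
    return 1.0 if all(a >= threshold for a in char_accuracies) else 0.0


def word_accuracy_mean(char_accuracies):
    """Plain average of character accuracies (stand-in for 'modified averaging')."""
    return sum(char_accuracies) / len(char_accuracies)


def adjust_for_word_list(word, accuracy, word_list, bonus=0.1):
    """Raise the value slightly when the word appears in a stored word list."""
    if word in word_list:
        return min(1.0, accuracy + bonus)
    return accuracy
```

    A word whose characters score 0.95, 0.99 and 0.92 binarizes to 1.0 at the default threshold, whereas a single weak character (say 0.5) drops the binarized value to 0.0 while only lowering the mean.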
  • Patent number: 6266445
    Abstract: A sample image (142) is recognized by normalizing (404) the size of a sample image (142) to the size of referent images (146); and determining (406) a set of candidate images (147) from a set of referent images (146), wherein each of the candidate images (147) is within an acceptable distance from a different binarization (145) of the sample image (142). A system (120) for image recognition includes a scanning device (126), a normalization unit (134), a distance calculation unit (136), a classification unit (138), a disambiguation unit (140), and a display device (128).
    Type: Grant
    Filed: March 13, 1998
    Date of Patent: July 24, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Radovan V. Krtolica, Roger D. Melen
  • Patent number: 6252988
    Abstract: An adaptive OCR technique for character classification and recognition without the input and use of ground truth derived from the image itself. A set of so-called stop words are employed for classifying symbols, e.g., characters, from any image. The stop words are identified independent of any particular image and are used for classification purposes across any set of images of the same language, e.g., English. Advantageously, an adaptive OCR method is realized without the requirement of the selection and inputting of ground truth from each individual image to be recognized.
    Type: Grant
    Filed: July 9, 1998
    Date of Patent: June 26, 2001
    Assignee: Lucent Technologies Inc.
    Inventor: Tin Kam Ho
  • Patent number: 6246794
    Abstract: A character reading method has enhanced character segmentation accuracy and character string recognition accuracy for reading correctly hand-written addresses on postal matters. The method extracts provisional character patterns from image information of the address character string (step 206), creates a table 219 of tentative character patterns and implements the character classification for the tentative character patterns (step 207), extracts, specifically for characters of the street number portion of the address character string, periphery information (vertical and horizontal lengths, vertical/horizontal length ratio, pattern spacings, etc.) of tentative character patterns (step 212), and segments the character string into characters accurately based on the information (step 215).
    Type: Grant
    Filed: December 11, 1996
    Date of Patent: June 12, 2001
    Assignee: Hitachi, Ltd.
    Inventors: Tatsuhiko Kagehiro, Masashi Koga, Hiroshi Sako, Hiromichi Fujisawa, Hisao Ogata, Yoshihiro Shima, Shigeru Watanabe, Masato Teramoto
  • Patent number: 6236768
    Abstract: Documents stored in a database are searched for relevance to contextual information, instead of (or in addition to) similar text. Each stored document is indexed in terms of meta-information specifying contextual information about the document. Current contextual information is acquired, either from the user or the current computational or physical environment, and this “meta-information” is used as the basis for identifying stored documents of possible relevance.
    Type: Grant
    Filed: May 1, 1998
    Date of Patent: May 22, 2001
    Assignee: Massachusetts Institute of Technology
    Inventors: Bradley J. Rhodes, Thad E. Starner, Pattie E. Maes, Alex P. Pentland
  • Patent number: 6219449
    Abstract: Characters are input to input means and converted to electronic data. The variation of each input character is predicted from the position of the character in a word when the character is hand-written. Based on information, stored in storing means, that is necessary for determining the priority of recognition results, priority processing means determines the priority of the recognition results according to the expected character variation. The character is recognized by recognizing means based on the priority, and the result is output to output means.
    Type: Grant
    Filed: June 21, 1996
    Date of Patent: April 17, 2001
    Assignees: ATR Auditory, Visual Perception Research Laboratories
    Inventor: Michihiro Nagaishi
  • Patent number: 6219453
    Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (“OCR”) technique. Each recognized word is generated by first producing, for each character position of the corresponding word in the original document, the N-best characters for occupying that character position. If an incorrect word is found in the electronic document, the present invention generates a plurality of reference words from which one is selected for replacing the incorrect word. This selected reference word is determined by the present invention to be the reference word that is the most likely correct replacement for the incorrect recognized word. This selection is accomplished by computing for each reference word a replacement word value. The reference word that is selected to replace the incorrect recognized word corresponds to the highest replacement word value.
    Type: Grant
    Filed: August 11, 1997
    Date of Patent: April 17, 2001
    Assignee: AT&T Corp.
    Inventor: Randy G. Goldberg
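    Illustration: Selecting a replacement from per-position N-best characters might be sketched as below. The abstract does not say how the replacement word value is computed; multiplying the confidences of the matching N-best characters is an assumption, as are the confidence figures and function names.

```python
def replacement_value(reference, nbest):
    """Score a reference word against per-position N-best character lists.

    `nbest` is a list with one dict per character position, mapping each
    candidate character to an assumed confidence in [0, 1].
    """
    if len(reference) != len(nbest):
        return 0.0
    value = 1.0
    for ch, candidates in zip(reference, nbest):
        # A character absent from the N-best list contributes zero.
        value *= candidates.get(ch, 0.0)
    return value


def best_replacement(nbest, reference_words):
    """Return the reference word with the highest replacement word value."""
    return max(reference_words, key=lambda w: replacement_value(w, nbest))
```

    If the recognizer's top choices spell the non-word "c1ay" but "l" is among the N-best characters for the second position, `best_replacement` can recover "clay" from a reference list.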
  • Patent number: 6212497
    Abstract: The word processor of the present invention comprises: a voice inputting device for inputting a spoken word and converting the spoken word into voice data; a voice storage device for storing the voice data; a speech recognition device for recognizing a word in the voice data output from the voice inputting device or the voice data stored by the voice storage device; a display for displaying a result obtained by the speech recognition device; an instruction inputting device for inputting an instruction to select a portion in the result; and a correction device for correcting the portion in the result according to the instruction from the instruction inputting device.
    Type: Grant
    Filed: November 24, 1998
    Date of Patent: April 3, 2001
    Assignee: NEC Corporation
    Inventors: Nobumasa Araki, Jun Noguchi, Mitsuru Nishiura
  • Patent number: 6212299
    Abstract: A document with a plurality of characters is read, a binary document image is produced, and a character rectangle circumscribed about a mass of black pixels connected with each other (called a black-pixel mass) is produced for each black-pixel mass. The character rectangles are classified into a plurality of groups on condition that one or more character rectangles in one group are circumscribed about one or more black-pixel masses having the same character pattern. The character rectangles in each group are circumscribed about images of the same character. Thereafter, a figure feature of a representative character image in each classified group of character rectangles is compared with each of referential character patterns. Therefore, the character images for the character rectangles circumscribing one of non-separating characters are recognized as one non-separating character.
    Type: Grant
    Filed: March 12, 1997
    Date of Patent: April 3, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Ryoichi Yuge
  • Patent number: 6198840
    Abstract: A recording apparatus includes a color table for storing color information corresponding 1:1 to color information codes. The apparatus executes color recording by providing color information corresponding to the color information code, to the given character data or image data. The apparatus features a function of expanding the color information, and a function of confirming the color information.
    Type: Grant
    Filed: November 27, 1996
    Date of Patent: March 6, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Shunya Mitsuhashi, Shuichi Kumada
  • Patent number: 6185338
    Abstract: A character recognition method for recognizing characters on an article having multiple character-bearing areas, such as a license plate, first involves obtaining image data from an image of the article. The method then assigns at least one parameter to a selected character-bearing area on the article. The method then attempts to obtain a correct frame which expresses the correct positional relationship between the selected character-bearing area on the article with other character-bearing areas of the article, and then uses that correct frame to perform character recognition with respect to each of the character-bearing areas of the article. To obtain the correct frame, the invention compares the image data of the article with plural candidate frames. The plural candidate frames are calculated using the predetermined positional correlation between (1) the selected character-bearing area [as represented by the at least one parameter] and (2) other character-bearing areas of the article.
    Type: Grant
    Filed: March 21, 1997
    Date of Patent: February 6, 2001
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Mitsuaki Nakamura
  • Patent number: 6167367
    Abstract: A method and device for automatic error detection and correction for computerized text files uses a two-step segmentation method. A sentence of the computerized text file is first segmented at the first segmentation step into an original format and then converted into a correct sentence in the second segmentation step. In the first segmentation step the original sentence is segmented into a series of characters and the characters are analyzed so that the original phonetic or pictographic codes of the characters are revealed. The sentence in the original format is then converted into a series of phonetic representative codes and/or pictographic representative codes. Words constituting the sentence are then selected from a lexicon to reconstruct the sentence. The reconstructed sentence is then segmented again so that the errors in the original sentence are detected and corrections thereof are suggested.
    Type: Grant
    Filed: August 9, 1997
    Date of Patent: December 26, 2000
    Assignees: National Tsing Hua University, Galaxy Software Services Ltd.
    Inventors: Jyun-Sheng Chang, Tsuey-Fen Lin
  • Patent number: 6148105
    Abstract: A study system of a voice recognizing and translating system is provided with a sound data base for storing data from which noise is removed; a sound analysis unit for extracting the features of the voice corresponding to the voice data stored in the sound data base; and a model learning unit for creating an acoustic model on the basis of the analysis result of the sound analysis unit. A recognition system of the voice recognizing and translating system is provided with: an acoustic model storing unit for storing acoustic models; a second sound analysis unit for extracting the feature of the voice corresponding to the data concerned on the basis of the data obtained by removing the data representing noise from the voice data of a newly input voice, and a voice collating unit for collating the voice data obtained by the second sound analysis unit with the data of the acoustic models so as to recognize the voice.
    Type: Grant
    Filed: April 22, 1999
    Date of Patent: November 14, 2000
    Assignee: Hitachi, Ltd.
    Inventors: Shinji Wakisaka, Hiroko Sato
  • Patent number: 6137908
    Abstract: The speed and accuracy of a computer implemented handwriting recognition system are enhanced by several innovations, including integrated segmentation and context processing. The recognition processing occurs while the user is providing ink data. The system quickly reaches the recognition result once all of the input is received. More than one result may be returned by the system.
    Type: Grant
    Filed: June 29, 1994
    Date of Patent: October 24, 2000
    Assignee: Microsoft Corporation
    Inventor: Sung Sik Rhee
  • Patent number: 6137911
    Abstract: Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
    Type: Grant
    Filed: June 16, 1997
    Date of Patent: October 24, 2000
    Assignee: The Dialog Corporation PLC
    Inventor: Maxim Zhilyaev
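    Illustration: The significance test described here, a term counts as significant to a cluster when its frequency is unlikely to be due to chance under a standard normal distribution, can be sketched with a z-score. The normal approximation to a binomial count, the one-sided test, the 0.01 cutoff, and the function names are assumptions; the abstract gives no formula.

```python
from math import sqrt
from statistics import NormalDist


def term_z_score(count_in_cluster, cluster_size, corpus_rate):
    """Standardized excess of a term's count in a cluster.

    `cluster_size` is the number of term occurrences in the cluster and
    `corpus_rate` is the term's relative frequency in the whole corpus.
    """
    expected = cluster_size * corpus_rate
    std_dev = sqrt(cluster_size * corpus_rate * (1.0 - corpus_rate))
    return (count_in_cluster - expected) / std_dev


def is_significant(count_in_cluster, cluster_size, corpus_rate, alpha=0.01):
    """True if seeing a count this high by chance has probability below alpha."""
    z = term_z_score(count_in_cluster, cluster_size, corpus_rate)
    p_chance = 1.0 - NormalDist().cdf(z)  # one-sided tail probability
    return p_chance < alpha
```

    For a term with a corpus rate of 1%, 30 occurrences among 1,000 cluster terms lie more than six standard deviations above the expected 10 and are flagged as significant, while exactly 10 occurrences are not.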