Context Analysis Or Word Recognition (e.g., Character String) Patents (Class 382/229)

Trigrams or digrams (Class 382/230)

Checking spelling for recognition (Class 382/231)

WORD SEGMENTATION IN CHINESE TEXT

Publication number: 20020102025

Abstract: The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character occurring in the first position of the combination. If so, the facility determines whether every character of the combination is indicated to occur in words in a position in which it occurs in the combination. If so, the facility determines that the combination of characters may be a word.
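
The two-stage check described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the facility claimed in the application: the function names, the tiny lexicon, the `max_len` limit, and the choice to encode a character's "position" as its zero-based offset from the start of the word are all hypothetical.

```python
from collections import defaultdict


def build_tables(lexicon):
    """Build the two per-character indications from a word list (assumed input)."""
    second_chars = defaultdict(set)  # first char -> chars seen in second position
    positions = defaultdict(set)     # char -> offsets at which it occurs in words
    for word in lexicon:
        if len(word) >= 2:
            second_chars[word[0]].add(word[1])
        for offset, ch in enumerate(word):
            positions[ch].add(offset)
    return second_chars, positions


def may_be_word(combo, second_chars, positions):
    """Apply the second-character test, then the per-character position test."""
    if len(combo) < 2:
        return False
    if combo[1] not in second_chars.get(combo[0], ()):
        return False
    return all(i in positions.get(ch, ()) for i, ch in enumerate(combo))


def candidate_words(text, second_chars, positions, max_len=4):
    """Return every contiguous combination that passes both tests."""
    found = []
    for start in range(len(text)):
        for end in range(start + 2, min(len(text), start + max_len) + 1):
            combo = text[start:end]
            if may_be_word(combo, second_chars, positions):
                found.append(combo)
    return found


lexicon = ["北京", "大学", "北京大学", "学生"]
second_chars, positions = build_tables(lexicon)
candidates = candidate_words("北京大学生", second_chars, positions)
```

The filter is deliberately permissive: it only says a combination may be a word, so a string that is not in the lexicon can still pass and would need a later dictionary lookup.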

Type: Application

Filed: May 29, 1998

Publication date: August 1, 2002

Inventors: ANDI WU, STEPHEN D. RICHARDSON, ZIXIN JIANG

Word recognition device, word recognition method, and storage medium

Publication number: 20020097915

Abstract: The capacity of a character feature dictionary is reduced, and stored as a feature dictionary. The capacity is reduced by clustering feature vectors in units of columns or rows for character features, by making m column vectors represent the column or row features, and by assigning 1 to m identification numbers. The capacity of the dictionary can be further reduced by representing a column or row feature with an addition sum of other column or row features, or differential features after clustering is performed, or by performing dimension compression for character features. Word recognition is performed by synthesizing a word feature for a comparison based on a word list to be recognized, and by making a comparison between a feature extracted from an input word and the synthesized feature. Or, a comparison between input word and input word features whose numbers of dimensions are different may be made with nonlinear elastic matching.

Type: Application

Filed: September 12, 2001

Publication date: July 25, 2002

Applicant: Fujitsu Limited

Inventor: Yoshinobu Hotta

Method and device for recording of information

Publication number: 20020094133

Abstract: A device and a method for recording text by imaging the text on a light-sensitive sensor (8). The device converts each of the images (14-17, 47-49) into a set of characters (50, 51) using character recognition, and then assembles the sets of characters (50, 51) with the aid of the characters.

Type: Application

Filed: November 13, 2001

Publication date: July 18, 2002

Inventors: Markus Andreasson, Per Astrand

Method and mechanism for providing partial results in full context handwriting recognition

Patent number: 6418239

Abstract: A method and mechanism for displaying partial results of full context handwriting recognition. As handwritten characters are entered into a system, a shape matcher associates the character with a plurality of alternate code points, with each alternate code point having probability information associated therewith. The alternate code points are placed at the end of a queue, and a cost is determined from each alternate code point to any immediately preceding alternate in the queue. The cost is based on the probability information of the alternates and a transition cost therebetween. Then, the lowest cost path back from each of the alternates at the end of the queue to an alternate at the beginning of the queue is determined. If each lowest cost path back converges to a common alternate in the queue, the common alternate and any previous alternates on the path back are recognized as the code points for each of the handwritten characters associated therewith.
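
The convergence test in this abstract can be illustrated with a small dynamic-programming sketch. It is an assumption-laden reading rather than the patented mechanism: the queue is modeled as a list of columns of `(code_point, cost)` pairs, `transition_cost` is a hypothetical caller-supplied function, and lower cost stands in for higher probability.

```python
def commit_prefix(columns, transition_cost):
    """Return the code points that every lowest-cost path back agrees on.

    columns: one list per handwritten character, each holding
             (code_point, cost) alternates (assumed representation).
    """
    if not columns:
        return []
    # best[i][j]: lowest total cost of any path ending at alternate j of column i
    best = [[cost for _, cost in columns[0]]]
    back = [[None] * len(columns[0])]
    for i in range(1, len(columns)):
        row_cost, row_back = [], []
        for code_point, cost in columns[i]:
            options = [
                best[i - 1][k] + transition_cost(columns[i - 1][k][0], code_point)
                for k in range(len(columns[i - 1]))
            ]
            k_best = min(range(len(options)), key=options.__getitem__)
            row_cost.append(options[k_best] + cost)
            row_back.append(k_best)
        best.append(row_cost)
        back.append(row_back)
    # Trace the lowest-cost path back from every alternate at the end of the queue.
    paths = []
    for j in range(len(columns[-1])):
        path = [j]
        for i in range(len(columns) - 1, 0, -1):
            path.append(back[i][path[-1]])
        paths.append(path[::-1])
    # Find the latest column at which all paths pass through the same alternate.
    committed = -1
    for i in range(len(columns)):
        if len({path[i] for path in paths}) == 1:
            committed = i
    return [columns[i][paths[0][i]][0] for i in range(committed + 1)]


likely_pairs = {"ca", "at", "al"}
queue = [
    [("c", 1.0), ("e", 2.0)],
    [("a", 1.0), ("o", 1.5)],
    [("t", 1.0), ("l", 1.2)],
]
partial = commit_prefix(queue, lambda a, b: 0.0 if a + b in likely_pairs else 2.0)
```

Here both paths back from the last column run through "a" and then "c", so those two code points can be shown as partial results while the final character stays undecided.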

Type: Grant

Filed: May 16, 2000

Date of Patent: July 9, 2002

Assignee: Microsoft Corporation

Inventors: Gregory N. Hullender, Patrick M. Haluptzok

Apparatus and method of program classification based on syntax of transcript information

Publication number: 20020076112

Abstract: A method of program classification based on syntax of transcript information includes receiving transcript information associated with the program wherein the transcript information has a plurality of sentences, determining characteristics of at least one of the plurality of sentences of the transcript information to identify at least the type and subject of the sentence, comparing the characteristics of the at least one of the plurality of sentences with a list of sentence characteristics having associated therewith a plurality of program types, and based on the comparing step, selecting a classification of program which is most closely associated with the characteristics of the at least one of the plurality of sentences.

Type: Application

Filed: December 18, 2000

Publication date: June 20, 2002

Applicant: PHILIPS ELECTRONICS NORTH AMERICA CORPORATION

Inventor: Kavitha Vallari Devara

Method and apparatus for formatting OCR text

Publication number: 20020076111

Abstract: Following scanning of a document image, and optical character recognition (OCR) processing, the outputted OCR text is processed to determine a text format (typeface and font size) to match the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated to match a typeface rendering of the word to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify close clusters of scaling factors for a typeface, indicative of a good typeface fit at a constant scaling factor (font size).

Type: Application

Filed: December 18, 2000

Publication date: June 20, 2002

Applicant: Xerox Corporation

Inventors: Christopher R. Dance, Mauritius Seeger

Technique to identify interesting print articles for later retrieval and use of the electronic version of the articles

Publication number: 20020076110

Abstract: The present invention provides a method and system for efficient storage and retrieval of information. The method includes the steps of: scanning/selecting/capturing a selected portion of text of the information wherein the selected portion of text scanned is typically a close-to-unique identifier of the text from which the portion was excerpted and serves as a key when the information is accessed electronically; and placing the key in an electronically available index/directory to facilitate retrieval of the information. The method may further include retrieving and storing the information associated with the key and using it to index, organize, and make available for search and retrieval the full information originally viewed by the user.

Type: Application

Filed: December 15, 2000

Publication date: June 20, 2002

Inventor: Pieter J. van Zee

METHOD AND APPARATUS FOR CONTEXT SENSITIVE TEXT RECOGNITION

Publication number: 20020076109

Abstract: A method and apparatus for providing an interpreter that is operable to recognize whether text displayed on a display device belongs to a pre-defined type of text, and to present the user with an option for performing a context sensitive operation when an interpreter recognizes the input text as belonging to the pre-defined type of text. The input text is generated based on a user selecting text, a portion of text, an object, or a portion of an object displayed on a display device that is part of a computer system. Alternatively, an interpreter automatically recognizes when displayed text belongs to a pre-defined type of text. A wide variety of interpreters may be included, wherein each interpreter has at least one corresponding pre-defined type of text and at least one corresponding context sensitive operation.

Type: Application

Filed: January 25, 1999

Publication date: June 20, 2002

Inventors: ANDY HERTZFELD, DARIN BENJAMIN ADLER

System and method of authorizing an electronic commerce transaction

Publication number: 20020073029

Abstract: A system and method of authorizing an electronic commerce transaction between a purchaser using a credit card, an on-line merchant, and a credit card company. A server associated with the merchant receives a purchase request from the purchaser that includes a purchase amount and the purchaser's credit card information. A SIP multi-party conference is established, and information is shared among the three parties through a multicast procedure. A Web camera may take an image of the purchaser when the purchaser sends the purchase request to the merchant. A whiteboard application is used to capture an image of the purchaser's signature. The credit card company verifies the credit card information, authorizes the purchase amount, and validates the purchaser's image and signature utilizing an image recognition program and a database of valid cardholder images and signatures.

Type: Application

Filed: December 12, 2000

Publication date: June 13, 2002

Applicant: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Daniela Cheaib, Roch Glitho

Method and apparatus for producing a hybrid data structure for displaying a raster image

Publication number: 20020067859

Abstract: A system for producing a raster image derived from coded and non-coded portions of a hybrid data structure from an input bitmap including (1) a data processing apparatus, (2) a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, (3) a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and non-coded data derived from portions of the input bitmap which do not correspond to the identifiable objects, and (4) an output device capable of developing a visually perceptible raster image derived from the hybrid data structure. The raster image includes raster images of the identifiable objects and raster images derived from portions of the input bitmap that do not correspond to the identifiable objects.

Type: Application

Filed: January 25, 2002

Publication date: June 6, 2002

Applicant: Adobe Systems, Inc., a California corporation

Inventors: Dennis G. Nicholson, James C. King

Document-based query data for information retrieval

Patent number: 6396951

Abstract: To obtain a query for use in information retrieval, a document is scanned. The resulting text image data define an image of a segment of text in a first language. Automatic recognition is then performed on at least part of the text image data to obtain text code data including a series of element codes. Each element code indicates an element that occurs in the first language, and the series of element codes defines a set of expressions that also occur in the first language. Automatic translation is then performed on a version of the text code data to obtain translation data indicating a set of counterpart expressions in a second language. The counterpart expressions are used to automatically obtain query data defining the query. The query can then be provided to an information retrieval engine.

Type: Grant

Filed: December 23, 1998

Date of Patent: May 28, 2002

Assignee: Xerox Corporation

Inventor: Gregory Grefenstette

Smart handwriting recognition apparatus and methods

Publication number: 20020057842

Abstract: A method of handwriting recognition encourages the entry of an entire word, and presents the “most likely” word or words. A “look-ahead” mode of operation is implemented, wherein the most probable word or words corresponding to the entered letters are identified in a dictionary and presented to the user in such a way that the user may discontinue the entry of further letters if one of the words identified in the dictionary matches the desired word. The determination of the most likely word or words may be based on a combination of one or more criteria, including the characters themselves, the length of the word, the relative placement of the recognized characters within the word, and so forth. The result may also be presented in various ways singly or in combination according to the invention. In addition to a presentation of the highest probable word, the ‘n’ highest probable words may be presented.
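
A minimal sketch of the look-ahead mode, assuming the dictionary carries a usage frequency for each word and that "most probable" means "most frequent among words starting with the letters entered so far". Both assumptions, and the name `look_ahead`, are illustrative; the abstract also lists other criteria (word length, character placement) that this sketch ignores.

```python
def look_ahead(entered, dictionary, n=3):
    """Return up to n dictionary words that extend the entered letters.

    dictionary: mapping of word -> frequency (assumed ranking signal).
    """
    matches = [(word, freq) for word, freq in dictionary.items()
               if word.startswith(entered)]
    # Highest frequency first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:n]]


dictionary = {"the": 100, "then": 40, "there": 60, "thermal": 5, "cat": 30}
suggestions = look_ahead("the", dictionary, n=3)
```

If the desired word appears among the suggestions, the user can stop writing and pick it.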

Type: Application

Filed: June 1, 2001

Publication date: May 16, 2002

Inventor: Henry C. Yuen

Orthogonal technology for multi-line character recognition

Publication number: 20020054693

Abstract: The present invention encompasses a self-orthogonal character recognition engine for executing an iterative method employing a database of predetermined character strings. The method receives a digital representation of a character string. It then generates a proposed result string by applying to the captured digital image a predetermined recognition routine including one or more recognition subroutines. Each recognition subroutine employs an initial parameter setting. Next, if the proposed result string does not match any of the predetermined character strings in the database, the initial parameter setting of a recognition subroutine is changed to a next setting. The recognition process is then repeated using the next parameter setting to generate and test a next result string. The process can be repeated iteratively until a result string is verified or the process times out.
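
The iterate-until-verified loop in this abstract can be sketched as below. The recognizer here is a toy stand-in invented for the example (it treats the "image" as a list of `(character, darkness)` pairs and a single threshold as the parameter setting); running out of settings plays the role of the time-out.

```python
def recognize_with_retries(image, recognizer, settings, known_strings):
    """Try each parameter setting in turn until the result is in the database.

    Returns (result_string, setting), or (None, None) if no setting verifies.
    """
    for setting in settings:
        result = recognizer(image, setting)
        if result in known_strings:
            return result, setting
    return None, None


def toy_recognizer(image, threshold):
    """Hypothetical recognizer: faint characters are read as '?'."""
    return "".join(ch if darkness >= threshold else "?"
                   for ch, darkness in image)


image = [("A", 0.9), ("B", 0.5), ("7", 0.7)]
known_strings = {"AB7", "XY1"}
result, used = recognize_with_retries(
    image, toy_recognizer, [0.8, 0.6, 0.4], known_strings)
```

With thresholds 0.8 and 0.6 the toy recognizer yields "A??" and "A?7", neither of which is in the database, so the loop moves on to 0.4 and verifies "AB7".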

Type: Application

Filed: July 30, 2001

Publication date: May 9, 2002

Inventor: Brian J. Elmenhurst

Method and apparatus for producing a hybrid data structure for displaying a raster image

Patent number: 6385350

Abstract: A system for producing a raster image derived from a data structure including a data processing apparatus, a recognizer which performs recognition on an input bitmap to the data processing apparatus to detect identifiable objects within the input bitmap, a mechanism for producing a hybrid data structure including coded data corresponding to the identifiable objects and to non-identifiable objects and the input bitmap, and an output device capable of developing a visually perceptible raster image derived from the input bitmap in the hybrid data structure. The raster image is derived from the input bitmap and thus includes no misrecognition errors.

Type: Grant

Filed: January 6, 1998

Date of Patent: May 7, 2002

Assignee: Adobe Systems Incorporated

Inventors: Dennis G. Nicholson, James C. King, David M. Emmett

E-mail signature block analysis

Patent number: 6373985

Abstract: A technique for analyzing loosely constrained text blocks, such as e-mail signature blocks, by performing a two-dimensional geometrical analysis and a one-dimensional language analysis in order to classify sub-blocks of the loosely constrained text block into particular functional classes. The present technique may also be utilized to identify a personal name from a user name in a loosely constrained text block, such as an e-mail signature block.

Type: Grant

Filed: August 12, 1998

Date of Patent: April 16, 2002

Assignee: Lucent Technologies, Inc.

Inventors: Jianying Hu, Richard W. Sproat, Hao Chen

Process and equipment for recognition of a pattern on an item presented

Patent number: 6373982

Abstract: Apparatus and method for improving recognition of patterns such as alphanumeric characters. A known recognition system is expanded to further include a complementary recognition system which is linked with the primary recognition system. An image that cannot be positively recognized by the primary recognition system is passed on to the complementary recognition system, and any characters not positively recognized by the complementary recognition system are in turn passed on to a correction system. At the correction system, an operator classifies unrecognized characters which are then used to teach the complementary recognition system. Thus, the classified data of the correction system provide the training data for a continuous training process which is coupled with the correction system by a pattern adaptation system.

Type: Grant

Filed: May 7, 1999

Date of Patent: April 16, 2002

Assignee: International Business Machines Corporation

Inventors: Udo Maier, Werner Ruppert

Document search and retrieval apparatus, recording medium and program

Publication number: 20020041713

Abstract: A search apparatus searches for a keyword from a character recognition result using an index table, the character recognition result being obtained as a result of character recognition of characters in an original document. The index table includes an index character string; a position of a portion, in the character recognition result, which matches the index character string; and a credibility which is defined for each character included in the index character string and indicates a probability of the character existing in a portion, in the original document, which corresponds to a portion, in the character recognition result, which matches the character.

Type: Application

Filed: June 6, 2001

Publication date: April 11, 2002

Inventors: Taro Imagawa, Kenji Kondo, Yoshihiko Matsukawa, Tsuyoshi Mekata

Portable terminal device for transmitting image data via network and image processing device for performing an image processing based on recognition result of received image data

Patent number: 6366698

Abstract: A user of a portable terminal writes, on paper as a memo, sentences to be transmitted as an e-mail, a mail address, and information indicating that the service a host device is requested to provide is “mail transmission”. By using an image input unit installed in the portable terminal, the paper or the like is imaged. The portable terminal transmits the image data thus taken in to the host device. The host device analyzes received image data by using an image recognition unit. Upon recognizing from the characters that the service requested by the portable terminal is “mail transmission”, the host device starts a mail transmitting/receiving unit. The mail transmitting/receiving unit transmits the content of the written memo included in the image data to a terminal specified by the mail address included in the image data.

Type: Grant

Filed: March 6, 1998

Date of Patent: April 2, 2002

Assignee: Casio Computer Co., Ltd.

Inventor: Tooru Yamakita

System, method and program for discriminating named entity

Publication number: 20020031269

Abstract: A named entity discriminating system capable of discriminating named entities such as location names, personal names, and organization names in text with a high degree of accuracy is provided. A reading means reads text from a hypertext database. A single text analyzing means analyzes each text read by the reading means and detects candidates for the named entity in the text. A complex text analyzing means estimates the likelihood of the candidate named entity detected by the single text analyzing means by an analysis with reference to referring link text or linked text of the text in which the candidate named entity appears.

Type: Application

Filed: September 7, 2001

Publication date: March 14, 2002

Applicant: NEC CORPORATION

Inventor: Toshikazu Fukushima

Image processing apparatus, image processing method, and computer readable storage medium

Publication number: 20020031270

Abstract: An image processing apparatus for changing a layout of a character string and/or a drawing contained in image data is disclosed. The apparatus includes a first detection means, a second detection means, a change means, a recognition means, and a replacing means. The first detection means detects a directive word, which is a character string that indicates a drawing position. The second detection means detects a drawing whose position is indicated by the directive word. The change means changes a layout of the character string and/or the drawing position. The recognition means recognizes positional relation between the directive word and the drawing after a layout change. The replacing means replaces the directive word based on the positional relation.

Type: Application

Filed: August 30, 2001

Publication date: March 14, 2002

Inventor: Tsutomu Yamazaki

Chinese electronic dictionary

Patent number: 6349147

Abstract: A method of finding a Chinese character in an electronic dictionary. The method includes sorting the characters in the dictionary into three groups according to stroke type: horizontal, vertical and slant, identifying which group a character belongs to based on the first writing stroke of the character, locating an original root of the Chinese character from the identified group based on a first three writing strokes of the Chinese character and finding the Chinese character in the dictionary based on the first three writing strokes of the Chinese character that immediately follow the strokes of the located original root.

Type: Grant

Filed: January 31, 2000

Date of Patent: February 19, 2002

Inventors: Gim Yee Pong, Wai Jean Pong

Document recognition apparatus and method

Publication number: 20020012468

Abstract: This invention provides a camera image recognition apparatus capable of moving a camera to read a wide region of a document at a high precision and easily correcting an erroneously recognized portion. The shift amount of the character string image of a document image to be compared is calculated for each sensed document image from the character string image of a specific document image among a plurality of sensed document images. When the calculated shift amount reaches a predetermined amount, a new character image in the character string image of a document image whose shift amount reaches the predetermined amount is composited to the character string image of the specific document image, thereby generating a document image.

Type: Application

Filed: June 28, 2001

Publication date: January 31, 2002

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Yuuichi Togashi, Takayasu Tsuchiuchi

Method and apparatus for character recognition

Patent number: 6341176

Abstract: A character recognizing apparatus has a post-processing unit which makes character strings including a plurality of conversion candidates, respectively, made by a character recognizing unit, and a full text searching unit performs a full text search for the character strings in a plurality of documents having been converted into text data, whereby the post-processing unit determines a correct character on the basis of results of the search to correct misrecognition.

Type: Grant

Filed: November 13, 1997

Date of Patent: January 22, 2002

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Yasuyo Shirasaki, Tomoko Tanabe, Chuichi Kikuchi

Systematic enumerating of strings using patterns and rules

Patent number: 6333999

Abstract: A method, apparatus, article of manufacture, and a memory structure for generating a sequence of character strings is disclosed. This is accomplished by making multiple consecutive variations of a single character-string pattern where each variation differs from the previous one in a systematic way. The pattern is not restricted in form or in content, and the variations can include adding, deleting, and modifying lines and strings in the pattern, and making no change. This is implemented by executing production rules on a template. In one embodiment, the method herein described begins by defining a template comprising a plurality of template character strings, and defining an ordered series of production rules, each with a comparison field, an operator, and a substitution field comprising a list of elements defining an initial value and a current value. The current value of each production rule is set to the initial value, and the production rules are applied to the template one at a time in execution order.

Type: Grant

Filed: November 6, 1998

Date of Patent: December 25, 2001

Assignee: International Business Machines Corporation

Inventor: Joseph D. Brownsmith

Method and apparatus for character recognition, and computer-readable recording medium with a program making a computer execute the method recorded therein

Patent number: 6320983

Abstract: A character recognition apparatus, in which a type of each of accounts, a type of each of marks, and characteristic data indicating a string of characters recognized when any of the marks is written in each account are previously registered for each group of the types of accounts in correlation to each other in an account-type database; a mark is estimated according to a result of verification between the account-type database and a result of ordinary character recognition according to a character recognizing program; and a type of account having the estimated mark is determined as a selected account name according to a result of recognizing characters in accounts other than the account having the mark.

Type: Grant

Filed: November 3, 1998

Date of Patent: November 20, 2001

Assignee: Fujitsu Limited

Inventors: Hideki Matsuno, Shinichi Eguchi, Yoshihiro Nagano, Koichi Chiba, Katsutoshi Kobara

Handwriting recognition by word separation into sillouette bar codes and other feature extraction

Publication number: 20010033694

Abstract: A method of automatically recognizing text. The text is divided into whole words, each of which is recognized. Each whole word is characterized according to its silhouette. The silhouette is characterized by features in the silhouette such as upwardly extending “polls” and downwardly extending “holes”. The silhouette may also be characterized by its first syllable blends. Numbers are assigned to each of the different characteristics, and numbers may also be assigned based on analysis of a database of different kinds of cursive words. Recognition may be carried out automatically by a recognizing system which recognizes in this way.

Type: Application

Filed: January 19, 2001

Publication date: October 25, 2001

Inventors: Rodney M. Goodman, Donald J. Woods, Patricia A. Keaton, Joseph Chen

Word string collating apparatus, word string collating method and address recognition apparatus

Publication number: 20010031088

Abstract: In a word string collating method for collating an input word string and address data in an address dictionary when a word string using part of a plurality of words of a word string such as address information is extracted from the result of character recognition for the word string including the plurality of words, words of the input word string and words used as the address data in the address dictionary are set to correspond to each other, distances between the words are derived based on similarities between the words which are set into the correspondence relations, the positional relation of each word of the input word string which is set into the correspondence relation is derived, an evaluated value is derived based on the thus derived positional relation and the distance between the words which are set in the correspondence relation, and a partial word string extracted from the input word string is determined based on the evaluated value.

Type: Application

Filed: April 4, 2001

Publication date: October 18, 2001

Inventor: Naotake Natori

SYSTEM AND METHOD FOR RENDERING IMAGE BASED DATA

Publication number: 20010031087

Abstract: A communications system for rendering image based data includes a data interface, a display device, and a data manager. The data interface receives image based data that is used by the display device to display an image. The data manager identifies word blocks defined by the received data. The data manager uses the word blocks to define a first row of the image. In this regard, the data manager determines whether images respectively defined by each of the word blocks would be visible if the word blocks are rendered to the first row of the display screen. In response to a determination that an image associated with one of the word blocks would not be visible if the one word block is rendered to the first row of the display screen, the data manager defines a second row and renders the one word block to the second row.

Type: Application

Filed: April 12, 1999

Publication date: October 18, 2001

Inventor: FRANK P. CARAU

Apparatus for recognizing input character strings by inference

Publication number: 20010028742

Abstract: An object of the present invention is to provide a character recognition apparatus for inferring the entire character string solely from a user-input handwritten keyword and displaying the inferred result as a candidate character string.

Type: Application

Filed: February 22, 2001

Publication date: October 11, 2001

Inventors: Keiko Gunji, Koyo Katsura, Soshiro Kuzunuki, Masaki Miura, Toshimi Yokota

Image processing device and image processing program

Publication number: 20010026640

Abstract: An image processing device and a computer program product capable of accurately determining a user-desired region even when a region has been only roughly marked by a user, wherein a specific region within an image to be processed is detected; the image to be processed is allocated into a plurality of blocks; text included in the image to be processed is recognized; the presence or absence of relevance between a first block which is partially included in the specific region and a second block which is entirely included in the specific region, among the allocated blocks, is determined based on a result of the text recognition; and it is determined whether or not an image of the first block should be treated as an image belonging to the specific region in accordance with a result of determination as to the relevance.

Type: Application

Filed: March 16, 2001

Publication date: October 4, 2001

Inventor: Hideyuki Toriyama

Method and device for forming/processing character string image

Patent number: 6298159

Abstract: There are provided a method and device for forming a character string image in a predetermined image area based on a plurality of character images each occupying an area for one character, each of the character images being formed of an actual character image and blank images arranged on horizontally opposite sides of the actual character image in a manner immediately adjacent thereto. Actual character images are taken out from the character images, respectively. The thus taken-out actual character images are arranged in the predetermined image area according to a desired sequence, to thereby form the character string image. The thus formed character string image is handled as an equivalent to an image of one character.

Type: Grant

Filed: November 9, 1998

Date of Patent: October 2, 2001

Assignees: Seiko Epson Corporation, King Jim Co., Ltd.

Inventors: Shinichi Tukagoshii, Kenji Watanabe, Tomoyuki Shimmura

Recognition and translation system and method

Patent number: 6298158

Abstract: The invention comprises a method and system of recognition and translation, stored on a digital storage device with an operating system and running computer applications, such as a personal computer, which recognizes input by the human computer user and transmits output to the human user, which performs non-optical and optical character recognition of characters displayed on the output device of the digital storage device, which automatically recognizes and translates phrases contiguous to and including the phrase upon which the System is activated and which translates words from one written phrase set to a second written phrase set.

Type: Grant

Filed: September 25, 1997

Date of Patent: October 2, 2001

Assignee: Babylon, Ltd.

Inventors: Ofer Egozi, Ovadia Amnon
WORD RECOGNITION DEVICE AND METHOD

Publication number: 20010019629

Abstract: A word recognition device uses an associative memory to store a plurality of coded words in such a way that a weight is associated with each character of the alphabet of the stored words, wherein equal weights correspond to equal characters. To perform the recognition, a dictionary of words is first chosen; this is stored in the associative memory according to a pre-determined code; a string of characters which correspond to a word to be recognized is received; a sequence of weights corresponding to the string of characters received is supplied to the associative memory; the distance between the word to be recognized and at least some of the stored words is calculated in parallel as the sum of the difference between the weights of each character of the word to be recognized and the weights of each character of the stored words; the minimum distance is identified; and the word stored in the associative memory having the minimum distance is stored.
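
The minimum-distance lookup described in this abstract can be sketched in software as follows. This is only an illustration under stated assumptions: the abstract describes a hardware associative memory that computes distances in parallel, whereas the sketch loops sequentially. The weight code (alphabet position), the use of absolute differences, the zero padding for words of unequal length, and the names `char_weight`, `encode`, and `nearest_word` are all hypothetical.

```python
def char_weight(ch: str) -> int:
    # Assumed code: equal characters get equal weights (a=1, b=2, ...).
    return ord(ch) - ord("a") + 1


def encode(word: str, length: int) -> list[int]:
    # Turn a word into a sequence of weights, zero-padded to a fixed length
    # (padding is an assumption; the abstract does not cover unequal lengths).
    codes = [char_weight(c) for c in word.lower()]
    return codes + [0] * (length - len(codes))


def nearest_word(query: str, dictionary: list[str]) -> str:
    # Distance = sum over character positions of the absolute weight difference.
    # Raises ValueError if the dictionary is empty.
    length = max([len(query)] + [len(w) for w in dictionary])
    q = encode(query, length)

    def dist(word: str) -> int:
        return sum(abs(a - b) for a, b in zip(q, encode(word, length)))

    # The stored word with the minimum distance is the recognition result.
    return min(dictionary, key=dist)
```

For example, `nearest_word("housf", ["house", "mouse", "horse"])` selects `"house"`, since only the last character differs and its weight is off by one.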

Type: Application

Filed: February 12, 1998

Publication date: September 6, 2001

Inventors: LORIS NAVONI, ROBERTO CANEGALLO, MAURO CHINOSI, GIOVANNI GOZZINI, ALAN KRAMER, PIERLUIGI ROLANDI
Word recognition method and storage medium that stores word recognition program

Publication number: 20010016074

Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, and a probability is obtained at which the characteristics obtained as the result of character recognition are generated, conditioned on the characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which characteristics obtained as the result of character recognition are generated, and each of the division results obtained relevant to the characters of the words contained in the word dictionary is multiplied relevant to all the characters. The recognition results of the above words are obtained based on the multiplication results.
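
A minimal sketch of the divide-then-multiply scoring in this abstract, assuming the per-position probabilities are already available. The table layout (`likelihoods[i][c]` for the conditional probability of the observed result at position i given character c, `priors[i]` for the unconditional probability of that result) and the names `word_score` and `recognize` are hypothetical; restricting candidates to words of matching length is also an assumption.

```python
def word_score(word: str, likelihoods: list[dict[str, float]],
               priors: list[float]) -> float:
    # Product over characters of P(result_i | character_i) / P(result_i).
    score = 1.0
    for i, ch in enumerate(word):
        score *= likelihoods[i].get(ch, 0.0) / priors[i]
    return score


def recognize(dictionary: list[str], likelihoods: list[dict[str, float]],
              priors: list[float]) -> str:
    # Assumption: only dictionary words whose length matches the input compete.
    candidates = [w for w in dictionary if len(w) == len(likelihoods)]
    return max(candidates, key=lambda w: word_score(w, likelihoods, priors))
```

With three positions whose likelihood tables favour `c`, `a`, and `t` respectively, `recognize` picks `"cat"` over `"car"`, `"eat"`, and `"cot"`.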

Type: Application

Filed: January 26, 2001

Publication date: August 23, 2001

Inventor: Tomoyuki Hamamura
Finding selected character strings in text and providing information relating to the selected character strings

Patent number: 6269189

Abstract: Selected character strings are automatically found by performing an automatic search of a text to find character strings that match any of a list of selected strings. The automatic search includes a series of iterations, each with a starting point in the text. Each iteration determines whether its starting point is followed by a character string that matches any of the list of selected strings and that ends at a probable string ending. Each iteration also finds a starting point for the next iteration that is a probable string beginning. The selected strings can be words and multiple word expressions, in which case probable string endings and beginnings are word boundaries. A finite state lexicon, such as a finite state transducer or a finite state automaton, can be used to determine whether character strings match the list of selected strings. A tokenizing automaton can be used to find starting points.
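
The iteration scheme in this abstract can be illustrated roughly as below. The sketch swaps in simpler tools than the abstract names: regular-expression word boundaries stand in for the tokenizer, and a plain list scan stands in for the finite state lexicon. It also reports a match at every word beginning, which is one possible reading of how successive starting points are chosen. The function name `find_selected` is hypothetical.

```python
import re


def find_selected(text: str, selected: list[str]) -> list[tuple[int, str]]:
    # Probable string beginnings: offsets where a word starts.
    starts = [m.start() for m in re.finditer(r"\b\w", text)]
    # Probable string endings: offsets just past the last character of a word.
    ends = {m.end() for m in re.finditer(r"\w\b", text)}

    hits = []
    for s in starts:
        best = None
        for cand in selected:
            # A candidate counts only if it matches here AND ends on a word boundary.
            if text.startswith(cand, s) and (s + len(cand)) in ends:
                if best is None or len(cand) > len(best):
                    best = cand  # prefer the longest match at this start
        if best is not None:
            hits.append((s, best))
    return hits
```

Because a match must end at a word boundary, a listed string such as `"min"` is not reported inside `"minister"`, while the multi-word expression `"prime minister"` is found as a single unit.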

Type: Grant

Filed: December 29, 1998

Date of Patent: July 31, 2001

Assignee: Xerox Corporation

Inventor: Jean-Pierre Chanod
Word grouping accuracy value generation

Patent number: 6269188

Abstract: The present invention is a computer-implemented method for calculating word accuracy. Word grouping accuracy values (260) are calculated (212) by using the character accuracy values (250) calculated by an OCR program present in a computer system. The present invention preferably uses these character accuracy values (250) to create a word grouping accuracy value (260). Various methods are employed to calculate the word accuracy (260), including binarizing the character accuracy values (250), modified averaging of the character accuracy values (250), and creating fuzzy visual displays of word grouping accuracy values (260). The calculated word grouping accuracy values (260) are then adjusted based upon known OCR strengths and weaknesses, and based upon comparisons to stored word lists and the application of language rules. In a system with multiple character recognition techniques, the system can compare the accuracy values (260) of different versions of the word groupings to find the most accurate version.
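
Two of the combination methods named in this abstract, binarizing and averaging, might look like the following. The abstract does not define the "modified" averaging or the binarization rule, so the plain mean, the 0.9 threshold, the all-characters-must-pass rule, and the function names are assumptions made for illustration only.

```python
def binarize(char_accuracies: list[float], threshold: float = 0.9) -> list[int]:
    # Map each character accuracy value to 1 (trusted) or 0 (suspect).
    return [1 if a >= threshold else 0 for a in char_accuracies]


def word_accuracy_binarized(char_accuracies: list[float],
                            threshold: float = 0.9) -> int:
    # Assumed rule: the word grouping is trusted only if every character is.
    return min(binarize(char_accuracies, threshold))


def word_accuracy_mean(char_accuracies: list[float]) -> float:
    # Plain arithmetic mean as a stand-in for the abstract's "modified averaging".
    return sum(char_accuracies) / len(char_accuracies)
```

The binarized value is strict (one weak character marks the whole word as suspect), while the mean degrades gradually, so the two can disagree on the same word.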

Type: Grant

Filed: March 12, 1998

Date of Patent: July 31, 2001

Assignee: Canon Kabushiki Kaisha

Inventor: Hamadi Jamali
Classification-driven thresholding of a normalized grayscale image

Patent number: 6266445

Abstract: A sample image (142) is recognized by normalizing (404) the size of the sample image (142) to the size of referent images (146); and determining (406) a set of candidate images (147) from a set of referent images (146), wherein each of the candidate images (147) is within an acceptable distance from a different binarization (145) of the sample image (142). A system (120) for image recognition includes a scanning device (126), a normalization unit (134), a distance calculation unit (136), a classification unit (138), a disambiguation unit (140), and a display device (128).
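
The candidate-selection step can be sketched as follows, assuming the sample has already been normalized to the referent size and flattened to a list of gray values. Hamming distance, the fixed list of thresholds, the "pixel at or above threshold becomes 1" rule, and the names `binarize`, `hamming`, and `candidates` are illustrative assumptions; the abstract does not specify the distance measure or how binarizations are generated.

```python
def binarize(gray: list[int], threshold: int) -> tuple[int, ...]:
    # One binarization of the sample: pixels at or above the threshold become 1.
    return tuple(1 if px >= threshold else 0 for px in gray)


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    # Number of pixel positions where two binary images differ.
    return sum(x != y for x, y in zip(a, b))


def candidates(gray: list[int], referents: dict[str, tuple[int, ...]],
               thresholds: list[int], max_dist: int) -> dict[str, int]:
    # Try several binarizations; keep each referent that lies within max_dist
    # of at least one of them, recording its smallest distance.
    found: dict[str, int] = {}
    for t in thresholds:
        b = binarize(gray, t)
        for label, ref in referents.items():
            d = hamming(b, ref)
            if d <= max_dist and d < found.get(label, max_dist + 1):
                found[label] = d
    return found
```

Different thresholds can bring different referents within range, which is why the result may contain several candidates that a later disambiguation step would have to choose between.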

Type: Grant

Filed: March 13, 1998

Date of Patent: July 24, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Radovan V. Krtolica, Roger D. Melen
Method and apparatus for character recognition using stop words

Patent number: 6252988

Abstract: An adaptive OCR technique for character classification and recognition without the input and use of ground truth derived from the image itself. A set of so-called stop words are employed for classifying symbols, e.g., characters, from any image. The stop words are identified independent of any particular image and are used for classification purposes across any set of images of the same language, e.g., English. Advantageously, an adaptive OCR method is realized without the requirement of the selection and inputting of ground truth from each individual image to be recognized.

Type: Grant

Filed: July 9, 1998

Date of Patent: June 26, 2001

Assignee: Lucent Technologies Inc.

Inventor: Tin Kam Ho
Method of reading characters and method of reading postal addresses

Patent number: 6246794

Abstract: A character reading method has enhanced character segmentation accuracy and character string recognition accuracy for reading correctly hand-written addresses on postal matters. The method extracts provisional character patterns from image information of the address character string (step 206), creates a table 219 of tentative character patterns and implements the character classification for the tentative character patterns (step 207), extracts, specifically for characters of the street number portion of the address character string, periphery information (vertical and horizontal lengths, vertical/horizontal length ratio, pattern spacings, etc.) of tentative character patterns (step 212), and segments the character string into characters accurately based on the information (step 215).

Type: Grant

Filed: December 11, 1996

Date of Patent: June 12, 2001

Assignee: Hitachi, Ltd.

Inventors: Tatsuhiko Kagehiro, Masashi Koga, Hiroshi Sako, Hiromichi Fujisawa, Hisao Ogata, Yoshihiro Shima, Shigeru Watanabe, Masato Teramoto
Method and apparatus for automated, context-dependent retrieval of information

Patent number: 6236768

Abstract: Documents stored in a database are searched for relevance to contextual information, instead of (or in addition to) similar text. Each stored document is indexed in terms of meta-information specifying contextual information about the document. Current contextual information is acquired, either from the user or the current computational or physical environment, and this “meta-information” is used as the basis for identifying stored documents of possible relevance.

Type: Grant

Filed: May 1, 1998

Date of Patent: May 22, 2001

Assignee: Massachusetts Institute of Technology

Inventors: Bradley J. Rhodes, Thad E. Starner, Pattie E. Maes, Alex P. Pentland
Character recognition system

Patent number: 6219449

Abstract: Characters are input to input means to be turned to electronic data, character variation of each input character is expected based on the position of the character in a word when the character is hand-written, based on information necessary for determining priority of recognition results stored in storing means, priority of the recognition results based on the expected character variation is determined by priority processing means, the character is recognized by recognizing means based on the priority, and the result is output to output means.

Type: Grant

Filed: June 21, 1996

Date of Patent: April 17, 2001

Assignees: ATR Auditory, Visual Perception Research Laboratories

Inventor: Michihiro Nagaishi
Method and apparatus for performing an automatic correction of misrecognized words produced by an optical character recognition technique by using a Hidden Markov Model based algorithm

Patent number: 6219453

Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition (“OCR”) technique. Each recognized word is generated by first producing, for each character position of the corresponding word in the original document, the N-best characters for occupying that character position. If an incorrect word is found in the electronic document, the present invention generates a plurality of reference words from which one is selected for replacing the incorrect word. This selected reference word is determined by the present invention to be the reference word that is the most likely correct replacement for the incorrect recognized word. This selection is accomplished by computing for each reference word a replacement word value. The reference word that is selected to replace the incorrect recognized word corresponds to the highest replacement word value.
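
The selection step, computing a replacement word value for each reference word and keeping the highest, can be sketched as below. The abstract does not give the formula, and the patent title points to a Hidden Markov Model based algorithm, so the simple product of N-best confidences used here is only a stand-in. The confidence dictionaries, the `floor` value for characters missing from an N-best list, and the function names are hypothetical.

```python
def replacement_value(reference: str, nbest: list[dict[str, float]],
                      floor: float = 1e-3) -> float:
    # nbest[i] maps each of the N-best characters at position i to a confidence.
    if len(reference) != len(nbest):
        return 0.0  # assumption: only same-length reference words are scored
    value = 1.0
    for ch, cands in zip(reference, nbest):
        # Characters outside the N-best list get a small floor instead of zero.
        value *= cands.get(ch, floor)
    return value


def best_replacement(references: list[str], nbest: list[dict[str, float]]) -> str:
    # The reference word with the highest replacement word value is selected.
    return max(references, key=lambda w: replacement_value(w, nbest))
```

For a misrecognized word such as `c10ck`, whose N-best lists contain `l` and `o` as runners-up at the second and third positions, this scoring would favour `clock` over `click` or `cloak`.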

Type: Grant

Filed: August 11, 1997

Date of Patent: April 17, 2001

Assignee: AT&T Corp.

Inventor: Randy G. Goldberg
Word processor via voice

Patent number: 6212497

Abstract: The word processor of the present invention comprises: a voice inputting device for inputting spoken word and converting the spoken word into voice data; a voice storage device for storing the voice data; a speech recognition device for recognizing a word in the voice data output from the voice inputting device or the voice data stored by the voice storage device; a display for displaying a result obtained by the voice recognition device; an instruction inputting device for inputting an instruction to select a portion in the result; and a correction device for correcting the portion in the result according to the instruction from the instruction inputting device.

Type: Grant

Filed: November 24, 1998

Date of Patent: April 3, 2001

Assignee: NEC Corporation

Inventors: Nobumasa Araki, Jun Noguchi, Mitsuru Nishiura
Method and apparatus for recognizing a character

Patent number: 6212299

Abstract: A document with a plurality of characters is read, a binary document image is produced, and a character rectangle circumscribed about a mass of black pixels connected with each other (called a black-pixel mass) is produced for each black-pixel mass. The character rectangles are classified into a plurality of groups on condition that one or more character rectangles in one group are circumscribed about one or more black-pixel masses having the same character pattern. The character rectangles in each group are circumscribed about images of the same character. Thereafter, a figure feature of a representative character image in each classified group of character rectangles is compared with each of referential character patterns. Therefore, the character images for the character rectangles circumscribing one of non-separating characters are recognized as one non-separating character.

Type: Grant

Filed: March 12, 1997

Date of Patent: April 3, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventor: Ryoichi Yuge
Recording apparatus

Patent number: 6198840

Abstract: A recording apparatus includes a color table for storing color information corresponding 1:1 to color information codes. The apparatus executes color recording by providing color information corresponding to the color information code, to the given character data or image data. The apparatus features a function of expanding the color information, and a function of confirming the color information.

Type: Grant

Filed: November 27, 1996

Date of Patent: March 6, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Shunya Mitsuhashi, Shuichi Kumada
Character recognition using candidate frames to determine character location

Patent number: 6185338

Abstract: A character recognition method for recognizing characters on an article having multiple character-bearing areas, such as a license plate, first involves obtaining image data from an image of the article. The method then assigns at least one parameter to a selected character-bearing area on the article. The method then attempts to obtain a correct frame which expresses the correct positional relationship between the selected character-bearing area on the article with other character-bearing areas of the article, and then uses that correct frame to perform character recognition with respect to each of the character-bearing areas of the article. To obtain the correct frame, the invention compares the image data of the article with plural candidate frames. The plural candidate frames are calculated using the predetermined positional correlation between (1) the selected character-bearing area [as represented by the at least one parameter] and (2) other character-bearing areas of the article.

Type: Grant

Filed: March 21, 1997

Date of Patent: February 6, 2001

Assignee: Sharp Kabushiki Kaisha

Inventor: Mitsuaki Nakamura
Method and device for automatic error detection and correction for computerized text files

Patent number: 6167367

Abstract: A method and device for automatic error detection and correction for computerized text files uses a two-step segmentation method. A sentence of the computerized text file is first segmented at the first segmentation step into an original format and then converted into a correct sentence in the second segmentation step. In the first segmentation step the original sentence is segmented into a series of characters and the characters are analyzed so that the original phonetic or pictographic codes of the characters are revealed. The sentence in the original format is then converted into a series of phonetic representative codes and/or pictographic representative codes. Words constituting the sentence are then selected from a lexicon to reconstruct the sentence. The reconstructed sentence is then segmented again so that the errors in the original sentence are detected and corrections thereof are suggested.

Type: Grant

Filed: August 9, 1997

Date of Patent: December 26, 2000

Assignees: National Tsing Hua University, Galaxy Software Services Ltd.

Inventors: Jyun-Sheng Chang, Tsuey-Fen Lin
Character recognizing and translating system and voice recognizing and translating system

Patent number: 6148105

Abstract: A study system of a voice recognizing and translating system is provided with a sound data base for storing data from which noise is removed; a sound analysis unit for extracting the features of the voice corresponding to the voice data stored in the sound data base; and a model learning unit for creating an acoustic model on the basis of the analysis result of the sound analysis unit. A recognition system of the voice recognizing and translating system is provided with: an acoustic model storing unit for storing acoustic models; a second sound analysis unit for extracting the feature of the voice corresponding to the data concerned on the basis of the data obtained by removing the data representing noise from the voice data of a newly input voice, and a voice collating unit for collating the voice data obtained by the second sound analysis unit with the data of the acoustic models so as to recognize the voice.

Type: Grant

Filed: April 22, 1999

Date of Patent: November 14, 2000

Assignee: Hitachi, Ltd.

Inventors: Shinji Wakisaka, Hiroko Sato
Handwriting recognition system simultaneously considering shape and context information

Patent number: 6137908

Abstract: The speed and accuracy of a computer implemented handwriting recognition system is enhanced by several innovations, including integrated segmentation and context processing. The recognition processing occurs while the user is providing ink data. The system quickly reaches the recognition result once all of the input is received. More than one result may be returned by the system.

Type: Grant

Filed: June 29, 1994

Date of Patent: October 24, 2000

Assignee: Microsoft Corporation

Inventor: Sung Sik Rhee
Test classification system and method

Patent number: 6137911

Abstract: Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
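
The significance test in this abstract (a term matters to a cluster if its frequency is unlikely to be due to chance under a standard normal distribution) might be approximated as follows. The normal approximation to a binomial count, the one-sided tail, the `alpha` cutoff, and the names `chance_probability` and `significant_terms` are assumptions; the abstract does not give the exact statistic.

```python
import math


def chance_probability(count: int, total: int, corpus_rate: float) -> float:
    # Expected count and spread if the term occurred at its corpus-wide rate
    # (requires 0 < corpus_rate < 1 and total > 0).
    mean = total * corpus_rate
    sd = math.sqrt(total * corpus_rate * (1.0 - corpus_rate))
    z = (count - mean) / sd
    # One-sided upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def significant_terms(cluster_counts: dict[str, int], cluster_total: int,
                      corpus_rates: dict[str, float],
                      alpha: float = 0.01) -> list[str]:
    # Keep terms whose over-representation is unlikely to be chance.
    return sorted(
        term for term, count in cluster_counts.items()
        if chance_probability(count, cluster_total, corpus_rates[term]) < alpha
    )
```

A term that appears in a cluster at exactly its corpus-wide rate gets a chance probability of 0.5 and is dropped, while a term appearing several times more often than expected is kept as a discriminator.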

Type: Grant

Filed: June 16, 1997

Date of Patent: October 24, 2000

Assignee: The Dialog Corporation PLC

Inventor: Maxim Zhilyaev
