Checking Spelling For Recognition Patents (Class 382/231)
  • Patent number: 11934789
    Abstract: A document capture server receives a document image from a document capture client and processes the image into an electronic document containing textual content. During capture, the document capture server determines a graphical layout of the document, extracts keywords from the document, classifies the document accordingly, and calls an artificial intelligence (AI) platform to gain insights on the textual content. The AI platform analyzes the textual content and returns additional, insightful data such as a sentiment of the textual content. The document capture server can validate the additional data, integrate the additional data in a process or workflow, and/or provide the textual content and the additional data to a content repository or a computing facility operating in an enterprise computing environment. The document capture server can provide validated data to the AI platform to improve future analyses by the AI platform.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: March 19, 2024
    Assignee: Open Text SA ULC
    Inventor: Gareth Edward Hutchins
  • Patent number: 11348330
    Abstract: Systems, methods, and computer-executable instructions for extracting key value data. Optical character recognition (OCR) text of a document is received. The y-coordinates of characters are adjusted to a common y-coordinate. The rows of OCR text are tokenized into tokens based on a distance between characters. The tokens are ordered based on the x,y coordinates of the characters. The document is clustered into a cluster based on the ordered tokens and ordered tokens from other documents. Keys for the cluster are determined from the first set of documents. Each key is a token from a first set of documents. A value is assigned to each key based on the tokens for the document, and values are assigned to each key for the other documents. The values for the document and the values for the other documents are stored in an output document.
    Type: Grant
    Filed: June 9, 2020
    Date of Patent: May 31, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Nicolae Duta
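
The row-alignment and tokenization steps in the abstract of patent 11348330 can be illustrated with a minimal Python sketch. The `Char` record, the `snap_rows` and `tokenize_row` helpers, and both thresholds are assumptions made for this example; the abstract does not specify them, and this is not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: float      # left edge of the character box (hypothetical field)
    y: float      # vertical position reported by OCR (hypothetical field)
    width: float  # width of the character box (hypothetical field)


def snap_rows(chars, tolerance=3.0):
    """Group characters with nearby y values and give each group a common y."""
    rows = []
    for ch in sorted(chars, key=lambda c: c.y):
        if rows and abs(rows[-1][0].y - ch.y) <= tolerance:
            # Reuse the y of the first character in the row as the common y.
            rows[-1].append(Char(ch.text, ch.x, rows[-1][0].y, ch.width))
        else:
            rows.append([ch])
    return rows


def tokenize_row(row, max_gap=4.0):
    """Split one row into tokens wherever the horizontal gap exceeds max_gap."""
    tokens = []
    current = ""
    prev_right = None
    for ch in sorted(row, key=lambda c: c.x):
        if prev_right is not None and ch.x - prev_right > max_gap:
            tokens.append(current)
            current = ""
        current += ch.text
        prev_right = ch.x + ch.width
    if current:
        tokens.append(current)
    return tokens
```

Sorting each row by x before splitting also yields the left-to-right token order that the abstract relies on for clustering.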
  • Patent number: 10210282
    Abstract: Methods and systems for providing a search engine capability for large datasets are disclosed. These methods and systems employ a Partition-by-Query index containing key-values pairs corresponding to keys reflecting concept-ordered search phrases and values reflecting ordered lists of document references that are responsive to the concept-ordered search phrase in a corresponding key. A large Partition-by-Query index may be partitioned across multiple servers depending on the size of the index, or the size of the index may be reduced by compressing query-references pairs into clusters. The methods and systems described herein may provide suggestions and spelling corrections to the user, thereby improving the user's search engine experience while meeting user expectations for search quality and responsiveness.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: February 19, 2019
    Assignee: UBER TECHNOLOGIES, INC.
    Inventor: Geoffrey Rummens Hendrey
  • Patent number: 10198530
    Abstract: Methods and systems for providing a search engine capability for large datasets are disclosed. These methods and systems employ a Partition-by-Query index containing key-values pairs corresponding to keys reflecting concept-ordered search phrases and values reflecting ordered lists of document references that are responsive to the concept-ordered search phrase in a corresponding key. A large Partition-by-Query index may be partitioned across multiple servers depending on the size of the index, or the size of the index may be reduced by compressing query-references pairs into clusters. The methods and systems described herein may provide suggestions and spelling corrections to the user, thereby improving the user's search engine experience while meeting user expectations for search quality and responsiveness.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: February 5, 2019
    Assignee: Uber Technologies, Inc.
    Inventor: Geoffrey Hendrey
  • Patent number: 9275636
    Abstract: Embodiments of the present invention provide an approach for estimating the accuracy of a transcription of a voice recording. Specifically, in a typical embodiment, each word of a transcription of a voice recording is checked against a customer-specific dictionary and/or a common language dictionary. The number of words not found in either dictionary is determined. An accuracy number for the transcription is calculated from the number of said words not found and the total number of words in the transcription.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: March 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: James E. Bostick, John M. Ganci, Jr., John P. Kaemmerer, Craig M. Trim
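
The dictionary-based accuracy estimate described in the abstract of patent 9275636 might be sketched as follows. The abstract says only that the accuracy number is calculated from the count of unknown words and the total word count; the formula `1 - unknown / total`, the function name `estimate_accuracy`, and the word-splitting rule are assumptions made for illustration.

```python
import re


def estimate_accuracy(transcript, customer_dictionary, common_dictionary):
    """Return (accuracy, unknown_words) for a transcription.

    Assumption: accuracy = 1 - (words in neither dictionary) / (total words).
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0, []
    unknown = [
        w for w in words
        if w not in customer_dictionary and w not in common_dictionary
    ]
    return 1.0 - len(unknown) / len(words), unknown
```

A customer-specific term such as a product name counts as known here even though it is absent from the common-language dictionary, which is the point of checking both.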
  • Patent number: 9036923
    Abstract: Provided are an age estimation apparatus, an age estimation method, and an age estimation program capable of obtaining a recognition result closely matching the result perceived by a human. An age estimation apparatus 10 for estimating an age of a person on image data includes a dimension compressor 11 for applying dimension compression to the image data to output low dimensional data; and an identification device 12 for estimating an age of a person on the basis of a learning result using a feature amount contained in the low dimensional data, wherein a parameter used for the dimension compression by the dimension compressor 11 and the feature amount used for age estimation by the identification device 12 are set on the basis of a result of an evaluation of a generalization capability using a weighting function that shows a degree of seriousness of an age estimation error for every age, and learning of the identification device 12 is performed on the basis of the weighting function.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: May 19, 2015
    Assignees: NEC Solution Innovators, Ltd., TOKYO INSTITUTE OF TECHNOLOGY
    Inventors: Kazuya Ueki, Masashi Sugiyama, Yasuyuki Ihara
  • Patent number: 8818111
    Abstract: Provided are an age estimation apparatus, an age estimation method, and an age estimation program capable of reducing the labor of labeling the image data used for age estimation. An age estimation apparatus for estimating an age of a person on image data includes a dimension compression unit for applying dimension compression to the image data to output low dimensional data; a clustering unit for performing clustering of the low dimensional data outputted; a labeling unit for labeling representative data of each cluster among the low dimensional data clustered; and an identification unit for estimating an age of a person on the basis of a learning result using a feature amount contained in labeled low dimensional data and unlabeled low dimensional data.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: August 26, 2014
    Assignees: NEC Soft, Ltd., Tokyo Institute of Technology
    Inventors: Kazuya Ueki, Masashi Sugiyama, Yasuyuki Ihara
  • Patent number: 8606022
    Abstract: An information processing apparatus, which creates a tree structure used by a recognition apparatus which recognizes specific information using the tree structure, including a memory unit which stores data including the information to be recognized and data not including the information so as to correspond to a label showing whether or not the data includes the information, a recognition device which recognizes the information and outputs a high score value when the data including the information is input, and a grouping unit which performs grouping of the recognition devices using a score distribution obtained when the data is input into the recognition devices.
    Type: Grant
    Filed: March 2, 2011
    Date of Patent: December 10, 2013
    Assignee: Sony Corporation
    Inventor: Jun Yokono
  • Patent number: 8599401
    Abstract: An image processing device includes an analysis result reception unit, an inclination reception unit, a date and time reception unit, a date and time extraction unit, a determination unit, an adjustment unit, an information image generation unit, and an output unit. The analysis result reception unit receives an analysis result of an information image scanned by an information image scan unit. The inclination reception unit receives inclination when the information image scan unit scans the information image. The date and time reception unit receives a date and time when the information image scan unit scans the information image. The date and time extraction unit extracts a date and time when the information image scanned by the information image scan unit is printed. The determination unit determines whether or not to adjust a size of a pixel cluster constituting the information image.
    Type: Grant
    Filed: April 14, 2011
    Date of Patent: December 3, 2013
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Kenji Ebitani, Hirofumi Komatsubara, Takeshi Noguchi
  • Patent number: 8515185
    Abstract: A live video stream captured by an on-device camera is displayed on a screen with an overlaid guideline. Video frames of the live video stream are analyzed for a video frame with acceptable quality. A text region is identified in the video frame approximate to the on-screen guideline and cropped from the video frame. The cropped image is transmitted to an optical character recognition (OCR) engine, which processes the cropped image and generates text in an editable symbolic form (the OCR'ed text). A confidence score is determined for the OCR'ed text and compared with a threshold value. If the confidence score exceeds the threshold value, the OCR'ed text is outputted.
    Type: Grant
    Filed: November 25, 2009
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Dar-Shyang Lee, Lee-Feng Chien, Aries Hsieh, Pin Ting, Kin Wong
  • Patent number: 8467444
    Abstract: An information processing system for performing processing of dividing a moving image into tiles and packetizing and outputting information corresponding to each tile includes a process time measuring packet generation unit adapted to generate and transmit a process time measuring packet in which a packet sending time is set to measure a packet process time, a packet process time measuring unit adapted to measure, based on the packet sending time set in the process time measuring packet and the reception time of the process time measuring packet, the packet process time necessary for processing a packet, a determination unit adapted to determine, based on the packet process time, the timestamp of the moving image divided into the tiles, and a packetization unit adapted to execute processing of packetizing and outputting the timestamp and the information of the moving image divided into the tiles.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: June 18, 2013
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Odagawa, Wataru Ochiai, Akihiro Takamura
  • Publication number: 20130066663
    Abstract: Billing data associated with telecom products provided by a variety of vendors is captured, normalized, and processed to calculate true profit margins by invoice, by vendor, by geographic location, by end-customer, by circuit, or combinations of these.
    Type: Application
    Filed: September 12, 2011
    Publication date: March 14, 2013
    Applicant: DOCO LABS, LCC
    Inventor: Ghen Saito
  • Patent number: 8345114
    Abstract: Sub-regions within a face image are identified to be enhanced by applying a localized smoothing kernel to luminance data corresponding to the sub-regions of the face image. An enhanced face image is generated including an enhanced version of the face that includes certain original pixels in combination with pixels corresponding to the one or more enhanced sub-regions of the face.
    Type: Grant
    Filed: July 30, 2009
    Date of Patent: January 1, 2013
    Assignee: DigitalOptics Corporation Europe Limited
    Inventors: Mihai Ciuc, Adrian Capata, Valentin Mocanu, Alexei Pososin, Corneliu Florea, Peter Corcoran
  • Patent number: 8280175
    Abstract: A document processing apparatus includes: a character segmentation unit that segments a plurality of character images from a document image; a character image classifying unit that classifies the character images to categories corresponding to each of the character images; an average character image obtaining unit that obtains average character images for each of the categories of the character images classified by the character image classifying unit; a character recognizing unit that performs a character recognition to a character contained in each of the average character images; and an output unit that outputs character discriminating information as a character recognition result obtained by the character recognizing unit.
    Type: Grant
    Filed: February 17, 2009
    Date of Patent: October 2, 2012
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Katsuhiko Itonori
  • Patent number: 8249399
    Abstract: A method for optical character recognition (OCR) verification, the method includes: receiving a first character image that was obtained from applying an OCR process on a document; wherein the first character image is classified, by the OCR, as being associated with a first character; receiving a first character code of a text; replacing the first character code by the first character image; and evaluating a correctness of the OCR based upon a response of a user to a display of the text first character image.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ella Barkan, Dan Shmuel Chevion, Boaz Ophir, Doron Tal
  • Patent number: 8228539
    Abstract: An image forming apparatus, which includes a plurality of image forming portions transferring a yellow image, a magenta image, a cyan image, and a black image formed on a plurality of photoconductor drums to a sheet conveyed on a conveying belt, a marking unit forming marks on the conveying belt, a detecting unit detecting the marks with three or more sensors aligned in a direction normal to a direction in which the sheet is conveyed, a calculating unit calculating an amount of color misalignment in accordance with results detected by the detecting unit, and a correcting unit correcting the color misalignment in accordance with the calculated amount of color misalignment, wherein the calculating unit calculates an amount of skew difference in accordance with results detected by two sensors among the three or more sensors, wherein one sensor of the two sensors is disposed on one end of the three or more sensors and the other sensor of the two sensors is disposed on the other end of the three or more sensors, w
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: July 24, 2012
    Assignee: Ricoh Company, Ltd.
    Inventor: Tadashi Shinohara
  • Patent number: 8218831
    Abstract: Techniques are provided to analyze video frames of a video signal in order to distinguish regions containing a face (and body torso) from regions that contain a relatively static background. The region containing the face is referred to as a foreground region. A current video frame is divided into a plurality of elements and the foreground regions and background regions are detected. The background regions of a subsequent video frame are detected/registered using the foreground regions of the current video frame. The foreground regions of the subsequent video frame are determined using the background regions of the current video frame as a temporal reference.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: July 10, 2012
    Assignee: Cisco Technology, Inc.
    Inventors: Dihong Tian, Joseph T. Friel, J. William Mauchly, Maurice J. Buttimer, Wen-hsiung Chen
  • Patent number: 8184345
    Abstract: In a first mode, an original is read with a first and a second carriages stopped. In a second mode, the original is read with the first and the second carriages moving in a sub-scanning direction with a distance between the original and an optical reading element kept constant. A control unit causes, if a predetermined condition is satisfied after reading the original in the first mode, the first and the second carriages to standby at a reading position of the first mode while keeping the light source turned on, and if a next read request is issued within a predetermined time, causes the first and the second carriages to move to a next reading operation.
    Type: Grant
    Filed: October 7, 2008
    Date of Patent: May 22, 2012
    Assignee: Ricoh Company, Limited
    Inventor: Osamu Inage
  • Patent number: 8170290
    Abstract: A method for checking an imprint reads an imprint, forms a data code from the imprint, and compares the data code with a predetermined number of check data codes of a stored data set. During a search for the data code in the data set, the method decides whether the data code is to be classified as acceptable or unacceptably faulty.
    Type: Grant
    Filed: October 25, 2007
    Date of Patent: May 1, 2012
    Assignee: Siemens Aktiengesellschaft
    Inventors: Ingolf Rauh, Udo Miletzki
  • Publication number: 20110123115
    Abstract: A live video stream captured by an on-device camera is displayed on a screen with an overlaid guideline. Video frames of the live video stream are analyzed for a video frame with acceptable quality. A text region is identified in the video frame approximate to the on-screen guideline and cropped from the video frame. The cropped image is transmitted to an optical character recognition (OCR) engine, which processes the cropped image and generates text in an editable symbolic form (the OCR'ed text). A confidence score is determined for the OCR'ed text and compared with a threshold value. If the confidence score exceeds the threshold value, the OCR'ed text is outputted.
    Type: Application
    Filed: November 25, 2009
    Publication date: May 26, 2011
    Applicant: GOOGLE INC.
    Inventors: Dar-Shyang Lee, Lee-Feng Chien, Aries Hsieh, Pin Ting, Kin Wong
  • Patent number: 7792369
    Abstract: A form processing apparatus extracts layout information and character information from a form document. A candidate extracting unit extracts word candidates from the character information. A frequency digitizing unit calculates emission probability of a word candidate from each element. A relation digitizing unit calculates transition probability that relationship between word candidates is established. An evaluating unit calculates an evaluation value indicative of a probability of appearance of word candidates in respective logical elements. A determining unit determines the element and a word candidate thereof as the element and a character string thereof in the form document, based on the evaluation value.
    Type: Grant
    Filed: November 15, 2006
    Date of Patent: September 7, 2010
    Assignee: Fujitsu Limited
    Inventors: Akihiro Minagawa, Hiroaki Takebe, Katsuhito Fujimoto
  • Patent number: 7769235
    Abstract: The present invention discloses a method of character and text recognition of a bit-mapped graphic file received from an optical scanning device. The method comprises a trainable template cache, a preliminarily trained feature analysis means, and a context analysis means. The present invention discloses the way to use said means for achieving the best results in recognition. The method supposes that the template cache along with the context analysis means are used as the main shape characteristic analyzing means. The feature analysis means along with the context analysis means are used as subsidiary shape characteristic analyzing means and as a training means for the template cache. The method comprises applying the main shape characteristic analyzing means and optionally applying the subsidiary shape characteristic analyzing means if no or not enough reliability of recognition is achieved after the template cache analyzing.
    Type: Grant
    Filed: September 12, 2002
    Date of Patent: August 3, 2010
    Assignee: Abbyy Software Ltd
    Inventors: Konstantin Anisimovich, Vadim Tereshchenko, Vladimir Rybkin, Sergey Platonov
  • Patent number: 7756335
    Abstract: A method for determining at least one recognition candidate for a handwritten pattern comprises selecting possible segmentation points in the handwritten pattern for use in segmenting and recognizing the handwritten pattern. The method further may comprise comparing segments of the handwritten pattern to templates. The comparison may return segment candidates forming possible recognition results of the segments of the handwritten pattern. The method further comprises forming a representation of sequences of segment candidates, said representation comprising data blocks corresponding to segmentation points, wherein a data block comprises references to data blocks corresponding to subsequent segmentation points. The reference may comprise information of segment candidates.
    Type: Grant
    Filed: February 28, 2006
    Date of Patent: July 13, 2010
    Assignee: Zi Decuma AB
    Inventor: Jakob Sternby
  • Patent number: 7664343
    Abstract: Methods and systems of mapping of an optical character recognition (OCR) text string to a code included in a coding dictionary by supplementing the Levenshtein Distance Algorithm (LDA) with additional information in the form of adjustments based on particular character substitutions, insertions and deletions together with weighting based on multiple alternatives for the OCR text string.
    Type: Grant
    Filed: January 23, 2006
    Date of Patent: February 16, 2010
    Assignee: Lockheed Martin Corporation
    Inventors: Timothy O. Withum, Kurt P. Kopchik, Oren I. Oxman
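
To make the idea in the abstract of patent 7664343 concrete, the sketch below extends the standard Levenshtein dynamic program with per-character substitution costs and then scores each dictionary code against several OCR alternatives. The cost table, the default costs of 1.0, and the rule for combining alternatives (a confidence-weighted sum of distances) are assumptions for this example, not details taken from the patent.

```python
def weighted_edit_distance(ocr_text, code, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance with adjustable per-pair substitution costs.

    sub_cost maps (ocr_char, code_char) to a cost; unlisted pairs cost 1.0.
    """
    sub_cost = sub_cost or {}
    rows, cols = len(ocr_text) + 1, len(code) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = ocr_text[i - 1], code[j - 1]
            substitution = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,          # delete a from the OCR text
                dist[i][j - 1] + ins_cost,          # insert b into the OCR text
                dist[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dist[-1][-1]


def best_code(alternatives, dictionary, sub_cost=None):
    """Pick the dictionary code with the lowest confidence-weighted distance.

    alternatives is a list of (ocr_string, confidence) pairs (hypothetical shape).
    """
    def score(code):
        return sum(
            confidence * weighted_edit_distance(text, code, sub_cost)
            for text, confidence in alternatives
        )
    return min(dictionary, key=score)
```

Lowering the cost for visually similar pairs such as `0` and `O` lets a likely OCR confusion count as a smaller error than an arbitrary substitution.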
  • Patent number: 7657422
    Abstract: A method and system for generating a Directed Acyclic Graph (DAG) from an initial multi-chain, subject to a constraint. The initial multi-chain is expressed as a string serving as a current input string to which the constraint is subsequently applied. A provided string P expresses the constraint. P is applied to the current input string to generate at least one output string, wherein each generated output string violates the constraint to a lesser extent than does the input string or does not violate the constraint. Each generated output string violating the constraint serves as a current input string to which the constraint is subsequently applied. P is recursively applied to each current input string that had been determined from applying P previously, until applying P does not generate any more output strings violating the constraint. A set of the generated output strings not violating the constraint represents the DAG.
    Type: Grant
    Filed: January 23, 2004
    Date of Patent: February 2, 2010
    Assignee: International Business Machines Corporation
    Inventor: Christian Mauceri
  • Patent number: 7647340
    Abstract: A JPEG2000 file includes a plurality of boxes containing data suitable to render an image including a metadata box that includes information within the box describing the content of the image. The information within the metadata box describing content may be MPEG-7 data, which is compliant with the MPEG-7 specification.
    Type: Grant
    Filed: June 15, 2001
    Date of Patent: January 12, 2010
    Assignee: Sharp Laboratories of America, Inc.
    Inventors: Petrus Van Beek, Muhammed Ibrahim Sezan, George R. Borden, IV
  • Patent number: 7561740
    Abstract: A possible portion providing method involving receiving a pattern provided using a movable member, the pattern corresponding to a traveling path of the movable member, and the provided pattern corresponding to a portion of an intended sequence, determining at least one possible sequence which includes the received pattern, and displaying a remaining portion of the at least one possible sequence using a predefined font set for a plurality of components, the predefined font set including a record of a pattern corresponding to the traveling path of the movable member or another movable member for each of the plurality of components.
    Type: Grant
    Filed: December 10, 2004
    Date of Patent: July 14, 2009
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Laurent Denoue, Patrick Chiu
  • Patent number: 7404143
    Abstract: A single-roundtrip server-based spell checking service is provided. A client provides a composition to the server to be spell checked. The server performs a spell check on the composition and flags the words that are determined to be errors. The server may provide suggested corrections for the flagged words in the composition depending on the error. The suggested corrections are compressed before they are sent to the client. While a word may be misspelled several times throughout the composition, the suggested corrections for the word are only sent one time. Spell check options may also be set to help control the spell check operation.
    Type: Grant
    Filed: December 26, 2003
    Date of Patent: July 22, 2008
    Assignee: Microsoft Corporation
    Inventors: Jack Freelander, Shawn Derek Bracewell
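
The single-roundtrip response described in the abstract of patent 7404143 can be sketched in Python as below. Every occurrence of a misspelled word is flagged, but its suggestions are computed and stored only once. The response shape, the `suggest` callback, and the choice of JSON plus zlib for the compression step are assumptions for illustration; the abstract does not name a format or a compression scheme.

```python
import json
import re
import zlib


def spell_check(composition, dictionary, suggest):
    """Flag every misspelled occurrence; store suggestions once per distinct word."""
    errors = []        # (character offset, word) for each flagged occurrence
    suggestions = {}   # word -> suggested corrections, sent only once
    for match in re.finditer(r"[A-Za-z']+", composition):
        word = match.group().lower()
        if word in dictionary:
            continue
        errors.append((match.start(), word))
        if word not in suggestions:
            suggestions[word] = suggest(word)
    return {"errors": errors, "suggestions": suggestions}


def pack(response):
    """Compress the response before sending it back to the client."""
    return zlib.compress(json.dumps(response).encode("utf-8"))
```

The client can then decompress the payload and look up each flagged offset in the shared `suggestions` table, so a word misspelled many times adds little to the response size.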
  • Patent number: 6826317
    Abstract: A technology of the present invention is capable of objectively judging an ability of a proofreader who proofreads a digitized document by use of OCR programs. A method of managing an ability of a proofreader who proofreads an electronic document generated from a recognition target document by executing a character auto recognition program, comprises a step of estimating a character count of potential mis-recognized characters contained in the electronic document, a step of detecting a mis-recognized character discover count as a mis-recognized character count with which the proofreader discovers the mis-recognized characters in the electronic document, a step of detecting a processing time spent for proofreading the electronic document, and a step of calculating a score relative to a proofreader ability based on a ratio of the potential mis-recognized character count to the mis-recognized character discover count per unit time.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: November 30, 2004
    Assignee: Fujitsu Limited
    Inventors: Akio Fujino, Yoitsu Nakade, Hitoshi Ozawa, Tsutomu Matsushita, Mariko Kita
  • Patent number: 6766069
    Abstract: A user-interface for selecting text from images of documents using auto-completion is described. The auto-completion process may be used to complete words (or text sequences), phrases, sentences, paragraphs, or other groupings of words. In response to user input, the OCR results for one or more images of documents are searched. The user input may include typing in a partial word (or the initial characters in a text sequence) via an input device or alternatively, annotations made by a user on a hardcopy document prior to scanning the document. One or more word matches are presented to the user for acceptance until the user accepts a word match or until all word matches have been presented to the user. Once a user accepts a word match, the word match is copied into an electronic document such as a word processing document, spreadsheet document, or other electronic document created by an application program.
    Type: Grant
    Filed: December 21, 1999
    Date of Patent: July 20, 2004
    Assignee: Xerox Corporation
    Inventors: Christopher R. Dance, William M. Newman, Alex S. Taylor, Stuart A. Taylor
  • Patent number: 6459810
    Abstract: An exemplary embodiment of the invention is a method for forming variant search strings. The method includes receiving a search string and parsing the search string to locate a mistaken search string character. A mistaken search string character is a character which is confused with other characters. A variant search string is formed in response to a presence of a mistaken search string character in the search string. The search string and variant search string may then be used to search a database. Another exemplary embodiment of the invention is a system for forming variant search strings. The system includes a user interface for receiving a search string. A variant search string generator parses the search string to locate a mistaken search string character. The mistaken search string character is a character which is confused with other characters. The variant search string generator forms a variant search string in response to a presence of a mistaken search string character in the search string.
    Type: Grant
    Filed: September 3, 1999
    Date of Patent: October 1, 2002
    Assignee: International Business Machines Corporation
    Inventor: Christopher T. Cring
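
One way to picture the variant search strings in the abstract of patent 6459810 is the Python sketch below, which swaps each commonly confused character for its look-alike and enumerates the combinations. The `CONFUSABLE` table and the function name are assumptions for this example; the abstract does not list which characters count as mistaken.

```python
from itertools import product

# Hypothetical table of characters that are easily confused with each other.
CONFUSABLE = {"0": "O", "O": "0", "1": "I", "I": "1", "5": "S", "S": "5"}


def variant_search_strings(search, confusable=CONFUSABLE):
    """Return all variants of `search` formed by swapping confusable characters."""
    options = [
        (ch, confusable[ch]) if ch in confusable else (ch,)
        for ch in search
    ]
    variants = {"".join(combo) for combo in product(*options)}
    variants.discard(search)  # the original string is searched separately
    return sorted(variants)
```

The number of variants doubles with each confusable character in the string, so a practical system would likely cap the count or rank the variants before querying the database.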
  • Patent number: 6269188
    Abstract: The present invention is a computer-implemented method for calculating word accuracy. Word grouping accuracy values (260) are calculated (212) by using the character accuracy values (250) calculated by an OCR program present in a computer system. The present invention preferably uses these character accuracy values (250) to create a word grouping accuracy value (260). Various methods are employed to calculate the word accuracy (260), including binarizing the character accuracy values (250), modified averaging of the character accuracy values (250), and creating fuzzy visual displays of word grouping accuracy values (260). The calculated word grouping accuracy values (260) are then adjusted based upon known OCR strengths and weaknesses, and based upon comparisons to stored word lists and the application of language rules. In a system with multiple character recognition techniques, the system can compare the accuracy values (260) of different versions of the word groupings to find the most accurate version.
    Type: Grant
    Filed: March 12, 1998
    Date of Patent: July 31, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hamadi Jamali
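
The abstract of patent 6269188 names binarizing and modified averaging as ways to derive a word accuracy from character accuracy values, followed by an adjustment against stored word lists. The Python sketch below shows one possible reading. The threshold, the particular "modified" average (the mean pulled toward the weakest character), and the size of the word-list boost are all assumptions, since the abstract gives no formulas.

```python
def word_accuracy_binarized(char_scores, threshold=0.8):
    """1.0 if every character clears the threshold, otherwise 0.0."""
    return 1.0 if all(s >= threshold for s in char_scores) else 0.0


def word_accuracy_average(char_scores):
    """Assumed 'modified average': midpoint of the mean and the weakest score."""
    mean = sum(char_scores) / len(char_scores)
    return (mean + min(char_scores)) / 2


def adjust_for_word_list(word, score, word_list, boost=0.1):
    """Raise the score, capped at 1.0, when the word appears in a stored list."""
    if word.lower() in word_list:
        return min(1.0, score + boost)
    return score
```

With several recognition engines, the same functions could score each engine's version of a word so that the highest-scoring version is kept.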
  • Patent number: 6212299
    Abstract: A document with a plurality of characters is read, a binary document image is produced, and a character rectangle circumscribed about a mass of black pixels connected with each other (called a black-pixel mass) is produced for each black-pixel mass. The character rectangles are classified into a plurality of groups on condition that one or more character rectangles in one group are circumscribed about one or more black-pixel masses having the same character pattern. The character rectangles in each group are circumscribed about images of the same character. Thereafter, a figure feature of a representative character image in each classified group of character rectangles is compared with each of referential character patterns. Therefore, the character images for the character rectangles circumscribing one of non-separating characters are recognized as one non-separating character.
    Type: Grant
    Filed: March 12, 1997
    Date of Patent: April 3, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Ryoichi Yuge
  • Patent number: 6154579
    Abstract: A method and apparatus for correcting misrecognized words appearing in electronic documents that have been generated by scanning an original document in accordance with an optical character recognition ("OCR") technique. If an incorrect word is found in the electronic document, the present invention generates at least one reference word and selects the reference word that is the most likely correct replacement for the incorrect word. This selection is accomplished by performing a probabilistic determination that assigns to each reference word a replacement word recognition probability. The probabilistic determination is carried out on the basis of a pre-stored confusion matrix that stores a plurality of probability values. The confusion matrix is used to associate each character of a recognized word in the electronic document with a corresponding character of a word in the original document on the basis of these probability values.
    Type: Grant
    Filed: August 11, 1997
    Date of Patent: November 28, 2000
    Assignee: AT&T Corp.
    Inventor: Randy G. Goldberg
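    Illustration: The selection step in the abstract above can be sketched as scoring each reference word by the product of per-character confusion probabilities and keeping the highest-scoring one. This is a hedged sketch under stated assumptions: the probability values, the names `CONFUSION`, `char_prob`, `word_log_prob`, and `best_replacement`, and the restriction to equal-length words are hypothetical and are not taken from the patent.

```python
import math

# Illustrative values only: probability that OCR outputs key[0]
# when the original character was key[1].
CONFUSION = {
    ("1", "l"): 0.08,
    ("0", "o"): 0.10,
    ("5", "s"): 0.06,
}
MATCH_PROB = 0.90      # assumed probability of a correct read
MISMATCH_PROB = 0.001  # assumed floor for unlisted confusions


def char_prob(recognized, intended):
    """Look up P(recognized | intended) in the toy confusion matrix."""
    if recognized == intended:
        return MATCH_PROB
    return CONFUSION.get((recognized, intended), MISMATCH_PROB)


def word_log_prob(recognized, reference):
    """Sum of log-probabilities over aligned characters.

    Words of different length are rejected, a simplification
    of this sketch.
    """
    if len(recognized) != len(reference):
        return float("-inf")
    return sum(math.log(char_prob(r, c)) for r, c in zip(recognized, reference))


def best_replacement(recognized, references):
    """Pick the reference word most likely to have produced `recognized`."""
    return max(references, key=lambda ref: word_log_prob(recognized, ref))


choice = best_replacement("he1lo", ["helps", "hello", "jello"])
```

    Log-probabilities are summed instead of multiplying raw probabilities so that long words do not underflow.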
  • Patent number: 6041141
    Abstract: There is disclosed a character recognition machine adapted to recognize Japanese characters such as kanjis and kanas. The machine comprises a character string storage portion, a character extraction portion, a character recognition portion, and a language processing portion. A character string to be recognized is stored as an image in the storage portion. The character extraction portion comprises a network consisting of a plurality of interconnected operators each of which has numerous inputs and outputs. An evaluation function which assumes its minimum value when a character extraction produces the best results is calculated by the operators simultaneously so as to minimize the value of the function. The character recognition portion calculates degrees of similarity of a character pattern to various character categories, the character pattern being applied from the character extraction portion.
    Type: Grant
    Filed: August 10, 1995
    Date of Patent: March 21, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hiroshi Yamamoto, Hisao Niwa, Yoshihiro Kojima, Susumu Maruno, Kazuhiro Kayashima, Toshiyuki Kouda, Hidetsugu Maekawa, Satoru Ito, Yasuharu Shimeki
  • Patent number: 5960114
    Abstract: A process for identifying and capturing text comprising the steps of identifying delimiters in the text, selecting delimiters from the identified delimiters to be delimiters to the left and right of the selected text, indicating only one character of the text between the left and right delimiters, and automatically blocking and capturing the text having the indicated character. In an alternate embodiment, the process comprises the steps of identifying delimiters in the text that are to the left and to the right of a cursor and identifying the position of the delimiters relative to the cursor, specifying at least one particular delimiter position relative to the cursor, indicating only one character of the text between the cursor and the specified delimiter position, and automatically blocking and capturing the text having the indicated character.
    Type: Grant
    Filed: October 28, 1996
    Date of Patent: September 28, 1999
    Assignee: International Business Machines Corporation
    Inventors: Norman J. Dauerer, Donato O. Forlenza, Edward E. Kelley, Franco Motika
  • Patent number: 5933531
    Abstract: An optical character recognition method and system are provided, employing context analysis and operator input, alternatively and in combination, on the same batch of documents. After automatic character recognition, the context analyzer processes the fields that are good enough to expect resolution. This will accept as many fields as possible without any operator intervention. For some other fields, the process uses operator input to certify the character-level OCR result of, or to enter, a certain percentage of the characters, so that context analysis may accept some of the remaining fields. If the context analyzer successfully identifies a small set of very close hypotheses, the process asks the operator to certify one or two characters to resolve the ambiguity between the hypotheses. For the fields that are still not resolved, the fields and the hypotheses are shown to the operator for acceptance, correction, or entry.
    Type: Grant
    Filed: August 23, 1996
    Date of Patent: August 3, 1999
    Assignee: International Business Machines Corporation
    Inventor: Raymond Amand Lorie
  • Patent number: 5917941
    Abstract: After each complete stroke in a handwriting recognition process, a hypothesis is generated whether a word break is present between the previous stroke and the new stroke. This hypothesis is weighted with a probability of a word-break occurring between the strokes. This probability is determined from the geometrical relationships between characters. Subsequently, a word search is carried out on the basis of these weighted hypotheses, to identify the most likely candidates for the words represented by the written strokes. A user interface is provided that offers the user a limited list of alternative word recognitions for a group of characters. These recognitions undergo segmentation filtering, in accordance with the word breaks of the selected hypotheses, to present the user with only those alternatives having the same groupings of strokes.
    Type: Grant
    Filed: August 8, 1995
    Date of Patent: June 29, 1999
    Assignee: Apple Computer, Inc.
    Inventors: Brandyn Webb, Larry S. Yaeger
  • Patent number: 5905811
    Abstract: When texts recognized by an OCR are registered and those texts are searched by a search word, a state in which the search cannot be performed depending on an error recognition at the time of the recognition by the OCR is eliminated. It is an object of the invention to realize a process such that no burden is exerted on an operator or an apparatus by the above state. There are provided an OCR processor for recognizing stored image information and outputting a recognition result while switching the number of candidate characters to be outputted as a recognition result in accordance with a degree of a likelihood; and a document searcher for forming character trains for search from the recognition result and for registering as a search file.
    Type: Grant
    Filed: June 15, 1995
    Date of Patent: May 18, 1999
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hirotaka Shiiyama, Katsumi Masaki
  • Patent number: 5835634
    Abstract: A bitmap comparison technique that is able quickly to compare two bitmap images while discounting differences between the images likely due to noise. The bitmap comparison technique includes the operations of: comparing the first and second bitmaps, producing a difference map identifying differing bits between the first and second bitmaps, producing outline masks based on the outlines of the first and second bitmaps, identifying certain very different bits within the difference map that are to be weighted differently from the remaining bits within the difference map based upon the outline mask, and determining a comparison score to indicate the extent to which the first and second images differ by differently weighting the very different bits and the remaining bits. The certain bits are normally weighted to a lesser extent than the remaining bits when determining the comparison score so that the influence of noise in the comparison score is diminished.
    Type: Grant
    Filed: May 31, 1996
    Date of Patent: November 10, 1998
    Assignee: Adobe Systems Incorporated
    Inventor: Kenneth A. Abrams
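    Illustration: The weighted comparison in the abstract above can be sketched as a difference map whose bits are weighted by whether they fall on a glyph outline. This is only one possible reading, not the patented technique: the sketch assumes that differing bits lying on the outline of either bitmap are the ones likely due to noise and therefore weighted less, and the names `outline_mask` and `comparison_score` and the weights 0.25 and 1.0 are hypothetical.

```python
def outline_mask(bitmap):
    """Mark set pixels that touch an unset pixel or the image border."""
    height, width = len(bitmap), len(bitmap[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not bitmap[ny][nx]:
                    mask[y][x] = 1
                    break
    return mask


def comparison_score(first, second, edge_weight=0.25, interior_weight=1.0):
    """Sum weighted differing bits; 0.0 means the bitmaps are identical.

    Both bitmaps must have the same dimensions. Differences on an
    outline of either bitmap count `edge_weight`; all others count
    `interior_weight`.
    """
    mask_a, mask_b = outline_mask(first), outline_mask(second)
    score = 0.0
    for y in range(len(first)):
        for x in range(len(first[0])):
            if first[y][x] != second[y][x]:  # entry in the difference map
                on_outline = mask_a[y][x] or mask_b[y][x]
                score += edge_weight if on_outline else interior_weight
    return score


glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Same glyph with one stray pixel attached to its edge.
ragged = [row[:] for row in glyph]
ragged[1][4] = 1
# Same glyph with a hole punched in its interior.
holed = [row[:] for row in glyph]
holed[2][2] = 0

edge_noise_score = comparison_score(glyph, ragged)
hole_score = comparison_score(glyph, holed)
```

    Both variants differ from the original glyph by a single bit, but the stray edge pixel contributes less to the score than the interior hole.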