Patents Examined by Jack Choulks
  • Patent number: 5459739
    Abstract: Three OCR systems are employed for text conversion, and the results generated by the three are merged using an edit distance algorithm to estimate a correct common text ancestor. To make the process computationally feasible for large strings, such as pages of documentation with 3,000 characters, the method is executed in two stages. In the first stage, each page is treated as a string of lines and, where differences exist, the edit distance between the lines on a page is used to find the optimal alignment of the lines. In the event that a choice must be made among three non-null lines, the procedure is then invoked on the three lines, using the edit distance between the characters on a line to find the optimal alignment. The number of computations required is further reduced by corner-cutting, which heuristically determines an upper bound on the edit distance and limits calculations to those that do not exceed the upper bound.
    Type: Grant
    Filed: March 18, 1992
    Date of Patent: October 17, 1995
    Assignee: OCLC Online Computer Library Center, Incorporated
    Inventors: John C. Handley, Thomas B. Hickey
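
The abstract above describes an edit distance computation that is pruned by an upper bound. The following is a minimal Python sketch of that general idea, not the patented procedure itself: it computes a standard Levenshtein distance and skips every cell farther than `upper_bound` from the diagonal. The function name `edit_distance` and the caller-supplied `upper_bound` parameter are assumptions made for illustration; the abstract does not say how the bound is estimated, and the three-way merge step is not shown.

```python
def edit_distance(a, b, upper_bound=None):
    """Levenshtein distance between two sequences (strings or lists).

    If upper_bound is given, only cells within upper_bound of the
    diagonal are evaluated, and None is returned when the true
    distance exceeds the bound. (Hypothetical helper for illustration.)
    """
    n, m = len(a), len(b)
    if upper_bound is None:
        upper_bound = max(n, m)  # no pruning: the distance never exceeds this
    if abs(n - m) > upper_bound:
        return None  # the length difference alone already exceeds the bound
    inf = upper_bound + 1  # sentinel meaning "more than the bound"

    # Row 0: turning an empty prefix of a into b[:j] costs j insertions.
    prev = [j if j <= upper_bound else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        if i <= upper_bound:
            cur[0] = i  # i deletions
        lo = max(1, i - upper_bound)
        hi = min(m, i + upper_bound)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # deletion from a
                cur[j - 1] + 1,      # insertion into a
                inf,                 # cap values at the sentinel
            )
        prev = cur
    return prev[m] if prev[m] <= upper_bound else None
```

Because the function only compares elements for equality, the same code can be applied to a page treated as a list of lines and to a single line treated as a string of characters, mirroring the two stages the abstract describes. With a small bound, the work per row is proportional to the bound rather than to the full length of the second sequence.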