Character normalization using an elliptical sampling window for optical character recognition

- Eastman Kodak Company

In character normalization for optical character recognition preprocessing employing horizontal and vertical scaling factors A and B, the value of each pixel in the normalized character image superimposed on the original character image is determined from the values of the pixels in the original image lying within an elliptical neighborhood surrounding the normalized character image pixel and having major and minor elliptical axes proportional to A and B.

Description
BACKGROUND OF THE INVENTION

1. Technical Field

The invention is related to optical character recognition and in particular to the normalization of characters of different sizes to a uniform size in optical character recognition.

2. Problem to be Solved by the Invention

Optical character recognition systems which recognize printed characters generally require that the sizes of all of the characters to be recognized be the same uniform size. Quite often, however, a text which is to be processed by an optical character recognition system contains printed characters of various point sizes. In order for the system to process such a text, character normalization must be performed on each character whose size is different from the desired size in order to change each such character to the desired size. Each character image is first separated from the other images in the text prior to character recognition. Then, the normalization process changes (if necessary) the character size to the correct size. In many cases, it is necessary to scale the character image to make it "fatter" or "skinnier" relative to its original aspect ratio. Thus, it is required to control both the size and the proportion (aspect ratio) of each character prior to character recognition processing.

It is desirable to preserve the strokes in each character without losing any due to the character normalization process. Thus, for example, if the character size is to be reduced, the number of pixels representing the reduced character will necessarily be reduced. The problem is how to reduce the image size while minimizing the amount of lost character stroke information. If the pixels in the new (reduced) image are simply taken from the pixels in corresponding locations in the original image, then the remaining pixels will be discarded and the information they represent will be irretrievably lost.

In accordance with one goal of the invention, the reduction in the number of pixels representing a character of reduced size is compensated by computing each pixel in the new (reduced) image based upon the values of a local neighborhood of pixels surrounding the corresponding location in the old image. This minimizes the information lost through a reduction in the number of pixels representing each character. This, however, raises another problem, namely, how to define the neighborhood of pixels in the old image which are to be considered in computing the pixel in the new image. This problem is particularly acute where the character size as well as its shape (aspect ratio) must be changed. One solution that may be tried is to define the neighborhood as a rectangle whose proportion reflects the ratio of the horizontal and vertical scale factors by which the character size must be reduced. Such an approach has been suggested, but not particularly for normalizing individual character images, in U.S. Pat. No. 4,725,892 to Suzuki et al. In fact, if applied to optical character recognition, such an approach would create other problems. Specifically, the use of the rectangular sampling window can create false character strokes in the reduced image.

In summary, character normalization involving a size reduction and aspect ratio change has created one of two problems. If each pixel in the new (reduced) image is taken only from the pixel in the corresponding position in the old image, then the remaining pixels in the old image are discarded and their character stroke information is irretrievably lost. On the other hand, if each pixel in the new (reduced) image is taken from all of the pixels lying in a neighborhood surrounding the corresponding location in the old image, then false character strokes may be introduced into the new image.

Accordingly, it is an object of the present invention to perform character normalization without discarding a significant amount of character stroke information and without introducing false character strokes into the normalized character image.

DISCLOSURE OF THE INVENTION

The invention is comprised in an optical character recognition system in which a text image of individual characters is separated into individual character images in an optical character recognition preprocessor, and the size and shape of each character is determined relative to a predetermined character size and shape (aspect ratio). The invention includes an optical character recognition character normalization processor which changes the size and/or shape (aspect ratio) of each individual character image whenever it significantly deviates from the predetermined size and/or shape, respectively. This change is specified by independent horizontal and vertical scaling factors. The value of each pixel in the new (normalized) image is determined by considering the value of each pixel in a neighborhood surrounding the corresponding location in the original character image.

In accordance with the invention, the shape of this neighborhood is automatically constrained in proportion to the horizontal and vertical scaling factors so as to optimally capture those pixels in the original image which are most relevant in determining the value of the corresponding pixel in the new (normalized) image while excluding those which are less relevant and therefore more likely to be misleading with respect to the current pixel location.

More specifically, in accordance with the invention, the shape of the neighborhood is automatically constrained to be an ellipse whose horizontal (minor) axis corresponds to the horizontal scaling factor and whose vertical (major) axis corresponds to the vertical scaling factor.

In the preferred embodiment of the invention, an array of normalized character image pixel locations is superimposed on the original character image and has a pitch (or periodic spacing between adjacent pixels) with respect to the original image pixels equal to the horizontal scaling factor in the horizontal direction and equal to the vertical scaling factor in the vertical direction. An elliptical neighborhood is defined around each of these normalized character image pixel locations having an elliptical major axis spanning a number of pixels in the original image equal to the vertical scaling factor and an elliptical minor axis spanning a number of pixels in the original image equal to the horizontal scaling factor. Then, a determination is made as to which pixels in the original image lie within or on the boundary of the elliptical neighborhood. The value of each normalized character image pixel is computed from the values of the pixels in the original image which lie within or on the boundary of the elliptical neighborhood. The normalized character image pixel values thus determined comprise the "new" normalized character image.

The elliptical neighborhood sampling process of the invention has two advantages. First, including such a neighborhood of plural pixels minimizes the loss of character stroke information typically accompanying a reduction in character size. Second, the elliptical shape defines a group of pixels whose boundary is proportionately equidistant, within the limitations posed by differing horizontal and vertical scaling factors, from the location for which a normalized character image pixel value is to be computed. For example, if the horizontal and vertical scaling factors are equal, the ellipse reduces to a circle and the neighborhood boundary truly becomes equidistant from the normalized pixel location of interest. Thus, pixels which are disproportionately far away from the pixel location of interest are excluded. This stands in stark contrast to results obtained using a rectangular neighborhood, in which neighboring pixels lying within a corner of the rectangle--and which are therefore disproportionately far away--are included in the computation of the value of the new pixels. Such pixels are misleading with respect to the computation of the value of the current pixel location. As will be seen below in the detailed description, such a technique will often create a false character stroke in the new image. The present invention minimizes such mistakes.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood by reference to the accompanying drawings, of which:

FIG. 1 is a simplified illustration of the pixels in an individual character image, showing an elliptical sampling window in accordance with the invention;

FIGS. 2a and 2b illustrate the character normalization results obtained using the invention;

FIGS. 3a and 3b illustrate the character normalization results obtained using a rectangular sampling window;

FIG. 4 is a simplified block diagram illustrating an optical character recognition system embodying the invention;

FIG. 5 is a flow diagram illustrating the process of the invention performed in the system of FIG. 4; and

FIG. 6 is a graph illustrating some parameters in the flow diagram of FIG. 5.

MODES OF CARRYING OUT THE INVENTION

Referring to FIG. 1, a binary image 10 of an individual character 12 consists of an array 14 of individual pixels 16 (picture elements). The pixels 16 are arranged in horizontal rows parallel to an x-axis and in vertical columns parallel to a y-axis and are individually addressable by individual x-y coordinates. In such a binary image, each pixel is either "on" or "off", the "on" pixels 16a being the small circles. The individual character 12 is an outline of the pattern of the "on" pixels 16a. In the present example, the character 12 does not conform with a predetermined size and shape (aspect ratio) to which it must be reduced. The desired size and shape is depicted in the same scale as FIG. 1 in the pattern of "on" pixels 18a of FIG. 2a and the corresponding "normalized" character image 20 of FIG. 2b. The pixel densities (spacings between adjacent pixels) of FIGS. 1 and 2 are identical. The normalization process which transforms the character image 12 of FIG. 1 to the normalized character image of FIG. 2b requires (in this specific example) a reduction in size along the x-axis by a horizontal scaling factor of 6 and a reduction in size along the y-axis by a vertical scaling factor of 8. FIG. 1 shows the x'-y' axes of FIG. 2a stretched out to the scale of FIG. 1 and the corresponding pixels 18 of FIG. 2 superimposed in FIG. 1 as black dots. Note that the x' pixels 18 occur once for every six pixels along the x-axis of FIG. 1 and that the y' pixels occur once for every eight pixels along the y-axis of FIG. 1. Because the horizontal and vertical scaling factors in this example are integer numbers (i.e., six and eight), the "new" pixels 18 lie on top of the "old" pixels 16 in the stretched superimposed image of FIG. 1. However, it should be noted that these scaling factors may be any real numbers, including non-integers or irrational numbers, so that the "new" pixels 18 may lie anywhere in between adjacent ones of the old pixels 16, depending upon the value of the scaling factors chosen.
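The superposition described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program of the appendix: the function name `superimposed_locations` and its arguments are hypothetical, and the origins of the old and new coordinate systems are assumed to coincide at (0, 0).

```python
def superimposed_locations(width, height, A, B):
    """Return the (X, Y) positions of the "new" pixels, expressed in the
    coordinate system of the "old" image (hypothetical helper).

    width, height: size of the old image in pixels.
    A, B: horizontal and vertical scaling factors (any positive reals).
    """
    locations = []
    k = 0
    while k * B < height:        # one row of new pixels every B old pixels
        j = 0
        while j * A < width:     # one new pixel every A old pixels
            locations.append((j * A, k * B))
            j += 1
        k += 1
    return locations


# Integer factors (A=6, B=8): every new pixel lies on top of an old pixel.
print(superimposed_locations(18, 16, 6, 8))
# Non-integer factor (A=2.5): new pixels may fall between old pixels.
print(superimposed_locations(5, 1, 2.5, 1))
```

The positions are computed as `j * A` and `k * B` rather than by repeated addition, a choice made here only to avoid accumulating floating-point error with non-integer factors.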

The problem in creating the normalized character image 20 of FIG. 2b is how to determine from the pattern of "on" pixels 16a of FIG. 1 which ones of the pixels 18 of the normalized image of FIG. 2a are to be turned "on". In accordance with the invention, this determination is made by defining an elliptical neighborhood 22 surrounding each one of the "new" pixels 18 superimposed in FIG. 1 and then determining which ones of the "old" pixels 16 lie within the elliptical neighborhood. The "new" pixel 18 is turned "on" if any one of the old pixels 16 lying within the elliptical neighborhood 22 is "on".

The elliptical neighborhood 22 is bounded by an ellipse 24 whose major axis precisely spans a number of "old" pixels 16 equal to the vertical scaling factor (in this example, eight) and whose minor axis precisely spans a number of "old" pixels 16 equal to the horizontal scaling factor (in this example, six). This ellipse may be thought of as being centered around successive ones of the "new" pixels 18 superimposed in FIG. 1 as a movable sampling window which captures those of the "old" pixels 16 which are most relevant to the determination of the value of the corresponding "new" pixel 18.

Introduction of False Character Strokes without the Invention

The advantage of defining such a sampling window is that the addition of false character strokes in the "new" or normalized character image of FIG. 2b is minimized or avoided. For example, consider the apparently spurious "on" pixel 16a' in the original image of FIG. 1 at x-y coordinates (+14, -11) and its two nearest neighbors. If a square sampling window is employed whose length in number of pixels is equal to the vertical scaling factor (or a rectangular sampling window is employed whose horizontal and vertical sides in pixel lengths are equal to the horizontal and vertical scaling factors, respectively), then the sampling window for the "new" pixel 18a' superimposed in FIG. 1 at x-y coordinate (+12, -7) will capture the "on" pixel 16a' at (+14, -11). This will cause the value of the "new" pixel 18a' to be "on", as illustrated in FIG. 3a, giving rise to a false character stroke 26 in the resulting character image illustrated in FIG. 3b.

The Invention Avoids False Character Strokes

The present invention avoids such an error, since the "spurious" pixel 16a' is outside of the elliptical neighborhood 22. At the same time, however, the elliptical neighborhood includes a significant plurality of "old" pixels 16 surrounding each "new" pixel location in the superimposed image of FIG. 1 so that a reliable decision is made regarding each "new" pixel.
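The arithmetic behind this example can be checked directly. The sketch below assumes the scaling factors of FIG. 1 (A=6, B=8), the "new" pixel location (+12, -7) and the spurious "old" pixel at (+14, -11); the two predicate names are hypothetical. The offset between the two points is (2, -4), which lies inside the 6-by-8 rectangle, whereas (2/3)^2 + (4/4)^2 is approximately 1.44, which exceeds 1 and therefore lies outside the ellipse.

```python
def in_rectangle(x, y, X, Y, A, B):
    """True if old pixel (x, y) lies in the A-by-B rectangle centered at (X, Y)."""
    return abs(x - X) <= A / 2.0 and abs(y - Y) <= B / 2.0


def in_ellipse(x, y, X, Y, A, B):
    """True if old pixel (x, y) lies within or on the ellipse centered at
    (X, Y) whose horizontal and vertical axes have lengths A and B."""
    a, b = A / 2.0, B / 2.0
    return ((X - x) / a) ** 2 + ((Y - y) / b) ** 2 <= 1.0


# Spurious "on" pixel at (+14, -11), new pixel location at (+12, -7).
print(in_rectangle(14, -11, 12, -7, 6, 8))  # True: the rectangle captures it
print(in_ellipse(14, -11, 12, -7, 6, 8))    # False: the ellipse excludes it
```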

An optical character recognition preprocessing system embodying the invention is illustrated in FIG. 4. A memory 30 stores an image of a text consisting of many individual character images. A character separation processor 32, which can use techniques well-known in the art, separates the text image from the memory 30 into many separate individual character images. Each of these individual character images is then processed in turn by a character normalization processor 34 employing the elliptical sampling window process described above in connection with FIG. 1. The processor 34 does this by storing the original individual character image in a memory 35 and computing the pixel values of the normalized image from the image data stored in the memory 35 by executing special processing instructions stored in a read-only memory 36. As a result, each individual character image is normalized by the processor 34 to a predetermined size and proportion (aspect ratio) which is the same for all normalized characters. Each normalized character thus generated by the processor 34 is then transmitted to a character recognizer processor 38 which can use well-known techniques to correlate each normalized individual character image to one of a predetermined set of characters. The normalization process is necessary because, in general, character recognition processors such as the processor 38 require that each received individual character image be of a certain size and aspect ratio in order to function.

Operation of the normalization processor 34 is illustrated in FIG. 5. FIG. 5 is representative of the type of instructions stored in the read-only memory 36 and is an implementation of the concept described above in connection with FIG. 1. Operation begins with the character separation processor 32, using techniques well-known in the art, determining horizontal and vertical scaling factors for the current individual character image. Essentially, the character separation processor determines by how much the current individual character image deviates from the predetermined character size and aspect ratio. Of course, this function can be implemented in either the character separation processor 32 or in the normalization processor 34. The resulting horizontal and vertical scaling factors are stored as A and B, respectively, by the normalization processor 34 (block 40 of FIG. 5). A and B may be any real numbers, including fractions or irrational numbers, but in the example of FIG. 1, A=6 and B=8. The normalization processor 34 then initializes several parameters by setting a=A/2, b=B/2, Y=0, k=0 (block 42 of FIG. 5), and X=0, j=0 (block 44), and x=X-a, y=Y-b (block 46). X and Y are the coordinates of successive ones of the "new" pixels 18 superimposed in the x-y coordinate system of "old" pixels of FIG. 1, j and k are the coordinates of the "new" pixels 18 in the "new" coordinate system of FIG. 2, while x and y are the coordinates of successive ones of the "old" pixels 16 in the coordinate system of FIG. 1. As in FIG. 1, the origins of the old and new coordinate systems are placed at the same point by initializing both (x,y) and (X,Y) at (0,0), although another alignment may be chosen. As will be seen and as depicted in the graph of FIG. 6, for each "new" pixel location (X,Y), a determination is made in this exemplary embodiment as to which ones of the "old" pixel locations (x,y) within a rectangle between X+a and X-a and between Y+b and Y-b are within an ellipse centered at (X,Y) whose major and minor axes are 2b and 2a, respectively.

The foregoing is accomplished by determining whether the quantity $(X-x)^2/a^2 + (Y-y)^2/b^2$ is less than or equal to 1 (block 48). If so (YES branch of block 48), then a determination is made whether the "old" pixel 16 at the current value of (x,y) is "on" (block 50). If it is "on" (YES branch of block 50), then the "new" pixel at the location X=j and Y=k is assigned the value "on" (block 52). On the other hand, if the foregoing algebraic quantity is greater than one (NO branch of block 48) or if the "old" pixel was not "on" (NO branch of block 50), then x is incremented to the next "old" pixel location along the x axis of FIG. 1 (block 54) and it is determined whether the new value of x exceeds X+a (block 56). If not (NO branch of block 56) the process returns to the determination of block 48 and the succeeding steps previously described are repeated as x is incremented successively. Once x exceeds its maximum value (X+a) (YES branch of block 56), it is reset to its starting value (block 58), y is incremented to the next "old" pixel location along the y axis of FIG. 1 (block 60) and a determination is made whether y has exceeded its maximum value of Y+b (block 62). If not (NO branch of block 62), the process returns to the determination step of block 48 and the succeeding steps described above are repeated as x is successively incremented again.

As soon as y reaches its maximum value (YES branch of block 62), the elliptical sampling window 22 of FIG. 1 must be "stepped" to the location of the next "new" pixel 18 in the array of FIG. 1. Thus, X is incremented to X+A, the location of the next "new" pixel along the x-axis of FIG. 1, and j is incremented to j+1 (block 64). A determination is then made whether the new value of X exceeds the boundary of the array 10 of FIG. 1 (block 66). If not (NO branch of block 66), the process returns to the determination step of block 44, and the succeeding steps previously described are repeated as X is successively incremented. On the other hand, if the new value of X exceeds the array boundaries (YES branch of block 66), Y is incremented to Y+B, the location of the next "new" pixel along the y axis of FIG. 1, and k is incremented to k+1 (block 68). A determination is then made whether the new value of Y exceeds the boundaries of the array 10 (block 70). If not (NO branch of block 70), the process returns to the initialization step of block 44 and the succeeding steps previously described are repeated as Y is successively incremented. As soon as Y reaches its maximum value, the construction of the normalized character image (FIGS. 2a and 2b) is completed.
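The nested stepping of the sampling window can be sketched compactly in Python. This is a minimal sketch under stated assumptions, not the program of the appendix: the function name `normalize_binary` is hypothetical, the image is assumed to be a list of rows of 0/1 values indexed as `image[y][x]`, old-pixel coordinates are assumed to be integers, and the scan rectangle is clipped to the image bounds (a detail the flow diagram description does not spell out).

```python
import math


def normalize_binary(image, A, B):
    """Sketch of the elliptical-window normalization of a binary image.

    image: list of rows of 0/1 values (assumed layout, image[y][x]).
    A, B:  horizontal and vertical scaling factors.
    """
    height, width = len(image), len(image[0])
    a, b = A / 2.0, B / 2.0                  # semi-axes, as in block 42
    out = []
    k = 0
    while k * B < height:                    # step the window vertically by B
        Y = k * B
        row = []
        j = 0
        while j * A < width:                 # step the window horizontally by A
            X = j * A
            on = 0
            # Scan the old pixels in the rectangle X-a..X+a, Y-b..Y+b,
            # clipped to the image (the clipping is an assumption).
            y_lo, y_hi = max(0, math.ceil(Y - b)), min(height - 1, math.floor(Y + b))
            x_lo, x_hi = max(0, math.ceil(X - a)), min(width - 1, math.floor(X + a))
            for y in range(y_lo, y_hi + 1):
                for x in range(x_lo, x_hi + 1):
                    inside = ((X - x) / a) ** 2 + ((Y - y) / b) ** 2 <= 1.0
                    if inside and image[y][x]:   # tests of blocks 48 and 50
                        on = 1                   # assignment of block 52
            row.append(on)
            j += 1
        out.append(row)
        k += 1
    return out


# A 12-by-16 image with one "on" pixel at (x=6, y=8) reduces to a 2-by-2 image.
img = [[0] * 12 for _ in range(16)]
img[8][6] = 1
print(normalize_binary(img, 6, 8))
```

Unlike the flow diagram, the sketch recomputes X and Y as `j * A` and `k * B` instead of adding A and B repeatedly, and it does not stop scanning a window early once an "on" pixel has been found.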

The invention has been described in connection with an example in which the normalization required the character size to be reduced, so that A and B were integers greater than unity. However, either A or B or both A and B may be non-integers (e.g., 2.337 and 4.539) and either or both may be less than unity for character size enlargement. Also, the size of the elliptical neighborhood may be adjusted so that successive sampling windows overlap or underlap to a greater extent than that of the exemplary embodiment described above. This is done by modifying the computation in block 48 of FIG. 5 to be:

$$\frac{(X-x)^2}{a^2} + \frac{(Y-y)^2}{b^2} \le R,$$

where R is any number not necessarily equal to one.
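The effect of R can be illustrated by counting how many "old" pixel positions fall inside the window. The sketch below is only illustrative: the function name `count_in_window` is hypothetical, the window is assumed to be centered exactly on an old pixel, and the scan range is widened by the square root of R because the inequality above describes an ellipse whose semi-axes are a and b each multiplied by that square root.

```python
import math


def count_in_window(A, B, R=1.0):
    """Count integer offsets (dx, dy) from the window center that satisfy
    dx^2/a^2 + dy^2/b^2 <= R, with a = A/2 and b = B/2."""
    a, b = A / 2.0, B / 2.0
    rx = math.floor(a * math.sqrt(R))    # horizontal reach of the scaled ellipse
    ry = math.floor(b * math.sqrt(R))    # vertical reach of the scaled ellipse
    count = 0
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            if (dx / a) ** 2 + (dy / b) ** 2 <= R:
                count += 1
    return count


# Larger R captures more old pixels (more overlap between adjacent windows);
# smaller R captures fewer.
for R in (0.5, 1.0, 2.0):
    print(R, count_in_window(6, 8, R))
```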

The invention has been described in connection with a binary image in which each pixel in the normalized image is either "on" or "off" depending upon whether any of the pixels in the corresponding elliptical neighborhood in the original image were "on". However, the invention is also useful in connection with normalizing a gray-scale image in which the value of each pixel in the normalized image is computed from a statistical ensemble (such as a mean value) of the pixel gray level of each of the pixels within the corresponding elliptical neighborhood in the original image.
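A gray-scale variant can be sketched in the same way, here using the mean as the statistic. As before, this is a sketch under assumptions rather than the program of the appendix: the function name `normalize_gray` is hypothetical, the image is assumed to be a list of rows of numeric gray levels indexed as `image[y][x]`, the window is clipped to the image bounds, and a window that captures no old pixel is arbitrarily given the value 0.0.

```python
import math


def normalize_gray(image, A, B):
    """Sketch: each new pixel is the mean gray level of the old pixels lying
    within or on the elliptical neighborhood centered on its location."""
    height, width = len(image), len(image[0])
    a, b = A / 2.0, B / 2.0
    out = []
    k = 0
    while k * B < height:
        Y = k * B
        row = []
        j = 0
        while j * A < width:
            X = j * A
            y_lo, y_hi = max(0, math.ceil(Y - b)), min(height - 1, math.floor(Y + b))
            x_lo, x_hi = max(0, math.ceil(X - a)), min(width - 1, math.floor(X + a))
            samples = [
                image[y][x]
                for y in range(y_lo, y_hi + 1)
                for x in range(x_lo, x_hi + 1)
                if ((X - x) / a) ** 2 + ((Y - y) / b) ** 2 <= 1.0
            ]
            # Empty windows get 0.0 (an arbitrary choice made for this sketch).
            row.append(sum(samples) / len(samples) if samples else 0.0)
            j += 1
        out.append(row)
        k += 1
    return out


# A uniform image stays uniform after normalization.
flat = [[100] * 12 for _ in range(16)]
print(normalize_gray(flat, 6, 8))
```

Other statistics, such as a median or a weighted mean, could be substituted for the mean on the `row.append` line without changing the sampling window itself.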

Finally, the algorithm of FIG. 5 is just one of many approaches which may be used to implement the concept of the invention.

INDUSTRIAL ADVANTAGES AND APPLICABILITY

The invention is thus useful as a preprocessor to prepare data derived by scanning a document containing text character images for optical character recognition (OCR) processing. More specifically, the invention is useful in OCR preprocessing of documents containing text characters of different point sizes to change all of the characters to the same uniform size. This has the advantage of greatly simplifying the optical character recognition process ultimately applied to the preprocessed image of the document.

A program embodying the concept of the invention written in "C" language is attached hereto as an appendix to this specification as pages 15 through 19 hereof.

Accordingly, while the invention has been described in detail by specific reference to preferred embodiments thereof, it is understood that variations and modifications thereof may be made without departing from the true spirit and scope of the invention.

Claims

1. In an optical character recognition system in which a text image comprising a plurality of characters is separated into plural individual character images, each of said individual character images comprising an array of original pixels of dimensions $x_{max}$ and $y_{max}$ in respective orthogonal directions, each of said pixels being one of a plurality of pixel values, a character normalization device for normalizing each of said individual character images to a size $X_{max} = A x_{max}$ and $Y_{max} = B y_{max}$ in said respective orthogonal directions in accordance with orthogonal normalization ratios A and B respectively, A and B being any real numbers, comprising:

means for defining a set of normalized image pixel locations, said normalized image pixel locations having a pitch relative to said original pixels in each of said orthogonal directions corresponding to said orthogonal normalization ratios A and B, respectively;
means for computing a value for each one of said normalized pixel locations from the values of the original pixels which lie within an elliptical neighborhood surrounding said one normalized pixel location, whereby to compute a set of normalized pixel location values defining a normalized individual character image.

2. The device of claim 1 wherein said elliptical neighborhood is centered on said one normalized pixel location.

3. The device of claim 2 wherein said elliptical neighborhood is characterized by major and minor axes whose lengths are approximately equal to the lengths spanned in said original image by A and B original pixels, respectively.

4. The device of claim 3 wherein said plurality of pixel values is two such that said image is a binary image, and wherein said means for computing assigns said one normalized pixel location a first value if the value of any of the original pixels lying within the corresponding elliptical neighborhood is said first value and otherwise assigns it a second value.

5. The device of claim 3 wherein the location of said one normalized pixel location in said original image is (X,Y), the location of a given one of said original pixels is (x,y) and said means for computing determines that said given one of said original pixels lies within said elliptical neighborhood if:

$$\frac{(X-x)^2}{a^2} + \frac{(Y-y)^2}{b^2} \le R,$$

where a=A/2, b=B/2 and R is a real number.

6. The device of claim 5 wherein R=1.

7. The device of claim 5 wherein, for each one of said elliptical neighborhoods, said given one of said original pixels is selected from a rectangle of pixels centered around said elliptical neighborhood, said rectangle characterized by sides of lengths A and B in said original image.

8. In an optical character recognition system in which a text image comprising a plurality of characters is separated into plural individual character images, each of said individual character images comprising an array of original pixels, each of said pixels characterized by one of a plurality of pixel values, a method for scaling each of said individual character images in accordance with orthogonal scaling ratios A and B comprising:

defining a set of normalized image pixel locations, said normalized image pixel locations having a pitch relative to said original pixels in each of said orthogonal directions corresponding to said orthogonal normalization ratios A and B, respectively;
computing a value for each one of said normalized pixel locations from the values of the original pixels which lie within an elliptical neighborhood surrounding said one normalized pixel location; and
transmitting the normalized pixel location values computed by said computing step as a normalized individual character image.

9. The method of claim 8 wherein said elliptical neighborhood is centered on said one normalized pixel location.

10. The method of claim 9 wherein said elliptical neighborhood is characterized by major and minor axes whose lengths are approximately equal to the lengths spanned in said original image by A and B original pixels, respectively.

11. The method of claim 10 wherein said plurality of pixel values is two such that said image is a binary image, and wherein said computing step comprises assigning said one normalized pixel location a first value if the value of any of the original pixels lying within the corresponding elliptical neighborhood is said first value and otherwise assigning it a second value.

12. The method of claim 10 wherein the location of said one normalized pixel location in said original image is (X,Y), the location of a given one of said original pixels is (x,y) and said computing step comprises determining that said given one of said original pixels lies within said elliptical neighborhood if:

$$\frac{(X-x)^2}{a^2} + \frac{(Y-y)^2}{b^2} \le R,$$

where a=A/2, b=B/2 and R is a real number.

13. The method of claim 12 wherein R=1.

14. The method of claim 12 further comprising selecting said given one of said original pixels from a rectangle of pixels centered around said elliptical neighborhood, said rectangle characterized by sides of lengths A and B in said original image.

15. In a method for character normalization in optical character recognition in which the image of an individual character comprising an array of original pixels is normalized in accordance with horizontal and vertical scaling factors A and B to construct a normalized character image comprising an array of normalized pixels, the improvement comprising the steps of:

defining in said array of original pixels a plurality of elliptical neighborhoods each having a minor axis and a major axis, said minor and major axes having lengths corresponding to said horizontal and vertical scaling factors A and B respectively, said neighborhoods being located at periodic horizontal and vertical intervals in respective horizontal and vertical directions therein, said horizontal and vertical intervals corresponding to said horizontal and vertical scaling factors A and B, respectively; and
computing the value of each normalized pixel from a set of said original pixels lying within a corresponding one of said elliptical neighborhoods.

16. The method of claim 15 wherein said original and normalized pixel arrays represent bi-tonal images in which the value of each of said pixels is either "on" or "off", and wherein said computing step sets the value of each normalized pixel to "on" if any of the original pixels in the corresponding one of said elliptical neighborhoods is "on", and otherwise sets said value to "off".

References Cited
U.S. Patent Documents
3784981 January 1974 Boroski, Jr.
3976982 August 24, 1976 Eiselen
4290084 September 15, 1981 Minshull et al.
4381547 April 26, 1983 Ejiri
4437122 March 13, 1984 Walsh et al.
4484347 November 20, 1984 Kashioka
4528693 July 9, 1985 Pearson et al.
4569081 February 4, 1986 Mintzer et al.
4680720 July 14, 1987 Yoshii et al.
4712140 December 8, 1987 Mintzer et al.
4725892 February 16, 1988 Suzuki et al.
4747154 May 24, 1988 Suzuki et al.
4771471 September 13, 1988 Kitamura
Patent History
Patent number: 4977602
Type: Grant
Filed: Nov 20, 1989
Date of Patent: Dec 11, 1990
Assignee: Eastman Kodak Company (Rochester, NY)
Inventor: Louis J. Beato (Rochester, NY)
Primary Examiner: Leo H. Boudreau
Assistant Examiner: Steven P. Fallon
Attorney: Dennis R. Arndt
Application Number: 7/439,222
Classifications
Current U.S. Class: 382/27; 382/47; 382/54; 358/433; 340/731
International Classification: G06K 9/56