# System and method for scanned document correction

A system and method are provided for the correction of a warped page image. The method first accepts a camera image of a page, creates a filtered edge map, and identifies text-likely regions. The filtered edge map and text-likely regions are projected into a polar coordinate system to determine page lines and warped image page curves. An adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface is created. A three-dimensional (3D) model is created using the adaptive 2D ruled mesh and an estimate of the camera focal length. Using the 3D model, a 2D target mesh is created for rectifying the image of the page. In one aspect, the adaptive 2D ruled mesh is projected onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh.

## Description

#### RELATED APPLICATION

This application is a Continuation-in-part of an application entitled, SYSTEM AND METHOD FOR NORMALIZED FOCAL LENGTH PROFILING, invented by Richard Campbell, Ser. No. 15/203,889, filed on Jul. 7, 2016, and published as US Patent Publication No. 2016/0314565, published on Oct. 27, 2016;

which is a Continuation-in-part of an application entitled, METHODS, SYSTEMS AND APPARATUS FOR CORRECTING PERSPECTIVE DISTORTION IN A DOCUMENT IMAGE, invented by Richard Campbell, Ser. No. 13/275,256, filed on Oct. 17, 2011, and issued as U.S. Pat. No. 9,390,342 on Jul. 12, 2016.

Both of these applications are incorporated herein by reference.

#### BACKGROUND OF THE INVENTION

Field of the Invention

This invention generally relates to camera-based scanners and, more particularly, to a system and method for correcting errors introduced in the process of document scanning.

Description of the Related Art

The scanning of a document with a handheld camera-enabled device is liable to introduce considerably more errors than if the same document was scanned, for example, on a dedicated flatbed device. Currently, most image correction techniques solve this problem by providing special backgrounds with known characteristics, active illumination, special lighting, a known camera pose, special hardware to hold the book or minimize distortions, and/or multiple image captures in order to help estimate page geometry, simplify segmentation, and reduce shading artifacts. However, these techniques may introduce significant limitations on the use of the camera or distortion in the captured images.

It would be advantageous if a process existed for more accurately identifying document scanning errors related to focal length, page identification, and warping due to document binding.

#### SUMMARY OF THE INVENTION

Disclosed herein are a system and method that enable near scanner quality from camera captures of bound document content. The disclosed method identifies the page or pages of interest on an arbitrary background from ad hoc imaging locations, computes the 3D surface shape using multi-scale processing, and dewarps the image regions to produce images of each page, similar in quality to images obtained from scans of unbound content. This enables easy-to-use mobile scanning of bound documents and their reuse in document workflows, including optical character recognition (OCR), portable document format (PDF) conversion, and content extraction. The disclosed method describes a multi-scale approach to flattening the curved pages that often result from imaging bound content such as books and magazines. The method describes both single page and facing page detection and correction algorithms.

To detect the book page edges and gutters, the method first projects the features into a polar coordinate system that restores parallel structures for the perspective distorted pages. The detection method uses both edge features and text region detection to determine the page boundaries and the gutter. The page curves are then identified using a filtering and grouping technique with a heuristic selection routine that weighs contour completeness, curvature, and the fit to a smooth curve. Based on these criteria, top and bottom curves for each page are chosen to mark the extent of the page.

The top and bottom curves are then sent through a multi-scale curve refinement to better match the true curves. The multi-scale technique uses a Laplacian of Gaussian edge gradient at a larger image resolution to better localize the page contour edge using a sub-pixel localization technique. The search for the edge is guided by the vertical vanishing point.

After curve refinement, a piecewise Bezier curve is fit to the points to smooth the data and provide a curve description better suited for user manipulation. In one aspect, only two Bezier segments are used to describe the data. In experiments it has been shown that fitting piecewise Bezier curves better models the data and the shape of the true curve, using a simple least-squares fitting routine and a C**1** continuity post-process.

From the curves describing the extent and outline of the page, a ruled mesh is generated that can be used in interactive feedback routines or in the page flattening routines. An adaptive method to change the quantization of the piecewise planar approximation in areas of high surface warp is also presented. The adaptation is derived from a polygonal simplification of the bounding curves. During the mesh generation process, the curves are checked to trim any regions that exceed the page boundary.

The ruled mesh can also be used to flatten images of warped documents, but the process may need to consider the inferred 3D geometry in order to produce corrections with consistent content scales across the page. In order to infer this geometry, the camera focal length is estimated from the polygonal simplification of the page curves. The disclosed method includes several heuristics to handle the numerical stability issues that can arise, and still provide acceptable results. Next, the warped page surface normal is estimated for a piecewise planar approximation. Then, the ruled mesh is intersected with the piecewise planes to produce a 3D model of the document surface (to a scale factor). Finally, the target mesh of the flattened page is estimated from the 3D model, and used in conjunction with the ruled mesh to map data from the input image to the flattened image. The result is high quality correction of warped document images (curl, skew, perspective) into flattened images, as if they were scanned from a conventional flatbed scanner.

Accordingly, a method is provided for the correction of a warped page image. The method first accepts a camera image of a page, creates a filtered edge map, and identifies text-likely regions. The filtered edge map and text-likely regions are projected into a polar coordinate system to determine page lines and warped image page curves. In one aspect, a multi-scale edge localization refinement is performed for each page curve. In another aspect, the method independently fits a pair of warped image page curve sections to a corresponding pair of Bezier curves. In yet another aspect, an edge map is histogrammed into theta bins. Minimum and maximum theta angles are computed for the text-likely regions, a first page line is localized as the closest significant edge bin less than the minimum text-likely region theta angle, and a second page line is localized as the closest significant edge bin greater than the maximum text-likely region theta angle. Thus, page gutters can be identified for images of adjacent pages, where the gutter is determined as the closest significant theta edge bin to the theta angle halfway between the minimum text-likely region theta angle and the maximum text-likely region theta angle.

Then, an adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface is created. In one aspect, the adaptive 2D ruled mesh of the warped page surface is used to estimate the camera focal length. The method is able to create a three-dimensional (3D) model using the adaptive 2D ruled mesh and the camera focal length estimate. If the adaptive 2D ruled mesh includes a plurality of planar strips, a camera focal length can be estimated for each planar strip. Using the 3D model, a 2D target mesh is created for rectifying the image of the page. In one aspect, the adaptive 2D ruled mesh is projected onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh.

Additional details of the above-described method and a system for the correction of a warped page image are provided below.

#### BRIEF DESCRIPTION OF THE DRAWINGS


#### DETAILED DESCRIPTION

The parent applications mentioned above in the Related Applications Section include a description of low-level features, filtering, and contour selection processes. In this disclosure, the terms “contours” and “curves” are used interchangeably.

The system **100** comprises a non-transitory memory **102**, a processor **104**, and an interface **106** to accept a camera image of a page and to supply a rectified image of the page. As shown, the interface **106** is connected to a camera **108** via line **110**. An image correction application **112** is stored in the memory **102** and enabled as a sequence of processor-executable instructions for performing the following steps:

1) creating a filtered edge map and identifying text-likely regions;

2) projecting the filtered edge map and text-likely regions into a polar coordinate system to determine page lines and warped image page curves;

3) creating an adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface;

4) estimating a camera focal length;

5) creating a three-dimensional (3D) model using the adaptive 2D ruled mesh and camera focal length estimate; and,

6) using the 3D model, creating a 2D target mesh rectifying the image of the page.
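The six steps above can be sketched as a processing pipeline. The following Python sketch is purely illustrative: every function name is hypothetical, and each stage is reduced to a placeholder standing in for the corresponding step of the disclosed method.

```python
import numpy as np

# Each stub stands in for one stage of the pipeline; names are illustrative.
def detect_features(img):                      # step 1: edge map + text-likely regions
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > 10.0, []

def project_to_polar(edges, regions):          # step 2: page lines and page curves
    return {"lines": [], "curves": []}

def build_adaptive_ruled_mesh(geometry):       # step 3: piecewise planar approximation
    return {"strips": []}

def estimate_focal_length(mesh, default=1000.0):  # step 4: per-strip estimate, stubbed
    return default

def lift_to_3d(mesh, focal):                   # step 5: 3D model (to a scale factor)
    return {"mesh": mesh, "focal": focal}

def build_target_mesh(model):                  # step 6: rectilinear target mesh
    return {"width": 0, "height": 0}

def correct_page_image(img):
    edges, regions = detect_features(img)
    geometry = project_to_polar(edges, regions)
    mesh = build_adaptive_ruled_mesh(geometry)
    focal = estimate_focal_length(mesh)
    model = lift_to_3d(mesh, focal)
    target = build_target_mesh(model)
    # A real implementation would remap pixels from the ruled mesh to the
    # target mesh here; the sketch returns the input unchanged.
    return img

out = correct_page_image(np.zeros((4, 4), dtype=np.uint8))
```

The value of the skeleton is the dataflow: each stage consumes only the outputs of earlier stages, so the stubs can be replaced one at a time.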

The system **100** described above may employ a computer system with a bus **114** or other communication mechanism for communicating information between the processor **104**, memory **102**, and interface **106**. The memory **102** may include a main memory, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus for storing information and instructions to be executed by the processor. These memories may also be referred to as a computer-readable medium. The execution of the sequences of instructions contained in a computer-readable medium may cause the processor **104** to perform some or all of the steps associated with image correction. The practical implementation of such a computer system would be well known to one with skill in the art.

As used herein, the term “computer-readable medium” refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks. Volatile media includes dynamic memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read. As such, the image correction application may be permanently stored in a computer-readable medium.

In one aspect, the above-described system is enabled as a handheld device, such as a smartphone. Other examples of a handheld device include a calculator, personal digital assistant (PDA), cell phone, tablet, or notebook computer. The interface **106** may be more than one interface, some examples of which include a modem, an Ethernet card, a computer bus protocol, which may include a back-side bus or peripheral component interconnect (PCI) for example, or any other appropriate data communications device such as USB. The physical communication links may be optical, wired, or wireless.

Shown are a single text-likely region **200** and edges **202***a*, **202***b*, **202***c*, and **202***d*.

In one aspect, subsequent to projecting the filtered edge map (e.g., edges **202***a*, **202***b*, **202***c*, and **202***d*) and text-likely regions (e.g., **200**) into the polar coordinate system (Step **2**), the image correction application performs a multi-scale edge localization refinement for each page curve. In turn, the image correction application performs the multi-scale refinement of the page curves through the following sub-steps:

transforming top and bottom page curves into point sets having a first resolution;

scaling the point sets to a second resolution, lower than the first resolution;

creating a gradient image of the second resolution point sets;

identifying zero crossing points in the gradient image as a page edge; and,

scaling the zero crossing points to the first resolution.

Subsequent to Step **2**, the image correction application independently fits a pair of warped image page curve sections to a corresponding pair of Bezier curves **300***a* and **300***b*.

Also subsequent to Step **2**, the image correction application histograms an edge map **400** into theta bins **402**, computes minimum and maximum theta angles for the text-likely regions **200**, and localizes a first page line **202***c* as the closest significant edge bin less than the minimum text-likely region theta angle, and a second page line **202***d* as the closest significant edge bin greater than the maximum text-likely region theta angle.

For images of adjacent pages, a gutter **500** is determined as the closest significant theta edge bin **502** to the theta angle halfway between the minimum text-likely region theta angle **504** and the maximum text-likely region theta angle **506**.
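The theta-bin searches described above (page lines and gutter) can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the edge and text-region angles are assumed to be precomputed, and the `min_count` significance threshold is an assumed parameter.

```python
import numpy as np

def localize_page_structure(edge_thetas, text_thetas, n_bins=180, min_count=5):
    """edge_thetas: polar angles (radians) of edge-map pixels; text_thetas:
    angles covered by text-likely regions. Bins whose counts reach min_count
    are the 'significant' edge bins."""
    hist, bin_edges = np.histogram(edge_thetas, bins=n_bins, range=(0.0, np.pi))
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    sig = hist >= min_count
    t_min, t_max = float(np.min(text_thetas)), float(np.max(text_thetas))
    below = np.where(sig & (centers < t_min))[0]
    above = np.where(sig & (centers > t_max))[0]
    line1 = centers[below[-1]] if below.size else None   # closest bin below the text
    line2 = centers[above[0]] if above.size else None    # closest bin above the text
    # Gutter (adjacent-page captures): significant bin closest to the halfway angle.
    mid = 0.5 * (t_min + t_max)
    sig_idx = np.where(sig)[0]
    gutter = None
    if sig_idx.size:
        gutter = centers[sig_idx[np.argmin(np.abs(centers[sig_idx] - mid))]]
    return line1, line2, gutter

# Synthetic example: edge clusters near 0.5, 1.5, 2.5 rad; text spans 1.0-2.0 rad.
edges = np.concatenate([np.full(20, 0.5), np.full(20, 1.5), np.full(20, 2.5)])
l1, l2, g = localize_page_structure(edges, np.array([1.0, 2.0]))
```

With these synthetic angles, the page lines resolve near 0.5 and 2.5 radians and the gutter near 1.5, to within one bin width.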

A top set is defined by curves **600** that are within the region defined by boundary lines L**1**, L**2**, and LC**1**. A bottom set is defined by curves **602** that are within the region defined by the boundary lines L**1**, L**2**, and LC**2**, where L**1** is a left boundary, L**2** is a right boundary, and LC**2** is a bottom boundary. The image correction application selects the top page curve as the top set curve **600** with the highest completeness and lowest curvature between the boundary lines L**1** and L**2**, and selects the bottom page curve **602** as the bottom set curve with the highest completeness and lowest curvature between the boundary lines L**1** and L**2**. For an image of adjacent pages, the result is that each page is divided into upper and lower curve sets **600** and **602** using boundary lines L**1**, L**2**, LC**1**, LC**2**, and gutter **500**, with upper and lower curve lines being favored that join at the gutter.

The image correction application creates the adaptive 2D ruled mesh (Step **3**) with a plurality of planar strips, as shown, and independently estimates a camera focal length for each planar strip. For example, one planar strip is defined between points T**1**, T**2**, B**1**, and B**2**.

The camera focal length is estimated (Step **4**) using the adaptive 2D ruled mesh of the warped page surface, details of which are provided below.

In Step **5**, the image correction application creates the 3D model by projecting the adaptive 2D ruled mesh onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh. The image correction application creates the 2D target mesh by determining the resolution of the 2D target mesh in response to scaling the 3D mesh width and height values, where the scale factor is determined by measuring an overall 3D page width and height and an input region width and height. The image of the page is rectified (Step **6**) using a perspective dewarp to interpolate image values from the adaptive 2D ruled mesh into the image defined by the 2D target rectilinear mesh, as explained in more detail below.
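The target-mesh scaling step can be illustrated with a small sketch. The choice of `max` for the common scale factor is an assumption; the text only states that the scale is derived from the 3D page extent and the input region extent, and using a single scale for both axes keeps content scale consistent across the page.

```python
def target_mesh_resolution(page_w3d, page_h3d, roi_w_px, roi_h_px):
    """Map the (scale-free) 3D page extent to a pixel grid comparable to the
    input region, preserving the 3D aspect ratio."""
    # One common scale for both axes keeps content scale consistent.
    scale = max(roi_w_px / page_w3d, roi_h_px / page_h3d)
    return int(round(page_w3d * scale)), int(round(page_h3d * scale))

# A 2:1 page imaged into a 100x80 pixel region gets a 160x80 target grid.
w, h = target_mesh_resolution(2.0, 1.0, 100, 80)   # -> (160, 80)
```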

Detect Single Page Boundaries

To help resolve the ambiguity between significant content edges and boundary edges, the text region detection results described in the above-referenced parent applications are used to seed the selection of page edge boundaries. The text-likely regions denoted by the rectangles roiThetaCornerQuads, computed using methods described in the above-referenced parent applications, represent regions within the image that contain strong text signatures. The extent of these regions is determined and then grown to identify the nearest edge boundary. In some cases, no edge region is identified before reaching the limits of the data. This can happen if the content boundary (cThetaMin, cThetaMax) encroaches on the edge regions because the whitespace between pages is small, or because of linear structures near page boundaries. An error checking procedure identifies cases when the search exceeds the data boundaries. The procedure then walks back into the data to identify the nearest edge region to the data boundary. If the new region identified by the newly identified edge (newThetaMinInterior, thetaMaxInterior) or (thetaMinInterior, newThetaMaxInterior) is still large (as defined by a percentage change in region size), then this region is used.

If the ratio of the gutter position between the page boundaries is within 0.4 to 0.6, for example, then the identified gutter is retained. Otherwise, the center prior is used. The result is then projected back into the original (x,y) image space.

Identify Page Curves

The top and bottom page curves can be split up into multiple edge contours by the edge detection method or when noise or low background contrast exists. This, along with shadow or content contours near the true page contour, complicates the identification of the page curves from edge data.

The first process trims the input set of contours to only contain segments that exist between the boundary lines and lie within the top and bottom curve regions. The top and bottom nomenclature is a convenient labeling of the opposing curve sets that matches the illustrations and human norms for bound book content, but it does not need to match the image capture conditions. In one aspect, the opposing sets' general directionality is known to be left/right or top/down. This follows the capture norms for bound content, for both single and opposing page captures, of filling the frame with the content of interest and orienting the camera so that the content fills the frame. This can be reinforced with user training on how best to capture content. Alternatively, the user can directly cue the readable orientation of the page through touch or menu/buttons. Alternatively, the method can automatically determine the orientation by examining the linear/curved structures and/or the text font orientation for text that can be recognized.

The boundary lines bLines are illustrated as lines **1500** and **1502**.

The boundary lines L**1**, L**2** are illustrated both with the long thin vertical lines and the thicker shorter horizontal lines. The extent of the shorter lines illustrates the marking of the content boundary by text region detection. The extent is defined by points p**0**, p**1** on L**1** and the points p**2**, p**3** on L**2**. Then, the line LC**1** is defined by the points ptLC**1**_**0**=0.9*p**0**+0.1*p**1** and ptLC**1**_**2**=0.9*p**2**+0.1*p**3** as a weighted average of the content boundary points, where p**0**, p**2** are closest to the top and p**1**, p**3** are closest to the bottom. Likewise, the line LC**2** is defined by the points ptLC**2**_**0**=0.9*p**1**+0.1*p**0** and ptLC**2**_**2**=0.9*p**3**+0.1*p**2** as a weighted average of the content boundary points. The top region is defined by boundary lines L**1**, L**2** and content boundary LC**1**, while the bottom region is defined by boundary lines L**1**, L**2** and content boundary LC**2**.

A first line style **1700** labels the curves that span between the boundary lines, a second **1702** labels curves that only intersect the first boundary (right), and a last **1704** labels curves that only intersect the second boundary (left).

The lines L**1**, L**2**, LC**1**, and LC**2** are used to trim the contours into the sets tContours**1** (top) and tContours**2** (bottom).

Additionally, the contours are marked with a complete label to indicate if the contour spans the space between L**1** and L**2**. This label is marked with reference designator **1700**. If a curve only intersects L**1** but not L**2**, then it is given leftlabel (**1704**), and if the curve only intersects L**2** but not L**1**, then it is given rightlabel (**1702**). These labels define subsets in the possible page contours to assist in the search for and generation of the best path.

In one aspect, the contour selection process jointly estimates the best path using labeled data from both top and bottom sets as well as the facing page if those sets also exist. Top and bottom paths that join or nearly join at the gutter for the bound document are favored over alternatives that are more disjoint.

In a different aspect, the top and bottom paths for each page are considered separately. Contours that are complete are favored over partial paths between the boundary lines. If a complete path exists, then the path with the lowest curvature is selected as the best path. If no complete path exists, the partial sets right and left are searched for a nearly complete path, where completeness is measured as the distance between the boundary lines, minus the distance between the contour endpoint and the boundary line, normalized by the distance between the boundary lines. Paths that are nearly complete have a completeness close to the value of 1. In one example, paths are considered nearly complete if they have a completeness greater than or equal to 0.95. If multiple nearly complete paths exist, then the best path is the one with the lowest curvature. If no path has been found, then composite paths built from right and left pairs are checked. Each pair is tested to determine whether the combined completeness (left plus right) is above a partial completeness threshold (e.g., 0.5), denoting a composite curve that spans a significant portion of the space between the boundary lines, and whether the composite completeness is less than a maximum threshold (e.g., 1.05), since an overlap between a pair indicates a completeness greater than one. Of the pairs that meet these two thresholds, a quality measure is calculated as the product of the residual Bezier curve fit (points in both left and right curves) and the mean curvature of the pair, over the combined completeness. The quality measure rewards pairs that can be fit to a simple smooth curve with low curvature, but penalizes curves with lower completeness scores or more complex shapes. The pair with the lowest quality measure is selected as the best composite path.
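The completeness measure and the complete-path preference can be sketched as below. Two simplifications are assumed for illustration: the boundary lines are reduced to vertical lines x = x_left and x = x_right, and the mean absolute turning angle stands in as the curvature measure.

```python
import numpy as np

def completeness(contour, x_left, x_right):
    """Fraction of the boundary-to-boundary span covered by the contour
    (contour: (N, 2) array of (x, y) points). A full span scores 1."""
    D = float(x_right - x_left)
    gap_l = max(contour[:, 0].min() - x_left, 0.0)   # shortfall at the left boundary
    gap_r = max(x_right - contour[:, 0].max(), 0.0)  # shortfall at the right boundary
    return (D - gap_l - gap_r) / D

def mean_curvature(c):
    """Mean absolute turning angle between successive segments (a proxy)."""
    d = np.diff(c, axis=0)
    ang = np.arctan2(d[:, 1], d[:, 0])
    return float(np.abs(np.diff(ang)).mean()) if len(ang) > 1 else 0.0

def pick_best_path(contours, x_left, x_right, near_complete=0.95):
    """Favor (nearly) complete paths; among those, lowest curvature wins."""
    scored = [(completeness(c, x_left, x_right), mean_curvature(c), i)
              for i, c in enumerate(contours)]
    full = [s for s in scored if s[0] >= near_complete]
    if full:
        return min(full, key=lambda s: s[1])[2]   # lowest curvature wins
    return max(scored, key=lambda s: s[0])[2]     # fall back to most complete

# A straight full-span contour beats a wiggly full-span one and a partial one.
straight = np.array([[0., 0.], [5., 0.], [10., 0.]])
wiggly = np.array([[0., 0.], [5., 3.], [10., 0.]])
partial = np.array([[0., 0.], [4., 0.]])
best = pick_best_path([wiggly, straight, partial], 0, 10)
```

The composite-pair search and the residual-fit term of the quality measure are omitted here for brevity.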

The curvatures of the contours are tested to differentiate shadow contours from the true alternative when multiple solutions exist. The shadow contours are often more highly curved than the true contour. Finally, the top and bottom paths are converted into point sets traversing from L**1** to L**2**.

Multi-Scale Curve Refinement

For efficiency reasons the page boundary and curve are found at the smallest scale. Searching the smallest scale reduces the number of operations required to identify the boundary and minimizes the response from small structures, but results in a highly quantized representation of the page curves. When the curve points are rescaled to full resolution and used for correction, the result can have residual artifacts. To compensate for the small scale curve estimate, a multi-scale refinement process is adopted. The result is a more accurate page curl correction with fewer residual artifacts while still maintaining computational efficiency.

The page curves are converted into point sets ptsC**1**Low and ptsC**2**Low, where C**1** represents the top curve in the previously presented illustrations and C**2** the bottom curve. The point set representations can have gaps if the edge map has gaps in the page curves due to noise or other factors. These gaps are filled by fitting the points to a curve after the multi-scale refinement.

The point sets ptsC**1**Low and ptsC**2**Low are integer point sets that match the small scale edge map. These point sets are scaled to the medium image scale, as described in the above-referenced parent applications, and stored as floating point values. The Laplacian of Gaussian (LoG) operator is applied to the medium scale luminance image imgLMed to produce a gradient image where the zero crossings represent edge locations. The LoG can be applied in multiple stages, as a Gaussian operator followed by a Laplacian or, in one aspect, as a single operator formed by convolving the two kernels. Even though the refinement operation is done at the medium scale, the use of a sub-pixel edge location technique, such as finding the zero crossing, essentially produces a higher resolution estimate than the medium resolution image would initially indicate. In practice, this approach produces a noticeable improvement in the correction quality using the refined curve estimates.
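The single-operator variant, in which the Gaussian and Laplacian kernels are pre-convolved, can be sketched with an analytic LoG kernel. The kernel radius and the naive convolution loop are illustrative choices, not the disclosed implementation.

```python
import numpy as np

def log_kernel(sigma=1.0, radius=4):
    """Analytic Laplacian-of-Gaussian kernel; one convolution with this kernel
    equals Gaussian smoothing followed by the Laplacian (up to truncation)."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r2 = x * x + y * y
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()          # zero-mean: flat image regions respond with 0

def convolve2d(img, k):
    """Naive 'same'-size convolution with edge padding, enough for a sketch."""
    r = k.shape[0] // 2
    pad = np.pad(img.astype(float), r, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += k[dy + r, dx + r] * pad[r + dy:r + dy + img.shape[0],
                                           r + dx:r + dx + img.shape[1]]
    return out

# A vertical step edge produces responses of opposite sign on either side;
# the zero crossing between them marks the edge location.
img = np.zeros((9, 20))
img[:, 10:] = 100.0
resp = convolve2d(img, log_kernel())
```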

Recall that the LoG operator produces a positive and a negative response as it iterates across an edge. The refinement technique uses the original estimate and the vanishing point intersected by the page boundary and gutter to define a search region for both the positive and negative LoG responses. If both the positive and negative responses are found, then a simple linear interpolation between them is used to identify the zero crossing for sub-pixel accuracy of edge localization. If the search does not identify both responses, then the original low resolution estimate is used.

In the current embodiment, the refinement process uses a local search to identify the positive and negative edge responses for each point. A search process is used because the refinement is run on skewed and warped data where the curves may not be aligned with the cardinal directions. For each point pt(k) in ptsMC**1** and ptsMC**2**, a region {i=pt(k).y−nS, . . . , pt(k).y+nS; j=pt(k).x−nS, . . . , pt(k).x+nS} surrounding the point is searched using a neighborhood defined by the parameter nS, for example, nS=3. Then, for a point pt(k) in the set of priors ptsMC**1** or ptsMC**2**, the rectangular region is searched to identify the locations and values of the minimum and maximum weighted response wLoG(i,j) given pt(k), denoted as pt_minLoG, pt_maxLoG and minWLoG, maxWLoG respectively, where wLoG(i,j) is defined as imgMLoG(i,j)/(1+dLine) and dLine is the distance between the point (i,j) and the line from pt(k) to the vanishing point. The weight penalizes pixels that are farther from the line connecting the prior point estimate and the vanishing point. Conversely, the measure favors large LoG values close to the vanishing line. If minWLoG<0 and maxWLoG>0, indicating that minimum and maximum LoG values have been found in the region, then the zero crossing is calculated as a linear interpolation between the min and max locations and added as the new value in ptsMC**1** or ptsMC**2**. Finally, the refined points are scaled to match the full image resolution, or the resolution used in the correction routines to remove page warp, perspective distortion, and skew.
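A sketch of the weighted search for a single prior point is given below, following the wLoG(i,j) = imgMLoG(i,j)/(1+dLine) weighting above. The toy gradient image and vanishing point in the usage example are fabricated for illustration only.

```python
import numpy as np

def point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    return abs(dx * (py - ay) - dy * (px - ax)) / ((dx * dx + dy * dy) ** 0.5)

def refine_point(log_img, pt, vp, nS=3):
    """Weighted zero-crossing search around prior estimate pt = (x, y);
    vp is the vanishing point guiding the search direction."""
    h, w = log_img.shape
    best_min = (0.0, None)   # (weighted value, (x, y) location)
    best_max = (0.0, None)
    for i in range(max(0, pt[1] - nS), min(h, pt[1] + nS + 1)):
        for j in range(max(0, pt[0] - nS), min(w, pt[0] + nS + 1)):
            wv = log_img[i, j] / (1.0 + point_line_distance((j, i), pt, vp))
            if wv < best_min[0]:
                best_min = (wv, (j, i))
            if wv > best_max[0]:
                best_max = (wv, (j, i))
    if best_min[1] is None or best_max[1] is None:
        return np.asarray(pt, float)           # keep the low-resolution prior
    (v0, p0), (v1, p1) = best_min, best_max
    t = -v0 / (v1 - v0)                        # zero crossing between the extrema
    return (1 - t) * np.asarray(p0, float) + t * np.asarray(p1, float)

# Toy LoG image whose sign changes at column 3 (a vertical edge); the
# vanishing point is placed far above so the guiding line is x = 3.
log_img = np.tile(np.arange(7) - 3.0, (7, 1))
refined = refine_point(log_img, (3, 3), (3, -100))
```

On this toy input the refined x coordinate lands on the zero crossing at x = 3.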

Piecewise Bezier Curve Fit

In practice, it has been found that fitting a single Bezier curve to the points is insufficient to accurately model the shape of the true curve. For face-up camera scanning of bound content, the shape of the page typically conforms to a curve that is better modeled with two Bezier segments.

The point sets are then used to fit a Bezier curve. The fitting process is run independently for each set. Because the fitting process is run independently, the curve continuity at the split point is only C**0**, which can result in a visual crease at the split point. A post-processing routine is applied to the Bezier control points after the fitting process to ensure C**1** continuity.

The result of the routine is a piecewise Bezier curve with 7 control points and C**1** continuity. The fitting process is applied for both the top and bottom page curves and for each page. Denoting the curves as Ctop and Cbottom for facing pages, the curves are further denoted as Ctop**1**, Cbottom**1** for left page and Ctop**2**, Cbottom**2** for right page.
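One way to realize such a C**1** post-process, assuming two cubic segments with equal parameter ranges and a shared join point, is sketched below. The symmetric tangent-averaging rule is an assumption; the text does not specify the exact control point adjustment.

```python
import numpy as np

def enforce_c1(bez_a, bez_b):
    """Adjust two cubic Bezier segments sharing a join point so the tangent is
    continuous (C1) across the join. bez_a, bez_b: (4, 2) arrays of control
    points with bez_a[3] == bez_b[0]; equal parameter ranges are assumed."""
    a, b = bez_a.astype(float).copy(), bez_b.astype(float).copy()
    join = a[3]
    # Average the incoming and outgoing tangents, then reflect about the join
    # so that a[3] - a[2] == b[1] - b[0].
    v = 0.5 * ((a[3] - a[2]) + (b[1] - b[0]))
    a[2] = join - v
    b[1] = join + v
    return a, b

# Two segments meeting at (3, 0) with mismatched tangents get smoothed.
a0 = np.array([[0., 0.], [1., 2.], [2., 1.], [3., 0.]])
b0 = np.array([[3., 0.], [4., 2.], [5., 0.], [6., 0.]])
a1, b1 = enforce_c1(a0, b0)
```

Since the join point is shared, the two adjusted segments together have the 7 distinct control points noted above.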

Adaptive Ruled Mesh from Piecewise Bezier Curves

The page extent is bounded by the top and bottom curves and the boundary lines L**1** and L**2**. The description of the curves generates the ruled mesh decomposition of the warped page. The ruled mesh is a piecewise planar approximation of the three dimensional page surface, projected into the image plane. The ruled mesh assumes that the three dimensional page is a developable surface (a surface that can be flattened onto a plane) where, at any point on the surface, only one of the principal curvatures is non-zero.

The curves marked as **700** and **702** respectively denote the top and bottom page curves. The end points of these curves intersect the page boundary lines L**1**, L**2**, which intersect at the page's vanishing point. The vertical ruled lines intersect the curves and also intersect the vanishing point, as illustrated by the dashed lines **704** through **722**. Finally, the polygonal curves **724** through **738** are the straight line traces on the 3D surface that, when distorted by page curl and projected, fill in the mesh. The mesh illustrates the distortion of the flat page in the projected image and also illustrates the deviation from the desired flat page scan, which would have produced a rectilinear grid.

The first step in generating the ruled mesh is to approximate the Bezier curves Ctop and Cbottom with polygonal curves. The piecewise Bezier curves are first polygonized by generating points on the curve using a small delta t value to walk the curve representation. In one aspect, the value is 0.001. Then, the polygonal representation is simplified to reduce the number of points using OpenCV's implementation of the Douglas-Peucker polygon simplification algorithm. In one aspect, the fit epsilon is 0.5 pixels. The result is a polygonal representation of each curve that approximates the curve with more points in areas of high curvature. These areas also correspond to areas of high curvature on the 3D surface of the page. The polygonal approximations approxCtop and approxCbottom to the curves are used to guide the adaptive subdivision of the surface, where more planes are used in areas where the page contours are quickly changing, and fewer in other areas.
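The polygonization and simplification can be sketched as follows. A plain recursive Douglas-Peucker stands in for OpenCV's `cv2.approxPolyDP`, and the example control points are fabricated for illustration.

```python
import numpy as np

def cubic_bezier(ctrl, ts):
    """Evaluate a cubic Bezier (4x2 control points) at parameters ts."""
    t = ts[:, None]
    c = np.asarray(ctrl, float)
    return ((1 - t) ** 3 * c[0] + 3 * (1 - t) ** 2 * t * c[1]
            + 3 * (1 - t) * t ** 2 * c[2] + t ** 3 * c[3])

def douglas_peucker(pts, eps):
    """Recursive Douglas-Peucker simplification: keep the farthest point from
    the end-to-end chord while it deviates by more than eps, so more points
    survive where curvature is high."""
    a, b = pts[0], pts[-1]
    ab = b - a
    L = np.linalg.norm(ab)
    if L == 0:
        d = np.linalg.norm(pts - a, axis=1)
    else:
        d = np.abs(ab[0] * (pts[:, 1] - a[1]) - ab[1] * (pts[:, 0] - a[0])) / L
    k = int(np.argmax(d))
    if d[k] <= eps or len(pts) < 3:
        return np.array([a, b])
    left = douglas_peucker(pts[:k + 1], eps)
    right = douglas_peucker(pts[k:], eps)
    return np.vstack([left[:-1], right])   # drop the duplicated split point

# Polygonize with a delta t of 0.001, then simplify with eps of 0.5 pixels.
ts = np.linspace(0.0, 1.0, 1001)
poly = cubic_bezier(np.array([[0, 0], [100, 80], [200, 80], [300, 0]]), ts)
approx = douglas_peucker(poly, eps=0.5)
```

The simplified polygon keeps the curve endpoints and only a small fraction of the 1001 sampled points.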

In some cases the fit curves extend past the boundary lines L**1**, L**2**, as illustrated. This happens most often for documents with high gutter curvatures, or for documents with shadows or noise near the page contour. To compensate for this mis-estimation, the curves are trimmed using the polygonal approximation by rejecting points on the polygonal approximation that lie outside of the region defined by the two boundary lines L**1** and L**2**. The result is a trimmed curve approxCtopTrim and/or approxCbottomTrim, such as the one illustrated by the dashed line. If the curve has been trimmed, which is indicated by the shorter length (number of points), then it replaces the polygonal approximation: approxCtop=approxCtopTrim if length(approxCtopTrim)<length(approxCtop), and approxCbottom=approxCbottomTrim if length(approxCbottomTrim)<length(approxCbottom).

An approximation with more points tends to indicate a curve with more curvature and is used in the algorithm to guide the vertical subdivision of the ruled Mesh.

If (length(approxCtop)>length(approxCbottom)), then approxCtop is used to subdivide the mesh. Otherwise, approxCbottom is used. The vertical subdivision (lines **704** through **722**) is created by intersecting the opposing curve with lines defined by each point of the chosen polygonal approximation and the vanishing point.

When the page's boundary lines L**1** and L**2** are nearly parallel, the calculations are modified to create the intersection lines using the point on the curve and the average direction of L**1** and L**2**. This is done to improve numerical stability when the vanishing point is ill-defined or degenerates into a vanishing direction.

Traces **724**-**738** are generated from L**1** to L**2** by first uniformly quantizing L**1** between the two curves with equal line segments. In one aspect, ten line segments are used. Then, the horizontal vanishing point for the first plane is calculated using the line defined by T**1**, T**2** and the line defined by B**1**, B**2**, as illustrated. The intersection technique checks for ill-defined conditions, which indicate that the lines are nearly parallel. When this is the case, the vanishing point is replaced by a vanishing direction that is the average of the two line directions. The traces **724**-**738** for the first plane, defined by L**1** and first line **704**, are calculated by intersecting the line **704** with the lines defined by the joint points of the quantized L**1** line segments and the horizontal vanishing direction (as illustrated by the arrows). The process is repeated for each line pair, starting with the horizontal line intersection points of the prior vertical line, until L**2** is reached.
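The vanishing-point intersection with the nearly-parallel fallback described above can be sketched as follows. This is an illustrative numpy sketch (the function name and the placement of the 0.5 degree tolerance are assumptions based on the text): it returns either a finite vanishing point or an averaged vanishing direction.

```python
import numpy as np

def vanishing_point(a0, a1, b0, b1, angle_tol_deg=0.5):
    """Intersect two lines to get a vanishing point; if they are within
    angle_tol_deg of parallel, return an averaged vanishing direction instead."""
    da = np.asarray(a1, float) - np.asarray(a0, float)
    db = np.asarray(b1, float) - np.asarray(b0, float)
    da /= np.linalg.norm(da)
    db /= np.linalg.norm(db)
    if np.degrees(np.arccos(np.clip(abs(da @ db), 0.0, 1.0))) < angle_tol_deg:
        avg = da + np.sign(da @ db) * db        # average line direction
        return avg / np.linalg.norm(avg), False
    homo = lambda p: np.array([p[0], p[1], 1.0])
    la, lb = np.cross(homo(a0), homo(a1)), np.cross(homo(b0), homo(b1))
    vp = np.cross(la, lb)                       # homogeneous line intersection
    if abs(vp[2]) < 1e-12 or not np.all(np.isfinite(vp)):
        avg = da + np.sign(da @ db) * db
        return avg / np.linalg.norm(avg), False
    return vp[:2] / vp[2], True
```

The boolean flag distinguishes a true vanishing point from a vanishing direction, mirroring the ill-conditioned checks in the text.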

The process to estimate the camera focal length uses the planar strips defined by the ruled mesh to produce multiple estimates for the focal length. The normal estimated from a planar strip can have numerical stability issues because of the inaccuracy of the horizontal vanishing point estimated from the narrow strip. To compensate for this possible inaccuracy, the focal length is estimated for each planar strip. For each strip, the horizontal line segments in the curve approximation (T**1**,T**2** and B**1**,B**2** for the first strip) are checked to see if they are nearly parallel, indicating that numerical stability issues will arise for the vanishing point estimate. In one example, if the line directions are within 0.5 degrees, then the lines are considered parallel. In addition to the direction test, the calculated vanishing point is checked to ensure all the values are finite. If the lines are nearly parallel or the result is ill-conditioned, then the strip is ignored for estimating the focal length. However, if both the horizontal and vertical vanishing points exist and the dot product between them is negative, then a focal length estimate is calculated and cached in a list of estimates. The vector form for calculating the focal length is f=(−(vpV·vpH))^{1/2}, where vpV and vpH correspond to the vertical and horizontal vanishing points, respectively. This is repeated for each planar strip in the ruled mesh. If the list of estimates is empty, then a default focal length is used. In one example, the default focal length is the focal length that matches a 60 degree horizontal field-of-view. In an alternative aspect, the default focal length can be the focal length determined from a prior capture, or the average focal length determined from prior estimations. If the focal length estimate list has valid entries, then the median value is chosen as the focal length estimate. In an alternative aspect, the average focal length is used. Let fest denote the estimated focal length value in the analysis presented below.
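The per-strip voting scheme above can be sketched as follows. This is a minimal pure-Python illustration (the function name and argument layout are assumptions): it applies f = sqrt(−(vpV·vpH)), skips ill-conditioned strips, takes the median, and falls back to a 60 degree horizontal field-of-view default.

```python
import math

def estimate_focal_length(vp_pairs, image_width):
    """Median of per-strip estimates f = sqrt(-(vpV . vpH)); strips whose
    vanishing points are missing, non-finite, or give a non-negative dot
    product are skipped. Falls back to a 60-degree-HFOV default."""
    estimates = []
    for vpH, vpV in vp_pairs:
        if vpH is None or vpV is None:
            continue
        dot = vpH[0] * vpV[0] + vpH[1] * vpV[1]
        if not math.isfinite(dot) or dot >= 0:
            continue
        estimates.append(math.sqrt(-dot))
    if not estimates:
        # Default focal length matching a 60-degree horizontal field of view.
        return (image_width / 2.0) / math.tan(math.radians(30.0))
    estimates.sort()
    mid = len(estimates) // 2
    return estimates[mid] if len(estimates) % 2 else 0.5 * (estimates[mid - 1] + estimates[mid])
```

The median makes the estimate robust to a few badly conditioned strips, which is the stated motivation for producing multiple estimates.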

The normal for each strip is estimated next. For each strip k in the ruled mesh, the vectors from the camera focal point to the vanishing points, h=(vpH.x, vpH.y, fest) and v=(vpV.x, vpV.y, fest), are generated. The same existence conditions mentioned for the focal length estimate are used to decide whether each vanishing point exists. If a vanishing point does not exist, the normal is assumed to be aligned with the optical axis (n_{k}=(0,0,1)). If h and v exist, then the normal for the strip is defined as the normalized cross product n_{k}=(h×v)/∥h×v∥. The normal values are checked to ensure they are finite. If for some reason they are ill-conditioned, then the default normal (n_{k}=(0,0,1)) is used for the strip.
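The normal estimation above can be sketched as follows; a minimal numpy illustration (the function name is an assumption) with the optical-axis fallback from the text.

```python
import numpy as np

def strip_normal(vpH, vpV, fest):
    """Per-strip surface normal from the two vanishing points; defaults to
    the optical axis (0, 0, 1) when a vanishing point is missing or degenerate."""
    default = np.array([0.0, 0.0, 1.0])
    if vpH is None or vpV is None:
        return default
    h = np.array([vpH[0], vpH[1], fest])   # ray toward the horizontal VP
    v = np.array([vpV[0], vpV[1], fest])   # ray toward the vertical VP
    n = np.cross(h, v)
    norm = np.linalg.norm(n)
    if norm < 1e-12 or not np.all(np.isfinite(n)):
        return default
    return n / norm
```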

Once each strip's normal is estimated, the ruled mesh points for each strip are projected out into space to build a 3D surface of the warped page. For unconstrained single-view observations of unknown originals, the dimensions of the document are not known and can only be reconstructed up to a scale factor. Because of this, the method chooses an arbitrary scale value by enforcing that the first planar strip must contain the point P**0**=(0, 0, 10*fest). The implicit form of the strip plane is n_{k}·pt+d_{k}=0, where n_{k} is the strip normal, pt=(x,y,z) is a point in the plane, and d_{k}=−P**0**·n_{k} for k=0, the leftmost strip. Once the strip's planar equation is defined, the ruled mesh points q_{i,j}=(x_{i,j},y_{i,j}) for the strip, i=0, . . . , Max(i) and j=0,1, are transformed into rays r_{i,j}=(x_{i,j},y_{i,j},fest), which are then intersected with the plane. The intersections compute the corresponding 3D mesh points p_{i,j} on the 3D piecewise planar document surface.

Once the first strip j=0 has been intersected, the adjacent strip j=1 is computed by ensuring that the adjacent plane intersects all the 3D mesh points on the boundary p_{i,j=1}, i=0, . . . , Max(i), between the two planes. To do this, a 3D point from the prior intersection is picked from the boundary to fit the implicit form, ensuring that the new plane contains the boundary points from the prior strip's intersections with the 3D piecewise planar document surface. In one aspect, the implicit form for each strip is n_{j}·pt+d_{j}=0, where n_{j} is the j-th strip normal, pt=(x,y,z) is a point in the plane, d_{j}=−p_{0,j}·n_{j} for j=1 . . . Q, and Q=Max(j)−1 is the maximum index for the ruled mesh (the number of planar strips) minus one. Then, for each strip j, the ruled mesh points q_{i,j+1}=(x_{i,j+1},y_{i,j+1}) are used to define rays r_{i,j+1}=(x_{i,j+1},y_{i,j+1},fest) for i=0 . . . Max(i), which are intersected with the strip's plane equation n_{j}·pt+d_{j}=0 to produce the points p_{i,j+1}. This process continues until all the rays defined by the ruled mesh points have been intersected with the corresponding plane equation to produce MeshPoints3D MP={p_{i,j} | i=0, . . . , Max(i) and j=0, . . . , Max(j)}. The mesh is a piecewise planar approximation to the warped document surface that preserves the relative width and height of the content.
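The ray-plane intersection that lifts the 2D mesh points onto the 3D surface can be sketched as follows; a minimal numpy illustration (names are assumptions) that anchors the first strip plane so it contains P**0**=(0, 0, 10*fest).

```python
import numpy as np

def intersect_ray_plane(q, fest, n, d):
    """Project 2D mesh point q through the camera center onto plane n.pt + d = 0."""
    r = np.array([q[0], q[1], fest])   # ray r_{i,j} through the pixel
    t = -d / (n @ r)                   # solve n.(t*r) + d = 0 for t
    return t * r

fest = 500.0
n0 = np.array([0.0, 0.0, 1.0])                 # strip normal (here: fronto-parallel)
d0 = -np.array([0.0, 0.0, 10 * fest]) @ n0     # d_k = -P0 . n_k anchors the scale
p = intersect_ray_plane((100.0, 50.0), fest, n0, d0)
# Adjacent strips reuse a shared boundary point: d_j = -p0j . n_j.
```

With the fronto-parallel normal chosen here, every pixel is simply pushed out to depth 10*fest, which makes the scale-anchoring convention easy to check.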

Then, the width and height of each cell in MeshPoints3D are calculated by computing the length of the top side, width_{i,j}=∥p_{i,j+1}−p_{i,j}∥, and the left side, height_{i,j}=∥p_{i+1,j}−p_{i,j}∥, for i=0, . . . , Max(i)−1 and j=0, . . . , Max(j)−1. The width and height arrays are used as a basis for the 2D target mesh for the flattened page.

Then, to determine the scale factor for the target mesh, the overall sum of the width and height arrays is computed, giving the overall 3D page width and height.

In one aspect, the bounding box of the ruledMesh q_{i,j} is computed to determine the extent of the input region: inW=max(q_{i,j}.x)−min(q_{i,j}.x) and inH=max(q_{i,j}.y)−min(q_{i,j}.y), where x,y correspond to the x or y component of the point. The scale factor is then determined from the input extent (inW, inH) and the overall 3D page width and height.

In another aspect, inW and inH are determined jointly for the left and right pages. Setting them to the same values ensures both corrected pages will have the same output resolution. In one aspect, inW and inH are determined jointly for all the pages scanned for a job, to determine a fixed uniform output size for all the pages in the job.

Finally, the targetMesh is computed for the x values using the form t_{i,j}.x=round(scalef*width_{0,j−1})+t_{i,j−1}.x for i=0, . . . , Max(i) and j=1, . . . , Max(j), with t_{i,0}.x=0, and for the y values using t_{i,j}.y=round(scalef*height_{i−1,0})+t_{i−1,j}.y for i=1, . . . , Max(i) and j=0, . . . , Max(j), with t_{0,j}.y=0. The targetMesh is in units of pixels and is used in the image rectification routines to dewarp the image content.
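The cumulative-sum construction of the targetMesh can be sketched as follows; an illustrative numpy sketch (function name assumed) in which, per the text, cell x offsets come from the top-row widths and y offsets from the left-column heights.

```python
import numpy as np

def build_target_mesh(width, height, scalef):
    """Rectilinear target mesh (pixels): x from cumulative top-row widths
    width[j] ~ width_{0,j}; y from cumulative left-column heights height[i] ~ height_{i,0}."""
    ni, nj = len(height) + 1, len(width) + 1
    t = np.zeros((ni, nj, 2), dtype=int)
    for j in range(1, nj):
        # x accumulates the scaled, rounded cell widths along the top row.
        t[:, j, 0] = round(scalef * width[j - 1]) + t[:, j - 1, 0]
    for i in range(1, ni):
        # y accumulates the scaled, rounded cell heights along the left column.
        t[i, :, 1] = round(scalef * height[i - 1]) + t[i - 1, :, 1]
    return t
```

Using only the top-row widths and left-column heights makes every target cell a rectangle, which is what the piecewise perspective rectification expects.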

The piecewise rectification routine maps each pixel in the quadrilateral defined by the ruledMesh to the corresponding region in the targetMesh using standard perspective transform methods. In one aspect, OpenCV's four point method getPerspectiveTransform is used to calculate the perspective transform matrix. Then, OpenCV's dewarp routine warpPerspective is used to interpolate the input image pixels to the dewarped image. In an alternative aspect, each strip in the ruledMesh is interpolated from the input image to the dewarped image using the quadrilateral defined by the strip in the ruledMesh and the corresponding rectangle in the targetMesh.
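The four-point perspective mapping per mesh cell can be sketched as follows. Rather than calling OpenCV, this illustrative sketch (function names assumed) solves the same 8-unknown linear system that getPerspectiveTransform solves, so it can be checked without the library.

```python
import numpy as np

def perspective_transform(src, dst):
    """Solve the 8-unknown homography mapping 4 src points onto 4 dst points
    (the same system OpenCV's getPerspectiveTransform solves)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, p):
    """Apply homography H to a 2D point p (homogeneous divide)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

In the full routine, src would be a quadrilateral cell of the ruledMesh and dst the corresponding targetMesh rectangle, with warpPerspective (or equivalent resampling) interpolating the pixels.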

In summary, a multi-scale page flattening technique is disclosed, with a polar-projected edge map and text detection for book page boundary and gutter detection. Book page contour selection is achieved via contour grouping and curve completeness and fitting metrics. The book multi-scale contour refinement uses opposing vanishing points and piecewise Bezier curves to describe book contours for correction and interaction. An adaptive ruled mesh uses polygonal simplification of the bounding curves, and the camera's focal length is calculated from the page curl. Piecewise planar approximations of the page normals are calculated for warped pages, and the relative text height is preserved in dewarping using mathematics that handle ill-conditioned cases robustly.

The method starts at Step **2200**.

Step **2202** accepts a camera image of a page. Typically, the focal length of the camera is unknown. Step **2204** creates a filtered edge map and identifies text-likely regions. Step **2206** projects the filtered edge map and text-likely regions into a polar coordinate system to determine page lines and warped image page curves. Step **2208** creates an adaptive 2D ruled mesh piecewise planar approximation of a warped page surface. Step **2210** estimates the camera focal length. Step **2212** creates a 3D model using the adaptive 2D ruled mesh and camera focal length estimate. Using the 3D model, Step **2214** creates a 2D target mesh rectifying the image of the page.

In a different aspect, subsequent to Step **2206**, Step **2207***a***1** histograms an edge map into theta bins, and computes minimum and maximum theta angles for text-likely regions. Step **2207***a***2** localizes a first page line as the closest significant edge bin less than minimum text-likely region theta angle and a second page line as the closest significant edge bin greater than maximum text-likely region theta angle. Step **2207***a***3** identifies page gutters for images of adjacent pages, where the gutter is determined as the closest significant theta edge bin to the theta angle halfway between the minimum text-likely region theta angle and the maximum text-likely region angle.
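The theta-histogram localization of Steps **2207***a***1**-**2207***a***2** can be sketched as follows; the significance threshold (a fraction of the histogram peak) is an assumption, as the text does not define "significant".

```python
import numpy as np

def localize_page_lines(edge_thetas, text_thetas, nbins=180, frac=0.1):
    """Histogram edge angles into theta bins; the page lines are the closest
    significant bins outside the [min, max] text-region angle range."""
    hist, edges = np.histogram(edge_thetas, bins=nbins, range=(0.0, 180.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    signif = hist >= frac * hist.max()          # assumed significance test
    tmin, tmax = min(text_thetas), max(text_thetas)
    left = centers[signif & (centers < tmin)]
    right = centers[signif & (centers > tmax)]
    L1 = left.max() if left.size else None      # closest significant bin < tmin
    L2 = right.min() if right.size else None    # closest significant bin > tmax
    return L1, L2
```

The gutter of an adjacent-page image would be found the same way, searching near the angle halfway between tmin and tmax.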

In another aspect, Step **2207***a***4** filters curves identified as not being between page boundary lines, and divides remaining curves into a top set defined by curves that are within the region defined by boundary lines L**1**, L**2**, and LC**1**, and a bottom set defined by curves that are within the region defined by the boundary lines L**1**, L**2**, and LC**2**, where L**1** is a left boundary, L**2** is a right boundary, LC**1** is a top boundary, and LC**2** is a bottom boundary. Step **2207***a***5** selects the top page curve as the top set curve with the highest completeness and lowest curvature between the boundary lines L**1** and L**2**. Step **2207***a***6** selects the bottom page curve as the bottom set curve with the highest completeness and lowest curvature between the boundary lines L**1** and L**2**. Step **2207***a***7** processes an image of adjacent pages, where each page is divided into upper and lower curve sets using boundary lines and gutter, and upper and lower curve lines are favored that join at the gutter.

In one aspect, subsequent to Step **2207***a***7**, Step **2207***b***1** performs a multi-scale edge localization refinement for each page curve using the following substeps. Step **2207***b***2** transforms top and bottom page curves into point sets having a first resolution. Step **2207***b***3** scales the point sets to a second resolution, lower than the first resolution. Step **2207***b***4** creates a gradient image of the second resolution point sets. Step **2207***b***5** identifies zero crossing points in the gradient image as a page edge. Step **2207***b***6** scales the zero crossing points to the first resolution.
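A one-dimensional sketch of Steps **2207***b***2** through **2207***b***6** is shown below; it interprets "zero crossings in the gradient image" as sign changes of the gradient's derivative (a common edge-localization convention; the 2D details and the scale factor are assumptions).

```python
import numpy as np

def refine_edge_1d(profile, scale=4):
    """Multi-scale refinement sketch: downscale, locate gradient-based
    zero crossings, then map the positions back to full resolution."""
    coarse = profile[::scale].astype(float)     # second, lower resolution
    grad = np.gradient(coarse)                  # gradient image (1-D)
    curv = np.gradient(grad)                    # derivative of the gradient
    # Sign changes of curv bracket a gradient extremum, i.e. the edge.
    zc = np.where(np.signbit(curv[:-1]) != np.signbit(curv[1:]))[0]
    return zc * scale                           # scale back to first resolution
```

The mapped positions are only accurate to within one coarse step; in the full method the refinement at the first resolution recovers the remaining precision.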

In another aspect, subsequent to Step **2207***b***6**, Step **2207***b***7** independently fits a pair of warped image page curve sections to a corresponding pair of Bezier curves.
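The independent Bezier fitting of Step **2207***b***7** can be sketched as a linear least-squares problem; the chord-length parameterization below is a common choice that the text does not specify, and the endpoints are left free here.

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares fit of one cubic Bezier to an ordered point set,
    using an assumed chord-length parameterization."""
    pts = np.asarray(points, float)
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = (d / d[-1])[:, None]                    # chord-length parameters in [0, 1]
    # Bernstein basis matrix for the cubic Bezier.
    B = np.hstack([(1 - t) ** 3, 3 * (1 - t) ** 2 * t,
                   3 * (1 - t) * t ** 2, t ** 3])
    ctrl, *_ = np.linalg.lstsq(B, pts, rcond=None)
    return ctrl                                 # 4 control points, shape (4, 2)
```

Each of the two warped page curve sections would be fitted independently with a call like this, yielding the piecewise Bezier description used downstream.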

In one aspect, creating the adaptive 2D ruled mesh in Step **2208** includes creating a plurality of planar strips and estimating the camera focal length in Step **2210** includes independently estimating a camera focal length for each planar strip. Then, creating the 3D model in Step **2212** includes projecting the adaptive 2D ruled mesh onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh. In a different aspect, Step **2210** uses the adaptive 2D ruled mesh of the warped page surface to estimate the camera focal length.

For example, the adaptive 2D ruled mesh may be created using a polygonal approximation of the Bezier curves and polygonal simplification, with a fit tolerance (fitTH) of 0.5 pixels, to assist in generating the adaptive 2D ruled mesh.

In one aspect, creating the 2D target mesh in Step **2214** includes determining a resolution of a 2D target mesh in response to scaling 3D mesh width and height values, where the scale factor is determined by measuring an overall 3D page width and height and an input region width and height. In another aspect, rectifying the image of the page in Step **2214** includes using a perspective dewarp to interpolate image values from the adaptive 2D ruled mesh into the image defined by the 2D target rectilinear mesh.

A system and method have been provided for correcting a warped page camera image. Examples of particular mathematical techniques have been presented to illustrate the invention. However, the invention is not limited to merely these examples. Other variations and embodiments of the invention will occur to those skilled in the art.

## Claims

1. A method for the correction of a warped page image, the method comprising:

- accepting a camera image of a page;

- creating a filtered edge map and identifying text-likely regions;

- projecting the filtered edge map and text-likely regions into a polar coordinate system to determine page lines and warped image page curves;

- subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, histogramming the filtered and projected edge map into theta bins, computing minimum and maximum theta angles for text-likely regions, localizing a first page line as a closest significant edge bin less than minimum text-likely region theta angle and a second page line as a closest significant edge bin greater than maximum text-likely region theta angle;

- creating an adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface;

- using the created adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface for estimating a camera focal length;

- creating a three-dimensional (3D) model using the adaptive 2D ruled mesh and camera focal length estimate; and,

- using the 3D model, creating a 2D target mesh rectifying the image of the page; wherein rectifying the image of the page includes using a perspective dewarp to interpolate image values from the adaptive 2D ruled mesh into the image defined by the 2D target rectilinear mesh.

2. The method of claim 1 further comprising:

- subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, performing a multi-scale edge localization refinement for each determined page curve.

3. The method of claim 2 wherein performing the multi-scale refinement of each determined page curve includes:

- transforming top and bottom page curves into point sets having a first resolution;

- scaling the point sets to a second resolution, lower than the first resolution;

- creating a gradient image of the second resolution point sets;

- identifying zero crossing points in the gradient image as a page edge; and,

- scaling the zero crossing points to the first resolution.

4. The method of claim 1 further comprising:

- subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, independently fitting a pair of warped image page curve sections to a corresponding pair of Bezier curves.

5. The method of claim 1 wherein creating the adaptive 2D ruled mesh includes creating a plurality of planar strips; and,

- estimating a camera focal length includes independently estimating a camera focal length for each planar strip.

6. The method of claim 5 wherein creating the adaptive 2D ruled mesh includes using a polygonal approximation of Bezier curves and polygonal simplification, with a fit tolerance (fitTH) of 0.5 pixels, to assist in creating the adaptive 2D ruled mesh.

7. The method of claim 5 wherein creating the 3D model includes projecting the adaptive 2D ruled mesh onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh.

8. The method of claim 1 wherein estimating the camera focal length includes using the adaptive 2D ruled mesh of the warped page surface.

9. The method of claim 1 wherein creating the 2D target mesh includes determining a resolution of a 2D target mesh in response to scaling 3D mesh width and height values, where the scaling is determined by measuring an overall 3D page width and height and an input region width and height.

10. The method of claim 1, further comprising:

- identifying page gutters for images of adjacent pages, where each gutter is determined as the closest significant theta edge bin to a theta angle halfway between the minimum text-likely region theta angle and the maximum text-likely region angle.

11. The method of claim 1, further comprising determining a contour region of the page by:

- filtering curves not between page boundary lines, and dividing remaining curves into a top set defined by curves that are within the region defined by boundary lines L1, L2, and LC1, and a bottom set defined by curves that are within the region defined by the boundary lines L1, L2, and LC2, where L1 is a left boundary, L2 is a right boundary, LC1 is a top boundary, and LC2 is a bottom boundary;

- selecting a top page curve as a top set curve with highest completeness and lowest curvature between the boundary lines L1 and L2; and,

- selecting a bottom page curve as a bottom set curve with the highest completeness and lowest curvature between the boundary lines L1 and L2.

12. The method in claim 11 further comprising determining contour regions on two adjacent pages of a book, by:

- processing an image of adjacent pages, where each page is divided into upper and lower curve sets using boundary lines and gutter, and upper and lower curve lines are favored that join at the gutter.

13. A system for the correction of a warped page image, the system comprising:

- a non-transitory memory;

- a processor;

- an interface to accept a camera image of a page, and to supply a rectified image of the page;

- an image correction application stored in the memory and enabled as a sequence of processor executable steps for: creating a filtered edge map and identifying text-likely regions; projecting the filtered edge map and text-likely regions into a polar coordinate system to determine page lines and warped image page curves; subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, histogramming the filtered and projected edge map into theta bins, computing minimum and maximum theta angles for text-likely regions, and localizing a first page line as a closest significant edge bin less than minimum text-likely region theta angle and a second page line as a closest significant edge bin greater than maximum text-likely region theta angle; creating an adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface; using the created adaptive two-dimensional (2D) ruled mesh piecewise planar approximation of a warped page surface for estimating a camera focal length; creating a three-dimensional (3D) model using the adaptive 2D ruled mesh and camera focal length estimate; and, using the 3D model, creating a 2D target mesh rectifying the image of the page; wherein the image correction application rectifies the image of the page using a perspective dewarp to interpolate image values from the adaptive 2D ruled mesh into the image defined by the 2D target rectilinear mesh.

14. The system of claim 13 wherein the image correction application, subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, performs a multi-scale edge localization refinement for each determined page curve.

15. The system of claim 14 wherein the image correction application performs the multi-scale refinement of the determined page curves by:

- transforming top and bottom page curves into point sets having a first resolution;

- scaling the point sets to a second resolution, lower than the first resolution;

- creating a gradient image of the second resolution point sets;

- identifying zero crossing points in the gradient image as a page edge; and,

- scaling the zero crossing points to the first resolution.

16. The system of claim 13 wherein the image correction application, subsequent to projecting the filtered edge map and text-likely regions into the polar coordinate system, independently fits a pair of warped image page curve sections to a corresponding pair of Bezier curves.

17. The system of claim 13 wherein the image correction application creates the adaptive 2D ruled mesh with a plurality of planar strips, and independently estimates a camera focal length for each planar strip.

18. The system of claim 17 wherein the image correction application creates the adaptive 2D ruled mesh using a polygonal approximation of the Bezier curves and polygonal simplification, with a fit tolerance (fitTH) of 0.5 pixels, to assist in creating the adaptive 2D ruled mesh.

19. The system of claim 17 wherein the image correction application creates the 3D model by projecting the adaptive 2D ruled mesh onto a 3D warped page surface using the estimated camera focal length and an estimated surface normal of each planar strip from the adaptive 2D ruled mesh.

20. The system of claim 17 wherein the image correction application further creates the 2D target mesh with the plurality of planar strips by determining a resolution of a 2D target mesh in response to scaling 3D mesh width and height values, where the scaling is determined by measuring an overall 3D page width and height and an input region width and height.

21. The system of claim 13 wherein the image correction application estimates the camera focal length using the adaptive 2D ruled mesh of the warped page surface.

22. The system of claim 13 wherein the image correction application identifies page gutters for images of adjacent pages, where each gutter is determined as the closest significant theta edge bin to a theta angle halfway between the minimum text-likely region theta angle and the maximum text-likely region angle.

23. The system of claim 13, further comprising determining a contour region of the page by:

- the image correction application filtering curves not between page boundary lines, and dividing remaining curves into a top set defined by curves that are within the region defined by boundary lines L1, L2, and LC1, and a bottom set defined by curves that are within the region defined by the boundary lines L1, L2, and LC2, where L1 is a left boundary, L2 is a right boundary, LC1 is a top boundary, and LC2 is a bottom boundary; and,

- wherein the image correction application selects a top page curve as a top set curve with highest completeness and lowest curvature between the boundary lines L1 and L2, and selects a bottom page curve as a bottom set curve with the highest completeness and lowest curvature between the boundary lines L1 and L2.

24. The system of claim 23, further comprising determining contour regions on two adjacent pages of a book, by:

- the image correction application processing an image of adjacent pages, where each page is divided into upper and lower curve sets using boundary lines and gutter, and upper and lower curve lines are favored that join at the gutter.

## Referenced Cited

#### U.S. Patent Documents

| Patent/Publication No. | Date | Inventor(s) |
| --- | --- | --- |
| 5616914 | April 1, 1997 | Matsuda |
| 5760925 | June 2, 1998 | Saund et al. |
| 5764383 | June 9, 1998 | Saund et al. |
| 5774237 | June 30, 1998 | Nako |
| 6330050 | December 11, 2001 | Takahashi et al. |
| 7001024 | February 21, 2006 | Kitaguchi et al. |
| 7283655 | October 16, 2007 | Cline |
| 7283983 | October 16, 2007 | Dooley |
| 7324706 | January 29, 2008 | Bassi |
| 7463772 | December 9, 2008 | Lefevere et al. |
| 8229227 | July 24, 2012 | Stojancic |
| 8285077 | October 9, 2012 | Fero et al. |
| 8405720 | March 26, 2013 | Gupta |
| 9066038 | June 23, 2015 | Okumura |
| 9214015 | December 15, 2015 | Xu |
| 9495587 | November 15, 2016 | Wilson |
| 9501692 | November 22, 2016 | Tyagi |
| 9716828 | July 25, 2017 | Yoo |
| 9742994 | August 22, 2017 | Agarwala |
| 20030198398 | October 23, 2003 | Guan et al. |
| 20040022451 | February 5, 2004 | Fujimoto et al. |
| 20050053304 | March 10, 2005 | Frei |
| 20100225937 | September 9, 2010 | Simske et al. |
| 20110211755 | September 1, 2011 | Xu |
| 20120202515 | August 9, 2012 | Hsu |
| 20120206778 | August 16, 2012 | Shiria et al. |
| 20120320427 | December 20, 2012 | Zheng et al. |
| 20130304399 | November 14, 2013 | Chen |
| 20150002642 | January 1, 2015 | Dressler |
| 20170372134 | December 28, 2017 | Zagaynov |

## Patent History

**Patent number**: 10289924

**Type**: Grant

**Filed**: Nov 22, 2016

**Date of Patent**: May 14, 2019

**Patent Publication Number**: 20170076169

**Assignee**: Sharp Laboratories of America, Inc. (Camas, WA)

**Inventor**: Richard John Campbell (Camas, WA)

**Primary Examiner**: Andrew M Moyer

**Assistant Examiner**: Dennis Rosario

**Application Number**: 15/359,314

## Classifications

**Current U.S. Class**:

**Tomography (e.g., Cat Scanner) (382/131)**

**International Classification**: G06T 5/00 (20060101); G06K 9/34 (20060101); G06K 9/00 (20060101); G06T 3/00 (20060101); G06K 9/32 (20060101); G06T 7/13 (20170101);