DIGITAL IMAGE STABILIZATION DEVICE AND METHOD

Info

Publication number: 20120162449
Type: Application
Filed: Dec 7, 2011
Publication Date: Jun 28, 2012
Inventors: Matthias Braun (Mountain View, CA), SungSoo Park (Cupertino, CA)
Application Number: 13/313,626

Abstract

A method of Digital Image Stabilization (DIS) including a feature point sorting algorithm for selecting optimal feature points, and a computationally efficient tile-vector based Hierarchical Block-Matching search algorithm for deriving motion vectors of the selected feature points, and a feature point motion vector grouping/comparison algorithm for grouping the selected feature points based on magnitude ratio criteria and angle difference criteria.

Description

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority under 35 U.S.C. §120 to U.S. Provisional Application No. 61/426,970, and 61/426,975, both filed in the U.S. Patent and Trademark Office on Dec. 23, 2010. The disclosures of both provisional applications are incorporated by reference herein.

TECHNICAL FIELD

The present inventive concept herein relates to digital image-stabilization (DIS), and more particularly, to a method of detecting, selecting and grouping feature points for digital image stabilization.

DISCUSSION OF THE ART

Digital cameras, digital video cameras and hand-held devices including such cameras capture are often employed to capture images or video while the camera is operated in the hand of a human operator. Thus, the video camera may be shaking or jittering in the operators hand while capturing the image or video. The jitter may include a horizontal component, a vertical component, and a rotational component. The rotation may be about an axis perpendicular to the focal plane of the image capturing circuit, or about an axis parallel to the focal plane of the image capturing circuit, or about an axis askew between a perpendicular axis and a parallel axis. The jitter may make the hand-captured video distracting or disorienting for the viewer, and thus it is desirable to use digital circuits to digitally estimate camera trajectory (i.e., the jitter as detected between each pair of consecutive frames) and to filter out the jitter from a sequence of video frames of the same scene. The circuits employed to estimate camera trajectory between consecutive video frames and to filter out the jitter caused by the camera's trajectory from a sequence of video frames may be contained within the video camera itself, and activated to remove in real time the jitter prior to storage of the captured video frames (e.g., prior to or during MEPG encoding if the video camera includes a real-time MPEG encoder). Alternatively the circuit employed to estimate camera trajectory between consecutive video frames and to filter out the jitter from a stored sequence of video frames may be a general purpose microcomputer controlled by software embodying a digital image stabilization (DIS) method, or may be a dedicated hardware, such as an MEPG video encoder embodied in an ASIC (application specific integrated circuit) optimized to perform a digital image stabilization (DIS) method.

The video produced by a steady, either stationary or moving video camera contains mainly smooth motions (translation, rotation) in the captured video. On the other hand, an unsteady video camera produces video with high frequency jitter (translational and/or rotational) throughout the video images.

Digital image sequences captured from physical imaging devices often display unwanted high frequency jittering motion. The amount of jittering motion present in an image sequence depends on the physics of the image capture device relative to objects in the captured sequence. The depth of the scene and the instability in the imager's mount, dependant on the mount's weight, inertia, balance, combine to create undesired jittery global motion.

A digital image stabilization (DIS) system first estimates unwanted (unintended) motion and then applies corrections to the image sequence. The visual effect of a stabilized video is highly dependent on the quality of camera trajectory estimation. Digital image stabilization (DIS) algorithms use well-tracked feature points to estimate the jittery motion between two consecutive frames. Digital video stabilization employs hardware and/or software methods for producing a spatially stabilized video from an otherwise unstable video containing unintended jerky motions caused by an unsteady video camera. In conventional DES technology, camera movement is detected by analyzing motion vectors of various points in the scene. But the motion vectors can be caused by object movements as well as camera movement.

There are functions that provide a numerical score for each pixel of the frame, indicating how suitable this point is as a feature point detectable in timewise adjacent frames. One example of such a function is the Harris Corner Detector. However, the magnitude of the feature points are typically very different for different parts of the image. DIS methods may employ a global threshold, to be compared with each pixel's numerical score, that does not necessarily result in an optimal distribution of feature points. Thus, there may be too few feature points in regions of low contrast (e.g., blue sky without any clouds causing sparse or no feature points), while in regions with a lot of structure, the feature points may be too close to one another. The misdistribution of feature points can then increase the computational burden of calculating redundant motion vectors of the feature points that are too close, and can fail to provide accurate motion vectors.

In an implementation of a digital image stabilization (DIS) method, it is desirable to minimize the computational overhead in order to reduce power consumption of the circuit and to reduce the time required to perform the DIS method. It is also desirable to detect and measure the camera's trajectory and characterize the jitter accurately so that the jitter may be correctly compensated for and correctly removed from the stored/displayed video.

In mathematics, affine geometry is the study of geometric properties which remain unchanged by affine transformations, i.e. non-singular linear transformations and translations. A mathematical system of equations defined by numerical coefficients, called an Affine matrix, has been developed to characterize the lateral (up/down), rotational, and scalar (e.g., zoom in or zoom out) of movement detected between each pair of consecutive frames or between portions thereof (e.g., moving objects in the frames).

Thus, the jitter may be characterized by a first Affine transform matrix related to any actually-stationary objects (e.g., rocks, tables, parked cars, mountains, the sun) in the scene, called a Principal Transform, or Global Transform, while any moving objects (e.g., birds, people, balls, moving cars) in the frame may be characterized by additional Affine matrices.

The Principal Transform (principle inter-frame transform) indicating camera motion that may be caused by the user's hand jitter may be computed by detecting one or more points of interest (called “Feature Points”) associated with the actually-stationary objects in each frame captured at time t, and then searching for the same Feature Points in a timewise adjacent frame (t+1), and computing a motion vector for each of the Feature Points. A plurality of motion vectors associated (grouped) with a particular object are then used to compute the Affine Transform of that object, which defines its detected motion according to the Affine equation:

x′=sx*x+ry*y+tx

y′=rx*x+sy*y+ty

Motion vectors of feature points between consecutive frames can be computed using various search methods employed in the field of video compression. Such search methods may employ a mathematical comparison of macroblocks, such as the sum of absolute differences (SAD), the mean absolute difference (MAD), or the mean square error (MSE), in two timewise adjacent frames (e.g., searching for the location of the feature point in a reference frame (t+1) by comparing the 8×8 pixel macroblock containing the feature point in the current frame with a plurality of 8×8 pixel macroblocks in a search area in the reference frame (t+1) centered about the location of the feature point). The measured amount and direction of the displacement of a macroblock centered about a feature point, between timewise adjacent frames (t and t+1), is called the “motion vector” of the feature point.

SUMMARY

An aspect of the invention provides a highly efficient process of identifying feature points, and of deriving motion vectors for the feature points that move in a coherent way because of global movement or camera movement, while at the same time being accurate for DIS purposes.

Good feature points for the DIS algorithm are points that yield non-ambiguous motion vectors when a suitable motion estimation algorithm is applied. To identify feature points in an image, a Harris Corner Detector applied to pixels of a video frame estimates how well suited this pixel is as a feature point. Different regions of the image have a different density of identified feature point candidates. A disclosed method of raster scan order selection and sorting provides a final feature point distribution based on small regions of the video frame, called tiles, where the maximum number of feature points grows linearly with the variance σ²of the luminance image data of the tile.

Each video frame is divided into a small number j×k of tiles. The number j×k of tiles can range from 4×4 for SD video to 6×6 or larger for HD video; other numbers in the range from (4.8)×(4.8) are also possible and may be beneficial. The tile size is chosen such that sufficiently large objects that move independently cover the majority of at least one tile, so that their motion can be captured for DIS purposes, while the motion of small objects is ignored.

Tiles having more interesting image data and therefore the need for more feature points are expected to have a higher variance σ². The feature point sorting algorithm finds a programmable minimum distance between feature points but requiring minimal hardware memory.

A hierarchical motion estimation algorithm may be used to estimate the feature point movement from frame to frame, where the programmable motion range for the later search levels is intentionally small, thereby preferring large-object or global movement over local movement. Consequently, the required number of operations is minimized, while the results are sufficiently accurate for digital image stabilization applications.

For each of the feature points that have been selected, e.g., by a sorting algorithm, its motion vector is determined by block matching within a small range of start vectors that are used. The start vectors are the tile motion vectors of the tile containing the current feature point and of the surrounding tiles (e.g., Up, Down, Left, Right). The tile motion estimation is the first step in the process of deriving motion vectors of the feature points. Tile motion estimation is done on the basis of non-overlapping tiles that cover the center portion of the input image (e.g., the same tiles used in the feature point sorting algorithm). For each of the tiles, a full blockmatching search is performed on a downsampled image.

The current frame is subsampled by a second subsampling factor f_s2of four to eight for standard definition (SD) video or eight to sixteen for high definition (HD) video. In this subsampled domain, a full-search block matching is done for every tile and the tile vector is stored for later use (e.g., as a start vector for deriving the motion vectors of the feature points). One motion vector will be estimated for every tile, doing a full search with the lowest-resolution, subsampled by second subsampling factor f_s2, of subsampled luminance data, and the motion vector candidate that yields the lowest SAD is assigned to each tile. According to an embodiment, for the border tiles, the search may be restricted to the available search area, thus no motion vector that causes the reference block to be (partially) outside the search area will be generated. Relative to the resolution used, the tile motion search will generate half-pel accurate vectors: The search area will be upsampled by simple bilinear interpolation. This uses only very little local memory, thus saving memory and logic area in a VLSI implementation.

Once the motion vector for a feature point has been determined, all feature-point related data is passed to the next DIS block, particularly the motion vector grouping block.

Exemplary embodiments of the inventive concept will be described below in more detail with reference to the accompanying drawings. The inventive concept may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are included to provide a further understanding of the inventive concept, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the inventive concept and, together with the description, serve to explain principles of the inventive concept. In the figures:

FIGS. 1A through 1F are views of a current frame and a reference frame, and selected Feature Points and motion vectors thereof, for illustrating in a method of Digital Image Stabilization in accordance with an exemplary embodiment of the inventive concept;

FIG. 2A is a diagram of the captured frame of FIG. 1E containing the current frame F_tof FIG. 1A, and divided into a border region and a plurality j×k of tiles in a core region, in accordance with a step in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 2B is a diagram of one tile in the core region of the image frame of FIG. 2A illustrating selected, rejected, and cancelled Feature Points, in accordance with steps in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 3 is a diagram of circuit blocks configured to perform DIS processes according to embodiments of the present inventive concept;

FIGS. 4A and 4B is flow chart of a method of identifying and selecting a plurality of Feature Points in each tile of the image frame of FIGS. 1A and 2A, for performing steps in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 5 is a view of the current frame F_tof FIG. 1A downsampled with tile motion vectors superimposed thereon, for illustrating motion vector calculation steps in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 6 is a view of a portion of a tile in the downsampled frame of FIG. 5 illustrating using the tile motion vectors of FIG. 5 as start vectors for a block matching search to calculate the motion vector of a selected feature point used in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 7 is flow chart of a method of calculating the motion vectors of the selected feature points in the image frame of FIGS. 1A and 2A, for performing steps in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 8A is a diagram of motion vectors of two feature points of the same actually-stationary object in a video scene at the same distance from the camera when the camera has only translational movement and no rotational component;

FIG. 8B is a diagram of motion vectors of two feature points of actually-stationary objects at the same distance from the camera when the camera has a rotational component;

FIG. 8C is a diagram of motion vectors of two feature points of the same actually-stationary object at different distances from the camera when the camera has only translational movement and no rotational component;

FIGS. 9A and 9B are diagrams of two pairs of motion vectors of feature points of actually-stationary objects in a video scene, for illustrating that each pair may have the same magnitude of vector difference even while the four motion vectors' directions and magnitudes are all different;

FIG. 10 shows three vector diagrams illustrating the calculation of normalized vector difference for the indirect measure of angle difference used for feature point grouping in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 11, is a flow chart of a grouping algorithm using normalized vector difference of FIGS. 10A, 10B, 10C for the indirect measure of angle difference between the motion vectors of the selected feature points in the image frame of FIGS. 1A and 2A, for performing vector grouping step of FIG. 1D in the DIS method illustrated in FIGS. 1A through 1F;

FIG. 12 is a graph of the magnitude |(a−b)| of normalized vector difference (a−b) versus the magnitude ratio of normalized vector difference (a−b), as a function of angle difference θ, illustrating the availability of an approximation for use in a step of the grouping algorithm of FIG. 11; and

FIG. 13 is a block diagram of Feature Point Grouping Circuit comprising a Grouping Algorithm Circuit 1310 configured to perform the feature point grouping algorithm of FIG. 11.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

FIGS. 1A through 1F are views of a current frame and a reference frames, and selected Feature Points and motion vectors thereof, for illustrating the steps in a method of Digital Image Stabilization in accordance with an exemplary embodiment of the inventive concept.

FIG. 1A shows two consecutive video frames, a current frame F_tand a reference frame F_t+1, of a scene. The scene includes stationary objects, e.g., hills in the foreground, a pole, and mountains, and the sun, and a moving object, e.g., a bird in the top left. The current frame F_tand the reference frame F_t+1are portions of respective consecutive captured frames having a larger area (as shown in FIG. 1E). The larger captured frames are the raw images captured by an image sensor, before Digital Image Stabilization (DIS). The reference frame F_t+1is rotated and translated relative to current frame F_t, due to a camera trajectory caused by jittery motion. The dimensions of the captured frame (see FIG. 1E) is typically predetermined by the hardware dimensions of the physical image sensor (not shown) of the video camera. The dimensions of the current frame F_tand the reference frame F_t+1may be selected dynamically to avoid or minimize the occurrence of “overexcursions” of the current frame F_tbeyond the bounds of the captured frame due to jittery motion of the camera.

FIG. 1B shows a plurality of selected feature points (circles) in the current frame F_tassociated with the actually-stationary objects and the moving object in the scene. The current frame F_tis divided into a plurality of rectangular tiles, and each tile may comprise at least one selected feature point. The selected feature points shown in FIG. 1B may be identified and selected by performing the steps of a method illustrated in FIGS. 2A, 2B, and 4, and/or by the circuit of FIG. 3. It should be appreciated that the methods of the present inventive concept take into account that in some instances, one or more tile may not have any selected feature points, depending on the image pattern, but for purposes of describing the embodiments of the present inventive concept, at least one selected feature point is shown in each tile in FIG. 1B. The current frame and the reference frame are stored in the memory 350 of the circuit of FIG. 3 while the selected feature points shown in FIG. 1B are being identified and selected by performing the steps of the method illustrated in FIGS. 2A, 2B, and 4.

FIG. 1C shows each selected feature point of the current frame F_thaving a motion vector (arrows). The motion vectors of the selected feature points shown in FIG. 1C may be calculated by performing the steps of a method illustrated in FIGS. 6 and 7.

FIG. 1D shows that motion vectors in the scene have been grouped (e.g., group A, group B, group C). The motion vectors of actually-stationary objects in the scene (group B, and group C) were caused by camera-movement (e.g., jitter). The grouping of the motion vectors of the selected feature points as shown in FIG. 1D may be performed by the steps of the method illustrated in FIGS. 10A, 10 B and 10 C and FIG. 11 in which motion vectors are paired/grouped (included or excluded) based on a pairing algorithm using the magnitude ratio and normalized vector difference.

FIG. 1E shows the reference frame F_t+1in the greater context of larger captured frame output by an image sensor (not shown). The position of the reference frame F_t+1is determined by using the group B and group C motion vectors of actually-stationary objects as shown in FIG. 1D to define the affine coefficients of the reference frame F_t+1. The image data of the captured frame beyond the bounds of the reference frame may be made available to the circuits that performed the steps of a method illustrated in FIGS. 6 and 7 by which motion vectors of groups B and group C are calculated.

FIG. 1F shows the reference frame F_t+1in the position it would have been received by the image sensor (not shown) but for the jittery camera motion indicated by the motion vectors of actually-stationary objects as shown in FIG. 1D. The affine coefficients of the reference frame F_t+1have been applied by a compensation circuit (not shown) to rotate and translate the reference frame F_t+1to correct for the jittery motion of the camera.

Feature Point Identification, Sorting and Distribution

FIG. 2A is a diagram of the captured frame of the current frame F_t(see FIG. 1E) divided into a border region and a plurality of j×k of tiles in a core region, to facilitate feature point identification and sorting in accordance with a step in the DIS method illustrated in FIGS. 1A through 1F. The boundary lines between the border region and the core region may be predetermined by hardware or by software independently of received image data content, while the bounds of the current frame F_tmay be dynamically selected based on the degree of jittery camera motion indicated by received image data content, for example, so as to prevent or reduce over-excursions of the current frame. Thus, the core region may or may not correspond to the dimensions and position of the current image frame F_tshown in FIG. 1A.

Each captured video frame is divided into a small number of non-overlapping tiles (e.g., 4×4 tiles for Standard Definition and 6×6 or more tiles for High Definition), for the purpose of algorithmically selecting feature points providing a good feature point distribution suitable for digital image stablization. Different regions of the image may have a different density of suitable feature points. In extreme cases, a region of the frame may not have any suitable feature points, for example in the case of a blue sky without any clouds. In other regions, the potential feature points might be very dense. When a global-based threshold is used to identity and select all feature points, the feature points tend to be concentrated in small regions of the image, resulting in poor DIS results. It is still desirable to have more feature points in regions of the image where there is more structure, because there is potentially more interesting motion. In these dense regions another issue is how to ensure that not all feature points are lumped together. Thus an aspect of the present inventive concept provides an efficient method to ensure a minimum distance (MIN_DIST) between feature points to be used for DIS.

For the stability of the DIS algorithm, feature points are distributed as widely as possible, while at the same time limiting the total number of feature points. A “good distribution” of feature points can be expressed as follows: It has a large convex hull; feature points are not too close (MIN_DIST) to one another; in tiles with fewer suitable feature points, at least a minimum number (min_features) of feature points are chosen, if possible; and, in tiles having more of suitable feature points, more feature points (max_num_features=min_features+max_plus_features*(tile_variance σ²/total_variance)) are selected.

The maximum number of feature points in each tile (max_num_features) is determined based on the tile's luminance variance σ².

In one embodiment, the maximum number of feature points in each tile (max_num_features) is the sum of a programmable minimum number of feature points per tile (min_features), plus the programmable maximum number of additional feature points (max_plus_features), multiplied by the ratio of the variance σ²of the specific tile over the sum of the tile variances. A correction factor can be applied if the tiles have different sizes. Thus, the maximum number of finally selected feature points per tile may alternatively be min_features plus the part of var_features that is proportional to the tile variance σ², normalized by the corresponding tile weight. Border tiles may be given a higher weight because they include the border region and are therefore larger. In this alternative case, the maximum number of feature points for a given tile is calculated as follows:

$max_num_features = MIN_FEATURES + ⌊ VAR_FEATURES \frac{WEIGHT (tile) σ_{\hat{2} 56} (tile)}{\sum_{t \in tiles} WEIGHT (t) σ_{256}^{2} (t)} ⌋$

Thus, the maximum number of selected feature points (max_num_features) is not kept constant in all tiles, nor kept constant between frames F_tto frame F_t+1.

In one embodiment, the maximum number of feature points (max_num_features) in each tile is a function of the variance σ²of the luminance data in each tile divided by the overall luminance variance, requiring a prior calculation of the luminance variance σ²of each tile and the total variance of the frame. One ordinarily skilled in the art can readily appreciate that other functions are possible as well; for example, functions involving the average luminance value as well as the tile variance σ².

To identify feature points a corner detector such as a Harris Corner Detector or the like may be used. The Harris Corner Detector evaluates every pixel of the image as a possible feature point candidate. Preferred feature point candidates are points where the feature quality estimation function has a local maximum. The disclosed method of feature point selection optimizes the selection of feature points identified by the Harris Corner Detector by comparing the resulting value of each identified feature point (estimating how well suited this pixel is as a feature point) to a LOCAL rather than a GLOBAL (full-frame) threshold. Thus, the disclosed method takes into account feature point density at each local area and also differences in contrast in different parts of the frame.

The obtained feature point distribution is based on small regions of the video frame, (e.g. non-overlapping tiles), where the number of feature points in each tile increases linearly with the variance σ²of the luminance image data of the tile. Tiles with more interesting image data and therefore the need for more feature points are expected to have a higher variance σ².

FIGS. 4A AND 4B is a flow diagram that illustrates a method of maintaining a minimum distance (MIN_DIST) between feature points in each tile, while at the same time requiring only little local state information, thereby reducing the hardware implementation costs.

FIG. 2B is a diagram of one tile in the core region of the image frame of FIG. 2A, illustrating Selected (grey), Rejected (white), and previously-selected-but-Cancelled (grey but X'ed) feature points. The feature points shown as small squares in FIG. 2B have been identified as feature point candidates using the Harris Corner Detector algorithm and then sequentially selected, rejected or cancelled in raster scan order, in accordance with steps in the method illustrated in FIGS. 4A AND 4B.

For each tile a maximum number (max_num_features) of identified feature point candidates are selected. According to an embodiment of the present inventive concept, each identified feature point candidate can be selected, e.g., in raster scan order by:

i. Identified feature point candidates are pixels where the Harris Corner estimation function exceeds a programmable threshold and where this estimation has a local maximum. To qualify as a local maximum, the value at the location in question must be greater than the value of all direct and diagonal neighbors that precede this pixel in scan order, but only greater than or equal to the value of the direct and diagonal neighbors that follow this location in scan order. This is done to accommodate the fact that identical values are quite likely.

ii. Once a feature point candidate has been identified, it will be entered into a data storage structure (e.g., a sorted list, but other implementations are possible) that can hold a predetermined maximum number of feature point candidates for each tile, e.g., a maximum of 32, 48, 64, or higher finally-selected feature points, provided there is no feature point candidate with a higher estimation function value that is within the programmable lockout range (MIN_DIST). For purposes of illustration, a maximum of 32 is selected to describe the present embodiment.

iii. If a later-identified feature point candidate has been stored in the data structure, all other feature point candidates having a smaller estimation function value that are closer to this point than the lockout range (MIN_DIST) are removed from the data storage structure.

For purposes of illustration, suppose the predetermined maximum number of feature point candidates of tile (5,4) is four (i.e., max_num_features=four). As shown in FIG. 2A, tile (5,4) contains four finally-selected feature points (grey) SFP3, SFP4, SFP5 and SFP7 in raster scan order, and three previously-selected-but-Cancelled feature points (grey but X'ed) SFP1, SFP2, and SFP6, plus two rejected (never-selected) feature points (white). The cancelled previously-selected feature points (grey but X'ed) SFP1, SFP2, and SFP6 were feature point candidates that were selected as feature points in raster scan order during the progress of the method illustrated in FIGS. 4A AND 4B, but were subsequently cancelled as selected feature points either because they were within the exclusionary zone (MIN_DIST) of a larger feature point candidate that was later identified as a feature point candidate and selected, or because the list of selected feature points became full (i.e., the number of selected feature point candidates SFP_count=max_num_features) and the earlier-selected feature point was the smallest among the list of selected feature points and was smaller than a feature point candidate that was later identified and selected.

The cancelled previously-selected feature point SFP1 was the first feature point to be identified and selected in raster scan order in accordance with steps in the method illustrated in FIGS. 4A AND 4B. Later, cancelled previously-selected feature point SFP2 was identified and selected, but after SFP2 was selected, selected feature point SFP3 was identified and was larger than SFP2. Since SFP2 is within the exclusionary zone (MIN_DIST) of larger, selected feature point SFP3, SFP2 was immediately cancelled when SFP3 was selected. After SFP3 was selected, a feature point candidate was identified in the lower right corner of the exclusionary zone (MIN_DIST) of SFP3, and because that feature point candidate was smaller than SFP3 and within its exclusionary zone, it was immediately rejected (i.e., not selected). Then, a feature point candidate was identified below and just outside of the exclusionary zone (MIN_DIST) of SFP3, and it became selected as SFP4 (and was not afterwards cancelled). Then, a feature point candidate was identified further below and to the right of the exclusionary zone (MIN_DIST) of SFP3, and it became selected as SFP5 (and was not afterwards cancelled as it was close to but not within the exclusionary zone of SFP7). Then, a feature point candidate was identified below and to the right of the exclusionary zone (MIN_DIST) of SFP5, and it became selected as SFP6 (but was afterwards cancelled as it was within the exclusionary zone of larger later-selected feature point SFP7). When SFP6 became selected, the list of selected feature points was already “full” (e.g., the maximum number of feature points for this tile was four), and because SFP1 was the smallest among the list of then-selected feature points SFP1, SFP3, SFP4, and SFP5, and because SFP6 was larger than SFP1, SFP1 was cancelled. Then, a feature point candidate was identified below and within the exclusionary zone (MIN_DIST) of SFP6, and it became selected as SFP7 (because SFP6 was immediately cancelled because selected feature point SFP7 is larger than SPF6 and/or because the list was full etc.). Then, a feature point candidate was identified below and within the exclusionary zone (MIN_DIST) of SFP7, and it was rejected (not selected) because that last feature point candidate is smaller than SFP7. It is possible that SFP7 is actually smaller than cancelled SFP2 (if SFP3 is much larger than SFP7) but a good distribution of feature points has been obtained. The programmable lockout range (MIN_DIST) ensures that finally-selected feature points are not clustered too close together.

The pixel luminance variance σ²of each tile may be determined during the downscaling process in which each tile is downsampled. The maximum number feature points in each tile is determined as the sum of a programmable constant minimum number of feature points per tile plus the number of total variable feature points multiplied by the ratio of the variance σ²of the specific tile over the sum of the tile variances. A correction factor may be added for the area of the edge and corner tile regions, as the feature points can also be in the border region. For each tile, up to the maximum number of feature candidates are collected and stored using a sorting process, i.e., selecting, rejecting, canceling described above, for each feature point candidate identified in raster scan order. Last, the final-selected feature point candidates for each tile are simply the feature point candidates with the highest estimation function response, the maximum number of which has been predetermined. There may be instances where there are not enough feature point candidates available in a given tile, such as a tile of low contrast image data, in which case the resulting number of feature points finally used will be smaller than the programmed minimum number (e.g. a smaller number than min_features).

Thus, a method of processing feature point candidates in raster scan order is provided wherein a list comprising at most the calculated maximum number of selected feature points not clustered too close together is maintained even while more new feature point candidates may be later identified and selected. This raster scan order method of sorting feature points has the advantage of reducing the amount of memory and computation compared to various other methods of prioritizing and selecting from among identified feature point candidates. For example, in an alternative embodiment, all feature point candidates of a tile might be identified and stored in a large list stored in a memory, and then only after all the feature point candidates of a tile have been identified, a mathematical sorting algorithm might be applied to find the optimal set (of a predetermined maximum size) of the largest feature point candidates that are not within the exclusion zone (MIN_DIST) of any other member of the set. However, such a sorting algorithm requires more physical memory (to store the entire list of identified feature point candidates of a tile) and potentially requires more total computation than the raster-order sorting (selecting, rejecting, canceling) method of FIGS. 4A AND 4B, exemplary results of which are shown in FIG. 2B. The raster scan order sorting algorithm of FIGS. 4A AND 4B does not necessarily provide a set of selected feature points that is a global optimum, since a feature point candidate can be canceled from the list by a feature point candidate that is later selected but later cancelled itself, but rather provides an algorithm that can be implemented in hardware with limited local storage. Although the method of FIGS. 4A AND 4B is described as processing identified feature point candidates in “raster scan order” (i.e., from left to right and from top to bottom) which is the pixel order that Harris Corner Detector ordinarily proceeds, any sequence of selection of feature point candidates can be employed by the method, such as discontinuous sequences of non-adjacent feature point candidates, as long as all feature points are identified and are ultimately sorted sequentially.

FIG. 3 is a block diagram of Feature Point Circuit according to an embodiment of the present inventive concept. Feature Point Circuit 3000 comprising a Feature Point Selector 300 and a selected feature point (SFP) Motion-Vector Calculator 700 and a shared RAM Memory 350. The Feature Point Selector 300 comprises a Downsampler 310, a Feature Point Candidate Identifier 330, and a Feature Point Candidate Sorter 340.

The Feature Point Candidate Identifier 330 identifies feature point candidates using a Harris Corner Detector algorithm and outputs the identified feature points, e.g., in pixel locations and Harris Corner responses, in raster scan order one tile at a time, to the Feature Point Candidate Sorter 340. The Feature Point Candidate Sorter 340 is configured to perform the method of seriatim sorting of identified feature points of each tile of FIGS. 4A AND 4B as further illustrated in FIGS. 1B and 2B. The Downsampler 310 includes a Tile-Variance σ²Calculator 320 functional block that calculates the tile-variance σ²of each tile of the image frame,

$σ^{2} = \frac{Σ y^{2}}{N} - {(\frac{Σ y}{N})}^{2}$

where the y values are the luminance values within the tile and N is the number of pixels in the tile.

The circuit as shown in FIG. 3 may be implemented in a semiconductor chip, having input/output pins configured to receive image data from a camera having sensors that capture images and circuitry to convert the captured image into image data. Data processed by the circuit of FIG. 3 are output via the input/output pins to other components of the camera. As will be further described below, the memory 350 resides within the semiconductor chip and to minimize the size of the chip, the memory need to be small in physical size and therefore storage capacity is limited. According to another embodiment of the present inventive concept, to save computational power and to reduce the number of required operations, the Feature Point Selector 300 may operate only on luma data, which will be horizontally and vertically subsampled by the Downsampler 310 by a selectable factor f_s1of 2, 4, or 8. For the present embodiment, a factor f_s1of 4 is chosen. The f_s1downsampled luma data is used for feature point identification by the Feature Point Candidate Identifier 330, and in alternative embodiments may be later used for the feature point motion-vector estimation by the Hierarchical Block-Matching Search Unit 730 of the SFP Motion-Vector Calculator 700. According to still another embodiment of the present inventive concept more suitable for applications where hardware real estate, processing bandwidth, or power capacity is more available, an off-chip storage, such as an external SDRAM, may be provided to supplement the storage capacity of the on-chip memory.

While the smaller downsampled image is calculated by the Downsampler 310, the luminance variance (tile-variance) σ²of each tile is calculated, and global maxima of the smaller eigen-value of the 3×3 Harris corner matrix are identified. Both the tile offset, which is the coordinate of the upper left pixel of the upper left tile, and the tile pixel dimensions are preferably multiples of the largest subsampling factor (f_s2) used. It is also preferred that the image core region is centered in the overall image. Therefore, the width of the left border region is identical to the width of the right border region and the height of the upper border region is the same as the height of the lower border region. (see FIG. 2A)

Once the input frame luminance data has been subsampled and stored in the RAM memory 350, the Feature Point Candidate Identifier 330 reads it back in tile order and sequentially feeds identified feature point candidates into the Feature Point Candidate Sorter 340. For the feature point identification process of block 330, the statistics area of potential feature points in the tiles adjacent to the border area extends into the border area, and thus the pixels of each border region tile are processed together with the pixels of the adjacent tile. The pixel data is read within each tile in raster scan order: Lines from top to bottom, pixels within each line from left to right.

To process each tile, the Feature Point Candidate Identifier 330 needs three additional pixels on each internal tile border for feature point identification, using the Harris corner detector. Consequently, these pixels will be read more than once. Identified feature point candidates are pixels in each tile where the lower eigenvalue λ1 of the Harris matrix has a local maximum. To qualify as a local maximum, the corner response of the pixel in question must be greater than the corner response of the upper left, upper, upper right, and left neighbors and greater than or equal to the corner response of the right, lower left, lower, and lower right neighbors. With this definition, at least one point of a larger region with the same constant corner response will be identified as potential feature candidate. The detection logic for the local maxima will require two line buffers of corner responses. Points with a local corner response maximum are first compared with a programmable corner response threshold. If the corner response of the point in question is smaller than this threshold, it is ignored. Otherwise, the feature point's coordinates and its corner response are presented to the Feature Point Candidate Sorter 340.

The Feature Point Candidate Sorter 340 keeps track of up to max_num_features (e.g., 32) feature point candidates having the highest corner response in each tile, while simultaneously ensuring that all feature points have a minimum programmable distance (MIN_DIST) from one another. The distance used in the above algorithm between two points is defined as follows:

$dist ((\begin{matrix} x_{1} \\ y_{1} \end{matrix}) \cdot (\begin{matrix} x_{2} \\ y_{2} \end{matrix})) = \max (\langle x_{1} - x_{2} \rangle, \langle y_{1} - y_{2} \rangle)$

The sorting in the method of FIGS. 4A AND 4B is done with operations taking into account only the current contents of the sorter's list of selected feature points and of the incoming feature point candidate and making a decision right away. Therefore, the Feature Point Candidate Sorter 340 adapted to perform the method of FIGS. 4A AND 4B will not inherently calculate the global optimum, and the results will depend on the order in which the incoming feature point candidates are presented.

Feature Point Candidate Sorter 340 outputs the selected feature points seriatim and they are stored in a SPF list in a portion of the memory 350 of the circuit of FIG. 3

FIGS. 4A AND 4B is flow chart of a method of identifying and selecting a plurality of Feature Points in each tile of the image frame of FIGS. 1A and 2A, for performing steps in the DIS method illustrated in FIGS. 1A through 1F. The method begins with data input step S400 in which luminance data of a current frame F_tare received, followed by downsampling step S402. Initialization step S404 resets the tile counter value current_tile and the pixel counter current_pixel.

Next, Harris Corner Detector is performed (steps S406, SD408, and S410) in raster scan order upon each pixel of the downsampled luminance data of the current_tile as current_pixel as the incremented (step S428). Each time the current_pixel's corner response exceeds a local maxima or threshold, (i.e., the “yes” branch of decision step SD408) the current_pixel is identified as the current FP (feature point) candidate (step S410) and is then immediately subjected to the feature point sorting algorithm (SD412, SD414, SD416, S417, SD430, S418, S420).

If the list is not full, the feature point sorting algorithm SELECTS (S420) the current FP candidate if it is larger than lowest previously-selected FP candidate already stored in list of Selected Feature Points. (Yes branch of decision step SD412), else the current FP candidate is REJECTED (Rejection step S417) without ever being selected (No branch of decision step SD412). If the list of selected feature points is already full, as indicated by the selected feature point count SFP_count, (i.e., SFP_count=max_num_features=min_features+max_plus_features*(tile_variance/total_variance)) when the current FP candidate is SELECTED, then CANCEL the smallest previously-selected FP candidate from the list (SD430), elsewise increment the SFP_count value (SD430).

The feature point sorting algorithm SELECTS (S420) the current FP candidate only if it is not within the exclusionary zone (MIN_DIST) of any larger (SD416) previously-selected feature point already on the list (SD414). Thus, if the current FP candidate is within MIN_DIST of any larger (SD416) previously-selected feature point already on the list (SD414), it is REJECTED (No branch of decision step SD416, and Rejection step S417) without being selected. On the other hand, if the current FP candidate is within MIN_DIST of any smaller (SD416) previously-selected feature points already on the list (SD414), all the smaller (SD416) previously-selected feature points are CANCELLED (Yes branch of decision step SD416, and Cancellation step S418), and the current FP candidate is SELECTED (S420), and the SFP_count is updated (e.g., decremented or left the same) accordingly (418).

Once the current FP candidate has been SELECTED (S420) or REJECTED (S417), the Harris Corner Detector outputs the value of the next (S428) current_pixel (S410) of the current_tile (SD422) and the next identified PF candidate is immediately subjected to the feature point sorting algorithm (SD412, SD414, SD416, 5417, SD430, 5418, S420), etc. If the last pixel of the current_tile has been processed (SD422), then the next tile (SD424, 5426) is processed. If the last tile has been processed, then the process is DONE until the next image frame is to be processed.

Feature Point Motion Vector Calculation

After the feature points of each tile in the current frame F_thave been identified and sorted, the next step in the DIS method of FIGS. 1A through 1F is to obtain motion vectors for each of the selected feature points.

Block matching algorithms (BMA) used for calculating the motion vectors of feature points are well known. In block matching, an error function (e.g. SAD, MAD, MSE) is calculated for all possible positions of a block in a target area of the reference frame. The position with the lowest result of this function is used to calculate the estimated motion vector. Block matching is computationally expensive. There are several known ways to reduce the computational cost. Hierarchical or multi-resolution block matching is one of these ways in which the global movement is calculated first at lower resolutions. The resulting vectors will be used to search a smaller range at higher resolutions, thereby reducing the total number of arithmetic operation needed.

For most applications and for video encoding in particular, accurate motion vectors are needed for all blocks of a frame. Consequently, the search range in the later stages is usually relatively large. In the digital image stabilization (DIS) method illustrated in FIGS. 1A through 1F, it is only necessary to estimate the relative movement of feature points (e.g., of actually-stationary objects) from one frame to the next. For purposes of image stabilization, accurate motion vectors representing the movement of the background and large objects are needed, whereas smaller objects do not need to have an accurate motion vector associated with them. Any inaccurate vectors for smaller objects can be filtered at a later stage of the DIS algorithm.

It is expected that feature points of the large stationary objects of significance in the DIS method will move in a coherent way because of global movement or camera movement. We recognize that sufficiently large objects that move independently cover the majority of at least one tile, so that their motion can be estimated as the predominate motion of the tile itself, while the motion of small objects has little affect on the motion vector of the tile itself. Thus, the process of calculating motion vectors may be modified to reduce computations, by using a hierarchical motion estimation algorithm and by preferring tile movement over local movement, using the motion vector of the tile. Thus, a first step is to divide the current image frame into a plurality j×k of tiles. (This first step shall have already been performed for the purpose of feature point selection as above described in regards to FIGS. 1B and 2A).

A second step of calculating the motion vectors of the feature points accurate enough for DIS would be to derive one motion vector per tile, using block matching on the lowest resolution. In this step, the SAD (sum of absolute differences) for a given tile is calculated. The motion vector for a given tile is the one that minimizes the SAD. The SAD (sum of absolute differences) for a given motion vector candidate v=(vx, vy) is defined:

$SAD (v_{x}, v_{y}) = \sum_{(x, y) \in tile} \langle ref (x, y) - search (x + v_{x}, y + v_{y}) \rangle$

By using a low resolution downsampled image, computation is reduced and the effect of small objects in the scene is further reduced.

In the third step, the motion vectors of the tiles will be used in a block matching algorithm as start vectors for the local search for the motion vector of the feature points in each tile. Because a sufficiently large object that covers the majority of at least one tile may extend into adjacent tiles, it is probable that some feature points in each tile may be associated more strongly with the motion vector of an adjacent tile rather than the motion vector of the tile they are found within. Thus, it would be effective to use the motion vectors of all the adjacent tiles as multiple start vectors in the block matching search for the motion vector of the feature points of any given tile. The tiles used here are centered in the frame with a border region of a size of at least the maximum supported motion vector, such that the motion search for all feature points in all tiles can be done without referring to pixels outside the frame.

FIG. 5 is a view of the current frame F_tof FIG. 1A downsampled with calculated tile motion vectors superimposed thereon, for illustrating motion vector calculation steps in the DIS method illustrated in FIGS. 1A through 1F. The smaller (less pixels, less data) image in FIG. 5 is derived from the original current captured frame or from the previously subsampled image thereof (step S402 of FIGS. 4A AND 4B) by subsampling both horizontally and vertically. Subsampling by a subsampling factor f_s2, e.g., 4, is used for global (tile) motion estimation. The 4×4 downsampling just averages 16 pixels (with rounding), without any overlap on the input side. Then, a block matching search using each entire subsampled tile is performed to determine each tile's motion vector.

The motion vector for a given tile is the one that minimizes the SAD. In case of a tie, the first one found is taken. The motion vectors will be used as start vectors for the local search for the motion vectors of the nearby feature points. The motion range about each start vector is programmable.

Since the number of operations needed for the tile motion estimation are only about 12% of the operations needed for the local motion estimation, it is sufficient to calculate the sum of about 8 absolute differences per cycle. Therefore, complex or additional processing components such as a systolic array may not be needed.

FIG. 6 is a view of a portion of one tile in the downsampled frame of FIG. 5 illustrating using the tile motion vectors of FIG. 5 as start vectors for a block matching search to calculate the motion vector of one selected feature point used in the DIS method illustrated in FIGS. 1A through 1F.

A small local block matching search is performed in a higher-resolution domain around each of a set of start vectors for every feature point in the tile. This step could be performed at the original video resolution, or subsampled by a factor f_s3of 2 or 4. The start vectors used are the tile motion vectors that have been determined above. The start vectors used are those of the tile the feature point belongs to as well as those belonging to the four direct neighbors (Upper tile, Left tile, Right tile, Lower tile), provided it exists. Thus, in FIG. 6: the start vector corresponding to Block Matching search area 1 is the feature point's (FP's) own tile's motion vector; the start vector corresponding to Block Matching search area 2 is the block Down below the FP's tile's motion vector; the start vector corresponding to Block Matching search area 3 is the block Right of the FP's tile's motion vector; the start vector corresponding to Block Matching search area 4 is the block Left of the FP's tile's motion vector; and the start vector corresponding to Block Matching search area 5 is the block Up above the FP's tile's motion vector. According to another embodiment, the start vectors of the four diagonal neighbors are also used. Other steps for selecting among start vectors (e.g., to reduce the number of block matching computations) can be performed, particularly if a first group of tile vectors have magnitudes and direction similar to each other suggestive of one large object (see discussion of motion vector grouping regarding FIGS. 8A, 8B, 9, 10A, 10B, 10C). Alternatively, block matching can be performed with priority given or only where the two or more Block Matching search areas overlap, or between those nearest to each other, etc.

Generally, assignment of motion vectors to feature points will proceed tile by tile, and every feature point of a given tile will use the same start vectors (e.g., the same selection of tile motion vectors). However, in various other embodiments, feature points in different parts of a given tile may use a different selection of start vectors, on the premise that a feature point adjacent to tiles in a detected grouping of tile motion vectors may more likely be a visible point on the same object that is commonly found in each member of that group. Thus, a block matching search might first be performed on those feature points near the perimeter of each tile, to detect if they are all or mostly all similar to their own tile's motion vector and/or to the tile motion vector of adjacent grouping of tile motion vectors. If, for example, the motion vectors of all the initially selected feature points (e.g., all those near the perimeter of a given tile, or farthest from its center point) are in the same or similar to their own tile's motion vector, then the set of selected start vectors for the remaining feature points may be reduced.

For each start vector used, we use a very small range for the local search. The goal here is not so much to determine accurate vectors for each and every feature point. Rather, the feature points of interest are those that belong to the background or large objects. For those feature points, one of the tile motion vectors should be good, or close to the motion vector of the feature points of interest, and therefore, a small local search about each selected tile motion vectors is sufficient.

Referring again to FIG. 3, the SFP (selected feature point) Motion-Vector Calculator 700 of the Feature Point Circuit 3000 comprises a second Downsampler 710 for outputting more-downsampled luma data than the first Downsampler 310 for tile vector calculation, a Tile-Vector Calculator 720 for calculating each tile's motion vector, and a Hierarchical Block-Matching Search Unit 730 to determine and output the motion vector of each Selected Feature Point (SFP) received from the Feature Point Candidate Sorter 340 of the Feature Point Selector 300. The second Downsampler 710 outputs the deeply downsampled current frame F_tshown in FIG. 5. The second Downsampler 710 outputs the deeply downsampled luma data of the current frame F_tshown in FIG. 5. Tile-Vector Calculator 720 calculates the motion vector of each tile using the deeply downsampled luma data of the current frame F_toutput by the second Downsampler 710. Hierarchical Block-Matching Search Unit 730 determines the motion vector of each of the selected feature points output by the Feature Point Candidate Sorter 340 of the Feature Point Selector 300, using the full-resolution luma data (or the output of the first downsampler 310) of two consecutive frames, and using the Tile Vectors as start vectors as described above.

FIG. 7 is a flow chart illustrating a method of calculating the motion vectors of the selected feature points (SFP) in the current frame F_tof FIGS. 1A and 2A, for performing steps in the DIS method illustrated in FIGS. 1A through 1F.

In initial steps, the Hierarchical Block Matching Search circuit 730 shown in FIG. 3 receives luma data of two consecutive frames of video, the current frame and the reference frame (step S700i) and the pixel locations of the Selected Feature Points (S700ii). The current frame F_tis divided into a plurality of subsampled tiles (S710) which can be preferably the same as the tiles previously used in the feature point sorting method of FIGS. 4A AND 4B. In substep S710-A the current frame F_tis divided into a plurality j×k of tiles plus a border region as illustrated in FIG. 2A. In substep S710-B the luma data associated with every tile is subsampled by factor f_s2(e.g., f_s2=4, 8 for SD; f_s2=8, 16 for HD), as illustrated in FIG. 5.

Next, in step S720 the motion vector of each tile is calculated using full-search block matching with the deeply subsampled luma data, as illustrated in FIG. 5, achieving half-pel precision relative to the subsampled resolution. The calculated minimum-SAD values corresponding to calculated motion vectors may be saved for use in other features of the DIS (e.g., to filter out feature points of small objects). In step S730, start vectors are selected for the current Selected Feature Point (SFP) based on the calculated tile motion vectors from step S720, as described herein above. In step S740, a Hierarchical block-matching algorithm is carried out with full resolution luma data and using the selected start vectors based on the tile vectors to determine the motion vector of the current SFP. Steps S730 and S740 are repeated until the motion vector of every SFP in every tile has been calculated (through loop SD 750 and S752).

Feature Point Grouping by Motion Vector Magnitude and Direction

Motion between video frames is detected by calculating motion vectors of identifiable “feature points” in adjacent frames. Motion vectors of feature points may then be “grouped” for the purpose of identifying moving-objects within the scene, as distinguished from global motion of the camera/scene. The global motion of the camera/scene is analyzed to distinguish between intended (e.g., panning) and unintended (jittery) global motion.

If there is no jitter (no camera trajectory) then each detected Feature Points of actually-stationary objects (e.g., the corners of rocks, the peaks of mountains) will be expected to be found in the same location in each of two or more consecutive frames, and the motion vector of all those detected Feature Points will be measured as null. However, if there is camera jitter, then the vectors of the many Feature Points of any given actually-stationary object may have different magnitudes and direction. A Digital Image Stabilization circuit may be used to correctly “group” a plurality of motion vectors (of Feature Points) so that they are attributed to the same actually-stationary object.

Usually jittery camera movements are the mixture of translational and rotational movements, and the distance from camera to the object varies. This contributes to the motion vector differences of the feature points of background stationary objects. But when camera movement has rotational component, the direction of motion vectors of the same object at the same distance from the camera cannot be the same. Both of the magnitude and direction of the vectors of the feature points of the same stationary object may be different.

FIGS. 8A & 8B illustrate the different vectors that result from rotational camera motion as compared with purely translational vector motion. In the figure, assume two selected feature points SFP4 & SFP5 of the same stationary physical object are physically the same distance from camera, and vector A is the motion vector of SFP4 and B is the motion vector of SFP5 in the case of purely translational camera motion, and vector A′ is the motion vector of SFP4 and B′ is the motion vector of SFP5 in the case including rotational camera motion.

With purely translational camera motion vectors A & B will have exactly the same motion vectors, but vectors A′ & B′ have different magnitude and different direction due to the rotational camera motion, even though they are in the same distance from the camera.

FIGS. 8A & 8C illustrate the different vectors that result from purely translational vector motion in the case where two feature points of the same stationary object are different distances from the camera. Assume two selected feature points SFP4 & SFP7 of the same stationary physical object are different physical distances from the camera, and vector A is still the motion vector of SFP4 and C″ is the motion vector of SFP7 in the case of purely translational camera motion. Because SFP7 is closer to the camera than SFP4, while they are points on the same stationary object, the magnitude of their motion vectors are different (vector C″ is smaller than vector A)

Thus, while grouping motion vectors, a margin of vector difference is needed to account for the vector magnitude and vector direction (angle) differences caused by these factors so that the motion vectors of all the feature points of the same stationary object can be grouped together. The usual way of detection of motion vector groups, with an error margin and use of simple motion vector differences, is to define an error threshold.

The magnitude of motion vector difference ΔM is the measurement that may be used as the basis for grouping decisions, and the error margin Th_ΔMmay be defined as:

ΔM=SQRT((xa−xb)̂2+(ya−yb)̂2)<Th_ΔM, where

- A=(xa, ya);
- B=(xb, yb); and
- Thom is an error threshold for the magnitude of vector difference ΔM. (a positive number)

The magnitude of motion vector difference method is adequate when the camera movement is purely translational (up, down, and/or side to side) because the motion vectors of all stationary objects' feature points will have the same direction, because they are all defined by the same translational camera movement. As illustrated by comparing FIGS. 8A and 8B the motion vectors of different stationary feature points can also be different due to being different distances from the object(s) to the camera, even in the case of purely translational camera movement. The magnitude differences of motion vectors of feature points of the same stationary object are typically relatively small in the usual video scene, and the magnitude difference can also be tolerated by allowing some margin of vector magnitude difference (|A|−|B|), and the magnitude of motion vector difference ΔM method is adequate in this case.

FIG. 9A illustrates a case of two motion vectors A and B in which the magnitude of vector difference ΔM is a good basis for grouping two feature points together.

The magnitude of vector difference ΔM alone may not be good basis for grouping vectors in some cases.

FIG. 9B illustrates a case of two motion vectors A′ and B′ in which the magnitude of vector difference ΔM′ cannot be a good basis for grouping two feature points together.

In FIGS. 9A and 9B, pairs of vectors (A,B) and (A′,B′) have exactly the same magnitude of vector difference (ΔM=ΔM′) as shown. Each of pairs (A,B) and (A′,B′) can also have its respective magnitude of vector difference ΔM, ΔM′ within the error margin Th_ΔM. Vector pair A and B can be appropriately grouped together on the basis of the magnitude of their vector difference ΔM. But vector pair A′ and B′ has too much angular (directional) difference (e.g., compare with the pair A and B), for it to be appropriate to group vector A′ and vector B′ together in the same group.

The magnitude of vector difference ΔM method by itself may in instances be not suitable for motion vector grouping where two feature points have their magnitude of vector difference within the margin Th_ΔMwhile they have too much angular (directional) difference. A rotational component of the camera's trajectory can cause one or more feature points of stationary objects to have the same or a similar magnitude, but different directions, which is not detected by the magnitude of vector difference method. Thus, the magnitude of vector difference method may cause incorrect jitter compensation and/or less than optimal video compression, and/or excessive computational power or time consumption, and/or video artifacts due to incorrect video compression of stationary or of moving objects.

The motion vectors of the selected feature points (SFP) output by the Hierarchical Block-Matching Search Unit 730 of the Feature Point Circuit 3000 are next grouped according to their magnitude and direction to associate the motion vectors of selected feature points (SFPs) with objects in the scene based on the object's perceived relative movement between consecutive video frames.

When camera movement has rotational component, such as about an axis orthogonol to the plane of the image sensor/photodiode array, the direction of motion vectors of one object (e.g., the background) cannot be the same. Both of the magnitude and direction of the vectors are different for different feature points of the background, even if they are actually stationary and are the same distance from the camera.

Instead of using only the magnitude of motion vector difference ΔM and the error margin Th_ΔMfor the grouping decision, we use the magnitude ratio of motion vectors and normalized vector difference to detect and tolerate some amount of motion vector differences caused by rotational camera motion.

Where vector A=(xa, ya) and vector B=(xb, yb),

A first grouping decision criteria is based on the Magnitude ratio (Mr)=|b| where

|b|̂2=(|A|̂2)/(|B|̂2)=(xâ2+yâ2)/(xb̂2+yb̂2)

A second grouping decision criteria is based on normalized vector difference (used for the evaluation of angle difference)=|a−b|, where

|a−b|̂2=[{(xa−xb)̂2+(ya−yb)̂2}/(xâ2+yâ2)].

For an angular difference between vector A and vector B not exceeding θ_thdegrees, the second grouping decision criteria is |a−b|̂2<Mâ2, where

Mâ2=SQRT(1+|b|̂2−2*|b|*cos θ_th); and

|b|=SQRT{(xb̂2+yb̂2)/(xâ2+yâ2)}

The grouping method using decisions based on these two grouping decision criteria can perform optimal motion vector grouping even in the presence of rotational camera movement.

FIG. 10 shows three vector diagrams illustrating the calculation of normalized vector difference for the indirect measure of angle difference used for feature point grouping in the DIS method illustrated in FIGS. 1A through 1F. Referring to diagram (a) in FIG. 10A, difference vector (A−B) between vector A and vector B is drawn as the horizontal vector labeled (A−B). For ease of reading, diagram c is drawn to a larger scale than diagrams a and b. The absolute magnitude ΔM(A−B) of the difference vector (A−B) would be calculated as follows:

ΔM(A−B)=SQRT((xa−xb)̂2+(ya−yb)̂2), where

A=(xa, ya)

B=(xb, yb)

Referring to diagram (b) in FIG. 10A, the normalized vector-a is defined as vector A divided by the absolute value of vector |A|, and thus the normalized vector-a has magnitude of ONE (see diagram c of FIG. 10). The normalized vector-b is defined as vector B divided by the absolute value of vector |A|. Magnitudes |A| and |IB| are defined by these equations:

|A|̂2=(xâ2+yâ2)

|B|̂2=(xb̂2+yb̂2)

Note that b=B/|A|, thus the Magnitude ratio (Mr) is absolute value |b|=|B|/|A|=|(B/|A|)|. Thus, |b| is the magnitude of normalized vector-b, which has been normalized by dividing vector B by magnitude |A| of vector A (i.e., b=B/|A|). Thus, Magnitude ratio (Mr)=|b|=SQRT{(xb̂2+yb̂2)/(xâ2+yâ2)}.

Because the magnitude |a| of normalized vector-a is ONE, the magnitude |b| is also equal to the magnitude ratio between: the magnitude |a| of normalized vector-A divided by magnitude of normalized vector-b. Thus, magnitude |b| is referred to as the Magnitude Ratio |b| (Mr). The Magnitude Ratio |b| is not a function of the angle difference θ between vectors A and B.

As our first grouping decision criteria, we compare |A|/|B| with the magnitude ratio Mr of motion vector A and vector B. If (|A|/|B|)̂2> Mr̂2, then we decide that vector A and vector B cannot be in the same group. Thus, if (|A|/|B|)̂2> (|B|/|A|)̂2 the appropriate final grouping decision is that vector A and vector B cannot be in the same group. But, if (|A|/|B|)̂2> Mr̂2 then we make a second comparison based using the normalized vector difference |a−b| as the criteria.

The absolute magnitude |(a−b)| of the normalized vector difference (a−b) is computed according to this equation:

|a−b|̂2=[{(xa−xb)̂2+(ya−yb)̂2}/(xâ2+yâ2)]

The normalized vector difference (a−b) has absolute magnitude |(a−b)| as shown in diagram c of FIG. 10, in which lengths |a|, |b| and |(a−b)| form a triangle with vector difference angle θ being opposite side |(a−b)|, which means that |(a−b)| can also be calculated using the cosine rule as a function of θ. The cosine rule defines the mathematical relation between the length of an unknown side of a triangle to the length of the other sides and the angle opposite to the unknown side. The magnitude |(a−b)| of normalized vector difference (a−b) may be obtained using the cosine rule, by equation:

|(a−b)|=SQRT(1+|b|̂²−2*|b|*cos θ).

Thus, an angle threshold expressed as a threshold of magnitude |(a−b)| of normalized vector difference (a−b) (the side of the triangle opposite the angle difference θ in diagram c of FIG. 10), can be calculated as a function of and the angle difference θ, as indicated by the cosine rule. Thus, we may define threshold magnitude Ma of the magnitude |(a−b)| as an angle difference threshold value, wherein Ma is a function of a chosen threshold angle θ_th. Thus, we may compare the square of calculated magnitude |(a−b)| of normalized vector difference (a−b) with the square of Ma. Thus, |a−b|̂2 is compared with Mâ2 to decide whether the angular difference between vector A and vector B is small enough that they should be grouped together.

We define Mâ2=(1+|b|̂2−2*|b|*cos θ_th), where

- θ_this a predetermined angular threshold for grouping decision purposes (e.g., 30 degrees), and

|b|=(B/|A|)|=SQRT{(xb̂2+yb̂2)/(xâ2+yâ2)}

If |a−b|̂2 is less than Mâ2, then we decide that vector A and vector B CAN be in the same group. Thus, if |a−b|̂2 is less than Mâ2 the appropriate final grouping decision is that vector A and vector B are in the same group.

Thus, vector A and vector B are in the same group if |a−b|̂2 is less than Mâ2 and only if (|A|/|B|) is Not Greater than the square of Magnitude ratio (Mr) |b|̂2. Thus, the exact calculation of Mâ2 requires one square root operation (i.e., for calculating |b|), and a square root operation can be computationally expensive, or requires substantial hardware implementation, elimination of a square root operation can be significant. We have devised an approximation for Ma (i.e., Ma=0.5) that provides good grouping results for |b| equal to 1 plus or minus 30 percent (i.e., for 0.7≦|b|≦1.3) and within 30 degrees of vector error (vector difference) angle (i.e., for −30 degrees≦θ≦+30 degrees). Thus, the second grouping criteria becomes |a−b|̂2<0.5̂2.

If we plot the relation between angle difference θ, Magnitude ratio |b| and normalized difference |a−b|, we can obtain the graph of FIG. 12.

FIG. 12 is a graph of the magnitude |(a−b)| of normalized vector difference (a−b) versus the magnitude ratio |b| of normalized vector difference (a−b), as a function of various values of angle difference θ (between 0 degrees and 90 degrees), illustrating the availability of an approximation available for use in decision step dS1140 of the grouping algorithm of FIG. 11.

By experiment, the typical video gives good grouping result with a magnitude ratio greater than 1 plus or minus 30% and with difference angle θ greater up to 30 degrees, as indicated by the bounds of the square region in FIG. 12. This empirical range is effective for the approximation of normalized vector difference between zero and 0.5 as shown in the graph of FIG. 12.

Using the approximation, the SQRT(1+|b|̂2−2*|b|*cos θ) can be approximated as 0.5 regardless of |b|, to reduce the computational burden. Thus, using this approximation, the second grouping decision criteria becomes |a−b|̂2<0.5̂2.

FIG. 11, is a flow chart illustrating a grouping process according to an embodiment of the present inventive concept. Grouping process 1100 uses two criteria including normalized vector difference for the indirect measure of angle difference between the motion vectors of the selected feature points (see the image frame of FIGS. 1A and 2A), for performing vector grouping step of FIG. 1D in the DIS method illustrated in FIGS. 1A through 1F. The grouping algorithm 1100 includes the magnitude ratio grouping decision criteria (in decision step SD1120) and the normalized vector difference grouping decision criteria (in decision step SD1140). A paring algorithm (steps S1104, S1106, S1152, dS1150, and S1154) operating externally to the grouping algorithm 1100 keeps track of which feature points (motion vectors) have been already paired with which others, which remain unpaired, and which will be entirely excluded from the grouping algorithm 1100. The paring algorithm provides a pair of SPF motion vectors A & B as inputs to the grouping algorithm 1100. (S1104, S1106). In initialization step iS1102, Magnitude Ratio-Margin and a Angle-Margin are received from an external circuit and provided to the grouping algorithm 1100.

The grouping algorithm 1100 calculates |A|̂2 based on received vector A and |B|̂2 based on received vector B (steps S1112 & S1114) for using these calculated values to make subsequent computations in at least subsequent steps S1116, dS1120, and dS1140. Thus, when the received vector B is excluded from grouping with received vector A, (by the Yes branch of decision step dS1120 or by the No branch of decision step dS1140), the paring algorithm assigns a new vector B (step S1152) and the grouping algorithm computes a new value of |B|̂2 (stepS114) based on the new vector B, but the calculated value of |A|̂2 (stepS112) of current vector A need not be updated at the same time because comparison of the same vector A will continue but with a new vector(s) B. Thus, hardware or software adapted to perform the grouping algorithm 1100 can be configured to separately store one or more instances of the values |B|̂2 and |A|̂2 so as to computationally efficiently make multiple comparisons using one of those values as long as only one among vectors A and B is changed at a time.

The grouping algorithm 1100 next calculates magnitude-ratio Mr (|b|̂2) and |a−b|̂2 (S1116) using |A|̂2 and |B|̂2 (from steps S1112 and S1114). The first (magnitude ratio) grouping decision criteria is applied in decision step dS1120. In decision step dS1120 the square of the magnitude ratio Mr ̂2 is compared with (|A|/|B|)̂2 and/or with a Magnitude Ratio-Margin (from step iS1102). If (|A|/|B|)̂2> Mr̂2, (Yes branch of decision step dS1120) then current vector A is not grouped with current vector B and comparison with current vector B is ended and a new vector B is selected (step S1152). If (|A|/|B|)̂2 is not greater than Mr̂2, (No branch of decision step dS1120) then current vector A may become grouped with current vector B and the second grouping decision criteria are applied (in decision step dS1140). If |b| is within a predetermined range (e.g., based on the value of |b|̂2) and if the angle difference θ is within a predetermined range (Yes branch of the decision step dS1130), then the magnitude Ma of the normalized difference vector (a−b) is approximated (e.g., as Mâ2=0.5̂2). Otherwise, (No branch of the decision step dS1130), the magnitude Ma of the normalized difference vector (a−b) is calculated (S1132).

Next, the approximated or calculated magnitude Ma of the normalized difference vector (a−b) is used in the second (normalized vector difference) grouping decision criteria in decision step dS1140. In decision step dS1140 the square of Ma (Mâ2) is compared with (|a−b|) ̂2 and/or with the Angle-Margin (from step iS1102). If (|a−b|) ̂2 is less than Mâ2, (Yes branch of decision step dS1140) then current vector A can be grouped with current vector B (S1142). If (|A|/|B|) ̂2 is not less than Mâ2 (No branch of decision step dS1140) then current vector A is not grouped with current vector B and comparison with current vector B is ended and a new vector B is selected (step S1152).

Once the current vector A has been compared with all available grouping candidates vector Bs (Yes branch of decision step dS1150), then a new vector A is selected and comparisons continue for the remaining (ungrouped) grouping candidates vector Bs (S1154, S1112 etc) or if all vectors have been grouped, the grouping algorithm 1100 waits until a new frame needs to be processed.

FIG. 13 is a block diagram of Feature Point Grouping Circuit 1300 comprising a Grouping Algorithm Circuit 1310 configured to perform the feature point grouping algorithm of FIG. 11. The Feature Point Grouping Circuit 1300 comprises a Magnitude Ratio (Mr) Comparator 1320 configured to perform the first grouping decision (of decision step dS1120 of FIG. 11) based on the criteria of Magnitude ratio Mr (|b|) and a Vector Angle Comparator 1330 configured to perform the second grouping decision (of decision step dS1140 of FIG. 11) based on the criteria of normalized vector difference (used for the evaluation of angle difference). The Vector Angle Comparator 1330 in this exemplary embodiment of the inventive concept includes a Magnitude of Normalized Difference (Ma) Calculator/Estimator 1334 and a Magnitude of Normalized Difference (Ma) Comparator 1334. The Magnitude of Normalized Difference (Ma) Calculator/Estimator 1334 generates or calculates as described herein above with respect to steps dS1130, S1132, and S1134 of FIG. 11.

The Feature Point Grouping Circuit 1300 shares the RAM Memory 350 with the Feature Point Circuit 3000 of FIG. 3. The SPF list 352 portion of the memory 350 contains the list of selected feature points output by the Feature Point Candidate Sorter 340. The Paring Algorithm Controller 1302 in the Feature Point Grouping Circuit 1300 DMA accesses the SPF list 352 and selects vectors A and vectors B for comparison in the Grouping Algorithm Circuit 1310 as described herein above with respect to steps S1152, S1154, dS1156, and dS1150 of FIG. 11. When a comparison results in one or more groups of vectors (groups of selected feature points), the Paring Algorithm Controller 1302 writes the grouped vectors or a descriptive list thereof into the FP Group Inventories 354 portion of the memory 350.

Feature Point Grouping Circuit 1300 further comprises a |A|̂2 Calculator 1312, a |B|̂2 Calculator 1312 and a Mr̂2=|b|̂2 & |a−b≡̂2 Calculator 1316, configured to perform steps S1112, S1114, and S1116 of FIG. 11 respectively.

The above-disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments, which fall within the true spirit and scope of the inventive concept. Thus, to the maximum extent allowed by law, the scope of the inventive concept is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description.

Claims

1. A method of processing video data, comprising:

receiving first image data representing a first image frame;

dividing a portion of the first image frame into a plurality of tiles;

selecting a feature point in each of the tiles;

deriving one tile motion vector corresponding to each of the tiles;

selecting start vectors for each selected feature point based on the tile motion vectors;

receiving a second image data representing a second frame; and

deriving a feature point motion vector corresponding to each of the selected feature points by using the selected start vectors.

2. The method of claim 1, wherein selecting the feature point in each of the tiles includes identifying feature points within the tile including performing a first downsampling of the first image data;

3. The method of claim 1, wherein the deriving tile motion vectors includes performing a second downsampling of the first image data.

4. The method of claim 3, wherein the deriving tile motion vectors further includes performing a full search block matching for each tile.

5. The method of claim 1, wherein selecting a feature point in each of the tiles includes identifying the feature points in each of the tiles in the first image frame using a corner detector.

6. The method of claim 1, wherein selecting the feature points further includes calculating the luminance variance of each tile in the first image frame.

7. The method of claim 6, wherein selecting the feature points further includes calculating the total variance of the first image frame, and computing the ratio of each tile's luminance variance to the total variance.

8. The method of claim 1, wherein selecting a feature point in each of the tiles includes:

identifying each of the feature points in each tile in the first image frame as the currently-identified feature point candidate; and

sorting the identified feature points of the tile based on comparing a corner response of the currently-identified feature point candidate to a corner response of previously identified feature point candidates in the current-tile and the proximity of the currently-identified feature point candidate to other previously identified feature point candidates.

9. The method of claim 8, wherein the sorting includes:

rejecting the currently-identified feature point candidate if it is situated within a predetermined minimum distance of a previously identified feature point candidate.

10. The method of claim of claim 9, further including rejecting the currently-identified feature point candidate if it is smaller than the previously identified stored feature point candidate.

11. The method of claim of claim 8, further including storing up to a predetermined number of identified feature point candidates in a memory.

12. The method of claim 11 further including:

storing the currently-identified feature point candidate of the current-tile in the memory if the currently-identified feature point candidate is larger than the smallest among the currently-stored identified feature point candidates; and

canceling the storage of the smallest among the currently-stored identified feature point candidates if the currently-identified feature point candidate is larger.

13. The method of claim 11, wherein the sorting further includes:

canceling the storage of currently-stored feature point candidates if:

the one or more currently-stored feature point candidates are situated within a predetermined minimum distance of the currently-identified feature point candidate; and

the currently-identified feature point candidate is larger than the one or more currently-stored feature point candidates.

14. The method of claim 11, wherein the predetermined number of identified feature point candidates corresponding to each tile is 64 or less.

15. The method of claim 1, wherein pixels in a border margin adjacent to the perimeter of the first frame are not included in the plurality of tiles.

16. An image processing circuit, comprising:

a receiver configured to receive first and second frames of image data;

a divisor configured to divide a portion of the first frame into a plurality of tiles;

a feature point circuit configured to identify feature points in each tile; and

a motion vector circuit configured to derive motion vectors for identified feature points and for each tile.

17. The circuit according to claim 16, wherein the motion vector circuit further includes:

a tile vector calculator configured to derive the vector of motion between the first and second frames of each tile; and

a search unit configured to use selected start vectors to derive the vector of motion of a selected feature point in each tile; and

a memory configured to store up to a predetermined number of feature points for each tile.

18. The circuit of claim 16, further comprising a down-sampler configured to down sample the image data of the first and second frames to derive the vector of motion between the first and second frames of a feature point in each tile.

19. The circuit of claim 16, wherein the feature circuit includes a corner detector configured to identify feature points of a tile.

20. The circuit of claim 16, wherein the motion vector circuit is further configured to select start vectors for a selected point from one of the tile vectors.

21. The circuit of claim 20, wherein the selected start vectors of the selected point in a first tile of the first frame comprise the first tile's tile vector, and the tile vectors of any tiles adjacent right, adjacent left, adjacent up and adjacent down relative to the first tile.

22. The circuit of claim 16, wherein the feature point circuit is configured to select a plurality of selected feature points from among the identified feature points in each tile.

23. The circuit of claim 16, wherein the memory stores up to 64 selected feature points for each tile.

24. The circuit of claim 16, wherein the first and second frames of image data are received via I/O pins of the image processing circuit.

25. A method of processing video data, comprising:

dividing a portion of a frame into a plurality of tiles;

estimating a motion vector for each tile;

sorting motion vectors into groups based on motion characteristics; and

selecting a motion vector group to represent a stationary object within a scene of the frame.

26. The method of claim 25, wherein the estimating a motion vector for a tile includes performing a search of feature points using subsampled luminance data.

27. The method of claim 26, wherein the estimating a motion vector for a tile includes selecting the tile motion vector candidate having the lowest sum-of-absolute-difference (SAD).

28. The method of claim 25, wherein the selecting a motion vector group representing a stationary object within the scene of the frame includes rejecting at least one motion vector group representing movement of moving objects within the scene of the frame.

29. A camera comprising:

an image capture circuit configured to capture images and output first and second frames of image data; and

an image processing circuit, comprising:

a tiling circuit configured to divide a portion of the first frame into a plurality of tiles;

a feature point circuit configured to identify feature points in each tile within the first frame; and

a motion vector circuit configured to derive motion vectors for each tile and to derive motion vectors for identified feature points in each tile.

30. The camera of claim 29, wherein the motion vector circuit is further configured to derive motion vectors for identified feature points based on the motion vector of at least one tile.

31. The camera of claim 29, further including a grouping circuit configured to sort motion vectors into groups based on motion characteristics and select a motion vector group to represent a stationary object within a scene of the frame.