FEATURE BASED HIGH RESOLUTION MOTION ESTIMATION FROM LOW RESOLUTION IMAGES CAPTURED USING AN ARRAY SOURCE
Systems and methods in accordance with embodiments of the invention enable feature based high resolution motion estimation from low resolution images captured using an array camera. One embodiment includes performing feature detection with respect to a sequence of low resolution images to identify initial locations for a plurality of detected features in the sequence of low resolution images, where the sequence of low resolution images is part of a set of sequences of low resolution images captured from different perspectives. The method also includes synthesizing high resolution image portions, where the synthesized high resolution image portions contain the identified plurality of detected features from the sequence of low resolution images. The method further includes performing feature detection within the high resolution image portions to identify high precision locations for the detected features, and estimating camera motion using the high precision locations for said plurality of detected features.
The current application claims priority to U.S. Provisional Patent Application Ser. No. 61/692,547, entitled “Feature Based High Resolution Motion Estimation From Low Resolution Images Captured Using an Array Source” filed Aug. 23, 2012, the disclosure of which is incorporated by reference herein in its entirety.
FIELD OF THE INVENTION
The present invention relates generally to feature detection in digital images and more specifically to the use of array cameras and super resolution to improve the performance and efficiency of feature detection.
BACKGROUND OF THE INVENTION
In digital imaging and computer vision, feature detection is a fundamental operation that is typically a preliminary step to feature-based algorithms such as motion estimation, stabilization, image registration, object tracking, and depth estimation. The performance of these algorithms depends sensitively on the quality of the feature point estimates.
Various types of image features include edges, corners or interest points, and blobs or regions of interest. Edges are points where there is a boundary between two image regions, and are usually defined as sets of points in the image which have a strong gradient magnitude. Corners or interest points can refer to point-like features in an image that have a local two dimensional structure. A corner can be the intersection of two edges, or a point for which there are two dominant and different edge directions in a local neighborhood of the point. An interest point can be a point which has a well-defined position and can be robustly detected, such as a corner or an isolated point of local maximum or minimum intensity. Blobs or regions of interest can describe a type of image structure in terms of regions, which often contain a preferred point. In that sense, many blob detectors may also be regarded as interest point operators.
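The definition of an edge given above (a set of points with strong gradient magnitude) can be illustrated with a minimal sketch. This is not part of the disclosure; the function name and threshold value are illustrative assumptions.

```python
import numpy as np

def edge_points(image, threshold=0.25):
    """Return (row, col) coordinates of pixels whose gradient
    magnitude exceeds `threshold` -- a minimal edge detector."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    return np.argwhere(magnitude > threshold)

# A tiny image with a vertical boundary between two constant regions.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
boundary = edge_points(img)  # points cluster along the boundary columns
```

Real detectors smooth the image before differentiating and apply non-maximum suppression, but the thresholded gradient magnitude is the core of the edge definition used in the text.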
A simple but computationally intensive approach to corner detection is to use correlation. Other methods include the Harris & Stephens corner detection algorithm, which considers the differential of the corner score with respect to direction using the sum of squared differences.
Achieving effective feature detection depends in part on providing high quality data, i.e., high resolution image(s), to the feature detector.
SUMMARY OF THE INVENTION
Systems and methods in accordance with embodiments of the invention enable feature based high resolution motion estimation from low resolution images captured using an array camera. One embodiment includes performing feature detection with respect to a sequence of low resolution images using a processor configured by software to identify initial locations for a plurality of detected features in the sequence of low resolution images, where the at least one sequence of low resolution images is part of a set of sequences of low resolution images captured from different perspectives, synthesizing high resolution image portions from the set of sequences of low resolution images captured from different perspectives using the processor configured by software to perform a super-resolution process, where the synthesized high resolution image portions contain the identified plurality of detected features from the sequence of low resolution images, performing feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features using the processor configured by software, and estimating camera motion using the high precision locations for said plurality of detected features using the processor configured by software.
In a further embodiment, the detected features are selected from the group consisting of: edges, corners, and blobs.
In another embodiment, performing feature detection with respect to a sequence of low resolution images further includes detecting the location of features in a first frame from the low resolution sequence of images, and detecting the location of features in a second frame from the low resolution sequence of images.
In a still further embodiment, detecting the location of features in a second frame from the sequence of low resolution images further includes searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images.
In still another embodiment, searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images further includes identifying an image patch surrounding the location of the given feature in the first frame in the sequence of low resolution images, and searching the second frame in the sequence of low resolution images for a corresponding image patch using a matching criterion.
In a yet further embodiment, the matching criterion involves minimizing an error distance metric.
In yet another embodiment, performing feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features further comprises searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images using the high resolution image regions containing the features from the first frame in the low resolution sequence of images.
In a further embodiment again, searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images further comprises comparing high resolution image regions containing features from the second frame in the sequence of low resolution images to the high resolution image portions containing the features from the first frame in the sequence of low resolution images using a matching criterion.
In another embodiment again, the matching criterion involves minimizing an error distance metric.
In a further additional embodiment, the processor is part of an array camera that further comprises an imager array, the method further comprising capturing at least a plurality of the sequences of low resolution images in the set of sequences of low resolution images from different perspectives using the imager array.
In another additional embodiment, the high precision locations for said plurality of detected features estimate feature location at a subpixel precision relative to the size of the pixels of the frames in the sequence of low resolution images.
In a still yet further embodiment, performing feature detection with respect to a sequence of low resolution images further comprises performing feature detection with respect to a plurality of sequences of low resolution images, where each sequence is from a different perspective.
In still yet another embodiment, the set of sequences of low resolution images comprises sequences of low resolution images captured in a plurality of different color channels, and performing feature detection with respect to a sequence of low resolution images further comprises performing feature detection with respect to at least one sequence of low resolution images in each color channel.
Another embodiment includes an imager array, a processor configured by software to control various operating parameters of the imager array. In addition, the software further configures the processor to: capture a set of sequences of low resolution images captured from different perspectives using the imager array; perform feature detection with respect to one of the set of sequences of low resolution images to identify initial locations for a plurality of detected features in the sequence of low resolution images; synthesize high resolution image portions from the set of sequences of low resolution images captured from different perspectives, where the high resolution image portions contain the identified plurality of detected features from the sequence of low resolution images; perform feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features; and estimate camera motion using the high precision locations for said plurality of detected features.
In a further embodiment, the detected features are selected from the group consisting of: edges, corners, and blobs.
In a still further embodiment, the processor is further configured to perform feature detection with respect to a sequence of low resolution images by detecting the location of features in a first frame from the sequence of low resolution images, and detecting the location of features in a second frame from the sequence of low resolution images.
In still another embodiment, the processor is further configured by software to detect the location of features in a second frame from the sequence of low resolution images by searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images.
In a yet further embodiment, the processor is further configured by software to search a second frame from the sequence of low resolution images to locate a given feature detected in the first frame from the sequence of low resolution images by: identifying an image patch surrounding the location of the given feature in the first frame in the sequence of low resolution images; and searching the second frame in the sequence of low resolution images for a corresponding image patch using a matching criterion.
In yet another embodiment, the matching criterion involves minimizing an error distance metric.
In a further embodiment again, the processor is further configured by software to perform feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features by searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images using the high resolution image regions containing the features from the first frame in the low resolution sequence of images.
In another embodiment again, the processor is further configured by software to search the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images by comparing high resolution image regions containing features from the second frame in the sequence of low resolution images to the high resolution image portions containing the features from the first frame in the sequence of low resolution images using a matching criterion.
In a further additional embodiment, the matching criterion involves minimizing an error distance metric.
In another additional embodiment, the high precision locations for said plurality of detected features estimate feature location at a subpixel precision relative to the size of the pixels of the frames in the sequence of low resolution images.
In a still yet further embodiment, a plurality of the imagers in the imager array sense different wavelengths of light and the set of sequences of low resolution images comprises sequences of low resolution images captured in a plurality of different color channels.
In still yet another embodiment, the processor is further configured by software to perform feature detection with respect to a sequence of low resolution images by performing feature detection with respect to at least one sequence of low resolution images in each color channel.
In a still further embodiment again, the processor is further configured by software to perform feature detection with respect to a sequence of low resolution images by performing feature detection with respect to a plurality of sequences of low resolution images, where each sequence is from a different perspective.
Turning now to the drawings, systems and methods for feature based high resolution motion estimation from low resolution images captured using an array camera in accordance with embodiments of the invention are illustrated. Two images sequentially captured using a legacy camera can reflect a relative displacement due to motion of the camera. This camera motion, or equivalently, the 3D structure of the scene, can be recovered from the images. The first step toward these goals is to perform feature matching between the two images, the initial step of which is to detect features in one or both images independently. Initial correspondences are then formed between image features by selecting a patch around each feature and minimizing an error distance metric, such as normalized cross correlation, between the patch and candidate patches in the other image. The set of initial correspondences between features can then be refined using a validation procedure such as Random Sample Consensus (RANSAC) for a given motion model.
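The patch-based matching step described above can be sketched as follows. This is an illustrative implementation, not part of the disclosed embodiments; the function names, patch size, and search radius are assumptions. It maximizes normalized cross-correlation, which is equivalent to minimizing the error distance metric mentioned in the text.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_feature(img1, img2, feat, patch=3, search=5):
    """Find the location in img2 best matching the patch around
    `feat` (row, col) in img1, by maximizing NCC over a search window."""
    r, c = feat
    p = patch // 2
    template = img1[r - p:r + p + 1, c - p:c + p + 1]
    best, best_score = feat, -2.0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = img2[rr - p:rr + p + 1, cc - p:cc + p + 1]
            if cand.shape != template.shape:
                continue  # candidate window falls outside the image
            score = ncc(template, cand)
            if score > best_score:
                best, best_score = (rr, cc), score
    return best
```

Running this for every detected feature yields the set of initial correspondences that a validation procedure such as RANSAC then refines.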
Array cameras offer a number of advantages and features over legacy cameras. An array camera typically contains two or more imagers, each of which receives light through a separate lens system. The imagers operate to capture images of a scene from slightly different perspectives. The images captured by these imagers are typically referred to as low resolution images and super resolution processing can be used to synthesize a high resolution or super resolution image from a subset of the low resolution images. A comparison of a minimum of two low resolution images can provide parallax information used in super resolution processing. The terms low resolution and high resolution are used relatively and not to indicate any specific image resolution. Imagers in the array may sense different wavelengths of light (e.g., red, green, blue, infrared), which can improve performance under different lighting conditions and the performance of super resolution processing on images captured using the array. Super resolution processes that can generate higher resolution images using low resolution images captured by an array camera include those disclosed in U.S. patent application Ser. No. 12/967,807 entitled “Systems and Methods for Synthesizing High Resolution Images Using Super-Resolution Processes,” the disclosure of which is hereby incorporated by reference in its entirety.
A sequence of low resolution images captured by the imagers of an array camera typically contains temporal displacement between frames due to camera motion, as in a legacy camera, but also intra-frame displacement between the constituent images of the array (i.e. the low resolution images captured by each imager in the array) for each frame due to parallax. Because the offset distances of each imager in the array are known, the parallax displacement can be calculated and used to register the images to perform super resolution processing.
In several embodiments, feature detection can be performed on a sequence of super resolution images generated using low resolution images captured by the array camera. Performing feature detection in this way can yield a subpixel estimate of feature positions (i.e. an estimate having a precision smaller than the size of the pixels of the sensors in the array camera used to capture the low resolution images).
By applying a super resolution process, such as one of the processes described in U.S. patent application Ser. No. 12/967,807, the accuracy of the feature detection can be increased. Higher resolution images obtained by applying super resolution processing to low resolution images enable the locations of features to be estimated with greater precision.
In many embodiments of the invention, accurate feature detection can be achieved in a computationally efficient manner by initially identifying the location of features in low resolution images and then selectively performing super resolution processing to obtain higher resolution versions of the portions of the low resolution images containing the identified features. By performing super resolution processing only on the portions of the super resolution images utilized in feature detection, feature detection can be performed at a higher speed (i.e. with fewer computations) while preserving the benefits of increased accuracy. In this way, advanced functionality relying upon feature recognition, such as (but not limited to) real time image stabilization during video capture, can be performed in a computationally efficient manner. Array cameras and the use of super resolution processes to obtain high resolution image portions for performing feature detection in accordance with embodiments of the invention are discussed further below.
Array Camera Architecture
An array camera architecture that can be used in a variety of array camera configurations in accordance with embodiments of the invention is illustrated in the accompanying drawings.
Although a specific architecture is illustrated in the drawings, any of a variety of array camera architectures can be utilized in accordance with embodiments of the invention.
In many embodiments of the invention, super resolution is performed to obtain a portion of a high resolution image corresponding to the portion of a low resolution image containing an identified feature. Once the high resolution image portion is obtained, feature correspondences that were initially determined using the low resolution images can be refined at the higher resolution. A flow chart illustrating a process 120 for refining feature correspondences using high resolution image portions obtained using super resolution processing in accordance with an embodiment of the invention is shown in the drawings.
A feature detection algorithm is run on a first low resolution image captured by an imager in an array camera to identify (122) features in the image. An image from any of the imagers in the array camera can be chosen, so long as the second low resolution image used to perform feature detection is captured by the same imager. In many embodiments, feature detection can be performed with respect to sequences of images captured by multiple cameras to obtain additional information concerning the location of features. In a number of embodiments, the array camera includes cameras that capture images in different color channels, and the array camera performs feature detection with respect to sequences of images captured by cameras in multiple color channels. In certain embodiments, feature detection is performed with respect to a sequence of images captured by at least one camera in each color channel.
As discussed above, the types of features that can be detected in the low resolution image can include (but are not limited to) edges, corners, and blobs. Typically, a feature detection algorithm identifies one type of feature based upon the definition of the feature. A corner detector such as the Harris and Stephens detection algorithm can be used to identify corners. In the Harris and Stephens algorithm, an image patch is considered over a specified area and shifted. A corner is characterized by a large variation in the weighted sum of squared differences between the two patches in all directions.
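The Harris and Stephens corner score described above can be sketched as follows. This is an illustrative implementation, not part of the disclosed embodiments; the sensitivity constant k and the 3×3 accumulation window are common textbook choices, assumed here. Rather than explicitly shifting patches, the standard formulation accumulates image gradients into a structure tensor M per pixel and scores corners by R = det(M) − k·trace(M)².

```python
import numpy as np

def box_sum(a, r=1):
    """Sum of `a` over a sliding (2r+1) x (2r+1) window, via padding and shifts."""
    out = np.zeros_like(a)
    padded = np.pad(a, r)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.05):
    """Harris & Stephens corner response R = det(M) - k * trace(M)^2,
    where M is the structure tensor accumulated over a 3x3 window."""
    gy, gx = np.gradient(image.astype(float))
    ixx = box_sum(gx * gx)
    iyy = box_sum(gy * gy)
    ixy = box_sum(gx * gy)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace  # large positive values indicate corners
```

Along an edge only one gradient direction dominates, so det(M) is near zero and R is negative; at a corner both eigenvalues of M are large and R is strongly positive, matching the "large variation in all directions" criterion in the text.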
In some embodiments of the invention, each feature in the first frame is matched (i.e. determined to correspond) to a feature in the second frame where possible. This initial correspondence may not be possible if the feature has moved out of the second frame or has moved a significant distance. In other embodiments, the features are not matched in the low resolution images, but are matched after performing super resolution on portions of the low resolution images (frames).
A neighborhood of pixels around each feature is selected (126) in each frame. Suitable dimensions for such a neighborhood range from 20×20 pixels to 60×60 pixels, although smaller or larger neighborhoods are possible and may be determined by the limitations of the computing platform carrying out calculations on the image. Moreover, the neighborhood can be of any shape and need not be square. The feature typically falls within the boundaries of the neighborhood, but need not be centered in it.
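Selecting such a neighborhood can be sketched as follows; this is an illustrative helper, not part of the disclosed embodiments, and the clamping behavior at image borders is an assumption consistent with the text's note that the feature need not be centered.

```python
import numpy as np

def neighborhood(image, feat, size=20):
    """Extract a size x size pixel neighborhood around feature (row, col),
    clamped to the image bounds, so the feature need not be centered."""
    h, w = image.shape[:2]
    r, c = feat
    top = min(max(r - size // 2, 0), max(h - size, 0))
    left = min(max(c - size // 2, 0), max(w - size, 0))
    return image[top:top + size, left:left + size]
```

Near an image border the window slides inward rather than shrinking, which keeps the neighborhood a fixed size for the super resolution and matching steps that follow.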
As discussed above, differences exist in the low resolution images captured by the imagers of a camera array due to the effects of parallax. In order to synthesize a high resolution image portion containing a designated neighborhood, the effects of parallax are accounted for by determining the parallax between the images and applying appropriate pixel shifts to the pixels of the low resolution images. The pixel shifts may involve moving pixels into the designated neighborhood and shifting pixels out of the designated neighborhood. Accordingly, although a specific neighborhood of pixels in the synthesized high resolution image is identified, the super resolution algorithm may utilize pixels from the low resolution images that are outside the neighborhood and exclude pixels from the low resolution images within the neighborhood after correcting for parallax. Therefore, the input pixels from the low resolution images utilized to obtain a designated neighborhood of a high resolution image using super resolution processing are not limited to pixels within the designated neighborhood identified by performing feature detection within the initial low resolution image pair. The designated neighborhood simply guides the super resolution process with respect to the low resolution pixels to utilize to synthesize the portion of the high resolution image corresponding to the designated neighborhood. Methods for obtaining distance and other information using parallax calculations via an array camera that can be used in super resolution processing include those disclosed in U.S. Patent Application Ser. No. 61/691,666 entitled “Systems and Methods for Parallax Detection and Correction in Images Captured Using Array Cameras,” the disclosure of which is incorporated by reference herein in its entirety.
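The parallax-correcting pixel shifts described above can be sketched in a simplified form. This toy assumes a single constant disparity over the neighborhood and integer pixel shifts; real super resolution fusion estimates disparity per pixel and operates at subpixel precision, so all names and the per-imager offset convention here are illustrative assumptions.

```python
import numpy as np

def register_for_parallax(lr_images, offsets, disparity):
    """Shift each low resolution image to undo parallax before fusion.
    `offsets` are per-imager baselines (in pixels per unit disparity)
    relative to the reference imager; `disparity` is the scene disparity
    estimated for the neighborhood being synthesized."""
    registered = []
    for img, (oy, ox) in zip(lr_images, offsets):
        dy = int(round(oy * disparity))
        dx = int(round(ox * disparity))
        # Shift opposite to the parallax displacement to align with the reference.
        registered.append(np.roll(np.roll(img, -dy, axis=0), -dx, axis=1))
    return registered
```

Because the imager baselines are fixed and known, only the scene disparity must be estimated; once each constituent image is shifted into the reference frame, their samples can be fused to synthesize the high resolution neighborhood.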
The resulting high resolution frames are illustrated in the drawings.
In high resolution neighborhood 146′, the position of point 142′ is slightly to the right of where it appears in low resolution neighborhood 146. Because super resolution restores the actual high frequency content of the image, the higher resolution neighborhood provides a “truer” representation of the point's actual position. In many embodiments of the invention, the newly calculated positions of points 140′ and 142′ within high resolution neighborhoods 144′ and 146′ can be used in matching (i.e. determining a correspondence between) points 140′ and 142′.
Using the initial correspondences, any of a variety of feature-based algorithms, including (but not limited to), motion estimation, stabilization, image registration, object tracking, or depth estimation, can be performed on the images. The model that is developed using the features and correspondences (for example, a motion model) can be further refined using high resolution neighborhoods of pixels that encompass the relevant features.
The initial correspondences between points 140′ and 142′ are refined (130) using the high resolution neighborhoods (i.e. the high resolution image portions). Refinement may be accomplished using a variety of methods, including (but not limited to) recomputing a matching metric (e.g., normalized cross-correlation) between a pair of corresponding high resolution neighborhoods. Recomputing a matching metric can involve finding the normalized cross-correlation between high resolution neighborhoods 144′ and 146′, and using the metric to compute an estimated position of point 142, i.e., the future position of point 140′ in the subsequent frame. In other embodiments, any of a variety of methods can be utilized as appropriate to the requirements of a specific application.
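The refinement step above can be sketched as a search over shifts between two high resolution neighborhoods; dividing the best high resolution shift by the super resolution scale factor gives a subpixel offset in low resolution pixel units. This is an illustrative implementation, not part of the disclosed embodiments; the exhaustive search, wrap-around shifting, and parameter values are assumptions.

```python
import numpy as np

def refine_correspondence(hr_a, hr_b, scale=4, search=8):
    """Refine a low resolution feature correspondence by locating the
    shift that maximizes NCC between two corresponding high resolution
    neighborhoods; dividing by `scale` converts the best shift into
    subpixel units of the original low resolution grid."""
    best, best_score = (0, 0), -2.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(hr_b, dy, axis=0), dx, axis=1)
            a = hr_a - hr_a.mean()
            b = shifted - shifted.mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            score = (a * b).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best, best_score = (dy, dx), score
    return best[0] / scale, best[1] / scale
```

A one high resolution pixel shift corresponds to a 1/scale pixel displacement at low resolution, which is how the refinement attains the subpixel precision claimed for the high precision feature locations.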
A variety of validation procedures can be used such as the RANdom SAmple Consensus (RANSAC) method for a given model that was formed using the initial features and correspondences (such as a motion model for motion estimation). The RANSAC method utilizes a set of observed data values, a parameterized model which can be fitted to the observations, and confidence parameters. A random subset of the original data is iteratively selected as hypothetical inliers and tested by: fitting the parameters of a model to the hypothetical inliers, testing all other data against the fitted model, including a point as a hypothetical inlier if it fits well to the estimated model, keeping the estimated model if sufficiently many points have been classified as hypothetical inliers, re-estimating the model from the updated set of all hypothetical inliers, and estimating the error of the inliers relative to the model. Other suitable validation procedures appropriate to a specific application can also be utilized in accordance with embodiments of the invention.
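The RANSAC loop described above can be sketched for the simplest motion model, a pure 2D translation. This is an illustrative example, not the disclosed validation procedure; the iteration count, inlier tolerance, and function names are assumptions, and a full motion model would have more parameters and require more samples per hypothesis.

```python
import numpy as np

def ransac_translation(src, dst, iters=200, tol=1.0, seed=0):
    """Fit a 2D translation to point correspondences with RANSAC:
    sample one correspondence, hypothesize the translation, count
    inliers, keep the best hypothesis, then re-estimate the model
    from the full inlier set."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                       # hypothesized translation
        err = np.linalg.norm(dst - (src + t), axis=1)
        inliers = err < tol                       # test all correspondences
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-estimate the translation from the updated set of all inliers.
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers
```

Correspondences produced by mismatched features become outliers that never fit the winning hypothesis, so they are excluded from the final model estimate rather than corrupting it.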
Although a specific process is illustrated in the drawings, any of a variety of processes for refining feature correspondences using high resolution image portions can be utilized in accordance with embodiments of the invention.
Although the description above contains many specificities, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of the invention. Various other embodiments are possible within its scope.
Claims
1. A method for performing feature based high resolution motion estimation from a plurality of low resolution images, comprising:
- performing feature detection with respect to a sequence of low resolution images using a processor configured by software to identify initial locations for a plurality of detected features in the sequence of low resolution images, where the at least one sequence of low resolution images is part of a set of sequences of low resolution images captured from different perspectives;
- synthesizing high resolution image portions from the set of sequences of low resolution images captured from different perspectives using the processor configured by software to perform a super-resolution process, where the synthesized high resolution image portions contain the identified plurality of detected features from the sequence of low resolution images;
- performing feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features using the processor configured by software; and
- estimating camera motion using the high precision locations for said plurality of detected features using the processor configured by software.
2. The method of claim 1, wherein the detected features are selected from the group consisting of: edges, corners, and blobs.
3. The method of claim 1, wherein performing feature detection with respect to a sequence of low resolution images further comprises:
- detecting the location of features in a first frame from the low resolution sequence of images; and
- detecting the location of features in a second frame from the low resolution sequence of images.
4. The method of claim 3, wherein detecting the location of features in a second frame from the sequence of low resolution images further comprises searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images.
5. The method of claim 4, wherein searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images further comprises:
- identifying an image patch surrounding the location of the given feature in the first frame in the sequence of low resolution images; and
- searching the second frame in the sequence of low resolution images for a corresponding image patch using a matching criterion.
6. The method of claim 5, wherein the matching criterion involves minimizing an error distance metric.
7. The method of claim 3, wherein performing feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features further comprises searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images using the high resolution image regions containing the features from the first frame in the low resolution sequence of images.
8. The method of claim 7, wherein searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images further comprises comparing high resolution image regions containing features from the second frame in the sequence of low resolution images to the high resolution image portions containing the features from the first frame in the sequence of low resolution images using a matching criterion.
9. The method of claim 8, wherein the matching criterion involves minimizing an error distance metric.
10. The method of claim 1, wherein the processor is part of an array camera that further comprises an imager array, the method further comprising capturing at least a plurality of the sequences of low resolution images in the set of sequences of low resolution images from different perspectives using the imager array.
11. The method of claim 1, wherein the high precision locations for said plurality of detected features estimate feature location at a subpixel precision relative to the size of the pixels of the frames in the sequence of low resolution images.
12. An array camera configured to perform feature based high resolution motion estimation from low resolution images captured using the array camera, comprising:
- an imager array;
- a processor configured by software to control various operating parameters of the imager array;
- wherein the software further configures the processor to: capture a set of sequences of low resolution images captured from different perspectives using the imager array; perform feature detection with respect to one of the set of sequences of low resolution images to identify initial locations for a plurality of detected features in the sequence of low resolution images; synthesize high resolution image portions from the set of sequences of low resolution images captured from different perspectives, where the high resolution image portions contain the identified plurality of detected features from the sequence of low resolution images; perform feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features; and estimate camera motion using the high precision locations for said plurality of detected features.
13. The array camera of claim 12, where the detected features are selected from the group consisting of: edges, corners, and blobs.
14. The array camera of claim 12, wherein the processor is further configured to perform feature detection with respect to a sequence of low resolution images by:
- detecting the location of features in a first frame from the sequence of low resolution images; and
- detecting the location of features in a second frame from the sequence of low resolution images.
15. The array camera of claim 14, wherein the processor is further configured by software to detect the location of features in a second frame from the sequence of low resolution images by searching the second frame from the sequence of low resolution images to locate features detected in the first frame from the sequence of low resolution images.
16. The array camera of claim 15, wherein the processor is further configured by software to search a second frame from the sequence of low resolution images to locate a given feature detected in the first frame from the sequence of low resolution images by:
- identifying an image patch surrounding the location of the given feature in the first frame in the sequence of low resolution images; and
- searching the second frame in the sequence of low resolution images for a corresponding image patch using a matching criterion.
17. The array camera of claim 16, wherein the matching criterion involves minimizing an error distance metric.
18. The array camera of claim 14, wherein the processor is further configured by software to perform feature detection within the high resolution image portions to identify high precision locations for said plurality of detected features by searching the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images using the high resolution image regions containing the features from the first frame in the low resolution sequence of images.
19. The array camera of claim 18, wherein the processor is further configured by software to search the high resolution image regions containing the features from the second frame in the sequence of low resolution images for features from the first frame in the sequence of low resolution images by comparing high resolution image regions containing features from the second frame in the sequence of low resolution images to the high resolution image portions containing the features from the first frame in the sequence of low resolution images using a matching criterion.
20. The array camera of claim 19, wherein the matching criterion involves minimizing an error distance metric.
21. The array camera of claim 12, wherein the high precision locations for said plurality of detected features estimate feature location at a subpixel precision relative to the size of the pixels of the frames in the sequence of low resolution images.
Type: Application
Filed: Aug 23, 2013
Publication Date: Feb 27, 2014
Applicant: Pelican Imaging Corporation (Mountain View, CA)
Inventors: Dan Lelescu (Morgan Hill, CA), Ankit K. Jain (San Diego, CA)
Application Number: 13/975,159
International Classification: H04N 5/232 (20060101);