RESOLVING HOMOGRAPHY DECOMPOSITION AMBIGUITY BASED ON VIEWING ANGLE RANGE
The homography between captured images of a planar object is determined and decomposed into at least one possible solution, and typically at least two ambiguous solutions. The removal of the ambiguity between the two solutions, or validation of a single solution, is performed using a viewing angle range. The viewing angle range may be used by comparing the viewing angle range to the orientation of each solution as derived from the rotation matrix resulting from the homography decomposition. Any solution with an orientation outside the viewing angle range may be eliminated as a solution.
This application claims priority under 35 USC 119 to U.S. Provisional Application No. 61/533,733, filed Sep. 12, 2011, and entitled “Resolving Homography Decomposition Ambiguity,” which is assigned to the assignee hereof and which is incorporated herein by reference.
BACKGROUND
Vision based tracking techniques use images captured by a mobile platform to determine the position and orientation (pose) of the mobile platform with respect to an object in the environment. Tracking is useful for many applications, such as navigation and augmented reality, in which virtual objects are inserted into a user's view of the real world.
One type of vision based tracking initializes a reference patch by detecting a planar surface in the environment. The surface is typically detected using multiple images of the surface: the homography between the two images is computed and used to estimate 3D locations for the points detected on the surface. Any two camera images of the same planar surface are related by a 3×3 homography matrix h. The homography h can be decomposed into the rotation R and translation t between the two images. The pose information [R|t] may then be used for navigation, augmented reality, or other such applications.
However, in most cases, the decomposition of homography h yields multiple possible solutions. Only one of these solutions, however, represents the actual planar surface. Thus, there is an ambiguity in the decomposition of homography h that must be resolved. Known methods of resolving homography decomposition ambiguity require the use of extra information to select the correct solution, such as additional images or prior knowledge of the planar surface.
By way of example, tracking technologies such as that described by Georg Klein and David Murray, “Parallel Tracking and Mapping on a Camera Phone”, In Proc. International Symposium on Mixed and Augmented Reality (ISMAR), 4 pages, 2009 (“PTAM”), suffer from the ambiguity in the pose selection after homography decomposition. PTAM requires additional video frames, i.e., images, to resolve the ambiguity. For each possible solution, PTAM computes the 3D camera pose and compares the pose reprojection error for a number of subsequent frames. When the average projection error for one solution is greater than another, such as two times greater, the solution with the greater error is eliminated. Using pose reprojection to resolve the ambiguity, however, takes a long time to converge and often yields incorrect results.
Another approach used to resolve the ambiguity is to choose the homography solution with the normal closest to the initial orientation of the camera. This approach, however, restricts the user to always beginning close to a head-on orientation and moving the camera away from that position.
In an approach described by D. Santosh Kumar and C. V. Jawahar, “Robust Homography-Based Control for Camera Positioning in Piecewise Planar Environments”, Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), 906-918 (2006), another planar surface in space is required or prior knowledge about the plane is needed to select the correct solution. Thus, this approach has limited practical application.
SUMMARY
The homography between captured images of a planar object is determined and decomposed into at least one possible solution, and typically at least two ambiguous solutions. The removal of the ambiguity between the two solutions, or validation of a single solution, is performed using a viewing angle range. The viewing angle range may be used by comparing the viewing angle range to the orientation of each solution as derived from the rotation matrix resulting from the homography decomposition. Any solution with an orientation outside the viewing angle range may be eliminated as a solution.
In one embodiment, a method includes capturing two images of a planar object with at least one camera from a first position and a second position; determining a homography between the two images; decomposing the homography to obtain at least one possible solution for the second position; using a viewing angle range to eliminate the at least one possible solution; and storing any remaining solution for the second position.
In another embodiment, an apparatus includes a camera for capturing images of a planar object; and a processor coupled to receive two images captured from a first position and a second position, the processor configured to determine a homography between the two images, decompose the homography to obtain at least one possible solution for the second position, use a viewing angle range to eliminate the at least one possible solution, and store any remaining solution for the second position in a memory coupled to the processor.
In another embodiment, an apparatus includes means for capturing two images of a planar object with at least one camera from a first position and a second position; means for determining a homography between the two images; means for decomposing the homography to obtain at least one possible solution for the second position; means for using a viewing angle range to eliminate the at least one possible solution; and means for storing any remaining solution for the second position.
In yet another embodiment, a non-transitory computer-readable medium including program code stored thereon includes program code to determine a homography between two images of a planar object captured from different positions by at least one camera; program code to decompose the homography to obtain at least one possible solution; program code to use a viewing angle range to eliminate the at least one possible solution; and program code to store any remaining solution.
As shown in
q′ ≈ h q (eq. 1)
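The projective relation of eq. 1 can be sketched in a few lines of numpy. The homography values below are purely illustrative, not taken from the application:

```python
import numpy as np

# A hypothetical 3x3 homography: a 10-degree in-plane rotation plus a
# small translation (values chosen for illustration only).
theta = np.deg2rad(10.0)
H = np.array([
    [np.cos(theta), -np.sin(theta), 5.0],
    [np.sin(theta),  np.cos(theta), 3.0],
    [0.0,            0.0,           1.0],
])

def map_point(H, q):
    """Apply q' ~ H q (eq. 1): transform a homogeneous image point and
    normalize so the last coordinate is 1."""
    qp = H @ q
    return qp / qp[2]

q = np.array([100.0, 50.0, 1.0])   # a point in the first image
q_prime = map_point(H, q)          # the corresponding point in the second image
```

Because the relation is only up to scale, the normalization step is what makes q′ a usable pixel coordinate; the inverse homography maps the point back.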
The homography h between two views of a planar surface can be decomposed into the rotation matrix R, translation t and the normal n using a well-known procedure described in Faugeras, O., Lustman, F.: “Motion and structure from motion in a piecewise planar environment”, International Journal of Pattern Recognition and Artificial Intelligence 2 (1988) 485-508, which is incorporated herein by reference. In most general cases, the decomposition of homography h generates four possible solutions. Two solutions could be eliminated by enforcing non-crossing constraints and visibility constraints. The non-crossing constraint requires that the two camera images are captured from the same side of the planar object, e.g., both images are captured from above the planar object. The visibility constraint requires that all the 3D points on the planar object must be in front of the camera when the images are captured. However, the ambiguity between the other two possible solutions remains.
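For a calibrated camera, the homography induced by a plane with unit normal n at distance d from the first camera has the structure H = R + (t/d)nᵀ; the Faugeras procedure inverts this relation to recover the candidate (R, t, n) triplets. A minimal numpy sketch of that structure, with assumed illustrative values for the motion and the plane:

```python
import numpy as np

def rot_x(deg):
    """Rotation about the x axis by `deg` degrees."""
    a = np.deg2rad(deg)
    return np.array([
        [1.0, 0.0,        0.0],
        [0.0, np.cos(a), -np.sin(a)],
        [0.0, np.sin(a),  np.cos(a)],
    ])

# Hypothetical ground-truth motion and plane (illustrative values only).
R = rot_x(20.0)                   # rotation between the two views
t = np.array([0.1, 0.0, 0.05])    # translation between the two views
n = np.array([0.0, 0.0, 1.0])     # unit normal of the planar object
d = 2.0                           # distance from the first camera to the plane

# Calibrated homography induced by the plane: H = R + (t / d) n^T.
H = R + np.outer(t / d, n)
```

Decomposing such an H in the general case yields up to four (R, t, n) triplets; the non-crossing and visibility constraints described above eliminate two of them, leaving the two-way ambiguity this application addresses.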
The ambiguity between the two remaining solutions may be resolved (or validated if only one solution remains) using a valid viewing angle range between which the correct solution is presumed to lie. The valid viewing angle range is based on the viewing direction of the user 201.
Thus, it can be seen that the viewing direction with respect to a horizontal or vertical surface (or any orientation therebetween) is either from the first (I) quadrant, in which case a first valid viewing range α, e.g., between 0° and 90°, is used, or from the fourth (IV) quadrant, in which case a second valid viewing range α, e.g., between 270° and 360°, is used. The quadrant in which the mobile platform 100 is located, and thus the viewing direction, may be determined based on orientation sensors, user input, heuristics, or any other desired manner.
Referring back to
In order to resolve the ambiguity in the possible solutions 200 and 202, the orientations θ200 and θ202 are compared to the valid viewing angle range α. As discussed above, the viewing angle range α is a range of angles in which the user 201 and mobile platform 100 are most likely positioned. The viewing angle range α may be defined as a predetermined angular range {start angle, end angle} extending in the same general direction as the orientation for each possible solution, encompassing the likely position of the mobile platform 100. By way of example,
Any possible homography h decomposition solution with an orientation θ with respect to the n axis that is outside the viewing angle range α may be eliminated as a valid solution. Thus, for example, as illustrated in
As discussed above, the output of the orientation sensors 116 may be used to indicate the orientation of the mobile platform 100 with respect to gravity, from which the viewing direction may be determined and the viewing angle range α adjusted as appropriate. For example, if the orientation sensors 116 indicate that while capturing the second image the mobile platform 100 is held upside down while in position B, the viewing angle range α may be modified to extend from 0° to 90°, thus making solution 202 the valid solution and eliminating solution 200. Additionally, a heuristic analysis of the possible solutions or user input may be used to determine whether the mobile platform 100 is in the first (I) quadrant or the fourth (IV) quadrant, and thus to select the appropriate viewing angle range α.
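The elimination step above can be sketched as a simple filter. The way θ is derived here, as the signed angle of the rotated plane normal read off its y-z components, is a simplified stand-in for extracting the orientation from the full rotation matrix, and the candidate rotations are assumed illustrative values, not output of an actual decomposition:

```python
import numpy as np

def rot_x(deg):
    """Rotation about the x axis by `deg` degrees."""
    a = np.deg2rad(deg)
    return np.array([
        [1.0, 0.0,        0.0],
        [0.0, np.cos(a), -np.sin(a)],
        [0.0, np.sin(a),  np.cos(a)],
    ])

def orientation_deg(R):
    """Signed orientation theta in [0, 360) of a candidate solution,
    taken from the rotated plane normal's y-z components (a simplified
    stand-in for extracting theta from the rotation matrix)."""
    n_cam = R @ np.array([0.0, 0.0, 1.0])
    return np.degrees(np.arctan2(n_cam[1], n_cam[2])) % 360.0

def filter_by_viewing_range(rotations, start_deg, end_deg):
    """Keep only candidate solutions whose orientation lies inside the
    valid viewing angle range {start angle, end angle}."""
    return [R for R in rotations
            if start_deg <= orientation_deg(R) <= end_deg]

# Two mirrored ambiguous candidates, as typically produced by the
# decomposition; with the fourth-quadrant range {270, 360} only one survives.
candidates = [rot_x(40.0), rot_x(-40.0)]
valid = filter_by_viewing_range(candidates, 270.0, 360.0)
```

Switching the range to {0, 90}, as when the sensors report the device upside down, would instead keep the mirrored candidate.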
In the rare case when both possible solutions 200 and 202 fall within the viewing angle range α, the correct solution may be selected by continuing to track both solutions until the orientation of one of the solutions falls outside the viewing angle range α as a result of user generated motion of the mobile platform 100. If the ambiguity remains unresolved past a threshold, e.g., a desired period of time or number of images, the correct solution may be selected using other known techniques, such as pose reprojection error, if desired.
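That tie-breaking loop might be sketched as follows, where each frame supplies the pair of candidate orientations (the function name and the per-frame angle pairs are illustrative, not from the application):

```python
def resolve_over_time(frame_solutions, start_deg, end_deg, max_frames=30):
    """Track two ambiguous solutions across frames; once one candidate's
    orientation leaves the valid viewing angle range, keep the other.
    Returns the index (0 or 1) of the surviving candidate, or None if
    the ambiguity is unresolved within max_frames (signalling a fallback
    such as pose reprojection error)."""
    for frame_idx, (theta_a, theta_b) in enumerate(frame_solutions):
        a_ok = start_deg <= theta_a <= end_deg
        b_ok = start_deg <= theta_b <= end_deg
        if a_ok and not b_ok:
            return 0                  # candidate B eliminated
        if b_ok and not a_ok:
            return 1                  # candidate A eliminated
        if frame_idx + 1 >= max_frames:
            break
    return None                       # still ambiguous past the threshold
```

The key design point is that no extra information is demanded up front; ordinary user motion eventually pushes the wrong solution out of the valid range.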
Additionally, there are cases where only one possible solution may be generated from homography decomposition and the solution may be incorrect due to poor correlation of the 2D points. In this case, the same process may be used to validate the solution, i.e., if the possible solution has an orientation θ with respect to the n axis that is outside the viewing angle range α, the solution fails, and the process may be reset rather than assuming the only solution is correct.
The mobile platform 100 also includes a control unit 160 that is connected to and communicates with the camera 114 and orientation sensors 116. The control unit 160 accepts and processes images captured by camera 114 or multiple cameras, signals from the orientation sensors 116 and controls the display 112. The control unit 160 may be provided by a processor 161 and associated memory 164, hardware 162, software 165, and firmware 163. The control unit 160 may include an image processing unit 166 that performs homography decomposition on two images captured by the camera 114. The control unit 160 further includes a solution validating unit 168 that receives the solutions from the homography decomposition and determines if a solution is correct based on the viewing angle range as described in
The image processing unit 166 and solution validating unit 168 are illustrated separately from processor 161 for clarity, but may be part of the processor 161 or implemented in the processor based on instructions in the software 165 which is run in the processor 161. It will be understood as used herein that the processor 161 can, but need not necessarily include, one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of computer storage medium, including long term, short term, or other memory associated with the mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware 162, firmware 163, software 165, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in memory 164 and executed by the processor 161. Memory may be implemented within or external to the processor 161. If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Although the present invention is illustrated in connection with specific embodiments for instructional purposes, the present invention is not limited thereto. Various adaptations and modifications may be made without departing from the scope of the invention. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.
Claims
1. A method comprising:
- capturing two images of a planar object with at least one camera from a first position and a second position;
- determining a homography between the two images;
- decomposing the homography to obtain at least one possible solution for the second position;
- using a viewing angle range to eliminate the at least one possible solution; and
- storing any remaining solution for the second position.
2. The method of claim 1, wherein a plurality of possible solutions are obtained and the viewing angle range is used to eliminate at least one of the plurality of possible solutions.
3. The method of claim 1, wherein decomposing the homography produces a rotation matrix associated with the at least one possible solution, wherein using the viewing angle range comprises:
- extracting an orientation with respect to normal of the planar object from the rotation matrix;
- comparing the orientation to the viewing angle range; and
- eliminating any possible solution with the orientation outside the viewing angle range.
4. The method of claim 3, wherein the viewing angle range is a predefined angular range between the planar object and the normal of the planar object.
5. The method of claim 3, wherein the viewing angle range is between approximately 270° and 360°.
6. The method of claim 1, the method further comprising:
- determining an orientation of the at least one camera with respect to gravity; and
- using the orientation of the at least one camera with respect to gravity to define the viewing angle range.
7. The method of claim 1, wherein using the viewing angle range to eliminate the at least one possible solution does not use prior knowledge of the planar object and does not use additional images of the planar object.
8. The method of claim 1, wherein a plurality of possible solutions are obtained and each of the plurality of possible solutions are within the viewing angle range, the method further comprising tracking each of the plurality of possible solutions until a solution is outside the viewing angle range before using the viewing angle range to eliminate the solution that is outside the viewing angle range.
9. An apparatus comprising:
- a camera for capturing images of a planar object; and
- a processor coupled to receive two images captured from a first position and a second position, the processor configured to determine a homography between the two images, decompose the homography to obtain at least one possible solution for the second position, use a viewing angle range to eliminate the at least one possible solution, and store any remaining solution for the second position in a memory coupled to the processor.
10. The apparatus of claim 9, wherein the at least one possible solution is a plurality of possible solutions and wherein the processor is configured to use the viewing angle range to eliminate at least one of the plurality of possible solutions.
11. The apparatus of claim 9, wherein the processor is configured to produce a rotation matrix associated with the at least one possible solution, wherein the processor is configured to use the viewing angle range by being configured to:
- extract an orientation with respect to normal of the planar object from the rotation matrix;
- compare the orientation to the viewing angle range; and
- eliminate any possible solution with the orientation outside the viewing angle range.
12. The apparatus of claim 11, wherein the viewing angle range is predefined and is between the planar object and the normal of the planar object.
13. The apparatus of claim 11, wherein the viewing angle range is between approximately 270° and 360°.
14. The apparatus of claim 9, further comprising orientation sensors coupled to the processor, the processor being further configured to determine an orientation of the camera with respect to gravity using the orientation sensors, and to use the orientation of the camera with respect to gravity to define the viewing angle range.
15. The apparatus of claim 9, wherein the processor is configured to use the viewing angle range to eliminate the at least one possible solution without prior knowledge of the planar object and without additional images of the planar object.
16. The apparatus of claim 9, wherein a plurality of possible solutions are obtained and each of the plurality of possible solutions are within the viewing angle range, wherein the processor is further configured to track each of the plurality of possible solutions until a solution is outside the viewing angle range before the processor uses the viewing angle range to eliminate the solution that is outside the viewing angle range.
17. An apparatus comprising:
- means for capturing two images of a planar object with at least one camera from a first position and a second position;
- means for determining a homography between the two images;
- means for decomposing the homography to obtain at least one possible solution for the second position;
- means for using a viewing angle range to eliminate the at least one possible solution; and
- means for storing any remaining solution for the second position.
18. The apparatus of claim 17, wherein the means for decomposing the homography produces a plurality of possible solutions and wherein the means for using the viewing angle range eliminates at least one of the plurality of possible solutions.
19. The apparatus of claim 17, wherein the means for decomposing the homography produces a rotation matrix associated with the at least one possible solution, wherein the means for using the viewing angle range comprises:
- means for extracting an orientation with respect to normal of the planar object from the rotation matrix;
- means for comparing the orientation to the viewing angle range; and
- means for eliminating any possible solution with the orientation outside the viewing angle range.
20. The apparatus of claim 19, wherein the viewing angle range is predefined and is between the planar object and the normal of the planar object.
21. The apparatus of claim 19, wherein the viewing angle range is between approximately 270° and 360°.
22. The apparatus of claim 17, the apparatus further comprising:
- means for determining an orientation of the at least one camera with respect to gravity; and
- means for using the orientation of the at least one camera with respect to gravity to define the viewing angle range.
23. The apparatus of claim 17, wherein the means for using the viewing angle range to eliminate the at least one possible solution does not use prior knowledge of the planar object and does not use additional images of the planar object.
24. The apparatus of claim 17, wherein a plurality of possible solutions are obtained and each of the plurality of possible solutions are within the viewing angle range, the apparatus further comprising means for tracking each of the plurality of possible solutions until a solution is outside the viewing angle range before the means for eliminating eliminates the solution that is outside the viewing angle range.
25. A non-transitory computer-readable medium including program code stored thereon, comprising:
- program code to determine a homography between two images of a planar object captured from different positions by at least one camera;
- program code to decompose the homography to obtain at least one possible solution;
- program code to use a viewing angle range to eliminate the at least one possible solution; and
- program code to store any remaining solution.
26. The non-transitory computer-readable medium of claim 25, wherein a plurality of possible solutions are obtained and the viewing angle range is used to eliminate at least one of the plurality of possible solutions.
27. The non-transitory computer-readable medium of claim 25, wherein a rotation matrix is produced that is associated with the at least one possible solution, wherein the program code to use the viewing angle range comprises:
- program code to extract an orientation with respect to normal of the planar object from the rotation matrix;
- program code to compare the orientation to the viewing angle range; and
- program code to eliminate any possible solution with the orientation outside the viewing angle range.
28. The non-transitory computer-readable medium of claim 27, wherein the viewing angle range is predefined and is between the planar object and the normal of the planar object.
29. The non-transitory computer-readable medium of claim 27, wherein the viewing angle range is between approximately 270° and 360°.
30. The non-transitory computer-readable medium of claim 25, further comprising:
- program code to determine an orientation of the at least one camera with respect to gravity; and
- program code to use the orientation of the at least one camera with respect to gravity to define the viewing angle range.
31. The non-transitory computer-readable medium of claim 25, wherein the program code to use the viewing angle range to eliminate the at least one possible solution does not use prior knowledge of the planar object and does not use additional images of the planar object.
32. The non-transitory computer-readable medium of claim 25, wherein a plurality of possible solutions are obtained and each of the plurality of possible solutions are within the viewing angle range, the non-transitory computer-readable medium further comprising program code to track each of the plurality of possible solutions until a solution is outside the viewing angle range before the solution that is outside the viewing angle range is eliminated.
Type: Application
Filed: Jan 27, 2012
Publication Date: Mar 14, 2013
Applicant: QUALCOMM Incorporated (San Diego, CA)
Inventor: Dheeraj Ahuja (San Diego, CA)
Application Number: 13/360,505
International Classification: G06K 9/00 (20060101);