Calibration between depth and color sensors for depth cameras
A system described herein includes a receiver component that receives a first digital image from a color camera, wherein the first digital image comprises a planar object, and a second digital image from a depth sensor, wherein the second digital image comprises the planar object. The system also includes a calibrator component that jointly calibrates the color camera and the depth sensor based at least in part upon the first digital image and the second digital image.
Recently there has been an increasing number of depth sensors available at relatively low prices. In an example, a sensor unit that communicates with a video game console includes a depth sensor. In another example, computing devices (desktops, laptops, tablet computing devices) are being manufactured with depth sensors therein. A sensor unit that includes both a color camera and a depth sensor can be referred to herein as a depth camera. Depth cameras have created a significant amount of interest in applications such as three-dimensional shape scanning, foreground-background segmentation, and facial expression tracking, amongst others.
Depth cameras generate simultaneous streams of color images and depth images. To facilitate the applications discussed above (and other applications that employ color images and depth images), the depth sensor and color camera may be desirably calibrated. More specifically, both the color camera and the depth sensor have their own respective coordinate systems, and how such coordinate systems are aligned with respect to one another may be desirably determined to allow pixels in a color image generated by the color camera to be effectively mapped to pixels in a depth image generated by the depth sensor and vice versa.
Many difficulties exist with respect to calibrating a color camera and depth sensor. For example, color cameras have been calibrated utilizing colored patterns. Colored patterns, however, cannot be analyzed in a depth image, as such image does not include captured colors (e.g., corners of a pattern are often indistinguishable from other surface points in a depth image). Furthermore, although depth discontinuity can be observed in a depth image, boundary points of an object are generally unreliable due to unknown depth reconstruction mechanisms utilized in the depth sensor.
An exemplary approach to calibrate a color camera and depth sensor is to co-center an infrared image with a depth image. This may require, however, external infrared illumination. Additionally, commodity depth cameras typically produce relatively noisy depth images, rendering it difficult to calibrate the depth sensor with the color camera.
SUMMARY
The following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.
Described herein are various technologies pertaining to jointly calibrating a color camera and a depth sensor based at least in part upon images of a scene captured by the color camera and the depth sensor, wherein the scene includes a planar object. For instance, the planar object may be a checkerboard. Further, the depth sensor may be any suitable type of depth sensing system, including a triangulation system (such as stereo vision or structured light system), a depth from focus system, a depth from shape system, a depth from motion system, a time of flight system, or other suitable type of depth sensor system.
As will be described in greater detail herein, jointly calibrating the color camera and the depth sensor includes ascertaining a rotation and a translation between coordinate systems of the color camera and the depth sensor, respectively. In connection with computing these values, instructions can be output to a user that instruct the user to move a planar object, such as a checkerboard, to different positions in front of the color camera and the depth sensor. The color camera and the depth sensor may be synchronized, such that an image pair (an image from the color camera and an image from the depth sensor) includes the planar object at a particular position and orientation. Rotation and translation between the coordinate systems of the color camera and the depth sensor can be ascertained based at least in part upon a plurality of such image pairs that include the planar object at various positions and orientations.
Two exemplary techniques for ascertaining the rotation and translation between the coordinate systems of the color camera and the depth sensor are described herein. In a first exemplary technique, an image generated by the color camera can be analyzed to locate the known pattern of the planar object that has been captured in such image. Because the pattern in the planar object is known, such planar object can be automatically located in the color image, and the three-dimensional orientation and position of the planar object in the color image can be computed relative to the color camera. A corresponding plane may be then fit into a corresponding image generated by the depth sensor. The plane can be fit based at least in part upon depth values in the image generated by the depth sensor. The plane fit in the image generated by the depth sensor corresponds to the observed plane in the color image after application of a rotation and translation to the plane in the depth image. Through such approach the rotation and translation between the coordinate systems of the color camera and the depth sensor can be computed.
In another exemplary approach, rather than fitting a plane into the depth image, a set of points in the depth image can be randomly sampled. A relatively large number of points in the depth image can be sampled, and at least some of such points will correspond to points of the planar object in the color image by way of a desirably computed rotation and translation between coordinate systems of the color camera and the depth sensor. If a sufficient number of points are sampled, a likelihood function can be learned and evaluated to compute the rotation and translation mentioned above.
Other aspects will be appreciated upon reading and understanding the attached Figs. and description.
Various technologies pertaining to jointly calibrating a color camera and a depth sensor will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of exemplary systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a component may be configured to perform functionality that is described as being carried out by multiple components. Additionally, as used herein, the term “exemplary” is intended to mean serving as an illustration or example of something, and is not intended to indicate a preference.
As used herein, the terms “component” and “system” are intended to encompass computer-readable data storage that is configured with computer-executable instructions that cause certain functionality to be performed when executed by a processor. The computer-executable instructions may include a routine, a function, or the like. It is also to be understood that a component or system may be localized on a single device or distributed across several devices.
With reference now to
In an exemplary embodiment, a housing 110 may comprise the color camera 104, the depth sensor 106, and the clock 108. The housing 110 may be a portion of a sensor that is utilized in connection with a video game console to detect position and motion of a game player. In another exemplary embodiment, the housing 110 may be a portion of a computing system that includes the color camera 104 and the depth sensor 106 for purposes of video-based communications. In still yet another exemplary embodiment, the housing 110 may be for a video camera that is configured to generate three-dimensional video. These embodiments are presented for purposes of explanation and are not intended to limit the scope of the claims. For example, the combination of the color camera 104 and the depth sensor 106 can be utilized in connection with a variety of different types of applications, including three-dimensional shape scanning, foreground-background segmentation, facial expression tracking, three-dimensional image or video generation, amongst others.
Pursuant to an example, the color camera 104 and the depth sensor 106 may be directed at a user 112 that is holding or supporting a planar object 114. In an example, the planar object 114 may be a patterned object such as a game board. For instance, the planar object 114 may be a checkerboard. Moreover, the user 112 can be instructed to move the planar object 114 to a plurality of different locations, and the color camera 104 and the depth sensor 106 can capture images that include the planar object 114 at these various locations.
A calibrator component 116 is in communication with the receiver component 102 and jointly calibrates the color camera 104 and the depth sensor 106 based at least in part upon the first digital image generated by the color camera 104 and the second digital image generated by the depth sensor 106. Pursuant to an example, jointly calibrating the color camera 104 and the depth sensor 106 may comprise computing a rotation and translation between a coordinate system of the color camera 104 and a coordinate system of the depth sensor 106. In other words, the calibrator component 116 can output values that indicate how the color camera 104 is aligned and rotated with respect to the depth sensor 106.
A data store 118 can be accessible to the calibrator component 116, and the calibrator component 116 can cause the rotation and translation to be retained in the data store 118. The data store 118 may be any suitable hardware data store, including a hard drive, memory, or the like. The calibrator component 116 may utilize any suitable technique for jointly calibrating the color camera 104 and the depth sensor 106. In an exemplary embodiment, the calibrator component 116 can have knowledge of the three-dimensional orientation and position of the planar object 114 in the first digital image generated by the color camera 104 based at least in part upon a priori knowledge of the pattern of the planar object 114. As the depth sensor 106 is also directed to capture an image of the planar object 114, the calibrator component 116 can leverage the knowledge of the existence of the planar object 114 in the second digital image generated by the depth sensor 106 to compute the rotation and translation between the coordinate systems of the color camera 104 and the depth sensor 106, respectively. Specifically, the calibrator component 116 can fit a plane that corresponds to the planar object 114 in the image generated by the color camera 104 onto the second digital image generated by the depth sensor 106. Such plane can be fit based at least in part upon three-dimensional points in the second digital image generated by the depth sensor 106. The plane fit onto the image generated by the depth sensor 106 and the plane corresponding to the planar object 114 observed in the first digital image generated by the color camera 104 correspond to one another by the rotation and translation that is desirably computed. The calibrator component 116 can compute such rotation and translation and cause these values to be retained in the data store 118.
In another exemplary embodiment, the calibrator component 116 can randomly sample points in the second digital image generated by the depth sensor 106 that are known to correspond to the planar object 114 in the second digital image. Each randomly sampled point in the image generated by the depth sensor 106 will correspond to a point in the color image that corresponds to the planar object 114. Each point in the image generated by the depth sensor 106 that corresponds to the planar object 114 is related to a point in the image generated by the color camera 104 that corresponds to the planar object 114 by the desirably computed rotation and translation values. If a sufficient number of points are sampled, the calibrator component 116 can compute the values for rotation and translation. Still further, a combination of these approaches can be employed.
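By way of a non-limiting illustration, the random sampling described above may be sketched as follows. The sketch assumes that a boolean mask marking the depth pixels that belong to the planar object is already available, and that a depth value of zero denotes a missing measurement; the function name `sample_plane_pixels` and the toy data are hypothetical.

```python
import numpy as np

def sample_plane_pixels(depth, mask, k, seed=0):
    """Randomly pick up to k pixels (u, v, z) inside `mask` that have a valid depth."""
    rng = np.random.default_rng(seed)
    # Candidate pixels: inside the board region with a positive, finite depth reading.
    valid = mask & np.isfinite(depth) & (depth > 0)
    vs, us = np.nonzero(valid)              # row index is v, column index is u
    if len(us) == 0:
        return np.empty((0, 3))
    idx = rng.choice(len(us), size=min(k, len(us)), replace=False)
    return np.column_stack([us[idx], vs[idx], depth[vs[idx], us[idx]]])

# Toy data: a 6x8 depth image, a rectangular board region, one missing reading.
depth = np.full((6, 8), 1.5)
depth[2, 3] = 0.0                           # assumed encoding of a missing measurement
mask = np.zeros((6, 8), dtype=bool)
mask[1:5, 2:6] = True
samples = sample_plane_pixels(depth, mask, k=10)
```

Each row of `samples` is one depth observation x=[u, v, z] that can subsequently be mapped into three-dimensional coordinates of the depth sensor.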
Moreover, while the examples provided above have referred to a single image pair (a color image and a depth image), it is to be understood that the calibrator component 116 can consider multiple image pairs with the planar object 114 placed at various different locations and orientations relative to the color camera 104 and the depth sensor 106. For instance, a minimum number of image pairs used by the calibrator component 116 to determine a rotation matrix can be 2, while a minimum number of image pairs used by the calibrator component 116 to determine a translation can be 3. The rotation and translation between the color camera 104 and the depth sensor 106 may then be computed based upon correspondence of the planar object 114 across various color image/depth image pairs.
Further, while the calibrator component 116 has been described above as jointly calibrating the color camera 104 and the depth sensor 106 through analysis of images generated thereby that include the planar object 114, in other exemplary embodiments an object captured in the images need not be entirely planar. For instance, a planar board that includes a plurality of apertures in a pattern can be utilized such that the pattern can be recognized in the first digital image generated by the color camera 104 and the pattern can also be recognized in the second digital image generated by the depth sensor 106. A correspondence between the located patterns in the first digital image and the second digital image may then be employed by the calibrator component 116 to compute the rotation and translation between respective coordinate systems of the color camera 104 and the depth sensor 106.
In yet another exemplary embodiment, the calibrator component 116 can consider point correspondences between the first digital image generated by the color camera 104 and the second digital image generated by the depth sensor 106 in connection with jointly calibrating the color camera 104 and the depth sensor 106. For instance, a user may manually indicate a point in the color image and a point in the depth image, wherein these two points correspond to one another across the images. Additionally or alternatively, image analysis techniques can be employed to automatically locate corresponding points across images generated by the color camera 104 and the depth sensor 106. For instance, the calibrator component 116 can learn a likelihood function that minimizes projected distance between corresponding point pairs across images generated by the color camera 104 and images generated by the depth sensor 106.
In yet another exemplary embodiment, the calibrator component 116 may consider distortion in the depth sensor 106 when jointly calibrating the color camera 104 with the depth sensor 106. For example, depth values generated by the depth sensor 106 may have some distortion associated therewith. A model of such distortion is contemplated and can be utilized by the calibrator component 116 when jointly calibrating the color camera 104 and the depth sensor 106.
With reference now to
$$s\,m = A\,[I \;\; 0]\,M \tag{1}$$
where I is the identity matrix, 0 is the zero vector, and s is a scale factor. In an exemplary embodiment, s=Z. A is the intrinsic matrix of the color camera 104, which can be given as follows:
$$A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2}$$
where α and β are the scale factors in the image coordinate system, (u0, v0) are the coordinates of the principal point and γ is the skewness of the two image axes.
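By way of a non-limiting illustration, the projection of Eq. (1) may be sketched as follows. The sketch assumes the conventional upper-triangular form of the intrinsic matrix built from α, β, γ, u0 and v0; the numeric values and the function names are hypothetical.

```python
import numpy as np

def intrinsic_matrix(alpha, beta, gamma, u0, v0):
    """Conventional pinhole intrinsic matrix (assumed form)."""
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,   1.0]])

def project(A, M):
    """Project a homogeneous camera-frame point M = [X, Y, Z, 1] per Eq. (1)."""
    I0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # the 3x4 matrix [I 0]
    sm = A @ I0 @ M                                  # equals s * [u, v, 1]
    s = sm[2]                                        # with this A, s equals Z
    return sm[:2] / s, s

A = intrinsic_matrix(alpha=500.0, beta=500.0, gamma=0.0, u0=320.0, v0=240.0)
uv, s = project(A, np.array([0.1, -0.2, 2.0, 1.0]))
```

Because the last row of A is [0, 0, 1], the scale factor returned by `project` is the Z coordinate of the point, which matches the exemplary choice s=Z.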
The depth sensor 106 has a second coordinate system 204 that is different from the coordinate system 202 of the color camera 104. The depth sensor 106 generally outputs an image with depth values denoted by x=[u, v, z]T, where (u, v) are the pixel coordinates, and z is the depth value. The mapping from x to the point in the three-dimensional coordinate system 204 of the depth sensor 106, Md=[Xd, Yd, Zd, 1]T, is usually known, and is denoted as Md=f(x). The rotation and translation between the color camera 104 and the depth camera or depth sensor 106 is denoted by R and t:
As mentioned above, the planar object 114 can be moved in front of the color camera 104 and the depth sensor 106. This can create n image pairs (color and depth) captured by the depth camera (the color camera 104 and the depth sensor 106). As shown, the position of the planar object 114 in the n images will be different. The model plane 204 thus has different positions and orientations relative to the position of the color camera 104. Three-dimensional coordinate systems 203a-203b (Xi, Yi, Zi) can be set up for each position of the model planes 204a and 204b across the images such that the Zi=0 plane coincides with the model plane 204. Additionally, it can be assumed that the model plane 204 has a set of m feature points. In an example, the feature points can be corners of a known pattern in the planar object 114, such as a checkerboard pattern. The feature points can be denoted as Pj, j=1, . . . , m. It can be noted that the three-dimensional coordinates of such feature points in each model plane's local coordinate system are identical. Each feature point's local three-dimensional coordinate is associated with a corresponding world coordinate as follows:
$$M_{ij} = \begin{bmatrix} R_i & t_i \\ 0^T & 1 \end{bmatrix} P_j \tag{4}$$
where Mij is the jth feature point of the ith image in the world coordinate system 202, Ri and ti are the rotation and translation from the ith model plane's local coordinate system 203a to the world coordinate system 202. The feature points are observed in the color image as mi,j, which are associated with Mi,j through Eq. (1).
Given the set of feature points Pj and their projections mi,j, it is desirable to recover the intrinsic matrix A, the rotations and translations Ri and ti of the model planes 204a and 204b relative to the world coordinate system 202, and the transform R and t between the color camera 104 and the depth sensor 106. The intrinsic matrix A and the model plane positions Ri and ti (relative to the global coordinate system 202) can be computed through conventional techniques. Images generated by the depth sensor 106 can be used to compute R and t automatically.
As mentioned previously, techniques for calibrating the color camera 104 alone are known. Due to the use of the pinhole camera model, the following can be acquired:
$$s_{ij}\,m_{ij} = A\,[R_i \;\; t_i]\,P_j \tag{5}$$
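By way of a non-limiting illustration, Eq. (5) may be evaluated for all feature points of one image as sketched below, and the predicted projections may be compared against detected feature points. The intrinsic values, the board geometry, the pose, and the half-pixel detection error are all hypothetical.

```python
import numpy as np

def project_features(A, R_i, t_i, P):
    """Apply Eq. (5) to an (m, 3) array of board points P_j; returns (m, 2) pixels."""
    cam = P @ R_i.T + t_i            # feature points in the color camera frame
    sm = cam @ A.T                   # each row is s_ij * [u, v, 1]
    return sm[:, :2] / sm[:, 2:3]

A = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
# A 3x3 grid of corners spaced 0.03 units apart on the Z = 0 model plane.
xs, ys = np.meshgrid(np.arange(3) * 0.03, np.arange(3) * 0.03)
P = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(9)])
R_i = np.eye(3)                      # board facing the camera
t_i = np.array([0.0, 0.0, 1.0])      # board one unit in front of the camera
m_pred = project_features(A, R_i, t_i, P)
m_obs = m_pred + 0.5                 # pretend detections, each off by half a pixel
residuals = m_obs - m_pred
sum_sq = float(np.sum(residuals ** 2))
```

The differences between observed and predicted feature points are the quantities whose errors are modeled in the discussion that follows.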
In practice, feature points on images generated by the color camera 104 are typically extracted automatically through utilization of computer-executable algorithms, and therefore may have errors associated therewith. Accordingly, if it is assumed that mi,j follows a Gaussian distribution with the ground truth position as its mean, e.g.,
mij˜N(
then the log likelihood function can be written as follows:
Terms related to images generated by the depth sensor 106 are now discussed. There are a set of points in the image generated by the depth sensor 106 that correspond to the model plane 204. Ki points within the quadrilateral in the depth image can be randomly sampled and denoted by Mik
which indicates that if these points are transformed to the local coordinate system of each model plane 204a-204b, the coordinate shall be zero.
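By way of a non-limiting illustration, the constraint that depth points on the planar object have a zero coordinate in the local coordinate system of the model plane may be sketched as follows. The sketch assumes that a depth-frame point maps into the color camera frame as M = R·Md + t and that a board-frame point maps into the color camera frame as M = Ri·P + ti; these conventions, the helper names, and the numeric values are assumptions made for the example.

```python
import numpy as np

def plane_residuals(M_d, R, t, R_i, t_i):
    """Local Z coordinate of depth points after mapping into the i-th board frame."""
    M = M_d @ R.T + t                 # depth frame -> color camera frame (assumed)
    P = (M - t_i) @ R_i               # color frame -> board frame: R_i^T (M - t_i)
    return P[:, 2]                    # zero for points exactly on the Z_i = 0 plane

def rot_x(a):
    """Rotation by angle a about the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

R_i, t_i = rot_x(0.3), np.array([0.1, 0.0, 1.2])       # hypothetical board pose
R, t = rot_x(0.02), np.array([0.025, 0.0, 0.0])        # hypothetical sensor offset
board = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.2, 0.2, 0.0]])
M = board @ R_i.T + t_i               # board points in the color frame
M_d = (M - t) @ R                     # the same points as seen in the depth frame
res = plane_residuals(M_d, R, t, R_i, t_i)
res_bad = plane_residuals(M_d, np.eye(3), np.zeros(3), R_i, t_i)
```

With the correct R and t the residuals vanish, whereas an incorrect transform (here, the identity) leaves non-zero residuals, which is what makes the constraint useful for estimating R and t.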
Since images generated by the depth sensor 106 tend to be noisy, Mik
mik
The log likelihood function can thus be written as follows:
As mentioned above, it may be helpful to have a plurality of corresponding point pairs in images generated by the color camera 104 and images generated by the depth sensor 106. Such point pairs can be denoted as (mip
sip
Further, whether the point correspondences are manually labeled or automatically established, such point correspondences may not be accurate. Accordingly, the following can be assumed:
mip
where Φip
Combining the above information together, the overall log likelihood can be maximized as follows:
maxA,R
where ρi, i=1,2,3 are weighting parameters. This objective function poses a nonlinear least squares problem, which can be solved by the calibrator component 116 using the Levenberg-Marquardt method. The result is the computation of the parameters A, Ri, ti, R, and t.
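By way of a non-limiting illustration, a Levenberg-Marquardt iteration may be sketched as follows on a reduced problem. The sketch is not the full objective: it keeps only a point-to-plane term, treats the color-frame planes as known, and optimizes only R (as a rotation vector) and t. The transform convention M = R·Md + t, the plane form n·M = b, and all names and values are assumptions of the example.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix for a rotation vector w (unit axis times angle)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def residuals(x, depth_pts, normals, biases):
    """Signed point-to-plane distances n_i^T (R M_d + t) - b_i over all image pairs."""
    R, t = rodrigues(x[:3]), x[3:]
    return np.concatenate([(P @ R.T + t) @ n - b
                           for P, n, b in zip(depth_pts, normals, biases)])

def levenberg_marquardt(f, x0, iters=50, lam=1e-3, eps=1e-6):
    """Minimal damped Gauss-Newton loop with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    r = f(x)
    for _ in range(iters):
        J = np.column_stack([(f(x + eps * e) - r) / eps for e in np.eye(x.size)])
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        r_new = f(x + step)
        if r_new @ r_new < r @ r:      # improvement: accept and relax the damping
            x, r, lam = x + step, r_new, lam * 0.5
        else:                          # no improvement: reject and increase the damping
            lam *= 10.0
    return x

# Synthetic data: three board poses seen by both sensors (all values hypothetical).
R_true, t_true = rodrigues(np.array([0.03, -0.02, 0.04])), np.array([0.025, 0.0, -0.01])
gx, gy = np.meshgrid(np.linspace(0.0, 0.3, 3), np.linspace(0.0, 0.3, 3))
board = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(9)])
poses = [(rodrigues(np.array(w)), np.array([0.0, 0.0, 1.0]))
         for w in ([0.4, 0.0, 0.0], [0.0, 0.5, 0.0], [0.3, -0.3, 0.1])]
depth_pts, normals, biases = [], [], []
for R_i, t_i in poses:
    M = board @ R_i.T + t_i                    # board corners in the color frame
    depth_pts.append((M - t_true) @ R_true)    # the same points in the depth frame
    normals.append(R_i[:, 2])
    biases.append(R_i[:, 2] @ t_i)

x_hat = levenberg_marquardt(lambda x: residuals(x, depth_pts, normals, biases), np.zeros(6))
R_hat, t_hat = rodrigues(x_hat[:3]), x_hat[3:]
```

A production implementation would more likely rely on an established nonlinear least squares solver and analytic Jacobians; the loop above is intended only to show the accept/reject damping logic that characterizes the method.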
The above algorithms describe calibration of the color camera 104 and the depth sensor 106 with an assumption of no distortions or noise in either of the color camera 104 or the depth sensor 106. A few other parameters, however, may be desirably estimated during calibration by the calibrator component 116. These parameters can include focus, camera center, and depth mapping function for both the color camera 104 and the depth sensor 106. For instance, the color camera 104 may exhibit lens distortions and thus it may be desirable to estimate such distortions based upon the observed model planes 204a-204b in images generated by the color camera 104. Another set of unknown parameters may be in a depth mapping function. For example, an exemplary structured light-based depth camera may have a depth mapping function as follows:
where μ and υ are the scale and bias of the z value, and Ad is the intrinsic matrix of the depth sensor 106, which is typically predetermined. The other two parameters, μ and υ, can be used to model changes in the calibration of the depth sensor 106 due to temperature variation or mechanical vibration, and can be estimated within the same maximum likelihood framework by the calibrator component 116.
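By way of a non-limiting illustration, a depth mapping function with a scale μ and bias υ, followed by a mapping of the resulting point into the color image, may be sketched as follows. The sketch assumes that f(x) has the common back-projection form (μz+υ)·Ad⁻¹·[u, v, 1]T and that depth-frame points map to the color frame as M = R·Md + t; both are assumptions, as are the function names and numeric values.

```python
import numpy as np

def depth_to_point(x, A_d, mu=1.0, upsilon=0.0):
    """Back-project depth pixel x = [u, v, z] into the depth sensor frame (assumed form)."""
    u, v, z = x
    z_corr = mu * z + upsilon                       # scale and bias applied to raw depth
    return z_corr * np.linalg.solve(A_d, np.array([u, v, 1.0]))

def depth_pixel_to_color_pixel(x, A_d, A, R, t, mu=1.0, upsilon=0.0):
    """Map a depth pixel to the color image using the calibrated R and t."""
    M_d = depth_to_point(x, A_d, mu, upsilon)
    M = R @ M_d + t                                 # depth frame -> color frame (assumed)
    sm = A @ M
    return sm[:2] / sm[2]

A = A_d = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.025, 0.0, 0.0])       # hypothetical sensor offset
point = depth_to_point([320.0, 240.0, 2.0], A_d)
uv_color = depth_pixel_to_color_pixel([320.0, 240.0, 2.0], A_d, A, R, t)
```

In this toy configuration the pixel at the center of the depth image lands 6.25 pixels to the right of the center of the color image, reflecting the assumed lateral offset between the two sensors.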
The exemplary solution described above pertains to randomly sampling points in the image generated by the depth sensor 106. As discussed, however, the calibrator component 116 can use other approaches as alternatives to the techniques described above or in combination with such techniques. For instance, fitting the model plane 204a-204b onto the corresponding image generated by the depth sensor 106 can be undertaken by the calibrator component 116 in connection with calibrating the color camera 104 with the depth sensor 106. In an exemplary embodiment, this plane fitting can be undertaken during initialization to have a first estimate of unknown parameters. For instance, for the parameters related to the color camera 104, e.g., A, Ri, ti, a known initialization scheme can be adapted. Below, methods that can be utilized by the calibrator component 116 to provide an initial estimation of R and t between the color camera 104 and the depth sensor 106 are discussed. During the discussion below, it is assumed that A, Ri and ti of the color camera 104 are known.
For most commodity depth cameras, the color camera 104 and the depth sensor 106 are positioned relatively proximate to one another. Accordingly, it is relatively simple to automatically identify a set of points in each image generated by the depth sensor 106 that lies on the corresponding model plane 204a-204b. These points can be referred to as Mik
where nid is the normal of the model plane in the three-dimensional coordinate system of the depth sensor 106, ∥nid∥2=1, and bid is the bias from the origin. nid and bid can be found by the calibrator component 116 through least squares fitting.
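By way of a non-limiting illustration, a least squares plane fit to three-dimensional depth points may be sketched as follows. The sketch uses a total least squares fit by singular value decomposition, which is one common choice and not necessarily the fit used by the calibrator component 116; the plane form n·p = b with a non-negative bias is an assumed sign convention.

```python
import numpy as np

def fit_plane(points):
    """Fit a plane to (k, 3) points; returns unit normal n and bias b with n @ p = b."""
    centroid = points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    b = float(n @ centroid)
    if b < 0:                         # assumed convention: keep the bias non-negative
        n, b = -n, -b
    return n, b

# Toy data: noise-free points on the plane z = 1 + 0.1 x.
gx, gy = np.meshgrid(np.linspace(-0.2, 0.2, 5), np.linspace(-0.2, 0.2, 5))
pts = np.column_stack([gx.ravel(), gy.ravel(), 1.0 + 0.1 * gx.ravel()])
n_d, b_d = fit_plane(pts)
```

Fixing the sign of the bias makes the normal point away from the sensor, so that normals estimated in the depth frame and in the color frame are oriented consistently when both sensors face the same side of the planar object.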
In the coordinate system of the color camera 104 (the global coordinate system 202), the model plane can also be described by the following plane equation:
Since Ri and ti are known, the plane's normal can be represented as ni, ∥ni∥2=1, and bias from the origin bi.
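By way of a non-limiting illustration, the normal and bias of a model plane in the coordinate system of the color camera may be obtained from Ri and ti as sketched below. The sketch relies on the model plane being the Zi=0 plane of its local coordinate system, so that its normal is the third column of Ri and the board origin ti lies on the plane; the plane form n·M = b with a non-negative bias and the numeric pose are assumptions.

```python
import numpy as np

def model_plane_in_camera(R_i, t_i):
    """Unit normal n_i and bias b_i of the Z_i = 0 board plane in the camera frame."""
    n = R_i[:, 2]                     # board Z axis expressed in the camera frame
    b = float(n @ t_i)                # the board origin t_i lies on the plane
    if b < 0:                         # assumed convention: keep the bias non-negative
        n, b = -n, -b
    return n, b

c, s = np.cos(0.3), np.sin(0.3)
R_i = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])   # hypothetical pose
t_i = np.array([0.1, 0.0, 1.2])
n_i, b_i = model_plane_in_camera(R_i, t_i)
corner = R_i @ np.array([0.2, 0.1, 0.0]) + t_i    # any board point, in the camera frame
on_plane_error = float(n_i @ corner - b_i)
```

Any point of the board, once expressed in the camera frame, satisfies the plane equation, which is what `on_plane_error` checks.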
The rotation matrix R may first be solved. For instance, R can be denoted as follows:
The following objective function may then be minimized with constraint:
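$$J(R) = \sum_{i=1}^{n} \lVert n_i - R\,n_i^d \rVert + \sum_{j=1}^{3} \lambda_j \left( r_j^T r_j - 1 \right) + 2\lambda_4\, r_1^T r_2 + 2\lambda_5\, r_1^T r_3 + 2\lambda_6\, r_2^T r_3 \tag{26}$$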
Such objective function can be solved in closed form as follows:
$$C = \sum_{i=1}^{n} n_i^d\, n_i^T \tag{27}$$
The singular value decomposition of C can be written as:
$$C = U D V^T, \tag{28}$$
where U and V are orthogonal matrices and D is a diagonal matrix. The rotation matrix is as follows:
$$R = V U^T. \tag{29}$$
The minimum number of images to determine the rotation matrix R is n=2, provided that the two model planes are not parallel to one another.
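By way of a non-limiting illustration, the closed-form solution of Eqs. (27)-(29) may be sketched as follows, using two non-parallel model planes. The determinant correction is an addition that does not appear in the equations above; it guards against the singular value decomposition returning a reflection, which can occur when only two planes are available. The function name and the test rotation are hypothetical.

```python
import numpy as np

def rotation_from_normals(n_color, n_depth):
    """Closed-form R mapping depth-frame normals onto color-frame normals.

    Both inputs are (n, 3) arrays with one unit normal per row.
    """
    C = n_depth.T @ n_color           # sum_i n_i^d n_i^T, Eq. (27)
    U, _, Vt = np.linalg.svd(C)       # C = U D V^T, Eq. (28)
    V = Vt.T
    # Addition not in the text: flip the last axis if needed so that det(R) = +1.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(V @ U.T))])
    return V @ S @ U.T                # equals V U^T, Eq. (29), when no flip is needed

# Hypothetical ground truth: rotate by 0.1 rad about Z, then by 0.05 rad about X.
cz, sz, cx, sx = np.cos(0.1), np.sin(0.1), np.cos(0.05), np.sin(0.05)
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
R_true = Rx @ Rz
n_depth = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]])   # two non-parallel normals
n_color = n_depth @ R_true.T                             # n_i = R n_i^d
R_est = rotation_from_normals(n_color, n_depth)
```

With noise-free normals the two planes suffice to recover the rotation exactly; with noisy normals, additional image pairs average out the error.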
For translation, the following relationship can exist:
$$(n_i^d)^T t + b_i^d = b_i. \tag{30}$$
Accordingly, three non-parallel model planes can determine a unique t. If n>3, t may be solved through least squares fitting.
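By way of a non-limiting illustration, solving for t through least squares fitting may be sketched as follows, with one linear equation per model plane. The sketch assumes planes written as n·X = b and the transform M = R·Md + t, neither of which is stated explicitly above; under those assumptions the normal that multiplies t is the one expressed in the color camera frame (ni = R·nid). A different frame or sign convention would change which normal and which signs appear, so the conventions should be checked against the plane equations in use.

```python
import numpy as np

def translation_from_planes(n_color, b_color, b_depth):
    """Least squares t from one equation per plane: n_i^T t = b_i - b_i^d (assumed form)."""
    t, *_ = np.linalg.lstsq(n_color, b_color - b_depth, rcond=None)
    return t

# Hypothetical ground truth and four planes described in the depth frame.
c, s = np.cos(0.05), np.sin(0.05)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([0.025, -0.003, 0.01])
n_depth = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8], [-0.6, 0.0, 0.8]])
b_depth = np.array([1.0, 1.1, 0.9, 1.2])
# The same planes in the color frame under M = R M_d + t.
n_color = n_depth @ R_true.T
b_color = b_depth + n_color @ t_true
t_est = translation_from_planes(n_color, b_color, b_depth)
```

Three planes with linearly independent normals give a unique solution; the fourth plane in the toy data shows the overdetermined case, which the least squares solve handles without modification.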
Another exemplary method that can be used by the calibrator component 116 to estimate the initial rotation R and translation t is through knowledge of a set of point correspondences between images generated by the color camera 104 and images generated by the depth sensor 106. Such point pairs can be denoted as (mip
sip
It can be noted that the intrinsic matrix A is known. In conventional methods, it has been shown that given three point pairs, there are in general four solutions to the rotation and translation. When one has four or more non-co-planar point pairs, the so-called POSIT algorithm can be used to find initial values of R and t.
With reference now to
With reference now to
Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like. The computer-readable medium may be any suitable computer-readable storage device, such as memory, hard drive, CD, DVD, flash drive, or the like. As used herein, the term “computer-readable medium” is not intended to encompass a propagated signal.
The exemplary methodology 400 facilitates jointly calibrating a color camera and a depth sensor. The methodology 400 starts at 402, and at 404 an image generated by a color camera that includes a planar object is received. Prior to receiving the image, an instruction can be output to a user with respect to placement of the planar object relative to the color camera and depth sensor. At 406, a depth image generated by a depth sensor is received, wherein the depth image additionally comprises the planar object. The image generated by the color camera and the image generated by the depth sensor may coincide with one another in time.
At 408, the color camera and the depth sensor are automatically jointly calibrated based at least in part upon the image that comprises the planar object generated by the color camera and the depth image that comprises the planar object generated by the depth sensor. Exemplary techniques for automatically jointly calibrating the color camera and the depth sensor have been described above. Further, while the above has indicated that a single image pair is used, it is to be understood that several image pairs (color images and depth images) can be utilized to jointly calibrate the color camera and depth sensor. The methodology 400 completes at 410.
Now referring to
The computing device 500 additionally includes a data store 508 that is accessible by the processor 502 by way of the system bus 506. The data store may be or include any suitable computer-readable storage, including a hard disk, memory, etc. The data store 508 may include executable instructions, images, etc. The computing device 500 also includes an input interface 510 that allows external devices to communicate with the computing device 500. For instance, the input interface 510 may be used to receive instructions from an external computer device, from a user, etc. The computing device 500 also includes an output interface 512 that interfaces the computing device 500 with one or more external devices. For example, the computing device 500 may display text, images, etc. by way of the output interface 512.
Additionally, while illustrated as a single system, it is to be understood that the computing device 500 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by the computing device 500.
It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permutated while still falling under the scope of the claims.
Claims
1. A method, comprising:
- receiving an image of a scene generated by a color camera, the scene comprising a planar object, the planar object having a known pattern;
- locating, in the image, the pattern in the planar object based upon the pattern in the planar object being known;
- computing a three-dimensional position and orientation of the planar object in the scene responsive to the pattern being located in the image, the three-dimensional position and orientation of the planar object derived from the image of the scene generated by the color camera;
- receiving a depth image of the scene generated by a depth sensor, the depth image comprises pixels having depth values; and
- jointly calibrating the color camera and the depth sensor based upon: the computed three-dimensional position and orientation of the planar object; and the depth values of the pixels in the depth image generated by the depth sensor.
2. The method of claim 1, wherein the color camera has a first coordinate system and the depth sensor has a second coordinate system, and wherein jointly calibrating the color camera and the depth sensor comprises determining a rotation and translation between the first coordinate system and the second coordinate system.
3. The method of claim 2, wherein jointly calibrating the color camera and the depth sensor comprises calculating a plurality of values of intrinsic parameters of the color camera and the depth sensor, the plurality of intrinsic parameters comprising a focus, a camera center, and a depth mapping function.
4. The method of claim 1, further comprising:
- receiving a first plurality of images of the scene that are generated by the color camera over time;
- receiving a second plurality of images of the scene that are generated by the depth sensor over time, wherein the planar object is at different locations in the scene relative to the color camera and the depth sensor in each of the images in the first plurality of images and the second plurality of images; and
- jointly calibrating the color camera and the depth sensor based upon the first plurality of images and the second plurality of images.
5. The method of claim 1, wherein the color camera is a video camera and the depth sensor comprises an infrared camera.
6. The method of claim 1, wherein the depth sensor is one of a time of flight sensor or a structured light sensor.
7. The method of claim 1, wherein the planar object is a checkerboard, the known pattern being a checkerboard pattern.
8. The method of claim 1, wherein jointly calibrating the color camera and the depth sensor further comprises:
- fitting a plane on the depth image based upon the computed three-dimensional position and orientation of the planar object; and
- learning a translation and rotation between a coordinate system of the depth sensor and a coordinate system of the color camera based upon an estimated correspondence between the computed three-dimensional position and orientation of the planar object and the plane fitted on the depth image.
9. The method of claim 1, wherein jointly calibrating the color camera and the depth sensor comprises:
- sampling the values of the pixels in the depth image; and
- learning a likelihood function that is configured to output a likelihood that a particular pixel in the depth image corresponds to the planar object.
10. The method of claim 9, wherein jointly calibrating the color camera and the depth sensor further comprises learning a translation and rotation between a coordinate system of the depth sensor and a coordinate system of the color camera based upon an evaluation of the likelihood function.
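Claims 9 and 10 recite learning a likelihood function from sampled depth pixel values without fixing its form. The sketch below assumes, purely for illustration, that the sampled values are depths in a common unit and that a single Gaussian is an adequate model; `learn_depth_likelihood` and its `min_sigma` parameter are hypothetical names.

```python
import math
from statistics import mean, pstdev
from typing import Callable, List


def learn_depth_likelihood(samples: List[float],
                           min_sigma: float = 1e-3) -> Callable[[float], float]:
    """Fit a Gaussian to sampled depth values and return its density function.

    The Gaussian form is an assumption of this sketch, not a requirement
    of the claims.
    """
    mu = mean(samples)
    # Guard against a zero standard deviation when all samples are equal.
    sigma = max(pstdev(samples), min_sigma)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

    def likelihood(depth: float) -> float:
        z = (depth - mu) / sigma
        return norm * math.exp(-0.5 * z * z)

    return likelihood


if __name__ == "__main__":
    lik = learn_depth_likelihood([1.0, 1.1, 0.9, 1.05, 0.95])
    # A depth near the sampled values scores higher than a distant one.
    print(lik(1.0) > lik(2.0))
```

A candidate translation and rotation could then be scored by evaluating such a function at the pixels where the planar object is predicted to appear, which is one possible reading of the evaluation recited in claim 10.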
11. The method of claim 1, further comprising:
- subsequent to jointly calibrating the color camera and the depth sensor, receiving a first image from the color camera;
- subsequent to jointly calibrating the color camera and the depth sensor, receiving a second image from the depth sensor; and
- overlaying at least a portion of the first image onto the second image to generate a three-dimensional image based upon the calibrating of the color camera and the depth sensor.
12. A system comprising:
- a receiver component that receives:
  - a first digital image of a scene from a color camera, wherein the scene comprises a planar object that has a known pattern; and
  - a second digital image of the scene from a depth sensor, the first digital image and the second digital image being coincident in time; and
- a calibrator component that jointly calibrates the color camera and the depth sensor based upon:
  - a computed position and orientation of the planar object in a coordinate system of the first digital image; and
  - values of pixels in the second digital image that correspond to the planar object.
13. The system of claim 12 comprised by a gaming console.
14. The system of claim 12, wherein the color camera and the depth sensor are included together in a housing.
15. The system of claim 12, wherein the planar object is a checkerboard.
16. The system of claim 12, wherein the calibrator component outputs a rotation and translation between the coordinate system of the color camera and a coordinate system of the depth sensor.
17. The system of claim 16, further comprising:
- a mapper component that maps pixels of an image generated by the color camera to pixels of an image generated by the depth sensor.
18. The system of claim 17, wherein the mapper component generates a three-dimensional image based upon the pixels of the image generated by the color camera being mapped to the pixels of the image generated by the depth sensor.
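The mapping performed by the mapper component of claims 17 and 18 can be sketched as follows, assuming a distortion-free pinhole model for both sensors and assuming that the rotation `R` and translation `t` output by the calibrator component take points from the depth sensor coordinate system to the color camera coordinate system. The intrinsic values and all function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """Lift a depth pixel (u, v) with metric depth to a 3D point."""
    fx, fy, cx, cy = k
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Vec3) -> Vec3:
    """Apply the rigid transform q = R * p + t."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p: Vec3, k: Intrinsics) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates to pixel coordinates."""
    fx, fy, cx, cy = k
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def depth_pixel_to_color_pixel(u: float, v: float, depth: float,
                               depth_k: Intrinsics, color_k: Intrinsics,
                               R: Sequence[Sequence[float]],
                               t: Sequence[float]) -> Tuple[float, float]:
    """Map one depth pixel to the corresponding color pixel."""
    p_depth = backproject(u, v, depth, depth_k)
    p_color = transform(R, t, p_depth)
    return project(p_color, color_k)


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)          # hypothetical intrinsics
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(depth_pixel_to_color_pixel(100.0, 50.0, 2.0, K, K, I, [0.1, 0.0, 0.0]))
```

Repeating this mapping for every valid depth pixel and attaching the color sampled at the resulting location is one way a colored three-dimensional image could be assembled.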
19. A computer-readable data storage medium comprising instructions that, when executed by a processor, cause the processor to perform acts comprising:
- outputting at least one instruction to a user with respect to placement of a planar object having a known pattern relative to a color camera and a depth sensor;
- subsequent to outputting the at least one instruction, causing the color camera to capture a first image of a scene that includes the planar object;
- causing the depth sensor to capture a second image of the scene, the first image and the second image being coincident in time;
- computing a position and orientation of the planar object in a coordinate system of the color camera;
- identifying a plane that includes the planar object in the coordinate system of the color camera, the plane identified based upon the computed position and orientation of the planar object in the coordinate system of the color camera;
- identifying pixels in the second image that correspond to the planar object based upon values of the pixels;
- fitting the plane in the second image based upon the values of the pixels; and
- computing an estimated translation and rotation between the coordinate system of the color camera and a coordinate system of the depth sensor based upon the fitting of the plane in the second image.
20. The computer-readable data storage medium of claim 19, the acts further comprising:
- subsequent to computing the estimated translation and rotation between the coordinate system of the color camera and the coordinate system of the depth sensor, generating a three-dimensional color image based upon the estimated translation and rotation.
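The final acts of claim 19 estimate a translation and rotation from the fit of a plane in the depth image. One way to score a candidate estimate, shown here only as a sketch, is the sum of squared distances between the transformed depth points and the plane found from the color image. The sketch assumes the plane is given in color camera coordinates as a unit normal `normal` and an offset `offset` such that normal . X = offset, and that `R` and `t` map depth coordinates to color coordinates; `point_to_plane_cost` is a hypothetical name, and the search over `R` and `t` is not shown.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def point_to_plane_cost(points_depth: List[Vec3],
                        normal: Sequence[float],
                        offset: float,
                        R: Sequence[Sequence[float]],
                        t: Sequence[float]) -> float:
    """Sum of squared distances from R * p + t to the plane normal . X = offset.

    Assumes `normal` has unit length so that each residual is a distance.
    """
    total = 0.0
    for p in points_depth:
        # Move the depth point into the color camera coordinate system.
        q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        residual = sum(normal[i] * q[i] for i in range(3)) - offset
        total += residual * residual
    return total


if __name__ == "__main__":
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pts = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0)]   # points on the plane z = 1
    print(point_to_plane_cost(pts, (0.0, 0.0, 1.0), 1.0, I, [0.0, 0.0, 0.0]))
    print(point_to_plane_cost(pts, (0.0, 0.0, 1.0), 1.0, I, [0.0, 0.0, 0.5]))
```

A single plane does not constrain all six degrees of freedom, which is consistent with claim 4 reciting images of the planar object at different locations in the scene.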
6373518 | April 16, 2002 | Sogawa |
6633664 | October 14, 2003 | Minamida et al. |
6768509 | July 27, 2004 | Bradski et al. |
6816187 | November 9, 2004 | Iwai et al. |
6858826 | February 22, 2005 | Mueller et al. |
7912252 | March 22, 2011 | Ren et al. |
8090194 | January 3, 2012 | Gordon et al. |
20050231476 | October 20, 2005 | Armstrong |
20060128087 | June 15, 2006 | Bamji et al. |
20070115484 | May 24, 2007 | Huang et al. |
20090201384 | August 13, 2009 | Kang et al. |
20090213240 | August 27, 2009 | Sim et al. |
20090231425 | September 17, 2009 | Zalewski |
20100207938 | August 19, 2010 | Yau et al. |
20100225743 | September 9, 2010 | Florencio et al. |
20100235129 | September 16, 2010 | Sharma et al. |
20100303341 | December 2, 2010 | Hausler |
20110018973 | January 27, 2011 | Takayama |
20110054295 | March 3, 2011 | Masumoto et al. |
20110069892 | March 24, 2011 | Tsai et al. |
20110150101 | June 23, 2011 | Liu |
20120026296 | February 2, 2012 | Lee |
- J. Smisek, J. Jancosek, & T. Pajdla, “3D with Kinect”, 2011 IEEE Int'l Conf. on Computer Vision Workshops 1154-1160 (Nov. 2011).
- C. Daniel Herrera, J. Kannala, & J. Heikkila, “Accurate and Practical Calibration of a Depth and Color Camera Pair”, 6855 Lecture Notes in Computer Sci. 437-445 (Aug. 2011).
- C. Daniel Herrera, J. Kannala, & J. Heikkila, “Joint Depth and Color Camera Calibration with Distortion Correction”, 34 IEEE Transactions on Pattern Analysis & Machine Intelligence 2058-2064 (May 2012).
- C. Raposo, J.P. Barreto, & U. Nunes, “Fast and Accurate Calibration of a Kinect Sensor”, 2013 Int'l Conf. on 3D Vision 342-349 (2013).
- Frick, et al., “Generation of 3D-TV LDV-Content with Time of Flight Camera”, Retrieved at <<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.155.5198&rep=rep1&type=pdf>>, 3DTV Conference: The True Vision—Capture, Transmission and Display of 3D Video, May 4-6, 2009, pp. 1-4.
- Guan, et al., “3D Object Reconstruction with Heterogeneous Sensor Data”, Retrieved at <<http://www.cc.gatech.edu/conferences/3DPVT08/Program/Papers/paper108.pdf>>, The Fourth International Symposium on 3D Data Processing, Visualization and Transmission, 2008, pp. 1-8.
- Cui, et al., “3D Shape Scanning with a Time-of-Flight Camera”, Retrieved at <<http://ai.stanford.edu/~schuon/sr/cvpr10_scanning.pdf>>, IEEE Conference on Computer Vision and Pattern Recognition, Jun. 13-18, 2010, pp. 1-8.
- Crabb, et al., “Real-Time Foreground Segmentation via Range and Color Imaging”, Retrieved at <<http://mplab.ucsd.edu/wp-content/uploads/CVPR2008/WorkShops/data/papers/221.pdf>>, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Jun. 23-28, 2008, pp. 1-5.
- Cai, et al., “3D Deformable Face Tracking with a Commodity Depth Camera”, Retrieved at <<http://research.microsoft.com/en-us/um/people/zhang/papers/eccv2010-facetrackingwithdepthcamera.pdf>>, Proceedings of the 11th European conference on computer vision: Part III, 2010, pp. 229-242.
- Tsai, Roger Y., “A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology using Off-the-Shelf TV Cameras and Lenses”, Retrieved at <<http://www.vision.caltech.edu/bouguetj/calib_doc/papers/Tsai.pdf>>, IEEE Journal of Robotics and Automation, vol. 3, No. 4, Aug. 1987, pp. 323-344.
- Zhang, et al., “A Flexible New Technique for Camera Calibration”, Retrieved at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=888718>>, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, No. 11, Nov. 2000, pp. 1330-1334.
- Arun, et al., “Least-Squares Fitting of Two 3-D Point Sets”, Retrieved at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4767965>>, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-9, No. 5, Sep. 1987, pp. 698-700.
- Fischler, et al., “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Retrieved at <<http://www.ai.sri.com/pubs/files/836.pdf>>, Communications of the ACM, vol. 24, No. 6, 1981, pp. 381-395.
- Yuan, Joseph S. C., “A General Photogrammetric Method for Determining Object Position and Orientation”, Retrieved at <<http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=88034>>, IEEE Transactions on Robotics and Automation, vol. 5, No. 2, Apr. 1989, pp. 129-142.
- Dementhon, et al., “Model-Based Object Pose in 25 Lines of Code”, Retrieved at <<http://www.cfar.umd.edu/~daniel/daniel_papersfordownload/Pose25Lines.ps.gz>>, Computer Vision—ECCV'95, 1995, pp. 1-30.
- “International Search Report”, Mailed Date: Dec. 21, 2012, Application No. PCT/US2012/045879, Filed Date: Dec. 21, 2012, pp. 1-9.
Type: Grant
Filed: Jul 8, 2011
Date of Patent: Feb 23, 2016
Patent Publication Number: 20130010079
Assignee: Microsoft Technology Licensing, LLC (Redmond, WA)
Inventors: Cha Zhang (Sammamish, WA), Zhengyou Zhang (Bellevue, WA)
Primary Examiner: Dave Czekaj
Assistant Examiner: David N Werner
Application Number: 13/178,494
International Classification: H04N 13/02 (20060101); G06T 7/00 (20060101); H04N 13/00 (20060101);