METHODS, SYSTEMS, AND COMPUTER-READABLE STORAGE MEDIA FOR GENERATING STEREOSCOPIC CONTENT VIA DEPTH MAP CREATION

- 3DMEDIA CORPORATION

Methods, systems, and computer program products for generating stereoscopic content via depth map creation are disclosed herein. According to one aspect, a method includes receiving a plurality of images of a scene captured at different focal planes. The method can also include identifying a plurality of portions of the scene in each captured image. Further, the method can include determining an in-focus depth of each portion based on the captured images for generating a depth map for the scene. Further, the method can include identifying the captured image in which the intended subject is in focus as one image of a stereoscopic image pair, and generating the other image of the pair based on the identified image and the depth map.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. provisional patent application No. 61/230,138, filed Jul. 31, 2009, the disclosure of which is incorporated herein by reference in its entirety. The disclosures of the following U.S. provisional patent applications, commonly owned and simultaneously filed Jul. 31, 2009, are all incorporated by reference in their entirety: U.S. provisional patent application No. 61/230,131; and U.S. provisional patent application No. 61/230,133.

TECHNICAL FIELD

The subject matter disclosed herein relates to generating three-dimensional images. In particular, the subject matter disclosed herein relates to methods, systems, and computer-readable storage media for generating stereoscopic content via depth map creation.

BACKGROUND

Stereoscopic, or three-dimensional, imagery is based on the principle of human vision. Two separate detectors detect the same object or objects in a scene from slightly different angles and project them onto two planes. The resulting images are transferred to a processor which combines them and gives the perception of the third dimension, i.e. depth, to a scene.

Many techniques of viewing stereoscopic images have been developed and include the use of colored or polarizing filters to separate the two images, temporal selection by successive transmission of images using a shutter arrangement, or physical separation of the images in the viewer and projecting them separately to each eye. In addition, display devices have been developed recently that are well-suited for displaying stereoscopic images. For example, such display devices include digital cameras, personal computers, digital picture frames, high-definition televisions (HDTVs), and the like.

The use of digital image capture devices, such as digital still cameras, digital camcorders (or video cameras), and phones with built-in cameras, for use in capturing digital images has become widespread and popular. Because images captured using these devices are in a digital format, the images can be easily distributed and edited. For example, the digital images can be easily distributed over networks, such as the Internet. In addition, the digital images can be edited by use of suitable software on the image capture device or a personal computer.

Digital images captured using conventional image capture devices are two-dimensional. It is desirable to provide methods and systems for using conventional devices for generating three-dimensional images. In addition, it is desirable to provide methods and systems for aiding users of image capture devices to select appropriate image capture positions for capturing two-dimensional images for use in generating three-dimensional images. Further, it is desirable to provide methods and systems for altering the depth perceived in three-dimensional images.

SUMMARY

Methods, systems, and computer program products for generating stereoscopic content via depth map creation are disclosed herein. According to one aspect, a method includes receiving a plurality of images of a scene captured at different focal planes. The method can also include identifying a plurality of portions of the scene in each captured image. Further, the method can include determining an in-focus depth of each portion based on the captured images for generating a depth map for the scene. Further, the method can include identifying the captured image in which the intended subject is in focus as one image of a stereoscopic image pair, and generating the other image of the pair based on the identified image and the depth map.

According to another aspect, a method for generating a stereoscopic image pair by altering a depth map can include receiving an image of a scene. The method can also include receiving a depth map associated with at least one captured image of the scene. The depth map can define depths for each of a plurality of portions of at least one captured image. Further, the method can include receiving user input for changing, in the depth map, the depth of at least one portion of at least one captured image. The method can also include generating a stereoscopic image pair of the scene based on the received image of the scene and the changed depth map.

According to an aspect, a system for generating a three-dimensional image of a scene is disclosed. The system may include at least one computer processor and memory configured to: receive a plurality of images of a scene captured at different focal planes; identify a plurality of portions of the scene in each captured image; determine an in-focus depth of each portion based on the captured images for generating a depth map for the scene; identify the captured image where the intended subject is found to be in focus as being one of the images of a stereoscopic image pair; and generate the other image of the stereoscopic image pair based on the identified captured image and the depth map.

According to another aspect, the computer processor and memory are configured to: scan a plurality of focal planes ranging from zero to infinity; and capture a plurality of images, each at a different focal plane.

According to another aspect, the system includes an image capture device for capturing the plurality of images.

According to another aspect, the image capture device comprises at least one of a digital still camera, a video camera, a mobile phone, and a smart phone.

According to another aspect, the computer processor and memory are configured to: filter the portions of the scene for generating a filtered image; apply thresholded edge detection to the filtered image; and determine whether each filtered portion is in focus based on the applied thresholded edge detection.

According to another aspect, the computer processor and memory are configured to: identify at least one object in each captured image; and generate a depth map for the at least one object.

According to another aspect, the at least one object is a target subject. The computer processor and memory are configured to determine one of the captured images having the highest contrast based on the target subject.

According to another aspect, the computer processor and memory are configured to generate the other image of the stereoscopic pair based on translation and perspective projection.

According to another aspect, the computer processor and memory are configured to generate a three-dimensional image of the scene using the stereoscopic image pair.

According to another aspect, the computer processor and memory are configured to implement one or more of registration, rectification, color correction, matching edges of the pair of images, transformation, depth adjustment, motion detection, and removal of moving objects.

According to another aspect, the computer processor and memory are configured to display the three-dimensional image on a suitable three-dimensional image display.

According to another aspect, the computer processor and memory are configured to display the three-dimensional image on one of a digital still camera, a computer, a video camera, a digital picture frame, a set-top box, and a high-definition television.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing summary, as well as the following detailed description of preferred embodiments, is better understood when read in conjunction with the appended drawings. For the purposes of illustration, there is shown in the drawings exemplary embodiments; however, the invention is not limited to the specific methods and instrumentalities disclosed. In the drawings:

FIG. 1 is a block diagram of an exemplary device for creating three-dimensional images of a scene according to embodiments of the present invention;

FIG. 2 is a flow chart of an exemplary method for generating a stereoscopic image pair of a scene using a depth map and the device shown in FIG. 1, alone or together with any other suitable device described herein, in accordance with embodiments of the present invention;

FIGS. 3A and 3B are a flow chart of an exemplary method of a sharpness/focus analysis procedure in accordance with embodiments of the present invention;

FIG. 4 is a schematic diagram of an image-capture, “focus scan” procedure, which facilitates later conversion to stereoscopic images, and an associated table according to embodiments of the present invention;

FIG. 5 illustrates several exemplary images related to sharpness/focus analysis with optional image segmentation according to embodiments of the present invention;

FIG. 6 illustrates schematic diagrams showing close and medium-distance convergence points according to embodiments of the present invention;

FIG. 7 is a schematic diagram showing a translational offset determination technique according to embodiments of the present invention;

FIG. 8 is a schematic diagram showing pixel repositioning via perspective projection with translation according to embodiments of the present invention; and

FIG. 9 illustrates an exemplary environment for implementing various aspects of the subject matter disclosed herein.

DETAILED DESCRIPTION

The subject matter of the present invention is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or elements similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the term “step” may be used herein to connote different aspects of methods employed, the term should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.

The present invention includes various embodiments for the creation and/or alteration of a depth map for an image using a digital still camera or other suitable device as described herein. Using the depth map for the image, a stereoscopic image pair and its associated depth map may be rendered. These processes may be implemented by a device such as a digital camera or any other suitable image processing device.

FIG. 1 illustrates a block diagram of an exemplary device 100 for generating three-dimensional images or a stereoscopic image pair of a scene using a depth map according to embodiments of the present invention. In this example, device 100 is a digital camera capable of capturing several consecutive, still digital images of a scene. In another example, the device 100 may be a video camera capable of capturing a video sequence including multiple still images of a scene. The device may generate a stereoscopic image pair using a depth map as described in further detail herein. A user of the device 100 may position the camera in different positions for capturing images of different perspective views of a scene. The captured images may be suitably stored, analyzed and processed for generating three-dimensional images using a depth map as described herein. For example, subsequent to capturing the images of the different perspective views of the scene, the device 100, alone or in combination with a computer, may use the images for generating a three-dimensional image of the scene and for displaying the three-dimensional image to the user.

Referring to FIG. 1, the device 100 includes a sensor array 102 of charge coupled device (CCD) sensors or CMOS sensors which may be exposed to a scene through a lens and exposure control mechanism as understood by those of skill in the art. The device 100 may also include analog and digital circuitry such as, but not limited to, a memory 104 for storing program instruction sequences that control the device 100, together with a CPU 106, in accordance with embodiments of the present invention. The CPU 106 executes the program instruction sequences so as to cause the device 100 to expose the sensor array 102 to a scene and derive a digital image corresponding to the scene. The digital image may be stored in the memory 104. All or a portion of the memory 104 may be removable, so as to facilitate transfer of the digital image to other devices such as a computer 108. Further, the device 100 may be provided with an input/output (I/O) interface 110 so as to facilitate transfer of the digital image even if the memory 104 is not removable. The device 100 may also include a display 112 controllable by the CPU 106 and operable to display the images for viewing by a user.

The memory 104 and the CPU 106 may be operable together to implement an image generator function 114 for generating three-dimensional images of a scene using a depth map in accordance with embodiments of the present invention. The image generator function 114 may generate a three-dimensional image of a scene using two or more images of the scene captured by the device 100. FIG. 2 illustrates a flow chart of an exemplary method for generating a stereoscopic image pair of a scene using a depth map and the device 100, alone or together with any other suitable device, in accordance with embodiments of the present invention. Referring to FIG. 2, the method includes receiving 200 a plurality of images of a scene captured at different focal planes. For example, all or a portion of the focal range from zero to infinity may be scanned and all images captured during the scanning process may be stored. The sensor array 102 may be used for capturing still images of the scene.

The method includes identifying 202 a plurality of portions of the scene in each captured image. For example, objects in each captured image can be identified and segmented to concentrate focus analysis on specific objects in the scene. A focus map, as described in more detail herein, may be generated and used for approximating the depth of image segments. Using the focus map, an in-focus depth of each portion may be determined 204 based on the captured images for generating a depth map for the scene.

The method uses the image where the intended subject is found to be in focus by the camera (as per normal camera focus operation) as the first image of the stereoscopic pair. The other image of the stereoscopic image pair is then generated 206 based on the first image and the depth map.

Generating a Stereoscopic Image Pair of a Scene Using a Depth Map

A method in accordance with embodiments of the present invention for generating a stereoscopic image pair of a scene using a depth map may be applied during image capture and may utilize camera, focus, and optics information for estimating the depth of each pixel in the image scene. The technique utilizes the concept of depth of field (or similarly, the circle of confusion) and relies upon fast capture and evaluation of a plurality of images while adjusting the lens focus from near field to infinity, before refocusing to capture the intended focused image. FIGS. 3A and 3B are a flow chart of an exemplary method of a sharpness/focus analysis procedure in accordance with embodiments of the present invention. Referring to FIGS. 3A and 3B, the method may begin when the camera enters a stereoscopic mode (step 300).

The method of FIGS. 3A and 3B includes scanning the entire focal plane from zero to infinity and storing all images during the scanning process (step 302). For example, when the user activates the focus process for the camera (e.g., by pressing the shutter button half-way or fully), the camera may immediately begin to capture multiple images across the full range of focus for the lens (termed a “focus scan” herein), as shown in the example of FIG. 4. As indicated in the table shown in FIG. 4, each image capture at a given increment of the focus distance of the lens may result in a specific Depth of Field (area of image sharpness that encompasses a range of distance from the user) for the scene, with the distance of the sharply focused objects from the user increasing as the focus distance of the lens increases. Once the focus distance reaches the hyperfocal distance of the lens/camera combination, the far end of the Depth of Field will be “infinite,” indicating that all objects beyond the near end of the depth of field will be sharply in focus. To reduce complexity of the evaluation of this plurality of images, if necessary, each may be down-scaled to a reduced resolution before subsequent processing.

The method of FIGS. 3A and 3B may optionally include executing image segmentation to break the image into multiple objects (step 304). For example, each captured image from this sequence may be divided into N×M blocks, and each block (or the full image) may be high-pass filtered. An edge detection operation may be performed on each of the resultant images, and each image may be converted to black and white, using a threshold of T (e.g., T>=0.75*max(filtered image)). Finally, a black and white fill operation may be performed to fill the areas between edges, and an “in-focus” pixel map for each image may be obtained. Optionally, objects in the image may be segmented to concentrate the focus analysis on specific objects in the scene. An example is shown in FIG. 5, which illustrates several exemplary images related to sharpness/focus analysis with optional image segmentation according to embodiments of the present invention.
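The high-pass filtering and thresholded edge-detection steps described above can be sketched as follows. This is a minimal, hypothetical illustration: the 3×3 Laplacian kernel and the function name are assumptions (the disclosure does not specify a filter), and the black-and-white fill operation between edges is omitted.

```python
import numpy as np

def in_focus_map(gray, threshold_ratio=0.75):
    """Sketch of the sharpness test above: high-pass filter a grayscale
    image, threshold at T >= threshold_ratio * max(filtered image), and
    return a boolean "in-focus" pixel map (fill step not shown)."""
    img = gray.astype(float)
    # Simple 3x3 Laplacian as the high-pass filter; edges respond strongly.
    padded = np.pad(img, 1, mode="edge")
    hp = np.abs(
        4 * img
        - padded[:-2, 1:-1] - padded[2:, 1:-1]   # up and down neighbors
        - padded[1:-1, :-2] - padded[1:-1, 2:]   # left and right neighbors
    )
    # Threshold of T (e.g., T >= 0.75 * max(filtered image)), per the text.
    T = threshold_ratio * hp.max()
    return hp >= T
```

In practice this would be applied per N×M block rather than to the full image, so that each block contributes its own in-focus verdict.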

If object segmentation is performed, each N×M block may be further subdivided into n×m sized sub-blocks corresponding to portions of a given segmented object (step 306). In each sub-block, the images for which the pixels are deemed by the procedure above to be “in-focus” may be analyzed for those pixels to identify in which of the candidate images the local contrast is at its highest level (step 308). This process can continue hierarchically for smaller sub-blocks as needed. The nearest focus distance at which a given pixel is deemed “in focus,” the farthest distance at which it is “in focus,” and the distance at which it is optimally “in focus,” as indicated by the highest local contrast for that pixel, may be recorded in a “focus map.”

Given the focus map for the pixels in an image, an approximate depth for those pixels can be calculated. For a given combination of image (camera) format circle of confusion, c, f-stop (aperture), N, and focal length, F, the hyperfocal distance (the nearest distance at which the depth of field extends to infinity) of the combination can be approximated as follows:

H ≈ F^2/(N * c).

In turn, the near field depth of field (Dn) for an image for a given focus distance, d, can be approximated as follows:

Dn ≈ (H * d)/(H + d)

(for moderate to large d), and the far field DOF (Df) as follows:

Df ≈ (H * d)/(H − d)

for d<H. For d>=H, the far end depth of field becomes infinite, and only the near end depth of field value is informative.
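The three approximations above can be written out directly, as a sketch. Units are arbitrary but must be consistent (e.g., all in millimeters); the function names are illustrative, and the far field is reported as infinite once the focus distance reaches the hyperfocal distance, per the text.

```python
def hyperfocal(F, N, c):
    # H ~ F^2 / (N * c): focal length F, f-stop N, circle of confusion c.
    return F * F / (N * c)

def near_dof(H, d):
    # Dn ~ H*d / (H + d) for focus distance d.
    return H * d / (H + d)

def far_dof(H, d):
    # Df ~ H*d / (H - d); infinite once d >= H (hyperfocal focusing).
    return float("inf") if d >= H else H * d / (H - d)
```

For example, a hypothetical 50 mm lens at f/8 with c = 0.03 mm gives H of roughly 10.4 m, and a subject focused at 5 m is bracketed by near and far limits on either side of that distance.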

Using the values in the focus map, these relationships can be combined to build a depth map for the captured image (step 310). For example, for a given pixel (P) the focus map contains the value for the shortest focus distance at which the pixel is in focus, ds(P), the longest distance, dl(P), and the optimum contrast distance, dc(P). Using these values, one can approximate that the closest possible distance for the pixel is given by the following equation:

Dns(P) ≈ (H * ds(P))/(H + ds(P)),

and the farthest distance (again, remembering that for a given focus distance, di, if di>=H, the associated value of Df will be infinite) is given by the following equation:

Dfl(P) ≈ (H * dl(P))/(H − dl(P)),

and the optimum distance is between the equation,

Dnc(P) ≈ (H * dc(P))/(H + dc(P)),

and the equation,

Dfc(P) ≈ (H * dc(P))/(H − dc(P)).

Further, it is known that for the focus distances ds(P) and dl(P),

Dfs(P) ≈ (H * ds(P))/(H − ds(P))

and

Dnl(P) ≈ (H * dl(P))/(H + dl(P)).

Given these values, if any of the Df(P) values are non-infinite, a depth for each pixel, Dp, can be approximated as follows:

    if (Dnl(P) < Dfc(P)):
        if (Dfs(P) > Dnc(P)): Dp = [max(Dns(P), Dnl(P), Dnc(P)) + min(Dfs(P), Dfl(P), Dfc(P))]/2
        else:                 Dp = [max(Dns(P), Dnl(P), Dnc(P)) + min(Dfl(P), Dfc(P))]/2
    else:
        if (Dfs(P) > Dnc(P)): Dp = [max(Dns(P), Dnc(P)) + min(Dfs(P), Dfl(P), Dfc(P))]/2
        else:                 Dp = [max(Dns(P), Dnl(P), Dnc(P)) + min(Dfl(P), Dfc(P))]/2

In the case that all Df(P) values are infinite, Dp is instead approximated as

Dp = [max(Dns(P), Dnc(P)) + min(Dnl(P), Dnc(P))]/2.
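The case analysis above translates into a small function. This is a sketch: the function name is illustrative, and infinite far-field values are assumed to be represented as float("inf").

```python
def pixel_depth(Dns, Dnl, Dnc, Dfs, Dfl, Dfc):
    """Combine the near/far DOF bounds from the focus map into a single
    depth estimate Dp for one pixel, following the case analysis above."""
    inf = float("inf")
    if Dfs == inf and Dfl == inf and Dfc == inf:
        # All far-field bounds infinite: fall back to near-field bounds.
        return (max(Dns, Dnc) + min(Dnl, Dnc)) / 2
    if Dnl < Dfc:
        if Dfs > Dnc:
            return (max(Dns, Dnl, Dnc) + min(Dfs, Dfl, Dfc)) / 2
        return (max(Dns, Dnl, Dnc) + min(Dfl, Dfc)) / 2
    if Dfs > Dnc:
        return (max(Dns, Dnc) + min(Dfs, Dfl, Dfc)) / 2
    return (max(Dns, Dnl, Dnc) + min(Dfl, Dfc)) / 2
```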

The method of FIGS. 3A and 3B includes assigning the left eye image to be the image where the intended subject is found to be in focus by the camera (step 312). Based on the depth map and the left eye image, the right eye image may be generated by translation and perspective projection (step 314). A dual-image process may also be implemented (step 316). The selected left and right eye images may be labeled as a stereoscopic image pair (step 318).

Altering a Depth Map for Generating a Stereoscopic Image Pair

A method in accordance with embodiments of the present invention for altering a depth map for generating a stereoscopic image pair may be applied either pre- or post-capture. Touchscreen technology may be used in this method. Touchscreen technology has become increasingly common, and with it, applications such as touchscreen user-directed focus for digital cameras (encompassing both digital still camera and cellphone camera units) have emerged. Using this technology, a touchscreen interface may be used for specifying the depth of objects in a two-dimensional image capture. Either pre- or post-capture, the image field may be displayed in the live-view LCD window, which also functions as a touchscreen interface. A user may touch and highlight the window area for which he or she wishes to change the depth, and subsequently use a right/left (or similar) brushing gesture to indicate an increased or decreased (respectively) depth of the object(s) at the point of the touchscreen highlight. Alternatively, depth can be specified by a user by use of any suitable input device or component, such as, for example, a keyboard, a mouse, or the like.

Embodiments of the present invention are applicable pre-capture, while composing the picture, or alternatively can be used post-capture to create or enhance the depth of objects in an eventual stereoscopic image, optionally in conjunction with the depth map creation technology described above. Used in conjunction with that technology, the techniques described here enable selective artistic enhancements by the user; used stand-alone, they can be the means of creating a relative depth map for the picture, allowing the user to create a depth effect only for the objects he or she feels are of import.

Once an image view and depth map are available using the techniques above, rendering of the stereoscopic image pair may occur.

For any stereoscopic image, there is an overlapping field of view from the left and right eyes that defines the stereoscopic image. At the point of convergence of the eyes, the disparity of an object between the two views will be zero, i.e. no parallax. This defines the “screen point” when viewing the stereoscopic pair. Objects in front of the screen and behind the screen will have increasing amounts of parallax disparity as the distance from the screen increases (negative parallax for objects in front of the screen, positive parallax for objects behind the screen).

The central point of the overlapping field of view on the screen plane (zero parallax depth) of the two eyes in stereoscopic viewing defines a circle that passes through each eye, with a radius, R, equal to the distance to the convergence point. Moreover, the angle, θ, between the vectors from the central convergence point to each of the two eyes can be measured. Examples for varying convergence points are described herein below.

Medium distance convergence gives a relatively small angular change, while close convergence gives a relatively large angular change. FIG. 6 illustrates schematic diagrams showing an example of close and medium-distance convergence points according to embodiments of the present invention.

The convergence point is chosen as the center pixel of the image on the screen plane. It should be noted that this may be an imaginary point, as the center pixel of the image may not be at a depth that is on the screen plane, and hence, the depth of that center pixel can be approximated. This value (Dfocus) is approximated to be 10-30% behind the near end depth of field distance for the final captured image, and is approximated by the equation:

Dfocus ≈ Screen * scale * (H * dfocus)/(H + dfocus),

where dfocus is the focus distance of the lens for the final capture of the image, “Screen” is a value between 1.1 and 1.3, representing the placement of the screen plane behind the near end depth of field, and “scale” represents any scaled adjustment of that depth by the user utilizing the touchscreen interface.

The angle, θ, is dependent upon the estimated distance of focus and the modeled stereo baseline of the image pair to be created. Hence, θ may be estimated as follows:

θ = 2 * sin^−1(Baseline/(2 * Dfocus))

for Dfocus calculated in centimeters. Typically, θ would be modeled as at most 2 degrees.
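The screen-plane depth and convergence-angle estimates above can be sketched together. The function name and default screen factor are illustrative assumptions; the 1.1-1.3 screen placement range and centimeter units follow the text.

```python
import math

def convergence_angle(baseline_cm, H, d_focus, screen=1.2, scale=1.0):
    """Sketch of the Dfocus and theta estimates above.
    'screen' is the 1.1-1.3 screen-plane placement factor; 'scale' is
    any user depth adjustment. Distances are in centimeters."""
    D_focus = screen * scale * H * d_focus / (H + d_focus)
    theta = 2 * math.asin(baseline_cm / (2 * D_focus))  # radians
    return D_focus, theta
```

With a hypothetical 6.5 cm baseline and a screen plane around 2 m away, θ comes out under 2 degrees, consistent with the modeling guidance above.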

In addition to the rotational element in the Z plane, there can also be an X axis translational shift between views. Since no toe-in should occur for the image captures, as would be the case for operation of the eyes, there can be horizontal (X axis) displacement at the screen plane for the two images at the time of capture. For example, FIG. 7 illustrates a schematic diagram showing a translational offset determination technique according to embodiments of the present invention. For a given pixel, P, at a depth Dp, the X axis (horizontal) displacement, S, is calculated using the angle of view, V, for the capture. The angle of view is given by the following equation:

V = 2 * tan^−1(W/(2 * F))

for the width of the image sensor, W, and the focal length, F.

Depth Dp has been approximated for each pixel in the image, and is available from the depth map. It should be noted that the calculations that follow for a given pixel depth, Dp, may be imperfect, since each pixel is not centrally located between the two eye views; however, the approximation is sufficient for the goal of producing a stereoscopic effect. Hence, knowing V and the depth, Dp, of a given pixel, the approximate width of the field of view (WoV) may be represented as follows:

WoV ≈ (Dp * W)/F.

Hence, if the stereo baseline is estimated, the translational offset in pixels, S, for displacement on the X axis to the left (assuming without loss of generality, right image generated from left) is given by the following equation:

S = (PW/WoV) * StereoBaseline = 2 * PW * (F/W) * sin(θ/2),

for PW, the image width in pixels. Since W, F, and PW are camera-specific quantities, the only specified quantity is the modeled convergence angle, θ, which, as noted, is typically 1-2 degrees.
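Taking S = (PW/WoV) * StereoBaseline with the baseline 2 * Dfocus * sin(θ/2) and Dp taken at the screen plane (so the distance cancels), the pixel offset reduces to 2 * PW * (F/W) * sin(θ/2). A sketch, with illustrative names:

```python
import math

def pixel_shift(P_W, W, F, theta):
    """X-axis translational offset S in pixels: image width P_W in
    pixels, sensor width W and focal length F in the same units,
    convergence angle theta in radians."""
    return 2 * P_W * (F / W) * math.sin(theta / 2)
```

For a hypothetical 4000-pixel-wide image on a 36 mm sensor with a 50 mm lens and θ of 1 degree, the shift is on the order of a hundred pixels.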

For each pixel, p, in the image, knowing (xp, yp) coordinates, pixel depth Dp, pixel X-axis displacement S, and the angle θ, a perspective projective transform can be defined to generate a right eye image from the single “left eye” image. A projective perspective transform is defined as having an aspect of translation (defined by S), rotation in the x/y plane (which will be zero for this case), rotation in the y/z plane (again will be zero for this case), and rotation in the x/z plane, which will be defined by the angle θ. For example, the transform may be defined as follows:

    [Dxp]   [cos(−θ)   0   −sin(−θ)]   [xp − S]
    [Dyp] = [   0      1       0   ] × [  yp  ]
    [Dzp]   [sin(−θ)   0    cos(−θ)]   [  Dp  ],

where (Dxp, Dyp, Dzp) are 3D coordinate points resulting from the transform that can be projected onto a two dimensional image plane, which may be defined as follows:

    xp′ = (Dxp − Ex) * (Ez/Dzp)
    yp′ = (Dyp − Ey) * (Ez/Dzp)

where Ex, Ey, and Ez are the coordinates of the viewer relative to the screen, and can be estimated for a given target display device. Ex and Ey can be assumed to be, but are not limited to, 0. The pixels defined by (xp′, yp′) make up the right image view for the new stereoscopic image pair.

Following the calculation of (xp′, yp′) for each pixel, some pixels may map to the same coordinates. The choice of which is in view is made by using the Dzp values of the two pixels, after the initial transform, but prior to the projection onto two-dimensional image space, with the lowest value displayed. An example of the pixel manipulations that occur in the course of the transform is shown in FIG. 8, which illustrates a schematic diagram showing pixel repositioning via perspective projection with translation according to embodiments of the present invention.
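Putting the transform and projection together, a single pixel might be mapped as follows. This is a sketch under the relations above: the rotation is by −θ in the x/z plane with a translation of S, the viewer sits at (Ex, Ey, Ez) with Ex = Ey = 0 by default, and the returned Dzp supports the lowest-value-wins rule for colliding pixels.

```python
import math
import numpy as np

def project_pixel(xp, yp, Dp, S, theta, Ez, Ex=0.0, Ey=0.0):
    """Translate a pixel by S, rotate by -theta in the x/z plane, then
    perspective-project onto the 2D image plane for a viewer at
    (Ex, Ey, Ez). Returns (xp', yp', Dzp)."""
    c, s = math.cos(-theta), math.sin(-theta)
    R = np.array([[c,   0.0, -s],
                  [0.0, 1.0, 0.0],
                  [s,   0.0,  c]])
    Dx, Dy, Dz = R @ np.array([xp - S, yp, Dp])
    x_out = (Dx - Ex) * (Ez / Dz)
    y_out = (Dy - Ey) * (Ez / Dz)
    return x_out, y_out, Dz  # Dz kept: lowest value wins on collisions
```

With θ = 0, S = 0, and Ez equal to the pixel depth, the mapping reduces to the identity, which is a convenient sanity check.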

Similarly, there may be points in the image for which no pixel maps. This can be addressed with pixel fill-in and/or cropping. A simple exemplary pixel fill-in process that may be utilized in the present invention assumes a linear gradient between points on each horizontal row in the image. For points on the same row, n, without defined pixel values between two defined points (xi, yn) and (xj, yn), the fill-in process first determines the distance, which may be defined as follows:


d=j−i−1,

and then proceeds to determine an interpolated gradient between the two pixel positions to fill in the missing values. For simplicity of implementation, the interpolation may be performed on a power of two, meaning that the interpolation will produce one of 1, 2, 4, 8, 16, etc. pixels as needed between the two defined pixels. Pixel regions that are not a power of two are mapped to the closest power of two, and either pixel repetition or truncation of the sequence is applied to fit. As an example, if j=14 and i=6, then d=7, and the intermediate pixel gradient is calculated as follows:

    p1 = (7/8) * (x6, yn) + (1/8) * (x14, yn)
    p2 = (6/8) * (x6, yn) + (2/8) * (x14, yn)
    p3 = (5/8) * (x6, yn) + (3/8) * (x14, yn)
    p4 = (4/8) * (x6, yn) + (4/8) * (x14, yn)
    p5 = (3/8) * (x6, yn) + (5/8) * (x14, yn)
    p6 = (2/8) * (x6, yn) + (6/8) * (x14, yn)
    p7 = (1/8) * (x6, yn) + (7/8) * (x14, yn)
    p8 = (x14, yn).

Since only 7 values are needed, p8 would go unused in this case, such that the following assignments can be made:

    • (x7, yn)=p1
    • (x8, yn)=p2
    • (x9, yn)=p3
    • (x10, yn)=p4
    • (x11, yn)=p5
    • (x12, yn)=p6
    • (x13, yn)=p7.

This process may repeat for each line in the image following the perspective projective transformation. The resultant image may be combined with the initial image capture to create a stereo image pair that may be rendered for 3D viewing via stereo registration and display. Other, more complex and potentially more accurate pixel fill in processes may be utilized.
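The gradient fill-in for one gap can be sketched as follows. This simplified version interpolates on the next power of two at or above d and truncates the extra values (as p8 goes unused in the worked example); the text also allows pixel repetition when mapping to a smaller power of two, which is not shown here. Names and the scalar-pixel assumption are illustrative.

```python
def fill_row_gap(left_px, right_px, d):
    """Linear-gradient fill for d missing pixels between two defined
    pixels on a row: interpolate in n steps, where n is the smallest
    power of two >= d, and keep the first d values."""
    n = 1
    while n < d:
        n *= 2
    # p_k = ((n-k)/n)*left + (k/n)*right, for k = 1..d (extras truncated).
    return [((n - k) / n) * left_px + (k / n) * right_px
            for k in range(1, d + 1)]
```

Reproducing the worked example with scalar intensities 0 and 80 at x6 and x14 yields the seven evenly spaced values 10, 20, ..., 70.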

Embodiments in accordance with the present invention may be implemented by a digital still camera, a video camera, a mobile phone, a smart phone, and the like. In order to provide additional context for various aspects of the disclosed invention, FIG. 9 and the following discussion are intended to provide a brief, general description of a suitable operating environment 900 in which various aspects of the disclosed subject matter may be implemented. While the invention is described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices, those skilled in the art will recognize that the disclosed subject matter can also be implemented in combination with other program modules and/or as a combination of hardware and software.

Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The operating environment 900 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the subject matter disclosed herein. Other well-known computer systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include the above systems or devices, and the like.

With reference to FIG. 9, an exemplary environment 900 for implementing various aspects of the subject matter disclosed herein includes a computer 902. The computer 902 includes a processing unit 904, a system memory 906, and a system bus 908. The system bus 908 couples system components including, but not limited to, the system memory 906 to the processing unit 904. The processing unit 904 can be any of various available processors. Dual microprocessors and other multiprocessor architectures also can be employed as the processing unit 904.

The system bus 908 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any of a variety of available bus architectures including, but not limited to, 8-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MCA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).

The system memory 906 includes volatile memory 910 and nonvolatile memory 912. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 902, such as during start-up, is stored in nonvolatile memory 912. By way of illustration, and not limitation, nonvolatile memory 912 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 910 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

Computer 902 also includes removable/nonremovable, volatile/nonvolatile computer storage media. FIG. 9 illustrates, for example, a disk storage 914. Disk storage 914 includes, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick. In addition, disk storage 914 can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM). To facilitate connection of the disk storage devices 914 to the system bus 908, a removable or non-removable interface is typically used, such as interface 916.

It is to be appreciated that FIG. 9 describes software that acts as an intermediary between users and the basic computer resources described in suitable operating environment 900. Such software includes an operating system 918. Operating system 918, which can be stored on disk storage 914, acts to control and allocate resources of the computer system 902. System applications 920 take advantage of the management of resources by operating system 918 through program modules 922 and program data 924 stored either in system memory 906 or on disk storage 914. It is to be appreciated that the subject matter disclosed herein can be implemented with various operating systems or combinations of operating systems.

A user enters commands or information into the computer 902 through input device(s) 926. Input devices 926 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 904 through the system bus 908 via interface port(s) 928. Interface port(s) 928 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 930 use some of the same type of ports as input device(s) 926. Thus, for example, a USB port may be used to provide input to computer 902 and to output information from computer 902 to an output device 930. Output adapter 932 is provided to illustrate that there are some output devices 930 like monitors, speakers, and printers among other output devices 930 that require special adapters. The output adapters 932 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 930 and the system bus 908. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 934.

Computer 902 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 934. The remote computer(s) 934 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor-based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 902. For purposes of brevity, only a memory storage device 936 is illustrated with remote computer(s) 934. Remote computer(s) 934 is logically connected to computer 902 through a network interface 938 and then physically connected via communication connection 940. Network interface 938 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).

Communication connection(s) 940 refers to the hardware/software employed to connect the network interface 938 to the bus 908. While communication connection 940 is shown for illustrative clarity inside computer 902, it can also be external to computer 902. The hardware/software necessary for connection to the network interface 938 includes, for exemplary purposes only, internal and external technologies such as modems (including regular telephone-grade modems, cable modems, and DSL modems), ISDN adapters, and Ethernet cards.

The various techniques described herein may be implemented with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the disclosed embodiments, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computer will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs are preferably implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and combined with hardware implementations.

The described methods and apparatus may also be embodied in the form of program code that is transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via any other form of transmission, wherein, when the program code is received and loaded into and executed by a machine, such as an EPROM, a gate array, a programmable logic device (PLD), a client computer, a video recorder or the like, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates to perform the processing of the present invention.

While the embodiments have been described in connection with the preferred embodiments of the various figures, it is to be understood that other similar embodiments may be used or modifications and additions may be made to the described embodiment for performing the same function without deviating therefrom. Therefore, the disclosed embodiments should not be limited to any single embodiment, but rather should be construed in breadth and scope in accordance with the appended claims.

Claims

1. A method for generating a stereoscopic image pair of a scene using a depth map, the method comprising:

receiving a plurality of images of a scene captured at different focal planes;
identifying a plurality of portions of the scene in each captured image;
determining an in-focus depth of each portion based on the captured images for generating a depth map for the scene;
identifying the captured image where the intended subject is found to be in focus as being one of the images of a stereoscopic image pair; and
generating the other image of the stereoscopic image pair based on the identified captured image and the depth map.

2. The method of claim 1 further comprising:

scanning a plurality of focal planes ranging from zero to infinity; and
capturing a plurality of images, each at a different focal plane.

3. The method of claim 2 comprising using an image capture device for capturing the plurality of images.

4. The method of claim 3 wherein the image capture device comprises at least one of a digital still camera, a video camera, a mobile phone, and a smart phone.

5. The method of claim 1 further comprising:

for each captured image: filtering the portions of the scene for generating a filtered image; applying thresholded edge detection to the filtered image; and determining whether each filtered portion is in focus based on the applied thresholded edge detection.

6. The method of claim 5 further comprising:

identifying any in-focus objects in each captured image; and
generating a depth map value for each object.

7. The method of claim 6, wherein an object that is determined to be in focus for a sequence of images that is a subset of the full set of captured images is a target subject, and

wherein identifying one of the subset of images having a predetermined contrast comprises determining which of the subset of images has the highest local contrast based on the target subject.

8. The method of claim 1 wherein generating the other image of the stereoscopic image pair comprises generating the other image of the stereoscopic pair based on translation and perspective projection.

9. The method of claim 1 further comprising generating a three-dimensional image of the scene using the stereoscopic image pair.

10. The method of claim 9 wherein generating a three-dimensional image comprises one or more of registration, rectification, color correction, matching edges of the pair of images, transformation, depth adjustment, motion detection, and removal of moving objects.

11. The method of claim 9 further comprising displaying the three-dimensional image on a suitable three-dimensional image display.

12. The method of claim 11 wherein displaying the three-dimensional image comprises displaying the three-dimensional image on one of a digital still camera, a computer, a video camera, a digital picture frame, a set-top box, and a high-definition television.

13. A system for generating a three-dimensional image of a scene, the system comprising:

at least one computer processor and memory configured to: receive a plurality of images of a scene captured at different focal planes; identify a plurality of portions of the scene in each captured image; determine an in-focus depth of each portion based on the captured images for generating a depth map for the scene; identify the captured image where the intended subject is found to be in focus as being one of the images of a stereoscopic image pair; and generate the other image of the stereoscopic image pair based on the identified captured image and the depth map.

14. A computer-readable storage medium having stored thereon computer executable instructions for performing the following steps:

receiving a plurality of images of a scene captured at different focal planes;
identifying a plurality of portions of the scene in each captured image;
determining an in-focus depth of each portion based on the captured images for generating a depth map for the scene;
identifying the captured image where the intended subject is found to be in focus as being one of the images of a stereoscopic image pair; and
generating the other image of the stereoscopic image pair based on the identified captured image and the depth map.

15. A method for generating a stereoscopic image pair by altering a depth map, the method comprising:

receiving an image of a scene;
receiving a depth map associated with at least one captured image of the scene, wherein the depth map defines depths for each of a plurality of portions of the at least one captured image;
receiving user input for changing, in the depth map, a depth of at least one portion of the at least one captured image; and
generating a stereoscopic image pair of the scene based on the received image of the scene and the changed depth map.

16. The method of claim 15 wherein receiving an image of a scene comprises receiving a plurality of images of a scene captured at different focal planes.

17. The method of claim 15 wherein receiving user input for changing a depth comprises receiving user input on a touchscreen display.

18. The method of claim 15 further comprising:

using the received image as one of the images of the stereoscopic image pair; and
generating the other image of the stereoscopic image pair based on the received image.

19. The method of claim 18 wherein generating the other image of the stereoscopic image pair comprises:

defining a perspective projective transform; and
using the perspective projective transform for determining pixel values of the other image of the stereoscopic image pair.

20. The method of claim 19 further comprising:

determining points in the other image of the stereoscopic image pair where no pixel maps; and
using one of a pixel fill-in technique and a cropping technique for generating pixel values where no pixel maps.

21. A system for generating a three-dimensional image of a scene, the system comprising

at least one computer processor and memory configured to: receive an image of a scene; receive a depth map associated with at least one captured image of the scene, wherein the depth map defines depths for each of a plurality of portions of the at least one captured image; receive user input for changing, in the depth map, a depth of at least one portion of the at least one captured image; and generate a stereoscopic image pair of the scene based on the received image of the scene and the changed depth map.

22. A computer-readable storage medium having stored thereon computer executable instructions for performing the following steps:

receive an image of a scene;
receive a depth map associated with at least one captured image of the scene, wherein the depth map defines depths for each of a plurality of portions of the at least one captured image;
receive user input for changing, in the depth map, a depth of at least one portion of the at least one captured image; and
generate a stereoscopic image pair of the scene based on the received image of the scene and the changed depth map.
Patent History
Publication number: 20110025830
Type: Application
Filed: Jul 23, 2010
Publication Date: Feb 3, 2011
Applicant: 3DMEDIA CORPORATION (Durham, NC)
Inventors: Michael McNamer (Apex, NC), Patrick Mauney (Raleigh, NC), Tassos Markas (Chapel Hill, NC)
Application Number: 12/842,257
Classifications
Current U.S. Class: Single Camera From Multiple Positions (348/50); Picture Signal Generators (epo) (348/E13.074)
International Classification: G06K 9/00 (20060101);