STEREO IMAGING SYSTEM WITH AUTOMATIC DISPARITY ADJUSTMENT FOR DISPLAYING CLOSE RANGE OBJECTS

A stereo imaging system comprises a depth sensing system, a stereo viewer, and a processor configured to receive captured stereo worksite images, compare a depth value to a predetermined threshold value, and generate stereo images for display by adjusting or not adjusting native disparities based on the comparison of the depth value to the threshold value.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Application Serial No. 16/202,211, filed Nov. 28, 2018, which is a continuation of U.S. Application Serial No. 14/053,021 filed Oct. 14, 2013 (now U.S. Pat. No. 10,178,368), which claims benefit of provisional U.S. Application No. 61/717,443, filed Oct. 23, 2012 (now expired), each of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention generally relates to stereo imaging systems. In particular, it relates to a stereo imaging system, and a method implemented therein, for providing automatic disparity adjustment for displaying close range objects.

BACKGROUND OF THE INVENTION

A stereo imaging system includes a stereoscopic camera which has two image capturing elements for capturing left and right stereo images. Details for such a stereo imaging system may be found, for example, in U.S. Pat. No. 6,720,988 entitled “Stereo Imaging System and Method for Use in Telerobotic System.”

FIG. 1 illustrates a schematic of the stereo geometry for two image capturing elements, e.g., left and right optical lens 101, 102, which are separated by a baseline distance “b”. Left and right image planes 121, 122 are shown at a focal length “f” (i.e., a depth at which the left and right images are focused). The image planes 121, 122 represent stereo images that are captured by the lens 101, 102 and are bounded by their fields of view. The focal length may be adjusted within a focusing range, but the baseline distance is fixed for the stereoscopic camera.

A point “P” at a depth “Z” from the lens 101, 102 is seen at different points on the image planes 121, 122. In particular, the point “P” is projected at a position “d1” on the left image plane 121 and projected at a position “d2” on the right image plane 122. The difference or disparity “D” between the two positions “d2” and “d1” can be determined from the following well-known relationship:

D/b = f/Z

Thus, as the depth “Z” gets smaller and smaller, the disparity “D” gets larger and larger.
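The relationship above can be illustrated numerically. The following is a minimal sketch, not part of the disclosed system; the baseline, focal length, and depth values are hypothetical:

```python
def disparity(baseline_mm: float, focal_mm: float, depth_mm: float) -> float:
    """Disparity D from the relation D/b = f/Z, i.e. D = b*f/Z."""
    return baseline_mm * focal_mm / depth_mm

# As the depth Z gets smaller, the disparity D gets larger (hypothetical values):
b, f = 5.0, 20.0                  # baseline and focal length in mm
for z in (100.0, 50.0, 20.0):
    print(z, disparity(b, f, z))  # D grows as Z shrinks
```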

Stereo images captured by the stereoscopic camera are displayed on a stereo viewer. As an example, the stereo viewer may have left and right display screens upon which left and right stereo images are respectively displayed. The stereo viewer in this case, may also have left and right eyepieces through which a user places his/her left and right eyes to respectively view the left and right display screens.

When objects are viewed at very close range by the stereoscopic camera, even after or while adjusting the camera focus, the user may have difficulty fusing his/her eyes on an image of the object being displayed on the stereo viewer due to the large disparity “D” between corresponding points on the left and right display screens. In addition, after the camera focus control has reached the end of its range, a practical limit may be placed on how close an object may be viewed relative to the stereoscopic camera.

To address this problem, another stereoscopic camera with a smaller baseline “b” (i.e., closer spacing between the image capturing elements) may be used at very close range to reduce the disparity “D” between displayed stereo images and consequently, allow a user to comfortably see the stereo images being displayed on the stereo viewer. However, the use of multiple stereoscopic cameras with different baselines adversely adds to the cost of the stereo imaging system and increases the difficulty of its use by an operator.

SUMMARY OF THE INVENTION

The embodiments of the invention are summarized by the claims that follow below.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic of the stereo geometry for two image capturing elements of a stereo imaging system.

FIG. 2 illustrates a block diagram of a stereo imaging system utilizing aspects of the present invention.

FIG. 3 illustrates a flow diagram of a method for providing automatic disparity adjustment for both normal and close range viewing in a stereo viewer of a stereo imaging system utilizing aspects of the present invention.

FIG. 4 illustrates a schematic of a region of interest in a stereo view of a stereo imaging system utilizing aspects of the present invention.

FIG. 5 illustrates a schematic of a stereo view constructed from a pair of stereo images in a stereo imaging system utilizing aspects of the present invention.

FIGS. 6a-6d schematically illustrate steps performed in a first embodiment of a method for modifying stereo images as part of the method of FIG. 3 utilizing aspects of the present invention.

FIGS. 7a-7c schematically illustrate steps performed in a second embodiment of a method for modifying stereo images as part of the method of FIG. 3 utilizing aspects of the present invention.

DETAILED DESCRIPTION

FIG. 2 illustrates, as an example, a block diagram of a stereo imaging system 200 in which a method 300 utilizing the invention is implemented. A stereoscopic camera 201 is preferably a high-definition digital stereo camera which generates a video stream of stereo images captured at a frame rate of the camera, such as thirty frames per second. Each frame of stereo images includes a left stereo image 211 and a right stereo image 212 which has been captured at a high resolution, such as 1920×1080 pixels. An image processor 202 modifies the stereo images received from the stereoscopic camera 201 according to the method 300. A stereo viewer 204 has left and right display screens for respectively displaying the modified left stereo image 221 and modified right stereo image 222 received from the image processor 202. The resolution of the left and right display screens is typically a lower resolution, such as 1280×1024 pixels, than that of the camera 201.

A depth sensing system 203 determines depth values for a region of interest in the stereo images and provides a depth value to the image processor 202 so that it may be used by the method 300. The region of interest may be predefined as a default region in the stereo images, for example, the center region, or it may be user defined. As an example, the user may define the region of interest using a telestrator 231 by drawing the region over one of the left and right stereo images being displayed at the time on the telestrator 231. Details on such a telestration system may be found, for example, in U.S. 2007/0156017 entitled “Stereo Telestration for Robotic Surgery”, which is incorporated herein by reference. Alternatively, the user may define the center of a region of interest using a gaze tracking unit 232 which tracks the user’s gaze point on one or both of the display screens of the stereo viewer 204. Details for such a gaze tracking system may be found, for example, in U.S. Application No. 61/554,741 entitled “Method and System for Stereo Gaze Tracking”. The location and/or dimensions of the region of interest may be predefined or definable by the user using any conventional means such as a Graphical User Interface (GUI). Regardless of how the region of interest is defined, it may be displayed for the convenience of the user on the stereo viewer 204 at its proper location as an overlay to any three-dimensional objects or surface topology being displayed therein at the time. The overlay may be a three-dimensional overlay at the same depths and following the contour of the underlying objects or surface topology, or it may be a two-dimensional overlay floating over the underlying objects or surface topology at a specified depth value.

The depth sensing system 203 may determine the depth values for the region of interest in the stereo images using one or a combination of known methods. As an example, a structured light technique may be used in which a known light pattern is projected onto a three-dimensional scene and the relative light intensities on the scene tracked to derive a depth map for the scene. See, e.g., Daniel Scharstein and Richard Szeliski, “High-Accuracy Stereo Depth Maps Using Structured Light,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), vol. 1, pages 195-202, Madison, Wis., June 2003. As another example, the depth value may be determined by determining corresponding points in stereo images using a robust sparse image matching algorithm, determining disparities between the corresponding points, and converting the disparities to depths using a predetermined disparity to depth mapping. See, e.g., U.S. Pat. No. 8,184,880 entitled “Robust Sparse Image Matching for Robotic Surgery”, which is incorporated herein by reference. As yet another example, a laser range finder may be used for determining depth values of a three-dimensional scene. The depth value may be an average depth value for the surface topology appearing in the region of interest. When tools, which are being used to interact with objects of the surface topology, appear above the surface topology, depth values for the tools which occlude part of the surface topology may be included or excluded from the calculation.

FIG. 3 illustrates, as an example, a flow diagram of a method 300 for modifying stereo images captured at close range to aid a user to fuse his/her eyes on objects in the modified stereo images when they are displayed on a stereo viewer. Typically, the method may be used while the user is adjusting the distance of the camera lens from one or more objects comprising a surface topology. Manual or auto focus may be employed during the distance adjustment. The method 300 is preferably implemented as program code executed by the image processor 202. In this example, the method is only active when the mode is turned ON, giving a user the option to turn OFF the mode and not use the method. Alternatively, the mode may always be ON, giving the user no option to turn it OFF. To help describe the method, FIGS. 4, 5, 6a-6d, and 7a-7c are provided.

In block 301, the method determines a depth value for a region of interest appearing in a current frame in a video stream of stereo images being received from a stereoscopic camera. The depth value is determined, for example, by the depth sensing system 203 as previously described.

As an example of a region of interest, FIG. 4 illustrates a region of interest 430 which is cylindrically shaped and extends within the stereo view 410 to a surface topology (not shown). The stereo view 410 is the intersection of the fields of view 401, 402 of the image capturing elements 101, 102. As shown in FIG. 5, the region of interest 430 is centered within the stereo view 410, but offset in the left and right stereo images 211, 212 by a disparity corresponding to the average depth value of the surface topology. Note that the area 420 outside the stereo view 410 is only viewable in monovision since it is viewable by only the image capturing element 101.

In block 302, a determination is made whether the depth value is less than a first threshold value Z1 (as shown in FIG. 4). The first threshold value Z1 is a depth value at which it becomes difficult for a user viewing the stereo images on the stereo viewer to visually fuse the stereo images with his/her eyes. Typically, this depth value is a function of the baseline distance “b” between the image capturing elements of the stereoscopic camera. It may also be a function of the focal lengths and fields of view of the image capturing elements. As an example, a depth value of 5 centimeters has been empirically determined as such a point for a medical robotic system employing a stereoscopic endoscope. Thus, the first threshold value Z1 may be empirically determined and pre-programmed into the stereo imaging system as a default value. Additionally, or alternatively, it may be specified and/or altered by the operator in a conventional manner to accommodate specific user preferences.

If the determination in block 302 is NO, then the method loops back to block 301 to process a next frame in the video stream of stereo images without any disparity adjustment so that the stereo imaging system operates in a normal viewing mode.

On the other hand, if the determination in block 302 is YES, then in block 303, the method determines a target disparity adjustment. It may do this, for example, by using either an empirically determined equation which is a function of depth values or using an empirically determined look-up table which is indexed by depth values. When using the look-up table, linear or best curve fitting interpolation between look-up table values may also be performed as necessary.

To empirically determine the target disparity adjustment for a depth value, the left and right stereo images may be shifted in horizontal directions to reduce the disparity between them until a user viewing the stereo images on a stereo viewer is able to comfortably fuse his/her eyes on the stereo images being displayed at the time. As the depth value increasingly becomes smaller, more horizontal shifting may be required to achieve comfortable fusing of stereo images. Thus, target disparity adjustments as a function of depth value may be empirically determined in this manner starting with the target depth value Z1 and ending with a minimum depth value Z2 (as shown in FIG. 4). The minimum depth value Z2 is reached when the intersection of the fields of view of the two image capturing elements becomes too small for practical stereo viewing and/or too small to accommodate the horizontal pixel shifting for the target disparity adjustments. Note that in determining the target disparity adjustments in this manner, conversion between the low resolution display screens of the stereo viewer and the high resolution stereo images is implicitly accommodated. As an example of an empirically determined function for target disparity adjustments, a medical robotic system employing a stereoscopic endoscope has been found to have satisfactory results with a target disparity adjustment which is linearly scaled from 24 pixels at 5 cm (the target depth value) to 250 pixels at 2 cm (the minimum depth value). As with the first threshold depth value, the minimum depth value Z2 may be pre-programmed into the stereo imaging system or it may be user specified and/or user alterable using conventional human/computer interaction means. Once the minimum depth value is reached, the target disparity adjustment may remain fixed as the depth value further becomes smaller. Alternatively, the stereo imaging system may restrict the depth value from becoming less than the minimum depth value.
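The linear scaling described above, from 24 pixels at the 5 cm threshold depth to 250 pixels at the 2 cm minimum depth, with the adjustment held fixed below the minimum depth, can be sketched as follows. The function name and the behavior at depths at or above the threshold (no adjustment) are drawn from the text; treat this as an illustrative sketch rather than the disclosed implementation:

```python
def target_adjustment_px(depth_cm: float,
                         z1_cm: float = 5.0, z2_cm: float = 2.0,
                         adj_at_z1: float = 24.0, adj_at_z2: float = 250.0) -> float:
    """Target disparity adjustment in pixels, linearly scaled between the
    first threshold depth Z1 and the minimum depth Z2 (values from the text)."""
    if depth_cm >= z1_cm:
        return 0.0                       # normal viewing mode: no adjustment
    depth_cm = max(depth_cm, z2_cm)      # hold the adjustment fixed below Z2
    t = (z1_cm - depth_cm) / (z1_cm - z2_cm)
    return adj_at_z1 + t * (adj_at_z2 - adj_at_z1)

for z in (6.0, 3.5, 2.0, 1.0):
    print(z, target_adjustment_px(z))
```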

In block 304, the method optionally filters the target disparity adjustment to avoid an abrupt change and/or jitter in the stereo images to be displayed on the stereo viewer. As an example, a maximum pixel shift per frame may be specified, such as 1-3 pixels per frame. The use of low pass filters, moving averages, and other well known smoothing techniques may also be used.
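The per-frame rate limiting described in block 304 can be sketched as below; the 2-pixels-per-frame default falls within the 1-3 pixels per frame mentioned in the text, and the function name is hypothetical:

```python
def rate_limit(prev_px: float, target_px: float, max_step_px: float = 2.0) -> float:
    """Limit the per-frame change in the disparity adjustment to avoid an
    abrupt change or jitter in the displayed stereo images."""
    step = target_px - prev_px
    step = max(-max_step_px, min(max_step_px, step))  # clamp to +/- max step
    return prev_px + step

# Approaching a 10-pixel target adjustment at 2 pixels per frame:
adj = 0.0
for _ in range(6):
    adj = rate_limit(adj, 10.0)
print(adj)  # reaches 10.0 after five frames, then holds
```

A low-pass filter or moving average, as the text also suggests, could replace the clamp with the same interface.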

In block 305, the method modifies the stereo images. In modifying the stereo images, the method takes into account the target disparity adjustment determined in block 303, the optional filtering performed in block 304, and the respective resolutions of the stereo images and the left and right display screens of the stereo viewer. The modification may be performed using the sub-blocks 311, 312, 313.

In a first embodiment, as shown in FIGS. 6a-6d, all three sub-blocks 311-313 are used.

In sub-block 311, a shift step may be performed such as shown in FIGS. 6a-6b. In this case, all pixels are shifted horizontally in both the left and right stereo images 211, 212 so as to reduce the disparity between corresponding points by the filtered target disparity adjustment. As a result, shifted-out pixel columns 603, 604 are not used (e.g., discarded) and shifted-in pixel columns 601, 602 are filled with filler pixel values such as a specific color. A plus sign “+” is shown in the center of each stereo image to provide a reference point.

In sub-block 312, an image cropping step is performed such as shown in FIG. 6c. In this first embodiment, the shifted-in pixel columns 601, 602 are cropped out along with areas 611, 612 surrounding areas 621, 622, which will be described further with respect to the following zooming step. Note that the plus sign “+” is in the same place before and after the cropping, because the same amount is cropped off each pair of opposing ends.

In sub-block 313, an image zooming step is performed such as shown in FIG. 6d. In this example, the areas 621, 622 remaining after the cropping step of sub-block 312 are zoomed-out so that the resulting left and right stereo images 221, 222 have the same resolutions as their respective left and right display screens of the stereo viewer. With this in mind, the areas 621, 622 are determined so that when zoomed-out, proper filtered disparity adjustments result in the resulting left and right stereo images 221, 222 which are to be displayed in the stereo viewer.
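The three sub-blocks can be sketched for one eye's image as below. This is an illustrative sketch only: it uses zero-valued filler pixels, a symmetric crop so the center reference point stays put, and nearest-neighbor resampling for brevity; the function name, crop amount, and image sizes are hypothetical:

```python
import numpy as np

def shift_crop_zoom(img: np.ndarray, shift_px: int, crop_px: int,
                    out_shape: tuple[int, int]) -> np.ndarray:
    """One eye's image through sub-blocks 311 (shift), 312 (crop),
    and 313 (zoom to the display resolution)."""
    h, w = img.shape[:2]
    # 311: horizontal shift; shifted-out columns are discarded and
    # shifted-in columns are filled with a filler value (zero here).
    shifted = np.zeros_like(img)
    if shift_px >= 0:
        shifted[:, shift_px:] = img[:, :w - shift_px]
    else:
        shifted[:, :w + shift_px] = img[:, -shift_px:]
    # 312: crop the same amount off each pair of opposing edges so the
    # center reference point ("+") stays in place.
    cropped = shifted[crop_px:h - crop_px, crop_px:w - crop_px]
    # 313: zoom to the display resolution (nearest-neighbor resampling).
    oh, ow = out_shape
    rows = np.arange(oh) * cropped.shape[0] // oh
    cols = np.arange(ow) * cropped.shape[1] // ow
    return cropped[np.ix_(rows, cols)]

# Tiny 4x4 example: shift right by 1, crop 1 pixel per edge, zoom back to 4x4.
left = np.arange(16, dtype=float).reshape(4, 4)
out = shift_crop_zoom(left, shift_px=1, crop_px=1, out_shape=(4, 4))
print(out.shape)
```

The same routine would be applied to the right image with the opposite shift sign so that the disparity between corresponding points is reduced by the filtered target adjustment.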

In a second embodiment, as shown in FIGS. 7a-7c, only sub-blocks 312 and 313 are used since sub-block 311 is eliminated with a modification to sub-block 312.

In modified sub-block 312, the stereo images 211, 212 are cropped so as to directly result in the areas 621, 622, which are zoomed out as previously described with respect to sub-block 313.

Finally, in block 306, the method displays the modified stereo images on the left and right display screens of the stereo viewer. The method then loops back to block 301 to process the next frame of stereo images.

Although the various aspects of the present invention have been described with respect to a preferred embodiment, it will be understood that the invention is entitled to full protection within the full scope of the appended claims.

Claims

1. (canceled)

2. A system comprising:

a stereoscopic endoscope comprising a stereoscopic camera positionable within a patient to capture stereo images of a worksite within the patient and output captured stereo worksite image pairs, each of the captured stereo worksite image pairs having a native disparity;
a depth sensing system configured to determine a depth value indicating a distance from image capturing elements of the stereoscopic camera to a region of the worksite; and
an image processor configured to: in response to the depth value being equal to or exceeding a threshold value, generate stereo image pairs for display based on corresponding ones of the captured stereo worksite image pairs, wherein each of the stereo image pairs for display has a disparity equal to the native disparity of the corresponding captured stereo worksite image pair, and cause the stereo image pairs for display to be displayed on a stereo viewer; and in response to the depth value being less than the threshold value, generate modified stereo image pairs for display based on corresponding ones of the captured stereo worksite image pairs, wherein each of the modified stereo image pairs for display has an adjusted disparity that differs from the native disparity of the corresponding captured stereo worksite image pair; and cause the modified stereo image pairs for display to be displayed on the stereo viewer.

3. The system according to claim 2,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs by one or both of: horizontal shifting of pixels in the captured stereo worksite image pairs; and cropping outer edges of the captured stereo worksite image pairs or of shifted stereo worksite image pairs generated from horizontally shifting pixels of the captured stereo worksite image pairs.

4. The system according to claim 3,

wherein the image processor is configured to, in generating the modified stereo image pairs for display: crop outer edges in each of the captured stereo worksite image pairs to generate cropped stereo worksite image pairs, and scale the cropped stereo worksite image pairs so as to conform to a resolution of the stereo viewer to generate the modified stereo image pairs for display.

5. The system according to claim 3,

wherein the image processor is configured to, in generating the modified stereo image pairs for display: horizontally shift pixels in the captured stereo worksite image pairs to generate shifted stereo worksite image pairs; crop outer edges in each of the shifted stereo worksite image pairs, except for edges at which pixels were horizontally shifted out, to generate shifted-and-cropped stereo worksite image pairs; and scale the shifted-and-cropped stereo worksite image pairs so as to conform to a resolution of the stereo viewer.

6. The system according to claim 3,

wherein the image processor is configured to, in horizontally shifting the pixels of a given one of the captured stereo worksite image pairs, shift out a vertical band of pixels from each image of the given captured stereo worksite image pair and shift in a vertical band of filler pixel columns in each image of the given captured stereo worksite image pair.

7. The system according to claim 3,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs such that the adjusted disparities are smaller than the native disparities by a defined disparity reduction amount that depends on the depth value, the defined disparity reduction amount increasing as the depth value decreases from the threshold value to a minimum depth value.

8. The system according to claim 2,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs such that the adjusted disparities are smaller than the native disparities by a defined disparity reduction amount that depends on the depth value, the defined disparity reduction amount increasing as the depth value decreases from the threshold value to a minimum depth value.

9. The system of claim 8,

wherein the threshold value is 5 centimeters and the minimum depth value is 2 centimeters.

10. The system of claim 2,

wherein the depth sensing system updates the depth value in response to the distance between the image capturing elements of the stereoscopic camera and the region of the worksite changing.

11. The system of claim 2,

wherein the depth sensing system determines the depth value of the region of the worksite based on a region of interest in the captured stereo worksite image pairs, the region of interest corresponding to the region of the worksite.

12. The system of claim 11,

wherein the depth sensing system determines the depth value to be equal to an average depth for the region of interest.

13. The system of claim 11,

wherein the region of interest is one of a region centered in the captured stereo worksite image pairs and a user selected region in the captured stereo worksite image pairs.

14. The system of claim 13,

wherein the user selected region corresponds to one of: an area specified by a user interacting with a telestrator; and an area centered at a gaze point of the user using the stereo viewer.

15. The system of claim 11,

wherein the depth sensing system determines the depth value based on determined depth values for one or more points or objects detected in the region of interest excluding depth values for all tools which appear in the region of interest.

16. The system of claim 2,

wherein each of the captured stereo worksite image pairs comprises a left captured image and a right captured image, and for each of the captured stereo worksite image pairs, the native disparity is a disparity between the left captured image and the right captured image; and
wherein each of the modified stereo image pairs for display comprises a left modified image for display and a right modified image for display, and for each of the modified stereo image pairs for display, the adjusted disparity is a disparity between the left modified image for display and a right modified image for display that differs from the native disparity.

17. The system of claim 16,

wherein for each of the captured stereo worksite image pairs, the left captured image and the right captured image comprise corresponding points and the native disparity corresponds to a difference in the relative positions of the corresponding points within the left captured image and the right captured image; and
wherein for each of the modified stereo image pairs for display, the left modified image for display and the right modified image for display comprise the corresponding points and the adjusted disparity corresponds to a difference in the relative positions of the corresponding points within the left modified image for display and the right modified image for display.

18. A medical system comprising:

an image processor couplable to a stereoscopic endoscope, a depth sensing system, and a stereo viewer,
wherein the image processor is configured to: receive captured stereo worksite image pairs comprising images of a worksite from a stereoscopic camera of the stereoscopic endoscope, each of the captured stereo worksite image pairs having a native disparity; receive a depth value from the depth sensing system, the depth value indicative of a distance from image capturing elements of the stereoscopic camera to a region of the worksite; in response to the depth value being equal to or exceeding a threshold value, generate stereo image pairs for display based on corresponding ones of the captured stereo worksite image pairs, wherein each of the stereo image pairs for display has a disparity equal to the native disparity of the corresponding captured stereo worksite image pair, and cause the stereo image pairs for display to be displayed on a stereo viewer; and in response to the depth value being less than the threshold value, generate modified stereo image pairs for display based on corresponding ones of the captured stereo worksite image pairs, wherein each of the modified stereo image pairs for display has an adjusted disparity that differs from the native disparity of the corresponding captured stereo worksite image pair; and cause the modified stereo image pairs for display to be displayed on the stereo viewer.

19. The medical system according to claim 18,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs such that the adjusted disparities are smaller than the native disparities by a defined disparity reduction amount that depends on the depth value, the defined disparity reduction amount increasing as the depth value decreases from the threshold value to a minimum depth value.

20. The medical system according to claim 18,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs by one or both of: horizontal shifting of pixels in the captured stereo worksite image pairs; and cropping outer edges of the captured stereo worksite image pairs or of shifted stereo image pairs generated from horizontally shifting pixels of the captured stereo worksite image pairs.

21. The medical system according to claim 18,

wherein the image processor is configured to, in generating the modified stereo image pairs for display, modify the captured stereo worksite image pairs by one of: horizontally shifting pixels in the captured stereo worksite image pairs to generate shifted stereo image pairs and scaling the shifted stereo image pairs to conform to a resolution of the stereo viewer; cropping outer edges of the captured stereo worksite image pairs to generate cropped stereo image pairs and scaling the cropped stereo image pairs to conform to the resolution of the stereo viewer; and horizontally shifting the captured stereo worksite image pairs to generate shifted stereo image pairs, cropping outer edges of the shifted stereo image pairs to generate shifted and cropped stereo image pairs, and scaling the shifted and cropped stereo worksite image pairs to conform to the resolution of the stereo viewer.
Patent History
Publication number: 20230142214
Type: Application
Filed: Nov 30, 2022
Publication Date: May 11, 2023
Applicant: INTUITIVE SURGICAL OPERATIONS, INC. (Sunnyvale, CA)
Inventors: Wenyi ZHAO (Weston, FL), Catherine J. MOHR (Mountain View, CA), Simon P. DIMAIO (San Carlos, CA)
Application Number: 18/060,119
Classifications
International Classification: H04N 13/128 (20060101);