METHODS AND APPARATUS FOR MAKING ENVIRONMENTAL MEASUREMENTS AND/OR USING SUCH MEASUREMENTS IN 3D IMAGE RENDERING

Methods and apparatus for making and using environmental measurements are described. Environmental information captured using a variety of devices is processed and combined to generate an environmental model which is communicated to customer playback devices. A UV map which is used for applying, e.g., wrapping, images onto the environmental model is also provided to the playback devices. A playback device uses the environmental model and UV map to render images which are then displayed to a viewer as part of providing a 3D viewing experience. In some embodiments an updated environmental model is generated based on more recent environmental measurements, e.g., performed during the event. The updated environmental model and/or difference information for updating the existing model, optionally along with updated UV map(s), is communicated to the playback devices for use in rendering and playback of subsequently received image content. By communicating updated environmental information, improved 3D simulations are achieved.

Description
RELATED APPLICATIONS

The present application claims the benefit of U.S. Provisional Application Ser. No. 62/126,701 filed Mar. 1, 2015, U.S. Provisional Application Ser. No. 62/126,709 filed Mar. 1, 2015, and U.S. Provisional Application Ser. No. 62/127,215 filed Mar. 2, 2015, each of which is hereby expressly incorporated by reference in its entirety.

FIELD

The present invention relates to methods and apparatus for capturing and using environmental information, e.g., measurements and images, to support various applications including the generation and/or display of stereoscopic images which can be used as part of providing a 3D viewing experience.

BACKGROUND

Accurate representation of a 3D environment often requires reliable models of the environment. Such models, when available, can be used during image playback so that objects captured in images of a scene appear to the viewer to be the correct size. Environmental maps can also be used in stitching together different pieces of an image and to facilitate alignment of images captured by different cameras.

While environment maps, when available, can facilitate much more realistic stereoscopic displays than when a simple spherical model of an environment is assumed, there are numerous difficulties associated with obtaining accurate environmental information during an event which may be filmed for later stereoscopic playback. For example, while LIDAR may be used to make environmental measurements of distances relative to a camera position prior to deployment of a stereoscopic camera to capture an event, the laser(s) used for LIDAR measurements may be a distraction or unsuitable for use during an actual event while people are trying to view a concert, game or other activity. In addition, the placement of the camera rig used to capture an event may preclude a LIDAR device from being placed at the same location during the event.

Thus it should be appreciated that while LIDAR may be used to make accurate measurements of a stadium or other event location prior to an event, because of the use of laser light as well as the time associated with making LIDAR measurements of an area, LIDAR is not well suited for making measurements of an environment from a camera position during an ongoing event which is being captured by one or more cameras placed and operated at that camera position.

While LIDAR can be used to make highly accurate distance measurements, for the above discussed reasons it is normally used when a stadium or other event area does not have an ongoing event. As a result, the LIDAR distance measurements normally reflect an empty stadium or event area without people present. In addition, since the LIDAR measurements are normally made before any modifications or display setups for a particular event, the static environmental map provided by a LIDAR or other measurement system, while in many cases highly accurate with regard to the environment at the time of measurement, often does not accurately reflect the state and shape of the environment during an event such as a sports game, concert or fashion show.

In view of the above discussion it should be appreciated that there is a need for new and improved methods of making environmental measurements and, in particular, measuring the shape of an environment during an event and using the environmental information in simulating the 3D environment. While not necessary for all embodiments, it would be desirable if an environment could be accurately measured during an event with regard to a camera position from which stereoscopic or other images are captured for later playback as part of simulating the 3D environment of the event.

SUMMARY

Methods and apparatus for making and using environmental measurements are described. Environmental information captured using a variety of devices is processed and combined. In some embodiments different devices are used to capture environmental information at different times, rates and/or resolutions. At least some of the environmental information used to map the environment is captured during an event. Such information is combined, in some but not necessarily all embodiments, with environmental information that was captured prior to the event. Depending on the embodiment, a single environmental measurement technique may be used, but in many embodiments multiple environmental measurement techniques are used, with the environmental information, e.g., depth information relative to a camera position, being combined to generate a more reliable and timely environmental map than might be possible if a single source of environmental information were used to generate a depth map.

In various embodiments environmental information is obtained from one or more sources. In some embodiments, a static environmental map or model, such as one produced from LIDAR measurements before an event, is used. LIDAR is a detection system that works on the principle of radar, but uses light from a laser for distance measurement. From LIDAR measurements made from a location to be used for a camera position where a camera is placed for capturing images during the actual event, or from a model of the environment made based on another location but with information about the location of the camera position, a static map of the environment relative to a camera position is generated. The static map provides accurate distance information for the environment in many cases, assuming the environment is unoccupied or has not otherwise changed from the time the measurements used to make the static map were made. Since the static map normally corresponds to an empty environment, the distances indicated in the static depth map are often maximum distances, since objects such as persons, signs, props, etc., are often added to an environment for an event and it is rare that a structure shown in the static map is removed for an event. Thus, the static map can be, and sometimes is, used to provide maximum distance information and to provide information on the overall scale/size of the environment.

In addition to static model information, in some embodiments environmental measurements are made using information captured during an event. The capture of the environmental information during the event involves, in some embodiments, the use of one or more light field cameras which capture images from which depth information can be obtained using known techniques. In some embodiments, light field cameras which provide both images and depth maps generated from the images captured by the light field camera are used. The cameras may be, and sometimes are, mounted on or incorporated into a camera rig which also includes one or more pairs of stereoscopic cameras. Methods for generating depth information from light field cameras are used in some embodiments. For example, image data corresponding to an area or a point in the environment captured by sensor portions corresponding to different lenses of the light field camera's micro lens array can be processed to provide information on the distance to the point or area.

The light field camera has the advantage of being able to passively collect images during an event which can be used to provide distance information. A drawback of the use of a light field camera is that it normally has lower resolution than that of a regular camera due to the use of the lens array over the sensor which effectively lowers the resolution of the individual captured images.

In addition to the images of the light field camera or cameras, the images captured by other cameras including, e.g., stereoscopic camera pairs, can be processed and used to provide depth information. This is possible since the cameras of a stereoscopic pair are spaced apart by a known distance, and this information along with the captured images can be, and in some embodiments is, used to determine the distance from the camera to a point in the environment captured by the cameras in the stereoscopic camera pair. The depth information, in terms of the number of environmental points or locations for which depth can be estimated, may be as high or almost as high as the number of pixels of the image captured by the individual cameras of the stereoscopic pairs, since these cameras do not use a micro lens array over the camera's sensor.
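
For purposes of illustration only, the triangulation relationship underlying such stereo based depth estimation can be sketched in a few lines of Python. This is a minimal sketch, assuming rectified images, a known baseline and focal length, and a disparity map produced by any standard matching method; the function and parameter names are hypothetical and do not correspond to any particular implementation described herein.

    import numpy as np

    def disparity_to_depth(disparity_px, baseline_m, focal_length_px):
        """Convert a disparity map (in pixels) from a rectified stereo pair
        into a depth map (in meters) using depth = focal_length * baseline / disparity."""
        disparity = np.asarray(disparity_px, dtype=np.float64)
        depth = np.full(disparity.shape, np.inf)  # zero disparity corresponds to a very distant point
        valid = disparity > 0
        depth[valid] = focal_length_px * baseline_m / disparity[valid]
        return depth

    # Example: cameras spaced 117 mm apart with a focal length of 1000 pixels.
    # depth_map = disparity_to_depth(disparity_map, baseline_m=0.117, focal_length_px=1000.0)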

While the output of the stereoscopic cameras can be, and in some embodiments is, processed to generate depth information, such depth information may in many cases be less reliable than the depth information obtained from the output of the light field cameras.

In some embodiments the static model of the environment provides maximum distance information, while the depth information from the light field cameras provides more up to date depth information which normally indicates depths equal to or less than the depths indicated by the static model but which is more timely and which may vary during an event as environmental conditions change. Similarly, the depth information from the images captured by the stereo camera pair or pairs tends to be timely and available from images captured during an event.

In various embodiments the depth information from the different sources, e.g., the static model which may be based on LIDAR measurements prior to an event, depth information from the one or more light field cameras and depth information generated from the stereoscopic images, is combined, e.g., reconciled. The reconciliation process may involve a variety of techniques or information weighting operations taking into consideration the advantages of different depth information sources and the availability of such information. For example, in one exemplary reconciliation process, LIDAR based depth information obtained from measurements of the environment prior to an event is used to determine maximum depths, e.g., distances, from a camera position and is used in the absence of additional depth information to model the environment. When depth information is available from a light field camera or array of light field cameras, that depth information is used to refine the environmental depth map so that it reflects changes in the environment during an ongoing event. In some embodiments reconciling depth map information obtained from images captured by a light field camera includes refining the LIDAR based depth map to include shorter depths reflecting the presence of objects in the environment during an event. In some cases reconciling an environmental depth map that is based on light field depth measurements alone, or in combination with information from a static or LIDAR depth map, includes using depth information obtained from stereoscopic images to further clarify the change in depths between points where the depth information is known from the output of the light field camera. In this way, the greater number of points of information available from the light field and/or stereoscopic images can be used to refine the depth map based on the output of the light field camera or camera array.
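
One possible reconciliation strategy can be illustrated with the following sketch, which treats the static model as an upper bound on depth and blends the in-event light field and stereo measurements where they are available. This is a simplified illustration, assuming all three sources have been resampled onto a common grid of environmental points relative to the camera position; the weights and fallback rules shown are assumptions made for the example, not a required implementation.

    import numpy as np

    def reconcile_depths(static_max, lightfield, stereo, lf_weight=0.7, stereo_weight=0.3):
        """Combine per-point depth estimates from a static (e.g., LIDAR based) model,
        light field camera measurements, and stereo image processing.

        static_max : depths from the pre-event static model (treated as maxima)
        lightfield : depths measured during the event (NaN where unavailable)
        stereo     : depths derived from stereoscopic images (NaN where unavailable)
        """
        static_max = np.asarray(static_max, dtype=float)
        lf = np.asarray(lightfield, dtype=float)
        st = np.asarray(stereo, dtype=float)

        # Start from the static model, which bounds how far away surfaces can be.
        result = static_max.copy()

        # Weighted blend of the in-event measurements where both are available,
        # favoring the (typically more reliable) light field depths.
        both = ~np.isnan(lf) & ~np.isnan(st)
        result[both] = lf_weight * lf[both] + stereo_weight * st[both]

        # Fall back to whichever single in-event source is present.
        only_lf = ~np.isnan(lf) & np.isnan(st)
        only_st = np.isnan(lf) & ~np.isnan(st)
        result[only_lf] = lf[only_lf]
        result[only_st] = st[only_st]

        # Never exceed the static maximum distances.
        return np.minimum(result, static_max)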

Based on the depth information and/or depth map, a 3D model of the environment, sometimes referred to as the environmental mesh model, is generated in some embodiments. The 3D environmental model may be in the form of a grid map of the environment onto which images can be applied. In some embodiments the environmental model is generated based on environmental measurements, e.g., depth measurements, of the environment of interest performed using a light field camera, e.g., with the images captured by the light field cameras being used to obtain depth information. In some embodiments an environmental model is generated based on measurements of at least a portion of the environment made using a light field camera at a first time, e.g., prior to and/or at the start of an event. The environmental model is communicated to one or more customer devices, e.g., rendering and playback devices, for use in rendering and playback of image content. In some embodiments a UV map which is used to apply, e.g., wrap, images onto the 3D environmental model is also provided to the customer devices.

The application of images to such a map is sometimes called wrapping since the application has the effect of applying the image, e.g., a 2D image, as if it were being wrapped onto the 3D environmental model. The customer playback devices use the environmental model and UV map to render image content which is then displayed to a viewer as part of providing the viewer a 3D viewing experience.
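
As a simplified illustration of how a UV map ties mesh vertices to locations in a received 2D frame, the following sketch performs a per-vertex texture lookup. It assumes UV coordinates normalized to the range 0 to 1 and is not intended to represent an actual playback device implementation, which would normally hand the mesh, UV coordinates and decoded frame to a GPU texture mapping pipeline that interpolates across each mesh segment.

    import numpy as np

    def sample_texture(frame, uv_coords):
        """Look up the texture color for each mesh vertex.

        frame     : H x W x 3 decoded image frame (the texture)
        uv_coords : N x 2 array of (u, v) values in [0, 1], one per mesh vertex,
                    taken from the UV map supplied with the environmental model
        """
        uv = np.asarray(uv_coords, dtype=float)
        h, w = frame.shape[:2]
        u = np.clip(uv[:, 0], 0.0, 1.0)
        v = np.clip(uv[:, 1], 0.0, 1.0)
        cols = np.minimum((u * (w - 1)).round().astype(int), w - 1)
        rows = np.minimum(((1.0 - v) * (h - 1)).round().astype(int), h - 1)  # v = 0 at the bottom of the frame
        return frame[rows, cols]  # N x 3 colors, one per vertex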

Since the environment is dynamic and changes may occur while the event is ongoing as discussed above, in some embodiments updated environmental information is generated to accurately model the environmental changes during the event and provided to the customer devices. In some embodiments the updated environmental information is generated based on measurements of the portion of the environment made using the light field camera at a second time, e.g., after the first time period and during the event. In some embodiments the updated model information communicates a complete updated mesh model. In some embodiments the updated mesh model information includes information indicating changes to be made to the original environmental model to generate an updated model with the updated environmental model information providing new information for portions of the 3D environment which have changed between the first and second time periods.
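
A minimal sketch of how such difference information might be applied to an existing mesh model is given below. It assumes the model is represented as a mapping from node identifiers to 3D positions, which is an illustrative assumption rather than a required data structure.

    def apply_mesh_update(vertices, updates):
        """Apply difference information to an existing environmental mesh model.

        vertices : dict mapping node id -> (x, y, z) position in the current model
        updates  : dict mapping node id -> new (x, y, z) position measured during
                   the event (only nodes that changed need to be communicated)
        Returns a new vertex dictionary representing the updated model.
        """
        updated = dict(vertices)  # unchanged nodes are kept as they are
        updated.update(updates)   # only the nodes that moved are overwritten
        return updated

    # A playback device could apply such an update between frames and continue
    # rendering newly received image content against the revised geometry.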

The updated environmental model and/or difference information for updating the existing model, optionally along with updated UV map(s), is communicated to the playback devices for use in rendering and playback of subsequently received image content. By communicating updated environmental information, improved 3D simulations are achieved.

By using the depth map generation techniques described herein, relatively accurate depth maps of a dynamic environment, such as an ongoing concert, sporting event, play, etc., in which items in the environment may move or be changed during the event, can be generated. By communicating the updated depth information, e.g., in the form of a 3D model of the environment or updates to an environmental model, improved 3D simulations can be achieved which can in turn be used for an enhanced 3D playback and/or viewing experience. The improvements in 3D environmental simulation can be achieved over systems which use static depth maps, since the environmental model onto which images captured in the environment to be simulated are applied will more accurately reflect the actual environment than in cases where the environmental model is static.

It should be appreciated that as changes to the environment in which images are captured by the stereoscopic and/or other camera occur, such changes can be readily and timely reflected in the model of the environment used by a playback device to display the captured images.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a camera rig implemented in accordance with one embodiment along with a calibration target which may be used for calibrating the camera rig.

FIG. 2 illustrates the camera rig with three pairs of cameras, e.g., 3 pairs of cameras capturing stereoscopic image data, mounted in the camera rig.

FIG. 3 illustrates an exemplary camera rig with an exemplary protective cover implemented in accordance with some exemplary embodiments.

FIG. 4 illustrates another exemplary camera rig implemented in accordance with an exemplary embodiment with various elements of the camera rig being shown for clarity in partially disassembled form.

FIG. 5 shows the camera rig of FIG. 4 with the cameras mounted thereon along with an audio capture device including ear shaped devices with microphones used for capturing stereo audio.

FIGS. 6-8 illustrate various views of an exemplary camera rig implemented in accordance with some exemplary embodiments.

FIG. 9 illustrates yet another exemplary camera rig implemented in accordance with some exemplary embodiments.

FIG. 10 illustrates a front view of an exemplary arrangement of an array of cameras that can be used in the exemplary camera rigs of the present invention such as camera rigs shown in FIGS. 1-9, in accordance with some embodiments.

FIG. 11 illustrates a front view of yet another exemplary arrangement of an array of cameras that can be used in any of the camera rigs of the present invention.

FIG. 12 illustrates an exemplary system implemented in accordance with some embodiments of the invention.

FIG. 13A is a first part of FIG. 13 which illustrates a flowchart of an exemplary method of operating an imaging system in accordance with some embodiments.

FIG. 13B is a second part of FIG. 13 which illustrates a flowchart of an exemplary method of operating the imaging system.

FIG. 14A is a first part of FIG. 14 which illustrates a flowchart of an exemplary method of generating and updating 3D mesh models and UV maps in accordance with an exemplary embodiment that is well suited for use with the method shown in FIGS. 13A and 13B.

FIG. 14B is a second part of FIG. 14 which illustrates a flowchart of an exemplary method of generating and updating 3D mesh models and UV maps in accordance with an exemplary embodiment.

FIG. 15 illustrates an exemplary light field camera which can be used in the camera rig shown in FIGS. 1-9.

FIG. 16 illustrates an exemplary processing system implemented in accordance with an exemplary embodiment.

FIG. 17 illustrates a flowchart of an exemplary method of operating an exemplary rendering and playback device in accordance with an exemplary embodiment.

FIG. 18 illustrates an exemplary rendering and playback device implemented in accordance with an exemplary embodiment.

FIG. 19 illustrates an exemplary 3D environmental mesh model that may be used in various embodiments with a plurality of nodes illustrated as the points of intersection of lines used to divide the 3D model into segments.

FIG. 20 illustrates an exemplary UV map that can be used for mapping portions of a 2D frame, providing a texture, to the mesh model of FIG. 19.

DETAILED DESCRIPTION

Various features relate to the field of panoramic stereoscopic imagery and more particularly, to an apparatus suitable for capturing high-definition, high dynamic range, high frame rate stereoscopic, 360-degree panoramic video using a minimal number of cameras in an apparatus of small size and at reasonable cost while satisfying weight and power requirements for a wide range of applications.

Stereoscopic, 360-degree panoramic video content is increasingly in demand for use in virtual reality displays. In order to produce stereoscopic, 360-degree panoramic video content with 4K or greater resolution, which is important for final image clarity, high dynamic range, which is important for recording low-light content, and high frame rates, which are important for recording detail in fast moving content (such as sports), an array of professional grade, large-sensor, cinematic cameras or other cameras of suitable quality is often needed.

In order for the camera array to be useful for capturing 360-degree, stereoscopic content for viewing in a stereoscopic virtual reality display, the camera array should acquire the content such that the results approximate what the viewer would have seen if his head were co-located with the camera. Specifically, the pairs of stereoscopic cameras should be configured such that their inter-axial separation is within an acceptable delta from the accepted human-model average of 63 mm. Additionally, the distance from the panoramic array's center point to the entrance pupil of a camera lens (aka nodal offset) should be configured such that it is within an acceptable delta from the accepted human-model average of 101 mm.
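
As a trivial illustration of checking a rig design against such targets, the following sketch compares a measured dimension to a target value within a tolerance; the tolerance values shown are hypothetical and application dependent.

    def within_delta(measured_mm, target_mm, delta_mm):
        """Return True if a measured rig dimension is within an acceptable
        delta of the human-model target value."""
        return abs(measured_mm - target_mm) <= delta_mm

    # Illustrative checks only; acceptable deltas depend on the application.
    separation_ok = within_delta(measured_mm=65.0, target_mm=63.0, delta_mm=5.0)
    nodal_offset_ok = within_delta(measured_mm=104.0, target_mm=101.0, delta_mm=10.0)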

In order for the camera array to be used to capture events and spectator sports where it should be compact and non-obtrusive, it should be constructed with a relatively small physical footprint allowing it to be deployed in a wide variety of locations and shipped in a reasonably sized container when shipping is required.

The camera array should also be designed such that the minimum imaging distance of the array is small, e.g., as small as possible, which minimizes the “dead zone” where scene elements are not captured because they fall outside of the field of view of adjacent cameras.

It would be advantageous if the camera array could be calibrated for optical alignment by positioning calibration targets where the highest optical distortion is prone to occur (where lens angles of view intersect AND the maximum distortion of the lenses occurs). To facilitate the most efficacious calibration target positioning, target locations should be, and in some embodiments are, determined formulaically from the rig design.

FIG. 1 shows an exemplary camera configuration 100 used in some embodiments. The support structure shown in FIGS. 4 and 5 is not shown in FIG. 1 to allow for better appreciation of the camera pair arrangement used in some embodiments.

While in some embodiments three camera pairs are used, such as in the FIG. 1 example, in some but not all embodiments a camera array, e.g., the camera positions of the rig, is populated with only 2 of the 6 total cameras which may be used to support simultaneous 360-degree stereoscopic video. When the camera rig or assembly is configured with fewer than all 6 cameras which can be mounted in the rig, the rig is still capable of capturing the high-value, foreground 180-degree scene elements in real-time while manually capturing static images of the lower-value, background 180-degree scene elements, e.g., by rotating the rig when the foreground images are not being captured. For example, in some embodiments when a 2-camera array is used to capture a football game with the field of play at the 0-degree position relative to the cameras, the array is manually rotated around the nodal point into the 120-degree and 240-degree positions. This allows the action on the field of a sports game or match, e.g., the foreground, to be captured in real time and the sidelines and bleachers, e.g., background areas, to be captured as stereoscopic static images to be used to generate a hybridized panorama including real time stereo video for the front portion and static images for the left and right rear portions. In this manner, the rig can be used to capture a 360 degree view with some portions of the 360 degree view being captured at different points in time, with the camera rig being rotated around its nodal axis, e.g., vertical center point, between the different points in time when the different views of the 360 degree scene area are captured. Alternatively, single cameras may be mounted in the second and third camera pair mounting positions and mono (non-stereoscopic) image content captured for those areas.
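
The assembly of such a hybridized panorama can be illustrated, at a very high level, by the following sketch, which simply associates each viewing sector with either the live front capture or a previously captured static rear image pair; the sector names and data layout are illustrative assumptions.

    def assemble_hybrid_panorama(live_front_pair, static_rear_left_pair, static_rear_right_pair):
        """Build a 360 degree frame set from one real-time front capture and two
        pre-captured static rear sectors.

        Each argument is a (left_eye_image, right_eye_image) pair for that sector.
        Returns a mapping of viewing sector to the stereo pair a playback device
        should render for that portion of the environment.
        """
        return {
            "front": live_front_pair,              # updated every frame during the event
            "rear_right": static_rear_right_pair,  # captured earlier at the 120 degree position
            "rear_left": static_rear_left_pair,    # captured earlier at the 240 degree position
        }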

In other cases where camera cost is not an issue, more than two cameras can be mounted at each position in the rig with the rig holding up to 6 cameras as in the FIG. 1 example. In this manner, cost effective camera deployment can be achieved depending on the performance to be captured and the user's need or ability to transport a large number of cameras, e.g., 6 cameras, or fewer cameras, e.g., 2 cameras. In some embodiments an environmental depth map is generated from the images captured by the cameras in the camera rig 100.

FIG. 1 depicts a six (6) camera assembly 100, also sometimes referred to as a rig or camera array, along with a calibration target 115. The camera rig 100 illustrated in FIG. 1 includes a support structure (shown in FIGS. 4 and 5) which holds the cameras in the indicated positions, 3 pairs 102, 104, 106 of stereoscopic cameras (101, 103), (105, 107), (109, 111) for a total of 6 cameras. The support structure includes a base 720, also referred to herein as a mounting plate (see element 720 shown in FIG. 4), which supports the cameras and to which plates on which the cameras are mounted can be secured. The support structure may be made of plastic, metal or a composite material such as graphite or fiberglass, and is represented by the lines forming the triangle which is also used to show the spacing and relationship between the cameras. The center point at which the dotted lines intersect represents the center nodal point around which the camera pairs 102, 104, 106 can be rotated in some but not necessarily all embodiments. The center nodal point corresponds in some embodiments to a steel rod or threaded center mount, e.g., of a tripod base, around which a camera support frame represented by the triangular lines can be rotated. The support frame may be a plastic housing in which the cameras are mounted or a tripod structure as shown in FIGS. 4 and 5.

In FIG. 1, each pair of cameras 102, 104, 106 corresponds to a different camera pair position. The first camera pair 102 corresponds to a 0 degree forward or front facing position and is normally meant to cover the foreground where the main action occurs. This position normally corresponds to the main area of interest, e.g., a field upon which a sports game is being played, a stage, or some other area where the main action/performance is likely to occur. The second camera pair 104 corresponds to a 120 degree camera position (approximately 120 degrees from the front facing position) and is used to capture a right rear viewing area. The third camera pair 106 corresponds to a 240 degree viewing position (approximately 240 degrees from the front facing position) and is used to capture a left rear viewing area. Note that the three camera positions are 120 degrees apart.

Each camera viewing position includes one camera pair in the FIG. 1 embodiment, with each camera pair including a left camera and a right camera which are used to capture images. The left camera captures what are sometimes referred to as left eye images and the right camera captures what are sometimes referred to as right eye images. The images may be part of a video sequence or still images captured at one or more times. Normally at least the front camera position corresponding to camera pair 102 will be populated with high quality video cameras. The other camera positions may be populated with high quality video cameras, lower quality video cameras or a single camera used to capture still or mono images. In some embodiments the second and third camera positions are left unpopulated and the support plate on which the cameras are mounted is rotated, allowing the first camera pair 102 to capture images corresponding to all three camera positions but at different times. In some such embodiments left and right rear images are captured and stored and then video of the forward camera position is captured during an event. The captured images may be encoded and streamed in real time, e.g., while an event is still ongoing, to one or more playback devices.

The first camera pair 102 shown in FIG. 1 includes a left camera 101 and a right camera 103. The left camera 101 has a first lens assembly 120 secured to the first camera and the right camera 103 has a second lens assembly 120′ secured to the right camera 103. The lens assemblies 120, 120′ include lenses which allow for a wide angle field of view to be captured. In some embodiments each lens assembly 120, 120′ includes a fish eye lens. Thus each of the cameras 101, 103 can capture a 180 degree field of view or approximately 180 degrees. In some embodiments less than 180 degrees is captured, but there is still at least some overlap in the images captured from adjacent camera pairs in some embodiments. In the FIG. 1 embodiment a camera pair is located at each of the first (0 degree), second (120 degree), and third (240 degree) camera mounting positions, with each pair capturing at least 120 degrees or more of the environment but in many cases with each camera pair capturing 180 degrees or approximately 180 degrees of the environment.

Second and third camera pairs 104, 106 are the same or similar to the first camera pair 102 but located at 120 and 240 degree camera mounting positions with respect to the front 0 degree position. The second camera pair 104 includes a left camera 105 and left lens assembly 122 and a right camera 107 and right camera lens assembly 122′. The third camera pair 106 includes a left camera 109 and left lens assembly 124 and a right camera 111 and right camera lens assembly 124′.

In FIG. 1, D represents the inter-axial distance of the first stereoscopic pair 102 of cameras 101, 103. In the FIG. 1 example D is 117 mm, which is the same or similar to the distance between the pupils of the left and right eyes of an average human being. Dashed line 150 in FIG. 1 depicts the distance from the panoramic array's center point to the entrance pupil of the right camera lens 120′ (aka nodal offset). In one embodiment corresponding to the FIG. 1 example, the distance indicated by reference number 150 is 315 mm, but other distances are possible.

In one particular embodiment the footprint of the camera rig 100 is relatively small. Such a small size allows the camera rig to be placed in an audience, e.g., at a seating position where a fan or attendee might normally be located or positioned. Thus in some embodiments the camera rig is placed in an audience area allowing a viewer to have a sense of being a member of the audience where such an effect is desired. The footprint in some embodiments corresponds to the size of the base to which the support structure, including in some embodiments a center support rod, is mounted or on which a support tower is located. As should be appreciated, the camera rigs in some embodiments can rotate around the center point of the base which corresponds to the center point between the 3 pairs of cameras. In other embodiments the cameras are fixed and do not rotate around the center of the camera array.

The camera rig 100 is capable of capturing relatively close as well as distant objects. In one particular embodiment the minimum imaging distance of the camera array is 649 mm, but other distances are possible and this distance is in no way critical.

The distance from the center of the camera assembly to the intersection point 151 of the views of the first and third camera pairs represents an exemplary calibration distance which can be used for calibrating images captured by the first and second camera pairs. In one particular exemplary embodiment, an optimal calibration distance, where lens angles of view intersect and the maximum distortion of the lenses occurs, is 743 mm. Note that target 115 may be placed at a known distance from the camera pairs, located at or slightly beyond the area of maximum distortion. The calibration target includes a known fixed calibration pattern. The calibration target can be, and is, used for calibrating the size of images captured by cameras of the camera pairs. Such calibration is possible since the size and position of the calibration target is known relative to the cameras capturing the image of the calibration target 115.
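
The earlier statement that target locations can be determined formulaically from the rig design can be illustrated with a small plan-view geometric sketch: the point where two field-of-view edge rays intersect is found by solving for the ray parameters. The fragment below is a generic 2D ray intersection utility offered only as an illustration; it does not encode the specific dimensions of the rig described above.

    import numpy as np

    def ray_intersection(p1, d1, p2, d2):
        """Intersect two 2D rays p1 + t*d1 and p2 + s*d2 (plan view of the rig).

        p1, p2 : ray origins, e.g., entrance pupil positions of two cameras
        d1, d2 : unit direction vectors along the edges of each camera's field of view
        Returns the intersection point, or None if the rays do not meet in front of the cameras.
        """
        A = np.array([[d1[0], -d2[0]],
                      [d1[1], -d2[1]]], dtype=float)
        b = np.array([p2[0] - p1[0], p2[1] - p1[1]], dtype=float)
        if abs(np.linalg.det(A)) < 1e-9:
            return None  # parallel field-of-view edges
        t, s = np.linalg.solve(A, b)
        if t < 0 or s < 0:
            return None  # the rays diverge
        return np.array(p1, dtype=float) + t * np.array(d1, dtype=float)

    # The distance from the rig center to such an intersection point is a candidate
    # calibration distance at which a target such as target 115 could be placed.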

FIG. 2 is a diagram 200 of the camera array 100 shown in FIG. 1 in greater detail. While the camera rig 100 is again shown with 6 cameras, in some embodiments the camera rig 100 is populated with only two cameras, e.g., camera pair 102 including cameras 101 and 103. As shown there is a 120 degree separation between each of the camera pair mounting positions. Consider for example if the center between each camera pair corresponds to the direction of the camera mounting position. In such a case the first camera mounting position corresponds to 0 degrees, the second camera mounting position corresponds to 120 degrees and the third camera mounting position corresponds to 240 degrees. Thus each camera mounting position is separated by 120 degrees. This can be seen if the center line extending out through the center of each camera pair 102, 104, 106 were extended and the angle between the lines measured.

In the FIG. 2 example, the pairs 102, 104, 106 of cameras can, and in some embodiments do, rotate around the center point of the camera rig allowing for different views to be captured at different times without having to alter the position of the camera rig base. That is, the cameras can be rotated around the center support of the rig and allowed to capture different scenes at different times, allowing for a 360 degree scene capture using the rig shown in FIG. 2 while it is populated with only two cameras. Such a configuration is particularly desirable from a cost perspective given the cost of stereoscopic cameras and is well suited for many applications where it may be desirable to show a background captured from the same point of view but at a different time than the time at which the front scene including the main action during a sporting event or other event may occur. Consider for example that during the event, objects which it would be preferable not to show during the main event may be placed behind the camera. In such a scenario the rear images may be, and sometimes are, captured prior to the main event and made available along with the real time captured images of the main event to provide a 360 degree set of image data.

Various features also relate to the fact that the camera support structure and camera configuration can, and in various embodiments does, maintain a nodal offset distance in a range from 75 mm to 350 mm. In one particular embodiment, a nodal offset distance of 315 mm is maintained. The support structure also maintains, in some embodiments, an overall area (aka footprint) in a range from 400 mm2 to 700 mm2. In one particular embodiment, an overall area (aka footprint) of 640 mm2 is maintained. The support structure also maintains a minimal imaging distance in a range from 400 mm to 700 mm. In one particular embodiment, a minimal imaging distance of 649 mm is maintained. In one particular embodiment the optimal calibration distance of the array is where lens angles of view intersect AND the maximum distortion of the lenses occurs. In one particular exemplary embodiment this distance is 743 mm.

As discussed above, in various embodiments the camera array, e.g., rig, is populated with only 2 of the 6-total cameras which would normally be required for simultaneous 360-degree stereoscopic video for the purpose of capturing the high-value, foreground 180-degree scene elements in real-time while manually capturing static images of the lower-value, background 180-degree scene elements.

FIG. 3 shows an exemplary camera rig 300 which is the same as or similar to the rig of FIGS. 1 and 2 but without a support tripod and with a plastic cover 350 placed over the camera pairs. The plastic cover 350 includes handles 310, 312, 314 which can be used to lift or rotate, e.g., when placed on a tripod, the camera rig 300. The camera rig 300 is shown with three pairs of cameras, a first camera pair 302 including cameras 301, 303 with lens assemblies 320, 320′, a second camera pair 304 including cameras with lens assemblies 322, 322′, and a third camera pair 306 including cameras with lens assemblies 324, 324′. The plastic cover 350 is secured to the mounting platform 316, which may be implemented as a flat plate with one or more slots and screw holes as shown in FIG. 4. The plastic cover 350 is secured to the base with nuts or screws 330, 331 which can be removed or tightened by hand to allow for easy removal or attachment of the cover 350 and easy access to the cameras of the camera pairs. While six cameras are included in the rig 300 shown in FIG. 3, a single camera pair may be included, and/or a single camera pair may be used together with one or more individual cameras located at the other camera mounting positions where camera pairs are not mounted.

FIG. 4 is a detailed diagram of a camera rig assembly 400 shown in partially disassembled form to allow a better view of how the components are assembled. The camera rig 400 is implemented in accordance with one exemplary embodiment and may have the camera configuration shown in FIGS. 1 and 2. In the example shown in FIG. 4 various elements of the camera rig 400 are shown in disassembled form for clarity and detail. As can be appreciated from FIG. 4, the camera rig 400 includes 3 pairs of cameras 702, 704 and 706, e.g., stereoscopic cameras, which can be mounted on a support structure 720 of the camera rig 400. The first pair of cameras 702 includes cameras 750 and 750′. The second pair of cameras 704 includes cameras 752, 752′ and the third pair of cameras 706 includes cameras 754, 754′. The lenses 701, 701′ of the cameras 750, 750′ can be seen in FIG. 7. While elements 701 and 701′ are described as lenses, in some embodiments they are lens assemblies which are secured to the cameras 750, 750′ with each lens assembly including multiple lenses positioned in a lens barrel which is secured to the cameras 750, 750′ via a friction fit or twist lock connection.

In some embodiments the three pairs (six cameras) of cameras 702, 704 and 706 are mounted on the support structure 720 via the respective camera pair mounting plates 710, 712 and 714. The support structure 720 may be in the form of a slotted mounting plate 720. Slot 738 is exemplary of some of the slots in the plate 720. The slots reduce weight but also allow for adjustment of the position of the camera mounting plates 710, 712, 714 used to support camera pairs or in some cases a single camera.

The support structure 720 includes three different mounting positions for mounting the stereoscopic camera pairs 702, 704, 706, with each mounting position corresponding to a different direction offset 120 degrees from the direction of the adjacent mounting position. In the illustrated embodiment of FIG. 7, the first pair of stereoscopic cameras 702 is mounted in a first one of the three mounting positions, e.g., a front facing position, and corresponds to a front viewing area. The second pair 704 of stereoscopic cameras is mounted in a second one of the three mounting positions, e.g., a background right position rotated 120 degrees clockwise with respect to the front position, and corresponds to a different, right rear viewing area. The third pair 706 of stereoscopic cameras is mounted in a third one of the three mounting positions, e.g., a background left position rotated 240 degrees clockwise with respect to the front position, and corresponds to a left rear viewing area. The cameras in each camera position capture at least a 120 degree viewing area but in many cases capture at least a 180 degree viewing area, resulting in overlap in the captured images which facilitates combining of the images into a 360 degree view, with some of the overlapping portions being cut off in some embodiments.

The first camera pair mounting plate 710 includes threaded screw holes 741, 741′, 741″ and 741′″ through which screws 740, 740′, 740″ and 740′″ can be inserted, respectively, through slots 738 and 738′ to secure the plate 710 to the support structure 720. The slots allow for adjustment of the position of the support plate 710.

The cameras 750, 750′ of the first camera pair are secured to individual corresponding camera mounting plates 703, 703′ using screws that pass through the bottom of the plates 703, 703′ and extend into threaded holes on the bottom of the cameras 750, 750′. Once secured to the individual mounting plates 703, 703′, the cameras 750, 750′ and mounting plates 703, 703′ can be secured to the camera pair mounting plate 710 using screws. Screws 725, 725′, 725″ (which is not fully visible) and 725′″ pass through corresponding slots 724 into threaded holes 745, 745′, 745″ and 745′″ of the camera pair mounting plate 710 to secure the camera plate 703 and camera 750 to the camera pair mounting plate 710. Similarly, screws 727, 727′ (which is not fully visible), 727″ and 727′″ pass through corresponding slots 726, 726′, 726″ and 726′″ into threaded holes 746, 746′, 746″ and 746′″ of the camera pair mounting plate 710 to secure the camera plate 703′ and camera 750′ to the camera pair mounting plate 710.

The support structure 720 has standoff rollers 732, 732′ mounted on it to reduce the risk that an object moving past the support structure will get caught on the support structure as it moves nearby. This reduces the risk of damage to the support structure 720. Furthermore, by having a hollow area inside behind the roller, an impact to the roller is less likely to be transferred to the main portion of the support structure. That is, the void behind the rollers 732, 732′ allows for some deformation of the bar portion of the support structure on which the standoff roller 732′ is mounted without damage to the main portion of the support structure including the slots used to secure the camera mounting plates.

In various embodiments the camera rig 400 includes a base 722 to which the support structure 720 is rotatably mounted, e.g., by a shaft or threaded rod extending through the center of the base into the support plate 720. Thus in various embodiments the camera assembly on the support structure 720 can be rotated 360 degrees around an axis that passes through the center of the base 722. In some embodiments the base 722 may be part of a tripod or another mounting device. The tripod includes legs formed by pairs of tubes (742, 742′), (742″, 742′″) as well as an additional leg which is not visible in FIG. 4 due to the viewing angle. The legs are secured by a hinge to the base 722 and can be folded for transport. The support structure may be made of plastic, metal or a composite material such as graphite or fiberglass, or some combination thereof. The camera pairs can be rotated around a central point, sometimes referred to as the center nodal point, in some embodiments.

The assembly 400 shown in FIG. 4 allows for the position of individual cameras to be adjusted from the top by loosening the screws securing the individual camera mounting plates to the camera pair mounting plate and then adjusting the camera position before retightening the screws. The position of a camera pair can be adjusted by loosening the screws accessible from the bottom side of the support structure 720, moving the camera pair mounting plate, and then retightening the screws. Accordingly, while the general position and direction of the camera pairs is defined by the slots in the support plate 720, the position and direction can be finely adjusted as part of the camera calibration process to achieve the desired camera alignment while the cameras are secured to the support structure 720 in the field where the camera rig is to be used.

In FIG. 5 reference numbers which are the same as those used in FIG. 4 refer to the same elements. FIG. 5 illustrates a drawing 500 showing the exemplary camera rig 400 in assembled form with additional stabilization plates 502, 502′, 504, 504′, 506 and stabilization plate joining bars 503, 505, 507, 509, 511, 513 added to the tops of the camera pairs to increase the rigidity and stability of the cameras pairs after they have been adjusted to the desired positions.

In the drawing 500 the camera pairs 702, 704, 706 can be seen mounted on the support structure 720 with at least one of the camera pair mounting plates, plate 710, being visible in the illustrated drawing. In addition to the elements of camera rig 400 already discussed above with regard to FIG. 4, in drawing 500 two simulated ears 730, 732 mounted on the camera rig can also be seen. These simulated ears 730, 732 imitate human ears and in some embodiments are made from silicone or plastic molded in the shape of a human ear. Simulated ears 730, 732 include microphones, with the two ears being separated from each other by a distance equal to, or approximately equal to, the separation between the ears of an average human. The microphones mounted in the simulated ears 730, 732 are mounted on the front facing camera pair 702 but could alternatively be mounted on the support structure, e.g., platform, 720. The simulated ears 730, 732 are positioned perpendicular to the front surface of the camera pair 702 in a similar manner as human ears are positioned perpendicular to the front surface of the eyes on a human head. Holes in the side of the simulated ears 730, 732 act as an audio/sound entry point to the simulated ears, with the simulated ears and holes operating in combination to direct audio towards a microphone mounted in each one of the simulated ears, much as a human ear directs audio sounds into the eardrum included in a human ear. The microphones in the left and right simulated ears 730, 732 provide for stereo sound capture similar to what a human at the location of the camera rig 500 would perceive via the human's left and right ears if located at the position of the camera rig. The audio input of the microphones mounted in the simulated ears is perpendicular to the face of the outer lens of front facing cameras 750, 750′ in the same manner that the sensor portion of a human ear would be somewhat perpendicular to a human being's face. The simulated ears direct sound toward the microphones just as a human ear directs sound waves towards the human eardrum.

The simulated ears 730, 732 are mounted on a support bar 510 which includes the microphones for capturing sound. The audio capture system 730, 732, 810 is supported by a movable arm 514 which can be moved via handle 515.

While FIGS. 4-5 illustrate one configuration of an exemplary camera rig with three stereoscopic camera pairs, it should be appreciated that other variations are possible. For example, in one implementation the camera rig 400 includes a single pair of stereoscopic cameras which can rotate around the center point of the camera rig allowing for different 120 degree views to be captured at different times. Thus a single camera pair can be mounted on the support structure and rotated around the center support of the rig and allowed to capture different scenes at different times allowing for a 360 degree scene capture.

In other embodiments the camera rig 400 includes a single stereoscopic camera pair 702 and one camera mounted in each of the second and third positions normally used for a pair of stereoscopic cameras. In such an embodiment a single camera is mounted to the rig in place of the second camera pair 704 and another single camera is mounted to the camera rig in place of the camera pair 706. Thus, in such an embodiment, the second camera pair 704 may be thought of as being representative of a single camera and the camera pair 706 may be thought of as being illustrative of the additional single camera.

FIGS. 6-9 illustrate various views of other exemplary camera rigs implemented in accordance with some exemplary embodiments.

FIG. 6 illustrates a drawing 800 showing one view of an exemplary camera rig 801 implemented in accordance with some exemplary embodiments. An array of cameras is included in the camera rig 801, some of which are stereoscopic cameras. In the illustrated view of the camera rig 801 in drawing 800, only a portion of the camera rig 801 is visible, while a similar arrangement of cameras exists on the other sides (also referred to as different faces) of the camera rig 801 which cannot be fully seen in the drawing 800. In some but not all embodiments, the camera rig 801 includes 13 cameras secured by a top plastic body or cover 805 and a bottom base cover 842. In some embodiments 8 of these 13 cameras are stereoscopic cameras arranged in pairs, such as the cameras 804, 806, 812 and 814, while the other cameras are light field cameras, such as cameras 802 and 810 which are visible in drawing 800 and cameras 815 and 820 which are only partially visible in drawing 800. Various other combinations of the cameras are possible. In some embodiments a camera 825 is also mounted on the top portion of the camera rig 801, e.g., the top face 840 of camera rig 801, to capture images of a top hemisphere of an environment of interest. The plastic body/cover 805 includes handles 811, 813, 817 which can be used to lift or rotate the camera rig 801.

In some embodiments the camera rig 801 includes one light field camera (e.g., camera 802) and two other cameras (e.g., cameras 804, 806) forming a stereoscopic camera pair on each longer side of the camera rig 801. In some such embodiments there are four such longer sides (also referred to as the four side faces 830, 832, 834 and 836), with each longer side having one light field camera and one stereoscopic camera pair; e.g., light field camera 802 and stereoscopic camera pair 804, 806 can be seen on one longer side 836 to the left, while another light field camera 810 and stereoscopic camera pair 812, 814 can be seen on the other longer side 830 to the right in drawing 800. While the other two side faces are not fully shown in drawing 800, they are shown in more detail in FIG. 8. In some embodiments at least some of the cameras, e.g., the stereoscopic cameras and the light field cameras, in the camera rig 801 use a fish eye lens. In various embodiments each of the cameras in the camera rig 801 is protected by a corresponding lens/camera guard to protect the camera and/or lens against a physical impact and/or damage that may be caused by an object. For example cameras 802, 804 and 806 are protected by guards 845, 847 and 849 respectively. Similarly cameras 810, 812 and 814 are protected by guards 850, 852 and 854 respectively.

In addition to the stereoscopic camera pair and the light field camera on each of the four side faces 830, 832, 834 and 836, in some embodiments the camera rig 801 further includes a camera 825 facing in the upward vertical direction, e.g., towards the sky or another top ceiling surface in the case of a closed environment, on the top face 840 of the camera rig 801. In some such embodiments the camera 825 on the top face of the camera rig 801 is a light field camera. While not shown in drawing 800, in some other embodiments the top face 840 of the camera rig 801 also includes, in addition to the camera 825, another stereoscopic camera pair for capturing left and right eye images. While in normal circumstances the top hemisphere (also referred to as the sky portion) of a 360 degree environment, e.g., a stadium, theater, concert hall, etc., captured by the camera 825 may not include action and/or may remain static, in some cases it may be important or desirable to capture the sky portion at the same rate as other environmental portions are being captured by other cameras on the rig 801.

While one exemplary camera array arrangement is shown and discussed above with regard to camera rig 801, in some other implementations, instead of just a single light field camera (e.g., such as cameras 802 and 810) arranged on top of a pair of stereoscopic cameras (e.g., cameras 804, 806 and 812, 814) on the four faces 830, 832, 834, 836 of the camera rig 801, the camera rig 801 includes an array of light field cameras arranged with each stereoscopic camera pair. For example, in some embodiments there are 3 light field cameras arranged on top of a stereoscopic camera pair on each of the longer sides of the camera rig 801. In another embodiment there are 6 light field cameras arranged on top of a stereoscopic camera pair on each of the longer sides of the camera rig 801, e.g., with two rows of 3 light field cameras arranged on top of the stereoscopic camera pair. Some such variations are discussed with regard to FIGS. 10-11. Moreover, in another variation a camera rig of the type shown in drawing 800 may also be implemented such that instead of four faces 830, 832, 834, 836 with the cameras pointed in the horizontal direction as shown in FIG. 8, there are 3 faces of the camera rig with cameras pointing in the horizontal direction.

In some embodiments the camera rig 801 may be mounted on a support structure such that it can be rotated around a vertical axis. In various embodiments the camera rig 801 may be deployed in an environment of interest, e.g., such as a stadium, auditorium, or another place where an event to be captured is taking place. In some embodiments the light field cameras of the camera rig 801 are used to capture images of the environment of interest, e.g., a 360 degree scene area of interest, and generate depth maps which can be used in simulating a 3D environment and displaying stereoscopic imaging content.

FIG. 7 illustrates a drawing 900 showing the exemplary camera rig 801 with some elements of the camera rig 801 being shown in a disassembled form for more clarity and detail. Various additional elements of the camera rig 801 which were not visible in the illustration shown in drawing 800 are shown in FIG. 7. In FIG. 7, the same reference numbers have been used to identify the elements of the camera rig 801 which were shown and identified in FIG. 6. In drawing 900 at least the two side faces 830 and 836 as well as the top face 840 and bottom face 842 of the camera rig 801 are visible.

In drawing 900 various components of the cameras on two out of the four side faces 830, 832, 834, 836 of the camera rig 801 are shown. The lens assemblies 902, 904 and 906 correspond to cameras 802, 804 and 806 respectively of side face 836 of the camera rig 801. Lens assemblies 910, 912 and 914 correspond to cameras 810, 812 and 814 respectively of side face 830, while lens assembly 925 corresponds to camera 825 on the top face of the camera rig 801. Also shown in drawing 900 are three side support plates 808, 808′, and 808′″ which support the top and bottom cover plates 805 and 842 of the camera rig 801. The side support plates 808, 808′, and 808′″ are secured to the top cover 805 and bottom base cover 842 via the corresponding pairs of screws shown in the Figure. For example the side support plate 808 is secured to the top and bottom cover plates 805, 842 via the screw pairs 951 and 956, the side support plate 808′ is secured to the top and bottom cover plates 805, 842 via the screw pairs 952 and 954, and the side support plate 808′″ is secured to the top and bottom cover plates 805, 842 via the screw pairs 950 and 958. The camera rig 801 in some embodiments includes a base support 960 secured to the bottom cover plate 842 via a plurality of screws. In some embodiments, via the base support 960, the camera rig may be mounted on a support structure such that it can be rotated around a vertical axis, e.g., an axis going through the center of the base 960. The external support structure may be a tripod or another platform.

FIG. 8 illustrates a drawing 1000 showing a top view of the exemplary camera rig 801 with more elements of the camera rig 801 being shown in greater detail. In the top view of the camera rig 801 the other two side faces 832 and 834, which were not fully visible in drawings 800-900, are more clearly shown. The lens assemblies 915, 916 and 918 correspond to camera 815 and the stereoscopic camera pair on the side face 832 of the camera rig 801. Lens assemblies 920, 922 and 924 correspond to camera 820 and the stereoscopic camera pair on the side face 834 of the camera rig 801.

As can be seen in drawing 1000, the assemblies of cameras on each of the four side faces 830, 832, 834, 836 (small arrows pointing towards the faces) and the top face 840 of the camera rig 801 face in different directions. The cameras on the side faces 830, 832, 834, 836 of the camera rig 801 are pointed in the horizontal direction (e.g., perpendicular to the corresponding face) while the camera(s) on the top face 840 are pointed in the upward vertical direction. For example, as shown in FIG. 8, the cameras on the face 836 of the camera rig 801 (cameras corresponding to lens assemblies 902, 904, 906) are facing in a first direction shown by arrow 1002. The arrow 1004 shows a second direction in which the cameras on the face 830 of the camera rig 801 (cameras corresponding to lens assemblies 910, 912, 914) are facing, arrow 1006 shows a third direction in which the cameras on the face 832 of the camera rig 801 (cameras corresponding to lens assemblies 915, 916, 918) are facing, arrow 1008 shows a fourth direction in which the cameras on the face 834 of the camera rig 801 (cameras corresponding to lens assemblies 920, 922, 924) are facing, and arrow 1010 shows a fifth (vertical) direction in which the camera on the top face 840 of the camera rig 801 (camera 825 corresponding to lens assembly 925) is facing. In various embodiments the first, second, third and fourth directions are generally horizontal directions while the fifth direction is a vertical direction. In some embodiments the cameras on the different side faces 830, 832, 834 and 836 are uniformly spaced. In some embodiments the angle between the first, second, third and fourth directions is the same. In some embodiments the first, second, third and fourth directions are different and 90 degrees apart. In some other embodiments the camera rig is implemented such that instead of four side faces the camera rig has 3 side faces with the same or similar camera assemblies as shown in drawings 800-1000. In such embodiments the cameras on the side faces of the camera rig 801 point in three different directions, e.g., a first, second and third direction, with the first, second and third directions being 120 degrees apart.
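
The relationship between the number of side faces and the angular spacing of the pointing directions can be illustrated with a short sketch that simply generates evenly spaced horizontal unit vectors plus an upward direction; it is provided as an illustration only and does not model any particular rig.

    import math

    def face_directions(num_side_faces):
        """Unit vectors for the pointing directions of a rig's camera faces:
        num_side_faces evenly spaced horizontal directions plus one vertical direction."""
        step = 360.0 / num_side_faces  # 90 degrees for 4 faces, 120 degrees for 3 faces
        horizontal = []
        for i in range(num_side_faces):
            angle = math.radians(i * step)
            horizontal.append((math.cos(angle), math.sin(angle), 0.0))
        up = (0.0, 0.0, 1.0)
        return horizontal, up

    # face_directions(4) -> four horizontal directions 90 degrees apart, as in rig 801
    # face_directions(3) -> three horizontal directions 120 degrees apart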

FIG. 9 illustrates a drawing 1100 showing a view of yet another exemplary camera rig 1101 implemented in accordance with some exemplary embodiments. The exemplary camera rig 1101 is similar to the camera rig 801 in many aspects and includes the same or similar configuration of cameras as discussed with regard to camera rig 801 above. The camera rig 1101 includes four side faces 1130, 1132, 1134, 1136 and a top face 1140 similar to camera rig 801. Each of the four side faces 1130, 1132, 1134, 1136 of the camera rig 1101 includes an array of cameras including a light field camera and a stereoscopic camera pair, while the top face 1140 of the camera rig 1101 includes at least one camera device 1125, similar to what has been shown and discussed with regard to camera rig 801. However, the camera rig 1101 further includes, in addition to the camera arrays on each of the five faces 1130, 1132, 1134, 1136 and 1140, a sixth bottom face 1142 including at least one camera 1126 facing vertically downward, e.g., towards the ground. In some such embodiments the bottom face camera 1126 facing vertically downwards and the top face camera 1125 facing vertically upwards are light field cameras. In some embodiments each of the cameras 1125 and 1126 is part of a corresponding stereoscopic camera pair on the top and bottom faces 1140, 1142 of the camera rig 1101.

While the stereoscopic cameras of the camera rigs 801 and 1101 are used to capture stereoscopic imaging content, e.g., during an event, the use of light field cameras allows for scanning the scene area of interest and generating depth maps of various portions of the scene area captured by the light field cameras (e.g., from the captured images corresponding to these portions of the scene of interest). In some embodiments the depth maps of various portions of the scene area may be combined to generate a composite depth map of the scene area. Such depth maps and/or the composite depth map may be, and in some embodiments are, provided to a playback device for use in displaying stereoscopic imaging content and simulating a 3D environment which can be experienced by the viewers.

FIG. 10 illustrates a front view of an exemplary arrangement 1200 of an array of cameras that can be used in an exemplary camera rig implemented in accordance with the invention such as camera rig 300, camera rig 400 and/or camera rigs 801 and 1101 in accordance with some embodiments. In comparison to the arrangement shown in drawing 800 with a single light field camera arranged on top of a pair of stereoscopic cameras on each of the faces of the camera rig 801, the exemplary arrangement 1200 uses an array of light field cameras 1202, 1204 and 1206 arranged with a stereoscopic camera pair 1208, 1210. The exemplary arrangement 1200 may be, and in some embodiments is, used in a camera rig (such as camera rig 801) implemented in accordance with the invention. In such embodiments each face of the camera rig uses the exemplary arrangement 1200 with three light field cameras (e.g., 1202, 1204 and 1206) arranged with a single pair of stereoscopic cameras (e.g., 1208, 1210). It should be appreciated that many variations in arrangement are possible and are within the scope of the invention.

FIG. 11 illustrates a front view of yet another exemplary arrangement 1300 of an array of cameras that can be used in an exemplary camera rig such as camera rig 801 or any of the other camera rigs discussed earlier, in accordance with some embodiments. In comparison to the arrangement shown in drawing 800 with a single light field camera arranged on top of a pair of stereoscopic cameras, the exemplary arrangement 1300 uses an array of six light field cameras 1302, 1304, 1306, 1308, 1310 and 1312 arranged with a stereoscopic camera pair 1320, 1322. The light field cameras are stacked in two rows arranged one on top of the other, with each row including a group of three light field cameras as shown. The exemplary arrangement 1300 may be, and in some embodiments is, used in a camera rig (such as camera rig 801) implemented in accordance with the invention with each face of the camera rig using the arrangement 1300.

While the stereoscopic cameras of the camera rigs discussed above are used to capture stereoscopic imaging content, e.g., during an event, the use of light field cameras allows for scanning the scene area of interest and generating depth maps of various portions of the scene area captured by the light field cameras (from the captured images corresponding to these portions of the scene of interest). In some embodiments the depth maps of various portions of the scene area may be combined to generate a composite depth map of the scene area. Such depth maps and/or the composite depth map may be, and in some embodiments are, provided to a playback device for use in displaying stereoscopic imaging content and simulating a 3D environment which can be experienced by the viewers.

The use of light field cameras in combination with the stereoscopic cameras allows for environmental measurements and generation of the environmental depth maps in real time, e.g., during an event being shot, thus obviating the need for environmental measurements to be performed offline ahead of time prior to the start of an event, e.g., a football game.

While the depth map generated from each image corresponds to a portion of the environment to be mapped, in some embodiments the depth maps generated from individual images are processed, e.g., stitched together, to form a composite map of the complete environment scanned using the light field cameras. Thus by using the light field cameras a relatively complete environmental map can be, and in some embodiments is, generated.
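
The stitching described above can be summarized with a small sketch. The following Python fragment is purely illustrative and not taken from the patent: it assumes each light field camera's depth map has already been resampled onto azimuth/elevation bins relative to the camera rig, and the function name stitch_depth_maps and the bin layout are assumptions made for the example.

```python
# Illustrative sketch (not from the patent): combine per-camera depth maps,
# each covering an azimuth sector, into one composite panoramic depth map.
import numpy as np

def stitch_depth_maps(portions, az_bins=360, el_bins=90):
    """portions: list of (start_az_deg, depth_array) where depth_array has
    shape (el_bins, sector_width_deg) giving distance in meters per bin."""
    composite = np.full((el_bins, az_bins), np.nan)
    counts = np.zeros((el_bins, az_bins))
    for start_az, depth in portions:
        width = depth.shape[1]
        cols = np.arange(start_az, start_az + width) % az_bins
        # Average where sectors overlap, otherwise simply copy the values in.
        existing = np.nan_to_num(composite[:, cols])
        composite[:, cols] = (existing * counts[:, cols] + depth) / (counts[:, cols] + 1)
        counts[:, cols] += 1
    return composite

# Example: two 180-degree sectors with constant depths of 20 m and 30 m.
front = (0, np.full((90, 180), 20.0))
rear = (180, np.full((90, 180), 30.0))
panorama = stitch_depth_maps([front, rear])
```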

In the case of light field cameras, an array of micro-lenses captures enough information that one can refocus images after acquisition. It is also possible to shift, after image capture, one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. In the case of a light field camera, depth cues from both defocus and correspondence are available simultaneously in a single capture. This can be useful when attempting to fill in occluded information/scene portions not captured by the stereoscopic cameras.

The depth maps generated from the light field camera outputs will be current and are likely to accurately reflect changes in a stadium or other environment of interest for a particular event, e.g., a concert or game to be captured by a stereoscopic camera. In addition, by measuring the environment from the same location as, or near, the location at which the stereoscopic cameras are mounted, the environmental map, at least in some embodiments, accurately reflects the environment as it is likely to be perceived from the perspective of the stereoscopic cameras that are used to capture the event.

In some embodiments images captured by the light field cameras can be processed and used to fill in for portions of the environment which are not captured by a stereoscopic camera pair, e.g., because the position and/or field of view of the stereoscopic camera pair may be slightly different from that of the light field camera and/or due to an obstruction of view from the stereoscopic cameras. For example, when the light field camera is facing rearward relative to the position of the stereoscopic pair it may capture a rear facing view not visible to a forward facing stereoscopic camera pair. In some embodiments output of the light field camera is provided to a playback device separately or along with image data captured by the stereoscopic camera pairs. The playback device can use all or portions of the images captured by the light field camera when a scene area not sufficiently captured by the stereoscopic camera pairs is to be displayed. In addition, a portion of an image captured by the light field camera may be used to fill in a portion of a stereoscopic image that was occluded from view from the position of the stereoscopic camera pair but which a user expects to be able to see when he or she shifts his or her head to the left or right relative to the default viewing position corresponding to the location of the stereoscopic camera pair. For example, if a user leans to the left or right in an attempt to peer around a column obstructing his/her view, in some embodiments content from one or more images captured by the light field camera will be used to provide the image content which was not visible to the stereoscopic camera pair but which is expected to be visible to the user from the shifted head position the user achieves during playback by leaning left or right.
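
A minimal sketch of how a playback device might perform this fill-in is given below. It is not taken from the patent; the occlusion mask, the registration of the light field image to the stereoscopic view, and the threshold on head movement are all assumptions made for illustration.

```python
# Illustrative sketch (not from the patent): fill pixels occluded from the
# stereoscopic cameras with content from a light field camera image when the
# viewer's head shifts away from the default viewing position.
import numpy as np

def fill_occlusions(stereo_frame, light_field_frame, occlusion_mask,
                    head_offset_m, activation_threshold_m=0.02):
    """stereo_frame, light_field_frame: HxWx3 arrays registered to the same view.
    occlusion_mask: HxW boolean array, True where the stereo pair saw nothing.
    head_offset_m: lateral head displacement from the default viewing position."""
    out = stereo_frame.copy()
    if abs(head_offset_m) > activation_threshold_m:
        # Only reveal the light field content once the viewer actually leans.
        out[occlusion_mask] = light_field_frame[occlusion_mask]
    return out
```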

Various exemplary camera rigs illustrated in FIGS. 1-9 may be equipped with a variety of different cameras, e.g., normal cameras, stereoscopic camera pairs, light field cameras etc. The exemplary camera rigs are used in various embodiments to capture, e.g., using the equipped cameras, environmental information, e.g., measurements and images, to support various applications in accordance with the features of the present invention.

FIG. 12 illustrates an exemplary system 1400 implemented in accordance with some embodiments of the invention. The system 1400 supports environmental information measurement and capture, including image capture, processing and delivery, e.g., imaging content, environmental model and/or texture map delivery, to one or more customer devices, e.g., playback devices/content players, located at customer premises. The system 1400 includes an exemplary imaging apparatus 1404, a stereoscopic imaging system 1406, a processing system 1408, a communications network 1450, and a plurality of customer premises 1410, . . . , 1412. The imaging apparatus 1404 includes one or more light field cameras while the stereoscopic imaging system 1406 includes one or more stereoscopic cameras. In some embodiments the imaging apparatus 1404 and the stereoscopic imaging system 1406 are included in an exemplary camera rig 1402 which may be any of the camera rigs discussed earlier with regard to FIGS. 1-9. The camera rig 1402 may include additional imaging and/or environmental measurement devices in addition to the light field camera apparatus 1404 and the stereoscopic imaging system 1406. The camera rig 1402 captures and processes imaging content in accordance with the features of the invention. The communications network 1450 may be, e.g., a hybrid fiber-coaxial (HFC) network, satellite network, and/or the Internet.

The processing system 1408 is configured to process imaging data received from the one or more light field cameras 1404 and the one or more stereoscopic cameras included in the stereoscopic imaging system 1406, in accordance with the invention. The processing performed by the processing system 1408 includes generating a depth map of the environment of interest, generating 3D mesh models and UV maps, and communicating them to one or more playback devices in accordance with some features of the invention. The processing performed by the processing system 1408 further includes processing and encoding stereoscopic image data received from the stereoscopic imaging system 1406 and delivering it to one or more playback devices for use in rendering/playback of stereoscopic content generated from the stereoscopic cameras.

In some embodiments the processing system 1408 may include a server, with the server responding to requests for content, e.g., a depth map corresponding to the environment of interest and/or a 3D mesh model and/or imaging content. The playback devices may, and in some embodiments do, use such information to simulate a 3D environment and render 3D image content. In some but not all embodiments the imaging data, e.g., the depth map corresponding to the environment of interest and/or imaging content generated from images captured by the light field camera device of the imaging apparatus 1404, is communicated directly from the imaging apparatus 1404 to the customer playback devices over the communications network 1450.

The processing system 1408 is configured to stream, e.g., transmit, imaging data and/or information to one or more customer devices, e.g., over the communications network 1450. Via the network 1450, the processing system 1408 can send and/or exchange information with the devices located at the customer premises 1410, 1412 as represented in the figure by the link 1409 traversing the communications network 1450. The imaging data and/or information may be encoded prior to delivery to one or more playback devices.

Each customer premise 1410, 1412 may include a plurality of devices/players which are used to decode and playback/display the imaging content, e.g., captured by the stereoscopic cameras of the stereoscopic imaging system 1406 and/or other cameras deployed in the system 1400. The imaging content is normally processed and communicated to the devices by the processing system 1408. The customer premise 1 1410 includes a decoding apparatus/playback device 1422 coupled to a display device 1420 while customer premise N 1412 includes a decoding apparatus/playback device 1426 coupled to a display device 1424. In some embodiments the display devices 1420, 1424 are head mounted stereoscopic display devices. In some embodiments the playback devices 1422, 1426 receive and use the depth map of the environment of interest and/or the 3D mesh model and UV map received from the processing system 1408 in displaying stereoscopic imaging content generated from stereoscopic content captured by the stereoscopic cameras.

In various embodiments playback devices 1422, 1426 present the imaging content on the corresponding display devices 1420, 1424. The playback devices 1422, 1426 may be devices which are capable of decoding stereoscopic imaging content captured by the stereoscopic cameras, generating imaging content using the decoded content, and rendering the imaging content, e.g., 3D image content, on the display devices 1420, 1424. In various embodiments the playback devices 1422, 1426 receive the image data and depth maps and/or 3D mesh models from the processing system 1408 and use them to display 3D image content.

FIG. 13, which comprises a combination of FIGS. 13A and 13B, illustrates a flowchart 1500 of an exemplary method of operating an imaging system in accordance with some embodiments. The method of flowchart 1500 is implemented in some embodiments using the imaging system including image capturing devices and a processing system. The image capturing devices, e.g., light field cameras and/or stereoscopic cameras, in the system may be included in and/or mounted on the various camera rigs shown in the drawings and discussed in detail above.

The method starts in step 1502, e.g., with the imaging system being powered on and initialized. The method proceeds from start step 1502 to steps 1504, 1506, 1508 which may be performed in parallel by different elements of the imaging system, e.g., one or more cameras and a processing system.

In step 1504 the processing system acquires a static environmental depth map corresponding to an environment of interest, e.g., by downloading it to the processing system and/or loading it onto the processing system from a storage medium including the environmental depth map. The environment of interest may be, e.g., a stadium, an auditorium, a field, etc. where an event of interest takes place. In various embodiments the event is captured, e.g., recorded, by one or more camera devices including stereoscopic cameras and light field cameras. The static environmental depth map includes environmental measurements of the environment of interest that have been previously made, e.g., prior to the event, and thus are called static. Static environmental depth maps for various well known environments of interest, e.g., known stadiums, auditoriums, etc., where events occur are readily available; however, such environmental depth maps do not take into consideration dynamic changes to the environment that may occur during an event and/or other changes that may have occurred since the time when the environmental measurements were made. The static depth map of the environment of interest may be generated using various measurement techniques, e.g., using LIDAR and/or other methods. Operation proceeds from step 1504 to step 1510. While in various embodiments the processing system acquires the static depth map when available, in cases where the static depth map is not available operation still proceeds to step 1510.

In step 1510 it is checked whether the static depth map is available, e.g., to the processing system. If the static depth map is available the operation proceeds from step 1510 to step 1512; otherwise the operation proceeds to step 1518. In step 1512 the processing system sets the current depth map (e.g., the base environmental depth map to be used) to be the static depth map. In some embodiments, when the system is initialized and depth maps from other sources are not available, the processing system initially sets the current depth map to be the static depth map. Operation proceeds from step 1512 to step 1518.

Referring to steps along the path corresponding to step 1506. In step 1506 stereoscopic image pairs of portions of the environment of interest, e.g., left and right eye images, are captured using one or more stereoscopic camera pair(s). In some embodiments the stereoscopic camera pair(s) capturing the images are mounted on the camera rigs implemented in accordance with various embodiments discussed above. Operation proceeds from step 1506 to step 1514. In step 1514 the captured stereoscopic image pairs are received at the processing system. Operation proceeds from step 1514 to step 1516. In step 1516 an environmental depth map (e.g., composite depth map of the environment of interest) is generated from the one or more stereoscopic image pairs. Operation proceeds from step 1516 to step 1518.

Returning to step 1518. In step 1518 the processing system determines if the environmental depth map generated from the one or more stereoscopic image pairs is available (for example, in some cases the stereoscopic camera pair(s) may not have started capturing stereoscopic images and/or the environmental depth map may not yet have been generated, in which case the environmental depth map based on the stereoscopic images is not available to the processing system). If in step 1518 it is determined that the environmental depth map generated from the one or more stereoscopic image pairs is available, the operation proceeds from step 1518 to step 1520; otherwise the operation proceeds to step 1530.

In step 1520 it is determined if a current depth map has already been set. If it is determined that the current depth map has not been set, the operation proceeds to step 1522 where the processing system sets the current depth map to be the environmental depth map generated from the one or more stereoscopic image pairs. Operation proceeds from step 1522 to step 1530. If in step 1520 it is determined that the current depth map has already been set (e.g., the static depth map may have been set as the current depth map), the operation proceeds to step 1524 where the processing system reconciles the environmental depth map generated from the one or more stereoscopic image pairs with the current depth map. After the reconciling operation completes, the reconciled environmental depth map is set as the current depth map. In various embodiments the reconciled depth map has more, and enhanced, depth information compared to either one of the two individual depth maps used for reconciliation. Operation proceeds from step 1524 to step 1530.

Referring to steps along the path corresponding to step 1508. In step 1508 images of portions of the environment of interest are captured using one or more light field cameras. In some embodiments the one or more light field cameras capturing the images are mounted on the camera rigs implemented in accordance with various embodiments discussed above. Operation proceeds from step 1508 to step 1526. In step 1526 the images captured by the light field cameras are received at the processing system, optionally along with depth maps of the portions of the environment of interest. Thus in some embodiments the one or more light field cameras generate depth maps of portions of the environment from the captured images and provide them to the processing system. In some other embodiments the captured images are provided to the processing system, which generates the depth maps of portions of the environment of interest. Operation proceeds from step 1526 to step 1528. In step 1528 an environmental depth map (e.g., composite depth map of the environment of interest) is generated from the one or more received images captured by the light field cameras and/or from the depth maps of portions of the environment of interest. Operation proceeds from step 1528 to step 1530.

Returning to step 1530. In step 1530 the processing system determines if the environmental depth map, generated from the images captured by the light field cameras or from the depth maps of one or more portions of the environment of interest, is available to the processing system. If in step 1530 it is determined that this environmental depth map is available, the operation proceeds from step 1530 to step 1532; otherwise the operation proceeds to step 1542 via connecting node B 1540.

In step 1532 it is determined if a current depth map has already been set. If it is determined that the current depth map has not been set, the operation proceeds from step 1532 to step 1534 where the processing system sets the current depth map to be the environmental depth map generated from the one or more received images captured by the light field cameras and/or from the depth maps of portions of the environment of interest. Operation proceeds from step 1534 to step 1546 via connecting node A 1538. If in step 1532 it is determined that the current depth map has already been set (e.g., the static depth map and/or the environmental depth map generated from stereoscopic images and/or a reconciled depth map may have been set as the current depth map), the operation proceeds to step 1536 where the processing system reconciles the environmental depth map generated in step 1528 from the one or more received images captured by the light field cameras with the current depth map. After the reconciling operation completes, the reconciled environmental depth map is set as the current depth map. Operation proceeds from step 1536 to step 1546 via connecting node A 1538.

If in step 1530 it is determined that the environmental depth map is not available, the operation proceeds from step 1530 to step 1542 via connecting node B 1540. In step 1542 it is determined if a current depth map has already been set. If it is determined that the current depth map has not been set, the operation proceeds from step 1542 to step 1544 where the processing system sets the current depth map to a default depth map corresponding to a sphere, since no other environmental depth map is available to the processing system. Operation proceeds from step 1544 to step 1546.

In step 1542, if it is determined that a current depth map has already been set (e.g., set to one of the generated/reconciled environmental depth maps or the static depth map or the default spherical environmental depth map), the operation proceeds from step 1542 to step 1546.

Returning to step 1546. In step 1546 the processing system outputs the current depth map. The current environmental depth map may be, and in various embodiments is, provided to one or more customer rendering and playback devices, e.g., for use in displaying 3D imaging content. The environmental depth map may be generated multiple times during an event, e.g., a game and/or other performance, as things may change dynamically during the event in ways that impact the environment of interest, and thus updating the environmental depth map to keep it current is useful if the system is to provide information and imaging content which can be used to provide a real life 3D experience to the viewers. It should be appreciated that the method discussed with regard to flowchart 1500 allows for generating an enhanced and improved environmental depth map based on depth information from multiple sources, e.g., static depth maps, depth maps generated using images captured by one or more stereoscopic camera pairs and/or depth maps generated using images captured by one or more light field cameras.
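
The source selection and reconciliation logic of flowchart 1500 can be summarized with a short sketch. The Python code below is illustrative only; in particular, reconcile() uses a simple per-bin preference for valid newer samples, which is just one assumed way the reconciling of steps 1524 and 1536 might be performed, and the bin layout and default sphere radius are assumptions for the example.

```python
# Illustrative sketch (not from the patent): choose and reconcile a current
# environmental depth map from the sources described in flowchart 1500.
import numpy as np

def reconcile(current, newer):
    """Prefer valid (non-NaN) newer depth samples; keep current values elsewhere."""
    return np.where(np.isnan(newer), current, newer)

def select_current_depth_map(static_map=None, stereo_map=None, light_field_map=None,
                             default_sphere_radius_m=50.0, shape=(90, 360)):
    current = None
    if static_map is not None:                     # steps 1510-1512
        current = static_map
    if stereo_map is not None:                     # steps 1518-1524
        current = stereo_map if current is None else reconcile(current, stereo_map)
    if light_field_map is not None:                # steps 1530-1536
        current = light_field_map if current is None else reconcile(current, light_field_map)
    if current is None:                            # steps 1542-1544
        current = np.full(shape, default_sphere_radius_m)   # default spherical model
    return current                                 # step 1546: output the current map
```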

FIGS. 14A and 14B, in combination, illustrate a method of generating and updating 3D mesh models and UV maps in accordance with an exemplary embodiment that is well suited for use with the method shown in FIGS. 13A and 13B.

FIGS. 14A and 14B, in combination, illustrate a flowchart 1550 of a method of generating and updating 3D mesh models and UV maps in accordance with an exemplary embodiment that is well suited for use with the method shown in FIGS. 13A and 13B. In accordance with one aspect of some embodiments, the generation, transmission and updating of the 3D mesh model and UV map may be triggered by detection of significant changes to environmental depth information obtained from one or more depth measurement sources, e.g., the light field camera outputs and/or stereoscopic camera pair output. In some embodiments various steps of the method of flowchart 1550 are performed by the processing system 1408 of system 1400. The method starts in step 1552 and proceeds to step 1554. In step 1554 a current environmental depth map, e.g., a first environmental depth map, is received (e.g., selected from the environmental depth maps generated by the processing system using input from one or more depth measurement sources).

Operation proceeds from step 1554 to step 1556. In step 1556 a first 3D mesh model is generated from the current environmental depth map. Operation proceeds from step 1556 to step 1558. In step 1558 a first UV map to be used for wrapping frames (e.g., frames of images) onto the first 3D mesh model is generated. Operation proceeds from step 1558 to step 1560 wherein the first 3D mesh model and the first UV map are communicated, e.g., transmitted, to a playback device.

Operation proceeds from step 1560 to step 1562. In step 1562 the processing system initializes the current 3D mesh model and UV map to the first 3D mesh model and the first UV map respectively, e.g., by setting the current 3D mesh model to be the first 3D mesh model and the current UV map to be the first UV map. Operation proceeds from step 1562 to step 1564. In step 1564 the processing system receives a current environmental depth map, e.g., a new environmental depth map.

Operation proceeds from step 1564 to step 1566 where it is determined whether the current environmental depth map reflects a significant environmental change relative to the environmental depth map used to generate the current 3D mesh model. In some embodiments, the system processing the depth information monitors the depth information to detect a significant change in the depth information, e.g., a change in depth information over a predetermined amount. In some embodiments detection of such a significant change triggers updating of the current mesh model and/or UV map. Thus if in step 1566 it is determined that a significant environmental change is detected between the current environmental depth map and the environmental depth map used to generate the current 3D mesh model, the operation proceeds to step 1568; otherwise the operation proceeds back to step 1564.

Following the determination that a significant environmental change is detected, in step 1568 the processing system generates an updated 3D mesh model from the new current environmental depth map. Operation proceeds from step 1568 to step 1570. In step 1570 an updated UV map to be used for wrapping frames onto the updated 3D mesh model is generated.

Operation proceeds from step 1570 to step 1574 via connecting node M 1572. In step 1574 3D mesh model difference information is generated. In various embodiments the 3D mesh model difference information includes information reflecting the difference between the new updated 3D mesh model and the currently used 3D mesh model, e.g., the first 3D mesh model. In some cases communicating the difference information to a playback device is more efficient than communicating the entire updated 3D mesh model. In such cases, by using the received difference information the playback device can, and in various embodiments does, update its current 3D mesh model to generate an updated mesh model. While the 3D mesh model difference information is generated in some embodiments, e.g., where it is determined that it is more convenient and/or efficient to send difference information rather than the entire updated mesh model, step 1574 is optional and not necessarily performed in all embodiments. Operation proceeds from step 1574 to step 1576. In step 1576, which is also optional, UV map difference information is generated, where the UV map difference information reflects the difference between the new updated UV map and the currently used UV map, e.g., the first UV map.

Operation proceeds from step 1576 to step 1578. In step 1578 the processing system communicates updated 3D mesh model information, e.g., the generated updated 3D mesh model or the mesh model difference information, to a playback device. Operation proceeds from step 1578 to step 1580. In step 1580 the processing system communicates updated UV map information, e.g., the generated updated UV map or the UV map difference information, to a playback device.

Operation proceeds from step 1580 to step 1582. In step 1582 the processing system sets the current 3D mesh model to be the updated 3D mesh model. Operation proceeds from step 1582 to step 1584. In step 1584 the processing system sets the current UV map to be the updated UV map. It should be appreciated that the updated mesh model and UV map are based on current depth measurements, making the new mesh model and/or UV map more accurate than the older mesh models and/or maps based on depth measurements taken at an earlier time. Operation proceeds from step 1584 back to step 1564 via connecting node N 1585 and the process continues in the manner discussed above.
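
The overall update loop of flowchart 1550 can be sketched as follows. The helper callables (depth_map_to_mesh, make_uv_map, significant_change, send) are stand-ins for processing-system internals and are assumptions made for illustration, as is the node-index-to-coordinates format used here for the mesh difference information of step 1574.

```python
# Illustrative sketch (not from the patent): the mesh/UV update loop of
# flowchart 1550, including optional generation of difference information.
import numpy as np

def mesh_difference(old_nodes, new_nodes, tol=1e-3):
    """Return {node_index: new_xyz} for nodes that moved more than tol meters.
    Assumes the node count is unchanged; otherwise a full model would be sent."""
    moved = np.linalg.norm(new_nodes - old_nodes, axis=1) > tol
    return {int(i): new_nodes[i].tolist() for i in np.flatnonzero(moved)}

def update_loop(depth_map_stream, depth_map_to_mesh, make_uv_map,
                significant_change, send):
    first_map = next(depth_map_stream)                         # step 1554
    current_mesh = depth_map_to_mesh(first_map)                 # step 1556
    current_uv = make_uv_map(current_mesh)                      # step 1558
    send("mesh", current_mesh); send("uv", current_uv)          # step 1560
    current_map = first_map                                     # step 1562
    for new_map in depth_map_stream:                            # step 1564
        if not significant_change(current_map, new_map):        # step 1566
            continue
        updated_mesh = depth_map_to_mesh(new_map)               # step 1568
        updated_uv = make_uv_map(updated_mesh)                  # step 1570
        diff = mesh_difference(current_mesh, updated_mesh)      # step 1574
        send("mesh_diff", diff); send("uv", updated_uv)          # steps 1578, 1580
        current_mesh, current_uv = updated_mesh, updated_uv     # steps 1582, 1584
        current_map = new_map
```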

FIG. 15 illustrates an exemplary light field camera 1600 implemented in accordance with one exemplary embodiment of the present invention which can be used in any of the camera rigs discussed above and shown in the preceding figures. The exemplary camera device 1600 includes a display device 1602, an input device 1604, an I/O interface 1606, a processor 1608, memory 1612, and a bus 1609 which are mounted in a housing represented by the rectangular box touched by the line leading to reference number 1600. The camera device 1600 further includes an optical chain 1610 and a network interface 1614. The various components are coupled together via the bus 1609 which allows signals and information to be communicated between the components of the camera 1600.

The display device 1602 may be, and in some embodiments is, a touch screen, used to display images, video, information regarding the configuration of the camera device, and/or status of data processing being performed on the camera device. In the case where the display device 1602 is a touch screen, the display device 1602 serves as an additional input device and/or as an alternative to the separate input device, e.g., buttons, 1604. The input device 1604 may be, and in some embodiments is, e.g., a keypad, touch screen, or similar device that may be used for inputting information, data and/or instructions.

Via the I/O interface 1606 the camera device 1600 may be coupled to external devices and exchange information and signaling with such external devices. In some embodiments, via the I/O interface 1606 the camera 1600 may, and in some embodiments does, interface with the processing system 1408. In some such embodiments the processing system 1408 can be used to configure and/or control the camera 1600.

The network interface 1614 allows the camera device 1600 to receive information from, and/or communicate information to, an external device over a communications network. In some embodiments, via the network interface 1614 the camera 1600 communicates captured images and/or generated depth maps to other devices and/or systems over a communications network, e.g., the Internet and/or another network.

The optical chain 1610 includes a micro lens array 1624 and an image sensor 1626. The camera 1600 uses the micro lens array 1624 to capture light information of a scene of interest coming from more than one direction when an image capture operation is performed by the camera 1600.

The memory 1612 includes various modules and routines, which when executed by the processor 1608 control the operation of the camera 1600 in accordance with the invention. The memory 1612 includes control routines 1620 and data/information 1622. The processor 1608, e.g., a CPU, executes the control routines and uses the data/information 1622 to control the camera 1600 to operate in accordance with the invention and implement one or more steps of the method of flowchart 1500. In some embodiments the processor 1608 includes an on-chip depth map generation circuit 1607 which generates depth maps of various portions of the environment of interest from captured images corresponding to these portions of the environment of interest which are captured during the operation of the camera 1600 in accordance with the invention. In some other embodiments the camera 1600 provides captured images 1628 to the processing system 1408 which generates depth maps using the images captured by the light field camera 1600. The depth maps of various portions of the environment of interest generated by the camera 1600 are stored in the memory 1612 as depth maps 1630 while images corresponding to one or more portions of the environment of interest are stored as captured image(s) 1628. The captured images and depth maps are stored in memory 1612 for future use, e.g., additional processing, and/or transmission to another device. In various embodiments the depth maps 1630 generated by the camera 1600 and one or more captured images 1628 of portions of the environment of interest captured by the camera 1600 are provided to the processing system 1408, e.g., via interface 1606 and/or 1614, for further processing and actions in accordance with the features of the invention. In some embodiments the depth maps and/or captured images are provided, e.g., communicated by the camera 1600, to one or more customer devices.

FIG. 16 illustrates an exemplary processing system 1700 in accordance with the features of the invention. The processing system 1700 can be used to implement one or more steps of the method of flowchart 1500. The processing system 1700 includes multi-rate encoding capability that can be used to encode and stream stereoscopic imaging content. The exemplary processing system 1700 may be used as the processing system 1408 of system 1400.

The processing system 1700 may be, and in some embodiments is, used to perform composite environmental depth map generation operation, multi-rate encoding operation, storage, and transmission and/or content output in accordance with the features of the invention. The processing system 1700 may also include the ability to decode and display processed and/or encoded image data, e.g., to an operator.

The system 1700 includes a display 1702, input device 1704, input/output (I/O) interface 1706, a processor 1708, network interface 1710 and a memory 1712. The various components of the system 1700 are coupled together via bus 1709 which allows for data to be communicated between the components of the system 1700.

The memory 1712 includes various routines and modules which when executed by the processor 1708 control the system 1700 to implement the composite environmental depth map generation, environmental depth map reconciling, encoding, storage, and streaming/transmission and/or output operations in accordance with the invention.

The display device 1702 may be, and in some embodiments is, a touch screen, used to display images, video, information regarding the configuration of the processing system 1700, and/or indicate status of the processing being performed on the processing device. In the case where the display device 1702 is a touch screen, the display device 1702 serves as an additional input device and/or as an alternative to the separate input device 1704, e.g., buttons. The input device 1704 may be, and in some embodiments is, e.g., a keypad, touch screen, or similar device that may be used for inputting information, data and/or instructions.

Via the I/O interface 1706 the processing system 1700 may be coupled to external devices and exchange information and signaling with such external devices, e.g., the camera rig 801 and/or other camera rigs shown in the figures and/or the light field camera 1600. The I/O interface 1706 includes a transmitter and a receiver. In some embodiments, via the I/O interface 1706 the processing system 1700 receives images captured by various cameras, e.g., stereoscopic camera pairs and/or light field cameras (e.g., camera 1600), which may be part of a camera rig such as camera rig 801.

The network interface 1710 allows the processing system 1700 to receive information from, and/or communicate information to, an external device over a communications network, e.g., such as the communications network 1450. The network interface 1710 includes a multiport broadcast transmitter 1740 and a receiver 1742. The multiport broadcast transmitter 1740 allows the processing system 1700 to broadcast multiple encoded stereoscopic data streams, each supporting a different bit rate, to various customer devices. In some embodiments the processing system 1700 transmits different portions of a scene, e.g., a 180 degree front portion, a left rear portion, a right rear portion, etc., to customer devices via the multiport broadcast transmitter 1740. Furthermore, in some embodiments via the multiport broadcast transmitter 1740 the processing system 1700 also broadcasts a current environmental depth map to the one or more customer devices. While the multiport broadcast transmitter 1740 is used in the network interface 1710 in some embodiments, in some other embodiments the processing system transmits, e.g., unicasts, the environmental depth map, 3D mesh model, UV map, and/or stereoscopic imaging content to individual customer devices.

The memory 1712 includes various modules and routines, which when executed by the processor 1708 control the operation of the system 1700 in accordance with the invention. The processor 1708, e.g., a CPU, executes the control routines and uses data/information stored in memory 1712 to control the system 1700 to operate in accordance with the invention and implement one or more steps of the methods of the flowcharts of FIGS. 13 and 14. The memory 1712 includes control routines 1714, image encoder(s) 1716, a composite depth map generation module 1717, a depth map availability determination module 1718, a current depth map determination module 1719, a streaming controller 1720, an image generation module 1721, a depth map reconciliation module 1722, a 3D mesh model generation and update module 1740, a UV map generation and update module 1742, received images 1723 of the environment of interest captured by one or more light field cameras, optional received depth maps of the environment of interest 1725, received stereoscopic image data 1724, encoded stereoscopic image data 1728, an acquired static depth map 1730, an environmental depth map generated from stereoscopic image pairs 1732, an environmental depth map generated from images captured by one or more light field cameras 1734, reconciled environmental depth map(s) 1736, a default depth map corresponding to a sphere 1738, generated 3D mesh model(s) 1744, generated UV map(s) 1746, a current 3D mesh model 1748, and a current UV map 1750.

In some embodiments the modules are implemented as software modules. In other embodiments the modules are implemented outside the memory 1712 in hardware, e.g., as individual circuits with each module being implemented as a circuit for performing the function to which the module corresponds. In still other embodiments the modules are implemented using a combination of software and hardware. In the embodiments where one or more modules are implemented as software modules or routines, the modules and/or routines are executed by the processor 1708 to control the system 1700 to operate in accordance with the invention and implement one or more operations discussed with regard to flowcharts 1500 and/or 1550.

The control routines 1714 include device control routines and communications routines to control the operation of the processing system 1700. The encoder(s) 1716 may, and in some embodiments do, include a plurality of encoders configured to encode received image content, stereoscopic images of a scene and/or one or more scene portions in accordance with the features of the invention. In some embodiments encoder(s) include multiple encoders with each encoder being configured to encode a stereoscopic scene and/or partitioned scene portions to support a given bit rate stream. Thus in some embodiments each scene portion can be encoded using multiple encoders to support multiple different bit rate streams for each scene. An output of the encoder(s) 1716 is the encoded stereoscopic image data 1728 stored in the memory for streaming to customer devices, e.g., playback devices. The encoded content can be streamed to one or multiple different devices via the network interface 1710.
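
A minimal sketch of the multi-rate, multi-portion encoding arrangement described above is given below. It is not from the patent; encode_portion() is a hypothetical stand-in for an actual encoder, and the data layout is assumed purely for illustration.

```python
# Illustrative sketch (not from the patent): encode each scene portion at
# several bit rates so different streams can be served to different devices.
def encode_all_portions(scene_portions, bit_rates_kbps, encode_portion):
    """scene_portions: {portion_name: frames}; encode_portion: encoder callable.
    Returns {(portion_name, bit_rate_kbps): encoded_bytes} for every combination."""
    return {(name, rate): encode_portion(frames, rate)
            for name, frames in scene_portions.items()
            for rate in bit_rates_kbps}

# Example: three scene portions, each encoded at three bit rates (nine streams).
streams = encode_all_portions(
    {"front_180": b"...", "left_rear": b"...", "right_rear": b"..."},
    [2000, 6000, 12000],
    lambda frames, rate: frames,   # stand-in encoder for the sketch
)
```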

The composite depth map generation module 1717 is configured to generate composite environmental depth maps of the environment of interest from the images captured by various cameras, e.g., stereoscopic camera pairs and one or more light field cameras. Thus the composite depth map generation module 1717 generates the environmental depth map 1732 from stereoscopic image pairs and the environmental depth map 1734 from images captured by one or more light field cameras.

The depth map availability determination module 1718 is configured to determine whether a given depth map is available at a given time, e.g., whether a static depth map is available and/or whether an environmental depth map generated from images captured by light field cameras is available and/or whether an environmental depth map generated from images captured by stereoscopic camera pairs is available, at given times.

The current depth map determination module 1719 is configured to determine if a current depth map has been set. In various embodiments the current depth map determination module 1719 is further configured to set one of the environmental depth map or a reconciled depth map as the current depth map in accordance with the features of the invention. For example when a reconciled environmental depth map is available, e.g., having been generated by reconciling environmental depth maps generated from two or more sources, the current depth map determination module 1719 sets the reconciled environmental depth map as the current depth map.

The streaming controller 1720 is configured to control streaming of encoded content for delivering the encoded image content (e.g., at least a portion of the encoded stereoscopic image data 1728) to one or more customer playback devices, e.g., over the communications network 1450. In various embodiments the streaming controller 1720 is further configured to communicate, e.g., transmit, an environmental depth map that has been set as the current depth map to one or more customer playback devices, e.g., via the network interface 1710.

The image generation module 1721 is configured to generate a first image from at least one image captured by the light field camera, e.g., received images 1723, the generated first image including a portion of the environment of interest which is not included in at least some of the stereoscopic images (e.g., stereoscopic image content 1724) captured by the stereoscopic cameras. In some embodiments the streaming controller 1720 is further configured to transmit at least a portion of the generated first image to one or more customer playback devices, e.g., via the network interface 1710.

The depth map reconciliation module 1722 is configured to perform depth map reconciling operations in accordance with the invention, e.g., by implementing the functions corresponding to steps 1524 and 1536 of flowchart 1500. The 3D mesh model generation and update module 1740 is configured to generate a 3D mesh model from a current environmental depth map (e.g., a reconciled depth map or an environmental depth map that has been set as the current environmental depth map). The module 1740 is further configured to update the 3D mesh model when significant environmental changes have been detected in a current environmental depth map compared to the environmental depth map used to generate the current 3D mesh model. In some embodiments the generated 3D mesh model(s) 1744 may include one or more 3D mesh models generated by module 1740, and the most recently updated 3D mesh model in the 3D mesh model(s) 1744 is set as the current 3D mesh model 1748. The UV map generation and update module 1742 is configured to generate a UV map to be used in wrapping frames onto the generated 3D mesh model. The module 1742 is further configured to update the UV map. The generated UV map(s) 1746 may include one or more UV maps generated by module 1742, and the most recently updated UV map in the generated UV map(s) 1746 is set as the current UV map 1750. In some embodiments the modules are configured to perform the functions corresponding to various steps discussed with regard to FIGS. 14A and 14B.

The received stereoscopic image data 1724 includes stereoscopic image pairs captured by, and received from, one or more stereoscopic cameras, e.g., such as those included in the rig 801. The encoded stereoscopic image data 1728 includes a plurality of sets of stereoscopic image data which have been encoded by the encoder(s) 1716 to support multiple different bit rate streams.

The static depth map 1730 is the acquired, e.g., downloaded, depth map of the environment of interest. The environmental depth map generated from images captured by stereoscopic camera pairs 1732 and the environmental depth map generated from images captured by one or more light field cameras 1734 are outputs of the composite environmental depth map generation module 1717. The reconciled environmental depth map(s) 1736 includes one or more environmental depth maps generated by the reconciliation module 1722 in accordance with the invention. The default depth map corresponding to a sphere 1738 is also stored in memory 1712 for use in the event when an environmental depth map is not available from other sources, e.g., when none of the static depth map 1730, environmental depth map 1732 and environmental depth map 1734 is available for use. Thus in some embodiments the reconciled environmental depth map(s) 1736 is set as the current environmental depth map and used in generating 3D mesh models.

In some embodiments generation, transmission and updating of the 3D mesh model and UV map may be triggered by detection of significant changes to environmental depth information obtained from one or more depth measurement sources, e.g., the light field camera outputs and/or stereoscopic camera pair output. See for example FIGS. 14A and 14B which in combination show a 3D model updating process. In some embodiments, the system processing the depth information monitors the depth information to detect a significant change in the depth information, e.g., a change in depth over a predetermined amount, e.g., over 20% of the originally measured distance to the perimeter of the environment, for an area corresponding to a portion of the environment over a predetermined threshold size, e.g., 5%, 10%, 20% or some other amount of the environment. In response to detecting such a change, a new model and/or UV map is generated and transmitted to the playback devices. The new map is based on current depth measurements, making the new mesh model and/or map more accurate than the old mesh model and/or map based on depth measurements taken at an earlier time. Since, in some embodiments, the depth measurements are made during an event on an ongoing basis and/or are based on environmental measurements made from images (light field and/or stereoscopic image pairs) captured during an event, 3D models can be generated in response to changes in the environment, e.g., changes representing a significant change in distance from the camera position from which images used as textures are captured to an object or edge of the environment, e.g., a wall, roof, curtain, etc., or changes in overall volume, e.g., due to a roof retracting, a wall moving, etc.
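
The example thresholds mentioned above can be expressed in a short sketch. The code is illustrative, not from the patent; it assumes both depth maps are sampled over the same bins and uses, as an example, a 20% depth change over more than 10% of the mapped area as the trigger.

```python
# Illustrative sketch (not from the patent's figures): detect a "significant"
# environmental change as a large relative depth change over a large enough
# fraction of the mapped area.
import numpy as np

def significant_change(original_depth, new_depth,
                       depth_change_fraction=0.20, area_fraction=0.10):
    """Both inputs are arrays of distances (meters) over the same set of bins."""
    relative_change = np.abs(new_depth - original_depth) / original_depth
    changed_area = np.mean(relative_change > depth_change_fraction)
    return changed_area > area_fraction

# Example: a retracting roof doubles the measured distance over 25% of the bins.
before = np.full((90, 360), 40.0)
after = before.copy()
after[:, :90] = 80.0            # 90 of 360 azimuth columns change
print(significant_change(before, after))   # True
```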

A complete new 3D model or model difference information may be, and in some embodiments is, transmitted to the playback device as updated model information. In addition to the generation and transmission of updated 3D model information, updated UV map information may be, and in some embodiments is, generated and transmitted to the playback device to be used when rendering images using the updated 3D model information. Mesh model and/or UV map updates are normally timed to coincide with scene changes and/or to align with group of pictures (GOP) boundaries in a transmitted image stream. In this way, application of the new model and/or map will normally begin in the playback device at a point where decoding of a current frame does not depend on a frame or image which was to be rendered using the older model or map, since each GOP boundary normally coincides with the sending of intra-frame coded image data. Since the environmental changes will frequently coincide with scene changes such as the closing of a curtain, moving of a wall, etc., the scene change point is a convenient point to implement the new model and in many cases will coincide with the event that triggered the generation and transmission of the updated model information and/or updated UV map.
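
A minimal sketch of deferring the model/map switch to the next GOP boundary is shown below; a fixed GOP size and simple frame indexing are assumptions made only for the example.

```python
# Illustrative sketch (not from the patent): apply a new mesh model / UV map
# at the next GOP boundary so the switch lands on intra-coded image data.
def next_model_switch_frame(change_detected_frame, gop_size=30):
    """Return the first frame index at or after the change that starts a GOP."""
    remainder = change_detected_frame % gop_size
    if remainder == 0:
        return change_detected_frame
    return change_detected_frame + (gop_size - remainder)

# A change detected at frame 95 with 30-frame GOPs is applied at frame 120.
print(next_model_switch_frame(95))   # 120
```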

FIG. 17 illustrates the steps of a method 1800 of operating a playback device in one exemplary embodiment. In some embodiments the playback and rendering system 1900 is used to implement the steps of the method of flowchart 1800. In the FIG. 17 exemplary embodiment the playback device receives information, e.g., 3D model information and a UV map, and then at a later time, e.g., while an event is on-going in the case of live streaming, receives updated model and/or UV map information reflecting changes to the 3D environment being modeled. For example, a stage change and/or intermission event may have environment changes associated with it which may be reflected in the new model information and/or UV map. The updated model information is communicated to, and received by, the playback device as difference information in some embodiments, with the playback device using the received information indicating changes from the original model in combination with the original model information, e.g., the original set of node coordinates in X, Y, Z space defining the mesh, to produce the updated mesh model, e.g., by replacing some coordinates in the set of coordinates defining the first mesh model with coordinates in the updated mesh model information to create an updated mesh model. While model difference information is received and used to create the updated mesh model in some embodiments, in other embodiments or in cases where there are changes to the majority of a previously supplied model, a complete new mesh model may be, and sometimes is, received as part of the updated mesh model information by the playback device. The mesh model update information may be based on depth measurements, e.g., environmental distance measurements based on light field camera and/or stereoscopic image data captured during the event.

In addition to receiving an updated mesh model, in many cases the playback device receives a corresponding UV map to be used to map images, e.g., frames, to the 3D space, e.g., onto a 3D mesh model defining the 3D environmental space. The frames may be, and sometimes are, generated from image data captured by one or more stereoscopic camera pairs mounted on a camera rig which also includes one or more light field cameras, e.g., Lytro cameras, used to capture depth information useful in updating a 3D map. While new or updated UV map information is often received when updated mesh model information is received, if the number of nodes in the 3D mesh model remains the same before and after an update, the UV map may not be updated at the same time as the 3D mesh model. UV map information may be transmitted and received as a complete new map or as difference information. Thus, in some embodiments UV map difference information is received and processed to generate an updated UV map. The updated UV map may be, and sometimes is, generated by applying the differences indicated in the updated UV map information to the previous UV map.

The method of flowchart 1800 begins in start step 1802 with a playback device, such as a game console and display or head mounted display assembly, being powered on and set to begin receiving, storing and processing 3D related image data, e.g., frames representing texture information produced from captured images, model information and/or UV maps to be used in rendering images. Operation proceeds from start step 1802 to step 1804 in which information communicating a first mesh model of a 3D environment, e.g., a stadium, theater, etc., generated based on measurements of at least a portion of the environment made using a light field camera at a first time, is received and stored, e.g., in memory. The model may be, and sometimes is, in the form of a set of 3D coordinates (X, Y, Z) indicating distances to nodes from an origin corresponding to a user viewing position. The node coordinates define a mesh model. Thus in some embodiments the first mesh model information includes a first set of coordinate triples, each triple indicating a coordinate in X, Y, Z space of a node in the first mesh model.

The mesh model includes segments formed by the interconnection of the node points in an indicated or predetermined manner. For example, each node in all or a portion of the mesh may be coupled to the three nearest adjacent nodes for portions of the mesh model where 3 sided segments are used. In portions of the mesh model where four sided segments are used, each node may be known to interconnect with its four nearest neighbors. In addition to node location information, the model may, and in some embodiments does, include information about how nodes in the model are to be interconnected. In some embodiments information communicating the first mesh model of the 3D environment includes information defining a complete mesh model.
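
A minimal sketch of such a mesh model, as a set of node coordinate triples plus segment connectivity, is shown below. The MeshModel class and its field names are illustrative assumptions rather than a format defined by the patent.

```python
# Illustrative sketch (not from the patent): node coordinate triples plus
# face connectivity, the kind of information step 1804 describes being received.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MeshModel:
    nodes: List[Tuple[float, float, float]]   # (X, Y, Z) per node, origin at the viewer
    faces: List[Tuple[int, ...]]              # node indices per 3- or 4-sided segment

# A single triangular segment 10 m in front of the viewer.
first_mesh = MeshModel(
    nodes=[(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 1.0, 10.0)],
    faces=[(0, 1, 2)],
)
```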

Operation proceeds from step 1804 to step 1806 in which a first map, e.g., a first UV map, indicating how a 2D image, e.g., a received frame, is to be wrapped onto the first 3D model is received. The first UV map usually includes one segment for each segment of the 3D model, with there being a one-to-one indicated or otherwise known correspondence between the first UV map segments and the first 3D model segments. The first UV map can be, and is, used as part of the image rendering process to apply, e.g., wrap, the content of 2D frames, which correspond to what is sometimes referred to as UV space, onto the segments of the first 3D model. This mapping of the received textures, in the form of frames corresponding to captured image data, to the 3D environment represented by the segments of the 3D model allows received left and right eye frames corresponding to stereoscopic image pairs to be rendered into images which are to be viewed by the user's left and right eyes, respectively.
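
A minimal sketch of the UV-map-to-mesh correspondence and the texture lookup it supports is shown below. The data layout, the sample_texture helper and the nearest-pixel lookup are assumptions made for illustration; a real renderer would interpolate per fragment, typically on a GPU.

```python
# Illustrative sketch (not from the patent): one 2D texture triangle per 3D
# mesh segment, and a nearest-pixel texture lookup for a (u, v) coordinate.
from typing import List, Tuple
import numpy as np

UVFace = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]

# One UV triangle corresponding to a single triangular 3D mesh segment.
first_uv_map: List[UVFace] = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]

def sample_texture(frame: np.ndarray, u: float, v: float) -> np.ndarray:
    """frame: HxWx3 decoded left- or right-eye frame; u, v in [0, 1]."""
    h, w = frame.shape[:2]
    x = min(int(u * (w - 1)), w - 1)
    y = min(int(v * (h - 1)), h - 1)
    return frame[y, x]

# Look up the texture pixel at the centroid of the first UV triangle.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
(u0, v0), (u1, v1), (u2, v2) = first_uv_map[0]
print(sample_texture(frame, (u0 + u1 + u2) / 3, (v0 + v1 + v2) / 3))
```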

The receipt of the first 3D model and first rendering map, e.g., a first UV map, can occur together or in any order and are shown as sequential operations in FIG. 17 for purposes of providing a simple to understand example. Operation proceeds from step 1806 to step 1808 in which image content, e.g., one or more frames, is received. The image content may be, and in some embodiments is, in the form of stereoscopic image data where pairs of left and right eye images are received, with the content for each eye sometimes being represented as a single frame of a stereoscopic frame pair. The image content received in step 1808 will normally be a sequence of frame pairs, e.g., a video sequence, corresponding to a portion of an event.

Operation proceeds from step 1808 to step 1810 in which at least one image is rendered using the first mesh model. As part of the image rendering performed in step 1810, the first UV map is used to determine how to wrap an image included in the received image content on to the first mesh model to generate an image which can be displayed and viewed by a user. Each of the left and right eye images of a stereoscopic pair will be, in some embodiments, rendered individually and may be displayed on different portions of a display so that different images are viewed by the left and right eyes allowing for images to be perceived by the user as having a 3D effect. The rendered images are normally displayed to the user after rendering, e.g., via a display device which in some embodiments is a cell phone display mounted in a helmet which can be worn on a person's head, e.g., as a head mounted display device.

While multiple images may be rendered and displayed over time as part of step 1810, at some point during the event being captured and streamed for playback a change in the environment may occur, such as a curtain being lowered, a wall of a stage being moved, or a dome on a stadium being opened or closed. Such changes may be, and in various embodiments are, detected based on environmental measurements performed during the event. In response to detecting a change in the environment, a new 3D mesh model and UV map may be generated by the system processing the captured images and/or environmental measurements.

In step 1814, updated mesh model information is received. The updated mesh model information, in some cases, includes new node points generated based on measurement of a portion of the environment. The measurements may correspond to the same portion of the environment to which the earlier measurements used for the first mesh model correspond, and/or the new measurements may include measurements of at least that portion of the environment. Such measurements may be, and sometimes are, based on environmental depth measurements relative to the camera rig position obtained using a light field camera, e.g., such as the ones illustrated in the preceding figures. Thus in some embodiments the updated mesh model information includes at least some information generated based on measurements of at least the portion of said environment made using said light field camera at a second time, e.g., a time period after the first time period.

The updated mesh model information received in step 1814 may be in the form of a complete updated mesh model or in the form of difference information indicating changes to be made to the first mesh model to form the updated mesh model. Thus in some embodiments the updated mesh model information is difference information indicating a difference between said first mesh model and an updated mesh model. In optional step 1815, which is performed when model difference information is received, the playback device generates the updated mesh model from the first mesh model and the received difference information. For example, in step 1815 nodes not included in the updated mesh model may be deleted from the set of information representing the first mesh model and replaced with new nodes indicated by the mesh model update information that was received, to thereby create the updated mesh model. Thus in some embodiments the updated mesh model information includes information indicating changes to be made to the first mesh model to generate an updated mesh model. In some embodiments the updated mesh model information provides new mesh information for portions of the 3D environment which have changed between the first and second time periods. In some embodiments the updated mesh model information includes at least one of: i) new sets of mesh coordinates for at least some nodes in the first mesh model information, the new coordinates being intended to replace coordinates of corresponding nodes in the first mesh model; or ii) a new set of coordinate triples to be used for at least a portion of the mesh model in place of a previous set of coordinate triples, the new set of coordinate triples including the same or a different number of coordinate triples than the previous set of coordinate triples to be replaced.
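
One way a playback device could apply such difference information in optional step 1815 is to replace or delete the affected node coordinates and append any newly measured nodes, as in the sketch below. The MeshDiff structure and apply_mesh_diff function are assumptions made for this illustration (reusing the MeshModel sketch shown earlier), not a format defined by the embodiments.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Node = Tuple[float, float, float]   # same node representation as the earlier MeshModel sketch

@dataclass
class MeshDiff:
    deleted_nodes: List[int] = field(default_factory=list)        # indices of first-model nodes no longer present
    replaced_nodes: Dict[int, Node] = field(default_factory=dict)  # replacement coordinates keyed by node index
    added_nodes: List[Node] = field(default_factory=list)         # newly measured nodes to append

def apply_mesh_diff(first, diff):
    """Generate the updated mesh model from the first mesh model and received difference information."""
    nodes = list(first.nodes)
    for index, coords in diff.replaced_nodes.items():
        nodes[index] = coords                        # replace coordinates of existing nodes
    deleted = set(diff.deleted_nodes)
    nodes = [n for i, n in enumerate(nodes) if i not in deleted]
    nodes.extend(diff.added_nodes)                   # append newly measured nodes
    # Segment connectivity would also need updating when nodes are added or removed;
    # that bookkeeping is omitted from this sketch.
    return MeshModel(nodes=nodes, segments=first.segments)   # MeshModel from the earlier sketch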

In addition to receiving updated mesh model information, the playback device may receive updated map information. This is shown in step 1816. The updated map information may be in the form of a complete new UV map to be used to map images to the updated mesh model or in the form of difference information which can be used in combination with the first map to generate an updated map. While an updated UV map need not be supplied with each 3D model update, UV map updates will normally occur at the same time as the model updates and will occur when a change in the number of nodes results in a different number of segments in the 3D mesh model. Updated map information need not be provided if the number of segments and nodes in the 3D model remains unchanged, but will in many cases be provided even if there is no change in the number of model segments, given that a change in the environmental shape may merit a change in how captured images are mapped to the 3D mesh model being used.

If difference information is received rather than a complete UV map, the operation proceeds from step 1816 to step 1818. In step 1818, which is used in the case where map difference information is received in step 1816, an updated map is generated by applying the map difference information included in the received updated map information to the first map. In the case where a complete updated UV map is received in step 1816 there is no need to generate the updated map from difference information since the full updated map is received.
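
Applying UV map difference information in step 1818 could follow the same pattern as the mesh model update, e.g., as in the following sketch, which reuses the hypothetical UVMap sketch from earlier; the UVMapDiff format shown is likewise an assumption made for illustration.

from dataclasses import dataclass, field
from typing import Dict, Tuple

UV = Tuple[float, float]   # same (U, V) representation as the earlier UVMap sketch

@dataclass
class UVMapDiff:
    # Replacement (U, V) pairs keyed by node index.
    replaced_uvs: Dict[int, UV] = field(default_factory=dict)

def apply_uv_map_diff(first, diff):
    """Generate the updated UV map from the first UV map and received map difference information."""
    uvs = list(first.uvs)
    for index, uv in diff.replaced_uvs.items():
        uvs[index] = uv
    return UVMap(uvs=uvs)   # UVMap from the earlier sketch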

In parallel with or after the receipt and/or generation of the updated 3D mesh model and/or updated UV map, additional image content is received in step 1820. The additional image content may, and sometimes does, correspond to, for example, a second portion of an event which follows the first event segment to which the first 3D model corresponded. Operation proceeds from step 1820 to step 1822. In step 1822 the additional image content is rendered. As part of the image rendering performed in step 1822, the updated 3D model is used to render at least some of the received additional image content, as indicated in step 1824. The updated UV map will also be used, as indicated by step 1826, when it is available. When no updated UV map has been received or generated, the image rendering in step 1822 will use the old, e.g., first, UV map as part of the rendering process. Images rendered in step 1822 are output for display.
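
The model and map selection described for step 1822 amounts to a simple fallback rule, sketched below under the same illustrative assumptions (render_eye_image is the hypothetical helper shown earlier).

def render_additional_content(frame, first_mesh, first_uv, updated_mesh=None, updated_uv=None):
    """Render additional image content with the newest model and map available.

    The updated mesh model is used when one has been received or generated from
    difference information; the first UV map is used as a fallback when no UV map
    update was supplied.
    """
    mesh = updated_mesh if updated_mesh is not None else first_mesh
    uv_map = updated_uv if updated_uv is not None else first_uv
    return render_eye_image(frame, mesh, uv_map)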

The updating of the 3D model and/or UV map may occur repeatedly during a presentation in response to environmental changes. This ongoing potential for repeated model and UV map updates is represented by arrow 1827, which returns processing to step 1814 where additional updated mesh model information may be received. With each return to step 1814, the current mesh model and UV map are treated as the first mesh model and first map for purposes of generating a new updated mesh model and/or UV map in the case where the update includes difference information.

The processing described with regard to FIG. 17 is performed under control of a playback device processor. Accordingly, in some embodiments the playback device includes a processor configured to control the playback device to implement the steps shown in FIG. 17. The transmission and receiving steps are performed via the interfaces (which include transmitters and receivers) of the playback devices.

In some embodiments the playback device includes instructions which, when executed by a processor of the playback device, control the playback device to implement the steps shown in FIG. 17. Separate processor executable code can be, and sometimes is, included for each of the steps shown in FIG. 17. In other embodiments a circuit is included in the playback device for each of the individual steps shown in FIG. 17.

FIG. 18 illustrates an exemplary playback device, e.g., system, 1900 that can be used to receive, decode and display the content streamed by one or more sub-systems of the system 1400 of FIG. 12, e.g., such as the processing system 1408/1700. The exemplary rendering and playback system 1900 may be used as any of the rendering and playback devices shown in FIG. 12. In various embodiments the playback system 1900 is used to perform the various steps illustrated in flowchart 1800 of FIG. 17.

The rendering and playback system 1900 in some embodiments includes and/or is coupled to a 3D head mounted display 1905. The system 1900 includes the ability to decode the received encoded image data and generate 3D image content for display to the customer. The playback system 1900 in some embodiments is located at a customer premise location such as a home or office, but may be located at an image capture site as well. The playback system 1900 can perform signal reception, decoding, 3D mesh model updating, rendering, display and/or other operations in accordance with the invention.

The playback system 1900 includes a display 1902, a display device interface 1903, a user input interface device 1904, input/output (I/O) interface 1906, a processor 1908, network interface 1910 and a memory 1912. The various components of the playback system 1900 are coupled together via bus 1909 which allows for data to be communicated between the components of the system 1900.

While in some embodiments display 1902 is included as an optional element, as illustrated using the dashed box, in some embodiments an external display device 1905, e.g., a head mounted stereoscopic display device, can be coupled to the playback system 1900 via the display device interface 1903. The head mounted display may be implemented using, e.g., the OCULUS RIFT™ VR (virtual reality) headset; other head mounted displays may also be used. The image content is presented on the display device of system 1900, e.g., with the left and right eyes of a user being presented with different images in the case of stereoscopic content. By displaying different images to the left and right eyes on a single screen, e.g., on different portions of the single screen, a single display can be used to display left and right eye images which will be perceived separately by the viewer's left and right eyes. While various embodiments contemplate the use of a head mounted display in system 1900, the methods and system can also be used with non-head mounted displays which can support 3D images.

The operator of the playback system 1900 may control one or more parameters and/or provide input via the user input device 1904. The input device 1904 may be, and in some embodiments is, e.g., a keypad, touch screen, or similar device that may be used for inputting information, data and/or instructions.

Via the I/O interface 1906 the playback system 1900 may be coupled to external devices and exchange information and signaling with such external devices. In some embodiments, via the I/O interface 1906 the playback system 1900 receives images captured by various cameras, e.g., stereoscopic camera pairs and/or light field cameras, and receives 3D mesh models and UV maps.

The memory 1912 includes various modules, e.g., routines, which when executed by the processor 1908 control the playback system 1900 to perform operations in accordance with the invention. The memory 1912 includes control routines 1914, a user input processing module 1916, a head position and/or viewing angle determination module 1918, a decoder module 1920, a stereoscopic image rendering module 1922 also referred to as a 3D image generation module, a 3D mesh model update module 1924, a UV map update module 1926, received 3D mesh model 1928, received UV map 1930, and data/information including received encoded image content 1932, decoded image content 1934, updated 3D mesh model information 1936, updated UV map information 1938, updated 3D mesh model 1940, updated UV map 1942 and generated stereoscopic content 1944.

The processor 1908, e.g., a CPU, executes the routines 1914 and uses the various modules to control the system 1900 to operate in accordance with the invention. The processor 1908 is responsible for controlling the overall general operation of the system 1900. In various embodiments the processor 1908 is configured to perform functions that have been discussed as being performed by the rendering and playback system 1900.

The network interface 1910 includes a transmitter 1911 and a receiver 1913 which allow the playback system 1900 to receive information from and/or communicate information to an external device over a communications network, e.g., communications network 1450. In some embodiments the playback system 1900 receives, e.g., via the interface 1910, image content 1932, 3D mesh model 1928, UV map 1930, updated mesh model information 1936 and updated UV map information 1938 from the processing system 1700 over the communications network 1450. Thus in some embodiments the playback system 1900 receives, via the interface 1910, information communicating a first mesh model, e.g., the 3D mesh model 1928, of a 3D environment generated based on measurements of at least a portion of the environment made using a light field camera at a first time. The playback system 1900 in some embodiments further receives, via the interface 1910, image content, e.g., frames of left and right eye image pairs.

The control routines 1914 include device control routines and communications routines to control the operation of the system 1900. The request generation module 1916 is configured to generate a request for content, e.g., upon user selection of an item for playback. The received information processing module 1917 is configured to process information, e.g., image content, audio data, environmental models, UV maps, etc., received by the system 1900, e.g., via the receiver of interface 1906 and/or 1910, to recover communicated information that can be used by the system 1900, e.g., for rendering and playback. The head position and/or viewing angle determination module 1918 is configured to determine a current viewing angle and/or a current head position, e.g., orientation, of the user, e.g., orientation of the head mounted display, and in some embodiments report the determined position and/or viewing angle information to the processing system 1700.

The decoder module 1920 is configured to decode encoded image content 1932 received from the processing system 1700 or the camera rig 1402 to produce decoded image data 1934. The decoded image data 1934 may include decoded stereoscopic scene and/or decoded scene portions.

The 3D image renderer 1922 uses decoded image data to generate 3D image content, in accordance with the features of the invention, for display to the user on the display 1902 and/or the display device 1905. In some embodiments the 3D image renderer 1922 is configured to render, using a first 3D mesh model, at least some of the received image content. In some embodiments the 3D image renderer 1922 is further configured to use a first UV map to determine how to wrap an image included in received image content onto the first 3D mesh model.

The 3D mesh model update module 1924 is configured to update a received first 3D mesh model 1928 (e.g., the initially received mesh model) using received updated mesh model information 1936 to generate an updated mesh model 1940. In some embodiments the received updated mesh model information 1936 includes mesh model difference information reflecting the changes with respect to a previous version of the 3D mesh model received by the playback device 1900. In some other embodiments the received updated mesh model information 1936 includes complete information for generating a full 3D mesh model, which is then output as the updated mesh model 1940.

The UV map update module 1926 is configured to update a received first UV map 1930 (e.g., the initially received UV map) using received updated UV map information 1938 to generate an updated UV map 1942. In some embodiments the received updated UV map information 1938 includes difference information reflecting the changes with respect to a previous version of the UV map received by the playback device 1900. In some other embodiments the received updated UV map information 1938 includes information for generating a full UV map, which is then output as the updated UV map 1942.

In various embodiments, when the 3D mesh model and/or UV map is updated in accordance with the invention, the 3D image rendering module 1922 is further configured to render, using an updated mesh model, at least some of the image content, e.g., additional image content. In some such embodiments the 3D image rendering module 1922 is further configured to use the updated UV map to determine how to wrap an image included in the image content to be rendered onto the updated 3D mesh model. The generated stereoscopic image content 1944 is the output of the 3D image rendering module 1922.

In some embodiments some of the modules are implemented, e.g., as circuits, within the processor 1908 with other modules being implemented, e.g., as circuits, external to and coupled to the processor. Alternatively, rather than being implemented as circuits, all or some of the modules may be implemented in software and stored in the memory of the playback device 1900 with the modules controlling operation of the playback device 1900 to implement the functions corresponding to the modules when the modules are executed by a processor, e.g., processor 1908. In still other embodiments, various modules are implemented as a combination of hardware and software, e.g., with a circuit external to the processor 1908 providing input to the processor 1908 which then under software control operates to perform a portion of a module's function.

While shown in the FIG. 18 example as being included in the memory 1912, the modules shown included in the memory 1912 can be, and in some embodiments are, implemented fully in hardware within the processor 1908, e.g., as individual circuits. In other embodiments some of the elements are implemented, e.g., as circuits, within the processor 1908 with other elements being implemented, e.g., as circuits, external to and coupled to the processor 1908. As should be appreciated, the level of integration of modules on the processor and/or with some modules being external to the processor may be a matter of design choice.

While shown in the FIG. 18 embodiment as a single processor 1908, e.g., computer, within device 1900, it should be appreciated that processor 1908 may be implemented as one or more processors, e.g., computers. When implemented in software, the modules include code which, when executed by the processor 1908, configures the processor, e.g., computer, to implement the function corresponding to the module. In some embodiments, processor 1908 is configured to implement each of the modules shown in the memory 1912 in the FIG. 18 example. In embodiments where the modules are stored in memory 1912, the memory 1912 is a computer program product, the computer program product comprising a computer readable medium, e.g., a non-transitory computer readable medium, comprising code, e.g., individual code for each module, for causing at least one computer, e.g., processor 1908, to implement the functions to which the modules correspond.

As should be appreciated, the modules illustrated in FIG. 18 control and/or configure the system 1900, or elements therein such as the processor 1908, to perform the functions of the corresponding steps of the methods of the present invention, e.g., such as those illustrated and/or described with respect to flowchart 1800.

In one exemplary embodiment the processor 1908 is configured to control the playback device 1900 to: receive, e.g., via interface 1910, information communicating a first mesh model of a 3D environment generated based on measurements of at least a portion of said environment made using a light field camera at a first time; receive, e.g., via the interface 1910, image content; and render, using said first mesh model at least some of the received image content.

In some embodiments the processor is further configured to control the playback device to receive, e.g., via the interface 1910, updated mesh model information, said updated mesh model information including at least some updated mesh model information generated based on measurements of at least the portion of said environment using said light field camera at a second time. In some embodiments the updated mesh model information communicates a complete updated mesh model.

In some embodiments the processor is further configured to control the playback device to: receive additional image content; and render, using said updated mesh model information, at least some of the received additional image content.

In some embodiments the processor is further configured to control the playback device to: receive (e.g., via the interface 1910 or 1906), a first map mapping a 2D image space to said first mesh model; and use said first map to determine how to wrap an image included in said received image content onto said first mesh model as part of being configured to render, using said first mesh model, at least some of the received image content.

In some embodiments the processor is further configured to control the playback device to: receive (e.g., via the interface 1910 or 1906) updated map information corresponding to said updated mesh model information; and use said updated map information to determine how to wrap an additional image included in said received additional image content onto said updated mesh model as part of being configured to render, using said updated mesh model information, at least some of the received additional image content.

In some embodiments the updated map information includes map difference information. In some such embodiments the processor is further configured to control the playback device to: generate an updated map by applying said map difference information to said first map to generate an updated map; and use said updated map to determine how to wrap an additional image included in said received additional image content onto said updated mesh model as part of rendering, using said updated mesh model information, at least some of the received additional image content.

While steps are shown in an exemplary order it should be appreciated that in many cases the order of the steps may be altered without adversely affecting operation. Accordingly, unless the exemplary order of steps is required for proper operation, the order of steps is to be considered exemplary and not limiting.

While various embodiments have been discussed, it should be appreciated that not necessarily all embodiments include the same features and some of the described features are not necessary but can be desirable in some embodiments.

While various ranges and exemplary values are described the ranges and values are exemplary. In some embodiments the ranges of values are 20% larger than the ranges discussed above. In other embodiments the ranges are 20% smaller than the exemplary ranges discussed above. Similarly, particular values may be, and sometimes are, up to 20% larger than the values specified above while in other embodiments the values are up to 20% smaller than the values specified above. In still other embodiments other values are used.

FIG. 19 illustrates an exemplary 3D mesh model 2000 that may be used in various embodiments, with a plurality of nodes illustrated as the points of intersection of the lines used to divide the 3D model into segments. Note that the model of FIG. 19 is shown in 3D space and can be expressed as a set of [X, Y, Z] coordinates defining the location of the nodes of the mesh in 3D space, assuming the shape of the segments is known or the rules for interconnecting the nodes are known or defined in the 3D model. In some embodiments the segments are predetermined to have the same number of sides, with each node connecting to a predetermined number of adjacent nodes by straight lines. In the FIG. 19 example the top portion of the model 2000 is a set of triangular segments while the side portions are formed by a plurality of four sided segments. Such a configuration, e.g., a top portion formed of 3 sided segments and a side portion formed by 4 sided segments, may be included in the information forming part of the 3D model or predetermined. Such information is provided to the customer rendering and playback devices along with, or as part of, the mesh model information.

FIG. 20 shows an exemplary UV map 2002 which may be used in mapping a frame in what is sometimes referred to as 2D UV space to the 3D model 2000 shown in FIG. 19. Note that the UV map 2002 includes the same number of nodes and segments as in the 3D model 2000 with a one to one mapping relationship. Frames which provide what is sometimes referred to as texture, but which normally include content of images captured from the vantage point of a camera rig in a real environment, at a location corresponding to the position [0, 0, 0] within the 3D model 2000 of the simulated environment, may be applied, e.g., wrapped, on to the 3D model 2000 in accordance with the map 2002 as part of an image rendering operation.

In FIGS. 19 and 20, exemplary node P which is shown as a dot for emphasis, like each of the other mesh nodes, appears in both the UV map 2002 and the 3D model 2000. Note that the node P[X, Y, Z] corresponds to the node P[U,V], where X, Y, Z specify the position of node P in X, Y, Z space and U,V specify the location of the corresponding node P in the two dimensional space. Each U,V pair represents the X, Y of a single pixel of the 2D image texture, e.g., a frame. Surrounding pixels are mapped from the 2D frame to the 3D mesh during the rendering process by interpolating between nearby U,V pairs.
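
As an illustration of this correspondence, and assuming (U, V) values are normalized to the [0, 1] range as in the earlier sketches (a common convention, not one mandated by the description above), the pixel a node's (U, V) pair addresses in a frame can be computed as follows; pixels between nodes are then filled by interpolating between nearby (U, V) pairs during rasterization.

def uv_to_pixel(u, v, frame_width, frame_height):
    """Convert a normalized (U, V) coordinate into the (x, y) pixel it addresses in a 2D frame."""
    x = int(u * (frame_width - 1))
    y = int(v * (frame_height - 1))
    return x, y

# Example: a node with (U, V) = (0.25, 0.5) in a 1920x1080 frame maps to pixel (479, 539).
# uv_to_pixel(0.25, 0.5, 1920, 1080) -> (479, 539)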

The techniques of various embodiments may be implemented using software, hardware and/or a combination of software and hardware. Various embodiments are directed to apparatus, e.g., an image data capture and processing system. Various embodiments are also directed to methods, e.g., a method of capturing images and/or processing image data. Various embodiments are also directed to a non-transitory machine, e.g., computer, readable medium, e.g., ROM, RAM, CDs, hard discs, etc., which includes machine readable instructions for controlling a machine to implement one or more steps of a method.

Various features of the present invention are implemented using modules. Such modules may be, and in some embodiments are, implemented as software modules. In other embodiments the modules are implemented in hardware. In still other embodiments the modules are implemented using a combination of software and hardware. In some embodiments the modules are implemented as individual circuits, with each module being implemented as a circuit for performing the function to which the module corresponds. A wide variety of embodiments are contemplated, including some embodiments where different modules are implemented differently, e.g., some in hardware, some in software, and some using a combination of hardware and software. It should also be noted that routines and/or subroutines, or some of the steps performed by such routines, may be implemented in dedicated hardware as opposed to software executed on a general purpose processor. Such embodiments remain within the scope of the present invention. Many of the above described methods or method steps can be implemented using machine executable instructions, such as software, included in a machine readable medium such as a memory device, e.g., RAM, floppy disk, etc., to control a machine, e.g., a general purpose computer with or without additional hardware, to implement all or portions of the above described methods. Accordingly, among other things, the present invention is directed to a machine-readable medium including machine executable instructions for causing a machine, e.g., processor and associated hardware, to perform one or more of the steps of the above-described method(s).

Some embodiments are directed to a non-transitory computer readable medium embodying a set of software instructions, e.g., computer executable instructions, for controlling a computer or other device to encode and compress stereoscopic video. Other embodiments are directed to a computer readable medium embodying a set of software instructions, e.g., computer executable instructions, for controlling a computer or other device to decode and decompress video on the player end. While encoding and compression are mentioned as possible separate operations, it should be appreciated that encoding may be used to perform compression and thus encoding may, in some embodiments, include compression. Similarly, decoding may involve decompression.

In various embodiments a processor of a processing system is configured to control the processing system to perform the method steps performed by the exemplary described processing system. In various embodiments a processor of a playback device is configured to control the playback device to implement the steps, performed by a playback device, of one or more of the methods described in the present application.

Numerous additional variations on the methods and apparatus of the various embodiments described above will be apparent to those skilled in the art in view of the above description. Such variations are to be considered within the scope of the invention.

Claims

1. A method of operating a playback device, the method comprising:

receiving information communicating a first mesh model of a 3D environment generated based on measurements of a portion of said environment made using a light field camera at a first time;
receiving image content; and
rendering, using said first mesh model at least some of the received image content.

2. The method of claim 1, further comprising:

receiving updated mesh model information, said updated mesh model information including at least some updated mesh model information generated based on measurements of said portion of said environment using said light field camera at a second time.

3. The method of claim 2, further comprising:

receiving additional image content; and
rendering, using said updated mesh model information at least some of the received additional image content.

4. The method of claim 3, wherein said information communicating a first mesh model of the 3D environment includes information defining a complete mesh model.

5. The method of claim 4, wherein said updated mesh model information communicates a complete updated mesh model.

6. The method of claim 5, wherein said updated mesh model information provides new mesh information for portions of said 3D environment which have changed between said first and second time periods.

7. The method of claim 6, wherein said updated mesh model information is difference information indicating a difference between said first mesh model and an updated mesh model.

8. The method of claim 7, wherein said first mesh model information includes a first set of coordinate triples, each coordinate triple indicating a coordinate in X, Y, Z space of a node in the first mesh model.

9. The method of claim 8, wherein said updated mesh model information includes at least one of: i) new sets of mesh coordinates for at least some nodes in said first mesh model information, said new coordinates being intended to replace coordinates of corresponding nodes in said first mesh model; or ii) a new set of coordinate triples to be used for at least a portion of said first mesh model in place of a previous set of coordinate triples, said new set of coordinate triples including the same or a different number of coordinate triples than the previous set of coordinate triples to be replaced.

10. The method of claim 9, further comprising:

receiving a first map mapping a 2D image space to said first mesh model; and
wherein rendering, using said first mesh model at least some of the received image content, includes using said first map to determine how to wrap an image included in said received image content onto said first mesh model.

11. The method of claim 10, further comprising:

receiving updated map information corresponding to said updated mesh model information; and
wherein rendering, using said updated mesh model information at least some of the received additional image content, includes using said updated map information to determine how to wrap an additional image included in said received additional image content onto said updated mesh model.

12. The method of claim 11, wherein the updated map information includes map difference information, the method further comprising:

generating an updated map by applying said map difference information to said first map to generate an updated map; and
wherein rendering, using said updated mesh model information, at least some of the received additional image content, includes using said updated map to determine how to wrap an additional image included in said received additional image content onto said updated mesh model.

13. A computer readable medium including computer executable instructions which, when executed by a computer, control the computer to:

receive information communicating a first mesh model of a 3D environment generated based on measurements of a portion of said environment made using a light field camera at a first time;
receive image content; and
render, using said first mesh model at least some of the received image content.

14. A playback apparatus, comprising:

a processor configured to control said playback apparatus to: receive information communicating a first mesh model of a 3D environment generated based on measurements of a portion of said environment made using a light field camera at a first time; receive image content; and render, using said first mesh model at least some of the received image content.

15. The playback apparatus of claim 14, wherein the processor is further configured to control the playback apparatus to:

receive updated mesh model information, said updated mesh model information including at least some updated mesh model information generated based on measurements of the portion of said environment using said light field camera at a second time.

16. The playback apparatus of claim 15, wherein the processor is further configured to control the playback apparatus to:

receive additional image content; and
render, using said updated mesh model information, at least some of the received additional image content.

17. The playback apparatus of claim 14, wherein the processor is further configured to control the playback apparatus to:

receive a first map mapping a 2D image space to said first mesh model; and
use said first map to determine how to wrap an image included in said received image content onto said first mesh model as part of being configured to render, using said first mesh model, at least some of the received image content.

18. The playback apparatus of claim 17, wherein the processor is further configured to control the playback apparatus to:

receive updated map information corresponding to said updated mesh model information; and
use said updated map information to determine how to wrap an additional image included in said received additional image content onto said updated mesh model as part of being configured to render, using said updated mesh model information, at least some of the received additional image content.

19. The playback apparatus of claim 18, wherein the updated map information includes map difference information; and

wherein the processor is further configured to control the playback apparatus to: generate an updated map by applying said map difference information to said first map to generate an updated map; and use said updated map to determine how to wrap an additional image included in said received additional image content onto said updated mesh model as part of rendering, using said updated mesh model information, at least some of the received additional image content.

20. The playback apparatus of claim 16, wherein said information communicating a first mesh model of the 3D environment includes information defining a complete mesh model.

Patent History
Publication number: 20160253839
Type: Application
Filed: Mar 1, 2016
Publication Date: Sep 1, 2016
Inventors: David Cole (Laguna Beach, CA), Alan McKay Moss (Laguna Beach, CA)
Application Number: 15/057,210
Classifications
International Classification: G06T 17/20 (20060101); G06T 19/20 (20060101); G06T 17/05 (20060101);