GENERATION, TRANSMISSION AND RENDERING OF VIRTUAL REALITY MULTIMEDIA
A method of generating virtual reality data includes: obtaining point cloud data, the point cloud data including colour and three-dimensional position data for each of a plurality of points corresponding to locations in a capture volume; generating primary image data containing (i) a first projection of a first subset of the points into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from the corresponding position data; generating secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data; and storing the primary image data and the secondary image data in a memory.
This application claims priority from PCT patent application no. PCT/CA2015/000306, filed May 13, 2015 and entitled “Method, System And Apparatus For Generation And Playback Of Virtual Reality Multimedia”, which is incorporated herein by reference.
FIELD
The specification relates generally to processing techniques for multimedia data, and specifically to the generation, transmission and rendering of virtual reality multimedia.
BACKGROUND
Virtual reality display devices, such as the Gear VR and the Oculus Rift, enable viewing of content such as video, games and the like in a virtual reality environment, in which the display adapts to the user's movements. Various challenges confront implementations of virtual reality display. For example, particularly in the case of captured video, capturing video from a sufficient variety of viewpoints to account for potential movements of the operator of the display can be difficult, especially for large or complex scenes. In addition, the resulting volume of captured data can be large enough to render storing, transmitting and processing the data prohibitively costly in terms of computational resources.
SUMMARY
According to an aspect of the specification, a method of generating virtual reality multimedia data is provided, comprising: obtaining point cloud data at a processor of a generation computing device, the point cloud data including colour and three-dimensional position data for each of a plurality of points corresponding to locations in a capture volume; generating, at the processor, primary image data containing (i) a first projection of a first subset of the points into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from the corresponding position data; generating, at the processor, secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data; and storing the primary image data and the secondary image data in a memory connected to the processor.
Embodiments are described with reference to the following figures, in which:
System 100 includes a generation computing device 104, also referred to herein as generation device 104. Generation device 104, as will be discussed in detail below, is configured to generate the above-mentioned virtual reality multimedia data for transmission to, and rendering at, a client computing device 108, also referred to herein as client device 108. Client device 108 is configured to receive the virtual reality multimedia data generated by generation device 104, and to render (that is, play back) the virtual reality multimedia data. The virtual reality multimedia data can be transferred between generation device 104 and client device 108 in a variety of ways. For example, the multimedia data can be transmitted to client device 108 via a network 112. Network 112 can include any suitable combination of wired and wireless networks, including but not limited to a Wide Area Network (WAN) such as the Internet, a Local Area Network (LAN) such as a corporate data network, cell phone networks, WiFi networks, WiMax networks and the like.
Transmission of the multimedia data to client device 108 via network 112 need not occur directly from generation device 104. For example, the multimedia data can be transmitted from generation device 104 to an intermediate device via network 112, and subsequently to client device 108. In other embodiments, the multimedia data can be sent from generation device 104 to a portable storage medium (e.g. optical discs, flash storage and the like), and the storage medium can be physically transported to client device 108.
Generation device 104 can be based on any suitable computing environment, such as a server or personal computer. In the present example, generation device 104 is a desktop computer housing one or more processors, referred to generically as a processor 116. The nature of processor 116 is not particularly limited. For example, processor 116 can include one or more general purpose central processing units (CPUs), and can also include one or more graphics processing units (GPUs). The performance of the various processing tasks discussed herein can be shared between such CPUs and GPUs, as will be apparent to a person skilled in the art.
Processor 116 is interconnected with a non-transitory computer readable storage medium such as a memory 120. Memory 120 can be any suitable combination of volatile (e.g. Random Access Memory (“RAM”)) and non-volatile (e.g. read only memory (“ROM”), Electrically Erasable Programmable Read Only Memory (“EEPROM”), flash memory, magnetic computer storage device, or optical disc) memory. In the present example, memory 120 includes both a volatile memory and a non-volatile memory. Processor 116 and memory 120 are generally comprised of one or more integrated circuits (ICs), and can have a wide variety of structures, as will now be apparent to those skilled in the art.
Generation device 104 can also include one or more input devices 124 interconnected with processor 116. Input device 124 can include any suitable combination of a keyboard, a mouse, a microphone, and the like. Such input devices are configured to receive input and provide data representative of such input to processor 116. For example, a keyboard can receive input from a user in the form of the depression of one or more keys, and provide data identifying the depressed key or keys to processor 116.
Generation device 104 can also include one or more output devices interconnected with processor 116, such as a display 128 (e.g. a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, a Cathode Ray Tube (CRT) display). Other output devices, such as speakers (not shown), can also be present. Processor 116 is configured to control display 128 to present images to an operator of generation device 104. Generation device 104 also includes one or more network interfaces interconnected with processor 116, such as a network interface 132, which allows generation device 104 to connect to other computing devices (e.g. client device 108) via network 112. Network interface 132 thus includes the necessary hardware (e.g. radios, network interface controllers, and the like) to communicate over network 112.
As noted above, generation device 104 is configured to generate the multimedia data to be provided to client device 108. To that end, generation device 104 is connected to, or houses, or both, one or more sources of data to be employed in the generation of virtual reality multimedia data. The sources of such raw data can include a multimedia capture apparatus 134. In general, capture apparatus 134 captures video (with or without accompanying audio) of an environment or scene and provides the captured data to generation device 104. Capture apparatus 134 will be described below in greater detail. The sources of raw data can also include, in some embodiments, an animation application 135 (e.g. a three-dimensional animation application) stored in memory 120 and executable by processor 116 to create the raw data. In other words, the virtual reality multimedia data can be generated from raw data depicting a virtual scene (via application 135) or from raw data depicting a real scene (via capture apparatus 134).
Client device 108 can be based on any suitable computing environment, such as a personal computer (e.g. a desktop or laptop computer), a mobile device such as a smartphone, a tablet computer, and the like. Client device 108 includes a processor 136 interconnected with a memory 140. Client device 108 can also include an input device 144, a display 148 and a network interface 152. Processor 136, memory 140, input device 144, display 148 and network interface 152 can be substantially as described above in connection with the corresponding components of generation device 104. As will be discussed in greater detail below, in some embodiments the components of client device 108, although functionally similar to those of generation device 104, may have limited computational resources relative to generation device 104. For example, processor 136 can include a CPU and a GPU that, due to power, thermal envelope or physical size constraints (or a combination thereof), are able to process a smaller volume of data in a given time period than the corresponding components of generation device 104. As noted in connection with generation device 104, the CPU and GPU of client device 108 (collectively referred to as processor 136) can share computational tasks between them, as will be apparent to those skilled in the art. In certain situations, however, as will be described below, specific computational tasks are assigned specifically to one or the other of the CPU and the GPU.
In addition, system 100 includes a virtual reality display 156 connected to processor 136 of client device 108 via any suitable interface. Virtual reality display 156 includes any suitable device comprising at least one display and a mechanism to track movements of an operator. For example, virtual reality display 156 can be a head-mounted display device with head tracking, such as the Oculus Rift from Oculus VR, Inc. or the Gear VR from Samsung. Virtual reality display 156 can include a processor, memory, communication interfaces, displays and the like beyond those of client device 108, in some embodiments. In other embodiments, certain components of client device 108 can act as corresponding components for virtual reality display 156. For example, the above-mentioned Gear VR device mounts a mobile device such as a smart phone, and employs the display (e.g. display 148) and processor (e.g. processor 136) of the smart phone. In any event, client device 108 is configured to control virtual reality display 156 to render the virtual reality multimedia received from generation device 104.
In general, generation device 104 is configured, via the execution by processor 116 of a virtual reality data generation application 160 consisting of computer readable instructions maintained in memory 120, to receive source data (also referred to as raw data) from capture apparatus 134 or application 135 (or a combination thereof), and to process the source data to generate virtual reality multimedia data packaged for transmission to client device 108. Client device 108, in turn, is configured via the execution by processor 136 of a virtual reality playback application 164 consisting of computer readable instructions maintained in memory 140, to receive the virtual reality multimedia data generated by generation device 104, and process the virtual reality multimedia data to render a virtual reality scene via virtual reality display 156. Those skilled in the art will appreciate that in some embodiments, the functionality of the above-described applications (e.g. applications 135, 160 and 164) may be implemented using pre-programmed hardware or firmware elements (e.g., application specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), etc.), or other related components.
Turning now to
Beginning at block 205, generation device 104 is configured to obtain point cloud data. The point cloud data includes colour and three-dimensional position data for each of a plurality of points corresponding to locations in a capture volume. An example illustration of point cloud data obtained at block 205 is shown in
As noted above, each point 308 in point cloud data 306 includes colour data and three-dimensional position data. The colour data indicates the colour of the point 308 in any suitable representation of any suitable colour model (e.g. RGB, CMYK, HSV, HSL, YUV and the like). The position data indicates the position of the point 308 within point cloud 306, and thus corresponds to a certain location in capture volume 304. The nature of the position data is also not particularly limited. For example, the position data can be in the form of a set of Cartesian coordinates (e.g. distances along x, y, and z axes that intersect at the center of point cloud data 306). In another example, the position data can be in the form of spherical coordinates (e.g. a radial distance, a polar angle and an azimuthal angle, all relative to a center of point cloud data 306).
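By way of non-limiting illustration, the two forms of position data mentioned above can be converted into one another. The following is a minimal sketch only; the function names are hypothetical, and the angle convention (polar angle measured from the +z axis, azimuthal angle measured in the x-y plane from the +x axis, both in radians) is an assumption that the specification does not fix.

```python
import math


def spherical_to_cartesian(r, polar, azimuth):
    """Convert (radial distance, polar angle, azimuthal angle) to (x, y, z).

    Assumed convention: polar angle from the +z axis, azimuthal angle in the
    x-y plane from the +x axis, both in radians.
    """
    x = r * math.sin(polar) * math.cos(azimuth)
    y = r * math.sin(polar) * math.sin(azimuth)
    z = r * math.cos(polar)
    return (x, y, z)


def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) to (radial distance, polar angle, azimuthal angle)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # The angles are undefined at the centre; zero is returned by convention.
        return (0.0, 0.0, 0.0)
    # Clamp to guard against rounding slightly outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, z / r)))
    azimuth = math.atan2(y, x)
    return (r, polar, azimuth)
```

Both functions treat the center of the point cloud data as the origin, consistent with the examples given above.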
The point cloud data obtained at block 205 can be stored in any of a variety of data structures, including, for example, a table containing a plurality of records, each corresponding to one point 308 and containing the colour data and position data for that point. A variety of other data structures will also occur to those skilled in the art.
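As one hedged example of such a table, each record can be modelled as a small structure holding the position data and colour data of one point. The class name `PointRecord`, its field names, and the choice of Cartesian coordinates with RGB colour are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointRecord:
    """One record of the table: position data and colour data for one point."""
    position: Tuple[float, float, float]  # assumed Cartesian (x, y, z)
    colour: Tuple[int, int, int]          # assumed RGB, 0-255 per channel


# The table itself is sketched here as a plain list of records.
point_cloud: List[PointRecord] = [
    PointRecord(position=(0.0, 0.0, 1.0), colour=(255, 0, 0)),
    PointRecord(position=(0.5, 0.0, 1.0), colour=(0, 255, 0)),
]
```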
The manner in which point cloud data 306 is obtained at block 205 is not particularly limited. As noted earlier, point cloud data can be generated by generation device 104 via the execution of animation application 135, in which case obtaining point cloud data 306 can include retrieving the point cloud data from memory 120. In other embodiments, in which capture volume 304 is a volume of real space (rather than a virtual volume generated via application 135), obtaining point cloud data at block 205 includes receiving and processing data from capture apparatus 134. A description of capture apparatus 134 itself follows, with reference to
Capture apparatus 134 includes a plurality of capture nodes arranged in or around capture volume 304. Each node, placed in a distinct position from the other nodes, generates colour and depth data for a plurality of points in its field of view. In the present example, the field of view for each node is about three hundred and sixty degrees by about three hundred and sixty degrees (that is, each node captures data in a full sphere). However, in other embodiments nodes may have reduced fields of view. The nature of the nodes is not particularly limited. For example, each node can include a camera and a depth sensor (e.g. a lidar sensor). In some embodiments, each node may include a plurality of cameras and depth sensors to achieve the above-mentioned field of view. An example of a device that may be employed for each node is the Bublcam by Bubl Technology Inc. of Toronto, Canada.
A wide variety of node arrangements may be employed to capture the raw data to be processed by generation device 104 in order to obtain point cloud data 306 at block 205. In general, greater numbers of nodes allow for a greater level of detail to be captured, particularly in complex scenes. Examples of presently preferred configurations of nodes for capture apparatus 134 are discussed below.
The arrangements of capture apparatus 134 illustrated in
Returning briefly to
At block 510, generation device 104 is configured to register the raw point cloud data received at block 505 to a common frame of reference (i.e. the same coordinate space). For example, each node of capture apparatus 134 can be configured to generate point cloud data in which each point has coordinates (either Cartesian or spherical, as mentioned earlier) centered on the node itself. With the relative locations of the nodes being known, the point cloud data from any given node can be transformed via conventional techniques to a frame of reference centered on the center of capture volume 304.
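The registration at block 510 can be sketched, under simplifying assumptions, as a translation of each node-centered point by the known location of the node. The sketch below assumes Cartesian coordinates, assumes that the axes of every node are parallel to the axes of the common frame (so that no rotation is needed), and uses the hypothetical name `register_to_common_frame`; it is not intended to represent the only suitable transformation.

```python
def register_to_common_frame(node_points, node_position):
    """Translate node-centred Cartesian points into the common frame.

    node_points:   list of (x, y, z) tuples relative to the node itself.
    node_position: (x, y, z) location of the node in the common frame,
                   e.g. relative to the centre of the capture volume.
    Assumption: node axes are parallel to the common-frame axes.
    """
    nx, ny, nz = node_position
    return [(x + nx, y + ny, z + nz) for (x, y, z) in node_points]


# Two nodes on opposite sides of the capture volume observe the same location.
from_node_a = register_to_common_frame([(-1.0, 2.0, 0.0)], (1.0, 0.0, 0.0))
from_node_b = register_to_common_frame([(1.0, 2.0, 0.0)], (-1.0, 0.0, 0.0))
```

In this usage example both nodes yield the common-frame location (0.0, 2.0, 0.0), which illustrates how the same location in the capture volume can be represented more than once after registration.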
It will now be apparent that when the sets of raw point cloud data are registered to a common frame of reference, a number of locations within capture volume 304 may be represented multiple times within the co-registered point cloud data. That is, more than one node may capture the same location in capture volume 304. Generation device 104 is therefore configured to collapse fully or partially any overlapping points in the co-registered point cloud data to a smaller number of points, as discussed below.
At block 515 generation device 104 is configured to determine, for each point in the co-registered point cloud data, whether the point overlaps (either exactly or partially) with other points in the common frame of reference. When the determination is negative, generation device 104 proceeds to block 520, at which the co-registered point cloud data is updated with no change being made to the non-overlapping points (in other words, the update may be a null update). When the determination at block 515 is affirmative for any points, however, generation device 104 can be configured to perform block 525. At block 525, generation device 104 is configured to determine whether the difference in colour between the overlapping points identified at block 515 is greater than a predetermined threshold. That is, if different nodes record significantly different appearances for the same location in capture volume 304, that is an indication that the capture volume includes surfaces that are highly reflective, specular or the like.
When the determination at block 525 is negative (e.g. the differences in colour for overlapping points are non-existent or below the above-mentioned threshold), generation device 104 proceeds to block 520 and updates the co-registered point cloud by replacing the overlapping points with a single point. The single point can have a colour value equivalent to an average of the colour values of the original overlapping points, for example.
When the determination at block 525 is affirmative, however, generation device 104 can be configured to create a palette image containing a subset, or all, of the colour values from the overlapping points. A palette image stores a plurality of possible colours for a single point in the co-registered point cloud. The palette image preferably stores possible colours in a two-dimensional array. The colour at the center of the palette image corresponds to the colour of the point when viewed from the center of the point cloud, and colours spaced apart from the center of the palette image in varying directions and at varying distances correspond to the colour of the point when viewed from corresponding directions and distances from the center of the point cloud. In some embodiments, rather than full colour values, the palette image can store only luminance or intensity values, while chrominance or other colour values can be stored in the point itself (along with a reference to the palette image).
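To illustrate how a two-dimensional palette image of this kind might be indexed, the following sketch selects a colour according to a horizontal and a vertical viewing offset. The function name `palette_colour`, the normalisation of the offsets to the range -1 to 1, and the nearest-entry lookup are all assumptions; the specification does not prescribe how viewing direction and distance map to palette positions.

```python
def palette_colour(palette, offset_x, offset_y):
    """Look up a colour in a square palette image (odd side length assumed).

    palette:  2D list of colours, indexed palette[row][col].
    offset_x: assumed normalised horizontal viewing offset, -1.0 to 1.0.
    offset_y: assumed normalised vertical viewing offset, -1.0 to 1.0.
    An offset of (0, 0) selects the centre entry, i.e. the colour of the
    point as viewed from the centre of the point cloud.
    """
    size = len(palette)
    centre = size // 2
    col = centre + round(offset_x * centre)
    row = centre + round(offset_y * centre)
    # Clamp so that offsets beyond the assumed range select the edge entries.
    col = max(0, min(size - 1, col))
    row = max(0, min(size - 1, row))
    return palette[row][col]
```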
At block 520, generation device 104 is then configured to update the co-registered point cloud with an index value pointing to the palette image (which can be stored separately from point cloud data 306), in place of a colour value. In some embodiments, the performance of blocks 525 and 530 can be omitted.
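The determinations described above for overlapping points can be sketched as follows. The sketch makes several assumptions that are not taken from the specification: overlap is treated as an exact match of position, colours are RGB tuples, the colour difference is measured as the largest per-channel spread, and the names `colour_spread` and `collapse_overlapping` are hypothetical.

```python
def colour_spread(colours):
    """Largest per-channel difference among a group of RGB colours."""
    return max(
        max(c[i] for c in colours) - min(c[i] for c in colours)
        for i in range(3)
    )


def collapse_overlapping(groups, threshold):
    """Collapse overlapping points in co-registered point cloud data.

    groups:    dict mapping a position to the list of RGB colours that the
               various nodes recorded at that position.
    threshold: colour difference above which a palette is created.
    Returns (points, palettes); each point is (position, colour) or
    (position, {"palette_index": i}) when a palette is referenced instead.
    """
    points = []
    palettes = []
    for position, colours in groups.items():
        if len(colours) == 1:
            # No overlap: the point is carried over unchanged.
            points.append((position, colours[0]))
        elif colour_spread(colours) <= threshold:
            # Similar colours: replace the overlapping points with one
            # point whose colour is the average of the originals.
            n = len(colours)
            average = tuple(sum(c[i] for c in colours) / n for i in range(3))
            points.append((position, average))
        else:
            # Significantly different colours: keep them in a palette and
            # store an index to the palette in place of a colour value.
            palettes.append(list(colours))
            points.append((position, {"palette_index": len(palettes) - 1}))
    return points, palettes
```

For simplicity the palette in this sketch is a flat list of the recorded colours rather than the two-dimensional palette image described above.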
Returning to
In brief, the primary image data generated at block 210 depicts the portions of point cloud data 306 that are visible from a predicted viewpoint of virtual reality display 156. In other words, the primary image data depicts the portions of point cloud data 306 that are expected to be initially visible to the operator of virtual reality display 156. As will now be apparent, from any given viewpoint within point cloud data 306, any object may be occluded by other objects or by the object itself (e.g. the rear surface of an object may be occluded from view by the remainder of that same object). The above-mentioned subset of points in the primary image data corresponds to the portions of point cloud data 306 that are visible from the initial viewpoint. Other points in point cloud data 306 that are not visible from the initial viewpoint are not included in the subset.
In general, generation device 104 is configured to generate the primary image data by selecting the above-mentioned subset of points, and for each of the subset of points, determining a projected location in a two-dimensional frame of reference for that point, along with accompanying depth data. An example implementation of block 210 will be discussed below, following the discussion of block 215.
At block 215, generation device 104 is configured to generate secondary image data. The secondary image data includes a projection (distinct from the projection mentioned above in connection with the primary image data) of a second subset of the points in point cloud data 306 into the two-dimensional frame of reference mentioned above. The second subset of points is distinct from the subset of points represented by the primary image data. More specifically, the second subset of points, when projected into the two-dimensional frame of reference, overlaps with at least part of the projection in the primary image data. That is, each of the second subset of points, when projected, occupies a location in the two-dimensional frame of reference that matches the location (in that same frame of reference) of a point in the first subset. The secondary image data also includes, for each point of the second subset, depth data derived from the corresponding position data of that point in point cloud data 306.
In contrast to the primary image data, the secondary image data depicts the portions of point cloud data 306 that are not visible from a predicted initial viewpoint established by virtual reality display 156. Instead, the secondary image data depicts portions of point cloud data 306 that are initially occluded by the primary image data, but may become visible due to movement of the viewpoint through manipulation of virtual reality display 156 by an operator.
As with the primary image data, generation device 104 is configured to generate the secondary image data by selecting the above-mentioned second subset of points, and for each of the second subset of points, determining a projected location in a two-dimensional frame of reference for that point, along with accompanying depth data. In the present example, generation device 104 is configured to perform blocks 210 and 215 in parallel (that is, substantially simultaneously) according to the process depicted in
Referring now to
At block 610, generation device 104 is configured to select a vector (also referred to as a path) for processing. In the example above, in which point cloud data 306 defines a spherical volume (i.e. defined by spherical coordinates), the selection of a vector at block 610 comprises selecting azimuthal and polar angles. In general, at block 610 generation device 104 selects a path extending from the viewpoint selected at block 605, but does not select a depth (e.g. a radial distance when using spherical coordinates) corresponding to that path.
At block 615, generation device 104 is configured to identify the first point in point cloud data 306 that is visible to the selected viewpoint along the selected path or vector. That is, travelling along the selected path from the selected viewpoint, the first point in point cloud data 306 that the selected path intersects is identified, and projected into a two-dimensional frame of reference. Projection at block 615 includes determining two-dimensional coordinates, such as an x and a y coordinate, corresponding to the first visible point in a previously selected two-dimensional frame of reference. The projection can also include determining a depth for the first visible point, which defines the distance (generally in scalar form) from the viewpoint to the first visible point.
A wide variety of two-dimensional frames of reference may be employed at block 615. In the present example, the two-dimensional frame of reference is a cube map. Various features of cube maps, and various techniques for projecting points in three-dimensional space onto two-dimensional faces of cube maps, will be familiar to those skilled in the art. To illustrate the application of cube mapping to the present disclosure, reference is now made to
Referring now to
Returning to
When the determination at block 620 is affirmative, generation device 104 is configured to determine two-dimensional coordinates and depth data for any additional points along the selected path at block 625. As will now be apparent, the two-dimensional coordinates for the additional points are identical to those of the first visible point, and thus there is no need to repeat projection calculations at block 625. Instead, only the depth of such additional points needs to be determined.
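The processing of a single path can be sketched as follows, taking a cube map as one possible two-dimensional frame of reference. The sketch assumes that the points lying along the selected path have already been identified, and the function names, face labels and orientation of the face coordinates are illustrative assumptions rather than any standard cube map convention. The two-dimensional coordinates are computed once per path, and only a depth is computed for each point.

```python
import math


def cube_map_project(direction):
    """Map a non-zero direction vector to (face, u, v), with u, v in [0, 1].

    The face is chosen by the dominant axis of the direction; the other two
    components, divided by the dominant magnitude, give the face coordinates.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, a, b, m = ("+x" if x > 0 else "-x"), y, z, ax
    elif ay >= az:
        face, a, b, m = ("+y" if y > 0 else "-y"), x, z, ay
    else:
        face, a, b, m = ("+z" if z > 0 else "-z"), x, y, az
    return face, (a / m + 1.0) / 2.0, (b / m + 1.0) / 2.0


def process_path(viewpoint, direction, points_on_path):
    """Split the points on one path into a first visible point and the rest.

    points_on_path: list of (position, colour) pairs assumed to lie along
                    `direction` from `viewpoint`.
    Returns (primary, additional), where each entry is
    (face, u, v, colour, depth) and depth is the distance from the viewpoint.
    """
    if not points_on_path:
        return None, []
    face, u, v = cube_map_project(direction)  # projected once per path
    by_depth = sorted(
        ((math.dist(viewpoint, position), colour)
         for position, colour in points_on_path),
        key=lambda item: item[0],
    )
    first_depth, first_colour = by_depth[0]
    primary = (face, u, v, first_colour, first_depth)
    additional = [(face, u, v, colour, depth) for depth, colour in by_depth[1:]]
    return primary, additional
```

In this sketch the nearest point on the path plays the role of the first visible point, and every farther point on the same path reuses its two-dimensional coordinates.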
At block 630, generation device 104 is configured to determine whether any additional paths remain to be processed. Generation device 104 is configured to process a plurality of paths to generate the primary and secondary image data. The number of paths to be processed is set based on the desired resolution of the primary and secondary image data—a greater number of paths (i.e. a higher-resolution sampling of point cloud data 306) leads to higher resolution image data. When paths remain to be processed, the performance of method 600 returns to block 610 to select the next path. When all paths have been processed, the performance of method 600 instead proceeds to block 635.
At block 635, generation device 104 is configured to store the first visible point projections and corresponding depth values as primary image data, and at block 640, generation device 104 is configured to store the additional point projections and corresponding depth values as secondary image data. Blocks 635 and 640 need not be performed separately after a negative determination at block 630. Instead, the storage operations at blocks 635 and 640 can be integrated with blocks 615 and 625, respectively.
The storage operations of blocks 635 and 640 will be described in greater detail in conjunction with
Each of files 800 and 804 consists of a two-dimensional array. In the case of file 800, the two-dimensional array is an array of pixels, each storing colour data in any suitable format (e.g. HSV). Thus, as illustrated in
Turning to
As seen in
It will now be apparent that the “back” of object 300 is also not visible in the primary image data. In some examples, the back of object 300 would therefore also be depicted in files 900 and 904. In the present example, however, for illustrative purposes, the back of object 300 has been omitted from files 900 and 904. More specifically, generation device 104 can be configured, at block 625, to determine whether a fold point is within a predicted range of motion of the viewpoint selected at block 605 before projecting the fold point. That is, generation device 104 can store a predicted maximum travel distance for viewpoint 700, and omit fold points entirely if such points would only become visible if the viewpoint moved beyond the maximum travel distance. In the presently preferred embodiment, however, such determinations are omitted from the generation of secondary image data, and instead addressed at the rendering stage, to be discussed further below.
In the example shown in
Returning to
While files 900 and 904 do not contain explicit position data in the pixels thereof—since such position data is implicit in the pixel array—files 908 and 912 do contain such position data, indicating the position of the colour and depth values of files 908 and 912 within the array of files 800 and 804. This is because the dimensions of files 908 and 912 generally do not match those of files 800 and 804, and thus the position of a data point within file 908 or 912 may not imply a specific position in the array of files 800 and 804. Generation device 104 is configured to perform various processing activities to reduce the volume of position data stored in files 908 and 912.
In general, generation device 104 is configured to identify portions of the two-dimensional frame of reference (that is, the two-dimensional array according to which files 800, 804, 900 and 904 are formatted) that are occupied at least to a threshold fraction by fold points. For any such portions that are identified, generation device 104 is configured to select geometric parameters identifying the portion, and store the geometric parameters along with the colour or depth data for the fold points within the portion (absent individual positional data for each fold point) in files 908 and 912. In other words, generation device 104 is configured to group the fold points into portions such that the locations of those fold points within the two-dimensional array can be represented with a volume of data that does not exceed—and is preferably smaller than—the volume of data required to store the individual coordinates of each fold point.
Generation device 104 can be configured to identify portions of a variety of different types. For example, generation device 104 can be configured to identify any one of, or any combination of, straight lines, curved lines, polygons, circles and the like. Generation device 104 is configured to select a portion, determine the total number of available positions in the two-dimensional array that are contained by that portion, and determine whether at least a threshold fraction of the positions within the portion contain fold data. The threshold fraction can be preconfigured at generation device 104, or can be determined dynamically based on the selected portion. When the determination is negative (i.e. too few fold points are present within the portion), generation device 104 is configured to select and evaluate a different portion according to the above process. When the determination is affirmative, however, generation device 104 is configured to store geometric parameters corresponding to the portion, as well as colour data or depth data for each fold point within the portion, in files 908 and 912. Having stored the geometric parameters and corresponding colour and depth data, generation device 104 is configured to repeat the above process on the remaining fold points (that is, those not yet assigned to portions) until all fold point data has been stored.
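The threshold determination described above can be sketched for a candidate portion as follows. The sketch represents positions in the two-dimensional array as (x, y) tuples and uses a rectangular portion as the example; the function names and the choice of a rectangle are assumptions made for illustration.

```python
def rectangle_positions(x0, y0, x1, y1):
    """All array positions inside an axis-aligned rectangle (inclusive)."""
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


def portion_fill_fraction(fold_positions, portion_positions):
    """Fraction of the positions in a portion that contain fold data."""
    portion = set(portion_positions)
    if not portion:
        return 0.0
    return len(portion & set(fold_positions)) / len(portion)


def accept_portion(fold_positions, portion_positions, threshold):
    """True when at least the threshold fraction of the portion is occupied."""
    return portion_fill_fraction(fold_positions, portion_positions) >= threshold


# Three fold points inside a 2 x 2 rectangle give a fill fraction of 0.75.
folds = {(0, 0), (1, 0), (0, 1)}
candidate = rectangle_positions(0, 0, 1, 1)
```

With a threshold fraction of 0.5 the candidate portion in this usage example would be accepted, whereas with a threshold fraction of 0.9 a different portion would be selected and evaluated.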
Turning to
Generation device 104 may determine that a portion of face 720 in the shape of a line extending between the two points in face 720 encompasses both of those points. However, storing secondary image data for such a line, as indicated at 1100, requires storing geometric parameters such as a start location and an end location (e.g. the locations of the two fold points themselves), as well as colour (or depth) data for the entire line. As indicated by the darker-coloured points, only two colour or depth values in the line represent fold points. The remaining, light-coloured, points simply contain null values. Thus, the total storage requirements for the line are greater than simply storing the two points individually with location data for each point. In other words, generation device 104 may determine that the number of fold points on the line is below a threshold at which the volume of data required to store the line is lower than the volume of data required to store the individual points. Generation device 104 therefore does not store the line, and may instead select a different portion of face 720 to test.
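The comparison of storage requirements described above can be made concrete with a simple cost model. The model below is an assumption for illustration only: each coordinate pair is taken to occupy two storage units and each colour or depth value one storage unit, and the function names are hypothetical.

```python
def cost_individual(num_fold_points, coord_size=2, value_size=1):
    """Cost of storing each fold point with its own coordinates."""
    return num_fold_points * (coord_size + value_size)


def cost_line(line_length, coord_size=2, value_size=1):
    """Cost of storing a line: start and end locations, plus one value
    (possibly null) for every position along the line."""
    return 2 * coord_size + line_length * value_size


def prefer_line(num_fold_points, line_length):
    """True when storing the line is cheaper than storing individual points."""
    return cost_line(line_length) < cost_individual(num_fold_points)
```

Under these assumed sizes, two fold points at the ends of a ten-position line cost 6 units individually but 14 units as a line, so the line is not stored. If all ten positions contained fold points, the individual cost would rise to 30 units and the line would be preferred.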
Referring to face 716, on the other hand, generation device 104 may determine that a polygon having corners at the corners of the top of object 300 is entirely filled with fold point data. Thus, face 716 can be represented in files 908 and 912 by coordinates for the four corners, and a set of colour or depth data without explicitly specified position data. As will now be apparent, this requires less storage space than storing each individual point in the polygon along with its individual coordinates within the array.
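The storage comparison underlying the two examples above can be illustrated with the following sketch. The byte sizes (four bytes per stored location, three bytes per colour or depth value) and the function names are assumptions chosen for illustration only; the specification does not prescribe particular sizes.

```python
def portion_cost(location_count: int, position_count: int,
                 location_bytes: int = 4, value_bytes: int = 3) -> int:
    """Bytes needed to store a portion: its geometric parameters (as locations)
    plus one value for every position it covers, including null positions."""
    return location_count * location_bytes + position_count * value_bytes


def individual_cost(fold_point_count: int,
                    location_bytes: int = 4, value_bytes: int = 3) -> int:
    """Bytes needed to store each fold point with its own location."""
    return fold_point_count * (location_bytes + value_bytes)


def portion_is_cheaper(location_count: int, position_count: int,
                       fold_point_count: int) -> bool:
    """True when storing the portion takes fewer bytes than storing its fold points individually."""
    return portion_cost(location_count, position_count) < individual_cost(fold_point_count)


# A line of 20 positions (start and end locations) holding only 2 fold points.
print(portion_is_cheaper(location_count=2, position_count=20, fold_point_count=2))
# A four-cornered polygon of 100 positions, every one of which is a fold point.
print(portion_is_cheaper(location_count=4, position_count=100, fold_point_count=100))
```

Under these assumed sizes the sparse line costs 68 bytes against 14 bytes for its two points stored individually, whereas the filled polygon costs 316 bytes against 700 bytes, which is consistent with the outcomes described for faces 720 and 716.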
As noted earlier, a variety of types of geometric structures are contemplated for storing fold points. These include x-folds, indicating horizontal lines extending across the entirety of either a face of the array or the entire array; y-folds, indicating vertical lines extending across the entirety of either a face of the array or the entire array; and partial x- or y-folds, indicating horizontal and vertical lines, respectively, that extend only partially across a face or array and thus are represented by start and end points rather than a single x or y index value. The types of geometric structures also include curved lines (e.g. defined by start points, end points and radii), polygons (e.g. defined by coordinates of the corners of the polygons), and angled lines (e.g. defined by start and end points). Any points that cannot be assigned to portions more efficiently than storing the points individually (that is, any points for which the determinations above remain negative after all other fold data has been stored) can be stored as individual points, also referred to as m-folds.
Returning to
Returning to
When the primary and secondary image data have been stored, generation device 104 proceeds to the next frame at block 645. As will now be apparent, the above process generates primary and secondary image data for a single set of point cloud data 306, which depicts a single frame (i.e. a still image or a frame of a video). Method 600 can therefore be repeated for a plurality of other frames when the virtual reality multimedia data includes video data.
Variations to the processes described above for storing primary and secondary image data are contemplated. For example, rather than selecting the first visible point (i.e. the “shallowest” point) at block 615, generation device 104 can be configured to select the deepest point at block 615, and to detect additional points as those points that are in between the viewpoint and the primary point rather than behind the primary point. In further embodiments, other divisions of image data between the primary image data and the secondary image data can be implemented. For example, instead of dividing point cloud data 306 based on visibility to viewpoint 700 as described above, the primary and secondary image data can be selected based on a predetermined depth threshold. That is, points located at a depth (from viewpoint 700) greater than the threshold can be included in one of the primary and secondary image data, while points located at a depth smaller than the threshold are included in the other of the primary and secondary image data. When this implementation is used, both primary and secondary image data can be stored in structures similar to that shown in
Various other data structures are also contemplated for storing the primary and secondary image data. For example, files 800 and 804 can be subdivided into a plurality of files or other package types, each file corresponding to a single face of cube 704. In further embodiments, individual files may be generated by generation device 104 for each face of cube 704, but each file can contain both colour and depth data rather than colour and depth data being separated into distinct files. In such embodiments, the above-mentioned index can also include data defining the relative position of the face-specific files.
Further variations to the generation process are contemplated. For example, generation device 104 can be configured to employ depth files such as files 804 and 904 as intermediate files, not sent to client device 108 but rather employed to generate an index file. More specifically, generation device 104 can be configured to perform a method 1200, depicted in
Having identified the above-mentioned portion, at block 1215 generation device 104 is configured to discard the identified depth values. The corresponding colour data for the identified points is retained, however. Thus, for certain points in files 800, 900 or 908, the corresponding depth values in files 804, 904 or 912, respectively, are discarded.
Returning to
Thus, through method 1200, generation device 104 replaces depth files (such as file 804) with an index of a subset of the depth values in the original depth files. The index can additionally identify individual points as well as geometric parameters encompassing a plurality of points. That is, each subregion 1404 can include a plurality of index lists, each list containing depth and position data for a different type of geometry (e.g. different point sizes including both small, or single-pixel points, and large, or multi-pixel points, background polygons such as rectangles, other polygons such as triangles, and the like). For example, each subregion 1404 of index 1400 can include a background plane corresponding to the equivalent portion of plane 1320 shown in
Returning to
In addition, generation device 104 can be configured to create additional versions of the primary and secondary image data having lower resolutions than the original versions. For example, generation device 104 can receive an indication from client device 108 of the location and direction of viewpoint 700, and transmit virtual reality multimedia data that either includes downsampled versions of, or omits entirely, the portions of the primary and secondary image data that are not currently visible from the viewpoint location and direction. For example, image data for one face of cube 704 may be transmitted at an original resolution, while the other faces may be transmitted at a lower resolution, or simply omitted. Combinations of the above are also contemplated.
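One possible way of choosing a resolution for each face of cube 704 is sketched below. The sketch assumes that each face is identified by the axis direction from the centre of the cube toward that face, and it uses the dot product between that direction and the viewing direction as a simple stand-in for a visibility test. The face labels, the three-level plan ("full", "reduced", "omitted") and the function name are hypothetical and are not drawn from the specification.

```python
from typing import Dict, Tuple

Vector = Tuple[float, float, float]

# Hypothetical labels: direction from the cube centre toward each face.
FACE_DIRECTIONS: Dict[str, Vector] = {
    "+x": (1.0, 0.0, 0.0), "-x": (-1.0, 0.0, 0.0),
    "+y": (0.0, 1.0, 0.0), "-y": (0.0, -1.0, 0.0),
    "+z": (0.0, 0.0, 1.0), "-z": (0.0, 0.0, -1.0),
}


def plan_face_resolutions(view_direction: Vector) -> Dict[str, str]:
    """Assign "full", "reduced" or "omitted" to each cube face.

    The face most directly ahead of the viewpoint keeps its original
    resolution; other faces that lie ahead are down sampled; faces that lie
    behind or exactly side-on are omitted.
    """
    scores = {
        name: sum(d * v for d, v in zip(direction, view_direction))
        for name, direction in FACE_DIRECTIONS.items()
    }
    best = max(scores, key=scores.get)
    plan = {}
    for name, score in scores.items():
        if name == best:
            plan[name] = "full"
        elif score > 0:
            plan[name] = "reduced"
        else:
            plan[name] = "omitted"
    return plan
```

With a viewing direction of (1.0, 0.2, 0.0), for instance, this sketch keeps the "+x" face at full resolution, reduces the "+y" face and omits the remaining four. A practical implementation would likely also account for the field of view of the viewpoint rather than the direction alone.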
At block 225, client device 108 is configured to receive the data transmitted by generation device 104 (or an intermediary, as noted earlier), and decode the prepared data, based on the standard according to which the data was encoded for transmission at block 220 (e.g. MPEG4). In other words, at block 225 client device 108 is configured to retrieve, from the data received from generation device 104, the primary and secondary image data described above, in the form of files 800 and 804, as well as files 900 and 904 or files 908 and 912 (or variants thereof). Alternatively, client device 108 can receive the above-mentioned index files as discussed in connection with
At block 230, client device 108 is configured to receive a viewpoint position and direction from virtual reality display 156. The position and direction received at block 230 need not match the position of viewpoint 700 discussed above. Viewpoint 700 was employed for projecting point cloud data 306 into two dimensions, and reprojecting the primary and secondary image data into three dimensions to recreate point cloud data 306. The position and direction received at block 230, on the other hand, correspond to the position and direction of the viewpoint within point cloud data 306 as detected by virtual reality display 156 under the command of an operator. The position and direction of the viewpoint may be detected by way of accelerometers, pupil detection cameras, or any other suitable sensors included in virtual reality display 156.
Upon receipt of the viewpoint position and direction, at block 235 client device 108 is configured to select and render at least a portion of the primary and secondary image data received at block 225, based on the viewpoint position and direction received at block 230. In general, the selection and rendering process includes selecting data from the primary and secondary image data at the CPU of client device 108, and issuing one or more draw calls to a GPU, for causing the GPU to regenerate point cloud data based on the selected image data and control virtual reality display 156. As will be discussed below, client device 108 is configured to implement various processing techniques to reduce the volume of point cloud data to be regenerated and processed to control virtual reality display 156 (i.e. to reduce the computational load on the GPU).
Client device 108 is configured to select a subset of the primary and secondary image data received at block 225, based on the viewpoint position and direction (including a definition of the field of view of the viewpoint, also referred to as the frustum of the viewpoint) received at block 230. For example, client device 108 can be configured to determine which face, or combination of faces of cube 704 are visible based on the viewpoint position. Generally, three or fewer faces will be visible from the viewpoint. Client device 108 can therefore be configured to omit from further processing any primary and secondary image data located on the faces that are not visible to the viewpoint. For example, if the viewpoint has the same location as shown in
In further embodiments, referring to
Following the selection of primary and secondary image data for rendering, client device 108 is configured to transmit the selected colour and depth data for rendering. For example, the CPU of client device 108 can be configured to generate a plurality of draw calls and transmit the draw calls to the GPU of client device 108, or to a GPU or other processor integrated with virtual reality display 156. In response to receiving such data, the GPU (or any other suitable processor connected to virtual reality display 156) is configured to regenerate point cloud data from the selected colour and depth data, and present the regenerated point cloud data at virtual reality display 156.
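The regeneration of point cloud data from colour and depth data can be illustrated, in simplified form, by the sketch below. It assumes that the viewpoint used for projection sits at the origin of the three-dimensional frame of reference, that each face of the cube is a square image, that each depth value is the straight-line distance from the viewpoint to the point, and that positions without data hold a null value (represented here as None). The per-face axis conventions and the function names are hypothetical, and in practice this work would typically be carried out on the GPU rather than in Python.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, float, float]

# Hypothetical convention: (forward, right, up) axes for each cube face.
FACE_AXES: Dict[str, Tuple[Vector, Vector, Vector]] = {
    "+x": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "-x": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "+y": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "-y": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
    "+z": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "-z": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
}


def reproject_pixel(face: str, u: int, v: int, depth: float, size: int) -> Vector:
    """Return the 3D position of the point stored at pixel (u, v) of a face."""
    forward, right, up = FACE_AXES[face]
    # Map the pixel centre to the range [-1, 1] across the face.
    a = 2.0 * (u + 0.5) / size - 1.0
    b = 2.0 * (v + 0.5) / size - 1.0
    ray = tuple(f + a * r + b * p for f, r, p in zip(forward, right, up))
    norm = math.sqrt(sum(c * c for c in ray))
    # Walk the stored distance along the unit ray from the viewpoint.
    return tuple(depth * c / norm for c in ray)


def regenerate_points(face: str,
                      depth_rows: List[List[Optional[float]]],
                      colour_rows: List[list]) -> List[tuple]:
    """Rebuild (position, colour) pairs for every non-null pixel of a square face."""
    size = len(depth_rows)
    points = []
    for v, row in enumerate(depth_rows):
        for u, depth in enumerate(row):
            if depth is None:  # null value: no point was projected here
                continue
            points.append((reproject_pixel(face, u, v, depth, size), colour_rows[v][u]))
    return points
```

The same routine could be applied to both the primary and the secondary image data, since each carries colour and depth for its own subset of points in the same two-dimensional frame of reference.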
The data transmitted to the GPU or any other suitable processing hardware at block 1225 can include one or more indices of points or geometries, according to the format of data received from generation device 104. For example, when indices such as those described in connection with
It is contemplated that blocks 230 and 235 are generally performed twice in parallel. Virtual reality display 156 generally includes two distinct displays (corresponding to the eyes of the operator), and thus block 230 includes receiving two viewpoint positions and block 235 includes selecting and rendering two distinct sets of image data. Having rendered image data at block 235, client device 108 is configured to return to block 230 to receive updated viewpoint positions. In some embodiments, the viewpoint positions can be transmitted to generation device 104, which can perform at least some of the selection activities referred to above, and send the resulting image data to client device 108.
Variations to the above systems and processes are contemplated. For example, rather than capturing, processing and rendering a scene (e.g. capture volume 304) in order to simulate movement of the operator of virtual reality display 156 within the scene, system 100 can also be configured to capture, process and render an object in order to simulate movement of the operator of virtual reality display 156 around the object. Capturing the object to generate point cloud data can be accomplished substantially as described above; however, central nodes (e.g. node “x” in
In further variations, rendering performance (e.g. at block 235) may be improved by reducing the resolution of the rendered image data based on the proximity of the image data to the centre of the viewpoint frustum. For example, image data determined by client device 108 to be near the outer edge of the viewpoint frustum can be reduced in resolution. In an example embodiment, the reduction in resolution can be achieved by replacing a number of small points with a smaller number of large points, prior to transmission of image data and geometric parameters to the GPU for rendering. In implementations employing the subdivisions shown in
It is also contemplated that the source of the image data described herein can be supplemented or replaced with light field capture data (e.g. obtained from one or more light field cameras in capture apparatus 134). Light field data represents a collection of light rays passing through a volume. Such data can indicate not only position and colour data for a plurality of points, but also properties such as the incident direction of light on the points and the appearance of each point from a plurality of different directions. In some embodiments, the light field data can omit depth data. In such embodiments, however, the depth data can be determined from the light field data.
The scope of the claims should not be limited by the embodiments set forth in the above examples, but should be given the broadest interpretation consistent with the description as a whole.
Claims
1. A method of generating virtual reality multimedia data, comprising:
- obtaining point cloud data at a processor of a generation computing device, the point cloud data including colour and three-dimensional position data for each of a plurality of points corresponding to locations in a capture volume;
- generating, at the processor, primary image data containing (i) a first projection of a first subset of the points into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from the corresponding position data;
- generating, at the processor, secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data; and
- storing the primary image data and the secondary image data in a memory connected to the processor.
2. The method of claim 1, wherein obtaining the point cloud data includes retrieving the point cloud data from a memory.
3. The method of claim 1, wherein obtaining the point cloud data includes:
- receiving raw point cloud data from a capture apparatus; and
- generating the point cloud data by registering the raw point cloud data to a common three-dimensional frame of reference.
4. The method of claim 1, the primary image data including:
- a first image dimensioned according to the two-dimensional frame of reference and containing colour data for each point of the first subset; and
- a second image dimensioned according to the two-dimensional frame of reference and containing depth data for each point of the first subset.
5. The method of claim 4, wherein the first image and the second image are cube map projections.
6. The method of claim 4, wherein the first image and the second image are YUV images having luminance and chrominance channels, and wherein the depth data includes a depth value stored in the luminance channel.
7. The method of claim 1, the secondary image data including:
- a first image dimensioned according to the two-dimensional frame of reference and containing colour data for each point of the second subset; and
- a second image dimensioned according to the two-dimensional frame of reference and containing depth data for each point of the second subset.
8. The method of claim 7, wherein the first image and the second image are cube map projections.
9. The method of claim 7, further comprising:
- detecting that a plurality of colliding ones of the second subset of points have a common position in the two-dimensional frame of reference;
- storing colour data and depth data for one of the colliding points in the first image and the second image according to the common position;
- storing colour data and depth data for another of the colliding points in the first image and the second image at a two-dimensional offset from the common position.
10. The method of claim 9, further comprising:
- storing the two-dimensional offset in the second image.
11. The method of claim 1, wherein generating the primary image data includes:
- setting a viewpoint position corresponding to a location in the capture volume; and
- selecting the first subset of the points by: for each of a plurality of paths extending from the viewpoint position, selecting the first point of the point cloud data encountered by the path.
12. The method of claim 1, wherein generating the primary image data includes:
- setting a viewpoint position corresponding to a location in the capture volume; and
- selecting the first subset of the points by: determining a distance from the viewpoint to each of the plurality of points; comparing the distance to a threshold; and selecting the points having a smaller distance than the threshold from the viewpoint.
13. The method of claim 1, further comprising:
- transmitting the primary image data and the secondary image data for receipt by a client device.
14. A generation computing device, comprising:
- a memory;
- a network interface; and
- a processor interconnected with the memory and the network interface, the processor configured to: obtain point cloud data, the point cloud data including colour and three-dimensional position data for each of a plurality of points corresponding to locations in a capture volume; generate primary image data containing (i) a first projection of a first subset of the points into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from the corresponding position data; generate secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data; and store the primary image data and the secondary image data in the memory.
15. A method of rendering virtual reality multimedia data, comprising:
- obtaining primary image data containing (i) a first projection of a first subset of points in a three-dimensional point cloud into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from corresponding position data of the points;
- obtaining secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data;
- receiving a viewpoint position from a virtual reality display;
- selecting at least a portion of the primary image data and the secondary image data based on the viewpoint position; and
- rendering the selected primary and secondary image data on the virtual reality display.
16. A client computing device, comprising:
- a memory;
- a network interface; and
- a processor interconnected with the memory and the network interface, the processor configured to: obtain primary image data containing (i) a first projection of a first subset of points in a three-dimensional point cloud into a two-dimensional frame of reference, and (ii) for each point of the first subset, depth data derived from corresponding position data of the points; obtain secondary image data containing (i) a second projection of a second subset of the points into the two-dimensional frame of reference, the second projection overlapping with at least part of the first projection in the two-dimensional frame of reference, and (ii) for each point of the second subset, depth data derived from the corresponding position data; receive a viewpoint position from a virtual reality display; select at least a portion of the primary image data and the secondary image data based on the viewpoint position; and render the selected primary and secondary image data on the virtual reality display.
Type: Application
Filed: Nov 19, 2015
Publication Date: May 3, 2018
Inventors: Erik PETERSON (Toronto), Aria SHAHINGOHAR (Toronto)
Application Number: 15/573,682