VIDEO GENERATION USING CONVICT HULLS
Video of a scene is generated and presented to a user. A stream of mesh models of the scene is generated from one or more streams of sensor data that represent the scene. Each of the mesh models is sliced using a series of planes that are parallel to each other, where each of the planes in the series defines one or more contours each of which defines a specific region on the plane where the mesh model intersects the plane. A texture map is generated for each of the mesh models which defines texture data corresponding to each of the contours that is defined by the series of planes. Images of the scene are rendered from scene proxies that include a stream of mathematical equations describing the contours, and a stream of the texture maps. The images are displayed.
This application claims the benefit of and priority to provisional U.S. patent application Ser. No. 61/653,983 filed May 31, 2012.
BACKGROUND

A given video generally includes one or more scenes, where each scene in the video can be either relatively static (e.g., the objects in the scene do not substantially change or move over time) or dynamic (e.g., the objects in the scene substantially change and/or move over time). As is appreciated in the art of computer graphics, polygonal modeling is commonly used to represent three-dimensional objects in a scene by approximating the surface of each object using polygons. A polygonal model of a given scene includes a collection of vertices. Two neighboring vertices that are connected by a straight line form an edge in the polygonal model. Three neighboring and non-co-linear vertices that are interconnected by three edges form a triangle in the polygonal model. Four neighboring and non-co-linear vertices that are interconnected by four edges form a quadrilateral in the polygonal model. Triangles and quadrilaterals are the most common types of polygons used in polygonal modeling, although other types of polygons may also be used depending on the capabilities of the renderer that is being used to render the polygonal model. A group of polygons that are interconnected by shared vertices is referred to as a mesh and as such, a polygonal model of a scene is also known as a mesh model. Each of the polygons that makes up a mesh is referred to as a face in the polygonal/mesh model. Accordingly, a polygonal/mesh model of a scene includes a collection of vertices, edges and polygonal (i.e., polygon-based) faces that represents/approximates the shape of each object in the scene.
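By way of example but not limitation, the following Python sketch shows one minimal way such a polygonal/mesh model might be represented in code, as a vertex array plus a list of triangular faces; the class name, fields, and the use of NumPy are illustrative assumptions rather than anything prescribed herein.

```python
import numpy as np

class TriangleMesh:
    """Minimal polygonal (mesh) model: vertices plus triangular faces."""

    def __init__(self, vertices, faces):
        # vertices: (N, 3) array of 3D points; faces: (M, 3) vertex indices.
        self.vertices = np.asarray(vertices, dtype=float)
        self.faces = np.asarray(faces, dtype=int)

    def edges(self):
        """Return the set of undirected edges shared by the faces."""
        e = set()
        for a, b, c in self.faces:
            e.update({tuple(sorted(p)) for p in ((a, b), (b, c), (c, a))})
        return e

# A unit tetrahedron: 4 vertices, 4 triangular faces.
mesh = TriangleMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    faces=[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)],
)
print(len(mesh.edges()))  # 6 edges
```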
SUMMARY

This Summary is provided to introduce a selection of concepts, in a simplified form, that are further described hereafter in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Video generation technique embodiments described herein are generally applicable to generating a video of a scene and presenting it to a user. In an exemplary embodiment of this generation, one or more streams of sensor data that represent the scene are received. Scene proxies are then generated from the streams of sensor data. This scene proxies generation includes the following actions. A stream of mesh models of the scene is generated from the streams of sensor data. Then, for each of the mesh models, the following actions take place. The mesh model is sliced using a series of planes that are parallel to each other, where each of the planes in the series defines one or more contours each of which defines a specific region on the plane where the mesh model intersects the plane. A texture map is then generated for the mesh model which defines texture data corresponding to each of the contours that is defined by the series of planes.
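As a rough illustration of the slicing action described above (and not the patent's actual implementation), the following Python sketch intersects each triangular face of a mesh model with one horizontal plane z = z0 and collects the resulting line segments; chaining those segments into closed contours, and repeating the procedure for every plane in the series, is omitted. The function names and the assumption of horizontal, z-aligned planes are illustrative.

```python
import numpy as np

def slice_triangle(tri, z0):
    """Intersect one triangle (3x3 array of vertices) with the plane z = z0.
    Returns a 2-point segment lying on the plane, or None when there is no
    proper crossing (edge cases where a vertex lies exactly on the plane are
    ignored in this sketch)."""
    pts = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        da, db = a[2] - z0, b[2] - z0
        if da * db < 0:                      # edge crosses the plane
            t = da / (da - db)               # interpolation parameter
            pts.append(a + t * (b - a))
    return np.array(pts) if len(pts) == 2 else None

def slice_mesh(vertices, faces, z0):
    """Collect all plane-intersection segments for one slicing plane."""
    segs = []
    for f in faces:
        seg = slice_triangle(vertices[list(f)], z0)
        if seg is not None:
            segs.append(seg)
    return segs  # chaining segs into closed contours is a separate step

# Example: slice a tetrahedron halfway up.
V = np.array([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], dtype=float)
F = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(len(slice_mesh(V, F, 0.5)))  # 3 segments forming one triangular contour
```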
In an exemplary embodiment of the just-mentioned presentation, the scene proxies are received. The scene proxies include a stream of mathematical equations describing contours that are defined by a series of planes that are parallel to each other, and a stream of texture maps defining texture data corresponding to each of the contours that is defined by the series of planes. Images of the scene are then rendered from the scene proxies and displayed. This image rendering includes the following actions. The series of planes is constructed using data specifying the spatial orientation and geometry of the series of planes. The contours that are defined by the series of planes are then constructed using the stream of mathematical equations. A series of point locations is then constructed along each of the contours that is defined by the series of planes, where this construction is performed in a prescribed order across each of the planes in the series of planes, and this construction is also performed starting from a prescribed zero position on each of these contours. The point locations that are defined by the series of planes are then tessellated, where this tessellation generates a stream of polygonal models, and each polygonal model includes a collection of polygonal faces that are formed by neighboring point locations on corresponding contours on neighboring planes in the series of planes. The stream of texture maps is then sampled to identify the texture data that corresponds to each of the polygonal faces in the stream of polygonal models. This identified texture data is then used to add texture to each of the polygonal faces in the stream of polygonal models.
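The tessellation action described above can be pictured, for two corresponding contours on neighboring planes that carry the same number of point locations constructed in the same order from the same zero position, with the following sketch; the hypothetical helper forms two triangular faces per quadrilateral of neighboring point locations and is a simplified stand-in, not the embodiments' renderer.

```python
import numpy as np

def tessellate_between(contour_a, contour_b):
    """Form triangular faces between two corresponding closed contours that
    carry the same number of point locations, constructed in the same order
    from the same zero position.  contour_a/contour_b: (N, 3) point arrays."""
    n = len(contour_a)
    assert len(contour_b) == n
    verts = np.vstack([contour_a, contour_b])
    faces = []
    for i in range(n):
        j = (i + 1) % n              # wrap around the closed contour
        a0, a1 = i, j                # indices on the lower contour
        b0, b1 = n + i, n + j        # indices on the upper contour
        faces.append((a0, a1, b1))   # split each quad into two triangles
        faces.append((a0, b1, b0))
    return verts, faces

# Two square contours on neighboring planes (z = 0 and z = 1).
sq = np.array([(0, 0), (1, 0), (1, 1), (0, 1)], dtype=float)
lower = np.c_[sq, np.zeros(4)]
upper = np.c_[sq, np.ones(4)]
v, f = tessellate_between(lower, upper)
print(len(f))  # 8 triangular faces around the band between the two planes
```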
The specific features, aspects, and advantages of the video generation technique embodiments described herein will become better understood with regard to the following description, appended claims, and accompanying drawings.
In the following description of video generation technique embodiments reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the video generation technique can be practiced. It is understood that other embodiments can be utilized and structural changes can be made without departing from the scope of the video generation technique embodiments.
It is also noted that for the sake of clarity specific terminology will be resorted to in describing the video generation technique embodiments described herein and it is not intended for these embodiments to be limited to the specific terms so chosen. Furthermore, it is to be understood that each specific term includes all its technical equivalents that operate in a broadly similar manner to achieve a similar purpose. Reference herein to “one embodiment”, or “another embodiment”, or an “exemplary embodiment”, or an “alternate embodiment”, or “one implementation”, or “another implementation”, or an “exemplary implementation”, or an “alternate implementation” means that a particular feature, a particular structure, or particular characteristics described in connection with the embodiment or implementation can be included in at least one embodiment of the video generation technique. The appearances of the phrases “in one embodiment”, “in another embodiment”, “in an exemplary embodiment”, “in an alternate embodiment”, “in one implementation”, “in another implementation”, “in an exemplary implementation”, and “in an alternate implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments/implementations mutually exclusive of other embodiments/implementations. Yet furthermore, the order of process flow representing one or more embodiments or implementations of the video generation technique does not inherently indicate any particular order nor imply any limitations of the video generation technique.
As is known in the arts of human anatomy and medical research, the Visible Human Project was conceived in the late 1980s and run by the U.S. National Library of Medicine. The goal of the Project was to create a detailed human anatomy data set using cross-sectional photographs of the human body in order to facilitate anatomy visualization applications. A convicted murderer named Joseph Paul Jernigan was executed in 1993 and his cadaver was used to provide male data for the Project. More particularly, Jernigan's cadaver was encased and frozen in a gelatin and water mixture in order to stabilize the cadaver for cutting thereof. Jernigan's cadaver was then segmented (i.e., “cut”) along its axial plane (also known as its transverse plane) at one millimeter intervals from the top of the cadaver's scalp to the soles of its feet, resulting in 1,871 “slices”. Each of these slices was photographed and digitized at a high resolution. The term “convict hull” is accordingly used herein to refer to a given plane that is used to “slice” a given mesh model of a given scene.
The term “sensor” is used herein to refer to any one of a variety of scene-sensing devices which can be used to generate a stream of sensor data that represents a given scene. Generally speaking and as will be described in more detail hereafter, the video generation technique embodiments described herein employ one or more sensors which can be configured in various arrangements to capture a scene, thus allowing one or more streams of sensor data to be generated each of which represents the scene from a different geometric perspective. Each of the sensors can be any type of video capture device (e.g., any type of video camera), or any type of audio capture device (such as a microphone, or the like), or any combination thereof. Each of the sensors can also be either static (i.e., the sensor has a fixed spatial location and a fixed rotational orientation which do not change over time), or moving (i.e., the spatial location and/or rotational orientation of the sensor change over time). The video generation technique embodiments described herein can employ a combination of different types of sensors to capture a given scene.
1.0 Video Generation Using Convict Hulls

The video generation technique embodiments described herein generally involve using convict hulls to generate a video of a given scene and then present the video to one or more end users. The video generation technique embodiments support the generation, storage, distribution, and end user presentation of any type of video. By way of example but not limitation, one embodiment of the video generation technique supports various types of traditional, single viewpoint video in which the viewpoint of the scene is chosen by the director when the video is recorded/captured and this viewpoint cannot be controlled or changed by an end user while they are viewing the video. In other words, in a single viewpoint video the viewpoint of the scene is fixed and cannot be modified when the video is being rendered and displayed to an end user. Another embodiment of the video generation technique supports various types of free viewpoint video in which the viewpoint of the scene can be interactively controlled and changed by an end user at will while they are viewing the video. In other words, in a free viewpoint video an end user can interactively generate synthetic (i.e., virtual) viewpoints of the scene on-the-fly when the video is being rendered and displayed. Exemplary types of single viewpoint and free viewpoint video that are supported by the video generation technique embodiments are described in more detail hereafter.
The video generation technique embodiments described herein are advantageous for various reasons including, but not limited to, the following. Generally speaking and as will be appreciated from the more detailed description that follows, the video generation technique embodiments serve to minimize the size of (i.e., minimize the amount of data in) the video that is generated, stored and distributed. Based on this video size/data minimization, it will also be appreciated that the video generation technique embodiments minimize the cost and maximize the performance associated with storing and transmitting the video in a client-server framework where the video is generated and stored on a server computing device, and then transmitted from the server over a data communication network to one or more client computing devices upon which the video is rendered and then viewed and navigated by the one or more end users. Furthermore, the video generation technique embodiments maximize the photo-realism of the video that is generated when it is rendered and then viewed and navigated by the end users. As such, the video generation technique embodiments provide the end users with photo-realistic video that is free of discernible artifacts, thus creating a feeling of immersion for the end users and enhancing their viewing experience.
Additionally, the video generation technique embodiments described herein eliminate having to constrain the complexity or composition of the scene that is being captured (e.g., neither the environment(s) in the scene, nor the types of objects in the scene, nor the number of people in the scene, among other things, has to be constrained). Accordingly, the video generation technique embodiments are operational with any type of scene, including both relatively static and dynamic scenes. The video generation technique embodiments also provide a flexible, robust and commercially viable method for generating a video, and then presenting it to one or more end users, that meets the needs of today's various creative video producers and editors. By way of example but not limitation and as will be appreciated from the more detailed description that follows, the video generation technique embodiments are applicable to various types of video-based media applications such as consumer entertainment (e.g., movies, television shows, and the like) and video-conferencing/telepresence, among others.
1.1 Video Processing Pipeline
This section provides a more detailed description of the generation stage of the video processing pipeline. As described heretofore, the video generation technique embodiments described herein generally employ one or more sensors which can be configured in various arrangements to capture a scene. These one or more sensors generate one or more streams of sensor data each of which represents the scene from a different geometric perspective.
It is noted that the contours that are defined by a given plane can be analyzed in any order across the plane. By way of example but not limitation, in an exemplary embodiment of the video generation technique described herein the contours that are defined by each of the planes are analyzed in a left-to-right order across the plane. It is also noted that the just-described zero position on the contour can be any position thereon as long as it is the same for each of the contours being analyzed. In an exemplary embodiment of the video generation technique this zero position is the left-most point on the contour. It is also noted that a trade-off exists in the selection of the distance which separates successive point locations along the contours that are defined by the series of planes. More particularly, using a smaller distance between successive point locations along the contours creates a finer approximation of each of the mesh models, but also increases the amount of processing that is associated with the aforementioned processing sub-stage of the video processing pipeline.
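A minimal sketch of the point-location construction discussed above, assuming two-dimensional polygonal contours, a left-most-point zero position, and a prescribed arc-length spacing (all illustrative choices), is given below; halving the spacing yields the finer approximation, and the larger processing load, described in the trade-off.

```python
import numpy as np

def point_locations(contour, spacing):
    """Walk a closed polygonal contour (N x 2 array lying on one slicing plane)
    and return point locations separated by roughly `spacing`, starting from
    the left-most point (the zero position assumed in this sketch)."""
    start = int(np.argmin(contour[:, 0]))              # left-most point as zero
    loop = np.vstack([contour[start:], contour[:start], contour[start:start + 1]])
    seg = np.diff(loop, axis=0)
    seglen = np.hypot(seg[:, 0], seg[:, 1])
    dist = np.concatenate([[0.0], np.cumsum(seglen)])  # arc length at each vertex
    targets = np.arange(0.0, dist[-1], spacing)        # where to place samples
    x = np.interp(targets, dist, loop[:, 0])
    y = np.interp(targets, dist, loop[:, 1])
    return np.column_stack([x, y])

square = np.array([(0, 0), (2, 0), (2, 2), (0, 2)], dtype=float)
pts = point_locations(square, spacing=0.5)
print(len(pts))   # 16 point locations along the 8-unit perimeter
```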
The series of planes that is used to slice each of the mesh models has a prescribed spatial orientation and a prescribed geometry. It will be appreciated that the geometry of the series of planes can be specified using various types of data. Examples of such types of data include, but are not limited to, data specifying the number of planes in the series of planes, data specifying a prescribed spacing that is used between successive planes in the series of planes, and data specifying the shape and dimensions of each of the planes in the series of planes. Both the spatial orientation and the geometry of the series of planes are arbitrary and as such, various spatial orientations and geometries can be used for the series of planes. In one embodiment of the video generation technique the series of planes has a horizontal spatial orientation. In another embodiment of the video generation technique the series of planes has a vertical spatial orientation. It will be appreciated that any number of planes can be used, any spacing between successive planes can be used, any plane shape/dimensions can be used, and any distance which separates successive point locations along the contours can be used. In an exemplary embodiment of the video generation technique described herein the spacing that is used between successive planes in the series of planes is selected such that the series of planes intersects a maximum number of vertices in each of the mesh models. This is advantageous in a situation where each of the mesh models of the scene includes a mesh texture map that defines texture data which has already been computed for the vertices of the mesh model.
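By way of illustration, the geometry data described above might be turned into a concrete series of slicing planes as in the following sketch; the axis-aligned plane normals and the vertex-snapping heuristic for choosing the spacing are assumptions, not the selection rule used by the embodiments.

```python
import numpy as np

def make_plane_series(n_planes, spacing, origin=0.0, axis="z"):
    """Describe a series of parallel slicing planes by a shared normal axis and
    the offsets at which each plane sits.  "Horizontal" planes are taken here
    to be those normal to the z axis; the axis choice is an assumption."""
    normal = {"x": (1, 0, 0), "y": (0, 1, 0), "z": (0, 0, 1)}[axis]
    offsets = origin + spacing * np.arange(n_planes)
    return np.array(normal, dtype=float), offsets

# Example heuristic: snap the spacing to the smallest gap between distinct
# vertex heights so the planes pass through many mesh vertices.
vertex_z = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
spacing = np.min(np.diff(np.unique(vertex_z)))
normal, offsets = make_plane_series(n_planes=5, spacing=spacing, origin=vertex_z.min())
print(offsets)  # [0.   0.25 0.5  0.75 1.  ]
```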
In one embodiment of the video generation technique described herein the spatial orientation and geometry of the series of planes, the order across each of the planes in the series of planes by which each of the contours that is defined by the plane is analyzed (hereafter simply referred to as the contour analysis order), the number of texels in each of the scanlines, the zero position on each of the contours (hereafter simply referred to as the contour zero position), and the number of times each of the scanlines in the texture map is replicated are pre-determined and thus are known to each of the end user computing devices in advance of the scene proxies being distributed thereto. In another embodiment of the video generation technique one or more of the spatial orientation of the series of planes, or the geometry of the series of planes, or the contour analysis order, or the number of texels in each of the scanlines, or the contour zero position, or the number of times each of the scanlines in the texture map is replicated, may not be pre-determined and thus may not be known to each of the end user computing devices in advance of the scene proxies being distributed thereto.
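The parameters enumerated above can be pictured as a small proxy header that is either agreed upon in advance by the end user computing devices or is stored and distributed alongside the scene proxies; the field names and values below are purely illustrative.

```python
# Illustrative (non-normative) proxy header: any parameter that is not
# pre-determined must be stored/transmitted with the scene proxies.
proxy_header = {
    "plane_orientation": "horizontal",        # spatial orientation of the plane series
    "plane_count": 128,                       # geometry of the plane series ...
    "plane_spacing": 0.01,
    "plane_shape": ("rectangle", 2.0, 2.0),   # shape and dimensions of each plane
    "contour_analysis_order": "left-to-right",
    "texels_per_scanline": 1024,
    "contour_zero_position": "left-most point",
    "scanline_replication": 1,                # times each scanline is replicated
}
```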
In one embodiment of the video generation technique described herein the mathematical equation describing a given contour specifies a polygon approximation of the contour. In another embodiment of the video generation technique the mathematical equation describing a given contour specifies a non-uniform rational basis spline (NURBS) curve approximation of the contour.
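For the two contour descriptions just mentioned, a polygon approximation is simply the ordered list of point locations, while a smooth approximation can be fitted as a spline curve. The following sketch uses SciPy's periodic B-spline fitting as a stand-in for a NURBS curve (a B-spline being the non-rational special case of NURBS); it is an assumption-laden illustration, not the equation form used by the embodiments.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Ordered point locations along one contour (a rough circle).
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = np.column_stack([np.cos(t), np.sin(t)])

# Polygon approximation: the ordered vertex list itself serves as the "equation".
polygon = points

# Smooth approximation: fit a periodic (closed) B-spline through the points.
closed = np.vstack([points, points[:1]])   # close the loop for periodic fitting
tck, _ = splprep([closed[:, 0], closed[:, 1]], per=True, s=0.0)
dense = np.column_stack(splev(np.linspace(0.0, 1.0, 200), tck))
print(dense.shape)  # (200, 2) points evaluated on the fitted curve
```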
This section provides a more detailed description of the end user presentation stage of the video processing pipeline.
The video generation technique embodiments described herein assume that each of the mathematical equations specifies how the particular contour it describes is positioned on the particular plane that defines this contour (e.g., by specifying one or more control points for the contour, among other ways). An alternate embodiment of the video generation technique is also possible where each of the mathematical equations does not specify how the particular contour it describes is positioned on the particular plane that defines this contour, in which case this positioning information will be separately specified, stored and distributed to the end user.
In an exemplary embodiment of the video generation technique described herein, the just-described adaption operates in the following manner. In the case where the number of texels in a given scanline that are assigned to a particular contour that is defined by the plane corresponding to the scanline is greater than the average of the number of texels in the scanline that are assigned to the particular contour and the number of texels in the next scanline in the series of scanlines that are assigned to another contour that corresponds to the particular contour, the adaption of the number of texels in the scanline that are assigned to the particular contour involves using conventional methods to contract a series of texels in the scanline. In the case where the number of texels in a given scanline that are assigned to a particular contour that is defined by the plane corresponding to the scanline is less than the average of the number of texels in the scanline that are assigned to the particular contour and the number of texels in the next scanline in the series of scanlines that are assigned to another contour that corresponds to the particular contour, the adaption of the number of texels in the scanline that are assigned to the particular contour involves using conventional methods to expand a series of texels in the scanline.
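A minimal sketch of this adaption, assuming linear interpolation as the "conventional method" used to contract or expand a series of texels, is shown below; the function names are hypothetical.

```python
import numpy as np

def resample_texels(texels, new_count):
    """Contract or expand a 1-D series of texel values to `new_count` samples
    using linear interpolation (standing in for the conventional contraction
    and expansion methods mentioned in the text)."""
    texels = np.asarray(texels, dtype=float)
    old = np.linspace(0.0, 1.0, len(texels))
    new = np.linspace(0.0, 1.0, new_count)
    return np.interp(new, old, texels)

def adapt_pair(this_scanline_texels, next_scanline_texels):
    """Adapt the texel count assigned to a contour on this scanline to the
    average of its own count and the count assigned to the corresponding
    contour on the next scanline."""
    target = round((len(this_scanline_texels) + len(next_scanline_texels)) / 2)
    return resample_texels(this_scanline_texels, target)

a = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # 6 texels assigned on this scanline
b = [0.0, 0.5, 1.0, 1.5]             # 4 texels assigned on the next scanline
print(len(adapt_pair(a, b)))         # 5 texels after adaption
```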
In addition to performing the just-described adaption of the number of scanline texels that are assigned to each one of the contours that is defined by the plane corresponding to each of the scanlines in each of the texture maps, any sampling errors that may occur during the sampling of the stream of texture maps can further be minimized by optionally performing an action of, for each of the scanlines in each of the texture maps, inserting one or more “gutter texels” between neighboring texel series in the scanline, and optionally also inserting one or more “gutter scanlines” between neighboring scanlines in each of the texture maps. It will be appreciated that implementing such gutter texels and gutter scanlines serves to prevent bleed-over when the stream of texture maps is sampled using certain sampling methods such as the conventional bilinear interpolation method.
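The gutter-texel idea can be sketched as follows: each contour's texel series is padded with replicated edge texels before the series are packed into a scanline, so that bilinear sampling near a series boundary does not bleed into the neighboring series. Gutter scanlines can be produced analogously by replicating whole scanlines; the helper below is illustrative only.

```python
import numpy as np

def with_gutters(texel_series_list, gutter=1):
    """Concatenate several texel series into one scanline, inserting replicated
    edge texels as 'gutter texels' between neighboring series so that bilinear
    sampling near a boundary does not bleed across it."""
    padded = []
    for series in texel_series_list:
        series = np.asarray(series, dtype=float)
        padded.append(np.concatenate([
            np.repeat(series[:1], gutter),   # leading gutter texels
            series,
            np.repeat(series[-1:], gutter),  # trailing gutter texels
        ]))
    return np.concatenate(padded)

scanline = with_gutters([[0.1, 0.2, 0.3], [0.8, 0.9]], gutter=1)
print(scanline)  # [0.1 0.1 0.2 0.3 0.3 0.8 0.8 0.9 0.9]
```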
1.4 Supported Video Types

This section provides a more detailed description of exemplary types of single viewpoint video and exemplary types of free viewpoint video that are supported by the video generation technique embodiments described herein.
While the video generation technique has been described by specific reference to embodiments thereof, it is understood that variations and modifications thereof can be made without departing from the true spirit and scope of the video generation technique. By way of example but not limitation, rather than supporting the generation, storage, distribution, and end user presentation of video, alternate embodiments of the video generation technique described herein are possible which support any other digital image application where a scene is represented by a mesh model and a corresponding mesh texture map which defines texture data for the mesh model.
It is also noted that any or all of the aforementioned embodiments can be used in any combination desired to form additional hybrid embodiments. Although the video generation technique embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described heretofore. Rather, the specific features and acts described heretofore are disclosed as example forms of implementing the claims.
3.0 Computing Environment

The video generation technique embodiments described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations.
To allow a device to implement the video generation technique embodiments described herein, the device should have sufficient computational capability and system memory to enable basic computational operations.
Storage of information such as computer-readable or computer-executable instructions, data structures, program modules, and the like, can also be accomplished by using any of a variety of the aforementioned communication media to encode one or more modulated data signals or carrier waves, or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. Note that the terms “modulated data signal” or “carrier wave” generally refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
Furthermore, software, programs, and/or computer program products embodying some or all of the various embodiments of the video generation technique described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer- or machine-readable media or storage devices and communication media in the form of computer-executable instructions or other data structures.
Finally, the video generation technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types. The video generation technique embodiments may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Additionally, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
Claims
1. A computer-implemented process for generating a video of a scene, comprising:
- using a computing device to perform the following process actions:
- receiving one or more streams of sensor data that represent the scene; and
- generating scene proxies from said streams of sensor data, said generation comprising the actions of: generating a stream of mesh models of the scene from said streams of sensor data, and for each of the mesh models, slicing the mesh model using a series of planes that are parallel to each other, each of the planes in the series defining one or more contours each of which defines a specific region on the plane where the mesh model intersects the plane, and generating a texture map for the mesh model which defines texture data corresponding to each of the contours that is defined by the series of planes.
2. The process of claim 1, wherein the texture map for the mesh model comprises a series of scanlines each of which corresponds to a different one of the planes in the series of planes, each of the scanlines comprises a series of texels, and the process action of generating a texture map for the mesh model which defines texture data corresponding to each of the contours that is defined by the series of planes comprises the actions of:
- for each of the planes in the series of planes, analyzing each of the contours that is defined by the plane in a prescribed order across the plane to identify a series of point locations along the contour, said analysis being performed starting from a prescribed zero position on the contour, said zero position being the same for each of the contours that is defined by the plane, for each of the contours that is defined by the plane, using the series of point locations to determine a mathematical equation describing the contour, assigning the texels in the scanline corresponding to the plane to the contours that are defined by the plane, said texel assignment being performed in the prescribed order across the plane, and entering information specifying said texel assignment into the texture map for the mesh model; and
- using the one or more streams of sensor data that represent the scene to compute texture data for each of the texels that is in the texture map for the mesh model; and
- entering said computed texture data into the texture map for the mesh model.
3. The process of claim 2, wherein the process action of using the one or more streams of sensor data that represent the scene to compute texture data for each of the texels that is in the texture map for the mesh model comprises an action of, for each of said texels, using a projective texture mapping method to sample each of said streams of sensor data and combine texture information from each of said samples to generate texture data for the texel.
4. The process of claim 2, wherein the process action of assigning the texels in the scanline corresponding to the plane to the contours that are defined by the plane comprises the actions of:
- for each of the contours that is defined by the plane, calculating the length of the contour, calculating the normalized length of the contour by dividing the length of the contour by the sum of the lengths of all of the contours that are defined by the plane, calculating the number of texels in said scanline that are to be assigned to the contour by multiplying the normalized length of the contour and the total number of texels that is in said scanline, and assigning said calculated number of texels to the contour.
5. The process of claim 1, wherein the texture data comprises one or more of: color data; or specular highlight data; or transparency data; or reflection data; or shadowing data.
6. The process of claim 1, wherein each of the mesh models comprises a collection of vertices, a prescribed spacing is used between successive planes in the series of planes, and said spacing is selected such that the series of planes intersects a maximum number of vertices in each of the mesh models.
7. The process of claim 1, wherein either,
- the one or more streams of sensor data comprise a single stream of sensor data which represents the scene from a single geometric perspective, and the video being generated is a single viewpoint video, or
- the one or more streams of sensor data comprise a plurality of streams of sensor data each of which represents the scene from a different geometric perspective, and the video being generated is a free viewpoint video.
8. The process of claim 1, further comprising an action of storing the scene proxies, said storing comprising the actions of:
- for each of the mesh models, storing a mathematical equation describing each of the contours that is defined by the series of planes, storing data specifying which contours on neighboring planes in the series of planes correspond to each other, and storing the texture map for the mesh model.
9. The process of claim 8, wherein the mathematical equation describing a given contour specifies either a polygon approximation of the contour, or a non-uniform rational basis spline curve approximation of the contour.
10. The process of claim 8 wherein,
- whenever the spatial orientation of the series of planes is not pre-determined, the process action of storing the scene proxies further comprises an action of storing data specifying said spatial orientation, and
- whenever the geometry of the series of planes is not pre-determined, the process action of storing the scene proxies further comprises an action of storing data specifying said geometry, said data comprising one or more of: data specifying the number of planes in the series of planes; or data specifying a prescribed spacing that is used between successive planes in the series of planes; or data specifying the shape and dimensions of each of the planes in the series of planes.
11. The process of claim 8 wherein,
- whenever the prescribed order across the plane is not pre-determined, the process action of storing the scene proxies further comprises an action of storing data specifying said order,
- whenever the number of texels in each of the scanlines is not pre-determined, the process action of storing the scene proxies further comprises an action of storing data specifying said number, and
- whenever the prescribed zero position on the contour is not pre-determined, the process action of storing the scene proxies further comprises an action of storing data specifying said zero position.
12. The process of claim 1, further comprising an action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network, said distribution comprising the actions of:
- for each of the mesh models, transmitting a mathematical equation describing each of the contours that is defined by the series of planes over the network to said other computing device, transmitting data specifying which contours on neighboring planes in the series of planes correspond to each other over the network to said other computing device, and transmitting the texture map for the mesh model over the network to said other computing device.
13. The process of claim 12, wherein whenever the spatial orientation of the series of planes is not pre-determined, the process action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network further comprises an action of transmitting data specifying said spatial orientation over the network to said other computing device.
14. The process of claim 12, wherein whenever the geometry of the series of planes is not pre-determined, the process action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network further comprises an action of transmitting data specifying said geometry over the network to said other computing device, said data comprising one or more of:
- data specifying the number of planes in the series of planes; or
- data specifying a prescribed spacing that is used between successive planes in the series of planes; or
- data specifying the shape and dimensions of each of the planes in the series of planes.
15. The process of claim 12, wherein,
- whenever the prescribed order across the plane is not pre-determined, the process action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network further comprises an action of transmitting data specifying said order over the network to said other computing device,
- whenever the number of texels in each of the scanlines is not pre-determined, the process action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network further comprises an action of transmitting data specifying said number over the network to said other computing device, and
- whenever the prescribed zero position on the contour is not pre-determined, the process action of distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network further comprises an action of transmitting data specifying said zero position over the network to said other computing device.
16. The process of claim 1, wherein the series of planes comprises either a horizontal spatial orientation or a vertical spatial orientation.
17. A computer-implemented process for presenting a video of a scene to a user, comprising:
- using a computing device to perform the following process actions:
- receiving scene proxies, said scene proxies comprising: a stream of mathematical equations describing contours that are defined by a series of planes that are parallel to each other, and a stream of texture maps defining texture data corresponding to each of the contours that is defined by the series of planes;
- rendering images of the scene from the scene proxies, said rendering comprising the actions of: constructing the series of planes using data specifying the spatial orientation and geometry of the series of planes, constructing the contours that are defined by the series of planes using the stream of mathematical equations, constructing a series of point locations along each of said contours, said construction being performed in a prescribed order across each of the planes in the series of planes, said construction also being performed starting from a prescribed zero position on each of said contours, tessellating the point locations that are defined by the series of planes, said tessellation generating a stream of polygonal models, each polygonal model comprising a collection of polygonal faces that are formed by neighboring point locations on corresponding contours on neighboring planes in the series of planes, sampling the stream of texture maps to identify the texture data that corresponds to each of the polygonal faces in the stream of polygonal models, and using said identified texture data to add texture to each of the polygonal faces in the stream of polygonal models; and
- displaying the images of the scene.
18. The process of claim 17, wherein each of the texture maps in the stream of texture maps comprises a series of scanlines each of which corresponds to a different one of the planes in the series of planes, each of the scanlines comprises a series of texels that are assigned to each of the contours that is defined by the plane corresponding to the scanline, and the process action of sampling the stream of texture maps to identify the texture data that corresponds to each of the polygonal faces in the stream of polygonal models comprises the actions of:
- for each of the scanlines in each of the texture maps, adapting the number of texels in the scanline that are assigned to each one of the contours that is defined by the plane corresponding to the scanline to be the average of the number of texels in the scanline that are assigned to said one of the contours and the number of texels in the next scanline in the series of scanlines that are assigned to a contour that corresponds to said one of the contours, said adaption resulting in a modified version of each of the texture maps; and
- sampling the modified version of each of the texture maps to identify the texture data that corresponds to each of the polygonal faces in the stream of polygonal models.
19. The process of claim 17, wherein the video being presented comprises one of:
- asynchronous single viewpoint video; or
- asynchronous free viewpoint video; or
- unidirectional live single viewpoint video; or
- unidirectional live free viewpoint video; or
- bidirectional live single viewpoint video; or
- bidirectional live free viewpoint video.
20. A computer-implemented process for generating a video of a scene, comprising:
- using a computing device to perform the following process actions:
- receiving one or more streams of sensor data that represent the scene;
- generating scene proxies from said streams of sensor data, said scene proxies generation comprising the actions of: generating a stream of mesh models of the scene from said streams of sensor data, and for each of the mesh models, slicing the mesh model using a series of planes that are parallel to each other, each of the planes in the series defining one or more contours each of which defines a specific region on the plane where the mesh model intersects the plane, and generating a texture map for the mesh model which defines texture data corresponding to each of the contours that is defined by the series of planes, said texture map comprising a series of scanlines each of which corresponds to a different one of the planes in the series of planes, each of the scanlines comprising a series of texels, said texture map generation comprising the actions of, for each of the planes in the series of planes, analyzing each of the contours that is defined by the plane in a prescribed order across the plane to identify a series of point locations along the contour, for each of the contours that is defined by the plane, using the series of point locations to determine a mathematical equation describing the contour, assigning the texels in the scanline corresponding to the plane to the contours that are defined by the plane, said texel assignment being performed in the prescribed order across the plane, and entering information specifying said texel assignment into the texture map for the mesh model, using the one or more streams of sensor data that represent the scene to compute texture data for each of the texels that is in the texture map for the mesh model, and entering said computed texture data into the texture map for the mesh model; and
- distributing the scene proxies to an end user who either is, or will be, viewing the video on another computing device which is connected to a data communication network, said distribution comprising the actions of: for each of the mesh models, transmitting a mathematical equation describing each of the contours that is defined by the series of planes over the network to said other computing device, transmitting data specifying which contours on neighboring planes in the series of planes correspond to each other over the network to said other computing device, and transmitting the texture map for the mesh model over the network to said other computing device.
Type: Application
Filed: Mar 8, 2013
Publication Date: Dec 5, 2013
Applicant: Microsoft Corporation (Redmond, WA)
Inventors: Patrick Sweeney (Woodinville, WA), Don Gillett (Bellevue, WA)
Application Number: 13/790,158
International Classification: G06T 15/04 (20060101); G06T 17/00 (20060101);