Method and System for Providing Virtual Reality (VR) Video Transcoding and Broadcasting

Disclosed are a method and a system for providing virtual reality (VR) video transcoding and broadcasting. The method comprises: obtaining a user's viewport; processing a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, wherein the basic video set comprises a plurality of basic video segments, the enhancement video set comprises a plurality of enhancement video segments, and the playback effect of the sum of the basic video segments and the enhancement video segments is better than that of the basic video segments; downloading the basic video segments and the enhancement video segments; and displaying a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport. According to the embodiments of the present disclosure, the VR video data is processed into a basic video set and an enhancement video set, and a video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport is displayed. Thus, viewing experience is ensured while the amount of downloaded data is reduced and transmission efficiency is improved.

Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. provisional application 62/441,936, filed on Jan. 3, 2017, which is incorporated herein by reference in its entirety.

BACKGROUND OF THE DISCLOSURE

Field of the Disclosure

The present disclosure relates to video processing technology, and more particularly, to a method and a system for providing virtual reality (VR) video transcoding and broadcasting.

Background of the Disclosure

Virtual Reality (VR) is a computer simulation technology for creating and experiencing a virtual world. For example, a three-dimensional real-time image can be presented based on a technology which tracks a user's head, eyes or hand. For a network-based virtual reality technology, full-view video data is pre-stored on a server, and then transmitted to a display device, such as glasses. A video is displayed on the display device in accordance with a viewing angle of the user.

However, high-resolution video data occupies a large transmission bandwidth and requires a display device with a high data processing capability. The existing video processing technology therefore places high requirements on networks and terminals, and it is difficult to present high-resolution, real-time image display.

Therefore, it is desirable to further improve the video processing and rendering methods of a VR playback system, so as to save transmission bandwidth, reduce the performance requirements for the display device, and present real-time image display smoothly.

SUMMARY OF THE DISCLOSURE

In view of this, the present disclosure provides a method and a system for providing virtual reality (VR) video transcoding and broadcasting, to solve the above problems.

According to a first aspect of the present disclosure, there is provided a method for providing virtual reality (VR) video transcoding and broadcasting, comprising:

obtaining a user's viewport;

processing a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, wherein the basic video set comprises a plurality of basic video segments, the enhancement video set comprises a plurality of enhancement video segments, and the playback effect of the sum of the basic video segments and the enhancement video segments is better than that of the basic video segments;

downloading the basic video segments and the enhancement video segments; and

displaying a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport.

Preferably, the user's viewport is relevant to the specification and parameters of a head-up display device.

Preferably, the step of processing a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport comprises:

dividing a projection area of the VR video data into a plurality of grid blocks;

determining which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport; and

processing the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.

Preferably, the step of processing the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block comprises:

obtaining an audio data set and a first frame data set by decoding the VR video data;

obtaining a second frame data set by scaling down the first frame data set losslessly to a target resolution;

obtaining a third frame data set by decreasing a resolution of the first frame data set to a basic resolution and then increasing it to the target resolution by using an interpolation algorithm;

obtaining a basic video set by combining the audio data set and the second frame data set and segmenting the combination;

encoding and segmenting an enhancement data set obtained by performing a subtraction between the second frame data set and the third frame data set, to obtain a plurality of video segments in accordance with the plurality of grid blocks; and

assigning some of the plurality of video segments into the enhancement video set in accordance with the grid blocks constituting the viewport block.

Preferably, the step of dividing a projection area of the VR video data into a plurality of grid blocks comprises dividing the projection area into the plurality of grid blocks with equal areas.

Preferably, the method further comprises:

obtaining a plurality of combinations of resolution and bitrate, wherein each of the plurality of combinations comprises a basic resolution, an enhancement resolution, a basic bitrate and an enhancement bitrate;

wherein the step of processing a VR video data into a basic video set and an enhancement video set comprises:

processing the VR video data into the basic video set and the enhancement video set in accordance with the plurality of combinations of resolution and bitrate.

Preferably, the step of downloading the basic video segments and the enhancement video segments comprises:

calculating an average download speed;

selecting from the plurality of combinations of resolution and bitrate in accordance with the average download speed; and

downloading corresponding basic video segments and enhancement video segments in accordance with a selected combination of resolution and bitrate.

Preferably, the step of displaying a sum of two video data obtained by adding the basic video segments and the enhancement video segments comprises:

displaying the sum of two video data respectively in panoramic mode and binocular mode.

Preferably, the step of displaying the sum of two video data in panoramic mode comprises:

building a basic video model and an enhancement video model respectively;

initializing UV coordinates of the basic video model and the enhancement video model;

obtaining basic video segments and enhancement video segments;

obtaining pixel information of the basic video segments and the enhancement video segments by decoding;

generating a basic video texture according to the pixel information of the basic video segments and the UV coordinates of the basic video model, and an enhancement video texture according to the pixel information of the enhancement video segments and the UV coordinates of the enhancement video model;

determining UV alignment coordinates of the enhancement video texture according to a user's viewport;

generating reconstructed pixel information by adding the basic video texture and the enhancement video texture according to UV alignment coordinates; and

drawing an image according to the reconstructed pixel information.

Preferably, the step of displaying the sum of two video data in binocular mode comprises:

obtaining relevant parameters including a camera matrix, a projection matrix, a model matrix and a center position of lens distortion;

creating a three-dimensional model and obtaining an original coordinate data of the three-dimensional model;

obtaining a first coordinate data in accordance with the relevant parameters and the original coordinate data of the three-dimensional model;

performing lens distortion on the first coordinate data based on the center position of lens distortion to obtain a second coordinate data;

rasterizing the second coordinate data to obtain pixel units; and

drawing an image in accordance with a VR video data and the pixel units.

According to a second aspect of the disclosure, there is provided a system for providing virtual reality (VR) video transcoding and broadcasting, comprising:

an obtaining module configured to obtain a user's viewport;

a data transcoding module configured to process a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, wherein the basic video set comprises a plurality of basic video segments, the enhancement video set comprises a plurality of enhancement video segments, and the playback effect of the sum of the basic video segment and the enhancement video segment is better than that of the basic video segment;

a downloading module configured to download the basic video segments and the enhancement video segments; and

a playing module configured to display a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport.

Preferably, the user's viewport is relevant to the specification and parameters of a head-up display device.

Preferably, the data transcoding module comprises:

a division unit configured to divide a projection area of the VR video data into a plurality of grid blocks;

a cutting unit configured to determine which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport; and

a processing unit configured to process the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.

Preferably, the system further comprises: a mapping table generating unit configured to obtain a plurality of combinations of resolution and bitrate, wherein each of the plurality of combinations comprises a basic resolution, an enhancement resolution, a basic bitrate and an enhancement bitrate;

wherein the processing unit is configured to:

process the VR video data into the basic video set and the enhancement video set in accordance with the plurality of combinations of resolution and bitrate.

Preferably, the downloading module comprises:

a speed calculation unit configured to calculate an average download speed;

a selection unit configured to select from the plurality of combinations of resolution and bitrate in accordance with the average download speed; and

an execution unit configured to download corresponding basic video segments and enhancement video segments in accordance with a selected combination of resolution and bitrate.

Preferably, the playing module displays the sum of two video data in two different modes: a panoramic mode and a binocular mode.

According to the embodiments of the present disclosure, the VR video data is processed into a basic video set and an enhancement video set in accordance with a user's viewport, and the sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport is displayed. As a result, viewing experience is ensured while the amount of downloaded data is reduced and transmission efficiency is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects, features and advantages of the present disclosure will become more apparent by describing the embodiments of the present disclosure with reference to the following drawings, in which:

FIG. 1 is a diagram illustrating an example network of a VR playback system;

FIG. 2 is a flowchart diagram showing a method for providing virtual reality (VR) video transcoding and broadcasting according to an embodiment of the disclosure;

FIG. 3 is a specific flowchart diagram of step S200 in FIG. 2;

FIG. 4 is a specific flowchart diagram of step S203 of FIG. 3;

FIG. 5 is a specific flowchart diagram of step S300 of FIG. 2;

FIG. 6 is a specific flowchart diagram of displaying a sum of two video data in panoramic mode of step S400 shown in FIG. 2;

FIG. 7 is a specific flowchart diagram of displaying a sum of two video data in binocular mode of step S400 shown in FIG. 2;

FIG. 8 is a schematic diagram of a system for providing virtual reality (VR) video transcoding and broadcasting according to an embodiment of the disclosure;

FIG. 9 is a specific schematic diagram of a data transcoding module 802 in FIG. 8; and

FIG. 10 is a specific schematic diagram of a downloading module 803 in FIG. 8.

DETAILED DESCRIPTION OF THE DISCLOSURE

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. In the drawings, like reference numerals denote like members. The figures are not drawn to scale, for the sake of clarity. Moreover, some well-known parts may not be shown.

FIG. 1 is a diagram illustrating an example network of a VR playback system. The VR playback system 10 includes a server 100 and a display device 120, which are coupled with each other through a network 110, and a VR device 130. For example, the server 100 may be a stand-alone computer server or a server cluster. The server 100 is used to store various video data and various applications that process these video data. For example, various daemons run on the server 100 in real time, so as to process the video data in the server 100 and to respond to various requests from VR devices and the display device 120. The network 110 may be a selected one or selected ones from the group consisting of the Internet, a local area network, an internet of things, and the like. For example, the display device 120 may be any computing device having an independent display screen and a processing capability, such as a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a palmtop computer, a personal digital assistant, a smart phone, an intelligent electrical apparatus, a game console, an iPad/iPhone, a video player, a DVD recorder/player, a television, or a home entertainment system. The display device 120 may store VR player software as a VR player. When the VR player is started, it requests and downloads various video data from the server 100, and renders and plays the video data in the display device. In this example, the VR device 130 is a stand-alone head-up display device that can interact with the display device 120 and the server 100, to communicate the user's current information with the display device 120 and/or the server 100 through signaling. The user's current information includes, for example, parameters relevant to the user's viewport and changes in the user's gaze. According to this information, the display device 120 can flexibly process the currently played video data. In some embodiments, when a user's viewport changes, the display device 120 determines that the user's core viewing region has changed and starts to play high-resolution video data in the changed core viewing region.

In the above embodiment, the VR device 130 is a stand-alone head-up display device. However, those skilled in the art should understand that the VR device 130 is not limited thereto, and may also be an all-in-one head-up display device. An all-in-one head-up display device has its own display screen, so that it is not necessary to connect it with an external display device. For example, if an all-in-one head-up display device is used as the VR device in this example, the display device 120 may be omitted. At this point, the all-in-one head-up display device is configured to obtain video data from the server 100 and to perform the playback operation, and is also configured to detect the user's current viewport and to adjust the playback operation accordingly.

FIG. 2 is a flowchart diagram showing a method for providing virtual reality (VR) video transcoding and broadcasting according to an embodiment of the disclosure. The method includes the following steps.

In step S100, a user's viewport is obtained.

In step S200, a VR video data is processed into a basic video set and an enhancement video set in accordance with the user's viewport. The basic video set includes a plurality of basic video segments, and the enhancement video set includes a plurality of enhancement video segments.

In step S300, the basic video segments and the enhancement video segments are downloaded in accordance with the user's viewport.

In step S400, a sum of two video data obtained by adding the basic video segments and the enhancement video segments is displayed.

In the scenario where a user uses a VR head-up display device, the user's viewport can be obtained from the specification and parameters of the head-up display device and the screen size. The VR video data stored on the server is generally obtained by collecting 360-degree panoramic images of the real world, but what a specific user can see is only the video image within the viewport. Therefore, in this embodiment, the VR video data is processed into the basic video set and the enhancement video set according to the viewport; after the basic video segments and the enhancement video segments are downloaded in accordance with the user's viewport, a video data obtained by adding the basic video segments and the enhancement video segments is displayed. As a result, higher-resolution video data can be displayed within the viewport for better viewing experience, while basic video data is displayed outside the viewport to reduce the amount of downloaded data.

FIG. 3 is a specific flowchart diagram of step S200 in FIG. 2. It specifically includes the following steps.

In step S201, a projection area of the VR video data is divided into a plurality of grid blocks. The projection area is generally divided into a number of grid blocks with equal areas.

In step S202, it is determined which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport.

In step S203, the VR video data is processed into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.
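For illustration only, the following Python sketch shows one possible implementation of steps S201 and S202, assuming an equirectangular projection divided into rows×cols grid blocks of equal angular extent and a rectangular approximation of the viewport. The grid size, the angle conventions and the function name are illustrative assumptions, not requirements of the disclosure.

```python
def viewport_blocks(center_yaw, center_pitch, fov_h, fov_v, rows=8, cols=16):
    """Return the set of (row, col) grid blocks covered by the viewport.

    Angles are in degrees: yaw in [-180, 180), pitch in [-90, 90].
    Blocks touching the viewport boundary are included, so the result
    is slightly conservative.
    """
    block_w = 360.0 / cols          # yaw extent of one grid block
    block_h = 180.0 / rows          # pitch extent of one grid block

    # Rows touched by the pitch range, clamped to the projection area.
    p_min = max(-90.0, center_pitch - fov_v / 2.0)
    p_max = min(90.0, center_pitch + fov_v / 2.0)
    row_min = int((p_min + 90.0) // block_h)
    row_max = min(rows - 1, int((p_max + 90.0) // block_h))

    # Columns touched by the yaw range, wrapping at the +/-180 degree seam.
    def col_of(yaw):
        return int(((yaw + 180.0) % 360.0) // block_w)

    first = col_of(center_yaw - fov_h / 2.0)
    span = int(fov_h // block_w) + 1
    cols_hit = {(first + i) % cols for i in range(span + 1)}

    return {(r, c) for r in range(row_min, row_max + 1) for c in cols_hit}

# For example, a 100x90 degree viewport looking straight ahead:
# viewport_blocks(0.0, 0.0, 100.0, 90.0)
```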

FIG. 4 is a specific flowchart diagram of step S203 of FIG. 3. It specifically includes the following steps.

In step S2031, the VR video data is decoded into an audio data set 1 and a frame data set 1.

In step S2032, a frame data set 2 is obtained by scaling down the frame data set 1 losslessly to a target resolution.

In step S2033, a frame data set 3 is obtained by decreasing a resolution of the frame data set 1 to a basic resolution and then increasing it to the target resolution by using an interpolation algorithm.

In step S2034, an enhancement data set is obtained by a subtraction between the frame data set 2 and the frame data set 3.

In step S2035, the basic video set is obtained by combining the frame data set 2 and the audio data set 1 and segmenting the combination.

In step S2036, a plurality of video segments are obtained by encoding, compressing and segmenting the enhancement data set.

In step S2037, some of the plurality of video segments are assigned into the enhancement video set in accordance with the grid blocks constituting the viewport block.

The above embodiments specifically describe the process of processing a specific VR video data into a basic video set and an enhancement video set according to a user's viewport. For ease of understanding, the following example is provided. Suppose the frame data set 1 has an original resolution of 12,600×6,000 pixels, and the frame data set 2 and the frame data set 3, each having a target resolution of 6,300×3,000 pixels, are obtained according to steps S2032 and S2033. The frame data set 2 is obtained by losslessly decreasing the 12,600×6,000 pixels of the frame data set 1 to 6,300×3,000 pixels, and the frame data set 3 is obtained by decreasing the 12,600×6,000 pixels of the frame data set 1 to, for example, 798×1024 pixels and then performing interpolation and amplification. A subtraction is performed between the frame data set 2 and the frame data set 3 to obtain an enhancement data set, the enhancement data set is encoded and compressed to obtain compressed data, the compressed data is segmented in accordance with the grid blocks to obtain a plurality of video segments, and finally some of the plurality of video segments are assigned into the enhancement video set in accordance with the correspondence between the viewport block and the grid blocks. That is, the enhancement video set contains the video segments corresponding to the viewport block.
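For this example, the per-frame arithmetic of steps S2032 to S2034 could be sketched as follows, assuming 8-bit frames held as NumPy arrays and OpenCV for scaling. The interpolation flags, the offset of 128 (see the difference generation equation discussed below for the panoramic mode) and the width/height ordering of the example resolutions are assumptions, not requirements of the disclosure.

```python
import cv2
import numpy as np

TARGET = (6300, 3000)   # target resolution from the example (width, height assumed)
BASIC = (798, 1024)     # basic resolution from the example (width, height assumed)

def split_frame(frame):
    """Produce frame data sets 2 and 3 and the enhancement data for one frame."""
    # Frame data set 2: the original frame scaled down to the target resolution.
    set2 = cv2.resize(frame, TARGET, interpolation=cv2.INTER_AREA)
    # Frame data set 3: reduce to the basic resolution, then interpolate back
    # up to the target resolution (steps S2032 and S2033).
    small = cv2.resize(frame, BASIC, interpolation=cv2.INTER_AREA)
    set3 = cv2.resize(small, TARGET, interpolation=cv2.INTER_LINEAR)
    # Enhancement data (step S2034): set 2 minus set 3, offset by 128 so the
    # residual fits the unsigned 8-bit range.
    residual = np.clip(set2.astype(np.int16) - set3.astype(np.int16) + 128,
                       0, 255).astype(np.uint8)
    return set2, set3, residual
```

The resulting residual frames would then be encoded, compressed and segmented per grid block as in steps S2036 and S2037.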

Of course, the present disclosure is not limited thereto, and other methods can also be used to obtain the basic video set and the enhancement video set.

The above video data processing steps are generally executed on the server. When the server serves multiple head-up display devices or display devices, the VR video data is correspondingly processed according to the respective viewports of multiple users. Therefore, a number of correspondences between viewport blocks and grid blocks can be established. For ease of processing, the grid blocks should be set to an appropriate size so that they can correspond to a plurality of different viewports.

In the above embodiment, step S200 may process the video data in accordance with a preset resolution and a preset bitrate. That is, after processing, the basic video segments in the basic video set each have a preset basic resolution and a preset basic bitrate, and the enhancement video segments in the enhancement video set each have a preset enhancement resolution and a preset enhancement bitrate. Therefore, a plurality of basic video sets and a plurality of enhancement video sets with different combinations of resolution and bitrate may be generated for one specific video data, and when a display device or a head-up display device requests video data from the server, a basic video segment and an enhancement video segment with an appropriate combination of resolution and bitrate are selected according to current network conditions.

FIG. 5 is a specific flowchart diagram of step S300 of FIG. 2.

In step S301, an average download speed is calculated.

In step S302, from the plurality of combinations of resolution and bitrate, a combination of resolution and bitrate is selected in accordance with the average download speed.

In step S303, corresponding basic video segments and enhancement video segments are downloaded in accordance with the selected combination of resolution and bitrate.

In this embodiment, a combination of resolution and bitrate is determined in accordance with the average download speed, and the corresponding basic video segments and enhancement video segments are obtained, thereby optimizing data transmission. Alternatively, a plurality of combinations of resolution and bitrate can be established based on an initial combination of resolution and bitrate. For example, N-level combinations of resolution and bitrate are established, and the combinations at each level have a preset corresponding relationship. When the average download speed is within a certain interval, the basic video segments and enhancement video segments whose resolution and bitrate fall within the corresponding level are selected.
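A minimal sketch of how steps S301 and S302 might select a level is given below, assuming a hypothetical table of N-level combinations ordered by total bitrate. The level values and the 20% headroom factor are illustrative assumptions only.

```python
# Hypothetical level table: (basic resolution, enhancement resolution,
# basic kbps, enhancement kbps), ordered from lowest to highest bitrate.
LEVELS = [
    ((1575, 750), (3150, 1500), 1_000, 4_000),
    ((3150, 1500), (6300, 3000), 2_500, 10_000),
    ((6300, 3000), (12600, 6000), 6_000, 24_000),
]

def select_level(speed_samples_kbps):
    """Steps S301-S302: average the measured download speeds and pick the
    highest level whose total bitrate still fits under the average."""
    avg = sum(speed_samples_kbps) / len(speed_samples_kbps)
    chosen = LEVELS[0]                          # always fall back to the lowest level
    for level in LEVELS:
        basic_kbps, enh_kbps = level[2], level[3]
        if (basic_kbps + enh_kbps) * 1.2 <= avg:  # keep 20% headroom
            chosen = level
    return chosen
```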

Further, step S400 may be performed in a panoramic mode or a binocular mode.

FIG. 6 is a specific flowchart diagram of displaying a sum of two video data in panoramic mode of step S400 shown in FIG. 2.

In step S401, pixel information of the basic video segments and the enhancement video segments is obtained by decoding.

In this step, the basic video segments and the enhancement video segments are decoded by a suitable decoder to obtain the respective pixel information. The decoding process may also include a decompression process for decompressing compressed video data. For each color space, the corresponding pixel components are extracted; for example, the R, G and B components are extracted for the RGB color space.

In step S402, a basic video model and an enhancement video model are respectively built.

In this step, a suitable three-dimensional model can be created in accordance with requirements. For example, two polygonal spheres can be created as the three-dimensional basic video model and enhancement video model.

In step S403, UV coordinates of the basic video model and the enhancement video model are initialized.

Here, UV coordinates are short for u, v texture mapping coordinates, analogous to the X, Y and Z axes of a spatial model. They define position information for each point of a planar image that corresponds to the three-dimensional model. Through the UV coordinates, each point on the image can be accurately mapped to the surface of the three-dimensional model. In this step, each UV coordinate point on the basic video model and the enhancement video model is created and initialized.
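A minimal sketch of steps S402 and S403 is given below, assuming a latitude/longitude tessellation of a polygonal sphere; the segment counts and the function name are illustrative. Both the basic video model and the enhancement video model could be initialized this way.

```python
import math

def build_sphere(stacks=32, slices=64, radius=1.0):
    """Return (vertices, uvs): 3-D sphere positions and their UV coordinates."""
    vertices, uvs = [], []
    for i in range(stacks + 1):
        v = i / stacks                      # v runs from 0 (top) to 1 (bottom)
        phi = v * math.pi                   # polar angle
        for j in range(slices + 1):
            u = j / slices                  # u runs once around the equator
            theta = u * 2.0 * math.pi
            vertices.append((radius * math.sin(phi) * math.cos(theta),
                             radius * math.cos(phi),
                             radius * math.sin(phi) * math.sin(theta)))
            uvs.append((u, v))
    return vertices, uvs
```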

In step S404, a basic video texture and an enhancement video texture are generated.

In this step, the basic video texture is generated in accordance with the pixel information of the basic video segments and the UV coordinates of the basic video model, and the enhancement video texture is generated in accordance with the pixel information of the enhancement video segments and the UV coordinates of the enhancement video model.

In step S405, UV alignment coordinates of the enhancement video texture are determined.

In this step, the UV alignment coordinates of the enhancement video texture may be determined in accordance with the user's current viewport.

In step S406, reconstructed pixel information is generated by adding the basic video texture and the enhancement video texture to each other in accordance with the UV alignment coordinates.

In this step, the reconstructed pixel information is obtained based on the relationship between the basic video segments and the enhancement video segments.

For ease of understanding, the following example is provided.

Assume that $P^{\mathrm{Original}}_{x,y} = (r, g, b)^{T}$ is the pixel information at coordinates $(x, y)$ in the original video data, with $r, g, b \in [L, H]$, and that $P^{\mathrm{ScaledBase}}_{x,y} = (r', g', b')^{T}$ is the pixel information at coordinates $(x, y)$ in the basic video, with $r', g', b' \in [L, H]$.

For all $(x, y)$, the following difference generation equation (1) is provided:

$$P^{\mathrm{NormalizedResidual}}_{x,y} = P^{\mathrm{Original}}_{x,y} - P^{\mathrm{ScaledBase}}_{x,y} + \frac{H - L}{2} \tag{1}$$

Here $P^{\mathrm{NormalizedResidual}}_{x,y}$ represents the pixel difference. For all pixels $(x, y)$, the following difference reconstruction equation is provided:

$$P^{\mathrm{Reconstructed}}_{x,y} = P^{\mathrm{ScaledBase}}_{x,y} + P^{\mathrm{NormalizedResidual}}_{x,y} - \frac{H - L}{2}.$$
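For 8-bit pixel components ($L = 0$, $H = 255$), the two equations can be transcribed directly, for example in NumPy as below; rounding $(H - L)/2$ to 128 and clipping to the unsigned 8-bit range are implementation assumptions.

```python
import numpy as np

OFFSET = 128  # (H - L) / 2 for 8-bit pixel components, rounded

def normalized_residual(original, scaled_base):
    """Equation (1): original minus scaled base, re-centered around 128."""
    diff = original.astype(np.int16) - scaled_base.astype(np.int16) + OFFSET
    return np.clip(diff, 0, 255).astype(np.uint8)

def reconstruct(scaled_base, residual):
    """Reconstruction equation: add the residual back onto the scaled base."""
    out = scaled_base.astype(np.int16) + residual.astype(np.int16) - OFFSET
    return np.clip(out, 0, 255).astype(np.uint8)
```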

In step S407, an image is drawn in accordance with the reconstructed pixel information.

FIG. 7 is a specific flowchart diagram of displaying a sum of two video data in binocular mode of step S400 shown in FIG. 2.

In step S411, relevant parameters are obtained.

For example, the relevant parameters are calculated based on the specification and parameters of a head-up display device and the screen size. The relevant parameters include, for example, the field-of-view parameters of the left and right lenses, the camera matrix, the projection matrix and the center position of lens distortion. A head-up display device generally includes a stand and left and right lenses on the stand, and the human eyes obtain images from left and right viewable regions through the left and right lenses. Because the left and right viewable regions provide slightly different images, the human mind, after receiving this differing information, produces a three-dimensional sense. Different types of head-up display devices have different specifications and parameters. Generally, the specification and parameters can be obtained by querying a website or built-in parameter files, and the relevant parameters required in the rendering process can then be calculated from them.

In step S412, a three-dimensional model is created and the original coordinate data of the three-dimensional model is obtained.

In this step, a suitable three-dimensional model can be created in accordance with requirements. For example, a polygonal sphere can be created as the three-dimensional model and the original coordinate data can be obtained based on the polygonal sphere.

In step S413, the first coordinate data is obtained in accordance with the relevant parameters and the original coordinate data of the three-dimensional model.

In step S414, lens distortion is performed on the first coordinate data based on the center position of lens distortion to obtain second coordinate data.

In step S413, vector calculation is performed on the original coordinate data in accordance with the camera matrix, the projection matrix and the model matrix, and the calculated coordinate data is taken as the first coordinate data; in step S414, the first coordinate data is further distorted to obtain the second coordinate data.
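A minimal NumPy sketch of steps S413 and S414 is given below, assuming 4×4 matrices, row-major vertex arrays and a simple radial polynomial distortion model; the distortion coefficients k1 and k2 are placeholders for values derived from the head-up display device's parameters.

```python
import numpy as np

def transform(vertices, model, camera, projection):
    """Step S413: apply the model, camera and projection matrices and
    perform the perspective divide, yielding normalized 2-D coordinates."""
    mvp = projection @ camera @ model                    # combined 4x4 matrix
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    clip = homo @ mvp.T                                  # row-vector transform
    return clip[:, :2] / clip[:, 3:4]                    # first coordinate data

def distort(points, center, k1=0.22, k2=0.24):
    """Step S414: radial lens distortion around the distortion center."""
    c = np.asarray(center)
    offset = points - c
    r2 = np.sum(offset ** 2, axis=1, keepdims=True)
    return c + offset * (1.0 + k1 * r2 + k2 * r2 ** 2)   # second coordinate data
```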

In step S415, the second coordinate data is rasterized to obtain pixel units.

In this step, the second coordinate data is processed into pixel units in a two-dimensional plane.

In step S416, an image is drawn based on the VR video data and the pixel units.

In this step, the VR video data downloaded from the server is decoded to obtain the pixel information therein, the pixel units are assigned in accordance with the pixel information, and finally the image is drawn.

The embodiments of the disclosure provide two viewing modes: panoramic mode and binocular mode. In panoramic mode, a three-dimensional model is created, the UV alignment coordinates of the basic video segments and the enhancement video segments are determined in accordance with the user's current viewport, and the reconstructed pixel information is then assigned to the three-dimensional model in accordance with the UV alignment coordinates, so as to achieve a three-dimensional panoramic viewing effect. In binocular mode, a three-dimensional model is created, lens distortion is performed on the coordinate data of the three-dimensional model, and the basic video segments and the enhancement video segments are then added and displayed on the distorted three-dimensional model, so as to achieve a binocular-mode VR immersive viewing effect. In binocular mode, the rendering of the video data is completed in a single pass, which improves rendering efficiency.

FIG. 8 is a schematic diagram of a system for providing virtual reality (VR) video transcoding and broadcasting according to an embodiment of the disclosure. The system includes an obtaining module 801, a data transcoding module 802, a downloading module 803 and a playing module 804.

The obtaining module 801 is configured to obtain the user's viewport.

The data transcoding module 802 is configured to process the VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, the basic video set includes a plurality of basic video segments, the enhancement video set includes a plurality of enhancement video segments, and the playback effect of the sum of the basic video segments and the enhancement video segments is better than that of the basic video segments.

The downloading module 803 is configured to download basic video segments and enhancement video segments.

The playing module 804 is configured to display a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport.

FIG. 9 is a specific schematic diagram of the data transcoding module 802 in FIG. 8.

The data transcoding module includes a division unit 8021, a cutting unit 8022, and a processing unit 8023. The division unit 8021 is configured to divide a projection area of the VR video data into a plurality of grid blocks. The cutting unit 8022 is configured to determine which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport. The processing unit 8023 is configured to process the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.

In an alternative embodiment, the system further includes a mapping table generating unit configured to obtain a plurality of combinations of resolution and bitrate, each of which includes a basic resolution, an enhancement resolution, a basic bitrate and an enhancement bitrate. At this point, the processing unit 8023 processes the VR video data into a basic video set and an enhancement video set in accordance with the plurality of combinations of resolution and bitrate. Preferably, a plurality of combinations of resolution and bitrate corresponding to a plurality of levels can be established based on an initial combination of resolution and bitrate, where the resolution and the bitrate at neighboring levels have a specific proportional relationship.

In another alternative embodiment, referring to FIG. 10, the downloading module 803 includes a speed calculation unit 8031, a selection unit 8032, and an execution unit 8033. The speed calculation unit 8031 is configured to calculate an average download speed. The selection unit 8032 is configured to select from the plurality of combinations of resolution and bitrate in accordance with the average download speed. The execution unit 8033 is configured to download corresponding basic video segments and enhancement video segments in accordance with a selected combination of resolution and bitrate.

According to the present disclosure, at the server, the VR video data is processed into the basic video set and the enhancement video set in accordance with the viewport, and at the head-up display device or the display device, the basic video segments and the enhancement video segments are downloaded and the sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport is displayed. Thus, the viewing experience can be ensured while transmission bandwidth and performance requirements for the display device are reduced.

In an alternative embodiment, after an appropriate resolution and bitrate are selected, the corresponding video segments are downloaded, so that the amount of downloaded data can be dynamically adjusted, thereby optimizing data transmission.

In another alternative embodiment, panoramic and binocular viewing modes are provided, and the sum of two video data is displayed in the different viewing modes.

Although the embodiments of the present disclosure have been described above with reference to the preferred embodiments, they are not intended to limit the claims. Any modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the present disclosure. Therefore, the protection scope of the present disclosure should be based on the scope of the claims of the present disclosure.

The foregoing descriptions of specific embodiments of the present disclosure have been presented, but are not intended to limit the disclosure to the precise forms disclosed. It will be readily apparent to one skilled in the art that many modifications and changes may be made in the present disclosure. Any modifications, equivalents, or variations of the preferred embodiments can be made without departing from the doctrine and spirit of the present disclosure.

Claims

1. A method for providing virtual reality (VR) video transcoding and broadcasting, comprising:

obtaining a user's viewport;
processing a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, wherein the basic video set comprises a plurality of basic video segments, the enhancement video set comprises a plurality of enhancement video segments, and the playback effect of the sum of the basic video segments and the enhancement video segments is better than that of the basic video segments;
downloading the basic video segments and the enhancement video segments; and
displaying a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport.

2. The method according to claim 1, wherein the user's viewport is relevant to the specification and parameters of a head-up display device.

3. The method according to claim 1, wherein the step of processing a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport comprises:

dividing a projection area of the VR video data into a plurality of grid blocks;
determining which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport; and
processing the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.

4. The method according to claim 3, wherein the step of processing the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block comprises:

obtaining an audio data set and a first frame data set by decoding the VR video data;
obtaining a second frame data set by scaling down the first frame data set losslessly to a target resolution;
obtaining a third frame data set by decreasing a resolution of the first frame data set to a basic resolution and then increasing it to the target resolution by using an interpolation algorithm;
obtaining a basic video set by combining the audio data set and the second frame data set and segmenting the combination;
encoding and segmenting an enhancement data set obtained by performing a subtraction between the second frame data set and the third frame data set, to obtain a plurality of video segments in accordance with the plurality of grid blocks; and
assigning some of the plurality of video segments into the enhancement video set in accordance with the grid blocks constituting the viewport block.

5. The method according to claim 3, wherein the step of dividing a projection area of the VR video data into a plurality of grid blocks comprises dividing the projection area into the plurality of grid blocks with equal areas.

6. The method according to claim 1, further comprising:

obtaining a plurality of combinations of resolution and bitrate, wherein each of the plurality of combinations comprises a basic resolution, an enhancement resolution, a basic bitrate and an enhancement bitrate;
wherein the step of processing a VR video data into a basic video set and an enhancement video set comprises:
processing the VR video data into the basic video set and the enhancement video set in accordance with the plurality of combinations of resolution and bitrate.

7. The method according to claim 6, wherein, the step of downloading the basic video segments and the enhancement video segments comprises:

calculating an average download speed;
selecting from the plurality of combinations of resolution and bitrate in accordance with the average download speed; and
downloading corresponding basic video segments and enhancement video segments in accordance with a selected combination of resolution and bitrate.

8. The method according to claim 1, wherein, the step of displaying a sum of two video data obtained by adding the basic video segments and the enhancement video segments comprises:

displaying the sum of two video data respectively in panoramic mode and binocular mode.

9. The method according to claim 8, wherein the step of displaying the sum of two video data in panoramic mode comprises:

building a basic video model and an enhancement video model respectively;
initializing UV coordinates of the basic video model and the enhancement video model;
obtaining basic video segments and enhancement video segments;
obtaining pixel information of the basic video segments and the enhancement video segments by decoding;
generating a basic video texture according to the pixel information of the basic video segments and the UV coordinates of the basic video model, and an enhancement video texture according to the pixel information of the enhancement video segments and the UV coordinates of the enhancement video model;
determining UV alignment coordinates of the enhancement video texture according to a user's viewport;
generating reconstructed pixel information by adding the basic video texture and the enhancement video texture according to UV alignment coordinates; and
drawing an image according to the reconstructed pixel information.

10. The method according to claim 8, wherein the step of displaying the sum of two video data in binocular mode comprises:

obtaining relevant parameters including a camera matrix, a projection matrix, a model matrix and a center position of lens distortion;
creating a three-dimensional model and obtaining an original coordinate data of the three-dimensional model;
obtaining a first coordinate data in accordance with the relevant parameters and the original coordinate data of the three-dimensional model;
performing lens distortion on the first coordinate data based on the center position of lens distortion to obtain a second coordinate data;
rasterizing the second coordinate data to obtain pixel units; and
drawing an image in accordance with a VR video data and the pixel units.

11. A system for providing virtual reality (VR) video transcoding and broadcasting, comprising:

an obtaining module configured to obtain a user's viewport;
a data transcoding module configured to process a VR video data into a basic video set and an enhancement video set in accordance with the user's viewport, wherein the basic video set comprises a plurality of basic video segments, the enhancement video set comprises a plurality of enhancement video segments, and the playback effect of the sum of the basic video segment and the enhancement video segment is better than that of the basic video segment;
a downloading module configured to download the basic video segments and the enhancement video segments; and
a playing module configured to display a sum of two video data obtained by adding the basic video segments and the enhancement video segments in accordance with the user's viewport.

12. The system according to claim 11, wherein the user's viewport is relevant to the specification and parameters of a head-up display device.

13. The system according to claim 11, wherein the data transcoding module comprises:

a division unit configured to divide a projection area of the VR video data into a plurality of grid blocks;
a cutting unit configured to determine which grid blocks among the plurality of grid blocks constitute a viewport block in accordance with the user's viewport; and
a processing unit configured to process the VR video data into the basic video set and the enhancement video set in accordance with the grid blocks constituting the viewport block.

14. The system according to claim 13, further comprising: a mapping table generating unit configured to obtain a plurality of combinations of resolution and bitrate, wherein each of the plurality of combinations comprises a basic resolution, an enhancement resolution, a basic bitrate and an enhancement bitrate;

wherein the processing unit is configured to:
process a VR video data into the basic video set and the enhancement video set in accordance with the plurality of combinations of resolution and bitrate.

15. The system according to claim 14, wherein, the downloading module comprises:

a speed calculation unit configured to calculate an average download speed;
a selection unit configured to select from the plurality of combinations of resolution and bitrate in accordance with the average download speed; and
an execution unit configured to download corresponding basic video segments and enhancement video segments in accordance with a selected combination of resolution and bitrate.

16. The system according to claim 11, wherein the playing module displays the sum of two video data in two different modes: a panoramic mode and a binocular mode.

Patent History
Publication number: 20180189980
Type: Application
Filed: Jan 2, 2018
Publication Date: Jul 5, 2018
Applicant: Black Sails Technology Inc. (Sunnyvale, CA)
Inventors: Zhuo Wang (Sunnyvale, CA), Yongtao Tang (San Leandro, CA), Ruoxi Zhao (San Jose, CA), Haoyan Zu (Newark, CA), Chia-Chi Chang (San Jose, CA)
Application Number: 15/860,494
Classifications
International Classification: G06T 9/00 (20060101); G06F 3/01 (20060101); G06T 15/04 (20060101); G06T 15/20 (20060101);