Predictive bitrate selection for 360 video streaming
Predictive pre-fetching of streams for 360 degree video is described. User view orientation metadata is obtained for a 360 degree video stream that includes data for a plurality of viewports. Data corresponding to one or more high-resolution frames for a particular one of the viewports is pre-fetched based on the user view orientation metadata and those frames are displayed. The high resolution frames are characterized by a higher resolution than for remaining viewports.
Aspects of the present disclosure are related to video streaming. In particular, the present disclosure is related to streaming of 360 degree video.
BACKGROUND OF THE INVENTION
360 degree video is created by taking video streams from several cameras arranged around a single point and stitching the video streams together to create a single continuous video image. Modern encoders break the continuous video image into multiple video streams of frames. To watch 360 degree video over a network, a server sends a client these multiple streams of frames. The client decodes and reassembles the streams into continuous images that are presented on a display.
A system may send a single request for frames, download the requested frames and then assemble them for display. This combination of actions is sometimes called a fetch action. Generally, to reliably stream video without interruption, a client must also pre-fetch video, which means the system must download and process frames before the previously downloaded frames have been displayed. In this way the system builds a buffer of processed frames between the frames that are being displayed and the subsequent frames that still need to be downloaded and processed.
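By way of illustration only, the fetch and pre-fetch cycle described above can be sketched as a small buffer that is topped up to a fixed depth before each frame is displayed. This is a minimal Python sketch; the `PrefetchBuffer` class, its `depth` parameter and the `fetch` callable (standing in for downloading, decoding and assembling one frame) are hypothetical names, not part of the disclosure.

```python
from collections import deque
from typing import Callable, Deque


class PrefetchBuffer:
    """Keeps up to `depth` processed frames queued ahead of the display (illustrative only)."""

    def __init__(self, fetch: Callable[[int], str], depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.fetch = fetch        # hypothetical: downloads, decodes and assembles frame i
        self.depth = depth
        self.next_index = 0       # index of the next frame to fetch
        self.queue: Deque[str] = deque()

    def fill(self) -> None:
        # Pre-fetch: keep fetching until `depth` processed frames are waiting.
        while len(self.queue) < self.depth:
            self.queue.append(self.fetch(self.next_index))
            self.next_index += 1

    def display_next(self) -> str:
        # Top up the buffer first so the display never waits on a download.
        self.fill()
        return self.queue.popleft()


buf = PrefetchBuffer(fetch=lambda i: f"frame{i}", depth=3)
print(buf.display_next())  # frame0
print(list(buf.queue))     # ['frame1', 'frame2']
```

In practice the depth would be chosen from the available memory and the expected download latency; a fixed count is used here only for simplicity.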
Buffering can be very costly on system resources, especially when processing and storing high resolution video. To save bandwidth and reduce the amount of buffering required, a client may request high resolution video stream frames only for the region within the client's field of view, also known as the viewport. In this case the client receives low resolution video streams for all but the current view of the client. A problem with this approach is that a client is often able to move the field of view faster than a high quality stream can be requested, delivered and buffered. Thus, there is a need in the art for a system that allows the client to predict where the field of view might be pointed in a 360 degree video stream and fetch the corresponding high resolution video stream before the field of view has moved.
The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
The disadvantages associated with the prior art are overcome by aspects of the present disclosure relating to a method for pre-fetching 360 degree video comprising obtaining user view orientation metadata for a 360 degree video stream, pre-fetching frames determined by the user view orientation metadata, and displaying a higher resolution frame of the 360 degree video stream according to the user view orientation metadata.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
INTRODUCTION
Typically, streaming 360 degree video over a network involves receiving a set of video streams that are all of one quality. Newer streaming techniques reduce bandwidth usage by loading high quality streams only in the area where the viewer is focused. This technique has the additional effect of allowing the viewer to load higher resolution video streams without requiring as much time or buffer resources.
While the disclosed technique allows a viewer to watch higher quality video streams, a jarring drop in resolution may be experienced if the user suddenly moves the viewport away from the high quality stream. Aspects of the present disclosure have been developed to eliminate this jarring experience.
In other instances, an author of a 360 degree video may have some artistic vision as to what the viewer should see in a certain scene of a 360 degree video. According to prior art methods, such details in a scene might be lost to the viewer because they are displayed in low resolution video or because the viewer is looking in another direction. As such, aspects of the present disclosure have been developed to improve perceived 360 degree video stream quality and to allow authors to define a viewport and video quality for a viewer.
Author Initiated Pre-Fetching
Within 360 degree video there may be many orientations that a camera can take to view the image, as can be seen in
Authors of video and images typically have some idea of what they want the viewer of the content to see. Creators of 360 degree video are no different. As discussed above, 360 degree video in the prior art was displayed at only one resolution. At lower resolutions, important aspects of a video image might be lost to a viewer.
According to aspects of the present disclosure, an Author may define the location of a high resolution frame for the client to load 202. The Author may also define metadata 209 within the video streams, which the client can use to predictively load high resolution video content for streams that correspond to portions of the 360 degree video the user is likely to view. By way of example, and not by way of limitation, the metadata 209 may be in the form of a vector having both a magnitude, representing the significance, and a direction. In some implementations, time may be encoded beside the metadata 209, or the vector could be placed in a stream of fixed time-step intervals. The Author-defined metadata may also be referred to as user view orientation metadata.
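As a non-limiting sketch, Author-defined metadata 209 carried as a stream of fixed time-step intervals might be represented as a list of vectors, each with a direction and a magnitude, indexed by playback time. The Python below assumes the direction is a single yaw angle in degrees and the magnitude is a significance value between 0 and 1; `PrefetchVector`, `FixedStepTrack` and `step_s` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefetchVector:
    yaw_deg: float    # assumed: direction the Author wants the viewer to look (degrees)
    magnitude: float  # assumed: significance of that direction, 0.0 to 1.0


class FixedStepTrack:
    """Author metadata sampled at fixed time steps (illustrative only)."""

    def __init__(self, step_s: float, vectors: List[PrefetchVector]) -> None:
        if step_s <= 0 or not vectors:
            raise ValueError("need a positive step and at least one vector")
        self.step_s = step_s
        self.vectors = vectors

    def at(self, t_s: float) -> PrefetchVector:
        # The vector for time t is the one whose interval contains t;
        # times before the start or past the end are clamped.
        i = int(t_s // self.step_s)
        return self.vectors[max(0, min(i, len(self.vectors) - 1))]


track = FixedStepTrack(0.5, [PrefetchVector(0.0, 0.2), PrefetchVector(90.0, 0.9)])
print(track.at(0.7))  # PrefetchVector(yaw_deg=90.0, magnitude=0.9)
```

Because every entry covers the same interval, the lookup is a single division and no per-entry timestamp needs to be transmitted.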
In some implementations, the metadata may be generated on a backend or server side with no explicit sending of view orientation information by the client. By way of example and not by way of limitation, a server may build a probability field based on which streams the client requests and when, and then map which stream belongs to which viewport. This assumes the client will select the highest quality streams for the current client viewport.
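One possible way to build such a server-side probability field is to count, for each time bucket, how often clients requested the high quality stream belonging to each viewport, and then normalize the counts. The sketch below assumes the stream-to-viewport mapping has already been applied, so each request is reduced to a (time, viewport id) pair; `build_probability_field` and `bucket_s` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

ProbabilityField = Dict[int, Dict[int, float]]  # time bucket -> {viewport id: probability}


def build_probability_field(
    requests: Iterable[Tuple[float, int]],  # (request time in s, viewport of the high quality stream)
    bucket_s: float,
) -> ProbabilityField:
    """Estimate where viewers look over time from high quality stream requests."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for t_s, viewport in requests:
        counts[int(t_s // bucket_s)][viewport] += 1
    field: ProbabilityField = {}
    for bucket, per_viewport in counts.items():
        total = sum(per_viewport.values())
        field[bucket] = {vp: n / total for vp, n in per_viewport.items()}
    return field


requests = [(0.2, 1), (0.4, 1), (0.9, 2), (1.1, 3)]
field = build_probability_field(requests, bucket_s=1.0)
print(field[1])  # {3: 1.0}
```

This relies on the same assumption stated above, namely that clients request the highest quality streams for their current viewport.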
The metadata 209 may be associated with a potential for movement of the viewport 201 within the video streams by a user. Alternatively, metadata 209 in the form of an Author-defined pre-fetch vector may be an idealized movement vector for the viewer according to the Author's artistic vision. The client device may track both the actual movement of the viewport 210 and the metadata 209. The client device may pre-fetch frames in the actual viewport 201 and frames along the Author-defined pre-fetch vector 202, 203. Alternatively, the client may fetch high resolution frames 202, 203 only along an Author-defined vector to encourage the viewer to move the viewport to a different location in the 360 degree stream.
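To illustrate pre-fetching both the actual viewport and the frames along an Author-defined pre-fetch vector, the sketch below assumes the 360 degree image is split into `num_tiles` equal viewports around the horizon and that the vector points from the current yaw toward a target yaw. It returns the current viewport followed by the viewports crossed on the shorter arc to the target. The function name and the yaw-only tiling are assumptions made for illustration.

```python
from typing import List


def tiles_along_vector(current_yaw: float, target_yaw: float, num_tiles: int) -> List[int]:
    """Viewport ids from the current one to the target, following the shorter arc."""
    width = 360.0 / num_tiles
    start = int((current_yaw % 360.0) // width) % num_tiles
    end = int((target_yaw % 360.0) // width) % num_tiles
    # Step toward increasing (+1) or decreasing (-1) tile index, whichever reaches `end` sooner.
    step = 1 if (end - start) % num_tiles <= num_tiles // 2 else -1
    tiles = [start]
    while tiles[-1] != end:
        tiles.append((tiles[-1] + step) % num_tiles)
    return tiles


print(tiles_along_vector(10.0, 100.0, 8))  # [0, 1, 2]
print(tiles_along_vector(10.0, 300.0, 8))  # [0, 7, 6]
```

When the client fetches high resolution frames only along the Author-defined vector, it could simply skip the current viewport in this list.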
The Author-defined prefetching metadata need not be a vector. The Author may simply define frames they desire to be in high resolution during certain times of display 202. Thus the client may fetch the Author defined high resolution frames at certain times as defined by the author.
An author may also define a certain area of a frame as high-resolution for a zoom function. The Author may provide level of detail information for a subsection of the frame such that a certain subsection of a frame is encoded as high resolution. Metadata informing the client of the location of this high resolution subsection could be sent to the client so that it may pre-fetch that stream.
The metadata 209 may also be used to control the viewport 201 during display of video. The Author may choose to have the viewport 201 move along the metadata 209 without viewer input. In this way, virtual cameraman functionality can be achieved and the Author can better convey the artistic vision in a 360 degree display.
Production Defined Pre-Fetching
During the addition of effects to the frames of a video stream, also known as Production, it may be desirable for the client to pre-fetch high resolution frames to match production effects. For example and without limitation, it may be desirable for the client to pre-fetch high resolution frames in the apparent direction of a loud sound. Alternatively, during production it may be desirable for the client to pre-fetch frames in locations where there are many special effects or special camera movements.
According to aspects of the present disclosure the client may receive metadata that causes the client to pre-fetch certain frames as defined during production. These defined frames may correspond more to special effects and sound cues than artistic vision as discussed above. The Production defined pre-fetch metadata may also be referred to as user orientation metadata.
Predictive Prefetching
According to alternate aspects of the disclosure a client may use predictive metadata to pre-fetch streams determined to be in the potential future viewport as seen in
A studio may use screening data collected from viewers of a 360 degree video to generate a probabilistic model of where a viewer may be looking in the video at any time. This probabilistic model may define a user's likelihood to move from the current frame 201 to another frame 202 or to stay in the current frame 201, based on such variables as the current view orientation in the 360 degree video, the time code within the video, past views of the video, and the like. The probability of changing frames may be represented by probabilities tied to each frame in the 360 degree video currently displayed; this is represented by the percentages in each frame 201-208 seen in
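A minimal sketch of turning screening data into such a model, assuming each viewer's session has been reduced to one viewport id per fixed time step, is to count, for every (time step, current viewport) pair, which viewport the viewer occupied at the next step. Only the current view orientation and the time code are used as conditioning variables here; past views of the video are omitted for brevity. `estimate_transitions` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (time step, current viewport) -> {next viewport: probability}
TransitionModel = Dict[Tuple[int, int], Dict[int, float]]


def estimate_transitions(traces: List[List[int]]) -> TransitionModel:
    """traces[v][t] is the viewport screening viewer v looked at during time step t."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for trace in traces:
        for t in range(len(trace) - 1):
            counts[(t, trace[t])][trace[t + 1]] += 1
    model: TransitionModel = {}
    for key, nxt in counts.items():
        total = sum(nxt.values())
        model[key] = {vp: n / total for vp, n in nxt.items()}
    return model


traces = [[0, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 1]]
model = estimate_transitions(traces)
print(model[(0, 0)])  # {0: 0.75, 1: 0.25}
```

Each entry gives the kind of per-frame percentages described above for a given moment and starting viewport.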
Alternatively the predictive prefetching metadata may be negative or inverse/opposite data. In other words, instead of a probability of where a user is likely to look, the prefetching metadata may instead represent one or more probabilities of where the viewer is not likely to look.
The pre-fetching of video frames is not limited to high and low resolution streams. As seen in
To determine whether to pre-fetch a stream using this probabilistic metadata, the client may have a defined threshold probability level. When the probabilistic metadata indicates that the probability that a viewer will move the viewport into a certain frame exceeds the threshold probability level, the client will pre-fetch that frame. In an alternate embodiment, the client may pre-fetch frames based on the fidelity of the probabilistic metadata.
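A minimal sketch of the threshold test follows. It also shows one possible extension beyond two quality levels, in the spirit of claims 9 and 12 (a first threshold for high-resolution frames and a second threshold for intermediate-resolution frames). The assumption that the second threshold is lower than the first, the quality labels, and the function name are illustrative choices rather than details taken from the disclosure.

```python
from typing import Dict


def plan_prefetch(
    probabilities: Dict[int, float],  # viewport id -> probability the viewer moves there
    current: int,                     # viewport the viewer is in now
    high_threshold: float,            # assumed "first threshold"
    mid_threshold: float,             # assumed "second threshold", lower than the first
) -> Dict[int, str]:
    """Choose a resolution tier to pre-fetch for every viewport."""
    if not 0.0 <= mid_threshold <= high_threshold <= 1.0:
        raise ValueError("expected 0 <= mid_threshold <= high_threshold <= 1")
    plan = {current: "high"}  # the actual viewport is always fetched at high resolution
    for viewport, p in probabilities.items():
        if viewport == current:
            continue
        if p > high_threshold:
            plan[viewport] = "high"
        elif p > mid_threshold:
            plan[viewport] = "intermediate"
        else:
            plan[viewport] = "low"
    return plan


probs = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
plan = plan_prefetch(probs, current=0, high_threshold=0.3, mid_threshold=0.15)
print(plan)  # {0: 'high', 1: 'intermediate', 2: 'low', 3: 'low'}
```

Strict comparison is used because the text speaks of the probability exceeding the threshold.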
Predictive Fidelity Checking
The client may perform continuous or intermittent checking of the fidelity of the probabilistic and Author defined metadata. The client may initially pre-fetch high resolution frames based on probabilistic metadata and the actual orientation of the viewport 301. The client may then display high resolution frames in accordance with the orientation of the viewport and the metadata. The client may check to determine whether the viewer has moved the viewport to the high resolution frame in accordance with the probabilistic or author defined metadata.
Upon a determination that the viewport is not within the frame pre-fetched according to the probabilistic metadata or Author-defined pre-fetch vectors, the client may cease to use the metadata for pre-fetching and only fetch high resolution frames in the current field of view of the viewport. In an alternate embodiment, the client may continuously check the viewport movement against the metadata for a correlation. The client may have a tolerance level for viewport movements that do not follow the probabilistic metadata. By way of example, and not by way of limitation, this tolerance level may be a ratio of predicted frames missed to predicted frames viewed; in that instance, as the ratio grows toward 1, the client may shift to fetching only frames within the actual viewport. More generally, the tolerance level may be determined by statistically quantifying the amount of variation in a set of values. Any suitable measure of variability, e.g., standard deviation, may be applied over a fixed or sliding window of user data and metadata.
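The tolerance check above could be sketched as follows. The ratio of predicted frames missed is interpreted here as the fraction of recent predictions that were missed, so that it approaches 1 when the viewer ignores the metadata, and the population standard deviation of the yaw error over the same sliding window serves as the measure of variability. The class name, the yaw-only error, the miss criterion and the two limits are assumptions for illustration.

```python
from collections import deque
from statistics import pstdev
from typing import Deque


class FidelityChecker:
    """Sliding-window check of whether viewers follow the pre-fetch metadata (illustrative only)."""

    def __init__(self, window: int, tile_width_deg: float,
                 max_miss_ratio: float, max_error_std_deg: float) -> None:
        self.misses: Deque[bool] = deque(maxlen=window)   # True when a prediction was missed
        self.errors: Deque[float] = deque(maxlen=window)  # signed yaw error in degrees
        self.half_tile = tile_width_deg / 2.0
        self.max_miss_ratio = max_miss_ratio
        self.max_error_std_deg = max_error_std_deg

    def record(self, predicted_yaw: float, actual_yaw: float) -> None:
        # Wrap the difference into [-180, 180) so 350 vs 10 degrees counts as 20 degrees apart.
        error = (actual_yaw - predicted_yaw + 180.0) % 360.0 - 180.0
        self.errors.append(error)
        # Assumed: a miss means the viewport centre fell outside the predicted tile.
        self.misses.append(abs(error) > self.half_tile)

    def miss_ratio(self) -> float:
        return sum(self.misses) / len(self.misses) if self.misses else 0.0

    def use_metadata(self) -> bool:
        # Keep pre-fetching from metadata only while both measures stay within tolerance.
        spread = pstdev(self.errors) if len(self.errors) > 1 else 0.0
        return self.miss_ratio() <= self.max_miss_ratio and spread <= self.max_error_std_deg


checker = FidelityChecker(window=4, tile_width_deg=45.0, max_miss_ratio=0.5, max_error_std_deg=30.0)
checker.record(90.0, 95.0)
checker.record(90.0, 100.0)
print(checker.use_metadata())  # True
checker.record(90.0, 270.0)
checker.record(90.0, 250.0)
print(checker.miss_ratio(), checker.use_metadata())  # 0.5 False
```

Combining both measures matters: a viewer who consistently looks slightly off the prediction shows a low spread but a high miss ratio, while erratic movement shows the opposite.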
Post-Production Pre-Fetch Generation
Another aspect of the present disclosure is the generation of predictive metadata based on end-viewer data. The client device may collect data from the viewer as to the orientation of the viewport. The client device may use this viewer data to generate predictive data and pre-fetch a video stream according to the predictive data. The client device may also share the predictive data with other clients or a server to generate or refine probabilistic metadata for the video stream.
The client device may use viewport orientation data to generate predictive data. For example and not by way of limitation the client device may use a movement vector 305 of the viewport to predictively fetch high resolution video streams lying on the vector and taking into account movement speed. User generated predictive prefetching data may be referred to as user orientation metadata.
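As a non-limiting example, a client could estimate the angular speed of the viewport from two recent yaw samples, extrapolate over an assumed look-ahead time (for instance, the expected latency of fetching a high resolution stream), and pre-fetch every viewport the extrapolated movement crosses. The yaw-only tiling, the look-ahead parameter and the function name are assumptions.

```python
from typing import List


def predict_tiles(prev_yaw: float, curr_yaw: float, dt_s: float,
                  lookahead_s: float, num_tiles: int) -> List[int]:
    """Viewports crossed if the viewport keeps its current angular speed for `lookahead_s`."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    width = 360.0 / num_tiles
    delta = (curr_yaw - prev_yaw + 180.0) % 360.0 - 180.0  # shortest signed change
    travel = (delta / dt_s) * lookahead_s                  # degrees covered during the look-ahead
    # Sample the arc in steps narrower than one tile so no crossed tile is skipped.
    steps = int(abs(travel) // width) + 1
    tiles: List[int] = []
    for k in range(steps + 1):
        yaw = (curr_yaw + travel * k / steps) % 360.0
        tile = int(yaw // width) % num_tiles
        if tile not in tiles:
            tiles.append(tile)
    return tiles


print(predict_tiles(0.0, 10.0, dt_s=0.5, lookahead_s=5.0, num_tiles=8))  # [0, 1, 2]
```

Faster movement produces a longer arc and therefore more viewports to pre-fetch, which is one way of taking movement speed into account.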
The client may send the viewport orientation data to a data collection center to generate better probabilistic metadata for future viewing. The data collection center may use the viewport orientation data in addition to the Author-defined pre-fetching data to generate metadata for the video stream. The generated metadata may weight the Author-defined pre-fetching data such that prefetching of Author-defined streams is preferred unless the client is closely following a past user-defined viewing vector.
Server-Side Prefetching
In an alternate embodiment of the present disclosure, the server uses the metadata to select high resolution streams to send to the client device. According to aspects of the present embodiment, the metadata used by the server may be Author-defined, probabilistic data generated from viewer data, or other predictive data. The server may receive a request for a 360 degree video stream from a client device. The server may then check for viewer orientation metadata for the requested stream. Upon finding such metadata, the server may send high resolution streams to the client according to the metadata. The server may also receive actual viewer orientation data and send high resolution streams in view of the actual viewport. Furthermore, the server may perform predictive fidelity checking as discussed above to determine whether it should continue to send high resolution video streams based on the metadata.
Implementation
The computing device 400 may include one or more processor units 403, which may be configured according to well-known architectures, such as, e.g., single-core, dual-core, quad-core, multi-core, processor-coprocessor, cell processor, and the like. The computing device may also include one or more memory units 404 (e.g., random access memory (RAM), dynamic random access memory (DRAM), read-only memory (ROM), and the like).
The processor unit 403 may execute one or more programs, portions of which may be stored in the memory 404, and the processor 403 may be operatively coupled to the memory, e.g., by accessing the memory via a data bus 405. The programs may be configured to request frames for a video stream based on metadata 410 received for that video stream. The programs, when executed by the processor, may cause the system to decode high-resolution frames 408 and store frames potentially in the viewport of the viewer in a buffer 409.
The computing device 400 may also include well-known support circuits, such as input/output (I/O) circuits 407, power supplies (P/S) 411, a clock (CLK) 412, and cache 413, which may communicate with other components of the system, e.g., via the bus 405. The computing device may include a network interface 414. The processor unit 403 and network interface 414 may be configured to implement a local area network (LAN) or personal area network (PAN), via a suitable network protocol, e.g., Bluetooth, for a PAN. The computing device may optionally include a mass storage device 415 such as a disk drive, CD-ROM drive, tape drive, flash memory, or the like, and the mass storage device may store programs and/or data. The computing device may also include a user interface 416 to facilitate interaction between the system and a user. The user interface may include a keyboard, mouse, light pen, game control pad, touch interface, or other device. In some implementations, the user may use the interface 416 to change the viewport, e.g., by scrolling with a mouse or manipulating a joystick. In some implementations, the display 401 may be a hand-held display, as in a smart phone, tablet computer, or portable game device. In such implementations, the user interface 416 may include an accelerometer that is part of the display 401, and the processor 403 can be configured, e.g., through suitable programming, to detect changes in the orientation of the display and use this information to determine the viewport. The user can therefore change the viewport simply by moving the display.
In some implementations, the predictive metadata may be configured to implement a Markov chain.
To implement Author-defined pre-fetch metadata as described above with the Markov chain model, the system may weight the probability of transition to states lying along an Author-defined pre-fetch vector. Thus, if the probability of movement plus the weight surpasses a threshold, the client will initiate prefetching of that state.
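A minimal sketch of this weighting, assuming the Markov chain states are viewport ids and the transition probabilities out of the current state are already available (for example from screening data), adds a fixed weight to every state lying along the Author-defined pre-fetch vector and pre-fetches each state whose weighted score exceeds the threshold. The function name, the additive weight and the ordering of the result are assumptions.

```python
from typing import Dict, List, Set


def markov_prefetch(
    transitions: Dict[int, Dict[int, float]],  # state -> {next state: transition probability}
    current: int,
    author_states: Set[int],                   # states lying along the Author-defined pre-fetch vector
    author_weight: float,
    threshold: float,
) -> List[int]:
    """States to pre-fetch, highest weighted score first (ties broken by state id)."""
    row = transitions.get(current, {})
    scores: Dict[int, float] = {}
    for state in set(row) | author_states:
        bonus = author_weight if state in author_states else 0.0
        scores[state] = row.get(state, 0.0) + bonus
    chosen = [s for s, score in scores.items() if score > threshold]
    return sorted(chosen, key=lambda s: (-scores[s], s))


transitions = {0: {0: 0.70, 1: 0.20, 2: 0.05, 7: 0.05}}
print(markov_prefetch(transitions, 0, set(), 0.2, 0.15))  # [0, 1]
print(markov_prefetch(transitions, 0, {2}, 0.2, 0.15))    # [0, 2, 1]
```

In the second call, state 2 has only a 5% transition probability on its own but crosses the threshold once the Author weight is added.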
According to aspects of the present disclosure, the client may receive the viewer orientation metadata for a requested video stream independently of the video stream. If the metadata is received independently of the video stream, it must be time-synched to the stream. In alternate aspects of the present disclosure, the metadata may be received as part of the video stream itself, by way of example and not by way of limitation, in the header of the 360 degree video stream.
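When the metadata arrives separately from the video, one way to time-sync it is to store each entry with its own timestamp, shift it by an offset between the metadata clock and the stream's presentation time, and look up the latest entry at or before the current playback time. The sketch below assumes each entry simply names a recommended viewport; the class name and the `offset_s` parameter are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


class TimedMetadata:
    """Independently received view orientation metadata, time-synched to a stream (illustrative)."""

    def __init__(self, entries: List[Tuple[float, int]], offset_s: float = 0.0) -> None:
        # entries: (metadata timestamp in seconds, recommended viewport id)
        self.entries = sorted(entries)
        self.times = [t for t, _ in self.entries]
        self.offset_s = offset_s  # assumed: stream time = metadata time + offset_s

    def at(self, stream_time_s: float) -> Optional[int]:
        """Latest entry at or before the given stream time, or None before the first entry."""
        i = bisect.bisect_right(self.times, stream_time_s - self.offset_s) - 1
        return self.entries[i][1] if i >= 0 else None


meta = TimedMetadata([(0.0, 0), (2.0, 3), (5.0, 6)], offset_s=1.0)
print(meta.at(0.5), meta.at(3.5), meta.at(10.0))  # None 3 6
```

For metadata carried in the stream header, the offset would typically be zero, since the metadata then shares the stream's timeline.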
While the above is a complete description of the preferred embodiments of the present invention, it is possible to use various alternatives, modifications, and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature, whether preferred or not, may be combined with any other feature, whether preferred or not. In the claims that follow, the indefinite article “A” or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for”. Any element in a claim that does not explicitly state “means for” performing a specified function, is not to be interpreted as a “means” or “step” clause as specified in 35 USC § 112, ¶6.
Claims
1. A method comprising:
- a) obtaining view orientation metadata for a 360 degree video stream that includes data for a plurality of viewports;
- b) pre-fetching data corresponding to one or more high-resolution frames for a particular viewport of the plurality of viewports determined by the user view orientation metadata;
- c) displaying the one or more high-resolution frames, wherein the one or more high resolution frames are characterized by a higher resolution than for remaining viewports of the plurality of viewports;
- d) checking the user view orientation metadata against an actual user view orientation; and disabling pre-fetching when the user view orientation metadata does not coincide with the actual user view orientation during display.
2. The method of claim 1 wherein the user view orientation metadata is an Author-defined pre-fetch vector.
3. The method of claim 1 wherein the user view orientation metadata is a pre-fetch vector created to match production effects in a scene of the 360 degree video stream.
4. The method of claim 1 wherein the view orientation metadata is a predictive probability of a viewport frame change.
5. The method of claim 4 wherein the predictive probability is generated by a focus group.
6. The method of claim 4 wherein the predictive probability is generated using user generated frame location data.
7. The method of claim 4 wherein the predictive probability also includes an Author defined pre-fetch vector.
8. The method of claim 7 wherein the Author defined pre-fetch vector is a weight applied to the probability of a viewport frame change.
9. The method of claim 4 wherein said pre-fetching frames includes applying a first threshold to the user view orientation metadata to determine the particular viewport of the plurality of viewports.
10. The method of claim 4 wherein the predictive probability of a viewport frame change is a probability for a Markov model.
11. The method of claim 1 wherein the user view orientation metadata is a movement vector.
12. The method of claim 9 wherein b) includes prefetching intermediate resolution frames when the predictive probability exceeds a second threshold.
13. The method of claim 1, wherein the user view orientation metadata represents a probability of where a user is likely to look.
14. The method of claim 1, wherein the user view orientation metadata represents a probability of where a user is not likely to look.
15. The method of claim 13 wherein disabling pre-fetching when the user view orientation metadata does not coincide with the actual user view orientation during display includes applying a missed frame threshold to determine whether to disable prefetching.
16. The method of claim 1 wherein prefetching data corresponding to one or more high-resolution frames determined by the user view orientation metadata includes pre-fetching a frame that has a higher resolution section than the rest of the frame.
17. A non-transient computer readable medium containing program instructions for causing a computer to perform the method of:
- a) obtaining a user view orientation metadata for a 360 degree video stream that includes data for a plurality of viewports;
- b) pre-fetching data corresponding to one or more high-resolution frames for a particular viewport of the plurality of viewports determined by the user view orientation metadata;
- c) displaying the one or more high-resolution frames, wherein the one or more high resolution frames are characterized by a higher resolution than for remaining viewports of the plurality of viewports;
- d) checking the user view orientation metadata against an actual user view orientation; and disabling pre-fetching when the user view orientation metadata does not coincide with the actual user view orientation during display.
18. A system, comprising:
- a processor;
- a display coupled to the processor;
- a memory coupled to the processor having processor-executable instructions embodied therein, the instructions being configured to implement a method upon execution by the processor, the method comprising:
- a) obtaining a user view orientation metadata for a 360 degree video stream that includes data for a plurality of viewports;
- b) pre-fetching data corresponding to one or more high-resolution frames for a particular viewport of the plurality of viewports determined by the user view orientation metadata;
- c) displaying the one or more high-resolution frames with the display, wherein the one or more high resolution frames are characterized by a higher resolution than for remaining viewports of the plurality of viewports;
- d) checking the user view orientation metadata against an actual user view orientation; and disabling pre-fetching when the user view orientation metadata does not coincide with the actual user view orientation during display.
19. The system of claim 18 wherein the display is a 360 degree display configured to display the plurality of viewports simultaneously.
Type: Grant
Filed: Apr 6, 2017
Date of Patent: Aug 20, 2019
Patent Publication Number: 20180295205
Assignee: Sony Interactive Entertainment (Tokyo)
Inventors: Erik Beran (Belmont, CA), Todd Tokubo (Newark, CA)
Primary Examiner: Khanh Q Dinh
Application Number: 15/481,324
International Classification: G06F 15/16 (20060101); H04L 29/08 (20060101); H04L 29/06 (20060101);