SYSTEMS AND METHODS FOR DEFERRED POST-PROCESSES IN VIDEO ENCODING
Systems and methods for deferring post-process effects in video encoding are disclosed. The systems and methods are capable of calculating the capability of client hardware to defer load, and summing a known load of one or more deferral candidates to evaluate how many post-process deferral candidates are capable of being deferred to client hardware. The systems and methods are also capable of sending an updated deferral list of post-processes to a remote server that can apply the list of deferred post-process candidates during the post-processing phase of a first video frame.
This Application is a continuation of U.S. application Ser. No. 15/958,574 filed Apr. 20, 2018 which claims the benefit of the following U.S. Provisional Applications: No. 62/488,526, filed Apr. 21, 2017, and No. 62/618,498, filed Jan. 17, 2018. The disclosures of each of those applications are incorporated herein.
BACKGROUND OF THE INVENTION
Remote gaming applications, in which a server-side game is controlled by a client-side player, have attempted to encode the video output from a three-dimensional (3D) graphics engine in real-time using existing or customized encoders. However, the interactive nature of video games, particularly the player feedback loop between video output and player input, makes game video streaming much more sensitive to latency than traditional video streaming. Existing video coding methods can trade computational power, and little else, for reductions in encoding time. New methods for integrating the encoding process into the video rendering process can provide significant reductions in encoding time while also reducing computational power, improving the quality of the encoded video, and retaining the original bitstream data format to preserve interoperability of existing hardware devices.
Typical video rendering pipelines are separate and independent from video encoding pipelines, with little crossover between process and expertise in the two domains. As a result, some of the visual effects and post-processes applied in the video rendering pipeline are counterproductive to the video encoding process, leading to video artifacting, increased encoded video size, and longer encoding times. However, these visual effects are still desirable in the resultant decoded video.
By integrating video rendering and video encoding pipelines, post-process effects can be deferred to improve the encoding process. For example, simulated filmic grain introduces randomly-occurring animated grain that is difficult for typical encoders to process without a significant cost to video quality or compression ratio. Some video encoding methods attempt to remove this additional visual noise before encoding, but these methods are offline-only and computationally costly. By disabling this specific post-process in the rendering pipeline, the video automatically becomes easier to encode. The post-process can then be applied after the video is decoded. In the case of filmic grain, compositing the grain over the decoded video is not computationally demanding, can be done in real-time at the decoder, and may improve subjective video quality by disguising other encoding artifacts.
International Patent Application No. WO2016172314 A1 (“the '314 Application”) discloses systems and methods directed to artistic intent based content coding. A coding user interface permits a user to specify an artistic set and to configure treatment of pixels and/or blocks associated with an artistic set, such as a fidelity enhancement, QP adjustment value and/or post-processing. Examples of artistic intent that may be added to video output include the case in which an encoder removes the film grain from the original signal before encoding and uses the film grain SEI to convey to the decoder how to regenerate the film grain and add it back to the video signal before it is displayed. The present invention may be distinguished from the '314 Application at least because the '314 Application does not disclose disabling specific post-processes in the rendering pipeline prior to encoding and then applying those post-processes after the video is decoded. As a consequence, the present invention is an improvement to this computer technology because it offers improved encoding and decoding of video data without a significant cost to video quality or compression ratio. The present invention is also an improvement because it improves the resulting bandwidth, bitrate, and encoding time, and is capable of being used in real-time video streaming applications with improved video quality.
U.S. Pat. No. 9,609,330 (“the '330 Patent”) discloses content adaptive entropy coding of modes and reference types data, meaning that a pre-analyzer subsystem of the encoder analyzes content to compute various types of parameters useful for improving video coding efficiency and speed performance. These parameters include horizontal and vertical gradient information (Rs, Cs), variance, spatial complexity per picture, temporal complexity per picture, scene change detection, motion range estimation, gain detection, prediction distance estimation, number of objects estimation, region boundary detection, spatial complexity map computation, focus estimation, and film grain estimation. The parameters generated by the pre-analyzer subsystem can then be consumed by the encoder or be quantized and communicated to the decoder. The present invention may again be distinguished from the technology disclosed in the '330 Patent at least because that technology does not disable specific post-processes in the rendering pipeline prior to encoding and then apply those post-processes after the video is decoded. The present invention is therefore an improvement to the computer technology of the '330 Patent because it offers improved encoding and decoding of video data without a significant cost to video quality or compression ratio and because it is capable of being used in real-time video streaming applications with improved video quality.
U.S. Pat. No. 9,762,911 (“the '911 Patent”) discloses systems and methods for techniques related to content adaptive prediction and entropy coding of motion vectors. The technology disclosed allows for a first video data and second video data to be received for entropy encoding at an entropy encoder module. The first video data and the second video data may be different data types (e.g., header data, morphing parameters, synthesizing parameters, or global maps data or motion vectors or intra-prediction partition data or so on, as is discussed further herein). A first entropy encoding technique may be determined for the first video data based on a parameter associated with the first video data such as, for example, a number of compressed bits of the first video data, a predetermined indicator or flag associated with the first video data, a predetermined threshold, or a heuristically determined threshold or the like. In some examples, the first entropy encoding technique may be chosen from one of an adaptive symbol-run variable length coding technique or an adaptive proxy variable length coding technique. The first video data may be entropy encoded using the first entropy encoding technique and the second video data may be entropy encoded using the first entropy encoding technique. Once more, the present invention is distinguishable at least because the technology disclosed in the '911 Patent does not involve selectively disabling post-processes in the rendering pipeline prior to encoding and then applying those post-processes after the video is decoded. Once again, the present invention is an improvement to the computer technology of the '911 Patent because it offers improved encoding and decoding of video data without a significant cost to video quality or compression ratio.
The present invention is also an improvement because it improves the resulting bitrate and encoding time, and is capable of being used in real-time video streaming applications with improved video quality.
As is apparent from the above discussion of the state of the art in this technology, there is a need in the art for an improvement to the present computer technology related to video encoding in game environments.
SUMMARY OF THE INVENTION
It is therefore an object of the exemplary embodiments disclosed herein to address disadvantages in the art and provide systems and methods for reducing latency and encoding times through techniques in which a server transmits an instruction to a client application to measure client hardware capability and transmits an instruction to a client application to sum a known load of one or more predetermined post-process deferral candidates to evaluate how many post-process deferral candidates are capable of being deferred to client hardware. At the client application, the post-process deferral list is compiled and built in reverse order. The server then receives the post-process deferral list, skips the list of deferred post-processes during the post-processing phase of a first video frame, and transmits an instruction to a client application to render an image.
It is another object of the invention to provide systems and methods for reducing latency and encoding times by having a client application perform a callback or poll to one or more operating system events to determine whether to re-measure the capability of the client hardware.
It is yet another object of the invention to provide systems and methods for reducing latency and encoding times by measuring the capability of the client hardware by detecting available instruction sets, memory, CPU, and/or GPU characteristics.
It is yet another object of the invention to provide systems and methods for reducing latency and encoding times by evaluating how many post-process deferral candidates are capable of being deferred to the client hardware by measuring frame rate and/or resource usage.
A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
In describing the preferred embodiments of the invention illustrated in the drawings, specific terminology will be resorted to for the sake of clarity. However, the invention is not intended to be limited to the specific terms so selected, and it is to be understood that each specific term includes all technical equivalents that operate in a similar manner to accomplish a similar purpose. Several preferred embodiments of the invention are described for illustrative purposes, it being understood that the invention may be embodied in other forms not specifically shown in the drawings.
Post-processing pipelines can perform many complex processes including anti-aliasing, motion blur, depth of field, color grading, bloom, filmic grain, chromatic aberration, vignetting, and tone mapping. Some of these effects are actively detrimental to the encoding processes, increasing encoding times and reducing compression ratios when compared to unprocessed frames. Waiting to apply certain post-processes until after a frame is decoded may increase subjective video quality and provide additional beneficial tradeoffs.
During the development of the client application, the balance between encoding time, bandwidth, and subjective quality should be evaluated for each post-process in the rendering pipeline to determine which post-processes are good candidates for deferral. The list of deferral candidates will be used by the client during runtime to determine which of the post-processes can be deferred onto the client.
Each post-process should be tested to measure its effects on the encoding process. First, a series of reference frames should be fed through the unaltered rendering and encoding pipelines and the encoding time and encoded frame size should be measured. The post-processes should be turned off one at a time in the rendering pipeline and the encoding time and encoded frame size should be compared to the control results. These measurements will help inform which post-processes are good candidates for deferral. Almost any post-process effect that increases image entropy, measured by an increased encoded frame size, would likely be a good candidate to defer. For example, a simulated filmic grain post-process adds random noise over an image leading to lower compression ratios. In certain situations, chromatic aberration and bloom may increase image entropy and result in lower compression ratios. Almost any post-process effect that reduces entropy or image data should not be deferred, as entropy reductions will reduce encoding overhead.
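The comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `EncodeStats` record, the `classify_post_processes` helper, and the 2% tolerance are all hypothetical, and the measurements are assumed to have been collected beforehand from the reference frames.

```python
from dataclasses import dataclass


@dataclass
class EncodeStats:
    encode_ms: float    # mean encoding time per reference frame
    frame_bytes: float  # mean encoded frame size


def classify_post_processes(control, with_disabled, tolerance=0.02):
    """Label each post-process by its effect on encoded frame size.

    control: EncodeStats for the unaltered pipeline (hypothetical record).
    with_disabled: dict mapping a post-process name to the EncodeStats
        measured with only that post-process turned off.
    tolerance: assumed relative change below which an effect is treated
        as entropy-neutral.
    """
    labels = {}
    for name, stats in with_disabled.items():
        saved = (control.frame_bytes - stats.frame_bytes) / control.frame_bytes
        if saved > tolerance:
            labels[name] = "defer"    # disabling shrank frames: effect adds entropy
        elif saved < -tolerance:
            labels[name] = "keep"     # disabling grew frames: effect reduces entropy
        else:
            labels[name] = "neutral"  # needs the secondary tests (e.g. server load)
    return labels
```

Only encoded frame size drives the label here; encoding time is carried in the record so that the same data can inform the secondary evaluation of entropy-neutral effects.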
Post-processes that do not change the image entropy may be selected as deferral candidates to achieve secondary goals such as subjective video quality improvement or server load reduction. For example, color grading may not impact encoding time or bandwidth usage but may result in a measurable reduction in server-side computational load when deferred to the client. Similarly, anti-aliasing may improve subjective video quality and dramatically reduce server-side load when deferred. Additional testing should be performed to determine whether it is beneficial to defer entropy-neutral post-processes. For example, a similar testing procedure using reference frames can be used to compare server load before and after deferring an entropy-neutral post-process.
The client application should be able to run the post-processing calculations for each deferral candidate. Some code refactoring may be necessary to move these functions into the client application. Post-processes at the end of the rendering pipeline, such as filmic grain, chromatic aberration, or vignetting, are generally easier to move to the client application than those that occur earlier in the rendering pipeline such as anti-aliasing or depth of field. There may be some cases where deferring a post-process will cause it to be applied in non-linear space when it is typically applied in linear space, such as those post-processes that are applied before tone mapping like chromatic aberration, bloom, or vignetting. The process may be applied directly in gamma space and may not be mathematically correct, but the difference may be imperceptible to the viewer and overall subjective quality may be improved. Otherwise, at the cost of some client-side compute cycles and a loss of image quality, the client application can convert the image back into linear space, apply the post-process, then convert back to gamma space. Converting back to linear space will sacrifice some quality because the image will have been quantized and compressed during encoding. These subjective quality decisions should be made during the development of the client application.
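The round trip through linear space might look like the following sketch. It assumes the decoded frame uses the standard sRGB transfer function as its "gamma space", which the description does not specify; `apply_in_linear` and the `darken` effect are hypothetical names used only for illustration, and channel values are floats in the range 0 to 1.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel value (0..1) to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode one linear-light channel value (0..1) back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def apply_in_linear(pixel, effect):
    """Convert a gamma-space pixel to linear, apply effect, convert back."""
    linear = [srgb_to_linear(c) for c in pixel]
    processed = effect(linear)
    # Clamp before re-encoding so the effect cannot push values out of range.
    return [linear_to_srgb(min(1.0, max(0.0, c))) for c in processed]


def darken(linear_pixel):
    """Hypothetical stand-in for a vignette: halve the linear intensity."""
    return [0.5 * c for c in linear_pixel]
```

Applying `darken` directly to the gamma-space values would produce a visibly different result than `apply_in_linear(pixel, darken)`; whether that difference matters is the subjective quality decision described above.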
All of the post-processes that the client application can perform form the basis of the deferral candidate list. The list of deferral candidates should be in the same order as they appear in the rendering pipeline in order to preserve any dependencies. Each deferral candidate in the list should also be paired with hardware feature requirements such as memory minimums or GPU requirements.
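One possible representation of such a list is sketched below. The `DeferralCandidate` record, its fields, the example entries, and their requirement values are all hypothetical; the description only states that candidates are kept in rendering-pipeline order and paired with hardware requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferralCandidate:
    name: str
    min_memory_mb: int = 0      # assumed form of a "memory minimum"
    requires_gpu: bool = False  # assumed form of a "GPU requirement"


# Listed in the same order as the rendering pipeline (illustrative values).
DEFERRAL_CANDIDATES = [
    DeferralCandidate("anti_aliasing", min_memory_mb=512, requires_gpu=True),
    DeferralCandidate("chromatic_aberration", min_memory_mb=256, requires_gpu=True),
    DeferralCandidate("vignetting", min_memory_mb=64),
    DeferralCandidate("filmic_grain", min_memory_mb=64),
]


def supported_candidates(candidates, memory_mb, has_gpu):
    """Drop candidates whose requirements the client does not meet,
    preserving pipeline order."""
    return [
        c for c in candidates
        if memory_mb >= c.min_memory_mb and (has_gpu or not c.requires_gpu)
    ]
```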
The remote client computer system 116 is capable of running a client-side codec 118 to decode the transmitted encoded video data 108 and a client application 120 to apply the deferred pixel-quality post-processes. The client computer system 116 also contains a display controller 122 to drive display hardware 124. The input from the client-side input peripherals 126 will be converted by the client application 120 into control data 128 which is transmitted back to the game software 102 running on the server 100. Based on the specific implementation of deferred pixel-quality post-processing, some additional control data 128 may need to flow from the server-side software 102 to the client application 120 to ensure that the correct post-processes are applied for a given video frame.
At step 202, at the video rendering post-processing phase at the graphics engine 104 of the server 100, any post-processes that will be deferred should be skipped. Any number of post-processes may be skipped if the client computer system 116 has the computational power required to apply all of the deferred post-processes.
At step 204, the resulting frame is encoded at the codec 106. Based on the selection of deferred post-processes, the encoding time may be faster and the encoded data may require less bandwidth. For example, if a filmic grain post-process is deferred, the codec 106 will have an easier time encoding the frame without the introduced noise.
At step 206, the encoded video data 108 is stored or transmitted to the remote client computer system 116, as necessary. In a real-time video game streaming application, as in the example from
At step 208, the encoded video is decoded at the codec 118 of the remote client computer system 116. No alterations need to be made to the decoding process.
At step 210, a software application will apply any post-processes that were deferred in the same order that they would appear in the rendering pipeline. This software, shown in
When the client 116 receives the encoded video frame, it is decoded at “DECODING,” step 314. At this point, all deferred post-processes are applied to the decoded frame at “DEFERRED POST-PROCESSING,” step 316. In the case of filmic grain, for instance, an animated effect can be cached ahead of time and composited over the decoded frame with a relatively low computational cost. Real-time solutions already exist for color grading, dithering, and sharpening video and could be applied based on a client's computational power. The resulting video frame is displayed at “DISPLAY,” step 318.
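The cached-grain approach mentioned above can be illustrated with a small sketch. It assumes 8-bit grayscale frames stored as nested lists and simple additive grain; `make_grain_cache` and `composite_grain` are hypothetical helpers, and a real client would more likely perform this compositing on the GPU.

```python
import random


def make_grain_cache(width, height, frames, strength, seed=0):
    """Precompute a short loop of grain frames ahead of time.

    Each grain value is a signed offset in [-strength, strength].
    """
    rng = random.Random(seed)
    return [
        [[rng.randint(-strength, strength) for _ in range(width)]
         for _ in range(height)]
        for _ in range(frames)
    ]


def composite_grain(frame, grain_cache, frame_index):
    """Add the cached grain for this frame index over a decoded frame,
    clamping the result to the 8-bit range."""
    grain = grain_cache[frame_index % len(grain_cache)]
    return [
        [min(255, max(0, pixel + offset)) for pixel, offset in zip(row, grain_row)]
        for row, grain_row in zip(frame, grain)
    ]
```

Because the noise is generated once and replayed in a loop, the per-frame cost is a single addition and clamp per pixel.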
At step 400, the client application 120 will measure the capability of the client hardware to determine which post-processes can be deferred onto the client 116. Client capability can be measured through feature detection for hardware information such as available instruction sets, memory, CPU, or GPU.
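A rough sketch of feature detection using only the Python standard library is shown below. It is an assumption-laden illustration: `measure_client_capability` is a hypothetical helper, the physical-memory query relies on `os.sysconf`, which is only available on some POSIX systems, and instruction-set and GPU detection are omitted because they require platform-specific APIs.

```python
import os
import platform


def measure_client_capability():
    """Collect coarse hardware information about the client machine."""
    capability = {
        "cpu_count": os.cpu_count() or 1,  # cpu_count() may return None
        "machine": platform.machine(),     # architecture string; may be empty
        "memory_bytes": None,              # stays None where it cannot be read
    }
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        capability["memory_bytes"] = pages * page_size
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing or does not know these names on this platform.
        pass
    return capability
```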
At step 402, the client application 120 reads the list of deferral candidates and discards any deferral candidates for which hardware requirements are not met by the client. The deferral candidate post-processes should be benchmarked on the client to measure real-time client performance. The deferral candidates are added one at a time to the benchmarking process until the client is no longer able to maintain desirable performance measured by frame rate, resource usage, or some other live measurement. Benchmarking can be done during client application installation, during the initial load of the client application 120, or on each load of the application.
The client 116 should do the post-processing for as many deferral candidates as possible. The deferral list is built in reverse order to keep the overall order of operations close to the original rendering pipeline order. For example, a mobile device may be capable of running only the last post-processes, a laptop may be capable of running the three post-processes at the end of the rendering pipeline, while a new desktop computer may be capable of running all post-processes in the deferral candidate list.
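Building the deferral list in reverse order can be sketched as follows. The sketch assumes each candidate carries a known load and that client capability is expressed as a single capacity number in the same units; both are simplifications, and `build_deferral_list` is a hypothetical helper name.

```python
def build_deferral_list(candidates, capacity):
    """Select the longest affordable run of post-processes from the end
    of the pipeline.

    candidates: list of (name, load) pairs in rendering-pipeline order.
    capacity: total load the client can absorb (assumed unit).
    Returns the deferred names in pipeline order.
    """
    deferred = []
    total_load = 0.0
    # Walk backward from the last post-process in the pipeline.
    for name, load in reversed(candidates):
        if total_load + load > capacity:
            break  # stop at the first candidate that does not fit
        total_load += load
        deferred.append(name)
    deferred.reverse()  # restore pipeline order for application on the client
    return deferred
```

Stopping at the first candidate that does not fit, rather than skipping it and continuing, keeps the deferred set a contiguous tail of the pipeline, so the server-side effects still run before every client-side effect.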
At step 404, the client 116 sends the list of post-processes to defer to the server 100.
At step 202, the server 100 uses the list of deferred post-processes during the post-processing phase of the first video frame. All post-processes on the deferral list are skipped.
At step 206, the encoded video data stream 108 starts transmitting to the client 116. Since the client 116 sent the deferral list before any frames were generated, no additional metadata needs to be sent from the server 100 to the client 116. The client 116 will automatically know which post-processes have been deferred.
At step 210, the client 116 applies all of the post-processes in the deferral list. The client 116 will continue to apply the deferred post-processes on future frames.
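The division of work between steps 202 and 210 can be illustrated with a short sketch. The pipeline contents and the helper names are hypothetical; the point is only that the server skips exactly the effects on the deferral list and the client applies them in the original pipeline order.

```python
# Hypothetical rendering-pipeline order, for illustration only.
PIPELINE = ["anti_aliasing", "bloom", "tone_mapping", "vignetting", "filmic_grain"]


def server_post_processes(pipeline, deferral_list):
    """Post-processes the server still runs (step 202): all but the deferred."""
    deferred = set(deferral_list)
    return [name for name in pipeline if name not in deferred]


def client_post_processes(pipeline, deferral_list):
    """Post-processes the client runs after decoding (step 210),
    in pipeline order regardless of the order of deferral_list."""
    deferred = set(deferral_list)
    return [name for name in pipeline if name in deferred]
```

When the deferral list is a tail of the pipeline, concatenating the server's list and the client's list reproduces the original pipeline order.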
There may be scenarios in which the client capabilities change during runtime, requiring the list of deferred post-processes to change. For example, if the client application is running on a mobile device that just entered battery-saving mode, the client may wish to shrink the list of deferred post-processes. In this example, the client application would need to register a callback or poll the operating system (“OS”) events in order to listen for changes in battery state. At step 412, the client application 120 responds to a recent environment change by re-measuring the client capability. An environment change may be any change that affects the hardware performance of the remote client computer system 116. In the above example, battery-saving mode will reduce the clock rate by a multiplier which can be fetched directly. The change in clock rate multiplier may provide a rough estimate for the change in client capability. Otherwise, an additional benchmark in battery-saving mode may be added to the benchmark phase described at step 402.
If the client capability has changed, the client application 120 will reevaluate which post-processes can be deferred at step 414. In the battery-saving mode example, the deferral list may shrink in proportion to the change of the clock rate multiplier. For example, if battery-saving mode reduces the clock rate by 50%, the deferral list will shrink by at least half. If the client 116 was deferring four post-processes, it will shrink the list down to two post-processes. Otherwise, if a battery-saving mode benchmark has previously been performed, the deferral list will already be known.
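The proportional shrink can be written as a small helper. This is a sketch under the assumption that the clock rate multiplier is available as a fraction between 0 and 1 and that the list is stored in pipeline order; `shrink_deferral_list` is a hypothetical name.

```python
import math


def shrink_deferral_list(deferral_list, clock_multiplier):
    """Keep the tail of the deferral list in proportion to the clock rate.

    deferral_list: deferred post-process names in pipeline order.
    clock_multiplier: new clock rate as a fraction of the original (0..1).
    """
    multiplier = min(1.0, max(0.0, clock_multiplier))
    # Round down so the list shrinks by at least the stated proportion.
    keep = math.floor(len(deferral_list) * multiplier)
    # Slice from an explicit start index; a [-keep:] slice would return
    # the whole list when keep is 0.
    return deferral_list[len(deferral_list) - keep:]
```

With four deferred post-processes and a multiplier of 0.5, the helper keeps the last two, matching the example above. Dropping entries from the front of the list hands the earliest effects back to the server, so the client's remaining work is still the end of the pipeline.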
If the deferral list is changed, the client 116 will send the altered deferral list to the server 100 at step 416. The client 116 will continue to apply the post-processes from the original deferral list until it receives a message from the server 100, which is preferably comprised of a different list of deferred post-processes.
At step 418, the server 100 applies the altered deferral list on the next available frame. In order to sync with the client 116, some metadata is applied to this frame.
At step 420, the frame of encoded video data 108 is sent with its corresponding metadata.
The client waits until it receives the metadata flag. At step 422, the client 116 begins processing frames according to the altered deferral list. The client 116 will continue to apply the deferred post-processes according to the altered deferral list. If the runtime environment changes again, the deferral list can grow or shrink again, starting from step 412. If the deferral list shrinks due to a temporary runtime environment change, such as battery-saving mode on a mobile device, the client application 120 should grow the deferral list at the earliest opportunity such that the maximum possible number of post-processes are deferred at any given time.
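The client side of this handshake can be modeled as a small state holder. The sketch assumes the metadata is a single boolean flag per frame, which is a simplification; the `DeferralSync` class and its method names are hypothetical.

```python
class DeferralSync:
    """Tracks which deferral list the client should apply to each frame."""

    def __init__(self, active_list):
        self.active = list(active_list)  # list currently applied to frames
        self.pending = None              # altered list awaiting server confirmation

    def request_change(self, new_list):
        """Record the altered list that is sent to the server (step 416)."""
        self.pending = list(new_list)

    def on_frame(self, has_metadata_flag):
        """Return the deferral list to apply to the frame just received.

        The original list stays in effect until a frame arrives carrying
        the metadata flag (steps 420 and 422).
        """
        if has_metadata_flag and self.pending is not None:
            self.active = self.pending
            self.pending = None
        return self.active
```

Frames that were already encoded with the old list when the request was sent arrive without the flag, so they are still processed with the original list and no post-process is applied twice or omitted.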
Example 1: Benchmark Test Results
Filmic grain introduces randomly-occurring visual noise that causes a significant impact on the encoder's compression ratio. Applying post-processes such as filmic grain on the client side will result in smaller encoded frame sizes.
Experimental bitrate values were taken while the graphics engine was producing output at a resolution of 1280×720 at 60 frames per second and averaged over 60 frames to find an average bitrate. The measured values compare the bitrate of a video stream where filmic grain is applied at the server-side to the bitrate of a video stream where filmic grain is deferred to the client. These measurements are repeated for three different sizes of filmic grain and for two encoder quality settings. Filmic grain 1 represents the smallest grain size while filmic grain 3 represents the largest grain size. The experimental results are reproduced in Tables 1 and 2 below. Table 1 shows the results using an encoder quality of 16, while Table 2 shows the results using an encoder quality of 20.
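For reference, an average bitrate of this kind can be computed from per-frame encoded sizes as in the sketch below. The helper name is hypothetical, and the conversion assumes 1 Kbyte equals 1000 bytes, which the description does not state.

```python
def average_bitrate_kbyte_per_s(frame_sizes_bytes, frames_per_second=60.0):
    """Average bitrate over a window of encoded frames.

    frame_sizes_bytes: encoded size of each frame in the window.
    frames_per_second: output frame rate of the graphics engine.
    """
    mean_frame_bytes = sum(frame_sizes_bytes) / len(frame_sizes_bytes)
    return mean_frame_bytes * frames_per_second / 1000.0  # assumes 1 Kbyte = 1000 bytes
```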
Based on experimental results, it is apparent that post-processes such as filmic grain result in larger encoded frame sizes, which is undesirable. These negative effects are more apparent at higher encoder quality values and become even more apparent as the amount of introduced noise increases. However, by deferring filmic grain to the client, dramatic reductions in bitrate can be achieved, as shown in Tables 1 and 2, where the bitrate is reduced to 270 Kbyte/s and 140 Kbyte/s, respectively. Regardless of the amount of introduced noise, as measured by the size of the filmic grain in these experiments, the bitrate remains stable for a given encoder quality.
Similarly, as shown in Table 3 below, experimental encoding times were measured while the graphics engine was producing output at a resolution of 1280×720 at 60 frames for several encoder quality settings. The measured values compare the encoding times for a video stream where filmic grain is applied at the server-side to the encoding time of a video stream where filmic grain is deferred to the client. The size of the filmic grain remains fixed across all measurements. As is apparent from Table 3, the reductions in encoding times applying the techniques discussed herein are more apparent at higher encoder quality settings.
The foregoing description and drawings should be considered as illustrative only of the principles of the invention. The invention is not intended to be limited by the preferred embodiment and may be implemented in a variety of ways that will be clear to one of ordinary skill in the art. Numerous applications of the invention will readily occur to those skilled in the art. Therefore, it is not desired to limit the invention to the specific examples disclosed or the exact construction and operation shown and described. Rather, all suitable modifications and equivalents may be resorted to, falling within the scope of the invention.
Claims
1. A computer-implemented method for deferring post-processes comprising the steps of:
- transmitting an instruction to a client application to measure client hardware capability; and
- transmitting an instruction to a client application to sum a known load of one or more predetermined post-process deferral candidates to evaluate how many post-process deferral candidates are capable of being deferred to client hardware,
- wherein a post-process deferral list is compiled in reverse order, and
- wherein a server receives the post-process deferral list, skips the list of deferred post-processes during the post-processing phase of a first video frame, and transmits an instruction to a client application to render an image.
2. The computer-implemented method of claim 1, wherein the server applies the updated deferral list to a next available video frame.
3. The computer-implemented method of claim 1, wherein the server returns the first or next available video frame to the client application with a metadata flag.
4. The computer-implemented method of claim 1, further comprising the step of the server transmitting encoded video data to the client application without metadata associated with post-processes.
5. The computer-implemented method of claim 1, wherein the list of post-process deferral candidates is recalculated based on changes to the client hardware's battery state.
6. A system for deferring post-processes, wherein over a network, a server:
- transmits an instruction to a client application to measure client hardware capability; and
- transmits an instruction to a client application to sum a known load of one or more predetermined post-process deferral candidates to evaluate how many post-process deferral candidates are capable of being deferred to client hardware, wherein a post-process deferral list is compiled in reverse order, and
- wherein a server receives the post-process deferral list, skips the list of deferred post-processes during the post-processing phase of a first video frame, and transmits an instruction to a client application to render an image.
7. The system of claim 6, wherein the server applies the updated deferral list to a next available video frame.
8. The system of claim 6, wherein the server returns the first or next available video frame to the client application with a metadata flag.
9. The system of claim 6, further comprising the step of the server transmitting encoded video data to the client application without metadata associated with post-processes.
10. The system of claim 6, wherein the list of post-process deferral candidates is recalculated based on changes to the client hardware's battery state.
11. A computer-implemented method for deferring post-processes comprising the steps of:
- identifying one or more post-processes for deferral;
- skipping the one or more identified post-processes;
- encoding one or more frames at a codec into one or more encoded videos, wherein the skipped post-processes are not used in encoding; and
- transmitting the one or more encoded videos to a client computer, wherein the client applies the skipped post-processes to the encoded video prior to output.
12. The computer-implemented method of claim 11, wherein skipping of post-processes occurs at a graphics engine.
13. The computer-implemented method of claim 11, wherein at least one of the encoded videos is stored at a server prior to transmission to a client.
14. The computer-implemented method of claim 13, wherein the encoded video is stored on a server for on-demand streaming or on physical media.
15. The computer-implemented method of claim 11, wherein identification of post-processes for deferral is determined based on the computation power of the client.
16. The computer-implemented method of claim 11, wherein the encoded video is for video game streaming.
17. The computer-implemented method of claim 11, wherein the server receives control data from the client.
18. A system for deferring post-processes, wherein over a network, a server
- identifies one or more post-processes for deferral;
- skips the one or more identified post-processes;
- encodes one or more frames at a codec into one or more encoded videos, wherein the skipped post-processes are not used in encoding; and
- transmits the one or more encoded videos to a client computer, wherein at least one of the encoded videos is stored at the server prior to transmission to a client.
19. The system of claim 18, wherein skipping of post-processes occurs at a graphics engine.
20. The system of claim 18, wherein at least one of the encoded videos is stored at the server prior to transmission to a client.
21. The system of claim 20, wherein the encoded video is stored on the server for on-demand streaming or on physical media.
22. The system of claim 18, wherein identification of post-processes for deferral is determined based on the computation power of the client.
23. The system of claim 18, wherein the encoded video is for video game streaming.
24. The system of claim 18, wherein the server receives control data from the client.
Type: Application
Filed: Mar 15, 2019
Publication Date: Jul 11, 2019
Patent Grant number: 10841591
Inventor: Michael Kopietz (Frankfurt)
Application Number: 16/355,200