Variable bit-rate encoding
A process encodes video or image data. The process includes estimating forms of a plurality of functions and estimating a best quality value for producing encoded frames with sizes that satisfy one or more constraints. Each function relates encoded size to encoded quality for an associated frame. Each frame has data for one or more images. The constraints are associated with transmission line bandwidth or receiver buffer size. The estimating a best quality value is based in part on the functions. The process also includes transmitting frames encoded with approximately the best quality value to a transmission line.
The invention relates to video data encoders and decoders.
BACKGROUND
Data encoding enables digital images and videos to be compressed so that less data is needed to transmit them. To increase the amount of compression, lossy encoding can be used. Lossy encoding provides more compression at the price of some loss of image or video quality. The quality may be defined by a sum of root-mean-squared differences between the relative illuminations of the original image and of the image generated from the encoded data, i.e., summed over the pixels of the entire image.
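As an illustration only, the following Python sketch computes one common reading of this quality measure: the root-mean-squared difference between the relative illumination values of an original image and its decoded counterpart, taken over every pixel. The function name and the list-of-rows image representation are assumptions made for the example.

```python
import math


def rms_difference(original, decoded):
    """Root-mean-squared difference between two equally sized images.

    Each image is a list of rows of relative illumination values
    (e.g. luminance scaled to 0.0-1.0). A smaller result means the decoded
    image is closer to the original, i.e. higher quality.
    """
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of rows")
    total = 0.0
    count = 0
    for row_a, row_b in zip(original, decoded):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same row lengths")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2   # squared per-pixel illumination error
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return math.sqrt(total / count)
```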
Some “lossy” encoders have control input terminals for setting either the amount of compression or the quality of the image obtainable from the encoded output data. Selecting a higher compression generally results in lower image or video quality. A reduction in image or video quality may also result from the transmission of the encoded data.
SUMMARY
In a first aspect, the invention features a process for encoding video or image data. The process includes estimating forms of a plurality of functions and estimating a best quality value for producing encoded frames with sizes that satisfy one or more constraints. The best quality value might assign the same quality to each frame, or may assign different qualities to each frame, in order to produce a best overall quality for a sequence of frames. Each function relates encoded size to encoded quality for an associated frame. Each frame has data for an image. The constraints are associated with such considerations as transmission line bandwidth, receiver buffer size and/or total size for the entire frame sequence. The estimating a best quality value is based in part on the functions. The process also includes transmitting a sequence of frames where the quality values for at least some of the frames have been determined based on the estimated best quality value.
In some embodiments, the process also includes encoding at least some of the frames of the sequence with the best quality value in response to estimating the best quality value.
In some embodiments, each act of estimating one of the forms also includes computing a plurality of pairs of encoded quality and encoded size values for each frame of the sequence from encoded frame data and determining a functional relationship between values of the encoded quality and the encoded size for the plurality of pairs of values. The act of computing may further include encoding each frame of the sequence with a plurality of qualities to compute encoded data sizes associated with each of the plurality of qualities. The acts of encoding a frame with the plurality of qualities may be performed in parallel.
In some embodiments, the estimating a best quality value further includes selecting an encoded quality of one of the plurality of frames and deciding whether the encoded size associated with the encoded quality satisfies a constraint based on transmission bandwidth, receiver buffering, or receiver prebuffering.
In some embodiments, the process may determine the encoded size associated with each encoded image quality from the functional relation between the encoded quality and the encoded size for the associated frame.
In a second aspect, the invention features a system for encoding image frames. The system includes a variable bit-rate encoder and a controller connected to receive data on sizes of image frames encoded by the encoder. The controller controls quality of the encoded frames produced by the encoder. The controller is capable of causing the encoder to generate encoded data at a rate responsive to one of a bandwidth of a transmission line and space in a receiver buffer.
In some embodiments, the controller is configured to determine a relation between quality of an encoded image frame and amount of encoded data from the received size data.
In some embodiments, the controller is configured to determine a best quality value for encoding an image frame from size data on data frames encoded with different qualities.
In a third aspect, the invention features a program storage medium storing a program of computer-executable instructions that cause a computer to perform one or more of the above-described processes.
Other advantages and features of the invention will be apparent from the following description of an embodiment thereof and from the claims.
The encoded data frames are compressed with respect to data from the source 12. The encoded data frames have either a standard format, e.g., the Moving Picture Experts Group (MPEG) or Joint Photographic Experts Group (JPEG) formats, or a proprietary format, e.g., the Sorenson compression format.
For each image frame, the encoder 16 permits control of the quality, Q, of the encoded data frame it produces, through control signals sent to terminal 22. Nevertheless, for a selected quality, Q, the encoder 16 may produce different amounts, S, of encoded data for each image frame. The frame-to-frame differences in the amounts, S, of produced encoded data are related to differences in image content. For image frames having more detail and/or more object motion, the encoder 16 generally generates larger encoded data frames.
The feedback data from the encoder 16 indicates the amount, S, of encoded data produced for an associated image frame. The analyzer 20 uses the feedback data, i.e., values of S, to select the image quality, Q, produced by the encoder 16 on a frame-by-frame basis.
The analyzer 20 controls the encoder 16 by sending control signals to control terminal 22. The control signals determine whether encoded data frames are sent to the line 18 or simply used to generate feedback data. The control signals also set the quality, Q, of individual encoded frames from the encoder 16. The control signals set the quality, Q, of the encoder 16 to produce a “best quality sequence” for a sequence of frames in a scene, sometimes referred to as a clip. Such a sequence might, for example, have the majority of its frames at a “best quality value,” QB, (discussed in detail below) and higher quality values for its beginning and ending frames. Higher qualities for the beginning and ending frames of a sequence can be useful, because these frames may be seen in still form during editing, and the beginning frames may fall after a cut in the movie and have greater visual impact. The best quality value, QB, is typically determined for an entire video program, and it is considered the best quality value for the series of frames in light of defined constraints, e.g., bandwidth constraints on transmission line 18, buffering and prebuffering constraints for receiver 24, and/or constraints on total size for the compressed program.
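A minimal sketch of such a "best quality sequence", assuming a simple policy in which a fixed number of frames at each end of the clip receive a fixed quality boost above QB. The text does not specify how much higher the edge qualities are, so edge_boost, edge_frames and the 0 to 100 quality scale are hypothetical.

```python
def best_quality_sequence(num_frames, q_best, edge_boost, edge_frames=1, q_max=100):
    """Per-frame qualities: QB for most frames, a boost at both ends of the clip.

    edge_boost, edge_frames and q_max are illustrative parameters; the quality
    scale (0 to q_max) is assumed, since encoders define their own ranges.
    """
    qualities = [q_best] * num_frames
    boosted = min(q_best + edge_boost, q_max)      # never exceed the scale
    for i in range(min(edge_frames, num_frames)):
        qualities[i] = boosted                     # beginning of the clip
        qualities[num_frames - 1 - i] = boosted    # end of the clip
    return qualities
```

For example, best_quality_sequence(5, 60, 10) gives [70, 60, 60, 60, 70]. In a full encoder the boosted edge frames would also have to be included in the constraint check.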
Referring to
To avoid gaps and irregularities, an encoder may be controlled to produce encoded data frames at a constant rate. However, producing encoded data at a constant rate leads to fluctuations in image quality, because encoding algorithms produce different amounts of encoded data for each image frame. Fluctuations in image quality can disturb viewers. To avoid such fluctuations, an encoder can be operated so that image quality for the majority of frames remains constant in light of constraints based on transmission bandwidth and receiver buffering and prebuffering. As noted above, it may be desirable to have the beginning and ending frames encoded at a higher quality than other frames, and there may be other reasons to depart from a uniform quality value.
For a selected sequence of image frames, the process 50 determines a separate function, S(Q), relating quality, Q, to amount, S, of encoded data for each image frame (step 52). The process 50 determines S(Q) for each image frame of the sequence using a set of data points {(Q,S)} associated with the frame and an interpolation process. Based on the functions S(Q), the process 50 iteratively performs a binary search (or other subdividing search mechanism) to find a best quality value, QB1, for which the sequence of frames satisfies the user-defined constraints, e.g., transmission line constraints, receiver constraints and/or total size constraints (step 54). The user-defined constraints impose limits on the amount, S, of encoded data per frame and are associated with the bandwidth of the transmission line, the buffering and prebuffering characteristics of the target receiver, and/or the total size of the compressed data. As described below, other search techniques can be used in place of a binary search.
The process 50 encodes each frame of the entire sequence with the best quality value, QB1, found through the binary search (step 56). The process 50 checks whether each encoded data frame actually satisfies the constraints of the transmission line and target receiver (step 58). The check involves determining the actual amount, S, of data produced for each frame when encoded with the quality QB1: the process 50 actually encodes each frame at the quality QB1 and measures how much data is produced. If each encoded data frame satisfies the constraints, the process 50 then transmits the sequence of encoded image frames to the transmission line and receiver (step 60).
If one or more of the encoded frames do not satisfy the constraints, the process 50 adds the measured point (QB1,S) to the set {(Q,S)} of data points previously used to define the function S(Q) for each frame (step 62). Then, the process 50 loops back 64 to determine new functions S(Q), i.e., based on the enlarged set {(Q,S)}, and to find a new best quality value, QB2. Since the new point (QB1,S) is generally closer to the “actual” best quality value, QB, the new functions S(Q) ordinarily produce a better approximation to the actual QB. Thus, performing the loop 64 once or twice usually generates a quality satisfying the constraints from the transmission line and receiver buffer.
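The loop of process 50 (steps 52 through 64) can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: encode(i, q) is a hypothetical stand-in that returns the encoded size of frame i at quality q, satisfies is a caller-supplied constraint test over a list of frame sizes, S(Q) is approximated by piecewise-linear interpolation, and the search is plain bisection that assumes the lowest trial quality is acceptable.

```python
def interpolate(points, q):
    """Piecewise-linear S(Q) through one frame's measured (Q, S) pairs."""
    pts = sorted(dict(points).items())   # drop duplicate Q values, keep the latest S
    if q <= pts[0][0]:
        return pts[0][1]
    for (q0, s0), (q1, s1) in zip(pts, pts[1:]):
        if q <= q1:
            return s0 + (s1 - s0) * (q - q0) / (q1 - q0)
    return pts[-1][1]                    # clamp above the highest measured quality


def find_best_quality(frame_points, satisfies, q_lo, q_hi, tol=0.5):
    """Bisection for the highest Q whose predicted sizes meet the constraints."""
    while q_hi - q_lo > tol:
        q_mid = (q_lo + q_hi) / 2
        predicted = [interpolate(pts, q_mid) for pts in frame_points]
        if satisfies(predicted):
            q_lo = q_mid
        else:
            q_hi = q_mid
    return q_lo


def encode_at_best_quality(num_frames, encode, satisfies, trial_qs, max_rounds=5):
    """Steps 52-64 of process 50 with a stand-in encode(i, q) -> size."""
    # Step 52: sample each frame at a few trial qualities to build its S(Q) data.
    frame_points = [[(q, encode(i, q)) for q in trial_qs] for i in range(num_frames)]
    for _ in range(max_rounds):
        # Step 54: search the per-frame curves for a best quality value QB.
        qb = find_best_quality(frame_points, satisfies, min(trial_qs), max(trial_qs))
        # Steps 56-58: encode every frame at QB and check the actual sizes.
        actual = [encode(i, qb) for i in range(num_frames)]
        if satisfies(actual):
            return qb, actual            # step 60: the sequence can be transmitted
        # Step 62: add each frame's measured (QB, S) point, then loop back (64).
        for pts, s in zip(frame_points, actual):
            pts.append((qb, s))
    raise RuntimeError("constraints still violated after max_rounds refinements")
```

When the true S(Q) is convex, a straight chord between samples overestimates the size, so the first estimate tends to pass the actual-size check; otherwise the measured points added in step 62 pull later estimates toward the real limit.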
For each selected quality Q, the amount, S, of encoded data varies from frame to frame, because the content of each image changes in a frame-to-frame fashion. The content changes may include differences in motion and image detail. The differences in the content of each image change the amount of data, S, needed to encode the image to a selected image quality.
The set of Q values covers the range between the upper and lower bounds so as to indicate the behavior of S(Q). The process 70 uses the measured (Q,S) pairs to estimate the form of the function S(Q) (step 76). For example, the process 70 may use a Catmull-Rom curve fitting algorithm to estimate the form of S(Q). Generally, S(Q) is a monotonically increasing function of Q, i.e., higher selected qualities require more encoded data for the same image frame.
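One way to estimate the form of S(Q) from the measured pairs is a cubic Hermite interpolant whose tangents are Catmull-Rom central differences, as in the Python sketch below. Adapting Catmull-Rom to unevenly spaced Q samples in this way, and clamping outside the measured range, are assumptions of the example. Note that a Catmull-Rom curve is not guaranteed to stay monotonic between samples, even when the samples themselves increase.

```python
from bisect import bisect_right


def make_size_function(points):
    """Return S(Q) interpolating one frame's measured (Q, S) pairs.

    Q values must be distinct. Tangents use Catmull-Rom-style central
    differences (one-sided at the two end points).
    """
    pts = sorted(points)
    qs = [q for q, _ in pts]
    ss = [s for _, s in pts]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two (Q, S) measurements")

    tangents = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        tangents.append((ss[hi] - ss[lo]) / (qs[hi] - qs[lo]))

    def size_at(q):
        # Clamp to the measured range instead of extrapolating.
        if q <= qs[0]:
            return ss[0]
        if q >= qs[-1]:
            return ss[-1]
        k = bisect_right(qs, q) - 1          # segment [qs[k], qs[k + 1]]
        h = qs[k + 1] - qs[k]
        t = (q - qs[k]) / h
        # Cubic Hermite basis functions.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ss[k] + h10 * h * tangents[k]
                + h01 * ss[k + 1] + h11 * h * tangents[k + 1])

    return size_at
```

The curve passes exactly through every measured pair, so adding a newly measured (Q, S) point immediately pins the estimate at that quality.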
The process 70 also performs a binary search (or other search) for the best quality value, QB, i.e., step 54 of
The model simulates constraints on amounts of encoded data, S, which are imposed by the transmission line and the receiver's input buffer and/or the total size constraint. The transmission line has a bandwidth that limits the rate at which encoded data can be transmitted to the receiver without loss, i.e., limiting the amount of data produced by encoding each image frame. The size of the receiver buffer imposes both buffering and prebuffering constraints. The buffering constraint limits the amount of encoded data that can be accepted by the receiver without loss resulting from insufficient buffer space to store incoming data prior to decompression and play, i.e., also limiting the amount of encoded data per image frame. The prebuffering constraint limits the amount of data that may be sent before the first group of pictures (GOP) is removed from the receiving buffer and displayed. The viewer at the receiving end perceives this as a delay before display begins, so prebuffering must be limited to a small amount of time.
For simple constraints, the algorithm operates by directly determining, via a closed-form equation, whether the SM values satisfy the constraints. For more complex constraints, the algorithm simulates the process of transmitting, receiving, and displaying the frames at a level of detail sufficient to determine whether the constraints are met. Such a simulation is inexpensive, since the actual data need not be compressed, transmitted, or decompressed. The simulation simply tracks the size and timing of data transfers that would occur in actual transmission, without modelling the detailed data content.
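For the simple case, a closed-form check could look like the following Python sketch. It assumes two illustrative constraints on the predicted sizes S_M, given in bits: the clip's average bit rate must not exceed the line bandwidth, and the total compressed size must fit a budget. The function and parameter names are hypothetical, and a check on average rate alone ignores the buffer dynamics that the simulation approach captures.

```python
def meets_closed_form_constraints(sizes_bits, fps, bandwidth_bps=None, total_budget_bits=None):
    """Closed-form checks on predicted frame sizes (in bits).

    Hypothetical helper: checks (1) that the clip's average bit rate does not
    exceed the line bandwidth and (2) that the total compressed size fits a
    budget. Either check is skipped when its limit is None.
    """
    total = sum(sizes_bits)
    duration_s = len(sizes_bits) / fps              # playing time of the clip
    if bandwidth_bps is not None and total > bandwidth_bps * duration_s:
        return False                                # average rate too high
    if total_budget_bits is not None and total > total_budget_bits:
        return False                                # compressed program too large
    return True
```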
The state of the receiver buffer depends on both the buffering and prebuffering constraints. The model simulates the time-evolving state of the buffer to determine whether a sequence of data amounts, S, for encoded data frames either overfills the receiver buffer or results in an empty receiver buffer when data is needed at the receiving end for the next GOP. The model assumes that the buffer is initially empty, but may allow for a certain amount of buffer filling, prebuffering, prior to simulating the removal of data from the receiver buffer.
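The buffer model can be illustrated with a small simulation that tracks only sizes and timing. This Python sketch assumes, for simplicity, that the sender transmits the frames back to back at the full line rate from time zero and that the receiver removes one frame per display interval once the prebuffering time has elapsed (the text speaks of removing groups of pictures; they are treated here as single frames). The function and parameter names are hypothetical.

```python
def simulate_receiver_buffer(sizes_bits, fps, bandwidth_bps, capacity_bits, prebuffer_s):
    """Return True if playback never stalls and the receiver buffer never overflows.

    Assumed model: the sender streams the encoded frames back to back at the
    full line rate from t = 0; after prebuffer_s seconds the receiver removes
    one frame every 1/fps seconds. Only sizes and timing are tracked.
    """
    total = sum(sizes_bits)
    removed = 0  # bits already taken out of the buffer for display
    for k, size in enumerate(sizes_bits):
        deadline = prebuffer_s + k / fps           # when frame k must be displayed
        arrived = min(bandwidth_bps * deadline, total)
        # Between removals only arrivals occur, so the fill peaks just before one.
        if arrived - removed > capacity_bits:
            return False                           # overflow: data would be lost
        if arrived < removed + size:
            return False                           # underflow: frame k incomplete
        removed += size
    return True
```

A search over Q can call this function in place of a closed-form test, since it needs nothing but the predicted size of each frame.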
The total size constraint can be a particular concern for offline or archival compression.
If the constraints are satisfied, the process 70 determines whether the present and last estimates for the best quality value, QB, are within a predetermined distance of each other (step 84). If the two estimates are within the distance of each other, the process 70 outputs the estimate for QB (step 86). If the estimates are not within the preselected distance, the process 70 resets the lower bound for the best quality value, loQ, to QM (step 88). Then, the process 70 loops back 70 to select a quality that is partway (halfway if using binary search) between the present upper and lower bounds as the next estimate for best quality value, QB. The last estimate for QB, i.e., QM, has become the new lower bound, loQ, for the best quality value in the next iteration of the binary search.
If the constraints are not satisfied, the process 70 selects QM to be the new upper bound, hiQ, for the best quality value (step 92). Then, the process 70 loops back 94 to select a new estimate for the best quality value QB that is partway (halfway if binary) between the new upper and lower bounds, e.g., (loQ+hiQ)/2 if binary. The search algorithm rapidly converges to an estimate of the best quality value, QB, within the preselected range of the actual best value.
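The bound updates of steps 84 through 94 can be sketched as a bisection over Q. In this illustrative Python version, size_functions holds one S(Q) callable per frame, satisfies tests a list of predicted sizes, and distance plays the role of the predetermined distance between successive estimates. Returning the best satisfying estimate once the bracket shrinks below distance after a failure, or None if no estimate ever satisfied the constraints, is an added assumption so that the loop always terminates.

```python
def search_best_quality(size_functions, satisfies, lo_q, hi_q, distance=0.25):
    """Binary search for the best quality value QB between lo_q and hi_q.

    distance must be positive; it is the tolerance between successive estimates.
    """
    previous = None   # last estimate of QB
    best = None       # highest estimate found to satisfy the constraints
    while True:
        q_m = (lo_q + hi_q) / 2                              # new estimate Q_M
        s_m = [size_of(q_m) for size_of in size_functions]   # predicted sizes S_M
        if satisfies(s_m):
            best = q_m
            if previous is not None and abs(q_m - previous) <= distance:
                return q_m                                   # step 86: output QB
            lo_q = q_m                                       # step 88: raise lower bound
        else:
            hi_q = q_m                                       # step 92: lower upper bound
            if hi_q - lo_q <= distance:
                return best                                  # may be None
        previous = q_m
```

For example, with size functions 2Q and 3Q and a limit of 140 on the largest frame, the search returns an estimate just below 140/3.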
In some embodiments, the process 70 of
In some embodiments, steps 104-112 are performed for a plurality of values, QS1 . . . QSN, in parallel instead of serially. These steps are computationally slow, because each image frame of the selected sequence is encoded multiple times, and each encoding of an image frame is itself slow. Performing the needed frame encodings in parallel can therefore significantly speed up the process 100 for estimating S(Q).
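A sketch of running the multi-quality encodings concurrently with Python's standard concurrent.futures module. Here encode(frame, q) is a hypothetical encoder call that returns the encoded bytes. A thread pool is used for simplicity; it only yields a real speed-up if the encoder releases the interpreter lock (for example, an encoder implemented in native code), and a ProcessPoolExecutor or separate machines would be the alternative for CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def measure_size_curves(frames, trial_qualities, encode, max_workers=None):
    """Encode every frame at every trial quality concurrently.

    Returns one list of (Q, S) pairs per frame, sorted by Q, where S is the
    length in bytes of encode(frame, Q).
    """
    jobs = list(product(range(len(frames)), trial_qualities))

    def run(job):
        index, q = job
        return len(encode(frames[index], q))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        sizes = list(pool.map(run, jobs))   # map preserves job order

    curves = [[] for _ in frames]
    for (index, q), size in zip(jobs, sizes):
        curves[index].append((q, size))
    return [sorted(curve) for curve in curves]
```

Each job is independent, so the work scales with the number of workers until the encoder itself becomes the bottleneck.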
Referring to
Performing the final compression at the best quality value, i.e., step 58 of
Other embodiments are within the scope of the following claims. For example, instead of doing a search for the best quality value, QB, by a binary search, one can use another search mechanism, e.g., another search mechanism that reduces the search range by subdivision. Some variants of the binary search include picking a dividing point for the current range that is not centered exactly, but falls at an interpolated point in the range. For instance, if the bottom and top ends of a range have sizes 50 and 100, and if one is looking for a size of 60, one might choose a dividing point closer to the bottom than to the top end of the range.
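The interpolated dividing point in this example can be computed by linear interpolation between the sizes at the two ends of the current range, clamped so that it stays strictly inside the range and every step still shrinks it. The clamping fraction min_fraction is an assumption of this sketch, not something the text specifies; the idea resembles the regula falsi method for root finding.

```python
def interpolated_split(q_lo, q_hi, s_lo, s_hi, s_target, min_fraction=0.1):
    """Dividing point in [q_lo, q_hi] placed where size s_target is expected.

    s_lo and s_hi are the (predicted) sizes at q_lo and q_hi.
    """
    if s_hi == s_lo:
        fraction = 0.5                       # no size information: plain bisection
    else:
        fraction = (s_target - s_lo) / (s_hi - s_lo)
    # Keep the point away from the ends so the range always shrinks.
    fraction = min(max(fraction, min_fraction), 1 - min_fraction)
    return q_lo + fraction * (q_hi - q_lo)
```

With the figures from the text (sizes 50 and 100, target 60) and an assumed quality range of 0 to 100, the split lands at Q = 20, one fifth of the way up the range rather than at its midpoint.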
Claims
1. A process for encoding data, comprising:
- determining a separate function for each frame in a sequence of frames, each function relating encoding size to encoded quality for each frame in the sequence of frames, each frame having data for an image;
- prior to encoding any of the frames, performing a search of all of the separate functions to determine a best quality value for encoding the sequence of frames, whose encoded sizes satisfy one or more constraints, the constraints being associated with one of a transmission line bandwidth, a receiver buffer size and total compressed size;
- encoding each frame of the entire sequence of frames with the determined best quality value;
- determining whether each encoded frame satisfies the constraints; and
- if the encoded frames satisfy the constraints, transmitting the sequence of encoded frames.
2. The process of claim 1, wherein the search reduces the search range for said best quality value by subdivision.
3. The process of claim 1 wherein said search is a subdivision search algorithm.
4. The process of claim 1 wherein said search is a binary search algorithm.
5. The process of claim 1, wherein each encoded frame produces a group of temporally encoded pictures.
6. The process of claim 1, wherein each act of determining a separate function, further comprises:
- computing a plurality of pairs of encoded quality and encoded size values for each frame of the sequence from encoded frame data; and
- determining a functional relationship between values of the encoded quality and the encoded size for the plurality of frames from the pairs of values.
7. The process of claim 6, wherein the computing further comprises:
- encoding each frame of the sequence with a plurality of qualities to compute encoded data sizes associated with each of the plurality of qualities.
8. The process of claim 7, wherein the transmitting comprises:
- selecting the one of the plurality of qualities having a closest value to the best quality value; and
- wherein the transmitting sends frames encoded with the selected quality.
9. The process of claim 1, wherein the determining of separate functions is performed across the sequence of frames on multiple processors in parallel.
10. The process of claim 1, wherein the determining of a best quality value further comprises:
- selecting an encoded image quality of one of the plurality of frames; and
- deciding whether the encoded size associated with the encoded image quality satisfies a constraint based on one of transmission bandwidth, receiver buffering, total compressed size, and receiver prebuffering.
11. The process of claim 10, wherein the deciding is based on two of the transmission bandwidth, receiver buffering, and receiver prebuffering.
12. The process of claim 10, further comprising:
- determining the encoded size associated with each encoded image quality from the form of the functional relation between the encoded quality and the encoded size for the associated frame.
13. The process of claim 1 wherein if one or more of the encoded frames do not satisfy the constraints:
- determining a new separate function based on the prior separate function determining and search; and
- repeating the performing a search, encoding, and determining whether each encoded frame satisfies the constraints steps based on the new function.
14. A system for encoding image frames, the system comprising:
- (a) a controller connected to receive data on sizes of image frames that are part of a sequence of image frames, to be encoded by the encoder and to control quality of the encoded frames produced by the encoder based on: (i) a determination of a separate function for each image frame in the sequence of image frames, each function relating encoding size to encoded quality for each frame in the sequence of frames; (ii) a search of all of the separate functions to determine a best quality value for encoding the sequence of frames whose encoded sizes satisfy one or more constraints, the constraints being associated with one of a bandwidth of a transmission line, space in a receiver buffer and a total compressed size; and
- (b) a variable bit rate encoder controlled by the controller configured to encode each frame of the entire sequence of frames with the determined best quality value, wherein the controller is further configured to determine whether each encoded frame satisfies the constraints, and if the encoded frames satisfy the constraints, transmitting the sequence of encoded frames.
15. The system of claim 14, wherein the controller is configured to determine a relation between quality of an encoded image frame and amount of encoded data from the received size data.
16. The system of claim 14, wherein the controller is configured to determine a best quality value for encoding an image frame from size data on data frames encoded with different qualities.
17. The system of claim 14 wherein if one or more of the encoded frames do not satisfy the constraints, the controller:
- determines a new separate function based on the prior separate function determining and search;
- repeats the search of all of the frames;
- causes the encoder to encode each frame of the entire sequence based on the new form; and
- repeats the determining of whether each encoded frame satisfies the constraints.
18. The system of claim 14, wherein the controller is configured to determine the separate functions across the sequence of frames on multiple processors in parallel.
19. A computer readable storage media storing a computer program including executable instructions, the instructions to cause a computer to:
- determine a separate function for each frame in a sequence of frames, each function relating encoded size to encoded quality for each frame in the sequence of frames, each frame having data for an image;
- prior to encoding any of the frames, perform a search of all of the separate functions to determine a best quality value for encoding the sequence of frames whose encoded sizes satisfy one or more constraints, the constraints being associated with one or more of a transmission line bandwidth, a receiver buffer size and a total size constraint;
- encode each frame of the entire sequence of frames with the determined best quality value;
- determine whether each encoded frame satisfies the constraints; and
- if the encoded frames satisfy the constraints, order transmission of frames of the sequence.
20. The media of claim 19 wherein said search is a binary search algorithm.
21. The media of claim 19, wherein each encoded frame produces a group of temporally encoded pictures.
22. The media of claim 19, wherein each instruction to determine a separate function, further causes the computer to:
- compute a plurality of pairs of encoded quality and encoded size values for each frame of the sequence from encoded frame data; and
- determine a functional relationship between values of the encoded quality and the encoded size for the plurality of frames from the pairs of values.
23. The media of claim 22, wherein the instruction to compute further causes the computer to:
- encode each frame of the sequence with a plurality of qualities to compute encoded data sizes associated with each of the plurality of qualities.
24. The media of claim 19, wherein the instruction to determine a best quality value, further causes the computer to:
- select an encoded image quality of one of the plurality of frames; and
- decide whether the encoded size associated with the encoded image quality satisfies a constraint based on one of transmission bandwidth, receiver buffering, and receiver prebuffering.
25. The media of claim 19 wherein if one or more of the encoded frames do not satisfy the constraints, the instructions cause the computer to:
- determine a new separate function based on the prior separate function determining and search; and
- repeat the performing a search, encoding, and determining whether each encoded frame satisfies the constraints steps based on the new function.
26. The media of claim 19, wherein the instructions cause the computer to determine the functions across the sequence of frames on multiple processors in parallel.
Type: Grant
Filed: Sep 28, 2000
Date of Patent: Sep 9, 2008
Assignee: Autodesk, Inc. (San Rafael, CA)
Inventors: Clinton A. Staley (Atascadero, CA), Darren Alexander Giles (Los Gatos, CA)
Primary Examiner: Allen Wong
Attorney: Gates & Cooper LLP
Application Number: 09/672,352
International Classification: H04N 7/12 (20060101);