Method and Apparatus to Perform Optimal Visually-Weighed Quantization of Time-Varying Visual Sequences in Transform Space
Pure transform-based technologies, such as the DCT or wavelets, can leverage a mathematical model based on one or a few parameters to generate the expected distribution of the transform components' energy, and to generate ideal entropy removal configuration data continuously responsive to changes in video behavior. This technology supports construction of successive-refinement streams, permitting response to changing channel conditions, and also supports lossless compression. The embodiment described herein uses a video correlation model to develop optimal entropy removal tables and an optimal transmission sequence based on a combination of descriptive characteristics of the video source, enabling independent derivation of said optimal entropy removal tables and optimal transmission sequence on both the encoder and decoder sides of the compression and playback process.
This application claims benefit of a prior filed U.S. provisional application, Ser. No. 61/818,419, filed May 1, 2013.
REFERENCES
ISO/IEC 15444-1:2000, Information technology—JPEG 2000 image coding system—Part 1: Core coding system
US Patent Documents
U.S. Pat. No. 6,239,811 Westwater
Method and apparatus to measure relative visibility of time-varying data in transform space
U.S. Pat. No. 8,422,546
FEDERALLY SPONSORED RESEARCH
Not Applicable.
BACKGROUND
1. Field of Invention
The present invention relates generally to compression of moving video data, and more particularly to the application of quantization of the three-dimensional Discrete Cosine Transform (DCT) representation of moving video data for the purposes of removing visually redundant information.
2. Description of Prior Art
It is well established in the literature of the field of video compression that video is well modeled as a stationary Markov-1 process. This statistical model predicts video behavior well, with measured correlation coefficients above 0.9 in the pixel and line directions.
It is well known that the Karhunen-Loève Transform (KLT) perfectly decorrelates a Markov-distributed source. This means the KLT basis is a set of vectors whose coefficients, when used to encode the pixel values of the video sequence, are statistically uncorrelated.
It is a further result that many discrete transforms approximate the KLT well at large correlation values. Perhaps the best known such transform is the DCT, although many others (the DST, the WHT, etc.) also serve as reasonable approximations to the KLT.
It is for this reason the DCT is used to decorrelate images in the JPEG standard, after which each DCT component is divided by a uniform quantization step size chosen individually for that component, removing visual information imperceptible to the human eye.
What is needed is a means of removing subjectively redundant video information from a moving sequence of video.
Many prior-art techniques are taught under the principle of guiding the design of a quantization matrix to provide optimum visual quality at a given bitrate. These techniques, being applicable to motion-compensation-based compression algorithms, require a feedback loop driven by a Human Visual Model to converge on the quantizers that show minimal artifacts on reconstruction. The use of this Human Visual Model is again limited to application in the spatial domain. An example of this teaching is U.S. Pat. No. 8,326,067 to Furbeck.
The wavelet transform is another technique commonly used to perform compression. However, the wavelet transform does not decorrelate video, and thus optimal quantizers based upon a Human Visual Model cannot be calculated directly. A teaching by Gu et al., U.S. Pat. No. 7,006,568, attempts to address this issue by segmenting video sequences into segments of similar characteristics and calculating 2-D quantizers for each selected segment, chosen to reduce perceptual error in each subband.
The current invention improves the compression process by directly calculating the visually optimal quantizers for 3-D transform vectors by evaluating the basis behavior of the decorrelated transform space under a time-varying Human Visual Model, as represented by a Contrast Sensitivity Function.
SUMMARY OF INVENTION
In accordance with one aspect of the invention, a method is provided for removal of all subjectively redundant visual information by means of calculating optimal visually-weighed quantizers corresponding to the decorrelating-transformed block decomposition of a sequence of video images. The contrast sensitivity of the human eye to the actual time-varying transform-domain frequency of each transform component is calculated, and the resolution of the transformed data is reduced in accordance with the calculated sensitivity.
A second aspect of the invention applies specifically to use of the DCT as the decorrelating transform.
In the current embodiment, said configuration of video stream 320, said configuration of viewing conditions 310, said configuration of block-based decorrelating transform 340, and said configuration of quantizer algorithm 330 are each elaborated in the accompanying figures.
Luminance quantizers are calculated as illustrated in the accompanying figures.
The equations identified as 110 and 010 in the accompanying figures develop said typical generalized contrast sensitivity function CSF(u,w,I,X0,Xmax) and the quantizers derived from it. The two-dimensional map of values assumed by said typical contrast sensitivity function CSF(u,w,I,X0,Xmax) (010) for equally-weighted conditions is depicted in the accompanying figures.
Said quantizer Q (020) gives optimal response for pure AC transform components, but produces sub-optimal results for pure DC or mixed AC/DC components, due to the extreme sensitivity of the human eye to DC levels. Pure DC transform components may be quantized by the value obtained by concentrating the variance of the DC component over the number of possible levels that can be represented in the reconstructed image, as the human eye is constrained to the capabilities of the display. Equation 010 of the accompanying figures gives this calculation.
Mixed AC/DC components can be quantized by the minimum quantization step size apportioned over the variance of the DCT basis component. This process requires calculation of the per-component variance for the AC and DC components (i.e., the variance calculated in the number of dimensions in which each AC or DC component resides). Similarly, the values of the independent AC and DC quantizers must be calculated using the Contrast Sensitivity Function limited to the number of dimensions in which the AC or DC component resides, as illustrated in the accompanying figures.
The two-dimensional AC quantizer QACm,n,0 (220) is calculated directly from said typical generalized Contrast Sensitivity Function CSF(u,w,I,X0,Xmax) (010).
The maximum visual delta of 1/QACm,n,0 (110), calculated to apply to the variance-concentrated range Cx[m,m]*Cy[n,n] (120), and of 1/QDCm,n,0 (130), calculated to apply to the variance-concentrated range Cz[0,0] (130), is calculated as 1/min(QACm,n,0, QDCm,n,0) (210), and can be applied over the entire range Cx[m,m]*Cy[n,n]*Cz[0,0] (220).
Said statistically optimal quantizer Qm,n,0 (310) may now be calculated following the C language pseudocode excerpt (320). It is to be understood that the process of calculating typical statistically ideal mixed AC/DC coefficients is illustrated in the general sense in the accompanying figures.
The worst-case degradation in visual quality caused by the Gibbs phenomenon as a result of quantization is illustrated in the accompanying figures.
Thus the present invention presents a comprehensive means of determining, for any given video-decorrelating spatiotemporal transform, optimal visual quantizers under specified viewing conditions and digital video configuration. The development of these optimal visual quantizers includes the mapping of a standard spatiotemporal contrast sensitivity model to the specific, and potentially dynamically changing, characteristics of the compression system; the extension of the model to include human sensitivity to angular and off-axis conditions; and the removal of potential Gibbs artifacts generated as a result of quantization. The invention has the important side effect of supporting independent, coherent quantizer generation in compressor and decompressor, enabling the low data rates associated with fixed quantizer tables while providing adaptation to potentially changing video frame rates.
While the present invention has been described in its preferred version or embodiment with some degree of particularity, it is understood that this description is intended as an example only, and that numerous changes in the composition or arrangements of apparatus elements and process steps may be made within the scope and spirit of the invention.
Adaptive video encoding using a perceptual model
U.S. Pat. No. 8,416,104
Method and apparatus for entropy decoding
U.S. Pat. No. 8,406,546
Adaptive entropy coding for images and videos using set partitioning in generalized hierarchical trees
U.S. Pat. No. 7,899,263
Method and apparatus for processing analytical-form compression noise in images with known statistics
U.S. Pat. No. 7,788,106
Entropy coding with compact codebooks
U.S. Pat. No. 7,085,425
Embedded DCT-based still image coding algorithm
Claims
1. An apparatus comprising a compressor and decompressor, and a method for generating an optimally compressed representation of multidimensional visual data after transformation by a multidimensional orthogonal transform of a specified transformation block size, after quantization by coefficients of said transformation block size, after rearrangement of said quantized coefficients into a transmission sequence, and after collection of said quantized transformation coefficients into symbols, by the application of said quantized decorrelating transform to a plurality of measured variances of uncompressed multidimensional visual data and measured correlation coefficients of uncompressed multidimensional visual data to calculate the probability distribution of each quantized transform coefficient required to perform entropy removal.
2. The method of claim 1 where said orthogonal transform is the discrete cosine transform.
3. The method of claim 1 where said multidimensional visual data comprises a two-dimensional still image.
4. The method of claim 3 where said transformation block size comprises the entire image.
5. The method of claim 3 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per block and said plurality of correlation coefficients is one averaged value per frame.
6. The method of claim 3 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per block and said plurality of correlation coefficients is one averaged value per block.
7. The method of claim 3 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per dimension per frame and said plurality of correlation coefficients is one averaged value per dimension per frame.
8. The method of claim 3 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per block and said plurality of correlation coefficients is one averaged value per dimension per block.
9. The method of claim 1 where said multidimensional visual data comprises a three-dimensional moving video sequence.
10. The method of claim 9 where said transformation block size comprises a number of frames by the entire size of a single frame.
11. The method of claim 9 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per group of frames and said plurality of correlation coefficients is one averaged value per group of frames.
12. The method of claim 9 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per block and said plurality of correlation coefficients is one averaged value per block.
13. The method of claim 9 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per dimension per group of frames and said plurality of correlation coefficients is one averaged value per dimension per group of frames.
14. The method of claim 9 where said plurality of measured variances of uncompressed multidimensional visual data is one averaged value per dimension per block and said plurality of correlation coefficients is one averaged value per dimension per block.
15. The method of claim 1 where said quantizers are all ones.
16. The method of claim 1 where said quantizers are all equal.
17. The method of claim 1 where said quantizers are visually weighed.
18. The method of claim 18 where coefficients are organized within each block into order of decreasing calculated component variance.
19. The method of claim 18 where the probability of symbols is calculated from a definition of a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the actual non-zero value, a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the number of bits required to represent the non-zero value, an end-of-block symbol whose conditional expectation is calculated from the cumulative probability of a sequence of symbols comprised solely of zeroes, and an escape symbol whose conditional expectation is calculated from the accumulation of the probability of all symbols not otherwise defined.
20. The method of claim 1 where coefficients are organized across blocks into order of decreasing calculated component variance.
21. The method of claim 20 where the probability of symbols is calculated from a definition of a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the actual non-zero value, a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the number of bits required to represent the non-zero value, an end-of-block symbol whose conditional expectation is calculated from the cumulative probability of a sequence of symbols comprised solely of zeroes, and an escape symbol whose conditional expectation is calculated from the accumulation of the probability of all symbols not otherwise defined.
22. The method of claim 1 where coefficients are organized across blocks into bands of decreasing calculated component variance in order of successive refinement.
23. The method of claim 22 where the probability of symbols is calculated from a definition of a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the actual non-zero value, a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the number of bits required to represent the non-zero value, an end-of-block symbol whose conditional expectation is calculated from the cumulative probability of a sequence of symbols comprised solely of zeroes, and an escape symbol whose conditional expectation is calculated from the accumulation of the probability of all symbols not otherwise defined.
24. The method of claim 1 where coefficients are organized across blocks into bands of equal weight in order of decreasing calculated component variance.
25. The method of claim 24 where the probability of symbols is calculated from a definition of a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the actual non-zero value, a plurality of symbols as collected from sequences of component values whose conditional expectation is zero followed by the number of bits required to represent the non-zero value, an end-of-block symbol whose conditional expectation is calculated from the cumulative probability of a sequence of symbols comprised solely of zeroes, and an escape symbol whose conditional expectation is calculated from the accumulation of the probability of all symbols not otherwise defined.
26. The method of claim 1 where Huffman coding is used to perform entropy removal on the constructed stream of symbols.
27. The method of claim 26 where said measured variances of uncompressed multidimensional visual data and said measured correlations of uncompressed multidimensional visual data are communicated between compressor and decompressor.
28. The method of claim 1 where arithmetic coding is used to perform entropy removal on the constructed stream of symbols.
29. The method of claim 28 where said measured variances of uncompressed multidimensional visual data and said measured correlations of uncompressed multidimensional visual data are communicated between compressor and decompressor.
30. The method of claim 1 where said decorrelating transform is any orthonormal wavelet.
Type: Application
Filed: Apr 30, 2014
Publication Date: Nov 6, 2014
Inventor: Raymond John Westwater (Princeton, NJ)
Application Number: 14/266,757
International Classification: H04N 19/625 (20060101); H04N 19/91 (20060101); H04N 19/63 (20060101);