BLIND NOISE ANALYSIS FOR VIDEO COMPRESSION
Example embodiments of the present invention provide a method or device for coding source video. The method or device may provide for selecting a segment of video frames from the source video, computing a noise map for the segment of the source video, where the noise map is computed from differences among pixels selected from spatially-distributed sampling patterns in the segment, computing control parameter adjustments based on the noise map, and coding the selected segment of source video according to control parameters generated from a default coding policy and the control parameter adjustments, where the default coding policy includes default control parameters of the encoder.
This application claims priority to U.S. Provisional Patent Application No. 61/163,684, filed Mar. 26, 2009, entitled “Blind Noise Analysis For Video Compression,” which is herein incorporated by reference in its entirety.
FIELD OF THE INVENTION
The present invention is generally directed to video coding techniques and devices. In particular, the present invention is directed to estimating control parameter adjustments to a video encoder (also called a coder in this application) and video decoder based on an estimation of noise in source video.
BACKGROUND INFORMATION
Studio video content producers often pre-edit uncompressed video to fit different requirements for channel distribution and format conversion. Another important and common editing step may be color and film matching (also called analog support), which may require the injection of random noise into video frames to match the characteristics of analog film. Further, when the source video or film includes computer-generated graphics, a large amount of noise may be purposefully added in areas of uniform color to create natural-looking effects.
Color dithering may arise during bit precision reduction. An analog source material may be digitized with high bit precision, e.g., 14, 12, or 10 bits per channel (each channel representing luma or chroma in YUV, or a component of RGB) in a standard color space (4:4:4 in the Society of Motion Picture and Television Engineers, or SMPTE, specifications), while the final cut (uncompressed video) may be in a lower precision format, e.g., 8 bits per channel of, e.g., YUV 4:2:2 or less. The conversion from high to low precision quantization may produce artifacts and require dithering to create virtual intermediate levels of colors not available in the final color space or bit depth. Finally, a global control of noise level, or noise modulation, may be used to create an overall visually persistent effect on a user, especially for situations such as luminance fading and dimming light that are common in, e.g., titles or low-light scenes. The final results of all this “in-house” processing may be given to distributors, e.g., the Apple iTunes store, to be encoded in multiple formats, bitrates, or standards, e.g., H.264 for video compression.
Video materials having pre-edited content (even in a 10- or 8-bit, non-compressed format) may create challenges for a block-based video encoder, e.g., an H.264-type encoder. For example, the effects of all these noise-adding processes (applied for quality purposes) may affect multiple stages of an encoder, including the motion estimation that matches blocks of images based on sum of absolute difference (SAD) metrics.
A video encoder may introduce compression artifacts such as quantization and/or block artifacts. When compression artifacts are present in a video region that includes additive noise, they may become even more evident and visually annoying to a viewer, because the geometrically defined structures of the compression artifacts appear within an otherwise random, isotropic region. The persistent artifacts on a playback screen may create unnatural effects which may degrade the perceptual quality.
In low bit rate video encoding, the additive noise from film production may also make it more difficult to achieve high perceptual quality, since high frequency noise may affect the quantization process and the rate distortion optimization (RDOPT). In fact, some techniques used to maximize the video quality in final cut production may actually prevent low bit rate video encoding from achieving the maximum overall quality.
Since additive noise introduced from film post production may adversely affect the subsequent coding and decoding steps, it is advantageous to estimate the additive noise so that the additive noise may be properly treated in the subsequent coding and decoding steps.
Example embodiments of the present invention provide a method or device for coding source video. The method or device may provide for selecting a segment of video frames from the source video, computing a noise map for the segment of the source video, the noise map computed from differences among pixels selected from spatially-distributed sampling patterns in the segment, computing control parameter adjustments based on the noise map, and coding the selected segment of source video according to control parameters generated from a default coding policy and the control parameter adjustments, where the default coding policy includes default control parameters of the encoder.
According to one example embodiment of the present invention, a noise map, e.g., an array of one numerical value per sample, may be computed to provide a noise measure in a source video. The noise map may be a binary map, each sample of which indicates a noise state (e.g., “1”) or a noiseless state (e.g., “0”). Alternatively, the noise map may include samples of integers, each of which indicates an estimated noise strength. For example, with byte integers, a “0” may indicate noiseless and a “255” may indicate maximum noise.
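Both representations are easy to relate once per-sample noise strengths are available. The following sketch (with hypothetical helper names and an illustrative threshold; the text does not prescribe an implementation) converts a byte-valued strength map into the binary form:

```python
def to_binary_map(strengths, threshold=16):
    """Convert byte-valued noise strengths (0..255) into a binary noise map:
    1 marks a noisy sample, 0 a noiseless one."""
    return [[1 if s >= threshold else 0 for s in row] for row in strengths]

# One strength sample per location; values are illustrative only.
strengths = [[0, 12, 200],
             [255, 8, 40]]
binary = to_binary_map(strengths)  # [[0, 0, 1], [1, 0, 1]]
```

The threshold would in practice be tuned to what the encoder treats as significant noise.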
According to one example embodiment of the present invention, the additive noise may be modeled as uncorrelated noise, in the sense that the noise may not contain any spatial structure in the picture domain, or any predominant frequency range in a transformed frequency domain obtained, e.g., by a Fourier transform. The noise may not necessarily be white noise, i.e., it may have a spectrum of limited width, but it should be quite uniform within its spectral range.
In one example embodiment of the present invention, the analysis of noise is blind, i.e., there is no knowledge as to which types of pre-production process have been utilized, and no knowledge as to which types of noise or dither have been added to the original source pictures.
In one example embodiment of the present invention, a pre-processing functional unit may parse input pictures from source video content and provide pre-processed image frames to a video encoder. Simultaneously, the pre-processing unit may, through a separate channel, provide extra information about, e.g., the additive noise in the pictures, to the video encoder. This extra information may be used to control coding steps, e.g., in the inner loop of an encoder or a decoder.
In one example embodiment of the present invention, the analysis pipeline may include a noise detector for detecting or evaluating noise in the source video or in frames of the source video. The type of noise may include additive noise or compression artifacts. The noise detection may be based on conventional directional band-pass filters (e.g., a directional difference of gaussians type of filter), or on weighted sums of absolute differences (WSAD), as discussed later, for a segment of the source video. Alternatively, instead of using spatial directional filters, the noise detection may be accomplished by filtering in the frequency domain, e.g., by selecting multiple patches of frequencies that represent different spatial directions, and by computing and comparing the energy in the different patches. Exemplary patches may include the low-high, high-low, and high-high frequency bands of a 2D DCT. The segment of video may include one or more video frames of the source video. The output of the noise detector may be called picture information in the form of, e.g., a noise map that indicates the locations of noise in the video and the strengths of the noise.
In another example embodiment, the noise detection may be carried out in 3D, e.g., in a spatio-temporal domain. A segment of multiple picture frames over a period of time may be used for noise detection. In one example, each picture frame in the segment may be motion compensated and summed up to generate an accumulated, or alternatively an average, frame from which noise in the segment may be detected. Alternatively, a 3D filter, e.g., a 3D difference of gaussians filter, or 3D WSADs may be directly applied to the segment of picture frames. The 3D WSADs may be computed from 3D (spatial plus temporal) sampling patterns. For example, a stack of video frames may be partitioned into cubes of pixels in the 3D spatio-temporal space, analogous to 2D blocks of pixels. Within each cube of pixels, random sampling patterns may be generated for computing WSADs.
The encoding of the source video may be controlled by a set of control parameters including, but not limited to, e.g., the quantization parameter (qp, per frame or per macroblock), the entropy coding mode, the mode decision (inter/intra, block size, etc.), the transform size, and the quantization matrix (qmatrix). These control parameters may be initially set to default values. A collection of default control parameters may be called a default coding policy. In one embodiment of the present invention, the noise map may be used to compute adjustments to these control parameters so that the encoder may more efficiently code the source video according to the default coding policy and the control parameter adjustments.
In one example embodiment, the output may be an array of binary values for indicating the existence of noise. For example, a binary value one (1) may indicate noise and zero (0) may indicate no noise or substantially less noise. Each binary value may represent the presence of noise, e.g., at a pixel or within a pixel block. In one example embodiment, a threshold may be used to determine whether noise is present.
In another example embodiment, the output of the noise detector may be an array of integer values, each of which represents noise strength, e.g., at a pixel or within a block of pixels. In one example embodiment, the noise strength may be represented with integers of one byte length. Thus, a value of zero (0) may indicate noise-free and a value of 255 may indicate a maximum level of noise at the corresponding location in the video frame.
The filter pipeline 202 may include a denoise filter for noise removal. Both the filtering pipeline 202 and the analysis pipeline 204 may be connected to each other so that the analysis pipeline may transfer noise measures, e.g., noise maps, about the input video or picture frames to the filter pipeline. The filtering pipeline may then use the noise map to improve the filtering of input video or picture frames.
In one example embodiment, the output of the analysis pipeline 204 may be transmitted to the video decoder 210. The video decoder may use this information to drive post processing operations. In one example, the output may be noise maps to provide hints to the decoder as to the amount of noise the uncompressed video had at different locations, e.g., pixels or blocks of pixels. The decoder may compare the noise in the decoded video, e.g., at pixel or pixel block locations, with the noise present in the uncompressed video. Based on the comparison, the decoder may decide whether to add noise to the decoded video.
The video encoder 208 may also send an information request to the analysis pipeline 204 to request picture information on the video to be encoded or on previously inputted picture frames. Upon receiving the request, the analysis pipeline may provide the requested picture information to the video encoder 208 and/or a decoder 210. As discussed above, the picture information may include noise measures such as noise maps. The noise measure provided to the video encoder may determine video encoder control parameters 212 including, but not limited to, e.g., the quantization parameter (qp, per frame or per macroblock), the entropy coding mode, the mode decision (inter/intra, block size, etc.), the transform size, and the quantization matrix (qmatrix) as defined in, e.g., the H.264 or MPEG standards.
In one example embodiment of the present invention, each picture frame may be further divided into blocks of pixels (“pixel blocks”), e.g., 8×8 or 16×16 pixels per block. Thus, each sample in a noise map may correspond to a noise measure for a block of pixels. This may be particularly advantageous for a block-based video encoder, e.g., an H.264 type of video encoder. However, the size (or the location) of the input block of data used by the noise analysis pipeline may not be necessarily related to the pixel block size of the video encoder.
Each pixel may include one or more components, e.g., one luma and two chroma components for the YUV format, or red, green, and blue for the RGB format. The value of each component may be represented with a number of bits, e.g., 8 bits or 12 bits, called the bit depth. The pixel block size used for the noise analysis may depend on the bit depth of the input picture frame (luma or chroma) and/or on the desired accuracy of the final noise measure. A block that is too small, and thus a small statistical sample size, may produce an unstable measure that relates more to local variations of the input picture. The lower limit of the block size may be highly correlated with the bit depth, or number of bits per pixel. For an 8 bit non-compressed video, for example, an encoder may use a sample size of at least 50 to 80 pixels, e.g., using a pixel block of 8×8 pixels. A pixel block that is too large may produce a meaningless noise measure, since it may be uncorrelated with local variations of the input picture. Additionally, the resolution of the input picture in terms of number of pixels may also affect the pixel block size, e.g., bigger pixel block sizes for bigger formats.
In one example embodiment of the present invention, the noise in a source video may be measured by a band-pass filter, e.g., a difference of gaussians (DOG) filter as conventionally known. Alternatively, the noise in the source video may be measured using weighted sum of absolute differences (WSAD) as discussed in the following.
At 302, a shift difference computation may be performed on the input picture. The input picture may be shifted both horizontally and vertically by a delta (Δ) amount of pixels. The shifted version of the picture may be subtracted from the original picture to compute the difference between the original and the shifted version. This shifted difference operation may isolate all the irregularities in the input picture, e.g., noise and sharp edges, in the original picture. The irregularities may be structured or non-structured features in the original source video frames. A structured feature may include, e.g., edges between objects or edges caused by changes in lighting conditions.
The amount of shift Δ may depend on the grain size of the noise, e.g., speckle noise, to be detected and on the resolution (in pixels) of the input picture. In one example embodiment of the present invention, for a picture frame with a resolution of 1080 by 720 pixels, Δ may be in the range of one to five pixels. The main factor for determining Δ may be the maximum spatial frequency of the noise which is to be detected. Thus, the shift Δ may be computed as a function of the maximum noise frequency, the grain size, and the picture resolution.
The results of the shifted difference computation may include multiple blocks of, e.g., 8×8 samples representing the shifted difference. For each pixel block, a weighted sum of absolute differences (WSAD) may be computed (described in detail below) at 304. For each block, it is advantageous to compute at least two WSAD values. A greater number of WSAD values may increase the accuracy of the noise level measurements.
In one example embodiment of the present invention, WSAD values may be computed based on the difference between the original image and the shifted image. The objective is to measure noise in a pixel block using multiple local gradient measures. The gradients may simply be differences between pixels within the same block, which may be computed based on a pattern used to measure these differences.
The WSAD of this invention is defined differently from a conventional SAD. First, in WSAD, a weighting function is applied to the absolute differences to compute the WSAD. In addition, the weights may be calculated as a function of the absolute differences, i.e., the weighting may be adaptive to the local pixel data. Thus the weighting may play an important role in detecting structured data in the pixel domain, or correspondingly a predominant frequency in the transform domain.
Referring to the pixel sampling pattern, a WSAD value for a block may be computed as
WSAD = Σ W(|pixel(i) − pixel(j)|), summed over the pixel pairs (i, j) of the sampling pattern,
where W( ) is the weighting function.
A WSAD value may be determined by pixel sampling patterns and a weighting function W( ). In one example embodiment of the present invention, a V-shaped weighting function may be used for noise detection. The V-shaped weighting function may be varied to match different aspects of underlying noise.
As discussed above, TH may be determined as a function of an SNR that is acceptable to a user, and may be related to the luminance level. The TH value reflects a subjective characteristic of human vision: the eyes are more sensitive to small luminance changes within a certain range, e.g., [−TH, TH], but are more tolerant of changes in darker or brighter areas. As such, human eyes may perceive similar quality even at a lower SNR in darker or brighter areas. In one embodiment, TH may vary based on the SNR or on the local average luminance.
The WSAD values may also be determined by the pattern used to calculate gradients. The pixel sampling pattern may be different structured or random samplings of pixels in a block as long as they are used consistently through the whole process. Thus, within a block of pixel values, multiple WSADs may be calculated. The underlying noise pattern may be determined based on variations among multiple WSADs of a particular block. The pixel sampling pattern as shown in
In one example embodiment of the present invention, for each block of pixel values, multiple WSADs may be calculated.
In another embodiment of the present invention, two WSADs may be calculated for a block of pixels. Referring to
The shuffling function may take on different forms. In one example embodiment of the present invention, the shuffling function may be a simple transposition operation between pairs of pixels, i.e., an exchange of positions within a pair that does not affect other pixels in a block. In another example embodiment, the shuffling may take place in a register that stores pixels. For convenience of implementation, the shuffling may be performed as an index shuffle within the register. For a block with pixels shuffled, a WSAD value may be calculated as
where fs is a shuffling function, pixel(i) is the pixel value at index i, and W( ) is the weighting function. As such, the shuffling operation may be performed in the inner loop of the gradient computation by simply modifying the pixel sampling pattern used for the gradient computation.
In yet another embodiment of the present invention, the shifted difference calculation may be combined with the shuffling operation for a more compact calculation.
In one example embodiment of the present invention, confidence intervals for noise measurements may be calculated based on the resulting multiple WSADs.
Noise(block_i) = fci(Δ(WSAD 1, . . . , WSAD N)) * g(WSAD 1, . . . , WSAD N)
where Δ is a function that measures the variation among WSAD 1, WSAD 2, . . . , WSAD N. In a situation where there are only two WSAD values, Δ may be a simple difference between the two values. In a situation where there are more than two WSADs, the Δ function may be, e.g., an average, a maximum, or a deviation function of WSAD 1, WSAD 2, . . . , WSAD N.
Different fci functions may be used as shown in
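The combination can be sketched for the two-WSAD case; the particular fci (a hard confidence cut) and g (an average) below are hypothetical choices, not ones prescribed by the text:

```python
def noise_measure(wsads, fci, g):
    """Noise(block) = fci(delta(WSADs)) * g(WSADs). delta is the spread among
    the WSAD values; with two values this reduces to their difference."""
    delta = max(wsads) - min(wsads)
    return fci(delta) * g(wsads)

# Uncorrelated noise yields similar WSADs regardless of sampling pattern, so a
# small spread suggests the measured energy really is noise.
fci = lambda d: 1.0 if d < 50 else 0.0   # confidence from the WSAD spread
g = lambda ws: sum(ws) / len(ws)         # strength from the average WSAD

noisy = noise_measure([200, 210], fci, g)  # consistent WSADs -> 205.0 (noise)
edge = noise_measure([40, 400], fci, g)    # pattern-dependent -> 0.0 (structure)
```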
In one advantageous embodiment, the sample sizes used for calculating the WSADs are the same, so that each WSAD gathers the same energy from the input signal. In another advantageous embodiment, blocks of samples may overlap for a portion of their pixels. The overlapping blocks may provide a more uniform measure and resolve structured data that spans across block edges and that may not be detected by a simple shift operator.
In one variant embodiment of the present invention, the above-discussed WSAD values may be used to analyze the flatness of a scene for video compression. One of the situations where artifacts are visible in compressed video is the presence of a semi-static area in a normal-motion environment, e.g., a slowly panning camera in the background with fast motion in the foreground. These situations may cause a video encoder to reduce picture quality due to the lower visibility of details. Further, human vision may be sensitive to unnatural changes in a flat or semi-flat area, e.g., a scene of a sky or a flat wall, after a long (>2 seconds) exposure to the scene. Encoding techniques, such as frame skipping or intra/inter frame prediction, may achieve a high compression rate but, at the same time, may cause sudden small changes that may not be evident in a single frame, yet are evident and annoying in a multiple-frame playback.
Each picture may be provided with a WSAD-based one dimensional noise measure that biases toward flat areas in very slow motion during playback. These areas may include some details, but are generally perceived as semi-flat. In addition, these areas may retain structured data that may cause visible artifacts if a video encoder produces quantization errors or unnatural motions such as a sudden jump in a motion flow.
A pre-processing/analysis step specifically designed to detect semi-flat areas in slow motion may provide useful biasing information to a video encoder for optimizing, e.g., the final rate distortion. A slow-motion background may be conveniently detected by comparing the current picture frame with the previous picture frame, since a high correlation may indicate small motion vectors. The flatness of an area may be detected using the above-discussed WSAD-based noise detection methods.
In one example embodiment of the present invention, each picture frame may include a plurality of overlapping blocks of pixels, in the sense that each block shares at least one pixel with another block. The computation of the WSADs for these blocks may be carried out in parallel via a plurality of processors or processing units, e.g., central processing units (CPUs) or graphical processing units (GPUs).
As shown in
Those skilled in the art may appreciate from the foregoing description that the present invention may be implemented in a variety of forms, and that the various embodiments may be implemented alone or in combination. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the true scope of the embodiments and/or methods of the present invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.
Claims
1. A method for coding source video at an encoder, comprising:
- (a) selecting a segment of video frames from the source video;
- (b) computing a noise map for the segment of the source video, the noise map computed from differences among pixels selected from spatially-distributed sampling patterns in the segment;
- (c) computing control parameter adjustments based on the noise map; and
- (d) coding the selected segment of source video according to control parameters generated from a default coding policy and the control parameter adjustments,
- wherein the default coding policy includes default control parameters of the encoder.
2. The method of claim 1, further comprising:
- (e) selecting another segment of video frames from the source video; and
- (f) repeating (a) to (d).
3. The method of claim 1, wherein the segment of video frames includes only one video frame.
4. The method of claim 1, wherein the segment of video frames includes more than one video frames.
5. The method of claim 1, wherein the noise map includes an array of elements each representing a pixel location.
6. The method of claim 1, wherein the noise map includes an array of elements each representing a location of a block of pixels.
7. The method of claim 1, wherein the noise map includes an array of binary-valued elements, a first state of a binary value representing noise and a second state of the binary value representing noiseless.
8. The method of claim 1, wherein the noise map includes an array of integer-valued elements, each integer value representing a strength of noise.
9. The method of claim 1, wherein the control parameters of the encoder include a frame quantization parameter.
10. The method of claim 1, wherein the control parameters of the encoder include a macroblock quantization parameter.
11. The method of claim 1, wherein the control parameters of the encoder include an entropy coding mode.
12. The method of claim 1, wherein the control parameters of the encoder include a mode decision.
13. The method of claim 1, wherein the control parameters of the encoder include a transform size.
14. The method of claim 1, wherein the control parameters of the encoder include a quantization matrix.
15. The method of claim 1, wherein the computing of the noise map is based on directional band-pass filtering the source video.
16. The method of claim 1, wherein the computing of the noise map is based on a plurality of WSAD values each representing a weighted sum of absolute differences among pixels of a block of pixels, each WSAD value determined from a different sampling pattern taken across the block of pixels.
17. The method of claim 16, wherein the noise map is further determined based on a confidence interval as a function of the plurality of WSAD values.
18. The method of claim 16, wherein the different sampling pattern includes pairs of pixels.
19. The method of claim 16, wherein the different sampling pattern is generated randomly.
20. The method of claim 16, wherein the computing WSAD values further includes:
- shifting the block of pixels by at least one of horizontally a first offset and vertically a second offset;
- subtracting the shifted block of pixels from the block of pixels to compute shift differences;
- computing gradients based on an absolute difference between pairs of pixels selected from a sampling pattern; and
- computing the WSAD values based on a sum of a plurality of weights, wherein each of the plurality of weights is computed as a weight function of the gradients.
21. The method of claim 20, wherein the weight function is:
- linear when an absolute value of the each of gradients is less than or equal to a threshold; and
- equal to a gradient maximum when the absolute value of the each of the gradients is greater than the threshold.
22. The method of claim 20, wherein the calculation of the WSAD values includes:
- shuffling pixel positions in a storage before the computing the gradients.
23. The method of claim 16, wherein the block of pixels has a size of 8×8 pixels or 16×16 pixels.
24. The method of claim 16, wherein each of the blocks of pixels is overlapping with another block of pixels.
25. The method of claim 16, wherein each element of the noise map is a difference function between two WSAD values.
26. The method of claim 16, wherein the noise map is one of an average function, a maximum function, and a deviation function of the WSAD values.
27. The method of claim 16, wherein the noise map is further determined based on a number of pixels used for computing the WSAD values.
28. The method of claim 1, further comprising:
- before encoding the source video, filtering the source video based on the noise map.
29. The method of claim 28, wherein the filtering of source video substantially removes additive noise based on the noise map.
30. The method of claim 1, wherein the differences represent luma differences or chroma differences.
31. The method of claim 1, wherein the differences represent red intensity differences, green intensity differences, and blue intensity differences.
32. The method of claim 1, wherein the computing of the noise map is based on a 3D band-pass filtering the source video.
33. The method of claim 32, wherein the 3D band-pass filtering is applied to a motion-compensated version of the source video.
34. The method of claim 1, wherein the computing of the noise map is based on a plurality of WSAD values each representing a weighted sum of absolute differences among pixels of a 3D cube of pixels in the segment of video frames, each WSAD value determined from a different 3D sampling pattern taken across the 3D cube of pixels.
35. An encoder for video, comprising:
- a selector to select a segment of video frames from a source video;
- a processor to compute a noise map for the segment of the source video, the noise map computed from differences among pixels selected from spatially-distributed sampling patterns in the segment;
- an estimator to estimate control parameter adjustments based on the noise map; and
- a coder to code the selected segment of source video according to control parameters generated from a default coding policy and the control parameter adjustments,
- wherein the default coding policy includes default control parameters of the coder.
36. An estimator of control parameters of a video encoder, comprising:
- a processor configured to: select a segment of video frames from the source video; compute a noise map for the segment of the source video, the noise map computed from differences among pixels selected from spatially-distributed sampling patterns in the segment; and compute control parameter adjustments based on the noise map;
- wherein: the selected segment of source video is coded according to control parameters generated from a default coding policy and the control parameter adjustments; and the default coding policy includes default control parameters of the coder.
Type: Application
Filed: May 11, 2009
Publication Date: Sep 30, 2010
Applicant: Apple Inc. (Cupertino, CA)
Inventors: Gianluca FILIPPINI (Los Gatos, CA), Xiaosong ZHOU (Campbell, CA), Hsi-Jung WU (San Jose, CA), James Oliver NORMILE (Los Altos, CA), Xiaojin SHI (Fremont, CA), Ionut HRISTODORESCU (San Jose, CA)
Application Number: 12/463,871
International Classification: H04N 7/12 (20060101);