IMAGE CODING APPARATUS, IMAGE DECODING APPARATUS, IMAGE CODING METHOD, AND IMAGE DECODING METHOD
An input image signal is divided into MC block units and, when coding processing is performed in these divided units, a motion compensation section generates a motion-compensated prediction image by detecting movement amounts in predetermined MC block units, a smoothing filter section performs, with respect to the prediction image, smoothing of pixels located at the boundaries of adjoining MC blocks on the basis of predetermined evaluation criteria, and a prediction residual signal, which is obtained from the difference between the prediction image obtained by the smoothing, and the input image signal, is encoded.
This application is a continuation of U.S. application Ser. No. 10/480,642, filed Dec. 24, 2003, the entire contents of which are incorporated herein by reference. U.S. application Ser. No. 10/480,642 is a national phase application of PCT/JP02/00614 under 35 U.S.C. § 371, filed Jun. 28, 2002, which claims priority to Japanese Patent Application JP 2001-199685, filed on Jun. 29, 2001.
TECHNICAL FIELD
The present invention relates to an image coding apparatus, an image decoding apparatus, an image coding method, and an image decoding method that transmit and store images with a small amount of encoded data and that are applied to mobile image transmission systems and the like.
BACKGROUND ART
Conventional image coding systems divide image frames into blocks of a fixed size and perform coding processing in these divided units. A typical example of a conventional image coding system is the MPEG (Moving Picture Experts Group) 1 coding system described in D. Le Gall, "MPEG: A Video Compression Standard for Multimedia Applications", Trans. ACM, April 1991.
MPEG 1 performs motion-compensated interframe prediction (MC: Motion Compensation) by dividing image frames into fixed block units known as macroblocks, detecting movement amounts (motion vectors) by referencing a locally decoded frame image that has already been encoded, specifying similar blocks within this reference image, and employing these similar blocks as predictive data. With this technique, even when there is motion in an image, prediction efficiency can be improved by tracking the motion, and redundancy in the temporal direction can be reduced. Furthermore, redundancy that remains in the spatial direction can be reduced by applying the DCT (Discrete Cosine Transform), in units of blocks of 8×8 pixels, to the prediction residual signal. A variety of standard image coding systems, beginning with MPEG 1, compress image signals by combining MC and the DCT.
The macroblock data of the current frame (current macroblocks) produced by the input image signal 1 are first output to a motion detection section 2, where motion vectors 5 are detected. A motion vector 5 is detected by referencing a predetermined search region of a previously encoded frame image 4 (called a local decoding image 4 hereinafter) stored in a frame memory 3, locating a pattern similar to the current macroblock (called a prediction image 6 hereinafter), and determining the amount of spatial displacement between this pattern and the current macroblock.
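The motion vector search described above can be illustrated with a minimal full-search block-matching sketch. The text says only that a similar pattern is located within a predetermined search region; the sum-of-absolute-differences (SAD) criterion, the square search window, and the function names `sad` and `full_search` are assumptions made for illustration, not details taken from MPEG 1 or from this specification.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement within +/- search_range (assumed
    square window) and return (motion_vector, cost) with the smallest SAD.
    Displacements that would leave the reference image are skipped."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Real encoders usually replace the exhaustive loop with faster search patterns and sub-pixel refinement; the exhaustive form is used here only because it mirrors the "search a predetermined region" description most directly.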
Here, the local decoding image 4 is not limited to the previous frame. A future frame can also be used if it is encoded in advance and stored in the frame memory 3. Although using a future frame requires the coding order to be rearranged, which increases the processing delay, it has the merit that variations in image content between previous and future frames are easily predicted, making it possible to reduce temporal redundancy still further.
Generally, MPEG1 allows three coding types to be used selectively: bidirectional prediction (B frame prediction), forward prediction (P frame prediction), which uses previous frames alone, and I frame coding, which does not perform interframe prediction and instead codes only within the frame.
The motion vector 5 shown in
The motion predictive data for all the macroblocks in the current frame is determined, and this data, which is rendered as a frame image, is equivalent to the motion prediction frame 605 in
The prediction residual signal 8 is converted into DCT coefficient data 10 (also called DCT coefficients 10 hereinafter) by a DCT section 9. As shown in
The DCT exploits the correlation between pixels in a spatial region to concentrate the signal power in a few coefficients of the DCT block. The higher the power concentration, the better the transform efficiency, and for natural image signals the performance of the DCT is not inferior to that of the KL (Karhunen-Loève) transform, which is the optimum transform. In a natural image in particular, the power is concentrated in the low-frequency region, chiefly the DC component, and there is barely any power in the high-frequency region; therefore, as shown in
The quantization of the DCT coefficients 10 is performed by a quantization section 11 and the quantized coefficients 12 obtained thereby are scanned, run-length encoded, and multiplexed in a compressed stream 14 by a variable length coding section 13 before being transmitted. Further, the motion vectors 5 detected by the motion detection section 2 are also multiplexed in the compressed stream 14 and transmitted, one macroblock at a time, because these vectors are required in order to allow the image decoding apparatus described subsequently to generate a prediction image that is the same as that of the image coding apparatus.
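The scanning and run-length step above can be sketched as follows. This is a simplified illustration, assuming the conventional zigzag scan and (run, level) pairs in which "run" counts the zeros preceding each nonzero quantized coefficient; the actual MPEG 1 variable length code tables, the end-of-block code, and the separate treatment of intra DC coefficients are omitted, and the function names are hypothetical.

```python
def zigzag_order(n):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are traversed upward (bottom-left to top-right),
        # odd diagonals downward (top-right to bottom-left).
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def run_length_pairs(coeffs):
    """(run, level) pairs for a square block of quantized coefficients.
    Trailing zeros produce no pair; a real coder signals them with an
    end-of-block code instead."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(coeffs)):
        level = coeffs[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```

Because the power of natural images sits mostly in the low-frequency coefficients, the zigzag scan visits the nonzero values early and leaves a long tail of zeros, which is what makes the (run, level) representation compact.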
In addition, the quantized coefficients 12 are decoded locally via a reverse quantization section 15 and a reverse DCT section 16, and the decoded results are added to the prediction image 6 by an addition section 22, whereby a decoding image 17 which is the same as that of the image decoding apparatus is generated. The decoding image 17 is used in the prediction for the next frame and is therefore stored in the frame memory 3.
A description is provided next for the constitution of a conventional image decoding apparatus that is based on an MPEG1 image decoding system as shown in
However, in a conventional apparatus, MC performs movement amount detection based on the premise that all of the pixels in the blocks (referred to as MC blocks hereinafter and as macroblocks in the MPEG1 example above) which are the units of MC possess the same motion. Consequently, the possibility exists that, with a prediction image that is constituted by the spatial disposition of MC blocks, a signal waveform will arise in which discontinuity is perceived at the boundaries of the MC blocks. This discontinuous waveform can be compensated by supplementing the residual component in cases where an adequate encoding data amount is allocated to the residual signal. However, when coding is carried out with a high compression ratio, a satisfactory rendition of the residual signal is not possible and the discontinuous boundaries are sometimes apparent and perceived as distortion.
Further, it has been identified that, because the DCT is also a closed orthogonal transform in fixed blocks, in cases where the transform basis coefficients are reduced as a result of coarse quantization, the signal waveform which naturally connects between blocks cannot be reconstituted and unnatural distortion is generated between blocks (block distortion).
As means for solving the former MC block boundary discontinuity, overlapped motion compensation (called OBMC hereinafter) has been proposed. As illustrated by
In
Pc=W1×P{C,MV(C)}+W2×P{C,MV(A)}+W3×P{C,MV(B)}+W4×P{C, MV(D)}+W5×P{C,MV(E)}
Here, the weights are normally set so that the influence of block C's own predictive data decreases gradually from the center of block C toward its boundaries. Because the prediction image is determined by blending the movement amounts of neighboring regions with block C's own motion, this processing has the benefit of preserving the continuity of the waveform between pixels inside and outside the MC blocks, so the block boundaries do not readily stand out.
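The weighted sum Pc = W1×P{C,MV(C)} + ... + W5×P{C,MV(E)} can be evaluated for a single pixel as in the sketch below. In OBMC the weights W1 to W5 vary with the pixel's position inside block C; here they are passed in as hypothetical integer weights (for example 4, 1, 1, 1, 1, normalised by their sum of 8 with rounding), and motion vectors are assumed to be integer-pel. The function name `obmc_pixel` is illustrative only.

```python
def obmc_pixel(ref, x, y, mvs, weights):
    """Overlapped-MC prediction of pixel (x, y) in block C.

    ref: reference image as a 2-D list (rows).
    mvs: label -> integer (dx, dy) motion vector for blocks C, A, B, D, E.
    weights: label -> integer weight (W1..W5 in the formula, assumed values);
             the result is normalised by the sum of the weights, with rounding.
    """
    total = 0
    for label, w in weights.items():
        dx, dy = mvs[label]
        total += w * ref[y + dy][x + dx]  # P{C, MV(label)} at this pixel
    divisor = sum(weights.values())
    return (total + divisor // 2) // divisor
```

Every pixel requires one reference fetch per neighboring motion vector plus the weighted sum, which is the computational cost the following paragraph objects to. When all five motion vectors are equal, the result reduces to ordinary block MC.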
However, OBMC has the problem of a high computational load: for every MC block, in addition to extracting predictive data using the block's own motion vector, it must extract predictive data using the motion vectors of the neighboring MC blocks and then compute a weighted sum of these data.
Furthermore, movement amount detection in image coding is based on the criterion of minimizing the power of the prediction residual rather than on matching the natural movement of the subject. As a result, motion that does not reflect true movement is sometimes detected in noisy regions and other locations, and in such locations OBMC, by combining the influence of neighboring movement amounts, smooths the MC blocks more than necessary, producing two-line blurring and similar problems.
On the other hand, a loop filter has been proposed as a means of solving the latter problem of DCT block distortion. The loop filter acts as a smoothing filter for the DCT block boundaries of a decoding image obtained by adding the encoded and locally decoded prediction residual signal to the prediction image. By removing block distortion from the reference image used for subsequent frames, this technique prevents distortion caused by DCT quantization from being carried into MC. However, as long as MC is performed in block units, discontinuity between MC blocks cannot necessarily be avoided. Further, when residual coding that does not depend on a block structure, such as subband coding or a transform whose bases span block boundaries, is performed, the loss of coding efficiency caused by discontinuous waveforms at block boundaries cannot be avoided.
The present invention was conceived in view of these problems, an object thereof being to provide an image coding apparatus, image decoding apparatus, image coding method, and image decoding method that make it possible to use relatively simple computation to perform processing, with respect to a prediction frame image generated by block-based motion-compensated interframe prediction (MC), to adaptively smooth a discontinuous waveform generated between MC blocks of the prediction frame image, whereby the efficiency of low bit rate coding that employs interframe MC can be improved.
In order to resolve the above problems, the image coding apparatus according to the present invention is characterized by comprising: motion compensation predicting means for generating a motion-compensated prediction image by detecting movement amounts in predetermined partial image region units of an input image; smoothing means for performing smoothing of pixels located at the boundaries of adjoining partial image regions on the basis of predetermined evaluation criteria, with respect to the prediction image obtained by the motion compensation predicting means; and prediction residual coding means for coding a prediction residual signal obtained from the difference between the input image and the smoothed prediction image.
Further, as the image decoding apparatus which corresponds to this image coding apparatus, the image decoding apparatus according to the present invention is characterized by comprising: motion compensation predicting means for generating a motion-compensated prediction image by detecting movement amounts in predetermined partial image region units; smoothing means for performing smoothing of pixels located at the boundaries of adjoining partial image regions on the basis of predetermined evaluation criteria, with respect to the prediction image obtained by the motion compensation predicting means; prediction residual decoding means for decoding a prediction residual signal from the encoding side; and adding means for obtaining a decoded image by adding together a decoded prediction residual signal obtained by the prediction residual decoding means, and the smoothed prediction image.
In addition, in order to resolve the above problems, the image coding method according to the present invention is characterized by comprising: a motion compensation predicting step of generating a motion-compensated prediction image by detecting movement amounts in predetermined partial image region units of an input image; a smoothing step of performing smoothing of pixels located at the boundaries of adjoining partial image regions on the basis of predetermined evaluation criteria, with respect to the prediction image obtained by the motion compensation predicting step; and a prediction residual coding step of coding the prediction residual signal obtained from the difference between the input image and the smoothed prediction image.
Further, as the image decoding method which corresponds to this image coding method, the image decoding method according to the present invention is characterized by comprising: a motion compensation predicting step of generating a motion-compensated prediction image by detecting movement amounts in predetermined partial image region units; a smoothing step of performing smoothing of pixels located at the boundaries of adjoining partial image regions on the basis of predetermined evaluation criteria, with respect to the prediction image obtained by the motion compensation predicting step; a prediction residual decoding step of decoding a prediction residual signal from the encoding side; and an adding step of obtaining a decoded image by adding together a decoded prediction residual signal obtained by the prediction residual decoding step, and the smoothed prediction image.
According to this constitution, smoothing based on predetermined evaluation criteria is performed on the pixels of the prediction image located at the boundaries of adjoining partial image regions, so the correction can be restricted to smoothing that removes discontinuity between partial image regions. Coding efficiency can therefore be improved by suppressing discontinuous waveforms generated in the prediction residual. Accordingly, relatively simple computation can be used to adaptively smooth the discontinuous waveform generated between MC blocks of a prediction frame image produced by block-unit motion-compensated interframe prediction (MC), whereby the efficiency of low bit rate coding that employs interframe MC can be improved.
Embodiments of the present invention will be described in detail hereinbelow by referring to the drawings.
First Embodiment
The MC procedure of this image coding apparatus is substantially the same as the method described in the conventional example. An outline of this procedure is provided in
For example, in MC mode 1 shown in
A procedure that involves performing an orthogonal transform with respect to a residual signal obtained from the difference between an input image and a prediction image which has undergone smoothing processing, and then quantizing and entropy coding the corresponding coefficients, is also as described with reference to
The operation of the image coding apparatus and image decoding apparatus shown in
The operation of the image coding apparatus will be described first. The input image signal 101 is a temporal sequence of frame images and is hereinafter treated as a signal in frame-image units. A frame image that is to be encoded is the current frame 601 shown in
The current frame is encoded by means of the following procedure. The input image signal 101 is inputted to a motion detection section 102 one macroblock at a time, and motion vectors 105 are detected in the motion detection section 102. Of the macroblock forms shown in
Although the motion detection section 102 and the motion compensation section 107 perform processing for every one of the macroblocks, the signal for the difference with respect to the input image signal 101 (the prediction residual signal 108) is obtained with the frame as the unit. That is, the motion vectors 105 of individual macroblocks are maintained over the entire frame, whereby the prediction image 106a is constituted as a frame-unit image.
Next, smoothing processing between MC blocks of the prediction image 106a is performed in the smoothing filter section 124; this processing is described in detail later. The smoothed prediction image 106b is subtracted from the input image signal 101 by the subtraction section 131 to obtain the prediction residual signal 108. The prediction residual signal 108 is converted into orthogonal transform coefficient data 110 by the orthogonal transform section 109; a DCT, for example, is used as the orthogonal transform. The orthogonal transform coefficient data 110 is quantized by a quantization section 111, and the resulting coefficients are scanned and run-length encoded by the variable length coding section 113, which then multiplexes them into a compressed stream 114 for transmission.
At this time, coding mode data 123, which is determined one macroblock at a time and indicates whether intraframe coding or interframe coding has been performed, is also multiplexed. In inter mode, the motion vectors 105 are multiplexed into the compressed stream 114 and transmitted one macroblock at a time. Further, the quantized coefficients 112 are locally decoded via a reverse quantization section 115 and a reverse orthogonal transform section 116, and the decoded result is added to the prediction image 106b by an addition section 132 to generate a decoding image 117 identical to the one on the image decoding apparatus side. The decoding image 117 is stored in the frame memory 103 to be used as a reference image 104 for predicting the next frame.
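Why the encoder adds the locally decoded residual to the smoothed prediction image 106b, rather than to 106a, can be shown with a toy one-dimensional sketch. To keep it short, the orthogonal transform is omitted (treated as the identity) and a uniform scalar quantizer is assumed; neither simplification comes from the specification, and all function names are hypothetical. The point is that encoder and decoder apply identical operations to identical data, so the encoder's reference frame matches the decoder's output exactly.

```python
def quantize(values, step):
    """Uniform scalar quantizer (assumed). Python's round() rounds halves to even."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    return [q * step for q in levels]


def encode_frame(frame, smoothed_pred, step):
    """Return (levels to transmit, encoder-side locally decoded frame)."""
    residual = [a - p for a, p in zip(frame, smoothed_pred)]
    levels = quantize(residual, step)
    # Local decoding: add the dequantized residual back onto the smoothed prediction.
    local_decoded = [p + d for p, d in zip(smoothed_pred, dequantize(levels, step))]
    return levels, local_decoded


def decode_frame(levels, smoothed_pred, step):
    """Decoder side: the same dequantization added to the same smoothed prediction."""
    return [p + d for p, d in zip(smoothed_pred, dequantize(levels, step))]
```

Because the decoder runs the same smoothing filter on the same prediction image 106a, both sides hold the same 106b, and the reconstruction error of each sample is bounded by half the quantizer step.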
Next, the operation of the image decoding apparatus will be described with reference to
The prediction image 106a passes through the smoothing filter section 124 and is then outputted as the smoothed prediction image 106b. The quantized orthogonal transform coefficients 112 are decoded via a reverse quantization section 120 and a reverse orthogonal transform section 121, and then added by an addition section 133 to the prediction image 106b to form the final decoded image 117. The decoded image 117 is stored in the frame memory 122 and outputted to a display device (not shown) with predetermined display timing, whereby the image is played back.
Next, the operation of the smoothing filter section 124 will be described. First, the grounds for the need for the smoothing filter will be described with reference to
That is, all the pixels contained in an MC block possess the same movement amount. Generally, in block-unit MC, the motion vectors detected are those that most reduce the prediction residual of each MC block, meaning that no consideration is given to spatial continuity with adjoining MC blocks. For this reason, as shown in
However, if such a specific waveform cannot be adequately encoded in the prediction residual coding of this frame, the waveform component remains in the local decoding image and appears within the MC blocks of the prediction image for subsequent frames, which can degrade the coding efficiency of the prediction residual signal. In a natural image, the boundaries of the original MC blocks should be smoothly connected; on this assumption, the processing performed by the smoothing filter section 124 aims to obtain a prediction image closer to a natural image by smoothing any discontinuous waveforms present between MC blocks.
The constitution of the smoothing filter section 124 is shown in
(1) current S(A) is updated to max {S(A), S0+1}
(2) current S(B) is updated to max {S(B), S0+1}
(3) current S(C) is updated to max {S(C), S0+2}
(4) current S(D) is updated to max {S(D), S0+1}
(5) current S(E) is updated to max {S(E), S0+1}
As a result, the block activity level in the vicinity of a block to be intra coded is set high. The resolution of a prediction image in intra coding is generally lower than that of a prediction image produced by inter coding, so block boundaries in intra-mode macroblocks tend to stand out. Providing step ST1 is equivalent to raising the priority of smoothing processing in such regions.
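Rule 1 (step ST1) amounts to raising five activity levels with `max`, as in the sketch below. The base value S0 is defined in a part of the description not reproduced here, so it is simply passed in as a parameter; the dictionary keyed by block labels and the function name are illustrative assumptions, and the spatial arrangement of blocks A, B, D, and E around block C (shown in a figure) is not modelled.

```python
# Increments taken from Rule 1: block C itself gets S0 + 2, its neighbours S0 + 1.
RULE1_INCREMENTS = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 1}


def apply_rule1(S, s0):
    """Raise the activity levels around an intra-coded block C (Rule 1).

    S maps block labels "A".."E" to current activity levels; s0 is the base
    value S0, passed in because its definition is outside this excerpt.
    Levels are only ever raised, never lowered.
    """
    for label, increment in RULE1_INCREMENTS.items():
        S[label] = max(S[label], s0 + increment)
    return S
```

Using `max` means that a level already raised by an earlier block is never reduced when a later block is processed, so the order in which blocks are visited does not lower any level.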
A description will be provided next for the rule for setting S(X) when the coding mode data 123 of macroblocks including block C indicates inter coding. First, it is judged whether or not the current prediction image was generated using bidirectional prediction (the B frame prediction mentioned in the conventional example) (step ST2).
In cases where bidirectional prediction can be used, the prediction direction can be changed for each macroblock. When the prediction direction differs between two blocks, spatial continuity at their boundary cannot be assumed. Therefore, it is judged whether the prediction direction of blocks A, B, D, and E, which adjoin block C, is the same as that of block C, and the processing is switched accordingly (step ST3).
When only unidirectional prediction is used, or when the frame permits bidirectional prediction and the prediction direction of the adjoining blocks is the same as that of block C, the block activity level is updated in accordance with Rule 2 below (step ST4).
(Rule 2)
(1) If macroblocks including block A are of an inter mode, current S(A) is updated to max {S(A), K} and current S(C) is updated to max {S(C), K}.
Here, K=2 (when mvd (A,C)≧3)
K=1 (when 0<mvd (A,C)<3)
K=0 (when mvd (A,C)=0)
(2) If macroblocks including block B are of an inter mode, current S(B) is updated to max {S(B), K} and current S(C) is updated to max {S(C), K}.
Here, K=2 (when mvd (B,C)≧3)
K=1 (when 0<mvd (B,C)<3)
K=0 (when mvd (B,C)=0)
(3) If macroblocks including block D are of an inter mode, current S(D) is updated to max {S(D), K} and current S(C) is updated to max {S(C), K}.
Here, K=2 (when mvd (D,C)≧3)
K=1 (when 0<mvd (D,C)<3)
K=0 (when mvd (D,C)=0)
(4) If macroblocks including block E are of an inter mode, current S(E) is updated to max {S(E), K} and current S(C) is updated to max {S(C), K}.
Here, K=2 (when mvd (E,C)≧3)
K=1 (when 0<mvd (E,C)<3)
K=0 (when mvd (E,C)=0)
(5) If blocks A, B, D, and E are intra coded, the block activity level thereof is not changed.
In the above rule, mvd (X,Y) denotes the larger of the absolute differences between the corresponding components (horizontal and vertical) of the motion vectors of adjoining blocks X and Y, and max (a,b) denotes the larger of the values a and b. Updating the block activity level in this way assigns a high block activity level to boundaries between blocks whose motion vectors differ markedly.
When mvd (X, Y)=0 (when there is no motion vector difference between blocks X and Y), this represents a case where the block boundary retains complete spatial continuity and there is thus no need for smoothing here. The block activity level is therefore set to a minimum value.
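Rule 2 can be sketched directly from the K table and the definition of mvd above. The motion vectors are assumed to be integer pairs in whatever units the codec uses for them (the thresholds 0 and 3 are applied to those units unchanged); the function names and the dictionary representation are hypothetical.

```python
def mvd(mv_x, mv_y):
    """Larger of the absolute horizontal and vertical motion-vector differences."""
    return max(abs(mv_x[0] - mv_y[0]), abs(mv_x[1] - mv_y[1]))


def k_from_mvd(d):
    """K as in Rule 2: 2 if d >= 3, 1 if 0 < d < 3, 0 if d == 0."""
    if d >= 3:
        return 2
    if d > 0:
        return 1
    return 0


def apply_rule2(S, mvs, is_inter):
    """Rule 2 (step ST4) for block C and its neighbours A, B, D, E.

    mvs: label -> (dx, dy) motion vector.
    is_inter: label -> True if the macroblock containing that block is inter coded.
    """
    for label in ("A", "B", "D", "E"):
        if not is_inter[label]:
            continue  # item (5): intra-coded neighbours are left unchanged
        k = k_from_mvd(mvd(mvs[label], mvs["C"]))
        S[label] = max(S[label], k)
        S["C"] = max(S["C"], k)
    return S
```

As with Rule 1, levels are only raised, so a boundary with no motion vector difference keeps whatever level earlier rules assigned and otherwise stays at the minimum.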
On the other hand, in a frame permitting the use of bidirectional prediction and when the direction of the prediction for the blocks A, B, D, E is different in relation to block C, or in a mode that combines prediction images by adding and averaging the prediction values in forward and backward directions, the spatial continuity of the prediction image is broken irrespective of the motion vector difference, and hence the current S(X) is updated (step ST5) to max {S(X),1} (X represents blocks A to E). The above processing is performed until completed for all the fixed blocks X in the frame (step ST6), and the setting of the block activity level S(X) 126 is thus completed.
Using the block activity level S(X) 126 set by the block activity level calculation section 125, the filter processing section 127 performs smoothing processing between MC blocks on the prediction image 106a. Ineffective filtering is then corrected in the post-processing section 129 so that the result 128 of a single pass of smoothing filtering does not cause excessive smoothing. This processing is shown in the flowchart of
That is, in
First, it is judged whether the magnitude of the block boundary difference value d=|r1−l1| (where r1 and l1 denote the values of pixels r1 and l1) exceeds a threshold value α(S) established in accordance with the block activity level S (step ST7). Because the filter processing below operates on the boundary between two blocks, the larger of the S(X) values of the two blocks is used as the block activity level S. For example, when filtering is performed on the boundaries of block B and block C in
As a result, if S=0, there is no discontinuity at the block boundaries and so the filter processing is skipped. If S=1, filter processing is performed for the two pixels which are pixels r1 and l1 (step ST9). As shown in
When S=2, in addition to the pixels r1 and l1, pixels r2 and l2 are pixels that are targeted for smoothing (steps ST10 and ST11). In cases where S=2, there are often steep and discontinuous boundaries due to the high block activity level, and hence the object is to increase the continuity of the signal by increasing the extent of the smoothing.
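The dependence of the filtered pixel range on S (steps ST9 to ST11) can be sketched for one line of pixels crossing a boundary. Two points are assumptions because the relevant figures are not reproduced in this text: the filter itself, taken here to be a hypothetical 3-tap [1, 2, 1]/4 kernel applied to the unfiltered values, and the threshold test of step ST7, which is left out since the excerpt does not state which branch follows a difference exceeding α(S). The function name and the six-pixel layout are likewise illustrative.

```python
def smooth_boundary(line, s_left, s_right):
    """Smooth one line of pixels across an MC block boundary.

    line: six pixels [l3, l2, l1, r1, r2, r3]; the boundary lies between l1 and r1.
    s_left, s_right: block activity levels S(X) of the two blocks.
    The [1, 2, 1] / 4 kernel is an assumed stand-in for the patent's filter,
    and the alpha(S) test of step ST7 is deliberately omitted.
    """
    s = max(s_left, s_right)  # the larger S(X) of the two blocks, per the text
    if s == 0:
        return list(line)  # no discontinuity: filtering skipped
    targets = [2, 3] if s == 1 else [1, 2, 3, 4]  # S=1: l1, r1; S=2: also l2, r2
    out = list(line)
    for i in targets:
        # Read only original values so the result does not depend on visiting order.
        out[i] = (line[i - 1] + 2 * line[i] + line[i + 1] + 2) // 4
    return out
```

Reading only the unfiltered values keeps the filter symmetric about the boundary; a higher S widens the smoothed span from two pixels to four, matching the intent of smoothing steep boundaries more strongly.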
The above processing is carried out in a filter processing section 127. The prediction pixel value 128 produced by the filter processing is then corrected in a post-processing section 129 so that it contributes effectively to coding efficiency. The processing of the post-processing section 129 is equivalent to steps ST12 and ST13 in
Specifically, the functions shown in
Therefore, with the image coding apparatus and image decoding apparatus according to the first embodiment, the above corrective measures performed by the smoothing filter section 124 restrict the correction to filtering that removes MC block discontinuity. Because the prediction image 106b is output after this processing, coding efficiency can be improved by suppressing discontinuous waveforms generated in the prediction residual.
When setting the block activity level in
Furthermore, this smoothing filter processing can also be configured so that it can be turned ON/OFF in frame units. Because the smoothing filter section 124 changes prediction image data that was selected optimally in MC block units, this processing can harm as well as help coding efficiency. The image coding apparatus therefore performs image analysis in frame units, judges beforehand whether motion causing discontinuity between MC blocks is present, and turns the smoothing filter section 124 ON when discontinuity is generated and OFF when it is absent.
One example of such image analysis is the evaluation of a provisional residual between the input image signal 101 and the prediction image 106a. The signal distribution of the residual is examined: for frames in which residual coding is not particularly disadvantaged, smoothing filter processing is unnecessary and the filter is turned OFF, whereas for frames that are significantly disadvantaged the filter is turned ON. For example, the filter may be turned ON when the proportion of the residual signal amount at the MC block boundaries to the overall residual signal amount is equal to or greater than a fixed threshold value, and turned OFF when this proportion is below the threshold value. Alternatively, the decision to turn the filter ON or OFF can be made by comparing the frame-unit coding efficiency with and without smoothing processing. The result of the ON/OFF determination is transmitted as part of the header information at the start of a frame in the compressed stream 114 (bit data indicating the presence or absence of smoothing). With such a constitution, smoothing processing can be applied more adaptively to irregular image signals.
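The threshold-based ON/OFF decision above can be sketched as follows. The text does not define "residual signal amount" or which pixels count as lying "at the MC boundaries", so this sketch assumes squared residual energy and counts the pixels immediately on either side of each internal block boundary; the block size, the threshold value, and the function names are hypothetical.

```python
def boundary_ratio(residual, block):
    """Share of residual energy on pixels adjacent to an internal MC block boundary.

    residual: 2-D list (rows) of provisional residual values; block: MC block size.
    Energy is measured as the sum of squares (an assumption).
    """
    height, width = len(residual), len(residual[0])

    def on_boundary(pos, size):
        # Last pixel before, or first pixel after, an internal block boundary;
        # the frame edges are not MC block boundaries.
        return (pos % block == 0 and pos > 0) or (pos % block == block - 1 and pos < size - 1)

    total = at_boundary = 0
    for y in range(height):
        for x in range(width):
            energy = residual[y][x] ** 2
            total += energy
            if on_boundary(x, width) or on_boundary(y, height):
                at_boundary += energy
    return at_boundary / total if total else 0.0


def smoothing_enabled(residual, block, threshold):
    """Frame-level flag: ON when the boundary share reaches the threshold."""
    return boundary_ratio(residual, block) >= threshold
```

The resulting boolean is the one bit per frame that the text says is carried in the frame header of the compressed stream 114.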
Second Embodiment
Here, m is the total number of basis search steps, i is the basis search step number, and ri is the prediction residual image signal after completion of the basis search of the (i−1)th step, which is used unchanged as the input to the basis search of the ith step, where r0=f. Further, si and gki are the partial region and the basis, respectively, obtained in the basis search of the ith step by selecting, from arbitrary partial regions s of ri (partial regions in a frame) and arbitrary bases gk contained in the basis set G, the combination of s and gk that maximizes their inner product value. When the basis search is performed in this way, the larger the number m of basis search steps, the smaller the energy of rm becomes. This means that the more bases are used to represent the prediction residual image signal f, the more accurately the signal can be represented.
In each of the basis search steps, the data that is encoded is:
1) The index identifying gki (the basis set is shared by the encoding side and the decoding side, so a basis can be specified by transmitting only its index data),
2) The inner product value <si, gki> (corresponding to the basis coefficient), and
3) The position pi=(xi, yi) of the center of si within the frame.
A set of these parameters is collectively known as an atom. With this image signal representation and coding method, as the number of encoded atoms (that is, the total number m of basis search steps) increases, the amount of encoded data increases and the distortion decreases.
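The basis search and atom extraction described above follow the greedy Matching Pursuits pattern, sketched here in one dimension for brevity. Several simplifications are assumptions rather than part of the text: the 2-D partial regions s are replaced by 1-D segments, the basis set is a small hypothetical list of unit-norm vectors, the selection uses the magnitude of the inner product (usual Matching Pursuits practice, so negative coefficients are also found), and each atom records the segment's start index instead of its center position.

```python
import math


def matching_pursuit(f, bases, m):
    """Greedy Matching Pursuits on a 1-D signal f.

    bases: list of unit-norm basis vectors g_k, each no longer than f.
    Returns (atoms, residual); each atom is (basis index k, inner product, start).
    """
    r = list(f)  # r0 = f
    atoms = []
    for _ in range(m):
        best = None
        # Try every basis at every position and keep the largest |<r, g_k>|.
        for k, g in enumerate(bases):
            for start in range(len(r) - len(g) + 1):
                ip = sum(r[start + j] * g[j] for j in range(len(g)))
                if best is None or abs(ip) > abs(best[1]):
                    best = (k, ip, start)
        k, ip, start = best
        # Subtract the selected atom from the residual before the next step.
        for j, gj in enumerate(bases[k]):
            r[start + j] -= ip * gj
        atoms.append(best)
    return atoms, r


# Usage: a delta basis and a two-sample "flat" basis, both unit norm.
s = 1 / math.sqrt(2)
atoms, residual = matching_pursuit([0, 0, 3, 3, 0, 0, 0, 5], [[1.0], [s, s]], 2)
```

In this usage example the first step picks the isolated spike with the delta basis and the second step picks the flat pair with the two-sample basis, after which the residual is (up to rounding) zero, illustrating how the residual energy shrinks as m grows.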
On the other hand, in the image coding by Matching Pursuits in the above paper, MC is carried out independently of Matching Pursuits, and atoms are extracted from the prediction residual signal. In this case, atoms may be extracted at positions that straddle MC block boundaries. As long as MC depends on the block structure, a discontinuous waveform between MC blocks, as described in the first embodiment above, remains in the residual signal, with the disadvantage that a waveform which should not need to be encoded is encoded.
Conventionally, overlapped MC, which considers the motion vectors of neighboring MC blocks, has been used to resolve this problem. However, overlapped MC references more prediction values and computes the final prediction value as a weighted sum, so its computational cost is high; moreover, because it cannot adapt its smoothing to the pixel values in the MC blocks, it blurs the prediction image excessively. By performing adaptive smoothing filter processing at the MC block boundaries as described in the first embodiment, the residual signal can be smoothed without blurring the prediction image excessively.
In the image coding apparatus shown in
First, the current frame is output to a motion detection section 202, and motion vectors 205 are detected by exactly the same procedure as in the motion detection section 102 of the first embodiment. However, the motion detection section 202 divides intra coding into coding of the DC component and coding of the AC component. The result of encoding the DC component is used as part of the prediction image, and the AC component is encoded as part of the prediction residual. This processing allows the prediction image to be obtained batchwise in frame units, which is necessary in order to use Matching Pursuits.
Accordingly, when intra mode is selected in the motion detection section 202, the prediction image of the corresponding macroblock is filled with the intra DC component after it has been encoded and locally decoded. In a DC coding section 225, the intra DC component is predicted from neighboring image data and quantized; it is then output as encoded data 226 to a variable length coding section 213 and multiplexed into a compressed stream 214.
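The DC/AC split for an intra macroblock can be sketched as follows. This is an illustration under assumptions, not the DC coding section 225 itself: the DC prediction is reduced to a single neighboring value `pred_dc`, and the uniform quantizer step `qstep` is a hypothetical parameter. The locally decoded DC fills the macroblock's prediction image, and everything else stays in the residual for Matching Pursuits.

```python
from typing import List, Tuple

Block = List[List[int]]


def split_intra_block(block: Block, pred_dc: int,
                      qstep: int = 8) -> Tuple[int, Block, Block]:
    """Split one intra macroblock into a DC prediction and an AC residual."""
    n = sum(len(row) for row in block)
    dc = sum(sum(row) for row in block) / n
    # DC coding: predict from a neighboring decoded DC, quantize the difference.
    level = round((dc - pred_dc) / qstep)   # value passed on as encoded data
    dc_hat = pred_dc + level * qstep        # locally decoded DC
    # The decoded DC fills the prediction image for this macroblock ...
    prediction = [[dc_hat] * len(row) for row in block]
    # ... and the AC part is left in the prediction residual.
    ac_residual = [[x - dc_hat for x in row] for row in block]
    return level, prediction, ac_residual
```

Since the residual is taken against the decoded DC rather than the exact mean, any DC quantization error is also left in the residual, where the atom coder can still represent it.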
For intra mode macroblocks, a motion compensation section 207 uses the DC component described above; for inter mode macroblocks, it uses the motion vectors 205 to reference a local decoding image 204 in the frame memory 203. A prediction image 206a for the current frame is thereby obtained. Although the motion detection section 202 and the motion compensation section 207 process each macroblock individually, the difference from the input image signal 201 (the prediction residual signal 208) is obtained in frame units. That is, the motion vectors 205 of the individual macroblocks are held for the entire frame, so that the prediction image 206a is constructed as a frame-unit image.
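A sketch of how per-macroblock results can be assembled into a frame-unit prediction image and residual is given below. It assumes integer-pel motion vectors given as (dx, dy) offsets into the reference image, clamping at frame edges, and a plain list-of-lists image; `predict_frame` and `residual_frame` are hypothetical helpers. The smoothing of section 224 is omitted here; in the apparatus it is applied to the prediction image before the subtraction.

```python
from typing import List, Tuple

Image = List[List[int]]


def predict_frame(ref: Image, vectors: List[List[Tuple[int, int]]],
                  modes: List[List[str]], dcs: List[List[int]],
                  mb: int = 16) -> Image:
    """Assemble a frame-unit prediction image from per-macroblock data."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for r, row_vectors in enumerate(vectors):
        for c, (dx, dy) in enumerate(row_vectors):
            for y in range(r * mb, min((r + 1) * mb, h)):
                for x in range(c * mb, min((c + 1) * mb, w)):
                    if modes[r][c] == "intra":
                        pred[y][x] = dcs[r][c]      # decoded intra DC fill
                    else:
                        # Integer-pel motion vector, clamped at frame edges.
                        sy = min(max(y + dy, 0), h - 1)
                        sx = min(max(x + dx, 0), w - 1)
                        pred[y][x] = ref[sy][sx]
    return pred


def residual_frame(cur: Image, pred: Image) -> Image:
    """Frame-unit prediction residual: input minus prediction."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cur, pred)]
```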
Next, a smoothing filter section 224 smooths the boundaries between the MC blocks of the prediction image 206a. The smoothing filter section 224 uses coding mode data 223 and the motion vectors 205, and operates by processing like that of the first embodiment. The smoothed prediction image 206b is subtracted from the input image signal 201 by a subtraction section 241 to obtain a prediction residual signal 208.
Next, an atom extraction section 209 generates atom parameters 212 from the prediction residual signal 208 on the basis of the Matching Pursuits algorithm described above. A basis set $g_k$ 211 is stored in a basis codebook 210. Owing to the properties of the Matching Pursuits algorithm, if a basis that represents a partial signal waveform as accurately as possible is found in an early search step, that waveform can be represented by fewer atoms, that is, with a smaller encoded volume. Atoms are extracted over the whole area of the frame. To code the position data in the atom parameters, use is made of the fact that the order in which atoms are coded does not affect the decoded image: the atoms are sorted by their two-dimensional coordinates, starting from the top left-hand corner of the frame, and the coding order is arranged so that atoms are counted in macroblock units. Consequently, for each macroblock, atom parameters 212 (basis index, position data, and basis coefficient) are coded for as many atoms as the macroblock contains.
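One possible reading of the atom ordering is sketched below: atoms are grouped by the macroblock that contains their position, macroblocks are visited in raster order from the top left-hand corner, and atoms within a macroblock are ordered by their coordinates. The 16×16 macroblock size and the tuple layout of an atom are assumptions made for illustration; the per-macroblock counts correspond to the number of atoms whose parameters would be coded in each macroblock.

```python
from typing import List, Tuple

# Hypothetical atom layout: (basis_index, (x, y), coefficient)
Atom = Tuple[int, Tuple[int, int], float]


def order_atoms(atoms: List[Atom], width: int, height: int,
                mb: int = 16) -> Tuple[List[Atom], List[int]]:
    """Sort atoms into macroblock coding order and count atoms per macroblock."""
    mb_cols = (width + mb - 1) // mb
    mb_rows = (height + mb - 1) // mb

    def mb_index(atom: Atom) -> int:
        _, (x, y), _ = atom
        return (y // mb) * mb_cols + (x // mb)   # raster order of macroblocks

    # Macroblock first, then raster position (y, x) inside the macroblock.
    ordered = sorted(atoms, key=lambda a: (mb_index(a), a[1][1], a[1][0]))
    counts = [0] * (mb_rows * mb_cols)
    for a in ordered:
        counts[mb_index(a)] += 1
    return ordered, counts
```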
An atom decoding section 215 decodes a local decoding residual signal 216 from the atom parameters 212 and then obtains a local decoding image 217 by adding the local decoding residual signal 216 to the smoothed prediction image 206b by means of an addition section 242. The local decoding image 217 is stored in the frame memory 203 in order to be used in the MC for the next frame.
Next, the image decoding apparatus will be described by referring to
Therefore, with the image coding apparatus and the image decoding apparatus of the second embodiment, effects similar to those of the first embodiment can also be obtained in an image coding and decoding apparatus based on a compression coding system that applies the technique known as Matching Pursuits.
Third Embodiment
A third embodiment of the present invention will now be described. The third embodiment describes another smoothing filter section. This smoothing filter section is a modification of the smoothing filter sections 124 and 224 of the first and second embodiments, respectively; because it can simply replace the smoothing filter sections 124 and 224, it can be applied to the image coding apparatus and image decoding apparatus shown in
In the smoothing filter section of the third embodiment, the block activity level calculation section 125 defines the activity level information not for blocks but for block boundaries. Consequently, the filter can be controlled by an activity level allocated uniquely to each boundary, without having to select which activity level to use when the activity levels of adjoining blocks differ, as was necessary in the first and second embodiments.
The activity level is defined with respect to block boundaries and therefore, as shown in
The boundaries between blocks D and E are determined as SL(D) and SU(E), respectively. As in the first embodiment, the activity level is determined by the difference in motion vectors between the two blocks and by the difference in coding mode between them, and can therefore be set using rules like those of the first embodiment.
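The following sketch illustrates boundary-based activity levels, reading SL and SU as the activity levels of a block's left and upper boundaries. The setting rule in `boundary_activity` (level 2 when the coding modes differ or intra coding is involved, level 1 when the motion vector difference reaches a threshold, otherwise 0) and the threshold value are hypothetical stand-ins for the first-embodiment rules, which are not reproduced here. Each block's SL and SU depend only on its left and upper neighbors, so the levels can be computed as macroblocks arrive in raster order.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]


def boundary_activity(mode_a: str, mv_a: MV, mode_b: str, mv_b: MV,
                      mv_thresh: int = 2) -> int:
    """Hypothetical setting rule for the activity level of one block boundary."""
    if mode_a != mode_b or mode_a == "intra":
        return 2
    diff = max(abs(mv_a[0] - mv_b[0]), abs(mv_a[1] - mv_b[1]))
    return 1 if diff >= mv_thresh else 0


def boundary_levels(modes: List[List[str]], mvs: List[List[MV]]):
    """Return (SL, SU) per block; each uses only the left or upper neighbor."""
    rows, cols = len(modes), len(modes[0])
    SL: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    SU: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if c > 0:   # left boundary; None at the frame edge
                SL[r][c] = boundary_activity(modes[r][c - 1], mvs[r][c - 1],
                                             modes[r][c], mvs[r][c])
            if r > 0:   # upper boundary; None at the frame edge
                SU[r][c] = boundary_activity(modes[r - 1][c], mvs[r - 1][c],
                                             modes[r][c], mvs[r][c])
    return SL, SU
```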
Therefore, with the smoothing filter section of the third embodiment, the filter can be controlled by a uniquely allocated activity level, without the activity-level selection required in the first and second embodiments when adjoining blocks have different activity levels.
Furthermore, in the third embodiment, because the activity level depends only on the blocks to the left and above, an apparatus that generates, encodes, and decodes the prediction image in macroblock units can also carry out encoding and decoding while smoothing the MC blocks. In addition, by introducing pipeline processing in macroblock units, the image coding apparatus and image decoding apparatus can be implemented to process quickly and efficiently.
Fourth Embodiment
A fourth embodiment of the present invention will now be described. The fourth embodiment describes another smoothing filter section. This smoothing filter section is a modification of the smoothing filter sections 124 and 224 of the first and second embodiments, respectively; because it can simply replace the smoothing filter sections 124 and 224, it can be applied to the image coding apparatus and image decoding apparatus shown in
The smoothing filter section of the fourth embodiment switches the filter characteristics in accordance with the activity level.
Therefore, with the smoothing filter section of the fourth embodiment, the strength of the smoothing can be controlled according to the activity level.
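A minimal sketch of switching filter characteristics by activity level, applied to one row of pixels across a vertical block boundary, follows. The kernels and the number of pixels modified on each side for each level are hypothetical choices: level 0 disables smoothing, level 1 applies a weak 3-tap filter to the two pixels adjacent to the boundary, and level 2 applies a stronger 5-tap filter to two pixels on each side. Integer rounding and edge clamping are also assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical filter characteristics: level -> (kernel, pixels per side)
FILTERS: Dict[int, Optional[Tuple[List[int], int]]] = {
    0: None,                    # no smoothing
    1: ([1, 2, 1], 1),          # weak: 3-tap, one pixel each side
    2: ([1, 1, 1, 1, 1], 2),    # strong: 5-tap, two pixels each side
}


def smooth_across_boundary(row: List[int], b: int, level: int) -> List[int]:
    """Smooth pixels around the boundary between row[b - 1] and row[b]."""
    spec = FILTERS[level]
    if spec is None:
        return list(row)
    kernel, reach = spec
    h, total = len(kernel) // 2, sum(kernel)
    n = len(row)
    out = list(row)  # filter taps always read the unmodified input row
    for i in range(max(0, b - reach), min(n, b + reach)):
        acc = sum(w * row[min(max(i - h + j, 0), n - 1)]
                  for j, w in enumerate(kernel))
        out[i] = (acc + total // 2) // total   # rounded integer average
    return out
```

On a step edge from 10 to 30, level 1 yields `[10, 10, 10, 15, 25, 30, 30, 30]` and level 2 yields `[10, 10, 14, 18, 22, 26, 30, 30]`, showing how the extent of smoothing grows with the level.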
Further, the constitution may be such that a plurality of filter characteristics can be selected according to the activity level, and information identifying the selected characteristics is multiplexed into a compressed stream 114 and transmitted to the image decoding apparatus. With such a constitution, a more detailed adaptive judgment based on image analysis on the image coding apparatus side can be reflected in the filter characteristics, and the image decoding apparatus can implement adaptive smoothing filter processing without performing the special image analysis carried out by the image coding apparatus. The fourth embodiment is equally applicable when an activity level defined for block boundaries, as described in the third embodiment, is used.
When the filter characteristics are switched, the type of filter characteristics used is transmitted as part of the header information at the start of the frame in the compressed stream, for example.
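Signaling of the selected filter characteristics in the frame header might look like the sketch below. The byte layout (a 16-bit frame number followed by an 8-bit filter-characteristics identifier) and the contents of `FILTER_TABLE` are hypothetical; the point is only that the decoder, holding the same table, can look up the characteristics from the transmitted identifier without analyzing the image.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical table of selectable filter characteristics, identical on the
# encoding and decoding sides: id -> 1-D smoothing kernel.
FILTER_TABLE: Dict[int, List[int]] = {0: [1, 2, 1], 1: [1, 1, 1, 1, 1], 2: [1, 6, 1]}
HEADER_FMT = ">HB"  # big-endian: 16-bit frame number, 8-bit filter id


def write_frame_header(frame_number: int, filter_id: int) -> bytes:
    """Encoder side: place the filter-characteristics id at the frame start."""
    return struct.pack(HEADER_FMT, frame_number, filter_id)


def read_frame_header(stream: bytes) -> Tuple[int, List[int], bytes]:
    """Decoder side: recover the characteristics by table lookup only."""
    frame_number, filter_id = struct.unpack_from(HEADER_FMT, stream, 0)
    rest = stream[struct.calcsize(HEADER_FMT):]
    return frame_number, FILTER_TABLE[filter_id], rest
```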
Fifth Embodiment
The image coding apparatus shown in
That is, this smoothing filter section 524 is a modification of the smoothing filter sections 124 and 224 of the first and second embodiments, respectively, and can simply replace these filters, except that it also receives the reference image 104 as input. In the fifth embodiment, the difference is taken between the prediction image 106a before smoothing filter processing and the reference image 104 in the frame memory 103 from which the prediction image 106a was derived, and filter control is performed on the basis of the resulting error margin electric power.
The prediction image 106a is image data extracted from the reference image 104 using the motion vectors 105, and approximates the input image signal 101 inputted to the image coding apparatus. Consequently, when spatially co-located points in the reference image 104 and the prediction image 106a are compared, the error margin electric power is large in parts with movement and small in parts with very little movement. The magnitude of the motion vectors 105 does express the amount of movement to some extent, but factors unrelated to actual changes in the image, such as noise, also influence motion detection, so the extent and intensity of the movement cannot be expressed adequately by this magnitude alone. The above error margin electric power, however, can be used as an indicator of the intensity of the movement, which improves the adaptability of the filter control. Further, because exactly the same reference image 104 is available on the encoding and decoding sides, this control can be introduced without transmitting special identification information to the decoding apparatus.
Specifically, as shown in
When the activity level is greater than zero, it is not evaluated by the motion vectors alone. Instead, the error margin electric power thus found is also used: when it exceeds a predetermined threshold value, the activity level is changed to a larger value, and when it is smaller than a predetermined threshold value, the activity level is set to zero and smoothing is not performed (step ST15). The threshold value for raising the activity level and the threshold value for lowering it need not be the same.
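The threshold-based update of step ST15 can be sketched as below. Here the "error margin electric power" is interpreted as the sum of squared differences between co-located pixels of the reference and prediction blocks; the threshold values, the upper bound on the activity level, and raising the level by one step are hypothetical. Two separate thresholds are used, since the text notes they need not be equal.

```python
from typing import List

Block = List[List[int]]


def error_power(ref_block: Block, pred_block: Block) -> int:
    """Sum of squared differences between co-located pixels."""
    return sum((r - p) ** 2
               for ref_row, pred_row in zip(ref_block, pred_block)
               for r, p in zip(ref_row, pred_row))


def update_activity(level: int, err: int, raise_thresh: int = 2000,
                    lower_thresh: int = 200, max_level: int = 2) -> int:
    """Refine a motion-vector-based activity level with the error power."""
    if level <= 0:
        return level                      # only levels above zero are re-evaluated
    if err > raise_thresh:
        return min(level + 1, max_level)  # strong movement: more smoothing
    if err < lower_thresh:
        return 0                          # little movement: no smoothing
    return level                          # in between: keep the MV-based level
```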
Further, in the fifth embodiment, the constitution may be such that, for the reference image 104, block averages are precalculated in evaluation block units and buffered before the image is stored in the frame memory, and block averages are similarly found for the prediction image 106a, so that the error margin electric power is evaluated using only the average values.
Because the average values are the dominant component of the error margin amount between the reference image 104 and the prediction image 106a, and because the averages alone can be held in a small buffer, the frequency of access to the frame memory during the activity level calculation can be reduced without affecting the judgment of the activity level.
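The average-based evaluation can be sketched as follows, assuming square evaluation blocks that tile the frame exactly and interpreting the error margin electric power as a squared-difference measure. `block_means` would run once per reference image before it is written to the frame memory; the error for each block is then estimated from the two averages alone. Both function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def block_means(image: Image, bs: int) -> List[List[float]]:
    """Per-block averages, small enough to keep in a side buffer."""
    h, w = len(image), len(image[0])
    means = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            vals = [image[y][x]
                    for y in range(by, min(by + bs, h))
                    for x in range(bx, min(bx + bs, w))]
            row.append(sum(vals) / len(vals))
        means.append(row)
    return means


def mean_based_error(ref_means: List[List[float]],
                     pred_means: List[List[float]],
                     pixels_per_block: int) -> List[List[float]]:
    """Approximate per-block error power from block averages only."""
    return [[(r - p) ** 2 * pixels_per_block for r, p in zip(rr, pr)]
            for rr, pr in zip(ref_means, pred_means)]
```

By the Cauchy-Schwarz inequality, N times the squared difference of the means never exceeds the full sum of squared differences over an N-pixel block, so this estimate is a lower bound that captures the average (DC) part of the error.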
Furthermore, when the activity level is allocated at the block boundaries as is the case in the third embodiment above, the constitution can also be such that partial regions that extend across the block boundaries are defined and the error margin amount between the reference image 104 and the prediction image 106a is evaluated in these units.
In addition, in the fifth embodiment, the error margin amount between the reference image 104 and the prediction image 106a is used to update the activity level, but it may also be used to change the filter characteristics applied at points having a certain predetermined activity level value. For example, when the activity level of a certain block or block boundary is an intermediate value within the defined activity level range, changing the filter characteristics according to the conditions at that point increases the adaptability still further. To this end, a constitution is also possible in which the filter characteristics are switched on the basis of an evaluation of the error margin amount between the reference image 104 and the prediction image 106a.
Therefore, with the smoothing filter section of the fifth embodiment, the adaptability of the filter control can be improved as described above; because exactly the same reference image 104 is available on the encoding and decoding sides, this control can be introduced without transmitting special identification information to the decoding apparatus. Further, the frequency of access to the frame memory during the activity level calculation can be reduced without affecting the judgment of the activity level.
INDUSTRIAL APPLICABILITY
The present invention can be used as an image coding apparatus and an image decoding apparatus applied to a mobile image transmission system, for example.
Claims
1. An image coding apparatus, characterized by comprising:
- motion compensation predicting means for generating a motion-compensated prediction image by detecting movement amounts in predetermined partial image region units of an input image;
- smoothing means for performing smoothing of pixels located at the boundaries of adjoining partial image regions on the basis of predetermined evaluation criteria, with respect to the prediction image obtained by the motion compensation predicting means; and
- prediction residual coding means for coding the prediction residual signal obtained from the difference between the input image and the smoothed prediction image.
Type: Application
Filed: Jun 4, 2010
Publication Date: Nov 11, 2010
Applicant: NTT DoCoMo, Inc (Chiyoda-ku)
Inventors: Shunichi SEKIGUCHI (Yamato-shi), Sadaatsu Kato (Yokosuka-shi), Mitsuru Kobayashi (Yokohama-shi), Minoru Etoh (Yokohama-shi)
Application Number: 12/794,304
International Classification: H04N 7/32 (20060101);