VIDEO ENCODING METHOD AND APPARATUS, AND VIDEO DECODING METHOD AND APPARATUS
A method for encoding a video block using reference blocks comprises assigning the video block to one of first and second prediction groups, and encoding the video block according to a motion compensated prediction encoding mode, using the reference blocks depending on the one of the first and second prediction groups to which the video block is assigned, one of the reference blocks being a decoded block, wherein a first prediction group is obtained by a prediction using the reference blocks belonging to a first prediction group, and a second prediction group is obtained by a prediction using the reference blocks belonging to at least one of the second prediction group and the first prediction group.
This application is a divisional of and claims the benefit of priority under 35 USC §120 from U.S. Ser. No. 10/396,437, filed Mar. 26, 2003 and is based upon and claims the benefit of priority under 35 USC §119 from the Japanese Patent Application No. 2001-386596, filed Dec. 19, 2001 and No. 2002-97892, filed Mar. 29, 2002, the entire contents of which are incorporated herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a video encoding method and apparatus and a video decoding method and apparatus that use motion compensated prediction interframe encoding.
2. Description of the Related Art
As video compression encoding techniques, MPEG 1 (ISO/IEC 11172-2), MPEG 2 (ISO/IEC 13818-2) and MPEG 4 (ISO/IEC 14496-2) are in broad practical use. These video encoding modes are performed by a combination of intraframe encoding, forward prediction interframe encoding and bi-directional prediction interframe encoding. The frames encoded by these encoding modes are called I picture, P picture and B picture, respectively. A P picture is encoded using, as a reference frame, the P or I picture just before it. A B picture is encoded using, as reference frames, the P or I pictures just before and after it. The forward prediction interframe encoding and the bi-directional prediction interframe encoding are referred to as motion compensated prediction interframe encoding.
When video encoded data based on the MPEG mode is played back in fast-forward, it is conventional either to play back only the I pictures, which require no reference frame, or to decode only the I and P pictures while skipping the B pictures, exploiting the fact that a B picture cannot be used as a reference frame. However, when only the I pictures are played back and the period of the I pictures is long, a high-speed fast-forward playback can be carried out but a smooth fast-forward playback cannot. In a fast-forward playback using the I and P pictures, all I and P pictures must be decoded because each P picture is encoded by interframe prediction encoding. For this reason, it is difficult to change the fast-forward speed freely.
In the video encoding of the conventional MPEG mode, a B picture is not used as a reference frame. Therefore, in a prediction configuration in which plural B pictures continue, each B picture must be encoded using, as a reference frame, a P picture that is temporally distant from it. This results in a problem that the encoding efficiency of the B pictures deteriorates. On the other hand, when a decoded B picture is used as a reference frame for a P picture, all frames including the B pictures must be decoded even in a fast-forward playback that is meant to skip the B pictures. As a result, it becomes difficult to perform the fast-forward playback effectively.
As described above, when video encoded data obtained by encoding that includes motion compensated prediction interframe encoding, such as MPEG, is played back in fast-forward, it is difficult to perform a smooth fast-forward playback at a free playback speed by playing back only the I pictures. When the fast-forward playback is performed by skipping the B pictures without decoding them, a decoded B picture cannot be used as a reference frame. For this reason, there is a problem that the encoding efficiency deteriorates in a prediction configuration in which B pictures continue.
BRIEF SUMMARY OF THE INVENTION
It is an object of the invention to provide a video encoding and decoding method and apparatus using motion compensated prediction interframe encoding that enable a fast-forward playback with a high encoding efficiency and a high degree of freedom on the decoding side.
According to an aspect of the invention, there is provided a method for encoding a video block using reference blocks, comprising assigning the video block to one of a plurality of prediction groups including at least first and second prediction groups; and encoding the video block according to a motion compensated prediction encoding mode, using the reference blocks depending on the one of the prediction groups to which the video block is assigned, one of the reference blocks being a decoded block, wherein the first prediction group is obtained by a prediction using the reference blocks belonging to the first prediction group, and the second prediction group is obtained by a prediction using the reference blocks belonging to at least one of the second prediction group and the first prediction group.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
An embodiment of the present invention will be described with reference to the accompanying drawings.
(Encoding)
The present embodiment is based on a video encoding which is a combination of a motion compensated prediction, an orthogonal transformation and a variable-length coding, the video encoding being represented by a conventional MPEG scheme. There will now be described a video encoding method based on prediction groups including two hierarchical layers.
A video signal 100 (video frame) is input to a video encoding apparatus frame by frame. First, the video frame of the video signal 100 is assigned to one of the prediction groups of two hierarchical layers by a motion compensation prediction unit 111 (step S11). The video frame is encoded by motion compensated prediction interframe encoding, using at least one reference frame belonging to a prediction group of a hierarchical layer no higher than the hierarchical layer of the prediction group to which the video frame is assigned (step S12). In this embodiment, the reference frame stored in the reference memory set 118 is used.
The assignment of video frames to the prediction groups of the hierarchical layers changes from frame to frame with time. For example, the even numbered frames are assigned to the prediction group of the first hierarchical layer, and the odd numbered frames to the prediction group of the second hierarchical layer. The prediction group to which a reference frame belongs is determined by the prediction group of the video frame from which that reference frame was encoded. In other words, if a video frame is assigned to the prediction group of a hierarchical layer, the encoded frame obtained by encoding and local-decoding the video frame belongs to the prediction group of the same hierarchical layer. The process of steps S11 and S12 will now be explained in detail.
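The even/odd assignment given as an example above can be illustrated with a minimal Python sketch. The function name `assign_prediction_group` and the use of the integers 1 and 2 to denote the first and second hierarchical layers are assumptions made for illustration only; the embodiment merely requires that each frame be assigned to one prediction group.

```python
def assign_prediction_group(frame_number: int) -> int:
    """Return 1 (first, lowest layer) for even frames and 2 (second layer) for odd frames.

    Hypothetical helper: the even/odd rule is only the example named in the text.
    """
    return 1 if frame_number % 2 == 0 else 2


# Assignment for the first six frames of a sequence.
groups = [assign_prediction_group(n) for n in range(6)]
```

Other assignment patterns are equally possible, since the assignment is signalled to the decoder as side information.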
As described above, a plurality of encoded frames belong to the prediction groups of the first and second hierarchical layers as reference frames. Two reference memory sets 118 and 119 are prepared for temporarily storing the encoded frames as the reference frames. The encoded frames belonging to the prediction group of the first hierarchical layer (i.e., the lowest hierarchical layer) are temporarily stored as reference frames in the first reference memory set 118. The encoded frames belonging to the prediction group of the second hierarchical layer (i.e., the higher hierarchical layer) are temporarily stored as the reference frames in the second reference memory set 119.
The video frame assigned to the prediction group of the first hierarchical layer is subjected to the motion compensated prediction interframe encoding, using the reference frame belonging to the prediction group of the first hierarchical layer and stored in the first reference memory set 118. On the other hand, the video frame assigned to the prediction group of the second hierarchical layer is subjected to the motion compensated prediction interframe encoding, using the reference frames belonging to both prediction groups of the first and the second hierarchical layers and stored in the first and second reference memory sets 118 and 119.
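The rule that a frame may refer only to reference frames of its own or a lower hierarchical layer can be sketched as follows. This is an illustrative sketch, not the apparatus itself: the dictionary `reference_memory_sets` (layer number mapped to a list of frame labels) and the function name `available_references` are hypothetical stand-ins for the reference memory sets 118 and 119 and the switch 120.

```python
def available_references(layer, reference_memory_sets):
    """Collect the reference frames usable by a frame assigned to `layer`.

    Only memory sets whose layer number is not higher than `layer` are read,
    which mirrors switch 120 being OFF for first-layer frames.
    """
    refs = []
    for ref_layer in sorted(reference_memory_sets):
        if ref_layer <= layer:
            refs.extend(reference_memory_sets[ref_layer])
    return refs


# Hypothetical contents: layer 1 plays the role of set 118, layer 2 of set 119.
memory = {1: ["Ia0", "Pa2"], 2: ["Pb1"]}
first_layer_refs = available_references(1, memory)
second_layer_refs = available_references(2, memory)
```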
The motion compensated prediction interframe encoding will now be explained concretely. When the video frame corresponding to the video signal 100 belongs to the prediction group of the first hierarchical layer, one or more reference frames temporarily stored in the first reference memory set 118 are read out therefrom and input to the motion compensation prediction unit 111. At this time, the switch 120 is OFF, so that the reference frame from the second reference memory set 119 is not input to the motion compensation prediction unit 111. The motion compensation prediction unit 111 executes the motion compensated prediction using one or more reference frames read out from the reference memory set 118 to generate a prediction picture signal 104. The prediction picture signal 104 is input to the subtracter 110 to generate a predictive error signal 101 that is an error signal of the prediction picture signal 104 with respect to the input video signal 100.
When the video frame corresponding to the input video signal 100 belongs to the prediction group of the second hierarchical layer, the switch 120 is ON. At this time, one or more reference frames temporarily stored in the first and second reference memory sets 118 and 119 are read out therefrom and input to the motion compensation prediction unit 111. The motion compensation prediction unit 111 generates the prediction picture signal 104 and supplies it to the subtracter 110 in the same manner as above. The subtracter 110 generates the predictive error signal 101.
The predictive error signal 101 is subjected to a discrete cosine transformation with the DCT transformer 112. The DCT coefficient from the DCT transformer 112 is quantized with the quantizer 113. The quantized DCT coefficient data 102 is divided in two routes, and encoded by the variable-length encoder 114 in one route. The DCT coefficient data 102 is reproduced as a predictive error signal by the dequantizer 115 and inverse DCT transformer 116 in the other route. This reproduced predictive error signal is added to the prediction picture signal 104 to generate a local decoded picture signal 103.
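The two routes taken by the quantized DCT coefficient data can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses a one-dimensional orthonormal DCT on a short list of samples instead of the two-dimensional block transform of a real codec, a uniform quantizer with a hypothetical step size `step`, and it omits the variable-length encoder. All function names are invented for the sketch.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() (orthonormal 1-D DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n)
            * c[k]
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


def encode_block(source, prediction, step):
    """Return (quantized levels, local decoded samples) for one block."""
    residual = [s - p for s, p in zip(source, prediction)]      # predictive error signal
    levels = [round(c / step) for c in dct(residual)]           # quantized DCT coefficients
    rec_residual = idct([level * step for level in levels])     # dequantize + inverse DCT
    local_decoded = [p + r for p, r in zip(prediction, rec_residual)]
    return levels, local_decoded


source = [10, 12, 14, 16]
prediction = [9, 12, 15, 15]
levels, local_decoded = encode_block(source, prediction, step=1.0)
```

Because the encoder reconstructs the frame from the quantized data rather than from the source, its reference frames match those the decoder will later hold.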
The encoded frame corresponding to the local decoded picture signal 103 is temporarily stored in either of the first and second reference memory sets 118 and 119 according to the prediction group of the hierarchical layer to which the video frame corresponding to the input video signal 100 is assigned (step S13). In other words, when the video frame belongs to the prediction group of the first hierarchical layer, the encoded frame is temporarily stored in the first reference memory set 118. When the video frame belongs to the prediction group of the second hierarchical layer, the encoded frame is temporarily stored in the second reference memory set 119.
The motion compensation prediction unit 111 outputs so-called side information 105, which includes a motion vector used for the motion compensated prediction, an index (first identification information) identifying the prediction group to which the video frame belongs, and an index (second identification information) specifying the reference frame used for the motion compensated prediction interframe encoding. The side information is encoded by the variable-length encoder 114 (step S14). In this case, the index identifying the prediction group is encoded as, for example, a picture type representing a prediction configuration. The index specifying the reference frame is encoded for every macroblock.
The side information is output as variable-length coded data 106 along with the quantized DCT coefficient data, which is the result of the motion compensated prediction interframe encoding (step S15). For example, the side information is encoded as header information of the encoded data 106. Further, if a second reference frame number setting method is adopted, information indicating the maximum number of frames is encoded as header information of the encoded data 106. The second reference frame number setting method is a method of setting the maximum number of reference frames assigned to the prediction group of each hierarchical layer by predefining the total number of reference frames belonging to the prediction groups of the hierarchical layers. The encoded data 106 is sent to a storage medium or a transmission medium (not shown).
The new decoded frames are sequentially written in the reference memory sets 118 and 119 as reference frames. So-called FIFO (First-In First-Out) type control that the stored frames are sequentially deleted from the oldest reference frame is performed in units of a frame. However, when the reference frame is read out, a random access is done to an arbitrary reference frame in each of the reference memory sets 118 and 119.
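The FIFO write control combined with random-access reads can be sketched with Python's standard `collections.deque`. The class name `ReferenceMemorySet`, its methods, and the convention that index 0 denotes the most recently stored frame are assumptions made for this sketch.

```python
from collections import deque


class ReferenceMemorySet:
    """Fixed-capacity store: writes are FIFO, reads may address any stored frame."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self._frames = deque(maxlen=capacity)

    def store(self, frame):
        self._frames.append(frame)

    def read(self, index):
        # Hypothetical convention: index 0 is the newest frame, 1 the one before it.
        return self._frames[-1 - index]

    def __len__(self):
        return len(self._frames)


memory_set = ReferenceMemorySet(capacity=2)
for decoded_frame in ["Ia0", "Pa2", "Pa4"]:
    memory_set.store(decoded_frame)
```

After the third write the oldest frame has been dropped, while either of the two remaining frames can still be read in any order.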
The number of reference frames temporarily stored in the reference memory sets 118 and 119 respectively, in other words, the number of reference memories included in each of the reference memory sets 118 and 119 is determined by either of the following two methods.
In the first reference frame number setting method, the maximum number of reference frames belonging to the prediction group of each hierarchical layer is previously established according to an encoding method or an encoding specification such as a profile and a level. In the video encoding apparatus and the video decoding apparatus, the maximum number of the reference frames determined as described above is assured every prediction group, and encoding and decoding are done. In this case, the necessary number of reference frames can be assured automatically, by making the encoding specification coincide between the video encoding apparatus and the video decoding apparatus.
In the second reference frame number setting method, the total number of reference frames belonging to the prediction group of each hierarchical layer is predefined according to an encoding method or an encoding specification such as a profile and a level, and information on how many reference frames are assigned to the prediction group of each hierarchical layer, that is, information indicating the maximum number of frames is encoded as header information to the encoded data 106.
As thus described, in the second reference frame number setting method, the maximum number of reference frames which are most suitable for the prediction group of each hierarchical layer is dynamically assigned to the prediction group in the encoding side.
By encoding information indicating the assigned maximum number of frames, it is possible to make the maximum number of reference frames belonging to the prediction group of each hierarchical layer coincide between the encoding side and the decoding side. Therefore, a ratio of the maximum number of reference frames belonging to the prediction group of each hierarchical layer with respect to the total number of reference frames is suitably changed according to the change of the image nature of the input video signal 100. As a result, the encoding efficiency is improved.
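The second reference frame number setting method can be sketched as follows. The header layout (a dictionary with a `max_reference_frames` list) and both function names are hypothetical; the sketch also assumes that the per-layer maxima must not exceed the predefined total, which the text implies but does not state as a formula.

```python
def make_header(total_reference_frames, max_per_layer):
    """Encoder side: record how the fixed total is split among the layers."""
    if sum(max_per_layer) > total_reference_frames:
        raise ValueError("per-layer maxima exceed the predefined total")
    return {"max_reference_frames": list(max_per_layer)}


def configure_decoder(header):
    """Decoder side: recover the per-layer maxima (layer numbers start at 1)."""
    return {layer: count
            for layer, count in enumerate(header["max_reference_frames"], start=1)}


# A total of three reference frames, two given to layer 1 and one to layer 2.
header = make_header(3, [2, 1])
decoder_limits = configure_decoder(header)
```

The encoder may later send a different split, for example `[1, 2]`, when the nature of the input video changes, while the total stays fixed.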
In the above explanation, the encoding is performed in units of frames. The encoding may also be performed in units of blocks (macroblocks). In that case, the video block is assigned to one of a plurality of prediction groups including at least first and second prediction groups. The video block is encoded according to a motion compensated prediction encoding mode, using the reference blocks depending on the one of the prediction groups to which the video block is assigned, one of the reference blocks being a decoded block. The first prediction group is obtained by a prediction using the reference blocks belonging to the first prediction group. The second prediction group is obtained by a prediction using the reference blocks belonging to at least one of the second prediction group and the first prediction group.
The video block is encoded by each of an intraframe encoding mode, a forward prediction interframe encoding mode and a bi-directional prediction interframe encoding mode. The first video blocks encoded by the intraframe encoding mode and the forward prediction interframe encoding mode and the reference blocks corresponding to the first video blocks are assigned to the first prediction group. The second video blocks encoded by the bi-directional prediction interframe encoding mode and the reference blocks corresponding to the second video blocks are assigned to at least one of the first and second prediction groups.
(Decoding)
The encoded data 106 output from the video encoding apparatus shown in
On the other hand, side information 202 including a motion vector encoded for every macroblock, an index (first identification information) identifying the prediction group to which each video frame belongs and an index (second identification information) specifying a reference frame is decoded (step S21). The selection of the reference frame and the motion compensation are performed according to the side information, in the same manner as in the encoding, to generate a prediction picture signal 203. In other words, the reference frame is selected according to the first identification information and the second identification information (step S22). The result of the motion compensated prediction interframe encoding is decoded using the selected reference frame (step S23). The prediction picture signal 203 and the predictive error signal from the inverse DCT transformer 216 are added to generate a decoded picture signal 204.
The decoded frame corresponding to the decoded picture signal 204 is temporarily stored in either of the first and second reference memory sets 218 and 219 according to the prediction group to which the encoded frame corresponding to the decoded frame belongs (step S24). The decoded frame is used as the reference frame. These reference memory sets 218 and 219 are controlled in a FIFO manner, as in the video encoding apparatus. The number of reference frames belonging to the prediction group of each hierarchical layer is set according to the first or second reference frame number setting method described for the video encoding apparatus.
In other words, when the maximum number of reference frames belonging to the prediction group of each hierarchical layer is predefined according to the first reference frame number setting method and the encoding specification, the number of reference frames belonging to the prediction group of each hierarchical layer is set to a fixed value for each encoding specification. When the total number of reference frames is predefined according to the second reference frame number setting method and the encoding specification, and the maximum number of reference frames is assigned to the prediction group of each hierarchical layer, only the total number of reference frames is fixed, and the number of reference frames belonging to the prediction group of each hierarchical layer is dynamically controlled based on the information indicating the maximum number of reference frames, which is decoded from the header information of the encoded data.
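Steps S22 and S23 can be sketched in Python as follows. The sketch is deliberately reduced: it assumes a zero motion vector, represents a block as a short list of samples, and uses a hypothetical `side_info` dictionary whose keys `prediction_group` and `reference` stand in for the first and second identification information. None of these names come from the embodiment.

```python
def decode_block(residual, side_info, reference_memory_sets):
    """Select the signalled reference block and add the decoded prediction error."""
    frame_layer = side_info["prediction_group"]        # first identification information
    ref_layer, ref_index = side_info["reference"]      # second identification information
    if ref_layer > frame_layer:
        # A frame never refers to a prediction group of a higher layer.
        raise ValueError("reference belongs to a higher hierarchical layer")
    prediction = reference_memory_sets[ref_layer][ref_index]   # step S22
    return [p + e for p, e in zip(prediction, residual)]       # step S23


# Hypothetical memory contents: one stored block per layer.
memory = {1: [[10, 10, 10, 10]], 2: [[20, 20, 20, 20]]}
decoded = decode_block(
    residual=[1, -1, 0, 2],
    side_info={"prediction_group": 2, "reference": (1, 0)},
    reference_memory_sets=memory,
)
```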
As mentioned above, available reference frames differ according to the prediction group of the hierarchical layer to which the frame to be encoded or the frame to be decoded belongs. Assuming that frame memories 302 to 304 in
The motion compensation prediction unit selects one from among the available reference frames every macroblock or calculates a linear sum of the available reference frames by the linear predictor 301 to predict a reference frame based on the linear sum, whereby a motion compensation is performed to generate a prediction macroblock.
The video encoding apparatus selects the reference frame and the motion vector every macroblock so that the prediction macroblock with a small prediction error and a highest encoding efficiency is selected. The information of the selected reference frame and the information of the motion vector are encoded every macroblock.
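The per-macroblock choice of a reference frame can be sketched as follows. The sum of absolute differences (SAD) is used here as the prediction error measure; this is an assumption of the sketch, since the text does not name a specific measure, and the motion vector search is omitted. The function names are hypothetical.

```python
def sad(block, candidate):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for x, y in zip(block, candidate))


def select_reference(block, candidates):
    """Return the index of the candidate reference block with the smallest SAD."""
    return min(range(len(candidates)), key=lambda i: sad(block, candidates[i]))


macroblock = [10, 12, 14, 16]
candidates = [
    [0, 0, 0, 0],       # poor match
    [10, 12, 15, 16],   # close match
    [9, 9, 9, 9],       # moderate match
]
best_index = select_reference(macroblock, candidates)
```

The selected index corresponds to the reference frame information that is encoded for the macroblock.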
In the video decoding apparatus, the motion compensation unit generates and decodes a prediction macroblock according to the received motion vector and information of the reference frame. When the prediction is performed based on the linear sum, information concerning the linear prediction coefficient is encoded as header information of the encoded data to make the linear predictor coefficient coincide between encoding and decoding.
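Prediction from a linear sum of reference frames can be sketched as a weighted sum of co-located samples. The function name `linear_prediction` and the example coefficients are illustrative assumptions; the text only states that the linear prediction coefficients are carried as header information so that encoder and decoder agree.

```python
def linear_prediction(reference_blocks, coefficients):
    """Weighted sum of co-located samples from several reference blocks."""
    length = len(reference_blocks[0])
    return [
        sum(w * ref[i] for w, ref in zip(coefficients, reference_blocks))
        for i in range(length)
    ]


# Equal weighting of two reference blocks (hypothetical coefficients).
predicted = linear_prediction([[10, 20], [30, 40]], [0.5, 0.5])
```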
FIGS. 6 to 11 show diagrams for explaining an interframe prediction configuration and a reference memory control in the present embodiment.
A picture with a suffix a such as Ia0, Pa2 or Pa4 belongs to the prediction group a, and a picture with a suffix b such as Pb1, Pb3 or Pb5 belongs to the prediction group b. The attributes of these prediction groups are encoded as an extension of a picture type or as an independent index and are used as header information of the video frame. The video frame belonging to the prediction group a can use only an already decoded frame belonging to the prediction group a as a reference frame.
As for the prediction group b of the higher hierarchical layer, a prediction picture is generated using one already decoded frame belonging to either the prediction group a or the prediction group b, or a linear sum of both decoded frames.
The prediction group of each hierarchical layer has a reference memory corresponding to one frame. Thus, the number of reference frames for a video frame of the prediction group a is at most 1. At most two reference frames can be used for a video frame of the prediction group b. For example, the frame Pa2 belonging to the prediction group a uses only the decoded frame Ia0 as the reference frame. The frame Pb3 belonging to the prediction group b uses two frames, i.e., the decoded frame Pa2 belonging to the prediction group a and the decoded frame Pb1 belonging to the prediction group b, as the reference frames.
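The reference relationships in this two-group example can be reproduced with a short Python simulation. The sketch assumes that frames are coded in the order Ia0, Pb1, Pa2, Pb3, Pa4, Pb5 and that each group holds exactly one reference frame; the function name `simulate_references` and the convention of reading the picture type and group from the frame label are invented for illustration.

```python
def simulate_references(coding_order):
    """Map each frame label to the reference frames available when it is coded."""
    newest = {}   # group letter -> most recently decoded frame of that group
    used = {}
    for name in coding_order:
        picture_type, group = name[0], name[1]
        if picture_type == "I":
            used[name] = []   # intra coded: no reference frame
        else:
            # Groups are ordered a < b; a frame reads its own and lower groups only.
            used[name] = [newest[g] for g in sorted(newest) if g <= group]
        newest[group] = name  # one-frame memory per group: overwrite the old entry
    return used


usage = simulate_references(["Ia0", "Pb1", "Pa2", "Pb3", "Pa4", "Pb5"])
```

Under these assumptions the simulation yields Ia0 as the only reference for Pa2, and Pa2 together with Pb1 as the references for Pb3, in agreement with the description.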
In
In the example of
It is possible to decode the frame at a half frame period without breaking down a prediction configuration by decoding only the frame belonging to the prediction group a. A smooth fast-forward playback can be performed by playing back the decoded frame belonging to the prediction group a at a frame rate of 2 times, for example. Also, when the bandwidth of a transmission channel fluctuates along with time in a video streaming, all encoded data are transmitted in normal cases. When the effective bandwidth of the transmission channel decreases, the encoded data belonging to the prediction group b is discarded and only the encoded data belonging to the prediction group a of the lower hierarchical layer is sent. In this case, the decoded frame can be reproduced without failure on the receiving side.
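The reason why the frames of the prediction group b can be discarded without breaking the prediction configuration can be checked with a small Python sketch. The `stream` list, its dictionary layout, and the helper names `keep_up_to` and `is_decodable` are hypothetical; the reference lists follow the two-group example in which group a refers only to group a.

```python
stream = [
    {"name": "Ia0", "group": "a", "refs": []},
    {"name": "Pb1", "group": "b", "refs": ["Ia0"]},
    {"name": "Pa2", "group": "a", "refs": ["Ia0"]},
    {"name": "Pb3", "group": "b", "refs": ["Pa2", "Pb1"]},
    {"name": "Pa4", "group": "a", "refs": ["Pa2"]},
]


def keep_up_to(frames, highest_group):
    """Discard every frame whose group is above `highest_group` (a < b)."""
    return [f for f in frames if f["group"] <= highest_group]


def is_decodable(frames):
    """True if every reference used by a kept frame is itself kept."""
    kept = {f["name"] for f in frames}
    return all(ref in kept for f in frames for ref in f["refs"])


base_layer = keep_up_to(stream, "a")
base_layer_names = [f["name"] for f in base_layer]
```

Keeping only group a halves the number of frames yet leaves a decodable stream, whereas keeping only group b would not, because its frames depend on group a.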
Each of the prediction groups a, b and c of the respective hierarchical layers has one reference frame. The hierarchy increases in the order of a, b and c. In other words, the frame belonging to the prediction group a can use only one frame of the decoded prediction group a as a reference frame. The frame belonging to the prediction group b can use two frames of the decoded prediction groups a and b as reference frames. The frame belonging to the prediction group c can use three frames of the decoded prediction groups a, b and c as reference frames.
In
In the configuration of
In
In the example of
Similarly to FIGS. 6 to 8, FM1, FM2, FM3 and FM4 show physical frame memories, and DEC, REFa1, REFa2 and REFb show logical frame memories. DEC shows a frame memory for temporarily storing a frame during decoding. REFa1 and REFa2 show reference memories corresponding to two frames of the prediction group a. REFb shows a reference memory corresponding to one frame of the prediction group b.
Idx0 and Idx1 in
BWref in
The forward prediction of B picture can be performed by two frames selectable in maximum in the example of
As for the indexes of the reference frames, numbering is added to the reference frames every video frame in a sequence to be time-closer to the reference frame for the forward prediction. In the example of
In the examples of FIGS. 6 to 10, the number of reference memories of the prediction group of each hierarchical layer is fixed. However, the number of reference frames of the prediction group of each hierarchical layer may be dynamically changed under the constant total number of reference frames. In the configuration of, for example,
In the above explanation, the decoding is performed in units of frames. The decoding may also be performed in units of blocks (macroblocks). In that case, the coded data includes encoded video block data, first encoded identification information indicating the first and second prediction groups to which the video block data is assigned, and second encoded identification information indicating the reference block data used in the motion compensated prediction interframe encoding. The first encoded identification information and the second encoded identification information are decoded to generate first decoded identification information and second decoded identification information. The video block data is decoded using the reference block data belonging to the first prediction group and the reference block data belonging to at least one of the first and second prediction groups according to the first decoded identification information and the second decoded identification information.
The above approach makes it possible to dynamically set an optimum prediction configuration suited to the input video image within the limited number of reference frames. It also enables high-efficiency encoding with improved prediction efficiency.
As described above, an interframe prediction configuration is made up as a layered prediction group configuration. An interframe prediction from the reference frame of the prediction group of a higher hierarchical layer is prohibited. In addition, the number of reference frames of the prediction group of each hierarchical layer is dynamically changed under the constant total number of reference frames, resulting in that the encoding efficiency is improved and the fast-forward playback can be realized with a high degree of freedom.
When the number of hierarchical layers is increased, a smoother fast-forward playback can be performed. Also, since the frame frequency increases, the picture quality of the fast-forward playback is improved.
When the multi-hierarchical layer video image described above is played back with a home television, all hierarchical layers can be played back. When the multi-hierarchical layer video image is played back with a cellular phone, the multi-hierarchical layer video image can be played back with being appropriately skipped in order to lighten a burden of a hardware. That is to say, the hierarchical layers can be selected according to the hardware of the receiver side.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Claims
1. A method for decoding encoded data obtained by a motion compensated prediction interframe encoding, the method comprising:
- receiving the coded data including encoded video picture data, first encoded identification information indicating a plurality of picture groups of hierarchical layers to which the video picture data are assigned and second encoded identification information indicating reference picture data used in a motion compensated prediction interframe encoding and a prediction mode of motion compensated prediction interframe encoding;
- decoding the first encoded identification information and the second encoded identification information to generate first decoded identification information and second decoded identification information; and
- decoding the video picture data by a motion compensated prediction decoding scheme, using one of first reference picture data belonging to a picture group of same hierarchical layer as that of the video picture data and second reference picture data belonging to a picture group of hierarchical layer lower than that of the video picture and the prediction mode according to the first decoded identification information and the second decoded identification information. (See page 20, lines 1-8, page 22, line 26 to page 23, line 7)
2. The method according to claim 1, wherein the decoding the video picture data includes decoding the video picture using one of the first reference picture, the second reference picture and a linear sum of the first reference picture and the second reference picture. (See page 20, line 9)
3. The method according to claim 1, further including performing a smooth fast-forward playback by playing back the video picture data belonging to the lower picture group at a frame rate of plural times of an original frame rate. (See page 21, lines 21-23)
4. The method according to claim 1, wherein the decoding the video picture data includes decoding only the video picture data of the picture group of lower hierarchical layer and playing back the video picture data at an original frame rate. (See page 22, lines 15-18)
5. The method according to claim 1, wherein the picture groups of hierarchical layers includes a first picture group, a second picture group and a third picture group in order of decreasing hierarchical layer, and the decoding the video picture data includes decoding all encoded pictures not more than the third picture group to perform a normal playback, decoding the encoded pictures not more than the second picture group to play back pictures ½ of the normal playback, and decoding the encoded pictures not more than the first picture group to play back pictures ¼ of the normal playback. (See page 23, lines 24 to page 24, line3)
6. A video decoding apparatus which decodes encoded data obtained by a motion compensated prediction interframe encoding, the apparatus comprising:
- a receiving unit configured to receive coded data including encoded video picture data, first encoded identification information indicating plural picture groups to which the video picture data are assigned and second encoded identification information indicating reference picture data used in a motion compensated prediction interframe encoding and a prediction mode of motion compensated prediction interframe encoding;
- a first decoder which decodes the first encoded identification information and the second encoded identification information to generate first decoded identification information and second decoded identification information; and
- a second decoder which decodes the video picture data by a motion compensated prediction decoding scheme, using one of first reference picture data belonging to a picture group of same hierarchical layer as that of the video picture data and second reference picture data belonging to a picture group of hierarchical layer lower than that of the video picture data and the prediction mode according to the first decoded identification information and the second decoded identification information.
7. The apparatus according to claim 6, wherein the second decoder is configured to decode the video picture using one of the first reference picture, the second reference picture and a linear sum of the first reference picture and the second reference picture.
8. The apparatus according to claim 6, further including a playback unit configured to perform a smooth fast-forward playback by playing back the video picture data belonging to the lower picture group at a frame rate of plural times of an original frame rate.
9. The apparatus according to claim 6, wherein the second decoder includes a decoder which decodes only the video picture data of the picture group of lower hierarchical layer and playing back the video picture data at an original frame rate.
10. The apparatus according to claim 6, wherein the picture groups of hierarchical layers includes a first picture group, a second picture group and a third picture group in order of decreasing hierarchical layer, and the second decoder is configured to decode all encoded pictures not more than the third picture group to perform a normal playback, decoding the encoded pictures not more than the second picture group to play back pictures ½ of the normal playback, and decoding the encoded pictures not more than the first picture group to play back pictures ¼ of the normal playback.
Type: Application
Filed: Feb 22, 2007
Publication Date: Jun 21, 2007
Inventors: Shinichiro Koto (Machida-shi), Takeshi Chujoh (Tokyo), Yoshihiro Kikuchi (Yokohama-shi)
Application Number: 11/677,948
International Classification: H04N 11/02 (20060101);