Method and apparatus for generating a motion vector
One embodiment provides a method of generating a motion vector associated with a current block in a first picture layer. In this embodiment, motion vector information for a block in a second picture layer is obtained. The second picture layer has lower quality pictures than pictures in the first picture layer, and the block of the second picture layer is temporally associated with the current block in the first picture layer. Motion vector difference information associated with the current block in the first picture layer is also obtained. The motion vector associated with the current block in the first picture layer is generated based on the obtained motion vector information and the obtained motion vector difference information.
This application claims the benefit of priority of U.S. Provisional Application No. 60/723,474, filed Oct. 5, 2005, the entire content of which is hereby incorporated by reference.
FOREIGN PRIORITY INFORMATION
This application claims the benefit of priority of Korean Patent Application No. 10-2006-0068314, filed Jul. 21, 2006, the entire content of which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates, in general, to methods of encoding and decoding video signals.
2. Description of the Related Art
A Scalable Video Codec (SVC) is a scheme that encodes a video signal at the highest image quality, while enabling image quality to be maintained to some degree even if only part of the entire picture (frame) sequence generated as a result of the encoding, such as a sequence of frames intermittently selected from the entire sequence, is decoded.
Even if only a partial sequence of a picture sequence encoded by a scalable scheme is received and processed, image quality can be maintained to some degree. However, as the bit rate decreases, the deterioration in image quality becomes severe. To address this problem, a separate sub-picture sequence for low bit rates can be provided, for example, a picture sequence with a small screen size and/or a picture sequence having a small number of frames per second.
This sub-picture sequence is called a base layer, and a main picture sequence is called an enhanced layer. The base layer and the enhanced layer are obtained by encoding the same video signal source, and redundant information exists in the video signals of the two layers. Therefore, when the base layer is provided, an interlayer prediction method can be used to improve coding efficiency.
Further, in order to improve the Signal-to-Noise Ratio (SNR) of a base layer, that is, to enhance image quality, an enhanced layer may be used, which is called SNR scalability, Fine Granular Scalability (FGS), or progressive refinement.
According to FGS, the transform coefficients corresponding to respective pixels, for example, Discrete Cosine Transform (DCT) coefficients, are separately encoded into a base layer and an enhanced layer, depending on the resolution of the bit representation. When the transmission environment is poor, transmission of the enhanced layer is omitted, so that the bit rate can be decreased at the cost of some deterioration in the quality of the decoded image. That is, FGS compensates for loss occurring during the quantization process, and provides high flexibility, enabling the bit rate to be controlled in response to the transmission or decoding environment.
For example, if a transform coefficient is quantized using a quantization step size (that is, QP), for example, QP=32, to generate a base layer, a first FGS enhanced layer is generated by quantizing the difference between an original transform coefficient and a transform coefficient obtained by inversely quantizing the quantized coefficient of the base layer, using a quantization step size corresponding to quality higher than QP=32, for example, QP=26. Similarly, a second FGS enhanced layer is generated by quantizing the difference between the original transform coefficient and a transform coefficient obtained by inversely quantizing the sum of the quantized coefficients of the base layer and the first FGS enhanced layer, using a quantization step size, for example, QP=20.
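For illustration, the layered refinement described above can be sketched in code. The following Python fragment is a minimal sketch under stated assumptions: qp_to_step() is a rough stand-in for the H.264/SVC convention that the quantization step size doubles every 6 QP, and only a single coefficient is refined, whereas a real codec operates on whole blocks of transform coefficients.

```python
# Minimal sketch of FGS layered quantization for one transform coefficient.
# Assumption: qp_to_step() approximates the rule that the step size doubles
# every 6 QP; a real encoder quantizes blocks of coefficients, not scalars.

def qp_to_step(qp: int) -> float:
    return 0.625 * 2 ** (qp / 6.0)

def quantize(value: float, qp: int) -> int:
    return round(value / qp_to_step(qp))

def dequantize(level: int, qp: int) -> float:
    return level * qp_to_step(qp)

coeff = 137.0                          # original transform coefficient

# Base layer: coarse quantization at QP=32.
base_recon = dequantize(quantize(coeff, 32), 32)

# First FGS enhanced layer: quantize the remaining error at QP=26.
recon1 = base_recon + dequantize(quantize(coeff - base_recon, 26), 26)

# Second FGS enhanced layer: refine the remaining error again at QP=20.
recon2 = recon1 + dequantize(quantize(coeff - recon1, 20), 20)

print(base_recon, recon1, recon2)      # reconstructions approach the original
```

Each layer encodes only the quantized difference between the original coefficient and the reconstruction available so far, which is why a decoder that discards the upper layers still obtains a usable, if coarser, picture.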
However, in a conventional FGS coding method, only a quality base layer, that is, a picture of an FGS base layer, is used to generate an FGS enhanced layer. This means that temporally redundant information existing between temporally adjacent quality enhanced layers, that is, pictures of an FGS enhanced layer, is not used.
In order to use such temporal redundancy in the FGS enhanced layer, a method of utilizing an adjacent quality enhanced layer as well as a quality base layer to predict a current FGS enhanced layer has been proposed. This method is called Progressive FGS (PFGS), and the structure of such a PFGS scheme is illustrated in the accompanying drawings.
As shown in the drawings, the collocated block Xb includes a reference picture index that indicates a reference base layer frame, and also includes a motion vector.
The FGS enhanced layer reference block Re is a difference or error signal representing enhancement quality. As such, the adaptive reference block formation function adds the FGS enhanced layer reference block Re to the collocated block Xb at a transform coefficient level to obtain the adapted reference block Ra, which is then used as the reference for predicting the FGS enhanced layer block X.
However, the resolution of bit representation of an image may vary due to the difference between the quantization step sizes of the FGS base layer and the FGS enhanced layer, so that the motion vector of the FGS base layer collocated block Xb may not be identical to that of the FGS enhanced layer block X. This means that coding efficiency may be decreased.
SUMMARY OF THE INVENTION
The present invention relates to a method of generating a motion vector.
For example, one embodiment provides a method of generating a motion vector associated with a current block in a first picture layer. In this embodiment, motion vector information for a block in a second picture layer is obtained. The second picture layer has lower quality pictures than pictures in the first picture layer, and the block of the second picture layer is temporally associated with the current block in the first picture layer. Motion vector difference information associated with the current block in the first picture layer is also obtained. The motion vector associated with the current block in the first picture layer is generated based on the obtained motion vector information and the obtained motion vector difference information.
The motion vector information may be obtained from the second picture layer.
The motion vector difference information may be obtained from the first picture layer.
In one embodiment, the motion vector associated with the current block is generated by determining a motion vector prediction based on the obtained motion vector information, and generating the motion vector associated with the current block in the first picture layer based on the motion vector prediction and the obtained motion vector difference information.
The present invention also relates to an apparatus for generating a motion vector.
For example, in one embodiment, the apparatus generates a motion vector associated with a current block in a first picture layer. In this embodiment, the apparatus includes a first decoder obtaining motion vector information for a block in a second picture layer. The second picture layer has lower quality pictures than pictures in the first picture layer, and the block of the second picture layer is temporally associated with the current block in the first picture layer. The apparatus also includes a second decoder obtaining motion vector difference information associated with the current block in the first picture layer, and generating the motion vector associated with the current block in the first picture layer based on the obtained motion vector information and the obtained motion vector difference information.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
Hereinafter, example embodiments of the present invention will be described in detail with reference to the attached drawings.
In an embodiment of the present invention, during the encoding process, the motion vector mv(Xb) of a Fine Granular Scalability (FGS) base layer collocated block Xb is finely adjusted to improve the coding efficiency of Progressive FGS (PFGS).
That is, the embodiment selects, as the FGS enhanced layer reference frame for the FGS enhanced layer block X to be encoded, the FGS enhanced layer frame that temporally coincides with the base layer reference frame of the base layer block Xb collocated with the block X. As will be appreciated, this base layer reference frame is indicated by the reference picture index of the collocated block Xb; however, those skilled in the art commonly refer to the reference frame as being pointed to by the motion vector. Given the enhanced layer reference frame, a region (e.g., a partial region) of a picture is reconstructed from the FGS enhanced layer reference frame. This region includes the block indicated by the motion vector mv(Xb) of the base layer collocated block Xb. The region is searched for the block having the smallest image difference with respect to the block X, that is, a block Re′ that minimizes the Sum of Absolute Differences (SAD), where the SAD is the sum of absolute differences between corresponding pixels of the block X to be coded or decoded and the candidate block. Then, a motion vector mv(X) from the block X to the selected block is calculated.
In this case, in order to reduce the burden of the search, the search range can be limited to a region extending a predetermined number of pixels in the horizontal and vertical directions around the block indicated by the motion vector mv(Xb). For example, the search can be performed only within the region extended by 1 pixel in every direction.
Further, the search resolution, that is, the unit by which the block X is moved to find a block having a minimum SAD, may be a pixel, a ½ pixel (half pel), or a ¼ pixel (quarter pel).
In particular, when the search is performed only within the region extended by 1 pixel in every direction, and is performed on a pixel basis, the location at which the SAD is minimized is selected from among 9 candidate locations (the location indicated by mv(Xb) and the eight surrounding one-pixel offsets).
If the search range is limited in this way, the difference vector mvd_ref_fgs between the calculated motion vector mv(X) and the motion vector mv(Xb) is small, and can be recorded in the FGS enhanced layer in association with the block X using only a few bits.
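A compact way to see this constrained search is the sketch below. It is illustrative only and makes several assumptions not in the text: the helper names sad() and refine_mv() are hypothetical, the search is integer-pel (the text also allows half- and quarter-pel resolution), and the reference picture is a plain NumPy array rather than a reconstructed FGS enhanced layer picture.

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences between corresponding pixels."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def refine_mv(cur_block, ref_pic, cur_pos, mv_xb, block_size=16):
    """Search the 9 integer-pel candidates within +/-1 pixel of mv(Xb);
    return the refined mv(X) and the difference vector mvd_ref_fgs."""
    cx, cy = cur_pos
    h, w = ref_pic.shape
    best_cost, best_mv = None, mv_xb
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            mv = (mv_xb[0] + dx, mv_xb[1] + dy)
            x, y = cx + mv[0], cy + mv[1]
            if not (0 <= x <= w - block_size and 0 <= y <= h - block_size):
                continue                     # candidate leaves the picture
            cost = sad(cur_block, ref_pic[y:y + block_size, x:x + block_size])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, mv
    return best_mv, (best_mv[0] - mv_xb[0], best_mv[1] - mv_xb[1])

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
cur = ref[18:34, 21:37]                      # best match at mv = (5, 2)
mv_x, mvd_ref_fgs = refine_mv(cur, ref, cur_pos=(16, 16), mv_xb=(4, 2))
print(mv_x, mvd_ref_fgs)                     # (5, 2) (1, 0)
```

Because the refined vector can differ from mv(Xb) by at most one pixel in each direction, mvd_ref_fgs needs very few bits.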
In another embodiment of the present invention, in order to obtain an optimal motion vector mv_fgs of the FGS enhanced layer for the block X, that is, in order to generate the optimal predicted image of the FGS enhanced layer for the block X, motion estimation/prediction operations are performed independently of the motion vector mv(Xb) of the FGS base layer collocated block Xb corresponding to the block X.
In this case, the FGS enhanced layer predicted image (FGS enhanced layer reference block) for the block X can be searched for in the reference frame indicated by the motion vector mv(Xb) (i.e., indicated by the reference picture index for the block Xb), or the reference block for the block X can be searched for in another frame.
In the former case, the frames in which the FGS enhanced layer reference block for the block X is to be searched for are limited to the reference frame indicated by the motion vector mv(Xb), so the burden of encoding is reduced, and there is no need to transmit a reference index identifying the frame that includes the reference block.
In the latter case, the number of frames in which the reference block is to be searched for increases, so the burden of encoding increases, and a reference index for the frame including the found reference block must additionally be transmitted. However, the optimal predicted image of the FGS enhanced layer for the block X can be generated.
When a motion vector is encoded without change, a great number of bits is required. Since the motion vectors of neighboring blocks tend to be highly correlated, each motion vector can be predicted from the motion vectors of surrounding blocks that have been previously encoded (the immediate left, immediate upper and immediate upper-right blocks).
When a current motion vector mv is encoded, generally, the difference mvd between the current motion vector mv and a motion vector mvp, which is predicted from the motion vectors of surrounding blocks, is encoded and transmitted.
Therefore, the motion vector mv_fgs of the FGS enhanced layer for the block X, obtained through an independent motion prediction operation, is encoded as mvd_fgs = mv_fgs − mvp_fgs. In this case, the motion vector mvp_fgs, predicted and obtained from the surrounding blocks, can be implemented using, without change, the motion vector mvp obtained when the motion vector mv(Xb) of the FGS base layer collocated block Xb is encoded (e.g., mvp = mv(Xb)), or using a motion vector derived from the motion vector mvp (e.g., a scaled version of mv(Xb)).
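The sketch below illustrates this differential encoding. The component-wise median rule is the conventional H.264-style predictor implied by the surrounding-blocks description; the specific vectors and the choice mvp_fgs = mvp are hypothetical values for illustration.

```python
def median_predictor(mv_left, mv_up, mv_upright):
    """Component-wise median of three neighbouring motion vectors, the
    conventional prediction from previously encoded surrounding blocks."""
    xs = sorted(v[0] for v in (mv_left, mv_up, mv_upright))
    ys = sorted(v[1] for v in (mv_left, mv_up, mv_upright))
    return (xs[1], ys[1])

def encode_mvd(mv, mvp):
    """Only the difference mvd = mv - mvp is encoded and transmitted."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

# Conventional prediction from spatial neighbours.
mvp = median_predictor((4, 0), (3, -1), (5, 0))          # -> (4, 0)

# This embodiment: reuse the predictor mvp obtained when mv(Xb) was
# encoded (or a vector derived from it) as mvp_fgs.
mvp_fgs = mvp
mvd_fgs = encode_mvd((5, -1), mvp_fgs)                   # -> (1, -1)
print(mvp, mvd_fgs)
```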
If the number of motion vectors of the FGS base layer collocated block Xb corresponding to the block X is two, that is, if the block Xb is predicted using two reference frames, two pieces of data related to the encoding of the motion vector of the FGS enhanced layer for the block X are obtained, one for each reference picture list. For example, in the first embodiment, the pieces of data are mvd_ref_fgs_l0 and mvd_ref_fgs_l1, and in the second embodiment, the pieces of data are mvd_fgs_l0 and mvd_fgs_l1.
In the above embodiments, the motion vectors for macroblocks (or image blocks smaller than macroblocks) are calculated in relation to the FGS enhanced layer, and the calculated motion vectors are included in a macroblock layer within the FGS enhanced layer and transmitted to a decoder. However, in the conventional FGS enhanced layer, related information is defined at the slice level, and is not defined at the macroblock, sub-macroblock, or sub-block level.
Therefore, in the present invention, in order to define, in the FGS enhanced layer, data related to the motion vectors calculated on the basis of a macroblock (or an image block smaller than a macroblock), syntax for a macroblock layer and/or an image block layer smaller than a macroblock layer is newly defined, for example, progressive_refinement_macroblock_layer_in_scalable_extension() and progressive_refinement_mb_pred_in_scalable_extension() (and/or a corresponding sub_mb variant), and the calculated motion vectors are recorded in the newly defined syntax and then transmitted.
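As a rough illustration of what such per-macroblock syntax might carry, the fragment below serializes a motion vector difference (and an optional reference index) with signed Exp-Golomb codes. The field layout, ordering, and entropy code are assumptions made purely for illustration; only the element names echo the syntax mentioned above, and nothing here reproduces the actual SVC bitstream format.

```python
# Hypothetical per-macroblock payload for the newly defined FGS enhanced
# layer syntax. Layout and entropy coding are illustrative assumptions.

def se_golomb(v: int) -> str:
    """Signed Exp-Golomb code (the code conventionally used for mvd)."""
    code = 2 * abs(v) - (1 if v > 0 else 0) + 1   # mapped codeNum + 1
    n = code.bit_length() - 1
    return "0" * n + format(code, "b")

def write_mb_motion(mvd, ref_idx=None):
    """Bits for one macroblock: mvd components, plus a reference index
    only when the reference block lies outside mv(Xb)'s reference frame."""
    bits = se_golomb(mvd[0]) + se_golomb(mvd[1])
    if ref_idx is not None:
        bits += se_golomb(ref_idx)
    return bits

print(write_mb_motion((1, -1)))          # '010011'
print(write_mb_motion((0, 2), ref_idx=1))
```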
Meanwhile, the generation of the FGS enhanced layer is similar to a procedure of performing prediction between a base layer and an enhanced layer having different spatial resolutions in an intra base prediction mode, and generating residual data which is an image difference.
For example, if it is assumed that the block of the enhanced layer is X and the block of the base layer corresponding to the block X is Xb, the residual block obtained through intra base prediction is R=X−Xb. In this case, X can correspond to the block of a quality enhanced layer to be encoded, Xb can correspond to the block of a quality base layer, and R=X−Xb can correspond to residual data to be encoded in the FGS enhanced layer for the block X.
In another embodiment of the present invention, an intra mode prediction method is applied to the residual block R to reduce the amount of residual data to be encoded in the FGS enhanced layer. In order to perform intra mode prediction on the residual block R, the same intra mode information that is used in the base layer collocated block Xb corresponding to the block X is used.
A block Rd having a difference value of the residual data is obtained by applying the mode information, used in the block Xb, to the residual block R. Discrete Cosine Transform (DCT) is performed on the obtained block Rd, and the DCT results are quantized using a quantization step size set smaller than the quantization step size used when the FGS base layer data for the block Xb is generated, thus generating FGS enhanced layer data for the block X.
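A condensed sketch of this pipeline follows. Assumptions: the intra mode applied to the residual is the DC (mean) mode noted below, SciPy's dctn stands in for the codec's transform, qp_step_enh is simply a step size smaller than the base layer's, and encode_fgs_block() is a hypothetical name.

```python
import numpy as np
from scipy.fft import dctn

def dc_predict(block: np.ndarray) -> np.ndarray:
    """DC-mode prediction: a flat block at the mean pixel value."""
    return np.full_like(block, block.mean())

def encode_fgs_block(x, xb, qp_step_enh):
    """x: enhanced layer block; xb: collocated base layer block;
    qp_step_enh: step size smaller than the base layer's."""
    r = x - xb                        # intra-base-prediction residual R
    rd = r - dc_predict(r)            # intra mode applied to residual -> Rd
    coeffs = dctn(rd, norm="ortho")   # 2-D DCT of the difference block
    return np.round(coeffs / qp_step_enh).astype(int)

rng = np.random.default_rng(0)
x = rng.normal(128.0, 10.0, (4, 4))   # toy 4x4 blocks
xb = x + rng.normal(0.0, 2.0, (4, 4))
print(encode_fgs_block(x, xb, qp_step_enh=6.3))
```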
In a further embodiment, an adapted reference block Ra′ for the block X is generated as equal to the FGS enhanced layer reference block Re′. Further, the residual data R to be encoded in the FGS enhanced layer for the block X is set as R = X − Ra′, so that the intra mode prediction method is applied to the residual block R. It will be appreciated that in this embodiment, the enhanced layer reference block Re′, and therefore the adapted reference block Ra′, are reconstructed pictures and are not at the transform coefficient level.
In this case, the intra mode applied to the residual block R is a DC mode based on the mean value of the respective pixels in the block R. Further, if the block Re′ is generated by the methods according to embodiments of the present invention, motion-related information required to generate the block Re′ in the decoder must be included in the FGS enhanced layer data for the block X.
As shown in the drawings, the video signal encoding apparatus includes a base layer (BL) encoder 110 and an FGS enhanced layer (FGS_EL) encoder 120.
The FGS_EL encoder 120 reconstructs the quality base layer of the reference frame (also called an FGS base layer picture), which is the reference for motion prediction for a current frame, from the base layer data provided by the BL encoder 110, and reconstructs the FGS enhanced layer picture of the reference frame using the FGS enhanced layer data of the reference frame and the reconstructed quality base layer of the reference frame.
In this case, the reference frame may be a frame indicated by the motion vector mv(Xb) of the FGS base layer collocated block Xb corresponding to the block X in the current frame.
When the reference frame is a frame previous to the current frame, the FGS enhanced layer picture of the reference frame may have been stored in a buffer in advance.
Thereafter, the FGS_EL encoder 120 searches the FGS enhanced layer picture of the reconstructed reference frame for an FGS enhanced layer reference image for the block X, that is, a reference block or predicted block Re′ in which an SAD with respect to the block X is minimized, and then calculates a motion vector mv(X) from the block X to the found reference block Re′.
The FGS_EL encoder 120 performs DCT on the difference between the block X and the found reference block Re′, and quantizes the DCT results using a quantization step size set smaller than a predetermined quantization step size (the quantization step size used when the BL encoder 110 generates the FGS base layer data for the block Xb), thus generating FGS enhanced layer data for the block X.
When searching for the reference block, the FGS_EL encoder 120 may limit the search range to a region extending a predetermined number of pixels in the horizontal and vertical directions around the block indicated by the motion vector mv(Xb), so as to reduce the burden of the search, as in the first embodiment of the present invention. In this case, the FGS_EL encoder 120 records the difference mvd_ref_fgs between the calculated motion vector mv(X) and the motion vector mv(Xb) in the FGS enhanced layer in association with the block X.
Further, as in the above-described second embodiment of the present invention, the FGS_EL encoder 120 may perform a motion estimation operation independently of the motion vector mv(Xb) so as to obtain the optimal motion vector mv_fgs of the FGS enhanced layer for the block X, searching for a reference block Re′ having a minimum SAD with respect to the block X and calculating the motion vector mv_fgs from the block X to the found reference block Re′.
In this case, the FGS enhanced layer reference block for the block X may be searched for in the reference frame indicated by the motion vector mv(Xb), or a reference block for the block X may be searched for in a frame other than the reference frame.
The FGS_EL encoder 120 performs DCT on the difference between the block X and the found reference block Re′, and quantizes the DCT results using a quantization step size set smaller than the predetermined quantization step size, thus generating the FGS enhanced layer data for the block X.
Further, the FGS_EL encoder 120 records the difference mvd_fgs between the calculated motion vector mv_fgs and the motion vector mvp_fgs, predicted and obtained from surrounding blocks, in the FGS enhanced layer in association with the block X. That is, the FGS_EL encoder 120 records, in the FGS enhanced layer, syntax defining information related to the motion vector calculated on a block basis (a macroblock or an image block smaller than a macroblock).
When the reference block Re′ for the block X is searched for in a frame other than the reference frame indicated by the motion vector mv(Xb), information related to the motion vector may further include a reference index for a frame including the found reference block Re′.
The encoded data stream is transmitted to a decoding apparatus in a wired or wireless manner, or is transferred through a recording medium.
As shown in the drawings, the video signal decoding apparatus includes a base layer (BL) decoder 220 and an FGS enhanced layer (FGS_EL) decoder 230. The FGS_EL decoder 230 checks information about the block X in the current frame, that is, information related to a motion vector used for motion prediction for the block X, in the FGS enhanced layer stream.
When i) the FGS enhanced layer for the block X in the current frame is encoded on the basis of the FGS enhanced layer picture of another frame, and ii) a block other than the block indicated by the motion vector mv(Xb) of the block Xb (that is, the FGS base layer block corresponding to the block X in the current frame) is used as the predicted block or reference block, motion information indicating that other block is included in the FGS enhanced layer data of the current frame.
That is, in the above description, the FGS enhanced layer includes syntax for defining information related to the motion vector calculated on a block basis (a macroblock or an image block smaller than a macroblock). The information related to the motion vector may further include an index for the reference frame in which the FGS enhanced layer reference block for the block X is found (the reference frame including the reference block).
When motion information related to the block X in the current frame exists in the FGS enhanced layer of the current frame, the FGS_EL decoder 230 generates the FGS enhanced layer picture of the reference frame using the quality base layer of the reference frame (the FGS base layer picture reconstructed by the BL decoder 220 may be provided, or may be reconstructed from the FGS base layer data provided by the BL decoder 220), which is the reference for motion prediction for the current frame, and the FGS enhanced layer data of the reference frame. In this case, the reference frame may be a frame indicated by the motion vector mv(Xb) of the block Xb.
Further, the FGS enhanced layer of the reference frame may be encoded using an FGS enhanced layer picture of a different frame. In this case, a picture reconstructed from the different frame is used to reconstruct the reference frame. Further, when the reference frame is a frame previous to the current frame, the FGS enhanced layer picture may have been generated in advance and stored in a buffer.
Further, the FGS_EL decoder 230 obtains the FGS enhanced layer reference block Re′ for the block X from the FGS enhanced layer picture of the reference frame, using the motion information related to the block X.
In the above-described first embodiment of the present invention, the motion vector mv(X) from the block X to the reference block Re′ is obtained as the sum of the motion information mvd_ref_fgs, included in an FGS enhanced layer stream for the block X, and the motion vector mv(Xb) of the block Xb.
Further, in the second embodiment of the present invention, the motion vector mv(X) is obtained as the sum of the motion information mvd_fgs, included in the FGS enhanced layer stream for the block X, and the motion vector mvp_fgs, predicted and obtained from the surrounding blocks. In this case, the motion vector mvp_fgs may be implemented using, without change, the motion vector mvp obtained at the time of encoding the motion vector mv(Xb) of the FGS base layer collocated block Xb, or using a motion vector derived from the motion vector mvp.
Thereafter, the FGS_EL decoder 230 performs inverse-quantization and inverse DCT on the FGS enhanced layer data for the block X, and adds the results of inverse quantization and inverse DCT to the obtained reference block Re′, thus generating the FGS enhanced layer picture for the block X.
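The decoder side of these two embodiments can be summarized with the short sketch below. It assumes the same toy transform and step size as the encoder sketches above; recover_mv() and reconstruct_block() are hypothetical names, and idctn from SciPy stands in for the codec's inverse transform.

```python
import numpy as np
from scipy.fft import idctn

def recover_mv(mvd, predictor):
    """First embodiment: predictor = mv(Xb), mvd = mvd_ref_fgs.
    Second embodiment: predictor = mvp_fgs, mvd = mvd_fgs."""
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])

def reconstruct_block(levels, qp_step_enh, ref_block):
    """Dequantize, inverse-transform, and add the reference block Re'."""
    residual = idctn(levels * qp_step_enh, norm="ortho")
    return ref_block + residual

mv_x = recover_mv(mvd=(1, 0), predictor=(4, 2))          # -> (5, 2)
levels = np.zeros((4, 4), dtype=int)                     # toy coefficients
print(mv_x, reconstruct_block(levels, 6.3, np.full((4, 4), 128.0))[0, 0])
```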
The above-described decoding apparatus may be mounted in a mobile communication terminal, or a device for reproducing recording media.
As described above, the present invention is advantageous in that it can efficiently perform motion estimation/prediction operations on an FGS enhanced layer picture when the FGS enhanced layer is encoded or decoded, and can efficiently transmit motion information required to reconstruct an FGS enhanced layer picture.
Although the example embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention.
Claims
1. A method of generating a motion vector associated with a current block in a first picture layer, comprising:
- obtaining motion vector information for a block in a second picture layer, the second picture layer having lower quality pictures than pictures in the first picture layer, and the block of the second picture layer being temporally associated with the current block in the first picture layer;
- obtaining motion vector difference information associated with the current block in the first picture layer; and
- generating the motion vector associated with the current block in the first picture layer based on the obtained motion vector information and the obtained motion vector difference information.
2. The method of claim 1, wherein the obtaining motion vector information step obtains the motion vector information from the second picture layer.
3. The method of claim 2, wherein the obtaining the motion vector difference information step obtains the motion vector difference information from the first picture layer.
4. The method of claim 1, wherein the obtaining the motion vector difference information step obtains the motion vector difference information from the first picture layer.
5. The method of claim 1, wherein the motion vector information includes a motion vector associated with the block of the second picture layer.
6. The method of claim 1, wherein the generating step comprises:
- determining a motion vector prediction based on the obtained motion vector information; and
- generating the motion vector associated with the current block in the first picture layer based on the motion vector prediction and the obtained motion vector difference information.
7. The method of claim 6, wherein
- the motion vector information includes a motion vector associated with the block of the second picture layer; and
- the determining a motion vector prediction step determines the motion vector prediction equal to the motion vector associated with the block of the second picture layer.
8. The method of claim 6, wherein the generating step generates the motion vector associated with the current block equal to the motion vector prediction plus a motion vector difference indicated by the motion vector difference information.
9. The method of claim 8, wherein
- the motion vector information includes a motion vector associated with the block of the second picture layer; and
- the determining a motion vector prediction step determines the motion vector prediction equal to the motion vector associated with the block of the second picture layer.
10. The method of claim 1, wherein the motion vector difference information indicates a motion vector difference of a one-quarter pixel or less.
11. The method of claim 1, wherein the motion vector difference information indicates a motion vector difference of a one-half pixel or less.
12. The method of claim 1, wherein the generated motion vector points to a block in a reference picture for the current block.
13. The method of claim 12, wherein the reference picture is a picture in the first picture layer.
14. The method of claim 13, wherein the reference picture for the current block is temporally associated with a reference picture in the second picture layer, the reference picture in the second picture layer being a reference picture for the block in the second picture layer.
15. An apparatus for generating a motion vector associated with a current block in a first picture layer, comprising:
- a first decoder obtaining motion vector information for a block in a second picture layer, the second picture layer having lower quality pictures than pictures in the first picture layer, and the block of the second picture layer being temporally associated with the current block in the first picture layer; and
- a second decoder obtaining motion vector difference information associated with the current block in the first picture layer, and generating the motion vector associated with the current block in the first picture layer based on the obtained motion vector information and the obtained motion vector difference information.
Type: Application
Filed: Oct 5, 2006
Publication Date: Apr 19, 2007
Inventors: Byeong-Moon Jeon (Seoul), Ji-Ho Park (Seoul), Seung-Wook Park (Seoul)
Application Number: 11/543,080
International Classification: H04B 1/66 (20060101); H04N 11/02 (20060101);