METHOD AND APPARATUS OF CURRENT PICTURE REFERENCING FOR VIDEO CODING USING AFFINE MOTION COMPENSATION

A method and apparatus for a video coding system with the current picture referencing (CPR) mode enabled are disclosed. According to one method, if one reference picture index for the current block points to the current image, the affine motion compensation is inferred as Off for the current block without a need for signalling or parsing an affine mode syntax or the adaptive MV resolution is inferred as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax. In another method, if the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block, a reference picture index for the current block is signalled or parsed and the reference picture index always points to one reference picture other than the current image.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

The present invention claims priority to U.S. Provisional Patent Application, Ser. No. 62/342,883 filed on May 28, 2016. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to block partition for coding and/or prediction process in video coding. In particular, the present invention discloses various coding arrangements for a coding system using current picture referencing (CPR).

BACKGROUND AND RELATED ART

The High Efficiency Video Coding (HEVC) standard is developed under the joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, with the partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). In HEVC, one slice is partitioned into multiple coding tree units (CTU). In the main profile, the minimum and the maximum sizes of the CTU are specified by syntax elements in the sequence parameter set (SPS). The allowed CTU size can be 8×8, 16×16, 32×32, or 64×64. For each slice, the CTUs within the slice are processed according to a raster scan order.

The CTU is further partitioned into multiple coding units (CU) to adapt to various local characteristics. A quadtree, denoted as the coding tree, is used to partition the CTU into multiple CUs. Let CTU size be M×M, where M is one of the values of 64, 32, or 16. The CTU can be a single CU (i.e., no splitting) or can be split into four smaller units of equal sizes (i.e., M/2×M/2 each), which correspond to the nodes of the coding tree. If units are leaf nodes of the coding tree, the units become CUs. Otherwise, the quadtree splitting process can be iterated until the size for a node reaches a minimum allowed CU size as specified in the SPS (Sequence Parameter Set). This representation results in a recursive structure as specified by a coding tree (also referred to as a partition tree structure) 120 in FIG. 1. The CTU partition 110 is shown in FIG. 1, where the solid lines indicate CU boundaries. The decision whether to code a picture area using Inter-picture (temporal) or Intra-picture (spatial) prediction is made at the CU level. Since the minimum CU size can be 8×8, the minimum granularity for switching between different basic prediction types is 8×8.

Furthermore, according to HEVC, each CU can be partitioned into one or more prediction units (PU). Coupled with the CU, the PU works as a basic representative block for sharing the prediction information. Inside each PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. A CU can be split into one, two or four PUs according to the PU splitting type. HEVC defines eight shapes for splitting a CU into PUs as shown in FIG. 2, including 2N×2N, 2N×N, N×2N, N×N, 2N×nU, 2N×nD, nL×2N and nR×2N partition types. Unlike the CU, the PU may only be split once according to HEVC. The partitions shown in the second row correspond to asymmetric partitions, where the two partitioned parts have different sizes.

After obtaining the residual block by the prediction process based on the PU splitting type, the prediction residues of a CU can be partitioned into transform units (TU) according to another quadtree structure which is analogous to the coding tree for the CU as shown in FIG. 1. The solid lines indicate CU boundaries and dotted lines indicate TU boundaries. The TU is a basic representative block having residual or transform coefficients for applying the integer transform and quantization. For each TU, one integer transform having the same size as the TU is applied to obtain residual coefficients. These coefficients are transmitted to the decoder after quantization on a TU basis.

The terms coding tree block (CTB), coding block (CB), prediction block (PB), and transform block (TB) are defined to specify the 2-D sample array of one colour component associated with CTU, CU, PU, and TU, respectively. Thus, a CTU consists of one luma CTB, two chroma CTBs, and associated syntax elements. A similar relationship is valid for CU, PU, and TU. The tree partitioning is generally applied simultaneously to both luma and chroma, although exceptions apply when certain minimum sizes are reached for chroma.

Alternatively, a binary tree block partitioning structure is proposed in JCTVC-P1005 (D. Flynn, et al, “HEVC Range Extensions Draft 6”, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 16th Meeting: San Jose, US, 9-17 Jan. 2014, Document: JCTVC-P1005). In the proposed binary tree partitioning structure, a block can be recursively split into two smaller blocks using various binary splitting types as shown in FIG. 3. The most efficient and simplest ones are the symmetric horizontal and vertical splits as shown in the top two splitting types in FIG. 3. For a given block of size M×N, a flag is signalled to indicate whether the given block is split into two smaller blocks. If yes, another syntax element is signalled to indicate which splitting type is used. If the horizontal splitting is used, the given block is split into two blocks of size M×N/2. If the vertical splitting is used, the given block is split into two blocks of size M/2×N. The binary tree splitting process can be iterated until the size (width or height) for a splitting block reaches a minimum allowed block size (width or height). The minimum allowed block size can be defined in high level syntax such as SPS. Since the binary tree has two splitting types (i.e., horizontal and vertical), the minimum allowed block width and height should both be indicated. Non-horizontal splitting is implicitly implied when splitting would result in a block height smaller than the indicated minimum. Non-vertical splitting is implicitly implied when splitting would result in a block width smaller than the indicated minimum. FIG. 4 illustrates an example of block partitioning 410 and its corresponding binary tree 420. In each splitting node (i.e., non-leaf node) of the binary tree, one flag is used to indicate which splitting type (horizontal or vertical) is used, where 0 may indicate horizontal splitting and 1 may indicate vertical splitting.
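For illustration only, the binary-tree split signalling described above can be sketched as follows. This is a minimal sketch with hypothetical function and parameter names (it is not part of any disclosed syntax); note in particular that the split-type flag is only present when both split directions remain possible, since otherwise the direction is implicitly implied by the minimum-size constraints.

```python
def split_syntax(width, height, min_w, min_h, split_type):
    """Syntax elements emitted for one binary-tree node (illustrative sketch).

    split_type is None (no split), 'horizontal' (two M x N/2 blocks) or
    'vertical' (two M/2 x N blocks).  A split is disallowed when it would
    produce a block smaller than the minimum signalled in high-level syntax.
    """
    can_hor = height // 2 >= min_h   # non-horizontal implied when too short
    can_ver = width // 2 >= min_w    # non-vertical implied when too narrow
    if not (can_hor or can_ver):
        return []                    # no split possible, nothing signalled
    syntax = [("split_flag", 0 if split_type is None else 1)]
    # The type flag (0 = horizontal, 1 = vertical) is only signalled when
    # both directions are allowed; otherwise the direction is implicit.
    if split_type is not None and can_hor and can_ver:
        syntax.append(("split_type", 0 if split_type == "horizontal" else 1))
    return syntax
```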

The binary tree structure can be used for partitioning an image area into multiple smaller blocks such as partitioning a slice into CTUs, a CTU into CUs, a CU into PUs, or a CU into TUs, and so on. The binary tree can be used for partitioning a CTU into CUs, where the root node of the binary tree is a CTU and each leaf node of the binary tree is a CU. The leaf nodes can be further processed by prediction and transform coding. For simplification, there is no further partitioning from CU to PU or from CU to TU, which means the CU is equal to the PU and the PU is equal to the TU. In other words, the leaf node of the binary tree is the basic unit for prediction and transform coding.

The binary tree structure is more flexible than the quadtree structure since more partition shapes can be supported, which is also the source of coding efficiency improvement. However, the encoding complexity also increases due to the need to select the best partition shape. In order to balance complexity and coding efficiency, a method to combine the quadtree and binary tree structures, also called the quadtree plus binary tree (QTBT) structure, has been disclosed. According to the QTBT structure, a block is firstly partitioned by a quadtree structure and the quadtree splitting can be iterated until the size for a splitting block reaches the minimum allowed quadtree leaf node size. If the leaf quadtree block is not larger than the maximum allowed binary tree root node size, it can be further partitioned by a binary tree structure and the binary tree splitting can be iterated until the size (width or height) for a splitting block reaches the minimum allowed binary tree leaf node size (width or height) or the binary tree depth reaches the maximum allowed binary tree depth. In the QTBT structure, the minimum allowed quadtree leaf node size, the maximum allowed binary tree root node size, the minimum allowed binary tree leaf node width and height, and the maximum allowed binary tree depth can be indicated in the high level syntax such as in SPS. FIG. 5 illustrates an example of block partitioning 510 and its corresponding QTBT 520. The solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting. In each splitting node (i.e., non-leaf node) of the binary tree, one flag indicates which splitting type (horizontal or vertical) is used, where 0 may indicate horizontal splitting and 1 may indicate vertical splitting.

The above QTBT structure can be used for partitioning an image area (e.g. a slice, CTU or CU) into multiple smaller blocks such as partitioning a slice into CTUs, a CTU into CUs, a CU into PUs, or a CU into TUs, and so on. For example, the QTBT can be used for partitioning a CTU into CUs, where the root node of the QTBT is a CTU which is partitioned into multiple CUs by a QTBT structure and the CUs are further processed by prediction and transform coding. For simplification, there is no further partitioning from CU to PU or from CU to TU. That means the CU is equal to the PU and the PU is equal to the TU. In other words, the leaf node of the QTBT structure is the basic unit for prediction and transform.

An example of QTBT structure is shown as follows. For a CTU with size 128×128, the minimum allowed quadtree leaf node size is set to 16×16, the maximum allowed binary tree root node size is set to 64×64, the minimum allowed binary tree leaf node width and height are both set to 4, and the maximum allowed binary tree depth is set to 4. Firstly, the CTU is partitioned by a quadtree structure and the leaf quadtree unit may have size from 16×16 (i.e., minimum allowed quadtree leaf node size) to 128×128 (equal to CTU size, i.e., no split). If the leaf quadtree unit is 128×128, it cannot be further split by binary tree since the size exceeds the maximum allowed binary tree root node size 64×64. Otherwise, the leaf quadtree unit can be further split by binary tree. The leaf quadtree unit, which is also the root binary tree unit, has binary tree depth of 0. When the binary tree depth reaches 4 (i.e., the maximum allowed binary tree depth as indicated), no splitting is implicitly implied. When the block of a corresponding binary tree node has width equal to 4, non-horizontal splitting is implicitly implied. When the block of a corresponding binary tree node has height equal to 4, non-vertical splitting is implicitly implied. The leaf nodes of the QTBT are further processed by prediction (Intra picture or Inter picture) and transform coding.
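The implicit constraints in this QTBT example can be collected into a single sketch. The defaults below mirror the numbers in the example above (64×64 maximum binary tree root node size, minimum leaf width/height of 4, maximum depth 4); the function name and interface are hypothetical and serve only to illustrate the described behaviour.

```python
def binary_split_options(width, height, bt_depth,
                         max_bt_root_size=64, min_bt_leaf_size=4,
                         max_bt_depth=4):
    """Splitting options for a binary-tree node in the QTBT example above.

    A leaf quadtree unit larger than the maximum binary-tree root node size
    cannot enter the binary tree at all; once in the binary tree, splitting
    stops at the maximum depth, and a dimension already at the minimum leaf
    size rules out the split direction that would halve it.
    """
    if bt_depth == 0 and max(width, height) > max_bt_root_size:
        return []                 # e.g. a 128x128 leaf quadtree unit
    if bt_depth >= max_bt_depth:
        return []                 # no splitting implicitly implied
    options = []
    if height // 2 >= min_bt_leaf_size:
        options.append("horizontal")   # non-horizontal implied at height 4
    if width // 2 >= min_bt_leaf_size:
        options.append("vertical")     # non-vertical implied at width 4
    return options
```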

For an I-slice, the QTBT tree structure is usually applied with luma/chroma separate coding. For example, the QTBT tree structure is applied separately to the luma and chroma components for an I-slice, and applied simultaneously to both luma and chroma (except when certain minimum sizes are reached for chroma) for P- and B-slices. In other words, in an I-slice, the luma CTB has its own QTBT-structured block partitioning and the two chroma CTBs have another QTBT-structured block partitioning. In another example, the two chroma CTBs can also each have their own QTBT-structured block partitions.

For block-based coding, there is always a need to partition an image into blocks (e.g. CUs, PUs and TUs) for the coding purpose. As known in the field, the image may be divided into smaller image areas, such as slices, tiles, CTU rows or CTUs before applying the block partition. The process to partition an image into blocks for the coding purpose is referred to as partitioning the image using a coding unit (CU) structure. The particular partition method to generate CUs, PUs and TUs as adopted by HEVC is an example of the coding unit (CU) structure. The QTBT tree structure is another example of the coding unit (CU) structure.

Current Picture Referencing

Motion estimation/compensation is a well-known key technology in hybrid video coding, which exploits the pixel correlation between adjacent pictures. In a video sequence, the object movement between neighbouring frames is small and the object movement can be modelled by two-dimensional translational motion. Accordingly, the patterns corresponding to objects or background in a frame are displaced to form corresponding objects in the subsequent frame or correlated with other patterns within the current frame. With the estimation of a displacement (e.g. using block matching techniques), the pattern can be mostly reproduced without the need to re-code the pattern. Similarly, block matching and copy has also been tried to allow selecting the reference block from within the same picture. However, this concept was observed to be inefficient when applied to videos captured by a camera. Part of the reason is that the texture pattern in a spatially neighbouring area may be similar to the current coding block, but usually with some gradual changes over space. It is thus difficult for a block to find an exact match within the same picture of video captured by a camera. Therefore, the improvement in coding performance is limited.

However, the spatial correlation among pixels within the same picture is different for screen content. For typical video with text and graphics, there are usually repetitive patterns within the same picture. Hence, Intra (picture) block compensation has been observed to be very effective. A new prediction mode, i.e., the Intra block copy (IBC) mode, also called current picture referencing (CPR), has been introduced for screen content coding to utilize this characteristic. In the CPR mode, a prediction unit (PU) is predicted from a previously reconstructed block within the same picture. Further, a displacement vector (called block vector or BV) is used to signal the relative displacement from the position of the current block to the position of the reference block. The prediction errors are then coded using transformation, quantization and entropy coding. An example of CPR compensation is illustrated in FIG. 6, where area 610 corresponds to a picture, a slice or a picture area to be coded. Blocks 620 and 630 correspond to two blocks to be coded. In this example, each block can find a corresponding block in the previously coded area in the current picture (i.e., 622 and 632 respectively). According to this technique, the reference samples correspond to the reconstructed samples of the current decoded picture prior to in-loop filter operations including both deblocking and sample adaptive offset (SAO) filters in HEVC.

An early version of CPR was disclosed in JCTVC-M0350 (Madhukar Budagavi, et al, “AHG8: Video coding using Intra motion compensation”, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting: Incheon, KR, 18-26 Apr. 2013, Document: JCTVC-M0350), which was submitted as a candidate technology for HEVC Range Extensions (RExt) development. In JCTVC-M0350, the CPR compensation was limited to a small local area and the search was limited to a 1-D block vector for the block size of 2N×2N only. Later, a more advanced CPR method was developed during the standardization of HEVC SCC (screen content coding).

In order to signal the block vector (BV) efficiently, the BV is signalled predictively using a BV predictor (BVP) in a similar fashion to MV coding. Accordingly, the BV difference (BVD) is signalled and the BV can be reconstructed according to BV=BVP+BVD as shown in FIG. 7, where reference block 720 is selected as the IntraBC prediction for the current block 710 (i.e., a CU). A BVP is determined for the current CU. Methods to derive the motion vector predictor (MVP) are known in the field. A similar derivation can be applied to the BVP.

When CPR is used, only part of the current picture can be used as the reference picture. Some bitstream conformance constraints are imposed to regulate the valid MV value referring to the current picture.

First, one of the following two equations must be true:


BV_x+offsetX+nPbSw+xPbs−xCbs<=0, and  (1)


BV_y+offsetY+nPbSh+yPbs−yCbs<=0.  (2)

Second, the following WPP (Wavefront Parallel Processing) condition must be true:


(xPbs+BV_x+offsetX+nPbSw−1)/CtbSizeY−xCbs/CtbSizeY<=yCbs/CtbSizeY−(yPbs+BV_y+offsetY+nPbSh−1)/CtbSizeY  (3)

In equations (1) through (3), (BV_x, BV_y) is the luma block vector (i.e., the motion vector for CPR) for the current PU; nPbSw and nPbSh are the width and height of the current PU; (xPbs, yPbs) is the location of the top-left pixel of the current PU relative to the current picture; (xCbs, yCbs) is the location of the top-left pixel of the current CU relative to the current picture; and CtbSizeY is the size of the CTU. offsetX and offsetY are two adjusted offsets in two dimensions in consideration of chroma sample interpolation for the CPR mode:


offsetX=BVC_x&0x7?2:0,  (4)


offsetY=BVC_y&0x7?2:0.  (5)

(BVC_x, BVC_y) is the chroma block vector, in ⅛-pel resolution in HEVC.

Third, the reference block for CPR must be within the same tile/slice boundary.
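For illustration, the first two conformance constraints and the chroma-offset derivation can be collected into a single validity check. The following is a minimal sketch with hypothetical names; Python floor division stands in for the integer division in equation (3), and the third (tile/slice boundary) constraint is omitted because it requires picture-partitioning information not modelled here.

```python
def is_valid_cpr_bv(bv_x, bv_y, bvc_x, bvc_y,
                    n_pb_sw, n_pb_sh, x_pbs, y_pbs,
                    x_cbs, y_cbs, ctb_size_y):
    """Check constraints (1)-(3) on a CPR luma block vector.

    (bvc_x, bvc_y) is the chroma block vector in 1/8-pel resolution,
    used only to derive the offsets of equations (4)-(5).
    """
    # Equations (4)-(5): offsets for chroma sample interpolation.
    offset_x = 2 if (bvc_x & 0x7) else 0
    offset_y = 2 if (bvc_y & 0x7) else 0

    # Equations (1)-(2): at least one of the two must hold, i.e. the
    # reference block lies to the left of or above the current CU.
    left = bv_x + offset_x + n_pb_sw + x_pbs - x_cbs <= 0
    above = bv_y + offset_y + n_pb_sh + y_pbs - y_cbs <= 0
    if not (left or above):
        return False

    # Equation (3): the WPP (wavefront parallel processing) condition,
    # expressed on the CTU grid.
    lhs = ((x_pbs + bv_x + offset_x + n_pb_sw - 1) // ctb_size_y
           - x_cbs // ctb_size_y)
    rhs = (y_cbs // ctb_size_y
           - (y_pbs + bv_y + offset_y + n_pb_sh - 1) // ctb_size_y)
    return lhs <= rhs
```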

Affine Motion Compensation

The affine model can be used to describe 2D block rotations, as well as 2D deformations of squares (rectangles) into parallelograms. This model can be described as follows:


x′=a0+a1*x+a2*y,


y′=b0+b1*x+b2*y.  (6)

In this model, 6 parameters need to be determined. For each pixel (x, y) in the area of interest, the motion vector is determined as the difference between the location of the given pixel (A) and the location of its corresponding pixel in the reference block (A′), i.e., MV=A′−A=(a0+(a1−1)*x+a2*y, b0+b1*x+(b2−1)*y). Therefore, the motion vector for each pixel is location dependent.

According to this model, if the motion vectors of three different locations are known, the above parameters can be solved. This is equivalent to the condition that the 6 parameters are known. Each location with a known motion vector is referred to as a control point. The 6-parameter affine model corresponds to a 3-control-point model.
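The equivalence between the 6-parameter form of equation (6) and the 3-control-point form can be illustrated as follows. This is a sketch for illustration only; the control points are assumed to sit at the block corners (0, 0), (w, 0) and (0, h), and the function names are hypothetical.

```python
def affine_mv(x, y, a0, a1, a2, b0, b1, b2):
    """Per-pixel motion vector under the 6-parameter affine model (6):
    x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y, so
    MV(x, y) = A' - A = (x' - x, y' - y)."""
    return (a0 + (a1 - 1) * x + a2 * y,
            b0 + b1 * x + (b2 - 1) * y)

def affine_params_from_control_points(mv0, mv1, mv2, w, h):
    """Recover the six parameters from the MVs at three control points,
    assumed here at (0, 0), (w, 0) and (0, h) respectively."""
    a0, b0 = mv0                      # MV at (0, 0) gives (a0, b0) directly
    a1 = (mv1[0] - mv0[0]) / w + 1    # from MV_x at (w, 0)
    b1 = (mv1[1] - mv0[1]) / w        # from MV_y at (w, 0)
    a2 = (mv2[0] - mv0[0]) / h        # from MV_x at (0, h)
    b2 = (mv2[1] - mv0[1]) / h + 1    # from MV_y at (0, h)
    return a0, a1, a2, b0, b1, b2
```

When all three control-point MVs are equal, the recovered model degenerates to pure translation, and every pixel in the block receives that same MV.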

In the technical literature by Li, et al. (“An affine motion compensation framework for high efficiency video coding”, in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 24-27 May 2015, Pages: 525-528) and by Huang et al. (“Control-Point Representation and Differential Coding Affine-Motion Compensation”, IEEE Transactions on Circuits and Systems for Video Technology (CSVT), Vol. 23, No. 10, pages 1651-1660, October 2013), some exemplary implementations of affine motion compensation are presented. In the technical literature by Li, et al., an affine flag is signalled for the 2N×2N block partition when the current block is coded in either Merge mode or AMVP mode. If this flag is true, the derivation of motion vectors for the current block follows the affine model. If this flag is false, the derivation of motion vectors for the current block follows the traditional translational model. Three control points (3 MVs) are signalled when affine AMVP mode is used. At each control point location, the MV is predictively coded. Later, the MVDs of these control points are coded and transmitted. In the technical literature by Huang et al., different control point locations and predictive coding of MVs in control points are studied.

A syntax table for an affine motion compensation implementation is shown in Table 1. As shown in Table 1, syntax element use_affine_flag is signalled if at least one Merge candidate is affine coded and the partition mode is 2N×2N (i.e., PartMode==PART_2N×2N) as indicated by Notes (1-1) to (1-3) for the Merge mode. Syntax element use_affine_flag is signalled if the current block size is larger than 8×8 (i.e., log2CbSize>3) and the partition mode is 2N×2N (i.e., PartMode==PART_2N×2N) as indicated by Notes (1-4) to (1-6) for the B slice. If use_affine_flag indicates the affine model being used (i.e., use_affine_flag having a value of 1), information of the other two control points is signalled for reference list L0 as indicated by Notes (1-7) to (1-9) and information of the other two control points is signalled for reference list L1 as indicated by Notes (1-10) to (1-12).

TABLE 1

  prediction_unit( x0, y0, nPbW, nPbH ) {                                              Note
    if( cu_skip_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]
    } else { /* MODE_INTER */
      merge_flag[ x0 ][ y0 ]
      if( merge_flag[ x0 ][ y0 ] ) {
        if( at least one merge candidate is affine coded &&
            PartMode = = PART_2Nx2N )                                                  1-1
          use_affine_flag                                                              1-2
        else                                                                           1-3
          if( MaxNumMergeCand > 1 )
            merge_idx[ x0 ][ y0 ]
      } else {
        if( slice_type = = B )
          inter_pred_idc[ x0 ][ y0 ]
        if( log2CbSize > 3 && PartMode = = PART_2Nx2N )                                1-4
          use_affine_flag                                                              1-5
        if( inter_pred_idc[ x0 ][ y0 ] != PRED_L1 ) {                                  1-6
          if( num_ref_idx_l0_active_minus1 > 0 )
            ref_idx_l0[ x0 ][ y0 ]
          mvd_coding( x0, y0, 0 )
          if( use_affine_flag ) {                                                      1-7
            mvd_coding( x0, y0, 0 ) /* second control point when affine mode is used */ 1-8
            mvd_coding( x0, y0, 0 ) /* third control point when affine mode is used */  1-9
          }
          mvp_l0_flag[ x0 ][ y0 ]
        }
        if( inter_pred_idc[ x0 ][ y0 ] != PRED_L0 ) {
          if( num_ref_idx_l1_active_minus1 > 0 )
            ref_idx_l1[ x0 ][ y0 ]
          if( mvd_l1_zero_flag && inter_pred_idc[ x0 ][ y0 ] = = PRED_BI ) {
            MvdL1[ x0 ][ y0 ][ 0 ] = 0
            MvdL1[ x0 ][ y0 ][ 1 ] = 0
          } else
            mvd_coding( x0, y0, 1 )
          if( use_affine_flag ) {                                                      1-10
            mvd_coding( x0, y0, 1 ) /* second control point when affine mode is used */ 1-11
            mvd_coding( x0, y0, 1 ) /* third control point when affine mode is used */  1-12
          }
          mvp_l1_flag[ x0 ][ y0 ]
        }
      }
    }
  }

In the present invention, various aspects of CPR coding with the QTBT structure or luma/chroma separate coding are addressed.

BRIEF SUMMARY OF THE INVENTION

A method and apparatus for video encoding and decoding used by a video encoding system and video decoding system respectively are disclosed. According to one method of the present invention, input data associated with a current block in a current image are received, where affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image. One or more reference picture indexes for a current block are signalled or parsed. If one reference picture index for the current block points to the current image, the affine motion compensation is inferred as Off for the current block without a need for signalling or parsing an affine mode syntax or the adaptive MV resolution is inferred as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax.

For the above method, when the affine motion compensation is enabled for coding the current image and if the reference picture index for the current block points to one reference picture other than the current image, the affine mode syntax is signalled or parsed to determine whether the affine motion compensation is applied to the current block according to one embodiment. In another embodiment, when the adaptive MV resolution is enabled for coding the current image and if the reference picture index for the current block points to one reference picture other than the current image, the adaptive MV resolution syntax is signalled or parsed to determine whether the adaptive MV resolution is applied to the current block. If a target reference picture index for a target reference picture list points to one reference picture other than the current image and an MVD (motion vector difference) value associated with the target reference picture list is not equal to zero, the adaptive MV resolution syntax is signalled or parsed, where the target reference picture list corresponds to List 0 or List 1. If a reference picture index points to the current image or a MVD (motion vector difference) value associated with one reference picture list is equal to zero for one or both reference picture lists, the adaptive MV resolution syntax is not signalled or parsed. The reference picture index for the current block is signalled or parsed before an affine mode syntax or an adaptive MV resolution mode syntax is signalled or parsed.

According to another method, input data associated with a current block in a current image are received, where current picture referencing (CPR) mode is enabled, and wherein affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image. Whether affine mode is used for the current block when the affine motion compensation is enabled for the current image is determined, or whether the adaptive MV resolution is used for the current block when the adaptive MV resolution is enabled for the current image is determined. If the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block, a reference picture index for the current block is signalled or parsed, where the reference picture index always points to one reference picture other than the current image.

According to the above method, a codeword for the reference picture index corresponding to the current image may be removed from a codeword table if the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block. Alternatively, a conforming video bitstream may be used to cause the reference picture index to point to one reference picture other than the current image if the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block. Said determining whether affine mode is used for the current block may comprise signalling or parsing an affine mode syntax before signalling or parsing the reference picture index for the current block. The reference picture index for the current block can be signalled or parsed after an affine mode syntax or an adaptive MV resolution mode syntax is signalled or parsed. If the affine mode is not used for the current block, a reference picture index for the current block can be signalled or parsed, and where a codeword of the reference picture index corresponding to the current image may be included in a codeword table. Furthermore, if the adaptive MV resolution is used for the current block, a reference picture index for the current block is signalled or parsed, and wherein a codeword of the reference picture index corresponding to the current image is included in a codeword table.

According to yet another method, input data associated with a current block in a current image are received, where current picture referencing (CPR) mode and affine motion compensation are enabled for coding the current image. An affine mode and a reference picture index for the current block are determined. If the affine mode is used for the current block, a number of motion vectors (MVs) for the current block are signalled or parsed for a reference picture list depending on whether the reference picture index for the current block points to the current image.

According to the above method, if the affine mode is used for the current block and the reference picture index for the current block points to the current image, only one MV can be signalled or parsed for the current block for the reference picture list. Each MV can be represented by one MV predictor and one MV difference, or one MV predictor index and one MV difference. If the affine mode is used for the current block and the reference picture index for the current block points to one reference image other than the current image, more than one MV can be signalled or parsed for the current block for the reference picture list.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example of block partition using quadtree structure to partition a coding tree unit (CTU) into coding units (CUs).

FIG. 2 illustrates asymmetric motion partition (AMP) according to High Efficiency Video Coding (HEVC), where the AMP defines eight shapes for splitting a CU into PU.

FIG. 3 illustrates an example of various binary splitting types used by a binary tree partitioning structure, where a block can be recursively split into two smaller blocks using the splitting types.

FIG. 4 illustrates an example of block partitioning and its corresponding binary tree, where in each splitting node (i.e., non-leaf node) of the binary tree, one syntax is used to indicate which splitting type (horizontal or vertical) is used, where 0 may indicate horizontal splitting and 1 may indicate vertical splitting.

FIG. 5 illustrates an example of block partitioning and its corresponding quad-tree plus binary tree structure (QTBT), where the solid lines indicate quadtree splitting and dotted lines indicate binary tree splitting.

FIG. 6 illustrates an example of CPR compensation, where area 610 corresponds to a picture, a slice or a picture area to be coded. Blocks 620 and 630 correspond to two blocks to be coded.

FIG. 7 illustrates an example of predictive block vector (BV) coding, where the BV difference (BVD) corresponding to the difference between a current BV and a BV predictor is signalled.

FIG. 8 illustrates examples of constrained reference pixel region for IntraBC mode (i.e., the current picture referencing, CPR mode).

FIG. 9 illustrates an example of ladder-shaped reference data area for WPP (wavefront parallel processing) associated with the CPR mode.

FIG. 10 illustrates an example of collocated colour plane candidate derivation from other colour planes in the same frame, where (Y1, U1, V1) and (Y2, U2, V2) are colour planes of two successive frames.

FIG. 11 illustrates a flowchart of an exemplary coding system with the affine motion compensation or adaptive motion vector (MV) resolution mode enabled according to an embodiment of the present invention, where if one reference picture index for the current block points to the current image, the affine motion compensation is inferred as Off for the current block without a need for signalling or parsing an affine mode syntax or the adaptive MV resolution is inferred as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax.

FIG. 12 illustrates a flowchart of an exemplary coding system with the current picture referencing (CPR) mode enabled according to an embodiment of the present invention, where if the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block, a reference picture index for the current block is signalled or parsed and the reference picture index always points to one reference picture other than the current image.

FIG. 13 illustrates a flowchart of an exemplary coding system with the current picture referencing (CPR) mode enabled according to an embodiment of the present invention, where if the affine mode is used for the current block, a number of motion vectors (MVs) for the current block for a reference picture list is signalled or parsed depending on whether the reference picture index for the current block points to the current image.

DETAILED DESCRIPTION OF THE INVENTION

The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.

In video coding based on the original quad-tree plus binary tree (QTBT) structure and luma/chroma separate coding, the luma and chroma components are coded separately for all Intra frames (for example, the I-slice). However, in HEVC-SCC, the CPR is designed jointly for the three colour components. The MV of CPR is used for all three components. According to one aspect of the present invention, the CPR design is modified when the luma and chroma components are coded separately. In this disclosure, various methods of CPR coding with the luma/chroma separate CU structure are proposed. In the following, various aspects of using the CPR mode for luma/chroma separate coding are disclosed.

CPR with Affine Motion Compensation

When affine motion compensation is used for a block, more than one MV is used for the current PU. Therefore, more than one MVD needs to be signalled in the affine AMVP mode. According to one embodiment of the present invention, the affine motion compensation mode is disabled when the CPR mode is used. In other words, when the CPR mode is used for the block, the block is coded using a coding mode selected from a coding group excluding the affine motion compensation mode. To achieve this, the reference picture index for encoding/decoding the current prediction unit (PU) is signalled/parsed first in the decoding order for either List 0 or List 1 before the affine mode syntax. Using List 0 as an example, if the reference picture index in List 0 for the current PU points to the current image itself, there will be only one set of MV information (e.g. MVP/MVP index and MVD) to be encoded/decoded in List 0. At the same time, there is no need to signal/parse the affine mode syntax (e.g. the affine mode flag) at the encoder/decoder side, and the affine mode flag is inferred as false (i.e., the affine mode is disabled), where the affine mode syntax indicates whether the affine AMVP mode is used for the current PU. If the reference picture index points to a reference picture other than the current image itself, the affine mode syntax needs to be encoded or decoded. If the affine mode is used, information for more than one MV may need to be encoded or decoded. Similar methods may apply to List 1 reference pictures as well.
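The signalling order described above can be sketched from the decoder's perspective, using List 0 as the example. The function name, the symbol-queue convention, and the use of two control-point MVs for the affine case are illustrative assumptions of this sketch, not part of any codec specification.

```python
def parse_list0_pu(symbols, ref_list, cur_pic):
    """Parse one PU's List 0 motion information from a symbol queue.

    `symbols` is consumed front-to-back: reference picture index first,
    then the affine flag (only when present), then one MVD per MV.
    """
    ref_idx = symbols.pop(0)
    if ref_list[ref_idx] == cur_pic:
        # CPR: the reference is the current picture itself, so the affine
        # flag is not present in the bitstream and is inferred as false.
        affine = False
        num_mvs = 1
    else:
        affine = bool(symbols.pop(0))   # affine flag is coded explicitly
        num_mvs = 2 if affine else 1    # e.g. two control-point MVs
    mvds = [symbols.pop(0) for _ in range(num_mvs)]
    return ref_idx, affine, mvds
```

Because the reference picture index is parsed first, the decoder can skip the affine flag entirely for a CPR-coded PU and read exactly one MVD.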

It is possible that in one reference picture list, CPR is used, while in another reference picture list, affine mode is used.

In another embodiment, the affine mode syntax is encoded or decoded before the reference picture index is encoded or decoded. For a current block, if the affine mode is applied (i.e., the affine mode flag equal to true), the encoded or decoded reference picture index cannot point to the current image itself. In one example, the reference picture index codeword corresponding to the current picture is removed from the codeword table. If the affine mode is not used for the current block, a reference picture index for the current block is signalled or parsed, wherein if the reference picture list includes the current picture, a codeword of the reference picture index corresponding to the current image is included in the codeword table. In another example, bitstream conformance requires that the encoded or parsed reference picture index shall not be equal to the reference picture index pointing to the current image itself if the affine mode is applied for the current block.
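The codeword-table adjustment above can be sketched as follows: when the affine mode is applied, the index pointing to the current picture is removed from the set of signallable reference indices. The function name and list layout are assumptions for illustration only.

```python
def signallable_ref_indices(ref_list, cur_pic, affine_applied):
    """Return the reference indices that may appear in the bitstream."""
    if affine_applied:
        # The current-picture codeword is removed from the codeword table,
        # so the remaining entries are binarized over a shorter table.
        return [i for i, pic in enumerate(ref_list) if pic != cur_pic]
    # Affine not used: the full table, including the current picture
    # (when the reference picture list contains it), remains signallable.
    return list(range(len(ref_list)))
```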

In another embodiment, information associated with a number of encoded or decoded MVs (e.g. MVP/MVP index and MVD) depends on the conditions of the affine mode and the CPR mode. If the affine mode is applied and the CPR mode is not applied for the block (e.g. the reference picture index pointing to a reference picture other than the current image), information associated with more than one MV is encoded or decoded in the current reference picture list. If the affine mode is applied and the CPR mode is applied (e.g. the reference picture index pointing to the current picture), information associated with only one MV is encoded or decoded in the current reference picture list. If the affine mode is not applied, information associated with only one MV is encoded or decoded in the current reference picture list.
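The MV-count rule of this embodiment can be sketched as a small decision function. Using exactly two MVs for the affine non-CPR case is an illustrative choice (e.g. a two-control-point affine model); the source only states "more than one MV".

```python
def num_coded_mvs(affine_applied, ref_is_current_picture):
    """Number of MVs coded in the current reference picture list."""
    if affine_applied and not ref_is_current_picture:
        return 2   # affine without CPR: more than one MV is coded
    return 1       # affine with CPR, or no affine: a single MV is coded
```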

CPR with Adaptive Motion Vector Resolution

If the CPR mode is used for coding a current PU and the assumption of adaptive MV resolution (i.e., integer MV resolution) in the CPR mode holds, there is no need to signal the integer MV syntax for a CPR-coded PU. The reference picture index in List 0 (or List 1) is decoded in the decoding order. If the reference picture is the current picture itself, the decoded MVD is allowed to have a non-zero value. In this case, there is no need to signal the integer MV syntax (e.g. the iMV flag). Only when there is a non-zero MVD value for a reference picture other than the current picture itself does the integer MV syntax need to be signalled. Some examples of whether to signal the iMV flag are shown in Table 2, where exemplary combinations of reference picture selection and MVD value for the iMV flag signalling decision are shown. List 0 and List 1 in Table 2 can be swapped.

TABLE 2

  Ref. picture list     Case 1    Case 2    Case 3    Case 4    Case 5
  List 0 ref. picture   Current   Current   Current   Other     Other
  List 0 MVD            —         —         —         = 0       = 0
  List 1 ref. picture   Current   Other     Other     Other     Other
  List 1 MVD            —         = 0       != 0      != 0      = 0
  iMV_flag signalling   No        No        Yes       Yes       No
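The Table 2 decision reduces to a single predicate: the iMV flag is signalled only when at least one used reference picture list points to a picture other than the current one with a non-zero MVD. The function name and the pair-per-list representation are illustrative assumptions.

```python
def imv_flag_signalled(lists, cur_pic):
    """Decide whether the iMV flag is coded for this PU.

    `lists` holds one (ref_pic, mvd) pair per used reference picture list
    (List 0 and/or List 1); `cur_pic` identifies the current picture.
    """
    return any(ref != cur_pic and mvd != 0 for ref, mvd in lists)
```

Applying the predicate to the five columns of Table 2 reproduces the No/No/Yes/Yes/No pattern shown there.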

When CPR is applied, the MV is coded in integer MV resolution, where the adaptive motion vector resolution is inferred as enabled. To achieve this, the reference picture index for encoding/decoding the current prediction unit (PU) is signalled or parsed first in the decoding order for either List 0 or List 1 before the adaptive motion vector resolution syntax. Using List 0 as an example, if the reference picture index in List 0 for the current PU points to the current image itself, there is no need to signal or parse the adaptive motion vector resolution syntax (e.g. the adaptive motion vector resolution flag) at the encoder or decoder side respectively, and the adaptive motion vector resolution flag is inferred as true (i.e., adaptive motion vector resolution enabled). If the reference picture index points to a reference picture other than the current picture itself, the adaptive motion vector resolution syntax needs to be encoded or decoded. Similar methods apply to List 1 reference pictures as well.
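The inference rule above can be sketched as follows: when the reference picture index of the list points to the current picture, the adaptive motion vector resolution (iMV) flag is not present in the bitstream and is inferred as true. Here `read_flag` stands in for the entropy decoder and is an assumption of this sketch.

```python
def decode_imv_flag(read_flag, ref_is_current_picture):
    """Return the adaptive MV resolution flag for the current PU."""
    if ref_is_current_picture:
        return True          # CPR: inferred as enabled; nothing is parsed
    return read_flag()       # non-CPR reference: flag is coded explicitly
```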

In another embodiment, the adaptive motion vector resolution syntax is encoded or decoded before the reference picture index is encoded or decoded. For a current block, if the adaptive motion vector resolution is not applied (i.e., the adaptive motion vector resolution flag equal to false), the encoded or decoded reference picture index cannot point to the current image itself. In one example, the reference picture index codeword corresponding to the current picture is removed from the codeword table. If the adaptive MV resolution is used for the current block, a reference picture index for the current block is signalled or parsed, wherein if the reference picture list includes the current picture, a codeword of the reference picture index corresponding to the current image is included in the codeword table. In another example, bitstream conformance requires that the signalled or parsed reference picture index shall not be equal to the reference picture index pointing to the current image if the adaptive MV resolution is not applied for the current block.

The inventions disclosed above can be incorporated into various video encoding or decoding systems in various forms. For example, the inventions can be implemented using hardware-based approaches, such as dedicated integrated circuits (IC), field programmable logic array (FPGA), digital signal processor (DSP), central processing unit (CPU), etc. The inventions can also be implemented using software codes or firmware codes executable on a computer, laptop or mobile device such as smart phones. Furthermore, the software codes or firmware codes can be executable on a mixed-type platform such as a CPU with dedicated processors (e.g. video coding engine or co-processor).

FIG. 11 illustrates a flowchart of an exemplary coding system with the affine motion compensation or adaptive motion vector (MV) resolution mode enabled according to an embodiment of the present invention. The steps shown in the flowchart, as well as other following flowcharts in this disclosure, may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side and/or the decoder side. The steps shown in the flowchart may also be implemented based on hardware, such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, input data associated with a current image are received in step 1110, where the affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image. At the encoder side, the input data may correspond to video data to be encoded. At the decoder side, the input data may correspond to compressed video data to be decoded. In step 1120, one or more reference picture indexes for a current block are signalled or parsed. As is understood, one or more reference picture indexes for a current block are signalled at the encoder side or parsed at the decoder side. In step 1130, if one reference picture index for the current block points to the current image, the affine motion compensation is inferred as Off for the current block without a need for signalling or parsing an affine mode syntax, or the adaptive MV resolution is inferred as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax.

FIG. 12 illustrates a flowchart of an exemplary coding system with the current picture referencing (CPR) mode enabled according to an embodiment of the present invention. According to this method, input data associated with a current image are received in step 1210, wherein the current picture referencing (CPR) mode is enabled, and wherein the affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image. In step 1220, whether the affine mode is used for the current block is determined when the affine motion compensation is enabled for the current image, or whether the adaptive MV resolution is used for the current block is determined when the adaptive MV resolution is enabled for the current image. In step 1230, if the affine mode is used for the current block or if the adaptive MV resolution is not used for the current block, a reference picture index for the current block is signalled or parsed, wherein the reference picture index always points to one reference picture other than the current image. As is understood, the reference picture index for the current block is signalled at the encoder side or parsed at the decoder side.

FIG. 13 illustrates a flowchart of an exemplary coding system with the current picture referencing (CPR) mode enabled according to an embodiment of the present invention. According to this embodiment, input data associated with a current image are received in step 1310, wherein the current picture referencing (CPR) mode and the affine motion compensation are enabled for coding the current image. The affine mode and a reference picture index are determined for the current block in step 1320. As is known in the field, the encoder may determine whether to use the affine mode for the current block based on a certain performance criterion such as rate-distortion optimization (RDO). At the decoder side, whether the affine mode is used for the block can be determined from coded information, such as a syntax element in the video bitstream. In step 1330, if the affine mode is used for the current block, a number of motion vectors (MVs) for the current block for a reference picture list is signalled or parsed depending on whether the reference picture index for the current block points to the current image.

The flowcharts shown are intended to illustrate examples of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.

The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.

Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method of video encoding and decoding used by a video encoding system and video decoding system respectively, the method comprising:

receiving input data associated with a current block in a current image, wherein affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image;
signalling or parsing one or more reference picture indexes for a current block; and
if one reference picture index for the current block points to the current image, inferring the affine motion compensation as Off for the current block without a need for signalling or parsing an affine mode syntax or inferring the adaptive MV resolution as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax.

2. The method of claim 1, wherein when the affine motion compensation is enabled for coding the current image and if the reference picture index for the current block points to one reference picture other than the current image, the affine mode syntax is signalled or parsed to determine whether the affine motion compensation is applied to the current block.

3. The method of claim 1, wherein when the adaptive MV resolution is enabled for coding the current image and if the reference picture index for the current block points to one reference picture other than the current image, the adaptive MV resolution syntax is signalled or parsed to determine whether the adaptive MV resolution is applied to the current block.

4. The method of claim 1, wherein if a target reference picture index for a target reference picture list points to one reference picture other than the current image and an MVD (motion vector difference) value associated with the target reference picture list is not equal to zero, the adaptive MV resolution syntax is signalled or parsed, and wherein the target reference picture list corresponds to List 0 or List 1.

5. The method of claim 1, wherein if a reference picture index points to the current image or a MVD (motion vector difference) value associated with one reference picture list is equal to zero for one or both reference picture lists, the adaptive MV resolution syntax is not signalled or parsed.

6. The method of claim 1, wherein the reference picture index for the current block is signalled or parsed before the affine mode syntax or an adaptive MV resolution mode syntax is signalled or parsed.

7. An apparatus of video encoding and decoding used by a video encoding system and video decoding system respectively, the apparatus comprising one or more electronic circuits or processors arranged to:

receive input data associated with a current block in a current image, wherein affine motion compensation or adaptive motion vector (MV) resolution is enabled for coding the current image;
signal or parse one or more reference picture indexes for a current block; and
if one reference picture index for the current block points to the current image, infer the affine motion compensation as Off for the current block without a need for signalling or parsing an affine mode syntax or infer the adaptive MV resolution as On for the current block without a need for signalling or parsing an adaptive motion vector resolution syntax.

8-20. (canceled)

Patent History
Publication number: 20200322599
Type: Application
Filed: May 26, 2017
Publication Date: Oct 8, 2020
Inventors: Tzu-Der CHUANG (Zhubei City, Hsinchu County), Ching-Yeh CHEN (Taipei City), Yu-Wen HUANG (Taipei City), Xiaozhong XU (State College, PA)
Application Number: 16/304,209
Classifications
International Classification: H04N 19/105 (20060101); H04N 19/176 (20060101); H04N 19/70 (20060101); H04N 19/513 (20060101); H04N 19/159 (20060101);