VIDEO ENCODING DEVICE AND VIDEO DECODING DEVICE
A device is provided with: a first transformer which transforms a coding unit (CU); and a second transformer which transforms a part of first transform coefficients output from the first transformer, wherein the second transformer transforms at least any of the first transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction or the first transform coefficients for a non-rectangular region (second region).
The present invention relates to an image decoding device and an image encoding device.
BACKGROUND ART
An image encoding device which generates coded data by coding a video, and an image decoding device which generates decoded images by decoding the coded data, are used to transmit or record a video efficiently.
For example, specific video coding schemes include schemes proposed in H.264/AVC and High-Efficiency Video Coding (HEVC).
In such a video coding scheme, images (pictures) constituting a video are managed by a hierarchical structure including slices obtained by splitting images, coding units (CUs) obtained by splitting slices, prediction units (PUs) which are blocks obtained by splitting coding units, and transform units (TUs), and are coded/decoded for each CU.
In such a video coding scheme, usually, a prediction image is generated based on a locally decoded image obtained by coding/decoding an input image, and prediction residuals (also sometimes referred to as “difference images” or “residual images”) obtained by subtracting the prediction image from the input image (original image) are coded. Generation methods of prediction images include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).
In a video encoding device, quantization transform coefficients obtained by performing orthogonal transform and quantization on the prediction residuals are coded, and in a video decoding device, the quantization transform coefficients are decoded from coded data, and inverse quantization and inverse orthogonal transform are performed to recover the prediction residuals (NPL 2). In recent years, a technique has been developed in which transform coefficient values are concentrated in the vicinity of zero and the amount of coding is reduced by performing a second transform (secondary transform) on the transform coefficients after performing a first orthogonal transform (primary transform) on the prediction residuals (NPL 1).
CITATION LIST
Non Patent Literature
- NPL 1: “Algorithm Description of Joint Exploration Test Model 5”, JVET-E1001, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12-20 Jan. 2017
- NPL 2: ITU-T H.265 (04/2015) SERIES H: AUDIOVISUAL AND MULTIMEDIA SYSTEMS Infrastructure of audiovisual services—Coding of moving video High efficiency video coding
In NPL 1, the image encoding device performs primary transform on prediction residuals to concentrate the energy in specific components, and then performs secondary transform on the transform coefficients of the prediction residuals to further increase the energy concentration. The image encoding device performs quantization and entropy coding processing on the results to generate coded data. The image decoding device performs inverse secondary transform and inverse primary transform on the transform coefficients obtained by performing entropy decoding and inverse quantization on the coded data.
With the secondary transform, the energy is concentrated in certain components (low frequency components), but the amount of processing greatly increases. In a case of using non-separable transform for the secondary transform, the energy concentration of diagonal direction components that cannot be handled by separable primary transform can also be increased, so line segments in the diagonal direction can be reproduced with high quality. However, since the operation amount of a transform of length N is known to be O(N^2) or O(N log N), transform of long components increases the complexity.
The present invention is made in view of the problem described above, and has an object to provide an image decoding device and an image encoding device that are capable of reducing the amount of processing and the complexity of transform while maintaining coding amount reduction effects.
Solution to Problem
An image encoding device according to one aspect of the present invention includes: a divider configured to divide a picture of an input video into coding units (CUs) each including multiple pixels; a transformer configured to perform predetermined transform with the CU as a unit and output transform coefficients;
a quantizer configured to quantize the transform coefficients and output quantization transform coefficients; and an encoder configured to perform variable-length coding on the quantization transform coefficients, wherein the transformer includes a first transformer, and a second transformer configured to perform transform on a part of first transform coefficients output from the first transformer, and the second transformer performs transform on at least any of the first transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the first transform coefficients for a non-rectangular region (second region).
An image decoding device according to one aspect of the present invention includes: a decoder configured to perform variable-length decoding on coded data with a coding unit (CU) including multiple pixels as a processing unit, and output quantization transform coefficients; an inverse quantizer configured to perform inverse quantization on quantization transform coefficients and output transform coefficients; and an inverse transformer configured to perform inverse transform on the transform coefficients, wherein the inverse transformer includes a second inverse transformer configured to perform inverse transform on at least a part of the transform coefficients and output second transform coefficients, and a first inverse transformer configured to perform inverse transform on a remainder of the transform coefficients and the second transform coefficients, and the second inverse transformer performs inverse transform on at least any of the transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the transform coefficients for a non-rectangular region (second region).
Advantageous Effects of Invention
According to one aspect of the present invention, the amount of processing of video coding and decoding and the memory used can be reduced while suppressing reduction in the coding efficiency.
Hereinafter, embodiments of the present invention are described with reference to the drawings.
The image transmission system 1 is a system configured to transmit a code generated by coding a coding target image, decode the transmitted code, and display an image. The image transmission system 1 includes an image encoding device 11, a network 21, an image decoding device 31, and an image display device 41.
An image T indicating an image of a single layer or multiple layers is input to the image encoding device 11. A layer is a concept used to distinguish multiple pictures in a case that there are one or more pictures constituting a certain time. For example, coding an identical picture in multiple layers having different image qualities and resolutions is scalable coding, and coding pictures having different viewpoints in multiple layers is view scalable coding. In a case of performing a prediction (an inter-layer prediction, an inter-view prediction) between pictures in multiple layers, coding efficiency greatly improves. In a case of not performing a prediction (simulcast), coded data can be compiled.
The network 21 transmits a coding stream Te generated by the image encoding device 11 to the image decoding device 31. The network 21 is the Internet (internet), Wide Area Network (WAN), Local Area Network (LAN), or combinations thereof. The network 21 is not necessarily a bidirectional communication network, but may be a unidirectional communication network configured to transmit broadcast waves such as digital terrestrial television broadcasting and satellite broadcasting. The network 21 may be substituted by a storage medium that records the coding stream Te, such as Digital Versatile Disc (DVD) and Blu-ray Disc (BD (trade name)).
The image decoding device 31 decodes each of the coding streams Te transmitted by the network 21, and generates one or multiple decoded images Td.
The image display device 41 displays all or part of one or multiple decoded images Td generated by the image decoding device 31. For example, the image display device 41 includes a display device such as a liquid crystal display and an organic Electro-luminescence (EL) display. In spatial scalable coding and SNR scalable coding, in a case that the image decoding device 31 and the image display device 41 have high processing capability, an enhanced layer image having high image quality is displayed, and in a case of having lower processing capability, a base layer image which does not require as high processing capability and display capability as an enhanced layer is displayed.
Operator
Operators used herein will be described below.
>> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, and |= is a sum operation (OR) with another condition.
x ? y: z is a ternary operator to take y in a case that x is true (other than 0), and take z in a case that x is false (0).
Clip3 (a, b, c) is a function to clip c in a value equal to or greater than a and equal to or less than b, and a function to return a in a case that c is less than a (c<a), return b in a case that c is greater than b (c>b), and return c otherwise (however, a is equal to or less than b (a<=b)).
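For illustration, these operators can be restated directly in Python; a minimal sketch (the function and variable names are illustrative only):

```python
def clip3(a, b, c):
    # Returns a in a case that c < a, b in a case that c > b, and c
    # otherwise (assumes a <= b, as stated above).
    return a if c < a else (b if c > b else c)

# The remaining operators map directly onto Python: >> and << are bit
# shifts, & is bitwise AND, | is bitwise OR, and x ? y : z becomes a
# conditional expression.
x, y, z = 1, 10, 20
ternary = y if x != 0 else z   # x ? y : z

assert clip3(0, 255, 300) == 255
assert clip3(0, 255, -5) == 0
assert clip3(0, 255, 128) == 128
```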
Structure of Coding Stream Te
Prior to detailed descriptions of the image encoding device 11 and the image decoding device 31 according to the present embodiment, a data structure of the coding stream Te generated by the image encoding device 11 and decoded by the image decoding device 31 will be described.
In the coding video sequence, a set of data referred to by the image decoding device 31 to decode the sequence SEQ of a processing target is prescribed. As illustrated in (a) of
In the video parameter set VPS, in a video including multiple layers, a set of coding parameters common to multiple videos and a set of coding parameters associated with multiple layers and an individual layer included in a video are prescribed.
In the sequence parameter set SPS, a set of coding parameters referred to by the image decoding device 31 to decode a target sequence is prescribed. For example, width and height of a picture are prescribed. Note that multiple SPSs may exist. In that case, any of multiple SPSs is selected from the PPS.
In the picture parameter set PPS, a set of coding parameters referred to by the image decoding device 31 to decode each picture in the target sequence is prescribed. For example, a reference value (pic_init_qp_minus26) of a quantization step size used for decoding of a picture and a flag (weighted_pred_flag) indicating an application of a weighted prediction are included. Note that multiple PPSs may exist. In that case, any of multiple PPSs is selected from each picture in the target sequence.
Coding Picture
In the coding picture, a set of data referred to by the image decoding device 31 to decode the picture PICT of a processing target is prescribed. As illustrated in (b) of
Note that in a case that it is not necessary to distinguish each of the slices S0 to SNS-1, subscripts of reference signs may be omitted and described below. The same applies to other data included in the coding stream Te described below and data described with a subscript added.
Coding Slice
In the coding slice, a set of data referred to by the image decoding device 31 to decode the slice S of a processing target is prescribed. As illustrated in (c) of
The slice header SH includes a coding parameter group referred to by the image decoding device 31 to determine a decoding method of a target slice. Slice type specification information (slice_type) to specify a slice type is one example of a coding parameter included in the slice header SH.
Examples of slice types that can be specified by the slice type specification information include (1) I slice using only an intra prediction in coding, (2) P slice using a unidirectional prediction or an intra prediction in coding, and (3) B slice using a unidirectional prediction, a bidirectional prediction, or an intra prediction in coding, and the like. Note that the inter prediction is not limited to a uni-prediction or a bi-prediction, and a greater number of reference pictures may be used to generate a prediction image. Hereinafter, P and B slices refer to slices that include a block that can employ an inter prediction.
Note that the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the above-described coding video sequence.
Coding Slice Data
In the coding slice data, a set of data referred to by the image decoding device 31 to decode the slice data SDATA of a processing target is prescribed. As illustrated in (d) of
As illustrated in (e) of
The CT includes a QT split flag (cu_split_flag) indicating whether or not to perform a QT split and a BT split mode (split_bt_mode) indicating a split method of a BT split as CT information. cu_split_flag and/or split_bt_mode is transmitted for each coding node CN. In a case that cu_split_flag is 1, the coding node CN is split into four coding nodes CN. In a case that cu_split_flag is 0 and split_bt_mode is 1, the coding node CN is split horizontally into two coding nodes CN. In a case that split_bt_mode is 2, the coding node CN is split vertically into two coding nodes CN. In a case that split_bt_mode is 0, the coding node CN is not split and has one coding unit CU as a node. The coding unit CU is an end node (leaf node) of the coding nodes, and is not split anymore.
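For illustration, the QTBT recursion described above can be sketched as follows. This is a minimal sketch, assuming a read callback that stands in for the entropy decoder; the real syntax imposes availability and size constraints that are not modeled here:

```python
def decode_coding_tree(x, y, w, h, read):
    # read(name) stands in for decoding the named syntax element.
    if read("cu_split_flag") == 1:        # QT split into four coding nodes
        hw, hh = w // 2, h // 2
        for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
            decode_coding_tree(x + dx, y + dy, hw, hh, read)
        return
    mode = read("split_bt_mode")
    if mode == 1:                         # horizontal BT split
        decode_coding_tree(x, y, w, h // 2, read)
        decode_coding_tree(x, y + h // 2, w, h // 2, read)
    elif mode == 2:                       # vertical BT split
        decode_coding_tree(x, y, w // 2, h, read)
        decode_coding_tree(x + w // 2, y, w // 2, h, read)
    else:                                 # mode == 0: leaf coding unit CU
        print(f"CU at ({x},{y}) size {w}x{h}")

# Example: one QT split of a 64x64 CTU, then leaves everywhere.
answers = iter([1, 0, 0, 0, 0, 0, 0, 0, 0])
decode_coding_tree(0, 0, 64, 64, lambda name: next(answers))
```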
In a case that the size of the coding tree unit CTU is 64×64 pixels, the size of the coding unit can take any of 64×64 pixels, 64×32 pixels, 32×64 pixels, 32×32 pixels, 64×16 pixels, 16×64 pixels, 32×16 pixels, 16×32 pixels, 16×16 pixels, 64×8 pixels, 8×64 pixels, 32×8 pixels, 8×32 pixels, 16×8 pixels, 8×16 pixels, 8×8 pixels, 64×4 pixels, 4×64 pixels, 32×4 pixels, 4×32 pixels, 16×4 pixels, 4×16 pixels, 8×4 pixels, 4×8 pixels, and 4×4 pixels.
Coding Unit
As illustrated in (f) of
In the prediction tree, prediction parameters (a reference picture index, a motion vector, and the like) of each prediction unit (PU) obtained by splitting the coding unit into one or multiple pieces are prescribed. In another expression, the prediction unit is one or multiple non-overlapping regions constituting the coding unit. The prediction tree includes one or multiple prediction units obtained by the above-mentioned split. Note that, in the following, a unit of prediction where the prediction unit is further split is referred to as a “subblock”. The subblock includes multiple pixels. In a case that the sizes of the prediction unit and the subblock are the same, there is one subblock in the prediction unit. In a case that the prediction unit is larger than the size of the subblock, the prediction unit is split into subblocks. For example, in a case that the prediction unit is 8×8 and the subblock is 4×4, the prediction unit is split into four subblocks formed by a horizontal split into two and a vertical split into two.
The prediction processing may be performed for each of these prediction units (subblocks).
Generally speaking, there are two types of splits in the prediction tree, including a case of an intra prediction and a case of an inter prediction. The intra prediction is a prediction in an identical picture, and the inter prediction refers to a prediction processing performed between mutually different pictures (for example, between display times, and between layer images).
In a case of an intra prediction, the split method has 2N×2N (the same size as the coding unit) and N×N.
In a case of an inter prediction, the split method includes coding by a PU split mode (part_mode) of the coded data, and includes 2N×2N (the same size as the coding unit), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, N×N, and the like. Note that 2N×N and N×2N indicate a symmetric split of 1:1, and 2N×nU, 2N×nD and nL×2N, nR×2N indicate an asymmetric split of 1:3 or 3:1. The PUs included in the CU are expressed as PU0, PU1, PU2, and PU3 sequentially.
(a) to (h) of
In the transform tree, the coding unit is split into one or multiple transform units, and a position and a size of each transform unit are prescribed. In another expression, the transform unit is one or multiple non-overlapping regions constituting the coding unit. The transform tree includes one or multiple transform units obtained by the above-mentioned split.
Splits in the transform tree include those to allocate a region that is the same size as the coding unit as a transform unit, and those by recursive quad tree splits similar to the above-mentioned split of CUs.
A transform processing is performed for each of these transform units.
Prediction Parameter
A prediction image of a prediction unit (PU) is derived by prediction parameters attached to the PU. The prediction parameter includes a prediction parameter of an intra prediction or a prediction parameter of an inter prediction. The prediction parameter of an inter prediction (inter prediction parameter) will be described below. The inter prediction parameter is constituted by prediction list utilization flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and motion vectors mvL0 and mvL1. The prediction list utilization flags predFlagL0 and predFlagL1 are flags indicating whether or not the reference picture lists referred to as the L0 list and the L1 list, respectively, are used, and a corresponding reference picture list is used in a case that the value is 1. Note that, in a case that the present specification mentions “a flag indicating whether or not XX”, a flag being other than 0 (for example, 1) assumes a case of XX, and a flag being 0 assumes a case of not XX, and 1 is treated as true and 0 is treated as false in a logical negation, a logical product, and the like (hereinafter, the same applies). However, other values can be used for true values or false values in real devices and methods.
For example, syntax elements to derive inter prediction parameters included in coded data include a PU split mode part_mode, a merge flag merge_flag, a merge index merge_idx, an inter prediction indicator inter_pred_idc, a reference picture index refIdxLX, a prediction vector index mvp_LX_idx, and a difference vector mvdLX.
Reference Picture List
A reference picture list is a list constituted by reference pictures stored in a reference picture memory 306.
Decoding (coding) methods of prediction parameters include a merge prediction (merge) mode and an Adaptive Motion Vector Prediction (AMVP) mode, and the merge flag merge_flag is a flag to identify these. The merge mode is a mode in which a prediction list utilization flag predFlagLX (or an inter prediction indicator inter_pred_idc), a reference picture index refIdxLX, and a motion vector mvLX are not included in the coded data, and are derived from prediction parameters of neighboring PUs already processed. The AMVP mode is a mode in which an inter prediction indicator inter_pred_idc, a reference picture index refIdxLX, and a motion vector mvLX are included in the coded data. Note that the motion vector mvLX is coded as a prediction vector index mvp_LX_idx identifying a prediction vector mvpLX, and a difference vector mvdLX.
The inter prediction indicator inter_pred_idc is a value indicating the type and number of reference pictures, and takes any value of PRED_L0, PRED_L1, and PRED_BI. PRED_L0 and PRED_L1 indicate that one reference picture is used (uni-prediction), managed in the reference picture list of the L0 list or the L1 list, respectively. PRED_BI indicates that two reference pictures are used (bi-prediction BiPred), managed in the L0 list and the L1 list. The prediction vector index mvp_LX_idx is an index indicating a prediction vector, and the reference picture index refIdxLX is an index indicating reference pictures managed in the reference picture list. Note that LX is a description method used in a case of not distinguishing the L0 prediction and the L1 prediction, and parameters for the L0 list and parameters for the L1 list are distinguished by replacing LX with L0 and L1.
The merge index merge_idx is an index indicating which prediction parameter is used as a prediction parameter of the decoding target PU among prediction parameter candidates (merge candidates) derived from PUs for which the processing has been completed.
Motion Vector
The motion vector mvLX indicates a gap quantity between blocks in two different pictures. A prediction vector and a difference vector related to the motion vector mvLX are referred to as a prediction vector mvpLX and a difference vector mvdLX, respectively.
Inter Prediction Indicator Inter_Pred_Idc and Prediction List Utilization Flag predFlagLX
The relationship between an inter prediction indicator inter_pred_idc and the prediction list utilization flags predFlagL0 and predFlagL1 is as follows, and they can be converted mutually.
inter_pred_idc=(predFlagL1<<1)+predFlagL0
predFlagL0=inter_pred_idc & 1
predFlagL1=inter_pred_idc>>1
Note that an inter prediction parameter may use a prediction list utilization flag or may use an inter prediction indicator. A determination using a prediction list utilization flag may be replaced with a determination using an inter prediction indicator. On the contrary, a determination using an inter prediction indicator may be replaced with a determination using a prediction list utilization flag.
Determination of Bi-Prediction biPred
A flag biPred indicating whether or not a bi-prediction BiPred applies can be derived from whether or not the two prediction list utilization flags are both 1. For example, the flag can be derived by the following equation.
biPred=(predFlagL0==1 && predFlagL1==1)
The flag biPred can be also derived from whether an inter prediction indicator is a value indicating to use two prediction lists (reference pictures). For example, the flag can be derived by the following equation.
biPred=(inter_pred_idc==PRED_BI)?1:0
The above described equation can be also expressed with the following equation.
biPred=(inter_pred_idc==PRED_BI)
Note that, for example, PRED_BI can use the value of 3.
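These relationships translate directly into code. In the following sketch, PRED_L0 = 1 and PRED_L1 = 2 are inferred from the conversion formula; only PRED_BI = 3 is stated above:

```python
PRED_L0, PRED_L1, PRED_BI = 1, 2, 3   # values implied by (predFlagL1 << 1) + predFlagL0

def to_inter_pred_idc(predFlagL0, predFlagL1):
    return (predFlagL1 << 1) + predFlagL0

def to_pred_flags(inter_pred_idc):
    return inter_pred_idc & 1, inter_pred_idc >> 1   # (predFlagL0, predFlagL1)

def bi_pred(predFlagL0, predFlagL1):
    # biPred = (predFlagL0 == 1 && predFlagL1 == 1)
    return predFlagL0 == 1 and predFlagL1 == 1

assert to_inter_pred_idc(1, 1) == PRED_BI
assert to_pred_flags(PRED_BI) == (1, 1)
assert bi_pred(*to_pred_flags(PRED_L0)) is False
```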
Intra Prediction Mode
Luminance intra prediction modes IntraPredModeY include 67 modes, corresponding to a planar prediction (0), a DC prediction (1), and directional predictions (2 to 66). Chrominance intra prediction modes IntraPredModeC include 68 modes, with a Colour Component Linear Mode (CCLM) added to the 67 modes described above. The CCLM is a mode in which a pixel value of a target pixel in a target color component is derived by a linear prediction with reference to a pixel value of another color component coded before the target color component. Note that the color components include a luminance Y, a chrominance Cb, and a chrominance Cr. Different intra prediction modes may be assigned depending on chrominance and luminance, and prediction modes are coded or decoded in every CU or every PU.
Configuration of Image Decoding Device
A configuration of the image decoding device 31 according to the present embodiment will now be described.
The prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The prediction image generation unit 308 includes an inter prediction image generation unit 309 and an intra prediction image generation unit 310.
The entropy decoding unit 301 performs entropy decoding on the coding stream Te input from the outside, and separates and decodes individual codes (syntax elements). Separated codes include prediction parameters to generate a prediction image and residual information to generate a difference image and the like.
The entropy decoding unit 301 outputs a part of the separated codes to the prediction parameter decoding unit 302. For example, a part of the separated codes includes a prediction mode predMode, a PU split mode part_mode, a merge flag merge_flag, a merge index merge_idx, an inter prediction indicator inter_pred_idc, a reference picture index ref_Idx_LX, a prediction vector index mvp_LX_idx, and a difference vector mvdLX. The control of which code to decode is performed based on an indication of the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs quantization coefficients to the inverse quantization and inverse transformer unit 311. These quantization coefficients are coefficients obtained by performing orthogonal transform (discrete cosine transform, discrete sine transform, and the like) on residual signals and quantizing the result in coding processing.
The inter prediction parameter decoding unit 303 decodes an inter prediction parameter with reference to a prediction parameter stored in the prediction parameter memory 307, based on a code input from the entropy decoding unit 301.
The inter prediction parameter decoding unit 303 outputs a decoded inter prediction parameter to the prediction image generation unit 308, and also stores the decoded inter prediction parameter in the prediction parameter memory 307.
The intra prediction parameter decoding unit 304 decodes an intra prediction parameter with reference to a prediction parameter stored in the prediction parameter memory 307, based on a code input from the entropy decoding unit 301. The intra prediction parameter is a parameter used in a processing to predict a CU in one picture, for example, an intra prediction mode IntraPredMode. The intra prediction parameter decoding unit 304 outputs a decoded intra prediction parameter to the prediction image generation unit 308, and also stores the decoded intra prediction parameter in the prediction parameter memory 307.
The loop filter 305 applies a filter such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) on a decoded image of a CU generated by the addition unit 312.
The reference picture memory 306 stores a decoded image of a CU generated by the addition unit 312 in a position prescribed for each picture and CU of a decoding target.
The prediction parameter memory 307 stores a prediction parameter in a position prescribed for each picture and prediction unit (or a subblock, a fixed size block, and a pixel) of a decoding target. Specifically, the prediction parameter memory 307 stores an inter prediction parameter decoded by the inter prediction parameter decoding unit 303, an intra prediction parameter decoded by the intra prediction parameter decoding unit 304 and a prediction mode predMode separated by the entropy decoding unit 301. For example, inter prediction parameters stored include a prediction list utilization flag predFlagLX (an inter prediction indicator inter_pred_idc), a reference picture index refIdxLX, and a motion vector mvLX.
To the prediction image generation unit 308, a prediction mode predMode input from the entropy decoding unit 301 is input, and also a prediction parameter is input from the prediction parameter decoding unit 302. The prediction image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a PU by using the input prediction parameter and the read reference picture, in the prediction mode indicated by the prediction mode predMode.
Here, in a case that the prediction mode predMode indicates an inter prediction mode, the inter prediction image generation unit 309 generates a prediction image of a PU by an inter prediction by using an inter prediction parameter input from the inter prediction parameter decoding unit 303 and a read reference picture.
For a reference picture list (an L0 list or an L1 list) where a prediction list utilization flag predFlagLX is 1, the inter prediction image generation unit 309 reads a reference picture block from the reference picture memory 306 in a position indicated by a motion vector mvLX, based on a decoding target PU, from reference pictures indicated by the reference picture index refIdxLX. The inter prediction image generation unit 309 performs a prediction based on a read reference picture block and generates a prediction image of a PU. The inter prediction image generation unit 309 outputs the generated prediction image of the PU to the addition unit 312.
In a case that the prediction mode predMode indicates an intra prediction mode, the intra prediction image generation unit 310 performs an intra prediction by using an intra prediction parameter input from the intra prediction parameter decoding unit 304 and a read reference picture. Specifically, the intra prediction image generation unit 310 reads an adjacent PU, which is a picture of a decoding target, in a prescribed range from a decoding target PU among PUs already decoded, from the reference picture memory 306.
The prescribed range is, for example, any of the adjacent PUs on the left, top left, top, and top right in a case that the decoding target PU moves sequentially in so-called raster scan order, and varies according to the intra prediction mode. The raster scan order is an order of moving sequentially from the left edge to the right edge in each picture, for each row from the top edge to the bottom edge.
The intra prediction image generation unit 310 performs a prediction in a prediction mode indicated by the intra prediction mode IntraPredMode for a read adjacent PU, and generates a prediction image of a PU. The intra prediction image generation unit 310 outputs the generated prediction image of the PU to the addition unit 312.
In a case that the intra prediction parameter decoding unit 304 derives different intra prediction modes depending on the luminance and chrominance, the intra prediction image generation unit 310 generates a prediction image of a PU of luminance by any of a planar prediction (0), a DC prediction (1), and directional predictions (2 to 66) depending on the luminance prediction mode IntraPredModeY, and generates a prediction image of a PU of chrominance by any of a planar prediction (0), a DC prediction (1), directional predictions (2 to 66), and LM mode (67) depending on the chrominance prediction mode IntraPredModeC.
A detailed block diagram of the inverse quantization and inverse transformer unit 311 is illustrated in
The addition unit 312 adds a prediction image of a PU input from the inter prediction image generation unit 309 or the intra prediction image generation unit 310 and a residual signal input from the inverse quantization and inverse transformer unit 311 for each pixel, and generates a decoded image of a PU. The addition unit 312 stores the generated decoded image of a PU in the reference picture memory 306, and outputs a decoded image Td where the generated decoded image of the PU is integrated for each picture to the outside.
Configuration of Image Encoding Device
A configuration of the image encoding device 11 according to the present embodiment will now be described.
For each picture of an image T, the prediction image generation unit 101 generates a prediction image P of a prediction unit PU for each coding unit CU that is a region obtained by splitting the picture. Here, the prediction image generation unit 101 reads a block that has been decoded from the reference picture memory 109, based on a prediction parameter input from the prediction parameter encoder unit 111. For example, in a case of an inter prediction, the prediction parameter input from the prediction parameter encoder unit 111 is a motion vector. The prediction image generation unit 101 reads a block in a position in a reference image indicated by a motion vector starting from a target PU. In a case of an intra prediction, the prediction parameter is, for example, an intra prediction mode. The prediction image generation unit 101 reads a pixel value of an adjacent PU used in an intra prediction mode from the reference picture memory 109, and generates the prediction image P of a PU. The prediction image generation unit 101 generates the prediction image P of a PU by using one prediction scheme among multiple prediction schemes for the read reference picture block. The prediction image generation unit 101 outputs the generated prediction image P of a PU to the subtraction unit 102.
Note that the prediction image generation unit 101 performs operations same as the prediction image generation unit 308 already described, so description thereof will be omitted here.
The prediction image generation unit 101 generates the prediction image P of a PU, based on a pixel value of a reference block read from the reference picture memory, by using a parameter input by the prediction parameter encoder unit. The prediction image generated by the prediction image generation unit 101 is output to the subtraction unit 102 and the addition unit 106.
The subtraction unit 102 subtracts a signal value of the prediction image P of a PU input from the prediction image generation unit 101 from a pixel value of a corresponding PU of the image T, and generates residual signals. The subtraction unit 102 outputs the generated residual signals to the transform and quantization unit 103.
A detailed block diagram of the transform and quantization unit 103 is illustrated in
The primary transform generally performs separable transform on prediction residuals in every CU or every TU. To perform transform with mutually independent transform axes suitable for the characteristics of the prediction residuals, a transform base may be selected from multiple transform bases, such as DCT-2, DCT-5, DCT-8, DST-1, DST-7, and the like. The inverse primary transform described for the image decoding device is the inverse transform of the primary transform, and uses an inverse transform base corresponding to the transform base used in the primary transform.
Next, secondary transform and inverse transform thereof will be described.
The secondary transform is transform applied to coefficients after the primary transform. In general, the primary transform is realized by separable transform, and for example, is not optimal for diagonal direction components, and the energy is not efficiently concentrated. In such a case, the energy can be concentrated in specific components by performing transform effective for components in the diagonal direction again on the primary transform coefficients.
The secondary transform and the inverse secondary transform procedure are illustrated with reference to
In a case that separable two-dimensional transform such as ROtational Transform (ROT) is used for the secondary transform, processing of the sorting unit 702, the secondary transformer unit 703, and the sorting unit 704 is described below.
Where [ROTf] is an array of the ROT transform bases. In Formula 2, the two-dimensional arrays ROTf and SX are treated as matrices, and the product of the matrices is calculated (hereinafter, arrays are treated as matrices in calculations). T[X] represents the transpose of a matrix [X]. The separable two-dimensional transform of Formula 2 may also be realized by applying one-dimensional transform, which is a product with the transform base matrix, twice to the input matrix. In this case, the second transform is performed with a matrix obtained by transposing the output of the first transform as an input.
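A numpy sketch of Formula 2 and its two-pass equivalent is given below; the ROT base here is a random placeholder standing in for the fixed base matrices an actual codec would use:

```python
import numpy as np

M = 4
rng = np.random.default_rng(0)
ROTf = rng.standard_normal((M, M))   # placeholder for a real ROT base matrix
SX = rng.standard_normal((M, M))     # M*M primary transform coefficients

# Formula 2 as a matrix product: SY = [ROTf][SX]T[ROTf].
SY = ROTf @ SX @ ROTf.T

# Equivalent two-pass form: apply the 1-D base, transpose the output,
# apply the base again, and transpose back.
SY_two_pass = (ROTf @ (ROTf @ SX).T).T
assert np.allclose(SY, SY_two_pass)
```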
The ROT is a transform with high energy concentration for the diagonal direction components among separable transforms, but a non-separable transform with higher energy concentration than separable transform may be used as the secondary transform instead of ROT. In this case, in addition to the above-described processing, the sorting unit 702 performs processing to arrange a two-dimensional array into a one-dimensional array. Furthermore, in addition to the above-described processing, the sorting unit 704 performs processing to arrange a one-dimensional array into a two-dimensional array.
In a case that non-separable transform is used for the secondary transform, processing of the sorting unit 702, the secondary transformer unit 703, and the sorting unit 704 is described below. Here, the array SX and the array SY are both one-dimensional arrays having M*M size.
Where [Tf] is a non-separable transform (one-dimensional transform) array, and may be a one-dimensional DCT-2, DCT-5, DCT-8, DST-1, DST-7, Hypercube-Givens Transform (HyGT), and the like. Some examples of ROT and non-separable transforms are illustrated below.
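The non-separable path reduces to a single matrix-vector product on the flattened block, as in the following sketch (the base Tf is again a random placeholder; a real HyGT or trained base would be used in practice):

```python
import numpy as np

M = 4
rng = np.random.default_rng(1)
Tf = rng.standard_normal((M * M, M * M))   # placeholder non-separable base
block = rng.standard_normal((M, M))        # M*M primary transform coefficients

SX = block.reshape(-1)      # sorting unit 702: 2-D block -> 1-D array
SY = Tf @ SX                # secondary transformer unit 703 (Formula 5)
SY2d = SY.reshape(M, M)     # sorting unit 704: 1-D array -> 2-D layout
```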
The primary transformer unit 701 performs the primary transform on the prediction residuals for every CU (S1101). The sorting unit 702 compares the width W and height H of the CU to a predetermined threshold TH (S1102). In a case that either one of W and H is less than the threshold TH, the sorting unit 702 sets the secondary transform size M to M1 (M=M1) (S1103), and otherwise sets M to M2 (M=M2) (S1104). Here M1 is less than M2 (M1<M2). It is desirable that M1 and M2 are powers of two. As illustrated in Formula 1 or Formula 4 above, the sorting unit 702 sets the M*M primary transform coefficients as the input SX to the secondary transformer unit 703 (S1105). The secondary transformer unit 703 applies the separable transform illustrated in Formula 2 or the non-separable transform illustrated in Formula 5 to the input M*M primary transform coefficients, and performs the secondary transform (S1106). As illustrated in Formula 3 or Formula 6 above, the sorting unit 704 sets the primary transform coefficients and the secondary transform coefficients as an input PY to the quantization unit 705 (S1107). The quantization unit 705 performs quantization on the transform coefficients PY (S1108).
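The size selection of S1102 to S1105 can be sketched as follows. Taking the top-left corner as the M*M region is an assumption consistent with the later remark that non-zero coefficients concentrate in the low frequency region; TH=8, M1=4, M2=8 are the example values given below:

```python
import numpy as np

def select_secondary_input(coeff, TH=8, M1=4, M2=8):
    H, W = coeff.shape
    M = M1 if (W < TH or H < TH) else M2   # S1102-S1104
    return M, coeff[:M, :M]                # S1105: input SX (assumed top-left)

M, SX = select_secondary_input(np.zeros((16, 16)))
assert M == 8 and SX.shape == (8, 8)
```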
Next, the inverse secondary transform will be described. As illustrated in
In a case that separable two-dimensional transform such as ROtational Transform (ROT) is used for the secondary transform, processing of the sorting unit 707, the inverse secondary transformer unit 708, and the sorting unit 709 is described below.
Where [ROTb] is an array of the inverse ROT transform bases and T [X] represents the transpose of a matrix [X].
In a case that non-separable transform is used instead of ROT as the secondary transform, in addition to the above-described processing, the sorting unit 707 performs processing to arrange a two-dimensional array into a one-dimensional array. Furthermore, in addition to the above-described processing, the sorting unit 709 performs processing to arrange a one-dimensional array into a two-dimensional array.
In a case that non-separable transform is used for the secondary transform, processing of the sorting unit 707, the inverse secondary transformer unit 708, and the sorting unit 709 is described below. Here, the array SX′ and the array SY′ are both one-dimensional arrays having M*M size.
Where [Tb] is an array of non-separable transform, and may be inverse transform such as the above-mentioned one-dimensional DCT-2, DCT-5, DCT-8, DST-1, DST-7, Hypercube-Givens Transform (HyGT), and the like.
The inverse quantization unit 706 performs inverse quantization on the quantization transform coefficients of the prediction residuals decoded by the entropy decoding unit 301 (S1109). The sorting unit 707 compares the width W and height H of the CU to a predetermined threshold TH (S1110). In a case that either one of W and H is less than the threshold TH, the sorting unit 707 sets the inverse secondary transform size M to M1 (M=M1) (S1111), and otherwise sets M to M2 (M=M2) (S1112). The sorting unit 707 extracts an M×M region as a region for the secondary transform using the configured M. Here, M1 and M2 are the same as those used in the flowchart of
In the above, in a case that the threshold TH is equal to 8, M1 may be configured to 4 (M1=4) and M2 may be configured to 8 (M2=8).
The secondary transform is applied in an intra prediction, and the transform to be applied is selected for each CU with reference to the intra prediction mode iPred and an index nIdx.
To the entropy encoder unit 104, quantization coefficients are input from the transform and quantization unit 103, and prediction parameters are input from the prediction parameter encoder unit 111. For example, input prediction parameters include codes such as a reference picture index ref_Idx_LX, a prediction vector index mvp_LX_idx, a difference vector mvdLX, a prediction mode pred_mode_flag, and a merge index merge_idx.
The entropy encoder unit 104 performs entropy coding on input split information, prediction parameters, quantization transform coefficients, and the like to generate the coding stream Te, and outputs the generated coding stream Te to the outside.
The inverse quantization and inverse transformer unit 105 is the same as the inverse quantization and inverse transformer unit 311 (
The addition unit 106 adds signal values of the prediction image P of a PU input from the prediction image generation unit 101 and signal values of the residual signals input from the inverse quantization and inverse transformer unit 105 for each pixel, and generates a decoded image. The addition unit 106 stores the generated decoded image in the reference picture memory 109.
The loop filter 107 performs a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) on the decoded image generated by the addition unit 106.
The prediction parameter memory 108 stores the prediction parameters generated by the coding parameter determination unit 110 in a position prescribed for each picture and CU of a coding target.
The reference picture memory 109 stores the decoded image generated by the loop filter 107 in a position prescribed for each picture and CU of a coding target.
The coding parameter determination unit 110 selects one set among multiple sets of coding parameters. The coding parameters are the above-mentioned QTBT split parameters, prediction parameters, or parameters to be coded that are generated in association with these. The prediction image generation unit 101 generates the prediction image P of a PU by using each of the sets of these coding parameters.
The coding parameter determination unit 110 calculates an RD cost value indicating the magnitude of an information quantity and coding errors for each of the multiple sets. For example, the RD cost value is a sum of a code amount and a value obtained by multiplying a square error by a coefficient X. The code amount is an information quantity of the coding stream Te obtained by performing entropy coding on quantization residuals and coding parameters. The square error is a sum over pixels of square values of residual values of residual signals calculated in the subtraction unit 102. The coefficient X is a pre-configured real number larger than zero. The coding parameter determination unit 110 selects a set of coding parameters by which the calculated RD cost value is minimized. With this configuration, the entropy encoder unit 104 outputs the selected set of coding parameters as the coding stream Te to the outside, and does not output sets of coding parameters that are not selected. The coding parameter determination unit 110 stores the determined coding parameters in the prediction parameter memory 108.
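As a worked illustration of the RD selection (the weight value is illustrative; the text only requires a positive coefficient):

```python
def rd_cost(code_amount, residuals, x):
    # RD cost = code amount + X * (sum over pixels of squared residuals).
    return code_amount + x * sum(r * r for r in residuals)

candidates = {"setA": (1200, [3, -2, 1]), "setB": (900, [5, -4, 2])}
x = 0.85   # illustrative pre-configured coefficient X > 0
best = min(candidates, key=lambda k: rd_cost(*candidates[k], x))   # -> "setB"
```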
The prediction parameter encoder unit 111 derives a format for coding from parameters input from the coding parameter determination unit 110, and outputs the format to the entropy encoder unit 104. A derivation of a format for coding is, for example, to derive a difference vector from a motion vector and a prediction vector. The prediction parameter encoder unit 111 derives parameters necessary to generate a prediction image from parameters input from the coding parameter determination unit 110, and outputs the parameters to the prediction image generation unit 101. For example, parameters necessary to generate a prediction image are a motion vector of a subblock unit.
The inter prediction parameter encoder unit 112 derives inter prediction parameters such as a difference vector, based on prediction parameters input from the coding parameter determination unit 110. The inter prediction parameter encoder unit 112 includes a partly identical configuration to a configuration by which the inter prediction parameter decoding unit 303 (see
The intra prediction parameter encoder unit 113 derives a format for coding (for example, MPM_idx, rem_intra_luma_pred_mode, and the like) from the intra prediction mode IntraPredMode input from the coding parameter determination unit 110.
As illustrated in
Conventionally, secondary transform has used square sizes that are powers of two. The next larger transform after M1*M1 is (M1*2)*(M1*2)=M1*M1*4, and the transform size increases four times. In the present invention, by configuring the size of the secondary transform to have different sizes horizontally and vertically, as in M*N (M=M1*2, N=M1 or M=M1, N=M1*2), rather than M*M, the increase in the transform size is suppressed to two times, rather than four times. In particular, in non-separable transform, in a case that the number of inputs becomes four times, the base of the transform coefficients becomes 16 times, so suppressing the transform size to two times has a significant effect on the reduction of memory used. Additionally, by performing the M*N transform only on the input including low frequency components, rather than performing it twice, the amount of processing and the amount of memory used can be further reduced. Since most of the non-zero transform coefficients after primary transform are concentrated in the low frequency region, there is no significant decrease in the coding efficiency even in a case that secondary transform is not performed on the input on the high-frequency component side.
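The element counts behind this argument work out as follows (a worked calculation with M1=4; a non-separable base for a K-sample input has K*K elements):

```python
M1 = 4
M2 = 2 * M1
inputs_m1, base_m1 = M1 * M1, (M1 * M1) ** 2   # 16 samples, 256 base elements
inputs_m2, base_m2 = M2 * M2, (M2 * M2) ** 2   # 64 samples (x4), 16384 (x16)
inputs_mn, base_mn = M2 * M1, (M2 * M1) ** 2   # 32 samples (x2), 1024 (x4)
print(inputs_m1, base_m1, inputs_m2, base_m2, inputs_mn, base_mn)
```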
The separable secondary transform and the inverse secondary transform are described below.
Secondary transformer unit 703: SY=[ROTf_N][SX]T[ROTf_M] (Formula 17)
Inverse secondary transformer unit 708: SX′=T[ROTb_N][SY′][ROTb_M] (Formula 18)
Where “N” and “M” are vertical and horizontal sizes of the transform. That is, [ROTf_N] is an N*N array, and [ROTf_M] is an M*M array.
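As a concrete check of Formula 17 and Formula 18, the following numpy sketch uses orthonormal placeholder bases, so that the inverse base ROTb corresponds to the transposed forward base; real ROT bases would be fixed matrices:

```python
import numpy as np

N, M = 4, 8   # vertical and horizontal transform sizes (N != M)
rng = np.random.default_rng(2)
ROTf_N, _ = np.linalg.qr(rng.standard_normal((N, N)))   # placeholder N*N base
ROTf_M, _ = np.linalg.qr(rng.standard_normal((M, M)))   # placeholder M*M base
SX = rng.standard_normal((N, M))                        # N*M primary coefficients

# Formula 17: SY = [ROTf_N][SX]T[ROTf_M]
SY = ROTf_N @ SX @ ROTf_M.T

# Formula 18: SX' = T[ROTb_N][SY'][ROTb_M]; with orthonormal bases the
# round trip recovers the input exactly.
SX_rec = ROTf_N.T @ SY @ ROTf_M
assert np.allclose(SX_rec, SX)
```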
The non-separable secondary transform and the inverse secondary transform are represented by Formula 5 and Formula 11.
Hereinafter, the shape of an input region of the secondary transform extracted by the sorting unit 702 of the transform and quantization unit 103 (input SX of the secondary transform) is described, and the shape is also used as the shape of an output region of the inverse secondary transform in the sorting unit 709 of the inverse quantization and inverse transformer unit 311 (105). That is, depending on the scanning direction, the intra prediction direction, and the block shape, a region configuring the output of the inverse secondary transform may be selected.
Scanning Direction Dependent Secondary Transform
For M*M secondary transform (
As another configuration, a secondary transform region that is vertically long, for example, the input SX in
In addition, as another configuration, in a case of separable secondary transform, a secondary transform region that is vertically long, for example, the input SX in
Note that, as illustrated in the flowchart described below, in a case that either one of the width W and the height H of the CU or TU is less than the threshold TH, a square region of 4*4, for example, is used as the input of the secondary transform region, and in other cases (both the width W and the height H of the CU or TU are equal to or greater than the threshold TH), the non-square region described above may be used as the input of the secondary transform region.
Intra Direction Dependent Secondary Transform
As another example, for M*M secondary transform (
As another configuration, in a case that the intra prediction mode is near the horizontal direction, for example, a secondary transform region that is vertically long in
Furthermore, as another configuration, in a case that the intra prediction mode is less than predDiag illustrated in
Note that, in a case that either one of the width W and the height H of the CU or TU is less than the threshold TH, a square region of 4*4, for example, is used as the input of the secondary transform region, and in other cases (both the width W and the height H of the CU or TU are equal to or greater than the threshold TH), the non-square region described above may be used as the input of the secondary transform region.
Block Shape Dependent Secondary Transform
As another example, the shape of the region that serves as the input of the secondary transform may be determined using a block shape. In a case that the block shape is vertically long (W<H), for example, a secondary transform region that is vertically long illustrated in
As another configuration, in a case that the block shape is vertically long (W<H), for example, a secondary transform region that is vertically long illustrated in
Note that, in a case that either one of the width W and the height H of the CU or TU is less than the threshold TH, a square region of 4*4, for example, is used as the input of the secondary transform region, and in other cases (both the width W and the height H of the CU or TU are equal to or greater than the threshold TH), the non-square region described above may be used as the input of the secondary transform region.
S1101, S1102, and S1108 are the same as S1101, S1102, and S1108 of
S1109, S1110, and S1116 are the same as S1109, S1110, and S1116 of
Note that in
Although
As described above, by using non-square M*N (M !=N) as a region of transform coefficients subjected to secondary transform, the amount of processing and the memory used can be reduced while suppressing reduction in the coding efficiency. Furthermore, by combining horizontally long transform M*N and vertically long transform N*M where M is greater than N (M>N), reduction in the coding efficiency is minimized even for a transform size smaller than M*M transform.
Embodiment 2
In Embodiment 1, a technique has been described in which the amount of processing and the memory used are reduced while suppressing reduction in the coding efficiency, by reducing the number of elements of secondary transform from M2*M2 to M2*M2/2. In Embodiment 2, a technique will be described in which the amount of processing and the memory used are reduced while suppressing reduction in the coding efficiency, by applying small size secondary transform such as M1*M1 without using M2*M2 secondary transform even in a case that the CU size is large. Here, the definitions of M1 and M2 are the same as in Embodiment 1.
The inputs SX1, SX2, SX3, and SX4 to the non-separable secondary transformer unit 703 are illustrated below.
Note that Formula 20 to Formula 22 are common arrays, where X, Xm,n are read in substitution with SX, SXm,n.
How many of the small blocks into which the M2*M2 block is divided are subjected to secondary transform can be determined in accordance with an acceptable amount of processing and memory size. In a case that there is room in the amount of processing and the memory size, secondary transform is performed on all small blocks, and in a case that there is almost no room, secondary transform is performed on only one small block. In this way, the number of small blocks subjected to secondary transform can be determined in accordance with the available amount of processing and memory size. The number of small blocks subjected to secondary transform may be coded and notified to the image decoding device. Alternatively, in a case that the number is determined by reference to level information included in the coded data, it is not necessary to add a new syntax element indicating the number.
The scanning direction and the intra prediction mode of the transform coefficients are referred to in order to determine whether to divide into rectangular small blocks or non-rectangular small blocks, and which small blocks are subjected to secondary transform. For example, in a case that the scanning direction is the vertical or horizontal direction, the block is divided into rectangular small blocks, and in a case that the scanning direction is the diagonal direction, the block is divided into non-rectangular small blocks. Alternatively, in a case that the intra prediction mode is predHor−diff to predHor+diff or predVer−diff to predVer+diff illustrated in
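A sketch of this selection is given below; the mapping of the intra-mode alternative onto rectangular blocks is an assumption, since only the mode ranges are stated above:

```python
def choose_division(scan_dir, iPred, predHor, predVer, diff):
    if scan_dir in ("horizontal", "vertical"):
        return "rectangular"
    if (predHor - diff <= iPred <= predHor + diff
            or predVer - diff <= iPred <= predVer + diff):
        return "rectangular"   # assumed for near-horizontal/vertical modes
    return "non-rectangular"   # diagonal scan / remaining intra modes

print(choose_division("diagonal", iPred=34, predHor=18, predVer=50, diff=2))
# -> non-rectangular
```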
The operation of the transform and quantization unit 103 and the inverse quantization and inverse transformer unit 311 (105), in a case that the M*M block subjected to secondary transform is divided into small size blocks and secondary transform or inverse secondary transform is performed for each block, is substantially the same as the operation of the flowchart of
Operations of the sorting unit and the inverse secondary transformer unit in a case that the number L of small blocks subjected to secondary transform is 1, 3, or 4 are described using the flowchart of
Operations of the sorting unit and the inverse secondary transformer unit in a case that the number L of small blocks subjected to secondary transform is 2 are described using the flowchart of
In Embodiment 2, the amount of processing and the memory used can be reduced while suppressing reduction in the coding efficiency, by applying multiple small size secondary transforms such as M1*M1 without using M2*M2 secondary transform even in a case that the CU size is large.
Modification 1
In Embodiment 1 and Embodiment 2, the secondary transform/inverse secondary transform, or the shape (type) of the input SX/SY′ to the secondary transform/inverse secondary transform, is derived from the scanning direction and the intra prediction mode of the transform coefficients. In Modification 1, a technique will be described in which the type of transform is explicitly notified by using the index nIdx used for selecting the secondary transform.
As illustrated in
In the image encoding device, the coding parameter determination unit 110 derives an optimal secondary transform for the target CU from among them, and codes nIdx so that an image is coded and decoded by using the secondary transform with the highest coding efficiency.
As described above, in Modification 1, the type of secondary transform applied to primary transform coefficients is coded with the index nIdx, and notified to the image decoding device, and thus, an image can be coded and decoded by using the secondary transform with the highest coding efficiency.
Embodiment 3
The motivation for introducing secondary transform is to concentrate the energy by performing secondary transform on components whose energy could not be efficiently concentrated by primary transform. In Embodiment 3, since primary transform realized by separable transform is not optimal for diagonal direction components, a technique is described in which secondary transform is performed only in a case that the diagonal direction components are important, that is, in a case that the intra prediction mode is in the diagonal direction. In a case of an intra prediction other than the diagonal direction, the energy is not concentrated in the diagonal direction components in the first place, and therefore the effect of secondary transform is small, and the reduction in coding efficiency is small even in a case that secondary transform is not performed. In such a case, since secondary transform is not performed, the amount of processing can be reduced while suppressing reduction in the coding efficiency.
In a case that the intra prediction mode iPred satisfies

predBL <= iPred < predHor − diff || predHor + diff < iPred < predVer − diff || predVer + diff < iPred <= predUR (Formula 23),

secondary transform is performed. Where diff is a positive integer.
As described above, secondary transform is a transform applied to coefficients after the primary transform. In general, the primary transform is realized by separable transform, and is not optimal for diagonal direction components, and the energy is not efficiently concentrated. In such a case, more energy can be concentrated in specific components by performing secondary transform effective for components in the diagonal direction again on the primary transform coefficients.
Modification 2

Embodiment 3 described an example of a technique for changing the options for secondary transform in accordance with the intra prediction mode, in which secondary transform is performed in a case that the intra prediction mode is the diagonal direction and is not performed otherwise. In Modification 2, a technique for increasing the options for secondary transform while suppressing reduction in the coding efficiency will be described.
In secondary transform, as illustrated in
In Modification 2, in a case that the intra prediction mode iPred is in the horizontal direction (predHor − diff <= iPred <= predHor + diff), the secondary transform is selected from three options: the transform suitable for the horizontal direction (nIdx=3), the transform suitable for the diagonal direction (nIdx=1), and transform off (nIdx=0). In a case that iPred is in the vertical direction (predVer − diff <= iPred <= predVer + diff), the secondary transform is selected from three options: the transform suitable for the vertical direction (nIdx=2), the transform suitable for the diagonal direction (nIdx=1), and transform off (nIdx=0). In the other cases (planar prediction, DC prediction, and diagonal direction prediction), the secondary transform is selected from two options: the transform suitable for the diagonal direction (nIdx=1) and transform off (nIdx=0). Here, diff is a positive integer.
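These candidate lists can be written down directly; the sketch below is one possible shape, with the function name assumed. The returned list could then be fed to an encoder search such as the choose_nidx() sketch in Modification 1.

    /* Builds the Modification 2 candidate list of nIdx values for a given
     * intra prediction mode. nIdx meanings follow the text: 0 = off,
     * 1 = diagonal transform, 2 = vertical transform, 3 = horizontal
     * transform. Returns the number of candidates written to cand[]. */
    static int nidx_candidates(int iPred, int predHor, int predVer,
                               int diff, int cand[3])
    {
        if (predHor - diff <= iPred && iPred <= predHor + diff) {
            cand[0] = 3; cand[1] = 1; cand[2] = 0;  /* horizontal modes */
            return 3;
        }
        if (predVer - diff <= iPred && iPred <= predVer + diff) {
            cand[0] = 2; cand[1] = 1; cand[2] = 0;  /* vertical modes */
            return 3;
        }
        cand[0] = 1; cand[1] = 0;  /* planar, DC, and diagonal modes */
        return 2;
    }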
In this way, by limiting the values that nIdx can take in accordance with the intra prediction mode, reduction in the coding efficiency can be suppressed while the amount of processing is reduced.
Embodiment 4

Embodiments 1 to 3 describe techniques for reducing the amount of processing of secondary transform or the memory used. In Embodiment 4, a technique for reducing the amount of processing by switching whether or not primary transform is applied under certain conditions will be described.
Because primary transform often uses separable transform, its energy concentration is not favorable for components in the diagonal direction, whereas secondary transform using non-separable transform concentrates energy well even in the diagonal direction components. As the transform size increases, primary transform, which applies a separable filter to process the horizontal and vertical direction components independently, grows in the amount of processing and memory usage, but the increase is not very large. In secondary transform using non-separable transform, however, in a case that the size becomes two times larger, the amount of memory used becomes four times larger, and the amount of operation increases accordingly. Therefore, in a case that the transform size is small and the main components are in the diagonal direction, the amount of processing can be reduced by performing only secondary transform and not performing primary transform.
For example, the condition that the transform size is small and the main components are in the diagonal direction is expressed by the formula below:

(predBL <= iPred < predHor − diff || predHor + diff < iPred < predVer − diff || predVer + diff < iPred <= predUR) && (W <= M1 && H <= M1) (Formula 24)

In a case that this condition is true, primary transform is not performed. Here, iPred is the intra prediction mode, W and H are the width and height of the CU, M1 is a threshold for determining whether or not the CU size is small, and diff is a positive integer. For example, M1 is equal to 4.
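As a sketch, the Formula 24 decision can reuse the is_diagonal_mode() helper from the Embodiment 3 sketch above; the function name and passing M1 as a parameter (m1) are assumptions.

    /* Returns nonzero when primary transform is to be skipped (Formula 24):
     * the intra prediction mode is diagonal and the CU is small, i.e.
     * both W and H are at most m1 (e.g. m1 = 4). */
    static int skip_primary_transform(int iPred, int W, int H,
                                      int predBL, int predHor, int predVer,
                                      int predUR, int diff, int m1)
    {
        return is_diagonal_mode(iPred, predBL, predHor, predVer, predUR,
                                diff)
            && W <= m1 && H <= m1;
    }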
Note that, in
As described above, in Embodiment 4, the amount of processing can be reduced while suppressing reduction in the coding efficiency by switching whether or not primary transform is applied under certain conditions.
An image encoding device according to one aspect of the present invention includes: a divider configured to divide a picture of the input video into a coding unit (CU) including multiple pixels; a transformer configured to perform predetermined transform with the CU as a unit and output transform coefficients;
a quantizer configured to quantize the transform coefficients and output quantization transform coefficients; and an encoder configured to perform variable-length coding on the quantization transform coefficients, wherein the transformer includes a first transformer, and a second transformer configured to perform transform on a part of first transform coefficients output from the first transformer, and the second transformer performs transform on at least any of the first transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the first transform coefficients for a non-rectangular region (second region).
The second transformer of the image encoding device according to one aspect of the present invention further performs transform for a first region in combination with small size square transform.
The second transformer of the image encoding device according to one aspect of the present invention further selects transform for the first region or transform for the second region depending on an intra prediction mode or a scanning direction of transform coefficients.
The image encoding device according to one aspect of the present invention determines whether or not to perform the first transform depending on an intra prediction mode and a CU size.
An image decoding device according to one aspect of the present invention includes: a decoder configured to perform variable-length decoding on coded data with a coding unit (CU) including multiple pixels as a processing unit, and output quantization transform coefficients; an inverse quantizer configured to perform inverse quantization on quantization transform coefficients and output transform coefficients; and an inverse transformer configured to perform inverse transform on the transform coefficients, wherein the inverse transformer includes a second inverse transformer configured to perform inverse transform on at least a part of the transform coefficients and output second transform coefficients, and a first inverse transformer configured to perform inverse transform on a remainder of the transform coefficients and the second transform coefficients, and the second inverse transformer performs inverse transform on at least any of the transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the transform coefficients for a non-rectangular region (second region).
The second inverse transformer of the image decoding device according to one aspect of the present invention further performs inverse transform for a first region in combination with small size square inverse transform.
The second inverse transformer of the image decoding device according to one aspect of the present invention further selects inverse transform for the first region or inverse transform for the second region depending on an intra prediction mode or a scanning direction of transform coefficients.
The image decoding device according to one aspect of the present invention determines whether or not to perform the first inverse transform depending on an intra prediction mode and a CU size.
Realization Examples by Software

Note that part of the image encoding device 11 and the image decoding device 31 in the above-mentioned embodiments, for example, the entropy decoding unit 301, the prediction parameter decoding unit 302, the loop filter 305, the prediction image generation unit 308, the inverse quantization and inverse transformer unit 311, the addition unit 312, the prediction image generation unit 101, the subtraction unit 102, the transform and quantization unit 103, the entropy encoder unit 104, the inverse quantization and inverse transformer unit 105, the loop filter 107, the coding parameter determination unit 110, and the prediction parameter encoder unit 111, may be realized by a computer. In that case, this configuration may be realized by recording a program for realizing such control functions on a computer-readable recording medium and causing a computer system to read and execute the program recorded on the recording medium.

Note that the “computer system” mentioned here refers to a computer system built into either the image encoding device 11 or the image decoding device 31, and includes an OS and hardware components such as peripheral devices. The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and a storage device such as a hard disk built into the computer system. Moreover, the “computer-readable recording medium” may include a medium that dynamically retains the program for a short period of time, such as a communication line used to transmit the program over a network such as the Internet or over a communication line such as a telephone line, and may also include a medium that retains the program for a fixed period of time, such as a volatile memory within a computer system functioning as a server or a client in such a case. Furthermore, the program may be configured to realize some of the functions described above, and may also be configured to realize the functions described above in combination with a program already recorded in the computer system.
Part or all of the image encoding device 11 and the image decoding device 31 in the embodiments described above may be realized as an integrated circuit such as a Large Scale Integration (LSI). Each function block of the image encoding device 11 and the image decoding device 31 may be individually realized as a processor, or part or all may be integrated into a processor. The circuit integration technique is not limited to LSI, and the function blocks may be realized as dedicated circuits or a multi-purpose processor. In a case that, with advances in semiconductor technology, a circuit integration technology replacing LSI appears, an integrated circuit based on that technology may be used.
Application Examples

The above-mentioned image encoding device 11 and image decoding device 31 can be utilized by being installed in various devices performing transmission, reception, recording, and regeneration of videos. Note that videos may be natural videos imaged by cameras or the like, or may be artificial videos (including CG and GUI) generated by computers or the like.
First, referring to
(a) of
The transmission device PROD_A may further include a camera PROD_A4 imaging videos, a recording medium PROD_A5 recording videos, an input terminal PROD_A6 to input videos from the outside, and an image processing unit PROD_A7 which generates or processes images, as supply sources of videos input into the encoder unit PROD_A1. In (a) of
Note that the recording medium PROD_A5 may record videos which are not coded, or may record videos coded in a coding scheme for recording different from the coding scheme for transmission. In the latter case, a decoding unit (not illustrated) to decode coded data read from the recording medium PROD_A5 according to the coding scheme for recording may be interposed between the recording medium PROD_A5 and the encoder unit PROD_A1.
(b) of
The reception device PROD_B may further include a display PROD_B4 displaying videos, a recording medium PROD_B5 to record videos, and an output terminal PROD_B6 to output videos to the outside, as supply destinations of videos output by the decoding unit PROD_B3. In (b) of
Note that the recording medium PROD_B5 may record videos which are not coded, or may record videos coded in a coding scheme for recording different from the coding scheme for transmission. In the latter case, an encoder unit (not illustrated) to code videos acquired from the decoding unit PROD_B3 according to the coding scheme for recording may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
Note that the transmission medium for transmitting the modulated signals may be wireless or wired. The transmission aspect of the modulated signals may be broadcasting (here, a transmission aspect in which the transmission target is not specified beforehand) or telecommunication (here, a transmission aspect in which the transmission target is specified beforehand). Thus, the transmission of the modulated signals may be realized by any of radio broadcasting, cable broadcasting, radio communication, and cable communication.
For example, broadcasting stations (broadcasting equipment and the like)/receiving stations (television receivers and the like) of digital terrestrial television broadcasting are an example of the transmission device PROD_A/reception device PROD_B transmitting and/or receiving the modulated signals in radio broadcasting. Broadcasting stations (broadcasting equipment and the like)/receiving stations (television receivers and the like) of cable television broadcasting are an example of the transmission device PROD_A/reception device PROD_B transmitting and/or receiving the modulated signals in cable broadcasting.
Servers (workstations and the like)/clients (television receivers, personal computers, smartphones, and the like) for Video On Demand (VOD) services, video hosting services using the Internet, and the like are an example of the transmission device PROD_A/reception device PROD_B transmitting and/or receiving the modulated signals in telecommunication (usually, either a wireless or a wired medium is used as the transmission medium in a LAN, and a wired medium is used in a WAN). Here, personal computers include a desktop PC, a laptop PC, and a tablet PC. Smartphones also include multifunctional portable telephone terminals.
Note that a client of a video hosting service has a function to code videos imaged with a camera and upload them to a server, in addition to a function to decode coded data downloaded from a server and to display it on a display. Thus, a client of a video hosting service functions as both the transmission device PROD_A and the reception device PROD_B.
Next, referring to
(a) of
Note that the recording medium PROD_M may be (1) a type built into the recording device PROD_C, such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD), (2) a type connected to the recording device PROD_C, such as an SD memory card or a Universal Serial Bus (USB) flash memory, or (3) a type loaded into a drive device (not illustrated) built into the recording device PROD_C, such as a Digital Versatile Disc (DVD) or a Blu-ray Disc (BD (trade name)).
The recording device PROD_C may further include a camera PROD_C3 imaging videos, an input terminal PROD_C4 to input videos from the outside, a receiver PROD_C5 to receive videos, and an image processing unit PROD_C6 which generates or processes images, as supply sources of videos input into the encoder unit PROD_C1. In (a) of
Note that the receiver PROD_C5 may receive videos which are not coded, or may receive coded data coded in a coding scheme for transmission different from the coding scheme for recording. In the latter case, a decoding unit for transmission (not illustrated) to decode coded data coded in the coding scheme for transmission may be interposed between the receiver PROD_C5 and the encoder unit PROD_C1.
Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and a Hard Disk Drive (HDD) recorder (in this case, the input terminal PROD_C4 or the receiver PROD_C5 is the main supply source of videos). A camcorder (in this case, the camera PROD_C3 is the main supply source of videos), a personal computer (in this case, the receiver PROD_C5 or the image processing unit PROD_C6 is the main supply source of videos), a smartphone (in this case, the camera PROD_C3 or the receiver PROD_C5 is the main supply source of videos), and the like are also examples of such a recording device PROD_C.
(b) of
Note that the recording medium PROD_M may be (1) a type built into the regeneration device PROD_D, such as an HDD or an SSD, (2) a type connected to the regeneration device PROD_D, such as an SD memory card or a USB flash memory, or (3) a type loaded into a drive device (not illustrated) built into the regeneration device PROD_D, such as a DVD or a BD.
The regeneration device PROD_D may further include a display PROD_D3 displaying videos, an output terminal PROD_D4 to output videos to the outside, and a transmitter PROD_D5 which transmits videos, as supply destinations of videos output by the decoding unit PROD_D2. In (b) of
Note that the transmitter PROD_D5 may transmit videos which are not coded, or may transmit coded data coded in a coding scheme for transmission different from the coding scheme for recording. In the latter case, an encoder unit (not illustrated) to code videos in the coding scheme for transmission may be interposed between the decoding unit PROD_D2 and the transmitter PROD_D5.
Examples of such a regeneration device PROD_D include a DVD player, a BD player, and an HDD player (in this case, the output terminal PROD_D4, to which a television receiver or the like is connected, is the main supply destination of videos). A television receiver (in this case, the display PROD_D3 is the main supply destination of videos), a digital signage (also referred to as an electronic signboard, an electronic bulletin board, or the like; in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply destination of videos), a desktop PC (in this case, the output terminal PROD_D4 or the transmitter PROD_D5 is the main supply destination of videos), a laptop or tablet PC (in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply destination of videos), a smartphone (in this case, the display PROD_D3 or the transmitter PROD_D5 is the main supply destination of videos), and the like are also examples of such a regeneration device PROD_D.
Realization as Hardware and Realization as Software

Each block of the above-mentioned image decoding device 31 and image encoding device 11 may be realized as hardware by a logical circuit formed on an integrated circuit (IC chip), or may be realized as software using a Central Processing Unit (CPU).
In the latter case, each device includes a CPU performing commands of a program to implement each function, a Read Only Memory (ROM) storing the program, a Random Access Memory (RAM) into which the program is loaded, and a storage device (recording medium) such as a memory storing the program and various data. The purpose of the embodiments of the present invention can be achieved by supplying, to each of the above-described devices, a recording medium readably recording program codes (an execution format program, an intermediate code program, or a source program) of a control program of each of the above-described devices, which is software implementing the above-mentioned functions, and by the computer (or a CPU or an MPU) reading and executing the program codes recorded in the recording medium.
For example, as the recording medium, a tape such as a magnetic tape or a cassette tape; a disc including a magnetic disc such as a floppy (trade name) disk/a hard disk and an optical disc such as a Compact Disc Read-Only Memory (CD-ROM)/Magneto-Optical disc (MO disc)/Mini Disc (MD)/Digital Versatile Disc (DVD)/CD Recordable (CD-R)/Blu-ray Disc (trade name); a card such as an IC card (including a memory card)/an optical card; a semiconductor memory such as a mask ROM/Erasable Programmable Read-Only Memory (EPROM)/Electrically Erasable and Programmable Read-Only Memory (EEPROM (trade name))/a flash ROM; or a logic circuit such as a Programmable Logic Device (PLD) or a Field Programmable Gate Array (FPGA) can be used.
Each of the above-described devices may be configured to be connectable to a communication network, and the program codes may be supplied through the communication network. Any communication network capable of transmitting the program codes may be used, and the network is not specifically limited. For example, the Internet, an intranet, an extranet, a Local Area Network (LAN), an Integrated Services Digital Network (ISDN), a Value-Added Network (VAN), a Community Antenna television/Cable Television (CATV) communication network, a Virtual Private Network, a telephone network, a mobile communication network, a satellite communication network, and the like are available. A transmission medium constituting this communication network may also be any medium which can transmit the program codes, and is not limited to a particular configuration or type. For example, wired media such as Institute of Electrical and Electronic Engineers (IEEE) 1394, a USB, a power line carrier, a cable TV line, a telephone line, and an Asymmetric Digital Subscriber Line (ADSL) line, and wireless media such as infrared rays of Infrared Data Association (IrDA) or a remote control, BlueTooth (trade name), IEEE 802.11 radio communication, High Data Rate (HDR), Near Field Communication (NFC), Digital Living Network Alliance (DLNA: trade name), a cellular telephone network, a satellite channel, and a terrestrial digital broadcast network are available. Note that the embodiments of the present invention can also be realized in the form of computer data signals embedded in a carrier wave in which the program codes are embodied by electronic transmission.
The embodiments of the present invention are not limited to the above-mentioned embodiments, and various modifications are possible within the scope of the claims. Thus, embodiments obtained by combining technical elements appropriately modified within the scope defined by the claims are also included in the technical scope of the present invention.
CROSS-REFERENCE OF RELATED APPLICATION

This application claims the benefit of priority to JP 2017-089788 filed on Apr. 28, 2017, which is incorporated herein by reference in its entirety.
INDUSTRIAL APPLICABILITY

The embodiments of the present invention can be preferably applied to an image decoding device to decode coded data in which image data is coded, and an image encoding device to generate coded data in which image data is coded. The embodiments of the present invention can be preferably applied to a data structure of coded data generated by the image encoding device and referred to by the image decoding device.
REFERENCE SIGNS LIST
- 10 CT information decoding unit
- 11 Image encoding device
- 20 CU decoding unit
- 31 Image decoding device
- 41 Image display device
Claims
1. A video encoding device for coding an input video, the video encoding device comprising:
- a divider configured to divide a picture of the input video into a coding unit (CU) including multiple pixels;
- a transformer configured to perform predetermined transform with the CU as a unit and output transform coefficients;
- a quantizer configured to quantize the transform coefficients and output quantization transform coefficients; and
- an encoder configured to perform variable-length coding on the quantization transform coefficients, wherein
- the transformer includes
- a first transformer, and
- a second transformer configured to perform transform on a part of first transform coefficients output from the first transformer, and
- the second transformer performs transform on at least any of the first transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the first transform coefficients for a non-rectangular region (second region).
2. The video encoding device according to claim 1, wherein
- the second transformer performs transform for a first region in combination with small size square transform.
3. The video encoding device according to claim 1, wherein
- the second transformer selects transform for the first region or transform for the second region depending on an intra prediction mode or a scanning direction of transform coefficients.
4. A video decoding device for decoding a video, the video decoding device comprising:
- a decoder configured to perform variable-length decoding on coded data with a coding unit (CU) including multiple pixels as a processing unit, and output quantization transform coefficients;
- an inverse quantizer configured to perform inverse quantization on quantization transform coefficients and output transform coefficients; and
- an inverse transformer configured to perform inverse transform on the transform coefficients, wherein
- the inverse transformer includes
- a second inverse transformer configured to perform inverse transform on at least a part of the transform coefficients and output second transform coefficients, and
- a first inverse transformer configured to perform inverse transform on a remainder of the transform coefficients and the second transform coefficients, and
- the second inverse transformer performs inverse transform on at least any of the transform coefficients for a region (first region) having different sizes in a horizontal direction and a vertical direction, or the transform coefficients for a non-rectangular region (second region).
5. The video decoding device according to claim 4, wherein
- the second inverse transformer performs inverse transform for a first region in combination with small size square inverse transform.
6. The video decoding device according to claim 4, wherein
- the second inverse transformer selects inverse transform for the first region or inverse transform for the second region depending on an intra prediction mode or a scanning direction of transform coefficients.