Method and Apparatus of Video Encoding with Partitioned Bitstream
A method and apparatus for video encoding to generate a partitioned bitstream without buffering transform coefficient and/or prediction data for subsequent coding units are disclosed. An encoder incorporating an embodiment according to the present invention receives first video parameters associated with a current coding unit, wherein no first video parameters associated with subsequent coding units are buffered. The encoder then encodes the first video parameters to generate a current first compressed data corresponding to the current coding unit. A first memory address in the first logic unit is determined and the encoder provides the current first compressed data at the first memory address in the first logic unit.
The present invention relates to video encoding systems. In particular, the present invention relates to the system architecture of a video encoder generating a bitstream with a partitioned structure.
BACKGROUND
Motion compensated inter-frame coding has been widely adopted in various coding standards, such as MPEG-1/2/4 and H.261/H.263/H.264 (AVC). VP8 is a recent motion compensated video codec (encoder-decoder) being adapted for some software, hardware, platform and publishing environments. The VP8 coding algorithm is similar to H.264 Simple Profile. However, VP8 is tailored to reduce encoding and decoding complexity while delivering about the same performance as the H.264 Simple Profile. One of the VP8 codec features is that the bitstream format is suited for parallel decoding, to take advantage of the trend toward multiple-core processors in the consumer electronics environment or multi-core CPUs in the personal computer environment. In order to support parallel decoding, the VP8 bitstream partitions the compressed data into two categories, where the category I partition includes coding modes (mb_mode, sub_mb_mode, mb_skip, etc.), reference index, intra prediction mode, QP information, filter parameters, motion vectors of macroblocks, etc. in the frame, and the category II partition includes quantized transform coefficients of residues for the macroblocks. The data associated with category II, i.e., the transform coefficients, can be packed into more than one partition on the basis of rows of macroblocks, and a partition token may be used to indicate the association between a macroblock row and one of the packed category II partitions. Since the information associated with the transform coefficients is packed after the prediction data for the whole frame is packed, a conventional encoder system may have to store the transform coefficients for the whole frame. Accordingly, it is desirable to develop an encoder system that provides the partitioned bitstream without the need to store the transform coefficients for the whole frame.
BRIEF SUMMARY OF THE INVENTION
A method and apparatus for video encoding to generate a partitioned bitstream are disclosed, where the partitioned bitstream comprises a first logic unit and a second logic unit, the first logic unit comprises first compressed data corresponding to coding units of a picture, the second logic unit comprises second compressed data corresponding to the coding units, and each of the first compressed data together with each of the second compressed data is able to reconstruct one of the coding units. In one embodiment according to the present invention, the method and apparatus of video encoding to generate the partitioned bitstream comprise receiving first video parameters associated with a current coding unit, wherein no first video parameters associated with subsequent coding units are buffered; encoding the first video parameters to generate a current first compressed data corresponding to the current coding unit; determining a first memory address in the first logic unit; and providing the current first compressed data at the first memory address in the first logic unit. The coding unit may be configured as a macroblock. Said encoding the first video parameters to generate the current first compressed data may utilize a first entropy coder. The first memory address can be computed according to the data size of a previous first compressed data and a previous first memory address. The first video parameters associated with the current coding unit comprise quantized transform coefficients associated with the current coding unit or prediction residues of the current coding unit.
Similar processing may also be applied to generate the second logic unit. Accordingly, another embodiment according to the present invention comprises receiving second video parameters associated with the current coding unit, wherein no second video parameters associated with the subsequent coding units are buffered; encoding the second video parameters to generate a current second compressed data corresponding to the current coding unit; determining a second memory address in the second logic unit; and providing the current second compressed data at the second memory address in the second logic unit. The second video parameters associated with the current coding unit comprise prediction data associated with the current coding unit. Said encoding the first video parameters to generate the current first compressed data may utilize a first entropy coder, and said encoding the second video parameters to generate the current second compressed data may utilize the first entropy coder or a second entropy coder. The second memory address in the second logic unit can be computed according to the data size of a previous second compressed data and a previous second memory address.
In yet another embodiment according to the present invention, the first logic unit may be further partitioned into multiple sub-logic units, where each sub-logic unit corresponds to first compressed data of the coding units associated with a region of the picture. The region of the picture may be a row of coding units or multiple rows of coding units. The multiple sub-logic units may share the same entropy coder or use multiple entropy coders in parallel.
The use of coefficient memory 358 and prediction data memory 360 allows the coefficients and prediction data of a whole frame to be buffered temporarily. Since the coefficients and prediction data of a frame are buffered, the entropy coder 354 can process the coefficients and prediction data in the order as specified in bitstream format. For example, the bitstream for Partition I can be generated by applying entropy coder 354 to the modes/motion vectors stored in the prediction data memory 360 on a coding unit by coding unit basis in a raster scan order and the compressed data is written sequentially into a space in the bitstream buffer 356. The compressed data for the prediction data are written into Partition I data area in the bitstream buffer 356. The Partition I data area is termed as a Partition I logic unit in this disclosure. To store the compressed data of the prediction data associated with a current coding unit properly, an address has to be determined. If the compressed data are written sequentially, a current address may be determined according to a previous address and data size of compressed data for the previous coding unit. The use of transform coefficient memory 358 and prediction data memory 360 provides some convenience for the system design since the compressed data corresponding to prediction data associated with coding units of the frame can be stored to Partition I area of bit-stream buffer 356 one coding unit after another coding unit. Similarly, the compressed data corresponding to quantized transform coefficients associated with coding units of the frame can be stored in Partition II areas of bit-stream buffer 356 one coding unit after another coding unit. However, transform coefficient memory 358 and prediction data memory 360 are quite large since these data are in an un-compressed format and these data are buffered for a whole frame. 
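The sequential address rule described above (a current address determined from the previous address and the data size of the previous coding unit's compressed data) can be sketched as follows. This is a minimal illustration only; the function name, the base address, and the sample sizes are hypothetical and are not taken from the disclosure.

```python
def sequential_addresses(base_address, compressed_sizes):
    """Return the write address of each coding unit's compressed data.

    Assumption: compressed data is written back to back, so each address is
    the previous address plus the size of the previous coding unit's data.
    """
    addresses = []
    address = base_address
    for size in compressed_sizes:
        addresses.append(address)
        address += size  # next unit starts right after the current one
    return addresses


# Hypothetical compressed sizes (in bytes) for four coding units in raster order.
sizes = [12, 7, 30, 5]
addresses = sequential_addresses(0x1000, sizes)
print(addresses)
```

Only the running address has to be kept between coding units; the sizes of earlier units need not be stored once the address has been advanced.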
In addition, an encoder system always needs a frame buffer to store a previous reconstructed frame and a current reconstructed frame in order to perform Intra/Inter prediction. The buffer size associated with the transform coefficients may be larger than that of the original frame since transform coefficients may require more bits than the original image data. For example, the original data is often in 8-bit resolution while the transform coefficients may require 12 bits or more for sufficient accuracy. Therefore, it is desirable to design an encoding system that can support the partitioned bitstream as mentioned above without the need for the transform coefficient memory and/or prediction data memory.
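To give a feel for the sizes involved, the following sketch compares a whole-frame buffer at 8 bits per sample with one at 12 bits per coefficient. The 1280x720 frame size, the 4:2:0 chroma format, and the simplification of one coefficient per sample are assumptions chosen for illustration; they are not stated in the disclosure.

```python
def frame_samples_420(width, height):
    """Number of samples in a 4:2:0 frame: one luma plane plus two
    chroma planes at half resolution in each dimension (assumed format)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


def buffer_bytes(sample_count, bits_per_sample):
    """Bytes needed to hold sample_count tightly packed values, rounded up."""
    return (sample_count * bits_per_sample + 7) // 8


samples = frame_samples_420(1280, 720)
pixel_bytes = buffer_bytes(samples, 8)    # original image data
coeff_bytes = buffer_bytes(samples, 12)   # transform coefficients, assuming
                                          # one coefficient per sample
print(samples, pixel_bytes, coeff_bytes)
```

Under these assumptions the coefficient buffer is 1.5 times the size of the original frame, and it would be larger still if each 12-bit coefficient were stored in a 16-bit word.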
The complete Partition I bitstream can be generated except for the partition sizes, IIA-PS, IIB-PS, IIC-PS, etc. at the end of Partition I. The entropy coder 354 can then be switched to process transform coefficients according to the Partition II order. In the example of four token partitions, the entropy coder 354 will process macroblock rows 0, 4, 8, etc. in a sequential order and write the results in a second space of the bitstream buffer 356. The portions of the frame associated with Partitions IIA, IIB, IIC, . . . , etc. are called picture partitions in this disclosure. Accordingly, Partition IIA refers to macroblock rows 0, 4, 8, etc. When entropy encoding of rows designated for Partition IIA is complete, the partition size is known and the partition size can be written into IIA-PS at the end of Partition I. Upon the completion of entropy coding of Partition IIA, the entropy coding and bitstream generation proceeds to process the next partition until all partitions are processed.
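The row-to-partition association in the four-partition example (Partition IIA holding macroblock rows 0, 4, 8, etc.) is consistent with assigning row r to partition r modulo the number of partitions. The sketch below assumes that rule and shows how each partition size becomes known once all rows of that partition are coded. The function names and the per-row byte counts are hypothetical.

```python
def partition_for_row(mb_row, num_partitions):
    """Index of the coefficient partition that holds a macroblock row,
    assuming rows are assigned to partitions in round-robin order."""
    return mb_row % num_partitions


def partition_sizes(row_sizes, num_partitions):
    """Accumulate compressed bytes per partition from per-row byte counts.

    The totals correspond to the values written into the partition size
    fields (IIA-PS, IIB-PS, ...) once each partition is complete.
    """
    sizes = [0] * num_partitions
    for mb_row, size in enumerate(row_sizes):
        sizes[partition_for_row(mb_row, num_partitions)] += size
    return sizes


# Rows belonging to the first partition when four partitions are used.
rows_in_first = [r for r in range(12) if partition_for_row(r, 4) == 0]

# Hypothetical compressed sizes (bytes) of nine macroblock rows.
row_sizes = [100, 80, 90, 70, 60, 50, 40, 30, 20]
sizes = partition_sizes(row_sizes, 4)
print(rows_in_first, sizes)
```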
In order to overcome the need for the transform coefficient memory and/or prediction data memory, the transform coefficients and/or mode/motion vector for each macroblock have to be processed on the fly, i.e., the transform coefficients and/or mode/motion vector for each macroblock are processed in substantially real time to generate the corresponding bitstream. Upon completing the current macroblock processing, the stored transform coefficients and/or mode/motion vector for that macroblock can be discarded, e.g., deleted or over-written. In the example of on-the-fly processing of transform coefficients, a macroblock coefficient buffer can be used to hold the transform coefficients of a current macroblock. The transform coefficients of the current macroblock are processed by the entropy coder 354 in real time and are written into a data area allocated for the partition associated with the current macroblock in the bitstream buffer. The data area allocated for the partition is associated with a data address, and the bitstream for the current macroblock in the partition is written to the location in the bitstream buffer indicated by the data address. The data address is initialized for the first macroblock in the partition. After the current macroblock is processed, the size of the corresponding bitstream is known and the data address for the next macroblock in the partition can be calculated. The size of the bitstream for each macroblock may be tracked using a counter. In order to support real time processing, the entropy coding for the current macroblock should be completed before the buffered data is over-written when processing of the next coding unit starts. While a single MB coefficient buffer may be sufficient, two MB coefficient buffers may also be used to hold transform coefficients for both the current macroblock and the next macroblock to simplify the design.
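The on-the-fly flow described above can be sketched in software as follows. This is an illustration under stated assumptions rather than the disclosed hardware: the class and function names are hypothetical, each partition is assumed to receive a fixed-capacity data area laid out back to back, rows are assumed to map to partitions in round-robin order, and `toy_entropy_code` is a placeholder that merely drops zero coefficients instead of performing real entropy coding.

```python
class PartitionedBitstreamWriter:
    """Bitstream buffer with one data area and one write address per partition."""

    def __init__(self, partition_capacities):
        self.start = []
        offset = 0
        for capacity in partition_capacities:
            self.start.append(offset)      # data areas are laid out back to back
            offset += capacity
        self.capacity = list(partition_capacities)
        self.buffer = bytearray(offset)
        # Each data address is initialized to the start of its partition.
        self.write_addr = list(self.start)

    def write_macroblock(self, partition, compressed):
        """Write one macroblock's compressed data and advance the address."""
        addr = self.write_addr[partition]
        end = addr + len(compressed)
        if end > self.start[partition] + self.capacity[partition]:
            raise ValueError("partition data area overflow")
        self.buffer[addr:end] = compressed
        self.write_addr[partition] = end   # address for the next macroblock
        return addr

    def partition_size(self, partition):
        return self.write_addr[partition] - self.start[partition]


def toy_entropy_code(coefficients):
    """Placeholder for an entropy coder: keeps nonzero values, one byte each."""
    return bytes(c & 0xFF for c in coefficients if c != 0)


def encode_frame(mb_rows, num_partitions, capacity):
    writer = PartitionedBitstreamWriter([capacity] * num_partitions)
    for row_index, row in enumerate(mb_rows):
        partition = row_index % num_partitions
        for mb_coefficients in row:
            # Only the current macroblock's coefficients are held; they can be
            # over-written as soon as the compressed data has been written.
            writer.write_macroblock(partition, toy_entropy_code(mb_coefficients))
    return writer


# Three macroblock rows with hypothetical coefficient values, two partitions.
mb_rows = [[[1, 0, 2], [3]], [[0, 0], [4, 5]], [[6]]]
writer = encode_frame(mb_rows, num_partitions=2, capacity=16)
print(writer.partition_size(0), writer.partition_size(1))
```

Because only a write address per partition is carried from one macroblock to the next, no whole-frame coefficient memory is needed in this sketch; the cost is that the data areas must be sized in advance or relocated afterwards, which is a design choice the sketch does not address.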
As mentioned before, the coefficient partition may be further partitioned into multiple partitions to allow concurrent processing of multiple macroblock rows.
An example to practice the present invention is illustrated in
Embodiments of video encoding systems to generate a partitioned bitstream according to the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be multiple processor circuits integrated into a video compression chip or program codes integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program codes to be executed on a computer CPU having multiple CPU cores or a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. A method of video encoding to generate partitioned bitstream comprising a first logic unit and a second logic unit, the first logic unit comprises first compressed data corresponding to coding units of a picture, the second logic unit comprises second compressed data corresponding to the coding unit, each of the first compressed data and each of the second compressed data are able to reconstruct one of the coding units, the method comprising:
- receiving first video parameters associated with a current coding unit, wherein not all first video parameters associated with all subsequent coding units for the picture are buffered;
- encoding the first video parameters to generate a current first compressed data corresponding to the current coding unit;
- determining first memory address in the first logic unit; and
- providing the current first compressed data at the first memory address in the first logic unit.
2. The method of claim 1, wherein the coding unit is configured as a macroblock.
3. The method of claim 1, wherein said encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder.
4. The method of claim 1, wherein said determining the first memory address in the first logic unit is based on data size of the current first compressed data and a previous first memory address.
5. The method of claim 1, wherein the first video parameters associated with the current coding unit comprise quantized transform coefficients associated with the current coding unit or prediction residues of the current coding unit.
6. The method of claim 5, wherein the first logic unit is further partitioned into a plurality of sub-logic units, where each of the plurality of sub-logic units corresponds to the first compressed data of the coding units associated with a region of the picture.
7. The method of claim 6, wherein the region of the picture consists of one or more rows of the coding units.
8. The method of claim 6, wherein said encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder or a plurality of first entropy coders.
9. The method of claim 6, further comprising determining a sub-memory address in the first logic unit for said each of the plurality of sub-logic units.
10. The method of claim 9, wherein said determining the sub-memory address in the first logic unit is based on data size of the first compressed data for a current sub-logic units and a previous sub-memory address.
11. The method of claim 1, further comprising:
- receiving second video parameters associated with the current coding unit, wherein not all second video parameters associated with all subsequent coding units for the picture are buffered;
- encoding the second video parameters to generate a current second compressed data corresponding to the current coding unit;
- determining second memory address in the second logic unit; and
- providing the current second compressed data at the second memory address in the second logic unit.
12. The method of claim 11, wherein the second video parameters associated with the current coding unit comprise prediction data associated with the current coding unit.
13. The method of claim 11, wherein said encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder, and wherein said encoding the second video parameters to generate the current second compressed data utilizes the first entropy coder or a second entropy coder.
14. The method of claim 11, wherein said determining the second memory address in the second logic unit is based on data size of the current second compressed data and a previous second memory address.
15. An apparatus for video encoding to generate partitioned bitstream comprising a first logic unit and a second logic unit, the first logic unit comprises first compressed data corresponding to coding units of a picture, the second logic unit comprises second compressed data corresponding to the coding unit, each of the first compressed data and each of the second compressed data are able to reconstruct one of the coding units, the apparatus comprising:
- means for receiving first video parameters associated with a current coding unit, wherein not all first video parameters associated with all subsequent coding units for the picture are buffered;
- means for encoding the first video parameters to generate a current first compressed data corresponding to the current coding unit;
- means for determining first memory address in the first logic unit; and
- means for providing the current first compressed data at the first memory address in the first logic unit.
16. The apparatus of claim 15, wherein the coding unit is configured as a macroblock.
17. The apparatus of claim 15, wherein said means for encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder.
18. The apparatus of claim 15, wherein said means for determining the first memory address in the first logic unit is based on data size of the current first compressed data and a previous first memory address.
19. The apparatus of claim 15, wherein the first video parameters associated with the current coding unit comprise quantized transform coefficients associated with the current coding unit or prediction residues of the current coding unit.
20. The apparatus of claim 19, wherein the first logic unit is further partitioned into a plurality of sub-logic units, where each of the plurality of sub-logic units corresponds to the first compressed data of the coding units associated with a region of the picture.
21. The apparatus of claim 20, wherein the region of the picture consists of one or more rows of the coding units.
22. The apparatus of claim 20, wherein said means for encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder or a plurality of first entropy coders.
23. The apparatus of claim 20, further comprising determining a sub-memory address in the first logic unit for said each of the plurality of sub-logic units.
24. The apparatus of claim 23, wherein said determining the sub-memory address in the first logic unit is based on data size of the first compressed data for a current sub-logic units and a previous sub-memory address.
25. The apparatus of claim 15, further comprising:
- receiving second video parameters associated with the current coding unit, wherein not all second video parameters associated with all subsequent coding units for the picture are buffered;
- encoding the second video parameters to generate a current second compressed data corresponding to the current coding unit;
- determining second memory address in the second logic unit; and
- providing the current second compressed data at the second memory address in the second logic unit.
26. The apparatus of claim 25, wherein the second video parameters associated with the current coding unit comprise prediction data associated with the current coding unit.
27. The apparatus of claim 25, wherein said encoding the first video parameters to generate the current first compressed data utilizes a first entropy coder, and wherein said means for encoding the second video parameters to generate the current second compressed data utilizes the first entropy coder or a second entropy coder.
28. The apparatus of claim 25, wherein said determining the second memory address in the second logic unit is based on data size of the current second compressed data and a previous second memory address.
Type: Application
Filed: Nov 14, 2011
Publication Date: May 16, 2013
Applicant: MEDIATEK INC. (Hsinchu)
Inventors: Yung-Chang Chang (New Taipei), Chi-Cheng Ju (Hsinchu), Yi-Hau Chen (Taipei), De-Yuan Shen (Taipei)
Application Number: 13/295,956
International Classification: H04N 7/32 (20060101);