METHOD AND SYSTEM FOR PROVIDING ARITHMETIC CODE NORMALIZATION AND BYTE CONSTRUCTION
A method and system are provided for code normalization and byte construction. A plurality of subsets of bits is extracted from a first input. Each of the subsets of bits has a bit width equaling a number of leading zeros from a second input variable. Further, a consecutive sequence of the plurality of subsets is stored in a memory. In addition, the consecutive sequence of the plurality of subsets is read from the memory if a third input release flag is established.
1. Field
This disclosure generally relates to the field of video data processing. More particularly, the disclosure relates to Context Adaptive Binary Arithmetic Coding (“CABAC”) for digital video encoders.
2. General Background
Video signals generally include data corresponding to one or more video frames. Each video frame is composed of an array of picture elements, which are called pixels. A typical color video frame having a standard resolution may be composed of several hundred thousand pixels, which are arranged in arrays of blocks. Each pixel is characterized by pixel data indicative of a hue (predominant color), saturation (color intensity), and luminance (color brightness). The hue and saturation characteristics may be referred to as the chrominance. Accordingly, the pixel data includes chrominance and luminance. The pixel data may therefore be represented by groups of four luminance pixel blocks and two chrominance pixel blocks. These groups are called macroblocks (“MBs”). As a video frame generally includes many pixels, the video frame also includes a large number of MBs. Thus, digital signals representing a sequence of video frame data, which usually include many video frames, have a large number of bits. However, the available storage space and bandwidth for transmitting these digital signals are limited. Therefore, compression processes are used to more efficiently transmit or store video data.
Compression of digital video signals for transmission or for storage has become widely practiced in a variety of contexts. For example, multimedia environments for video conferencing, video games, Internet image transmissions, digital TV, and the like utilize compression. Coding and decoding are accomplished with coding processors. Examples of such coding processors include general computers, special hardware, multimedia boards, or other suitable processing devices. Further, the coding processors may utilize one of a variety of coding techniques, such as variable length coding (“VLC”), fixed coding, Huffman coding, blocks of symbols coding, and arithmetic coding. An example of arithmetic coding is Context Adaptive Binary Arithmetic Coding (“CABAC”).
CABAC techniques are capable of losslessly compressing syntax elements in a video stream using the probabilities of syntax elements in a given context. The CABAC process takes in syntax elements representing all elements within a macroblock. Further, the CABAC process constructs a compressed bit sequence by building out the following structure: the sequential set of fields for the macroblock based on the chosen macroblock configuration, the specific syntax element type and value for each of the fields within this field sequence, and the context address for each of the syntax elements. The CABAC process then performs binarization of the syntax elements, updates the context weights, arithmetically encodes the binarizations of syntax elements (“bins”), and subsequently packs the bits into bytes through the syntax element processing component.
The components of the CABAC process include: the CABAC weight initialization mode selection module, the macroblock syntax sequence generator, the binarization engine, the context address generator, the context weight update engine, the arithmetic coder, the bit packetizer, and the Network Abstraction Layer (“NAL”) header generator. The CABAC engine within a video encoder may accomplish two goals within the encoding process: (1) to carry out compressed data resource prediction for mode decision purposes; and (2) to losslessly compress the data for signal output delivery. The compressed data resource prediction task predicts the number of bits required given a set of specific encoding modes for a given macroblock. Potential mode decision implementations may have up to eight modes to select from. The computational demand on the CABAC engine to support the mode decision task is significant.
The weight update, arithmetic encoder, and bit packing components of the CABAC engine may require a significant amount of non-trivial computational and processing resources in a sequential processor implementation. Given that high performance encoding systems require multiple rate-distortion encoding iterations per macroblock, the CABAC process may impose an unreasonable resource demand on a processor-based solution. Prior implementations typically compromise on mode decision CABAC resource estimation accuracy by limiting the CABAC to bin level accuracy.
A system capable of processing one binary symbol per clock cycle requires a matching back-end receiving engine that can also process the results on every cycle. The back-end tasks consist of a value normalization task, which may generate up to eight bits of data, and a bit packing task, which groups the bits into bytes. The implementation solutions for the normalization and bit packing tasks are complex and computationally demanding.
Current implementations of the normalization function for the CABAC arithmetic coder fall into two categories. The first category includes routines that can generate at most one bit per cycle. Because a single binary symbol may generate up to eight bits, this approach may take up to eight cycles to process one binary symbol. The second category includes routines that achieve single-cycle processing per binary symbol using a method that does not optimally handle all cases of the carry from the input data and the adder.
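By way of illustration only, the cost of the first category can be sketched as a loop that shifts the range register one bit at a time until it is normalized. The sketch assumes a 9-bit range register that counts as normalized once bit 8 is set; the function name `renorm_iterations` is hypothetical, and the loop models only the iteration count, not the bits that would be emitted.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many one-bit shifts a bit-serial routine needs before an
   assumed 9-bit range register has its top bit (bit 8) set. */
unsigned renorm_iterations(uint32_t code_range)
{
    unsigned iterations = 0;

    assert(code_range != 0 && code_range < 512);
    while (code_range < 256) {   /* bit 8 not yet set */
        code_range <<= 1;        /* one output bit per iteration */
        iterations++;
    }
    return iterations;
}
```

Under these assumptions an already normalized range needs no iterations, while a range of one needs eight, which corresponds to the worst case of eight cycles per binary symbol noted above.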
SUMMARY
In one aspect of the disclosure, a process extracts a plurality of subsets of bits from a first input. Each of the subsets of bits has a bit width equaling a number of leading zeros from a second input variable. Further, the process stores, in a memory, a consecutive sequence of the plurality of subsets. In addition, the process reads the consecutive sequence of the plurality of subsets from the memory if a third input release flag is established.
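The extraction step of this aspect may be sketched as follows, as an illustration rather than a definitive implementation. The sketch assumes that the release flag is the most significant bit of the first input and that the subset is taken from the bits directly below it, as recited in claims 7 and 8. The type `Subset`, the function `extract_subset`, and the explicit `input_width` parameter are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned release;  /* most significant bit of the first input */
    uint32_t bits;     /* extracted subset, right-aligned */
    unsigned width;    /* number of valid bits in `bits` */
} Subset;

/* Drops the most significant bit, then drops least significant bits
   until `lead_zeros` bits remain. */
Subset extract_subset(uint32_t first_input, unsigned input_width,
                      unsigned lead_zeros)
{
    Subset s;
    uint32_t below_msb;

    assert(input_width >= 2 && input_width <= 32);
    assert(lead_zeros <= input_width - 1);

    below_msb = first_input & ((1u << (input_width - 1)) - 1u);
    s.release = (first_input >> (input_width - 1)) & 1u;
    s.bits = below_msb >> (input_width - 1 - lead_zeros);
    s.width = lead_zeros;
    return s;
}
```

For a 9-bit input of 0x1B2 and three leading zeros, this sketch yields a release flag of one and the 3-bit subset 101.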
In another aspect, a process stores a consecutive set of variable bit width data into a first in first out buffer. The variable bit width data has a width that is determined by a number of leading zeroes from an input variable. Further, the process reads the data from the first in first out buffer if the receiving data contains only ones.
In yet another aspect, a process stores a consecutive set of data from a first input variable into a memory. Further, the process receives a subsequent data set from the first input variable. In addition, the process reads the consecutive set of data from the memory if the subsequent data set includes one or more binary bits having a value of zero.
The above-mentioned features of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings wherein like reference numerals denote like elements and in which:
A method and system are disclosed that provide improved digital video data compression with single-cycle normalization for real-time digital video encoders, such as an MPEG-4 or H.264 series encoder. The method and system may be utilized by the back-end processor within the arithmetic encoder to accomplish normalization and payload-to-byte packing.
At a decision block 214, the arithmetic coder normalization process 200 determines if the contents of the variable payload[t], i.e., the bits, include only ones or both ones and zeroes. If the variable payload[t] includes both ones and zeroes, the arithmetic coder normalization process 200 proceeds to a process block 216. At the process block 216, the arithmetic coder normalization process 200 begins with the first entry of the payload array. A carry is added to the first entry in the payload array. The payload is then outputted without the resulting carry. The arithmetic coder normalization process 200 then adds the carry from the addition of the first entry in the payload array to the second entry in the payload array. The payload is then outputted without the resulting carry. The arithmetic coder normalization process 200 works through the entries of the payload array in a similar manner until the entry in payload[t-1] is processed. The iterations through these entries in the payload array may be denoted by the following code: for (i=0; i<t; i++) {payload[i]+=carry; Output(payload[i]);}. Once the entry in payload[t-1] is processed, the arithmetic coder normalization process 200 proceeds to a process block 218 where the most recent payload is moved to the base of the array, which may be denoted by payload[0]=payload[t]. The arithmetic coder normalization process 200 then proceeds to a process block 220 to reset the payload array by setting the variable t to zero. The arithmetic coder normalization process 200 then ends at a process block 230.
If the arithmetic coder normalization process 200 determines, at the decision block 214, that the contents of the variable payload[t] include only ones, the arithmetic coder normalization process 200 proceeds from the decision block 214 to the process block 222. At the process block 222, the carry bit is examined. The arithmetic coder normalization process 200 then proceeds to a decision block 224 to determine if the input carry bit equals one. If the arithmetic coder normalization process 200 determines that the input carry bit equals one, the arithmetic coder normalization process 200 proceeds to a process block 226. At the process block 226, the arithmetic coder normalization process 200 outputs all payload entries from index zero to index t sequentially beginning with the index zero. This approach can be denoted by the following code: for (i=0; i<=t; i++) {Output(payload[i]);}. The arithmetic coder normalization process 200 then proceeds to a process block 228. At the process block 228, the arithmetic coder normalization process 200 resets the index to negative one. The arithmetic coder normalization process 200 then ends at a process block 230.
If the arithmetic coder normalization process 200 determines, at the decision block 224, that the input carry bit does not equal one, the arithmetic coder normalization process 200 ends at the process block 230.
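The flow through blocks 214 to 230 may be sketched in C as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes each incoming chunk is appended at index t+1 before the decision block 214 is evaluated, it follows the inline loops literally (the input carry is added to each stored entry and the overflow bit is dropped on output), and it adds no carry in the all-ones branch because none is described there. The names `Chunk`, `PayloadArray`, `payload_step`, and `MAX_ENTRIES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 32  /* hypothetical capacity of the payload array */

typedef struct {
    uint8_t bits;   /* chunk value, right-aligned */
    uint8_t width;  /* number of valid bits, 1..8 */
} Chunk;

typedef struct {
    Chunk payload[MAX_ENTRIES];
    int t;          /* index of the most recent entry; -1 when empty */
} PayloadArray;

static unsigned chunk_mask(Chunk c) { return (1u << c.width) - 1u; }

/* Appends `in` (assumed step), applies blocks 214-230, writes released
   chunks to `out` (room for MAX_ENTRIES), and returns how many were
   written. */
size_t payload_step(PayloadArray *p, Chunk in, unsigned carry, Chunk *out)
{
    size_t n = 0;
    int i, t;

    assert(in.width >= 1 && in.width <= 8 && p->t + 1 < MAX_ENTRIES);
    p->payload[++p->t] = in;
    t = p->t;

    if (p->payload[t].bits != chunk_mask(p->payload[t])) {
        /* Block 216: payload[t] holds a zero, so release older entries. */
        for (i = 0; i < t; i++) {
            unsigned sum = p->payload[i].bits + carry;
            p->payload[i].bits = (uint8_t)(sum & chunk_mask(p->payload[i]));
            out[n++] = p->payload[i];
        }
        p->payload[0] = p->payload[t];  /* block 218 */
        p->t = 0;                       /* block 220 */
    } else if (carry == 1) {
        /* Blocks 226-228: all ones with the carry set, release everything. */
        for (i = 0; i <= t; i++)
            out[n++] = p->payload[i];
        p->t = -1;
    }
    /* All ones without a carry: keep buffering (block 230). */
    return n;
}
```

In this sketch, feeding the 8-bit chunks 0x2A and 0xFF with no carry releases nothing, and a following chunk 0x10 with the carry set releases 0x2B and 0x00 while 0x10 becomes the new base entry.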
The normalization and bit packing engine 300 receives two distinct variables, a codeLow variable 302 and a codeRange variable 304, on every clock cycle. A leading zero detector 306 generates an output that is equal to the number of leading zero binary bits in the codeRange variable 304. This output is registered in a latch shiftCnt 308. A bus splitter 310 outputs a carry bit and a dchunk variable. The carry bit is extracted from the most significant bit of the codeLow variable 302. Further, the dchunk variable, which includes the second through ninth lower bits of the codeLow variable 302, is then shifted right by the shiftCnt variable 308 through a shift latch 312. The output dchunkRa of this shift latch 312 is then further shifted by the bitPos16_1 variable through a bitPos16_1 shift latch 314 to align the data to fit into an output preparation register 316. The output preparation register 316 is utilized to hold data until there are enough output bits to form a full byte. In another embodiment, a plurality of output preparation registers 316 may be utilized.
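A software analogue of the leading zero detector 306 and the bus splitter 310 may look like the following sketch. The bit numbering is an assumption, since the description does not fix it: codeRange is treated as a 9-bit value, and codeLow as a 10-bit value whose bit 9 is the carry and whose bits 8 through 1 form dchunk. The names `leading_zeros9`, `split_code_low`, and `LowSplit` are hypothetical, and the subsequent shift stages are not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned carry;  /* most significant bit of codeLow (assumed bit 9) */
    uint8_t dchunk;  /* the eight bits directly below the carry */
} LowSplit;

/* Counts leading zero bits of an assumed 9-bit codeRange value. */
unsigned leading_zeros9(uint32_t code_range)
{
    unsigned count = 0;
    uint32_t probe;

    assert(code_range != 0 && code_range < 512);
    for (probe = 0x100; (code_range & probe) == 0; probe >>= 1)
        count++;
    return count;
}

/* Splits an assumed 10-bit codeLow value into carry and dchunk. */
LowSplit split_code_low(uint32_t code_low)
{
    LowSplit s;

    assert(code_low < 1024);
    s.carry = (code_low >> 9) & 1u;
    s.dchunk = (uint8_t)((code_low >> 1) & 0xFFu);
    return s;
}
```

A hardware detector would typically be combinational rather than iterative; the loop here is used only to keep the sketch short.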
A bit position calculator 318 generates a bitPos16_1 variable and a byte ready flag based on the input to the shiftCnt variable. The bitPos16_1 variable identifies where the dchunkR should reside within the output preparation register 316. The byte ready flag identifies when the least significant byte 320 is ready for output. The bitPos16_1 shift latch 314 outputs dchunk16, which is then sent to a logical OR gate 322 along with the output from the output preparation register 316. The output from the logical OR gate 322 is then sent to an adder 324 along with a shifted carry bit from a shift latch 326 to form both the output byte 328 and the new data for the output preparation register 316. The shifted carry bit is generated by the shift latch 326, which shifts the logically conditioned carry bit utilizing oneFlag_d, a delayed carry flag carry_d6, and a delayed carry flag carry_d5.
The oneFlag_d is generated by first providing dchunk to an all ones detector 330. If dchunk is all ones, the all ones detector 330 outputs oneFlag and provides oneFlag to a latch 332. The latch 332 shifts oneFlag and outputs oneFlag_d.
The oneFlag_d is provided along with a delayed carry flag carry_d6 to a first gate 334. Further, the output of the first gate 334 is provided along with a delayed carry flag carry_d5 to a second gate 336.
The output of the adder 324 is split into a plurality of bytes through a bit splitter 338. In one embodiment, the bit splitter 338 splits the output of the adder 324 into three bytes. Further, in one embodiment, the bit splitter 338 is a twenty-four-bit splitter. The most significant byte is provided to an output byte register 340, which may be denoted by the term outByte. The two least significant bytes are routed through a multiplexor 342 to feed the inputs of the output preparation register 316. Based on the byteRdy flag, the multiplexor 342 selects one of the two lower output bytes from the adder 324 for the middle byte 344 of the output preparation register 316.
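The byte construction behavior, in which variable-width chunks accumulate until a full byte is available and leftover bits are held for the next input, may be approximated in software as follows. This is a simplified sketch: it does not model the adder 324, the carry path, or the register layout, and the names `BitPacker` and `packer_push` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t acc;     /* pending bits, right-aligned */
    unsigned nbits;   /* number of pending bits, always below 8 */
} BitPacker;

/* Appends `width` bits (0..8) and writes any completed byte to `out`.
   Returns the number of bytes written (0 or 1). */
size_t packer_push(BitPacker *p, uint32_t bits, unsigned width, uint8_t *out)
{
    size_t n = 0;

    assert(width <= 8 && p->nbits < 8);
    p->acc = (p->acc << width) | (bits & ((1u << width) - 1u));
    p->nbits += width;

    while (p->nbits >= 8) {
        out[n++] = (uint8_t)(p->acc >> (p->nbits - 8)); /* byte is ready */
        p->nbits -= 8;
        p->acc &= (1u << p->nbits) - 1u;   /* keep only leftover bits */
    }
    return n;
}
```

Pushing a 3-bit chunk 101 followed by a 5-bit chunk 11111 produces the single byte 0xBF in this sketch, with no bits left pending.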
It should be understood that the code normalization and byte construction module 740 may be implemented as one or more physical devices that are coupled to the CPU 710 through a communication channel. Alternatively, the normalization and byte construction module 740 may be represented by one or more software applications (or even a combination of software and hardware, e.g., using application specific integrated circuits (ASIC)), where the software is loaded from a storage medium (e.g., a magnetic or optical drive or diskette) and operated by the CPU in the memory 720 of the computer. As such, the normalization and byte construction module 740 (including associated data structures) of the present invention may be stored on a computer readable medium, e.g., RAM, magnetic or optical drive or diskette and the like.
It is understood that the normalization and byte construction engine described herein may also be applied in other types of encoders. Those skilled in the art will appreciate that various adaptations and modifications of the embodiments of this method and apparatus may be configured without departing from the scope and spirit of the present method and system. Therefore, it is to be understood that, within the scope of the appended claims, the present method and apparatus may be practiced other than as specifically described herein.
Claims
1. A method comprising:
- extracting a plurality of subsets of bits from a first input, each of the subsets of bits having a bit width equaling a number of leading zeros from a second input variable;
- storing, in a memory, a consecutive sequence of the plurality of subsets; and
- reading the consecutive sequence of the plurality of subsets from the memory if a third input release flag is established.
2. The method of claim 1, wherein the consecutive sequence of the plurality of subsets is read from the memory in the same order that the consecutive sequence of the plurality of subsets is stored in the memory.
3. The method of claim 1, wherein each of the plurality of subsets of bits has a variable length.
4. The method of claim 3, further comprising concatenating the plurality of subsets of bits to form a stream of bits.
5. The method of claim 4, further comprising sending the stream of bits in a plurality of constant width blocks.
6. The method of claim 5, further comprising storing remaining bits that do not completely fill the constant width blocks in the plurality of constant width blocks as a subset of bits for a next set of input data.
7. The method of claim 1, wherein the third input release flag is the most significant bit of the first input.
8. The method of claim 5, wherein each of the subset of bits from the first input is formed by removing the most significant bit of the first input and continuing to remove the least significant bits of the input until the remaining bits are equal to the number of leading zeros from the second input variable.
9. The method of claim 1, further wherein the memory utilizes an array data structure for storage.
10. A method comprising:
- storing a consecutive set of variable bit width data into a first in first out buffer, the variable bit width data having a width that is determined by a number of leading zeroes from an input variable; and
- reading the data from the first in first out buffer if the receiving data contains only ones.
11. The method of claim 10, further comprising concatenating the variable length blocks of data to form a stream of bits.
12. The method of claim 11, further comprising sending the stream of bits in constant bit width blocks.
13. The method of claim 12, further comprising storing remaining bits that do not completely fill the constant width blocks as the first variable block width data for a next set of input data.
14. A method comprising:
- storing a consecutive set of data from a first input variable into a memory;
- receiving a subsequent data set from the first input variable; and
- reading the consecutive set of data from the memory if the subsequent data set includes one or more binary bits having a value of zero.
15. The method of claim 14, further comprising reading the consecutive set of data from the memory if the most significant bit of the subsequent data set equals one.
16. The method of claim 14, wherein the first input variable stores data having a variable bit width.
17. The method of claim 16, wherein variable bit width data has a width that is determined by a number of leading zeroes from a second input variable.
18. The method of claim 14, further comprising adding an input carry flag to the consecutive set of data if the subsequent data set includes one or more bits having a value of zero.
19. The method of claim 18, further comprising adding a carry bit of the consecutive set of data to the subsequent data set.
20. The method of claim 15, further comprising determining a data set to store in the first input variable after the reading has completed based on data that causes a condition to be met so that the reading is initiated.
Type: Application
Filed: Jan 22, 2007
Publication Date: Jul 24, 2008
Applicant: GENERAL INSTRUMENT CORPORATION (Horsham, PA)
Inventor: Yendo Hu (La Jolla, CA)
Application Number: 11/625,417