System and method for decoding using parallel processing

An apparatus for decoding frames of a compressed video data stream having at least one frame divided into partitions includes a memory and a processor configured to execute instructions stored in the memory to read partition data information indicative of a partition location for at least one of the partitions, decode a first partition of the partitions that includes a first sequence of blocks, and decode a second partition of the partitions that includes a second sequence of blocks, identified from the partition data information, using decoded information of the first partition.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. nonprovisional patent application Ser. No. 13/565,364, filed Aug. 2, 2012, now U.S. Pat. No. 9,357,223, which is a divisional of U.S. nonprovisional patent application Ser. No. 12/329,248, filed Dec. 5, 2008, which claims priority to U.S. provisional patent application No. 61/096,223, filed Sep. 11, 2008, each of which is incorporated herein in its entirety by reference.

TECHNICAL FIELD

The present invention relates in general to video decoding using multiple processors.

BACKGROUND

An increasing number of applications today make use of digital video for various purposes including, for example, remote business meetings via video conferencing, high definition video entertainment, video advertisements, and sharing of user-generated videos. As technology is evolving, people have higher expectations for video quality and expect high resolution video with smooth playback at a high frame rate.

There can be many factors to consider when selecting a video coder for encoding, storing and transmitting digital video. Some applications may require excellent video quality, while others may need to comply with various constraints including, for example, bandwidth or storage requirements. To permit higher quality transmission of video while limiting bandwidth consumption, a number of video compression schemes have been developed, including proprietary formats such as VPx (promulgated by On2 Technologies, Inc. of Clifton Park, N.Y.) and the H.264 standard promulgated by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), including present and future versions thereof. H.264 is also known as MPEG-4 Part 10 or MPEG-4 AVC (formally, ISO/IEC 14496-10).

There are many types of video encoding schemes that allow video data to be compressed and recovered. The H.264 standard, for example, offers more efficient methods of video coding by incorporating entropy coding methods such as Context-based Adaptive Variable Length Coding (CAVLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC). For video data that is encoded using CAVLC, some modern decompression systems have adopted the use of a multi-core processor or multiprocessors to increase overall video decoding speed.

SUMMARY

An embodiment of the invention is disclosed as a method for decoding a stream of encoded video data including a plurality of partitions that have been compressed using at least a first encoding scheme. The method includes selecting at least a first one of the partitions that includes at least one row of blocks that has been encoded using at least a second encoding scheme. A second partition is selected that includes at least one row of blocks encoded using the second encoding scheme. The first partition is decoded by a first processor, and the second partition is decoded by a second processor. The decoding of the second partition is offset by a specified number of blocks so that at least a portion of the output from the decoding of the first partition is used as input in decoding the second partition. Further, the decoding of the first partition is offset by a specified number of blocks so that at least a portion of the output from the decoding of the second partition is used as input in decoding the first partition.

BRIEF DESCRIPTION OF THE DRAWINGS

The description herein makes reference to the accompanying drawings wherein like reference numerals refer to like parts throughout the several views, and wherein:

FIG. 1 is a diagram of the hierarchy of layers in a compressed video bitstream in accordance with one embodiment of the present invention.

FIG. 2 is a block diagram of a video compression system in accordance with one embodiment of the present invention.

FIG. 3 is a block diagram of a video decompression system in accordance with one embodiment of the present invention.

FIG. 4 is a schematic diagram of a frame and its corresponding partitions outputted from the video compression system of FIG. 2.

FIG. 5 is a schematic diagram of an encoded video frame in a bitstream outputted from the video compression system of FIG. 2 and sent to the video decompression system of FIG. 3.

FIGS. 6A-6B are timing diagrams illustrating the staging and synchronization of cores on a multi-core processor used in the video decompression system of FIG. 3.

FIG. 7A is a schematic diagram showing data-dependent macroblocks and an offset calculation used in the video compression and decompression systems of FIGS. 2 and 3.

FIG. 7B is a schematic diagram showing data-dependent macroblocks and an alternative offset calculation used in the video compression and decompression systems of FIGS. 2 and 3.

DETAILED DESCRIPTION

Referring to FIG. 1, video coding standards, such as H.264, provide a defined hierarchy of layers 10 for a video stream 11. The highest level in the layer can be a video sequence 13. At the next level, video sequence 13 consists of a number of adjacent frames 15, which can be further subdivided into a single frame 17. At the next level, frame 17 can be composed of a series of fixed-size macroblocks 20, which contain compressed data corresponding to, for example, a 16×16 block of displayed pixels in frame 17. Each macroblock contains luminance and chrominance data for the corresponding pixels. Macroblocks 20 can also be of any other suitable size such as 16×8 pixel groups or 8×16 pixel groups. Macroblocks 20 are further subdivided into blocks. A block, for example, can be a 4×4 pixel group that can further describe the luminance and chrominance data for the corresponding pixels. Blocks can also be of any other suitable size such as 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4 pixel groups.
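By way of illustration only, this hierarchy can be modeled with simple data structures. The following C++ sketch assumes 16×16 macroblocks built from 4×4 blocks with 4:2:0 chroma subsampling; the type and field names are illustrative rather than mandated by any coding format.

```cpp
#include <cstdint>
#include <vector>

// A block: here, a 4x4 group of pixels carrying luma and chroma samples.
struct Block {
    std::uint8_t luma[4][4];
    std::uint8_t chroma_u[2][2];  // 4:2:0 subsampling assumed
    std::uint8_t chroma_v[2][2];
};

// A macroblock: a 16x16 pixel region, i.e. a 4x4 grid of 4x4 blocks.
struct Macroblock {
    Block blocks[4][4];
};

// A frame: a grid of macroblocks stored in row-major order.
struct Frame {
    int mb_rows = 0;
    int mb_cols = 0;
    std::vector<Macroblock> mbs;  // mb_rows * mb_cols entries
};

// A video sequence: a number of adjacent frames.
using VideoSequence = std::vector<Frame>;
```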

Although the embodiments are described in the context of the VP8 video coding format, alternative embodiments of the present invention can be implemented in the context of other video coding formats. Further, the embodiments are not limited to any specific video coding standard or format.

Referring to FIG. 2, in accordance with one embodiment, to encode an input video stream 16, an encoder 14 performs the following functions in a forward path (shown by the solid connection lines) to produce an encoded bitstream 26: intra/inter prediction 18, transform 19, quantization 22 and entropy encoding 24. Encoder 14 also includes a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of further macroblocks. Encoder 14 performs the following functions in the reconstruction path: dequantization 28, inverse transformation 30, reconstruction 32 and loop filtering 34. Other structural variations of encoder 14 can be used to encode bitstream 26.

When input video stream 16 is presented for encoding, each frame 17 within input video stream 16 can be processed in units of macroblocks. At intra/inter prediction stage 18, each macroblock can be encoded using either intra prediction or inter prediction mode. In the case of intra-prediction, a prediction macroblock can be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, a prediction macroblock can be formed from one or more reference frames that have already been encoded and reconstructed.

Next, still referring to FIG. 2, the prediction macroblock can be subtracted from the current macroblock to produce a residual macroblock (residual). Transform stage 19 transform codes the residual signal to coefficients and quantization stage 22 quantizes the coefficients to provide a set of quantized transformed coefficients. The quantized transformed coefficients are then entropy coded by entropy encoding stage 24. The entropy-coded coefficients, together with the information required to decode the macroblock, such as the type of prediction mode used, motion vectors and quantizer value, are output to compressed bitstream 26.
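By way of illustration only, the forward path for a single 4×4 block can be sketched as follows: subtract the prediction, transform the residual, and quantize the coefficients that entropy encoding stage 24 would then consume. A toy 4×4 Walsh-Hadamard transform stands in for the integer transform a real codec specifies, and the flat quantizer is likewise a placeholder.

```cpp
#include <array>

using Block4x4 = std::array<std::array<int, 4>, 4>;

// Toy 4x4 Walsh-Hadamard transform (Y = H * X * H^T), standing in for
// the codec's actual integer transform.
Block4x4 transform4x4(const Block4x4& x) {
    static const int H[4][4] = {
        {1, 1, 1, 1}, {1, 1, -1, -1}, {1, -1, -1, 1}, {1, -1, 1, -1}};
    Block4x4 t{}, y{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                t[i][j] += H[i][k] * x[k][j];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                y[i][j] += t[i][k] * H[j][k];
    return y;
}

// Forward path for one block: subtract prediction, transform, quantize.
Block4x4 encode_block(const Block4x4& current,
                      const Block4x4& prediction,
                      int quantizer) {
    Block4x4 residual;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            residual[r][c] = current[r][c] - prediction[r][c];

    Block4x4 coeffs = transform4x4(residual);
    for (auto& row : coeffs)          // uniform quantization; the result
        for (auto& v : row)           // is what the entropy coder sees
            v /= quantizer;
    return coeffs;
}
```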

The reconstruction path in FIG. 2 is present so that both the encoder and the decoder use the same reference frames required to decode the macroblocks. The reconstruction path, similar to functions that take place during the decoding process, which are discussed in more detail below, includes dequantizing the transformed coefficients by dequantization stage 28 and inverse transforming the coefficients by inverse transform stage 30 to produce a derivative residual macroblock (derivative residual). At the reconstruction stage 32, the prediction macroblock can be added to the derivative residual to create a reconstructed macroblock. A loop filter 34 can be applied to the reconstructed macroblock to reduce distortion.

Referring to FIG. 3, in accordance with one embodiment, to decode compressed bitstream 26, a decoder 21, similar to the reconstruction path of encoder 14 discussed previously, performs the following functions to produce an output video stream 35: entropy decoding 25, dequantization 27, inverse transformation 29, intra/inter prediction 23, reconstruction 31, loop filter 34 and deblocking filtering 33. Other structural variations of decoder 21 can be used to decode compressed bitstream 26.

When compressed bitstream 26 is presented for decoding, the data elements can be decoded by entropy decoding stage 25 to produce a set of quantized coefficients. Dequantization stage 27 dequantizes and inverse transform stage 29 inverse transforms the coefficients to produce a derivative residual that is identical to that created by the reconstruction stage in encoder 14. Using the type of prediction mode and/or motion vector information decoded from the compressed bitstream 26, at intra/inter prediction stage 23, decoder 21 creates the same prediction macroblock as was created in encoder 14. At the reconstruction stage 31, the prediction macroblock can be added to the derivative residual to create a reconstructed macroblock. The loop filter 34 can be applied to the reconstructed macroblock to reduce blocking artifacts. A deblocking filter 33 can be applied to video image frames to further reduce blocking distortion, and the result can be output to output video stream 35.

Current context-based entropy coding methods, such as Context-based Adaptive Binary Arithmetic Coding (CABAC), are limited by dependencies that exploit spatial locality by requiring macroblocks to reference neighboring macroblocks and that exploit temporal locality by requiring macroblocks to reference macroblocks from another frame. Because of these dependencies and the adaptivity, encoder 14 codes the bitstream in a sequential order using context data from neighboring macroblocks. Such sequential dependency created by encoder 14 causes the compressed bitstream 26 to be decoded in a sequential fashion by decoder 21. Such sequential decoding can be adequate when decoding using a single-core processor. On the other hand, if a multi-core processor or a multi-processor system is used during decoding, the computing power of the multi-core processor or the multi-processor system would not be effectively utilized.

Although the disclosure has described and will continue to describe embodiments of the present invention with reference to a multi-core processor and the creation of threads on the multi-core processor, embodiments of the present invention can also be implemented with other suitable computer systems, such as a device containing multiple processors.

According to one embodiment, encoder 14 divides the compressed bitstream into partitions 36 rather than a single stream of serialized data. With reference to FIG. 4 and by way of example only, the compressed bitstream can be divided into four partitions, which are designated as Data Partitions 1-4. Other numbers of partitions are also suitable. Since each partition can be the subject of a separate decoding process when decoded by decoder 21, the serialized dependency can be broken up in the compressed data without losing coding efficiency.

Referring to FIG. 4, frame 17 is shown with divided macroblock rows 38. Macroblock rows 38 consist of individual macroblocks 20. Continuing with the example, every Nth macroblock row 38 can be grouped into one of partitions 36 (where N is the total number of partitions). In this example, there are four partitions: macroblock rows 0, 4, 8, 12, etc. are grouped into partition 1; macroblock rows 1, 5, 9, 13, etc. are grouped into partition 2; macroblock rows 2, 6, 10, 14, etc. are grouped into partition 3; and macroblock rows 3, 7, 11, 15, etc. are grouped into partition 4. As a result, each partition 36 includes contiguous macroblocks, but in this instance, each partition 36 does not contain contiguous macroblock rows 38. In other words, macroblock rows of blocks in the first partition and macroblock rows in the second partition can be derived from two adjacent macroblock rows in a frame. Other grouping mechanisms are also available and are not limited to separating regions by macroblock row or grouping every Nth macroblock row into a partition. Depending on the grouping mechanism, in another example, macroblock rows that are contiguous may also be grouped into the same partition 36.
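By way of illustration only, the every-Nth-row grouping can be expressed as a simple mapping between macroblock rows and partitions. In the sketch below the function names are illustrative, and partitions are numbered from zero rather than from one as in the example above.

```cpp
#include <vector>

// Index (0-based) of the partition owning macroblock row `row`,
// with `num_partitions` partitions in total.
int partition_for_row(int row, int num_partitions) {
    return row % num_partitions;
}

// All macroblock rows carried by partition `p` in a frame of `mb_rows`
// macroblock rows.
std::vector<int> rows_in_partition(int p, int num_partitions, int mb_rows) {
    std::vector<int> rows;
    for (int r = p; r < mb_rows; r += num_partitions)
        rows.push_back(r);
    return rows;
}
```

With four partitions and sixteen macroblock rows, rows_in_partition(0, 4, 16) returns rows 0, 4, 8 and 12, matching partition 1 of the example above.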

An alternative grouping mechanism may include, for example, grouping a row of blocks from a first frame and a corresponding row of blocks in a second frame. The row of blocks from the first frame can be packed in the first partition and the corresponding row of blocks in the second frame can be packed in the second partition. A first processor can decode the row of blocks from the first frame and a second processor can decode the row of blocks from the second frame. In this manner, the decoder can decode at least one block in the second partition using information from a block that is already decoded by the first processor.

Each of the partitions 36 can be compressed using two separate encoding schemes. The first encoding scheme can be lossless encoding using, for example, context-based arithmetic coding like CABAC. Other lossless encoding techniques may also be used. Referring back to FIG. 2, the first encoding scheme may be realized by, for example, entropy encoding stage 24.

Still referring to FIG. 2, the second encoding scheme, which can take place before the first encoding scheme, may be realized by at least one of intra/inter prediction stage 18, transform stage 19, and quantization stage 22. The second encoding scheme can encode blocks in each of the partitions 36 by using information contained in other partitions. For example, if a frame is divided into two partitions, the second encoding scheme can encode the second partition using information contained in the macroblock rows of the first partition.

Referring to FIG. 5, an encoded video frame 39 from compressed bitstream 26 is shown. For simplicity, only parts of the bitstream that are pertinent to embodiments of the invention are shown. Encoded video frame 39 contains a video frame header 44 which contains bits for a number of partitions 40 and bits for offsets of each partition 42. Encoded video frame 39 also includes the encoded data from data partitions 36 illustrated as P1-PN where, as discussed previously, N is the total number of partitions in video frame 17.

Once encoder 14 has divided frame 17 into partitions 36, encoder 14 writes data into video frame header 44 to indicate number of partitions 40 and offsets of each partition 42. Number of partitions 40 and offsets of each partition 42 can be represented in frame 17 by a bit, a byte or any other record that can relay the specific information to decoder 21. Decoder 21 reads the number of data partitions 40 from video frame header 44 in order to decode the compressed data. One or more bits can be used to indicate the number of data partitions (or partition count); in one example, two bits are used, although other coding schemes can also be used to code the number of partitions into the bitstream. The following table indicates how two bits can represent the number of partitions:

BIT 1    BIT 2    NUMBER OF PARTITIONS
0        0        One partition
0        1        Two partitions
1        0        Four partitions
1        1        Eight partitions

If the number of data partitions is greater than one, decoder 21 also needs information about the positions of the data partitions 36 within the compressed bitstream 26. The offsets of each partition 42 (also referred to as partition location offsets) enable direct access to each partition during decoding.

In one example, the offset of each partition 42 can be relative to the beginning of the bitstream and can be encoded and written into the bitstream 26. In another example, the offset for each data partition can be encoded and written into the bitstream except for the first partition, since the first partition implicitly begins in the bitstream 26 after the offsets of each partition 42. The foregoing is merely exemplary. Other suitable data structures, flags or records, such as words and bytes, can be used to transmit partition count and partition location offset information.
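By way of illustration only, a decoder might recover the partition count and the partition location offsets along the following lines. The two-bit code follows the table above; the three-byte little-endian offset fields, and the placement of the count in the first header byte, are assumptions made for this sketch and do not reproduce any actual bitstream layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PartitionInfo {
    int count = 0;                     // number of data partitions 40
    std::vector<std::size_t> offsets;  // offsets of each partition 42
};

PartitionInfo read_partition_info(const std::uint8_t* header) {
    PartitionInfo info;
    int code = header[0] & 0x3;  // two-bit partition-count code
    info.count = 1 << code;      // 00->1, 01->2, 10->4, 11->8 partitions

    // The first partition starts implicitly after the header (second
    // variant described above), so only count-1 offsets are coded.
    std::size_t pos = 1;
    for (int p = 1; p < info.count; ++p) {
        std::size_t off = header[pos] |
                          (header[pos + 1] << 8) |
                          (header[pos + 2] << 16);
        info.offsets.push_back(off);
        pos += 3;
    }
    return info;
}
```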

Although the number of data partitions can be the same for each frame 17 throughout the input video sequence 16, the number of data partitions may also differ from frame to frame, in which case each frame 17 would have its own number of partitions 40. The number of bits used to represent the number of partitions may likewise differ from frame to frame. Accordingly, each frame 17 could be divided into varying numbers of partitions.

Once the data has been compressed into bitstream 26 with the proper partition data information (i.e. number of partitions 40 and offsets of partitions 42), decoder 21 can decode the data partitions 36 on a multi-core processor in parallel. In this manner, each processor core may be responsible for decoding one of the data partitions 36. Since multi-core processors typically have more than one processing core and shared memory space, the workload can be allocated between each core as evenly as possible. Each core can use the shared memory space as an efficient way of sharing data between the cores decoding the data partitions 36.

For example, if there are two processors decoding two partitions, respectively, the first processor will begin decoding the first partition. The second processor can then decode macroblocks of the second partition and can use information received from the first processor, which has begun decoding macroblocks of the first partition. Concurrently with the second processor, the first processor can continue decoding macroblocks of the first partition and can use information received from the second processor. Accordingly, both the first and second processors can have the information necessary to properly decode macroblocks in their respective partitions.

Furthermore, as discussed in more detail below, when decoding a macroblock row of the second partition that is dependent on the first partition, a macroblock that is currently being processed in the second partition is offset by a specified number of macroblocks. In this manner, at least a portion of the output of the decoding of the first partition can be used as input in the decoding of the macroblock that is currently being processed in the second partition. Likewise, when decoding a macroblock row of the first partition that is dependent on the second partition, a macroblock that is currently being processed in the first partition is offset by a specified number of macroblocks so that at least a portion of the output of the decoding of the second partition can be used as input in the decoding of the macroblock that is currently being processed in the first partition.

When decoding the compressed bitstream, decoder 21 determines the number of threads needed to decode the data, which can be based on the number of partitions 40 in each encoded frame 39. For example, if number of partitions 40 indicates that there are four partitions in encoded frame 39, decoder 21 creates four threads with each thread decoding one of the data partitions. Referring to FIG. 4, as an example, decoder 21 can determine that four data partitions have been created. Hence, if decoder 21 is using a multi-core processor, it can create four separate threads to decode the data from that specific frame.
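By way of illustration only, the thread-per-partition arrangement might look as follows, with decode_partition standing in for the per-partition decoding loop described in connection with FIGS. 6A and 6B below.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

// Placeholder for the real work: entropy-decode, dequantize, inverse
// transform, predict and reconstruct every macroblock row the
// partition owns.
void decode_partition(int partition_index) {
    std::printf("decoding partition %d\n", partition_index);
}

// Spawn one decoding thread per data partition, as signaled by the
// number-of-partitions field in the frame header.
void decode_frame_parallel(int num_partitions) {
    std::vector<std::thread> threads;
    threads.reserve(num_partitions);
    for (int p = 0; p < num_partitions; ++p)
        threads.emplace_back(decode_partition, p);
    for (auto& t : threads)
        t.join();  // the frame is complete once every thread finishes
}
```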

As discussed previously, macroblocks 20 within each frame use context data from neighboring macroblocks when being encoded. When decoding macroblocks 20, the decoder will need the same context data in order to decode the macroblocks properly. On the decoder side, the context data can be available only after the neighboring macroblocks have already been decoded by the current thread or other threads. In order to decode properly, the decoder includes a staging and synchronization mechanism for managing the decoding of the multiple threads.

With reference to FIGS. 6A and 6B, timing diagrams show the staging and synchronization mechanism used to decode partitions 36 on threads of a multi-core processor in accordance with an embodiment of the present invention. FIGS. 6A and 6B illustrate an exemplary partial image frame 45 at various stages of the decoding process. The example is simplified for purposes of this disclosure and the number of partitions 36 is limited to three. Each partition 36 can be assigned to one of the three threads 46, 48 and 50. As discussed previously, each partition 36 includes contiguous macroblocks.

As depicted in FIGS. 6A and 6B, as an example, three threads 46, 48 and 50 are shown, and each of threads 46, 48 and 50 is capable of performing decoding in parallel with the others. Each of the three threads 46, 48 and 50 processes one partition in a serial manner while all three partitions 36 are processed in parallel with each other.

Each of FIGS. 6A and 6B contain an arrow that illustrates which macroblock is currently being decoded in each macroblock row, which macroblocks have been decoded in each macroblock row, and which macroblocks have yet to be decoded in each macroblock row. If the arrow is pointing to a specific macroblock, that macroblock is currently being decoded. Any macroblock to the left of the arrow (if any) has already been decoded in that row. Any macroblock to the right of the arrow has yet to be decoded. Although the macroblocks illustrated in FIGS. 6A and 6B all have similar sizes, the techniques of this disclosure are not limited in this respect. Other block sizes, as discussed previously, can also be used with embodiments of the present invention.

Referring to FIG. 6A, at time t1, thread 46 has initiated decoding of a first macroblock row 52. Thread 46 is currently processing macroblock j in first macroblock row 52 as shown by arrow 58. Macroblocks 0 to j−1 have already been decoded in first macroblock row 52. Macroblocks j+1 to the end of first macroblock row 52 have yet to be decoded. Thread 48 has also initiated decoding of a second macroblock row 54. Thread 48 is currently processing macroblock 0 in second macroblock row 54 as shown by arrow 60. Macroblocks 1 to the end of second macroblock row 54 have yet to be decoded. Thread 50 has not begun decoding of a third macroblock row 56. No macroblocks have been decoded or are currently being decoded in third macroblock row 56.

Referring to FIG. 6B, at time t2, thread 46 has continued decoding of first macroblock row 52. Thread 46 is currently processing macroblock j*2 in first macroblock row 52 as shown by arrow 62. Macroblocks 0 to j*2-1 have already been decoded in first macroblock row 52. Macroblocks j*2+1 to the end of first macroblock row 52 have yet to be decoded in first macroblock row 52. Thread 48 has also continued decoding of second macroblock row 54. Thread 48 is currently processing macroblock j in second macroblock row 54 as shown by arrow 64. Macroblocks 0 to j−1 have already been decoded in second macroblock row 54. Macroblocks j+1 to the end of second macroblock row 54 have yet to be decoded in second macroblock row 54. Thread 50 has also initiated decoding of a third macroblock row 56. Thread 50 is currently processing macroblock 0 in third macroblock row 56 as shown by arrow 66. Macroblocks 1 to the end of third macroblock row 56 have yet to be decoded in third macroblock row 56.

Previous decoding mechanisms were unable to efficiently use a multi-core processor to decode a compressed bitstream because processing of a macroblock row could not be initiated until the upper adjacent macroblock row had been completely decoded. The difficulty of previous decoding mechanisms stems from the encoding phase. When data is encoded using traditional encoding techniques, spatial dependencies within macroblocks imply a specific order of processing of the macroblocks. Furthermore, once the frame has been encoded, a specific macroblock row cannot be discerned until the row has been completely decoded. Accordingly, video coding methods incorporating entropy coding methods such as CABAC created serialized dependencies which were passed to the decoder. As a result of these serialized dependencies, decoding schemes had limited efficiency because information for each computer processing system (e.g. threads 46, 48 and 50) was not available until the decoding process had been completed on that macroblock row.

Utilizing the parallel processing staging and synchronization mechanism illustrated in FIGS. 6A and 6B allows decoder 21 to efficiently accelerate the decoding process of image frames. Because each partition 36 can be subject to a separate decoding process, interdependencies between partitions can be managed by embodiments of the staging and synchronization scheme discussed previously in connection with FIGS. 6A and 6B. Using this staging and synchronization decoding scheme, each thread 46, 48 and 50 that decodes an assigned partition can exploit context data from neighboring macroblocks. Thus, decoder 21 can decode macroblocks that contain context data necessary to decode a current macroblock before the preceding macroblock row has been completely decoded.
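By way of illustration only, the staging and synchronization mechanism can be sketched with one shared progress counter per macroblock row: a thread decoding macroblock (row, col) waits until the row above has been decoded past column col plus offset j. The busy-wait below keeps the sketch short; a production decoder would block on a condition variable or similar primitive.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>

// progress[r] counts the macroblocks of row r already decoded and is
// shared by all partition-decoding threads.
void decode_row(std::atomic<int>* progress, int row, int mb_cols,
                int offset) {
    for (int col = 0; col < mb_cols; ++col) {
        if (row > 0) {
            // The context for (row, col) extends to (row-1, col+offset),
            // so those macroblocks must be decoded first.
            int needed = std::min(col + offset + 1, mb_cols);
            while (progress[row - 1].load(std::memory_order_acquire) <
                   needed)
                std::this_thread::yield();
        }
        // ... entropy-decode and reconstruct macroblock (row, col) ...
        progress[row].store(col + 1, std::memory_order_release);
    }
}
```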

Referring again to FIGS. 6A and 6B, offset j can be determined by examining the size of the context data used in the preceding macroblock row (e.g. measured in a number of macroblocks) during the encoding process. Offset j can be represented in frame 17 by a bit, a byte or any other record that can relay the size of the context data to decoder 21. FIGS. 7A and 7B illustrate two alternatives for the size of offset j.

Referring to FIG. 7A, in one embodiment, current macroblock 68 is currently being processed. Current macroblock 68 uses context data from the left, top-left, top and top-right macroblocks during encoding. In other words, current macroblock 68 uses information from macroblocks (r+1, c−1), (r, c−1), (r, c) and (r, c+1). In order to properly decode current macroblock 68, macroblocks (r+1, c−1), (r, c−1), (r, c) and (r, c+1) should be decoded before current macroblock 68. Since, as discussed previously, decoding of macroblocks can be performed in a serial fashion, macroblock (r+1, c−1) can be decoded before current macroblock 68. Further, in the preceding macroblock row (i.e. macroblock row r), since the encoding process uses (r, c+1) as the rightmost macroblock, the decoder can use (r, c+1) as the rightmost macroblock during decoding as well. Thus, offset j can be determined by subtracting the column position of the current macroblock being processed from the column position of the rightmost macroblock of the preceding row used during encoding of the current macroblock. In FIG. 7A, offset j would be determined by subtracting the column position of current macroblock 68 (i.e. (r+1, c)) from the column position of macroblock (r, c+1), or c+1−c, giving rise to an offset of 1.

Referring to FIG. 7B, in one embodiment, current macroblock 68′ is currently being processed. Current macroblock 68′ uses information from macroblocks (r+1, c−1), (r, c−1), (r, c), (r, c+1), (r, c+2) and (r, c+3). In order to properly decode current macroblock 68′, macroblocks (r+1, c−1), (r, c−1), (r, c), (r, c+1), (r, c+2) and (r, c+3) should be decoded before current macroblock 68′. Since, as discussed previously, decoding of macroblocks can be performed in a serial fashion, macroblock (r+1, c−1) can be decoded before current macroblock 68′. Further, in the preceding macroblock row (i.e. macroblock row r), since the encoding process uses (r, c+3) as the rightmost macroblock, the decoder can use (r, c+3) as the rightmost macroblock during decoding as well. As discussed previously, offset j can be determined by subtracting the column position of the current macroblock being processed from the column position of the rightmost macroblock of the preceding row used during encoding. In FIG. 7B, offset j would be calculated by subtracting the column position of current macroblock 68′ (i.e. (r+1, c)) from the column position of macroblock (r, c+3), or c+3−c, giving rise to an offset of 3.
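By way of illustration only, the offset calculation of FIGS. 7A and 7B reduces to a subtraction of column positions (the function name is illustrative):

```cpp
// Offset j: column of the rightmost preceding-row macroblock used as
// context, minus the column of the current macroblock.
int context_offset(int rightmost_context_col, int current_col) {
    return rightmost_context_col - current_col;
}
// FIG. 7A: context extends to (r, c+1) -> context_offset(c+1, c) == 1
// FIG. 7B: context extends to (r, c+3) -> context_offset(c+3, c) == 3
```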

In the preferred embodiment, the offset can be determined by the specific requirements of the codec. In alternative embodiments, the offset can be specified in the bitstream.

While the invention has been described in connection with certain embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures as is permitted under the law.

Claims

1. An apparatus for decoding frames of a compressed video data stream, including at least one frame divided into partitions, the apparatus comprising:

a memory; and
a processor configured to execute instructions stored in the memory to:
read, from the compressed video data stream, partition data information indicative of a partition location with respect to the compressed video data stream for at least one of the partitions;
decode, from the compressed video data stream, a first partition of the partitions that includes a first sequence of blocks; and
decode, from the compressed video data stream, a second partition of the partitions that includes a second sequence of blocks identified based on the partition location indicated by the partition data information, using decoded information of the first partition;
wherein the first partition and the second partition have each been individually compressed; and
output or store a decoded frame including the first sequence of blocks and the second sequence of blocks.

2. The apparatus of claim 1, wherein the decoding includes decoding lossless coded information.

3. The apparatus of claim 1, wherein the processor is further configured to identify one row of blocks in the frame having a boundary between the first partition and the second partition.

4. The apparatus of claim 1, wherein the first partition includes contiguous blocks in at least a portion of a first row and at least a portion of a first subsequent row.

5. The apparatus of claim 4, wherein the second partition includes contiguous blocks in at least a portion of a second row and at least a portion of a second subsequent row.

6. The apparatus of claim 1, wherein the first partition comprises blocks of two or more contiguous rows of blocks.

7. The apparatus of claim 1, wherein the processor is further configured to decode the second sequence of blocks using context information contained in the first sequence of blocks.

8. The apparatus of claim 7, wherein the processor is further configured to perform at least one of intra-frame prediction or inter-frame prediction on the second partition.

9. The apparatus of claim 7, wherein the processor decodes the second partition with an offset of a specified number of blocks such that at least a portion of decoded context information of the first sequence of blocks is available as context data for decoding at least one block in the second partition.

10. The apparatus of claim 9, wherein the offset is determined based upon size of the context.

11. The apparatus of claim 7, wherein the first sequence of blocks in the first partition and the second sequence of blocks in the second partition are derived from corresponding rows of blocks in two successive frames of the video data, and wherein the processor is further configured to:

decode at least one block in the second partition using information from a block previously decoded.

12. The apparatus of claim 1, wherein the decoding includes context-based arithmetic coding.

13. A non-transitory computer-readable storage medium having stored thereon an encoded bitstream, including at least one frame divided into individually compressed partitions, wherein the encoded bitstream is configured for decoding by operations comprising:

reading, from the encoded bitstream, partition data information indicative of a partition location with respect to the encoded bitstream for at least one of the partitions;
decoding, from the encoded bitstream, a first partition of the partitions that includes a first sequence of blocks;
decoding, from the encoded bitstream, a second partition of the partitions that includes a second sequence of blocks identified based on the partition location indicated by the partition data information, using decoded information of the first partition; and
outputting or storing a decoded frame including the first sequence of blocks and the second sequence of blocks.

14. The non-transitory computer-readable storage medium of claim 13, wherein the decoding includes decoding losslessly coded information.

15. The non-transitory computer-readable storage medium of claim 13, wherein the decoding includes decoding the second sequence of blocks using context information contained in the first sequence of blocks.

16. The non-transitory computer-readable storage medium of claim 15, wherein the decoding includes decoding the second partition with an offset of a specified number of blocks such that at least a portion of decoded context information of the first sequence of blocks is available as context data for decoding at least one block in the second partition.

17. The non-transitory computer-readable storage medium of claim 16, wherein the offset is determined based upon size of the context.

18. The non-transitory computer-readable storage medium of claim 15, wherein the first sequence of blocks in the first partition and the second sequence of blocks in the second partition are derived from corresponding rows of blocks in two successive frames of the video data, and wherein the decoding includes:

decoding at least one block in the second partition using information from a block previously decoded.

19. The non-transitory computer-readable storage medium of claim 13, wherein the decoding includes context-based arithmetic coding.

20. An apparatus for encoding frames of a video, including at least one frame divided into partitions, the apparatus comprising:

a memory; and
a processor configured to execute instructions stored in the memory to:
encode, into a compressed video data stream, a first partition of the partitions that includes a first sequence of blocks;
encode, into the compressed video data stream, at a partition location with respect to the compressed video data stream, a second partition of the partitions that includes a second sequence of blocks encoded using information of the first partition;
include, in the compressed video data stream, partition data information indicative of the partition location with respect to the compressed video data stream for at least one of the partitions; and
output or store the compressed video data stream.

21. The apparatus of claim 20, wherein the processor is configured to execute the instructions to use context information from the first sequence of blocks to encode the second sequence of blocks.

22. The apparatus of claim 21, wherein the processor is configured to execute the instructions to encode the second partition with an offset of a specified number of blocks such that at least a portion of context information from the first sequence of blocks is available as context data for encoding at least one block in the second partition.

23. The apparatus of claim 22, wherein the offset is determined based upon size of the context.

24. The apparatus of claim 22, wherein the first sequence of blocks in the first partition and the second sequence of blocks in the second partition are derived from corresponding rows of blocks in two successive frames of the video data, and the processor is configured to execute the instructions to:

use information from a block previously decoded to encode at least one block in the second partition.

25. The apparatus of claim 24, wherein to encode the processor is configured to execute the instructions to use context-based arithmetic coding.

Referenced Cited
U.S. Patent Documents
3825832 July 1974 Frei et al.
4719642 January 12, 1988 Lucas
4729127 March 1, 1988 Chan et al.
4736446 April 5, 1988 Reynolds et al.
4797729 January 10, 1989 Tsai
4868764 September 19, 1989 Richards
4891748 January 2, 1990 Mann
5068724 November 26, 1991 Krause et al.
5083214 January 21, 1992 Knowles
5091782 February 25, 1992 Krause et al.
5136371 August 4, 1992 Savatier et al.
5136376 August 4, 1992 Yagasaki et al.
5164819 November 17, 1992 Music
5225832 July 6, 1993 Wang et al.
5270812 December 14, 1993 Richards
5274442 December 28, 1993 Murakami et al.
5313306 May 17, 1994 Kuban et al.
5341440 August 23, 1994 Earl et al.
5381145 January 10, 1995 Allen et al.
5432870 July 11, 1995 Schwartz
5452006 September 19, 1995 Auld
5561477 October 1, 1996 Polit
5576765 November 19, 1996 Cheney et al.
5576767 November 19, 1996 Lee et al.
5589945 December 31, 1996 Abecassis
5604539 February 18, 1997 Ogasawara et al.
5646690 July 8, 1997 Yoon
5659539 August 19, 1997 Porter et al.
5696869 December 9, 1997 Abecassis
5734744 March 31, 1998 Wittenstein et al.
5737020 April 7, 1998 Hall et al.
5748247 May 5, 1998 Hu
5774593 June 30, 1998 Zick et al.
5793647 August 11, 1998 Hageniers et al.
5794179 August 11, 1998 Yamabe
5818530 October 6, 1998 Canfield
5818969 October 6, 1998 Astle
5828370 October 27, 1998 Moeller et al.
5835144 November 10, 1998 Matsumura et al.
5883671 March 16, 1999 Keng et al.
5903264 May 11, 1999 Moeller et al.
5929940 July 27, 1999 Jeannin
5930493 July 27, 1999 Ottesen et al.
5963203 October 5, 1999 Goldberg et al.
5999641 December 7, 1999 Miller et al.
6014706 January 11, 2000 Cannon et al.
6041145 March 21, 2000 Hayashi et al.
6061397 May 9, 2000 Ogura
6084908 July 4, 2000 Chiang et al.
6108383 August 22, 2000 Miller et al.
6112234 August 29, 2000 Leiper
6115501 September 5, 2000 Chun et al.
6119154 September 12, 2000 Weaver et al.
6141381 October 31, 2000 Sugiyama
6160846 December 12, 2000 Chiang et al.
6167164 December 26, 2000 Lee
6181742 January 30, 2001 Rajagopalan et al.
6181822 January 30, 2001 Miller et al.
6185363 February 6, 2001 Dimitrova et al.
6188799 February 13, 2001 Tan et al.
6240135 May 29, 2001 Kim
6292837 September 18, 2001 Miller et al.
6327304 December 4, 2001 Miller et al.
6366704 April 2, 2002 Ribas-Corbera et al.
6370267 April 9, 2002 Miller et al.
6400763 June 4, 2002 Wee
6496537 December 17, 2002 Kranawetter
6522784 February 18, 2003 Zlotnick
6529638 March 4, 2003 Westerman
6560366 May 6, 2003 Wilkins
6594315 July 15, 2003 Schultz et al.
6687303 February 3, 2004 Ishihara
6697061 February 24, 2004 Wee et al.
6707952 March 16, 2004 Tan et al.
6765964 July 20, 2004 Conklin
6876703 April 5, 2005 Ismaeil et al.
6934419 August 23, 2005 Zlotnick
6985526 January 10, 2006 Bottreau et al.
6987866 January 17, 2006 Hu
7003035 February 21, 2006 Tourapis et al.
7023916 April 4, 2006 Pandel et al.
7027654 April 11, 2006 Ameres et al.
7170937 January 30, 2007 Zhou
7227589 June 5, 2007 Yeo et al.
7236524 June 26, 2007 Sun et al.
7330509 February 12, 2008 Lu et al.
7499492 March 3, 2009 Ameres et al.
7606310 October 20, 2009 Ameres et al.
7764739 July 27, 2010 Yamada et al.
7813570 October 12, 2010 Shen et al.
8175161 May 8, 2012 Anisimov
8213518 July 3, 2012 Wang
8265144 September 11, 2012 Christoffersen
8401084 March 19, 2013 MacInnis
8520734 August 27, 2013 Xu
8743979 June 3, 2014 Lee et al.
8767817 July 1, 2014 Xu et al.
8948267 February 3, 2015 Khan
9100509 August 4, 2015 Jia et al.
9100657 August 4, 2015 Jia et al.
20020012396 January 31, 2002 Pau et al.
20020031184 March 14, 2002 Iwata
20020039386 April 4, 2002 Han et al.
20020168114 November 14, 2002 Valente
20030023982 January 30, 2003 Lee et al.
20030189982 October 9, 2003 MacInnis
20030215018 November 20, 2003 MacInnis et al.
20030219072 November 27, 2003 MacInnis
20040028142 February 12, 2004 Kim
20040066852 April 8, 2004 MacInnis
20040120400 June 24, 2004 Linzer
20040228410 November 18, 2004 Ameres
20040240556 December 2, 2004 Winger et al.
20040258151 December 23, 2004 Spampinato
20050050002 March 3, 2005 Slotznick
20050053157 March 10, 2005 Lillevold
20050117655 June 2, 2005 Ju
20050147165 July 7, 2005 Yoo et al.
20050169374 August 4, 2005 Marpe et al.
20050210145 September 22, 2005 Kim et al.
20050259747 November 24, 2005 Schumann
20050265447 December 1, 2005 Park
20050265461 December 1, 2005 Raveendran
20050276323 December 15, 2005 Martemyanov et al.
20060072674 April 6, 2006 Saha et al.
20060098737 May 11, 2006 Sethuraman et al.
20060109912 May 25, 2006 Winger et al.
20060114985 June 1, 2006 Linzer
20060126726 June 15, 2006 Lin et al.
20060126740 June 15, 2006 Lin et al.
20060150151 July 6, 2006 Dubinsky
20060215758 September 28, 2006 Kawashima
20060239345 October 26, 2006 Taubman et al.
20060256858 November 16, 2006 Chin
20060291567 December 28, 2006 Filippini et al.
20070025441 February 1, 2007 Ugur et al.
20070053443 March 8, 2007 Song
20070086528 April 19, 2007 Mauchly et al.
20070092006 April 26, 2007 Malayath
20070140342 June 21, 2007 Karczewicz et al.
20070229704 October 4, 2007 Mohapatra et al.
20070286288 December 13, 2007 Smith et al.
20080056348 March 6, 2008 Lyashevsky et al.
20080152014 June 26, 2008 Schreier et al.
20080159407 July 3, 2008 Yang et al.
20080198270 August 21, 2008 Hobbs et al.
20080198920 August 21, 2008 Yang et al.
20080212678 September 4, 2008 Booth et al.
20080215317 September 4, 2008 Fejzo
20080240254 October 2, 2008 Au
20080267295 October 30, 2008 Sung
20080298469 December 4, 2008 Liu
20080317364 December 25, 2008 Gou
20090002379 January 1, 2009 Baeza
20090003447 January 1, 2009 Christoffersen et al.
20090052529 February 26, 2009 Kim
20090080534 March 26, 2009 Sekiguchi et al.
20090168893 July 2, 2009 Schlanger
20090225845 September 10, 2009 Veremeev et al.
20090238277 September 24, 2009 Meehan
20090245349 October 1, 2009 Zhao
20090249178 October 1, 2009 Ambrosino et al.
20100061455 March 11, 2010 Xu et al.
20100158109 June 24, 2010 Dahlby
20100177826 July 15, 2010 Bhaumik et al.
20100183076 July 22, 2010 Yoon
20100189179 July 29, 2010 Gu et al.
20100215263 August 26, 2010 Imanaka
20100239181 September 23, 2010 Lee et al.
20100246665 September 30, 2010 Brederson et al.
20100316132 December 16, 2010 MacInnis
20110261884 October 27, 2011 Rubinstein et al.
20120014451 January 19, 2012 Lee et al.
20120128069 May 24, 2012 Sato
20120147958 June 14, 2012 Ronca et al.
20120213448 August 23, 2012 Malmborg et al.
20120294376 November 22, 2012 Tanaka et al.
20120307892 December 6, 2012 Xu et al.
20130034150 February 7, 2013 Sadafale
20130083161 April 4, 2013 Hsu et al.
20130259137 October 3, 2013 Kuusela
20150043645 February 12, 2015 Ventela
20150326888 November 12, 2015 Jia et al.
20170188024 June 29, 2017 Wang
Foreign Patent Documents
3510433 March 2004 JP
2007-166625 June 2007 JP
2008/020470 February 2008 WO
2008/036237 March 2008 WO
2010/063184 June 2010 WO
Other references
  • Armando J. Pinho, “Encoding of Image Partitions Using a Standard Technique for Lossless Image Compression,” Dep. Electrónica e Telecomunicacoes/ INESC Universidade de Aveiro, Portugal (IEEE 1999), 4 pp.
  • B. Vasudev & N. Merhav, “DCT Mode Conversions for Field/Frame Coded MPEG Video”, IEEE 2d Workshop on Multimedia Signal Processing 605-610 (Dec. 1998).
  • Bankoski et al. “Technical Overview of VP8, an Open Source Video Codec for the Web”. Dated Jul. 11, 2011.
  • Bankoski et al., “VP8 Data Format and Decoding Guide”, Independent Submission RFC 6389, Nov. 2011, 305 pp.
  • Bankoski et al., “VP8 Data Format and Decoding Guide draft-bankoski-vp8-bitstream-02”, Network Working Group, Internet-Draft, May 18, 2011, 288 pp.
  • Fore June “An Introduction to Digital Video Data Compression in Java”, Chapter 12: DPCM video Codec, CreateSpace, Jan. 22, 2011.
  • International Search Report and Written Opinion Issued in co-pending PCT International Application No. PCT/US2013/034581 dated Jun. 11, 2013.
  • Li E Q et al., “Implementation of H.264 encoder on general-purpose processors with hyper-threading technology”, Proceedings of SPIE, pp. 384-395, vol. 5308, No. 1, Jan. 20, 2004.
  • “Introduction to Video Coding Part 1: Transform Coding”, Mozilla, Mar. 2012, 171 pp.
  • On2 Technologies Inc., White Paper TrueMotion VP7 Video Codec, Jan. 10, 2005, 13 pages, Document Version: 1.0, Clifton Park, New York.
  • On2 Technologies, Inc., White Paper On2's TrueMotion VP7 Video Codec, Jul. 11, 2008, 7 pages, Document Version: 1.0, Clifton Park, New York.
  • “Overview VP7 Data Format and Decoder”, Version 1.5, On2 Technologies, Inc., Mar. 28, 2005, 65 pp.
  • Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services—Coding of moving video, Amendment 2: New profiles for professional applications, International Telecommunication Union, Apr. 2007, 75 pp.
  • Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services—Coding of moving video, Advanced video coding for generic audiovisual services, Amendment 1: Support of additional colour spaces and removal of the High 4:4:4 Profile, International Telecommunication Union, Jun. 2006, 16 pp.
  • Series H: Audiovisual and Multimedia Systems, Infrastructure of audiovisual services—Coding of moving video, Advanced video coding for generic audiovisual services, Version 3, International Telecommunication Union, Mar. 2005, 343 pp.
  • Sharp, “Entropy slices for parallel entropy decoding”, ITU-T SG16 Meeting, Apr. 22, 2008-Feb. 5, 2008, Geneva.
  • Sze, “Massively Parallel CABAC”, VCEG meeting, Jan. 7, 2009, London and MPEG meeting, Aug. 7, 2009, Geneva.
  • Chen, T, Y.H. Ng; Lossless Color Image Compression Technique for Multimedia Applications; IBM Technical Disclosure Bulletin; vol. 37 No. 10, Oct. 1994.
  • Tasdizen, et al.; “A High Performance Reconfigurable Motion Estimation Hardware Architecture”, Design, Automation & Test in Europe Conference & Exhibition, Apr. 20, 2009, IEEE, Piscataway, NJ, US pp. 882-885.
  • Vos, Luc De and Stegherr, Michael; “Parameterizable VLSI Architectures for the Full-Search Block-Matching Algorithm”, IEEE Transactions on Circuits and Systems, vol. 36, No. 10, Oct. 1989 NY US pp. 1309-1316.
  • VP6 Bitstream and Decoder Specification, Version 1.03, (On2 Technologies, Inc.), Dated Oct. 29, 2007.
  • Wiegand et al., "Overview of the H.264/AVC Video Coding Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, No. 7, pp. 568, 569, Jul. 1, 2003.
  • Yao Wang, “Motion Estimation for Video coding”, EE4414: Motion Estimation basics, 2005.
  • Youn et al., “Motion Vector Refinement for high-performance Transcoding” IEEE Transactions on Multimedia, vol. 1, No. 1, Mar. 1999.
Patent History
Patent number: RE49727
Type: Grant
Filed: Mar 12, 2021
Date of Patent: Nov 14, 2023
Inventors: Yaowu Xu (Saratoga, CA), Paul Wilkins (Cambridge), James Bankoski (Los Gatos, CA)
Primary Examiner: Ovidio Escalante
Application Number: 17/200,761
Classifications
Current U.S. Class: Input Video Signal Characteristics (epo) (375/E7.161)
International Classification: H04N 19/61 (20140101); H04N 19/91 (20140101); H04N 19/82 (20140101); H04N 19/17 (20140101); H04N 19/593 (20140101); H04N 19/44 (20140101); H04N 19/174 (20140101); H04N 19/176 (20140101); H04N 19/436 (20140101); H04N 19/51 (20140101);