APPARATUS AND METHOD FOR COMPRESSING VIDEO INFORMATION
A method and apparatus is disclosed for efficiently encoding data representing a video image, thereby reducing the amount of data that must be transferred to a decoder. The method includes transforming data sets utilizing a tensor product wavelet transform which is capable of transmitting remainders from one subband to another. Collections of subbands, in macro-block form, are weighted, detected, and ranked enabling prioritization of the transformed data. A motion compensation technique is performed on the subband data producing motion vectors and prediction errors which are positionally encoded into bit stream packets for transmittal to the decoder. Subband macro-blocks and subband blocks which are equal to zero are identified as such in the bit stream packets to further reduce the amount of data that must be transferred to the decoder.
This application is a continuation of U.S. patent application Ser. No. 09/529,849, filed Apr. 19, 2000 (allowed), which is a 371 application of PCT/US98/24189, filed Nov. 13, 1998, which claims priority to U.S. Provisional Application Ser. No. 60/066,638, filed Nov. 14, 1997 (expired), which disclosures are herein incorporated by reference in their entirety.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to apparatus and methods for encoding and decoding video information. More particularly, the present invention relates to an apparatus and method for motion estimation and motion prediction in the transform domain.
2. Background of the Related Art
Due to the limited bandwidth available on transmission channels, only a limited number of bits are available to encode audio and video information. Video encoding techniques attempt to encode video information with as few bits as possible, while still maintaining the image quality required for a given application. Thus, video compression techniques attempt to reduce the bandwidth required to transmit a video signal by removing redundant information and representing the remaining information with a minimum number of bits, from which an approximation to the original image can be reconstructed, with a minimal loss of important features. In this manner, the compressed data can be stored or transmitted in a more efficient manner than the original image data.
There are a number of video encoding techniques which improve coding efficiency by removing statistical redundancy from video signals. Many standard image compression schemes are based on block transforms of the input image such as the Discrete Cosine Transform (DCT). The well-known MPEG video encoding technique, for example, developed by the Moving Picture Experts Group, achieves significant bit rate reductions by taking advantage of the correlation between pixels (pels) in the spatial domain (through the use of the DCT), and the correlation between image frames in the time domain (through the use of prediction and motion compensation).
In well-known orthogonal and bi-orthogonal (subband) transform based encoding systems (inclusive of lapped orthogonal transforms), an image is transformed without the necessity of first blocking the image. Transform encoders based on DCT block the image primarily for two reasons: 1) experience has shown that the DCT is a good approximation to the known optimal transform (Karhunen-Loève) on 8×8 regions of the image or a sequence of difference images; and 2) the processing cost of the DCT grows as O(N log N), and through the blocking of the image, computational effort is limited.
The end result is that DCT based approaches, unless otherwise enhanced, have basis functions which are compactly supported by (or zero outside of) an 8×8 region of an image. The orthogonal and bi-orthogonal transforms under consideration have basis members which are predominately supported in a finite interval of the image, but share extent with neighboring spatial regions. Subband image encoding techniques, for example, divide an input image into a plurality of spatial frequency bands, using a set of filters and then quantize each band or channel. For a detailed discussion of subband image encoding techniques see Subband Video Coding With Dynamic Bit Allocation and Geometric Vector Quantization, C. Podilchuck & A. Jaquin, SPIE Vol. 1666 Human Vision, Visual Processing, and Digital Display III, pp. 241-52 (February 1992). At each stage of the subband encoding process, the signal is split into a low pass approximation of the image, and a high pass term representing the detail lost by making the approximation.
In addition, DCT based transform encoders are translation invariant in the sense that the base members have a support which extends over the entire 8×8 block. This prevents motion compensation from being done efficiently in the transform domain. Therefore, most of the motion compensation techniques in use utilize temporally adjacent image frames to form an error term which is then transform coded on an 8×8 block. As a consequence, these techniques require an inverse transform to be carried out to supply a reference frame from the frequency domain to the time domain. Examples of such systems are found in U.S. Pat. No. 5,481,553 to Suzuki et al and U.S. Pat. No. 5,025,482 to Murakami et al.
Most video compression techniques based on DCT or subband encoders have focused on high precision techniques that attempt to encode video information without a loss of accuracy in the transform stage. Such high precision encoding techniques, however, rely on relatively expensive microprocessors, such as Intel Corporation's PENTIUM® processor, which have dedicated hardware to aid in the manipulation of floating point arithmetic and thereby reduce the penalty for maintaining a high degree of precision.
For many applications, however, such relatively expensive hardware is not practical or justified. Thus, a lower cost implementation, which also maintains acceptable image quality levels, is required. Known limited precision transforms that may be implemented on lower-cost hardware, however, tend to exhibit reduced accuracy as a result of the “lossy” nature of the encoding process. As used herein, a “lossy” system refers to a system that loses precision through the various stages of the encoder and thereby lacks the ability to substantially reconstruct the input from the transform coefficients when decoding. The inability to compensate for the reduced accuracy exhibited by these low precision transforms has been an impediment to the use of such transforms.
In view of the foregoing, there is a need for a video encoder that performs the motion compensation in the transform domain, thereby eliminating the requirement of an inverse transform in the encoder and enabling a simple control structure for software and hardware devices. There is also a need in the art for a video encoder having a class of transforms which are suitable for low precision implementation, including a control structure which enables low cost hardware and high speed software devices.
SUMMARY OF THE INVENTION
The subject invention is directed to a novel and unique apparatus and method for compressing data. More particularly, the present apparatus and method are adapted and configured to more efficiently encode data representing, for example, a video image, thereby reducing the amount of data that must be transferred to a decoder.
The invention concerns a method of compressing data that includes a first data set and a second data set. The method includes transforming the first and second data sets into corresponding first and second transform coefficient sets. Thereafter, data is generated which represents differences between the first and second transform coefficient sets. The generated data is then encoded for transmission to the decoder.
Transforming the first and second data sets may be performed utilizing a tensor product wavelet transform. Further, the remainders resulting from the transforming process may be transmitted from one subband to another subband.
The data representing differences between the first and second transform coefficient sets is generated by estimating the differences between the first and second transform coefficient sets to provide motion vectors. The motion vectors are applied to the first transform coefficient set to produce a prediction of the second transform coefficient set. The prediction is subtracted from the second transform coefficient set resulting in a set of prediction errors. The first and second transform coefficient sets can be error corrected to ensure synchronization between the encoder and the decoder.
In estimating the differences between the first and second transform coefficient sets a search region is generated about a subset of the transform coefficients from one of the first and the second transform coefficient sets. Thereafter, a related subset of transform coefficients is applied from the other of the first and the second transform coefficient sets to the search region. Then, the related subset of transform coefficients is traversed incrementally within the search region to a position representing a best incremental match. The related subset can then be traversed fractionally within the search region to a position representing a best fractional match.
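The incremental search described above can be sketched as follows. This is a minimal illustration, assuming a sum-of-absolute-differences cost and an exhaustive scan over whole-coefficient offsets; neither choice is stated in the text, and the function name `best_integer_match` and the sample data are hypothetical. The fractional refinement step is not shown.

```python
import numpy as np

def best_integer_match(reference, block, top, left, radius):
    """Slide `block` over `reference` within +/-radius of (top, left) and
    return the (dy, dx) offset with the smallest sum of absolute differences."""
    h, w = block.shape
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                continue  # candidate position falls outside the coefficient array
            cost = int(np.abs(reference[y:y + h, x:x + w] - block).sum())
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

# Assumed test data: a coefficient array in which every 4x4 patch is distinct.
reference = np.arange(256).reshape(16, 16)
block = reference[5:9, 6:10].copy()          # related subset, actually located at (5, 6)
motion_vector, cost = best_integer_match(reference, block, top=4, left=4, radius=3)
print(motion_vector, cost)
```

The returned offset plays the role of a motion vector: applying it to the reference subset yields the prediction, and a cost of zero means the prediction error for that subset is empty.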
Another embodiment of the method of compressing data that includes a first data set and a second data set includes transforming the first and second data sets into corresponding first and second collections of subbands. Then, generating data representing the differences between the first and second collections of subbands. The data may be generated, for example, by carrying out a motion compensation technique. The motion compensation technique may provide output such as motion vectors and prediction errors. Thereafter, the generated data is encoded for transmission to the decoder.
An embodiment may also have the second collection of subbands macro-block packed to form a subband macro-block grouping. Thereafter, the generated data may be obtained through a motion compensation technique as follows. The differences between the first collection of subbands and the subband macro-block grouping are estimated to provide motion vectors. The motion vectors are applied to the first collection of subbands producing a prediction of the second collection of subbands. The prediction is then subtracted from the second collection of subbands resulting in a set of prediction errors.
The differences can be estimated between the first collection of subbands and the subband macro-block grouping as follows. A search region is generated about a subset of transform coefficients from the first collection of subbands. A related subset of transform coefficients from the subband macro-block grouping is applied to the search region. The related subset of transform coefficients is then traversed incrementally within the search region to a position representing a best incremental match. Then, the related subset of transform coefficients is traversed fractionally within the search region to a position representing a best fractional match.
A subband macro-block packing method is also disclosed for organizing subband blocks of a collection of subbands derived from a transform of an image. The method includes disassociating a set of related subband blocks from a collection of subbands that correspond to an image macro-block in the image. The set of related subband blocks are packed together as a subband macro-block. The steps of the disassociating and packing related subband blocks are repeated for each set of related subband blocks in the collection of subbands to form a subband macro-block grouping.
The method for macro-block packing may be further refined by arranging the set of related subband blocks within the subband macro-block in the same relative position the subband blocks occupy in the collection of subbands. The method may also include locating the subband macro-block within the subband macro-block grouping in the same spatial location as the corresponding image macro-block is located within the image macro-block grouping.
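As a rough illustration of the packing steps above, the sketch below gathers, for each 16×16 image macro-block, the co-located block from every subband. The three-level layout, the subband names ("LH", "HL", "HH", "LL"), the 144×176 frame size, and the function names are assumptions made for the example, not details taken from the text.

```python
import numpy as np

def pack_macro_block(subbands, mb_row, mb_col, mb_size=16):
    """Collect the subband blocks that correspond to one image macro-block.

    `subbands` maps (level, name) -> 2-D coefficient array; a level-k subband
    is assumed to be the frame size divided by 2**k in each direction.
    """
    packed = {}
    for (level, name), coeffs in subbands.items():
        size = mb_size >> level                 # block edge halves at each level
        r0, c0 = mb_row * size, mb_col * size
        packed[(level, name)] = coeffs[r0:r0 + size, c0:c0 + size]
    return packed

def pack_grouping(subbands, frame_shape, mb_size=16):
    """Repeat the packing for every macro-block, keyed by its spatial location."""
    rows, cols = frame_shape[0] // mb_size, frame_shape[1] // mb_size
    return {(r, c): pack_macro_block(subbands, r, c, mb_size)
            for r in range(rows) for c in range(cols)}

# Hypothetical three-level decomposition of a 144x176 frame (ten subbands).
frame_shape = (144, 176)
subbands = {}
for level in (1, 2, 3):
    for name in ("LH", "HL", "HH"):
        subbands[(level, name)] = np.zeros(
            (frame_shape[0] >> level, frame_shape[1] >> level), dtype=np.int32)
subbands[(3, "LL")] = np.zeros((frame_shape[0] >> 3, frame_shape[1] >> 3), dtype=np.int32)

grouping = pack_grouping(subbands, frame_shape)
print(len(grouping), len(grouping[(0, 0)]))
```

Keying the grouping by macro-block row and column keeps each subband macro-block at the same spatial location as its image macro-block, which is the refinement described above.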
After macro-block packing, changes can be detected between the first subband macro-block grouping (reference) and a subsequent second subband macro-block grouping. Detecting is based on a distortion evaluation according to a general equation of the form:
where: e_c = measurement of distortion relative to reference R;
W_i = applied weight;
G = transform coefficients of the second subband macro-block grouping; and
R = reference (e.g., first subband macro-block grouping).
A more specific form of the equation for evaluating distortion is of the form:
$$e_c = W_0 \lVert G - R \rVert_2^2$$
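The specific form of the distortion measure, a weight times the squared 2-norm of the difference between the grouping and its reference, can be computed as in the short sketch below. The function name `distortion` and the sample values are illustrative assumptions.

```python
import numpy as np

def distortion(G, R, w0=1.0):
    """Weighted squared 2-norm of (G - R): the specific distortion form."""
    diff = np.asarray(G, dtype=np.float64) - np.asarray(R, dtype=np.float64)
    return w0 * float(np.sum(diff * diff))

G = np.array([[3, 0], [1, -2]])      # coefficients of the second grouping
R = np.array([[1, 0], [1, 2]])       # reference (e.g., first grouping)
e_c = distortion(G, R, w0=0.5)
e_zero_ref = distortion(G, np.zeros_like(G))   # zero (0) reference
print(e_c, e_zero_ref)
```

Passing an all-zero array as `R` corresponds to measuring the grouping against a zero reference rather than against a past grouping.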
Another embodiment of the present invention is described as a finite precision method for transforming a data set into transform coefficients wherein the data set is transformed utilizing a tensor product wavelet pair and the remainders emanating therefrom are propagated to the opposite filter path. More particularly, the embodiment may include determining a low pass component and a high pass component of an image. The low pass component is normalized to generate a low pass normalized output and a first remainder (rl). Likewise, the high pass component is normalized to generate a high pass normalized output and a second remainder (rh). A first operation (g(rl,rh)) is performed on the first and second remainders (rl, rh) and the results emanating therefrom are added to the approximation. And, a second operation (f(rl,rh)) is also performed on the first and second remainders (rl, rh) and the results emanating therefrom are added to the detail. It is important to note that the propagation of the remainders (propagation of errors) can be used in any transform, not just the tensor product.
The above finite precision method results in an overcomplete representation of an image. The method may include downsampling, for example, by two (2), of the high and low pass components to obtain the necessary and sufficient transform coefficients representing the image in the transform domain.
An embodiment of the finite precision method includes a low pass filter having the value −1, 2, 6, 2, −1 and a high pass filter having the value −1, 2, −1. The first operation (g(rl,rh)) and the second operation (f(rl,rh)) have the functions:
- g(rl,rh)=rh; and
- f(rl,rh)=floor(rh+½), where nh=½.
A particular example of a tensor product wavelet transform including the above has the form:
- where: X2i=input data;
- X2i−1=data that precedes input data X2i;
- X2i+1=data that follows input data X2i;
- Di=detail term (decimated high pass filter output);
- Di+1=detail term that follows detail term Di; and
- Ai=approximation term (decimated low pass filter output).
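The following sketch shows one way a finite precision split into decimated detail and approximation terms can be computed with integers only. It assumes the widely used integer 5/3 lifting form, which matches the filter taps (−1, 2, −1)/2 and (−1, 2, 6, 2, −1)/8 and the symbols defined above, but it is not confirmed by this text to be the disclosed transform. The function names and the mirrored boundary handling are likewise assumptions.

```python
def forward_53(x):
    """One level of an integer 5/3 lifting split of an even-length sequence.

    Assumed form: D_i = X_2i - floor((X_2i-1 + X_2i+1) / 2)
                  A_i = X_2i+1 + floor((D_i + D_i+1 + 2) / 4)
    """
    n = len(x) // 2
    d = []
    for i in range(n):
        left = x[2 * i - 1] if i > 0 else x[1]          # mirror at the left edge
        d.append(x[2 * i] - (left + x[2 * i + 1]) // 2)
    a = []
    for i in range(n):
        d_next = d[i + 1] if i + 1 < n else d[n - 1]    # mirror at the right edge
        a.append(x[2 * i + 1] + (d[i] + d_next + 2) // 4)
    return a, d

def inverse_53(a, d):
    """Undo forward_53 exactly by reversing the two lifting steps."""
    n = len(a)
    x = [0] * (2 * n)
    for i in range(n):
        d_next = d[i + 1] if i + 1 < n else d[n - 1]
        x[2 * i + 1] = a[i] - (d[i] + d_next + 2) // 4
    for i in range(n):
        left = x[2 * i - 1] if i > 0 else x[1]
        x[2 * i] = d[i] + (left + x[2 * i + 1]) // 2
    return x

samples = [10, 12, 14, 13, 9, 7, 200, 3]
approx, detail = forward_53(samples)
restored = inverse_53(approx, detail)
print(approx, detail, restored == samples)
```

Each output sequence is half the length of the input, reflecting the decimation by two, and the integer rounding in each step is reversed exactly by the inverse, so no precision is lost across the pair.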
Also disclosed is an encoder apparatus for predicting changes between a sequence of frames in the transform domain. The apparatus includes a transformation device, having an input configured to receive a first and second frame of the sequence of frames, and further configured to generate therefrom a corresponding first and second collection of subbands that each support a set of transform coefficients. A motion compensation device, having an input coupled to the transformation device, is configured to receive the first and second collection of subbands, and further configured to efficiently represent differences between the first and second collection of subbands. Also included is a difference block having an input coupled to the transformation device and an input coupled to the output of the motion compensation device. The input received from the motion compensation device is subtracted from the second collection of subbands in the difference block, thereby generating prediction errors.
The motion compensation device includes a motion estimation device configured to compare the first and second collection of subbands. A collection of motion vectors is generated therefrom which approximately represent the differences between the first and second collections of subbands. The motion compensation device also includes a motion prediction device, having an input coupled to the motion estimation device, configured to receive the motion vectors and the first collection of subbands, and further configured to generate therefrom a prediction grouping representing a prediction of the second collection of subbands. The prediction of the second collection of subbands is subtracted from the second collection of subbands in a difference block resulting in prediction errors.
A finite precision transforming apparatus is also disclosed for transforming an image frame into the transform domain. The apparatus includes a low pass component and a high pass component arranged in parallel and sharing an input that is configured to receive the image frame. A low pass normalizing device is included which has an input configured to receive the output of the low pass component and is further configured to produce a low pass normalized output and a first remainder (rl). A high pass normalizing device has an input configured to receive the output of the high pass component and is further configured to produce a high pass normalized output and a second remainder (rh). A first operation device has an input configured to receive the first remainder (rl) and the second remainder (rh) and further configured to calculate a first calculation (g(rl,rh)) thereby generating a first calculation result. A second operation device has an input configured to receive the first remainder (rl) and the second remainder (rh) and configured to calculate a second calculation (f(rl,rh)) thereby generating a second calculation result. In addition, a first adder has an input configured to receive the low pass normalized output and the first calculation result, the first adder generating a subband approximation. Similarly, a second adder has an input configured to receive the high pass normalized output and the second calculation result, the second adder generating a subband detail.
The finite precision transforming apparatus further includes a first downsampler at the low pass output and a second downsampler at the high pass output. A downsampling of two (2) provides sufficient and necessary transform coefficients to reconstruct the input image in the decoder.
These and other unique features of the apparatus and method disclosed herein will become more readily apparent from the following detailed description taken in conjunction with the drawings.
Representative embodiments of the present invention will be described with reference to the following figures:
An embodiment of the present invention provides an apparatus and method for compressing digital video signals using a limited precision transformation technique. The embodiment improves on conventional loss-less or lossy transform based techniques by motion compensating, e.g., estimating and predicting motion, in the transform domain, rather than in the time domain as in the prior art. In this manner, improved image quality can be achieved on less expensive hardware.
The term “motion compensation” is intended to be defined in its broadest sense. In other words, although motion compensation is often described and is illustrated herein as including motion estimation and motion prediction of a group of picture elements, it should also be understood to encompass, for example, rotation and scale. In addition, the term “motion compensation” may include, for example, simply generating data representing differences between two sets of data.
Compression efficiencies are gained by both converting the image to features and mapping the features first. The disclosure herein is illustrated as it relates to a sequence of images or video frames. Such an image sequence can be readily understood to be a collection of spatially oriented data elements (either scalar, vector, or functional) which are placed in arrangement with each other and are indexed by time or some other parameter. An image sequence can be in Cartesian coordinates, but other coordinate systems in the art can be used.
In addition, the present apparatus and method can be utilized in non-video applications such as speech, audio, and electrocardiogram compression. That is, even though the invention disclosed herein is illustrated on a two-dimensional system (2 D), i.e., video compression, it is intended that the teachings can be applied to any other dimensional systems so as to advance the art of data compression in general.
For example, the teachings can be applied to one and one-half dimensional systems (1½ D) such as ultra-sound imaging. Also, the teachings can be applied to three dimensional systems (3 D) such as magnetic resonance imaging (MRI).
Throughout the description below, the term “frame” refers to a single image of a sequence of images fed to an encoder, regardless of the form of the single image, i.e., regardless if it is in the time domain, the frequency domain, or of any other processing that has been done on it. In addition, the term “pel” is used in reference to a picture element in the time domain and the terms “coefficient” and “transform coefficient” are used in reference to representations of the pels which are generated after the pels have passed through, for example, a forward wavelet transform. These terms are used to facilitate the description of the embodiments and are in no way intended to restrict the scope of the invention.
Referring now to the drawings wherein like reference numerals identify similar elements of the subject invention, there is illustrated in
In
For example, the original image is transformed in block 20 and represented by a transform coefficient set. The transform coefficients of the coefficient set are then evaluated in block 22 to determine their significance via various weighting and evaluation techniques and ranked according to their significance. Thereafter, in block 24, motion compensation between the present frame and past or reference frame takes place. Motion compensation may include motion estimating the change between frames to generate a set of motion vectors. Thereafter, the motion vectors are applied to a reference frame during a motion prediction step. The results from motion prediction are subtracted from the transform coefficient set to determine the errors of that prediction. The prediction errors are then optionally scaled and finally positionally encoded along with the motion vectors for transmittal to the decoder.
Referring to
Referring to
The collection of subbands 34 are fed to block 36 for subband macro-block packing. During subband macro-block packing, the subband blocks that correspond to a particular image macro-block are organized to form subband macro-blocks (SMBX,X). Thereafter, each subband macro-block resides at the spatial location of the image macro-block which it is related to and therefore represents. The collection of all subband macro-blocks for a particular frame is called a subband macro-block grouping 40.
For example, the shaded subband blocks in
It is important to note again that although the embodiment described herein refers only to frame images represented in QCIF, those skilled in the art will readily understand that other formats may be used without deviating from the teachings of this invention. It is also important to note that the particular grouping of subband blocks in each subband macro-block is used to accommodate the particular wavelet illustrated. Other groupings of subband data exist which would be more appropriate for other wavelets.
From the above descriptions of the collection of image macro-blocks 30 (
Referring again to
Perceptual importance through weighting can be determined, for example, through a Mean Opinion Score study, or determined from weights used in other coding systems such as those found in H.261 and H.263 of the Consultative Committee for International Telegraph and Telephone (CCITT), the standards of which are incorporated by reference herein. For a discussion of Mean Opinion Scoring see Discrete Cosine Transform, K. R. Rao & P. Yip, Academic Press, Inc., pp. 165-74 (1990), incorporated by reference herein.
After weights have been applied in block 42 to scale each subband macro-block, weighted grouping 44 is fed to and processed in change detect block 46 to determine the relative amount of change that has occurred. This change is also termed the ‘significance’ or, for the purpose of video, the distortion of weighted grouping 44. Significance can be determined in relation to a given reference such as, for example, zero (0) or a past weighted grouping. The loop extending from change-detect block 46 includes a frame delay 48 which returns a past weighted grouping to change-detect block 46 for use as a reference. The output of change-detect block 46 is change detected grouping 50.
A zero (0) reference is used in change detect block 46, for example, when initially transmitting frames through the encoder. In this case, the entire frame is referenced to zero (0). This is also known as intraframe referencing. As described above, a past weighted grouping can also be used wherein the macro-block grouping is weighted in block 42 as described above and thereafter delayed in delay block 48 of change-detect block 46 for use as a reference. This latter method, also known as interframe referencing, eliminates repeatedly sending redundant and/or unimportant information to the decoder.
An alternative use of zero (0) frame referencing is for reproducing and maintaining a relatively accurate reference image at the decoder during system operation. One method employs periodically applying a zero (0) reference to the entirety of every eighth (8th) frame of the standard 30 frames/second. Alternatively, the image can be stochastically refreshed such as by randomly, or methodically, referencing subband blocks to zero (0). To facilitate any process that references all or a part of a frame to zero (0), the zero-referenced subband blocks are identified as such so as to prevent motion compensation operations (described below) from being performed on the affected blocks. Thus, the identified subband blocks are reproduced in whole at the decoder for refreshing either the entire reference or a part of the reference therein, as the case may be.
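The periodic refresh policy can be illustrated with a small selector that returns either a zero reference or the past grouping. This is a sketch only: the function name `choose_reference`, its parameters, and the sample arrays are assumptions, and the random per-block refresh variant is not shown.

```python
import numpy as np

def choose_reference(frame_index, past_grouping, like, refresh_period=8):
    """Return the reference grouping used for change detection."""
    if past_grouping is None or frame_index % refresh_period == 0:
        return np.zeros_like(like)      # zero (0) reference: whole frame is refreshed
    return past_grouping                # past weighted grouping

current = np.ones((2, 2), dtype=np.int32)
past = np.full((2, 2), 7, dtype=np.int32)
references = [choose_reference(i, past, current) for i in range(10)]
zero_referenced = [i for i, ref in enumerate(references) if not ref.any()]
print(zero_referenced)
```

With a period of eight, frames 0 and 8 in this run are referenced to zero and every other frame is referenced to the past grouping.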
Referring again to
With continued reference to
The motion vectors 58 sent to motion prediction block 60 are used to alter delayed ranked subband grouping 57 so as to generate a predicted grouping 66. A difference block 68 receives ranked subband grouping 53 and subtracts predicted grouping 66 therefrom, resulting in grouping differences 70, i.e., the prediction error. The grouping differences 70 are further scaled in block 72 resulting in scaled grouping differences 74. Those skilled in the art will recognize that the fewer the number of nonzero grouping differences 70, the more accurately the collection of motion vectors 58 has predicted the changes between the present frame and the reference frame. And, the fewer the differences, the fewer bits that must be transmitted to the decoder to correct for deficiencies in motion estimation.
The scaled grouping differences 74 from scaling block 72 and the collection of motion vectors 58 from motion estimation block 56 are positionally encoded as macro-blocks in block 76. Therein, the data is efficiently organized into a bit stream. Encoded bit stream grouping 78 is output from block 76 and transmitted via transmission line 80 to a decoder 82 for inverse processing. Transmission can be through a variety of mediums, for example, electronic, electromagnetic, or optical. Regarding bit stream formatting, there are several standard methods well known in the art for formatting bit streams. The format used in an H.263 based encoder system is one example. A bit stream is basically a serial string of bit packets, each packet representing a particular category of data.
For example, bit packets may include system level data, video, control, and audio data. As data is received for positional encoding in block 76, it is organized into bit packets in accordance with the format in use. Generally, a collection of bit packets representing a video frame starts with a bit identifying it as a new frame. The amount of quantization and other control codes typically follow. Thereafter there is encoded a list of macro-blocks representing the scaled grouping differences 74. For QCIF, the number of macro-blocks equals ninety-nine (99). (See
To facilitate more efficient transfer of data, each macro-block is preceded by a macro-block zero bit (MBZero-bit) which indicates the presence or absence of non-zero data in a macro-block. If the macro-block is present, control information for the macro-block, including the related collection of motion vectors 58, is sent followed by the subband data, i.e., the related scaled grouping differences 74. Including such information substantially reduces the number of bits that are sent over transmission line 80 in that the absence of a macro-block is represented by a single symbol instead of all of the bits that would be necessary to identify the entire string of macro-block coefficients that are equal to zero.
Another situation wherein further efficiencies can be had is when only some of the subband blocks within a subband macro-block are zero. An embodiment includes the step of flagging the subbands whose coefficients are equal to zero with a subband zero flag (SBZero flag). A subband from scaled grouping differences 74 whose coefficients are zero indicates that no changes were found to exist between corresponding subband blocks of ranked subband grouping 53 and predicted grouping 66. It takes substantially fewer bits to represent SBZero flags than to separately represent each coefficient equaling zero. Of course, the decoder is programmed to recognize both the MBZero-bit and the SBZero flag so as to interpret the symbol introduced during positional encoding in block 76. An example of zero-runs length codes for symbolizing strings of zeros follows.
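The two levels of zero signalling can be sketched as a symbol list per subband macro-block. The symbol names, the polarity of the flags (1 meaning "all zero"), and the function name `encode_macro_block` are assumptions for illustration; motion vectors, control codes, and the actual bit-level format are omitted.

```python
import numpy as np

def encode_macro_block(subband_blocks):
    """Return a list of symbols for one subband macro-block.

    `subband_blocks` is a list of 2-D integer arrays holding the scaled
    grouping differences of each subband block in the macro-block.
    """
    if all(not block.any() for block in subband_blocks):
        return [("MBZero", 1)]                      # whole macro-block is zero
    symbols = [("MBZero", 0)]
    for block in subband_blocks:
        if not block.any():
            symbols.append(("SBZero", 1))           # this subband block is zero
        else:
            symbols.append(("SBZero", 0))
            symbols.append(("coeffs", block.ravel().tolist()))
    return symbols

zero_block = np.zeros((2, 2), dtype=int)
changed_block = np.array([[0, 3], [0, 0]])
empty_mb = encode_macro_block([zero_block, zero_block])
mixed_mb = encode_macro_block([zero_block, changed_block])
print(empty_mb)
print(mixed_mb)
```

An all-zero macro-block collapses to a single symbol, and within a non-zero macro-block each all-zero subband block costs one flag instead of one symbol per coefficient.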
Zero-Runs Length Codes
With continued reference to
A collection of subbands 92, the encoder's reference frame, is fed to a delay block 94. A delayed collection of subbands 96 is fed from the delay block 94 to a prediction block 98. Similar to the process carried out in motion prediction block 60 of the encoder, the collection of motion vectors 58 are applied to the delayed collection of subbands 96 in prediction block 98. Therein, the delayed collection of subbands 96 is altered to generate a predicted grouping 100, i.e., a subband representation of the updated image not including the grouping differences 70. Grouping differences 70 and predicted grouping 100 are added in an adder block 102 generating the collection of subbands 92, i.e., a new reference frame. Finally, an inverse wavelet transform is performed in block 104 on the collection of subbands 92. This step is essentially the reverse of the forward wavelet transform 32 that was briefly described above and which will be described in greater detail herein below. The resulting output from block 104 is a reconstructed image 105.
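The reconstruction loop above can be checked with a toy example in which the same prediction step is run on both sides. The sketch assumes a whole-array integer shift as the prediction and unscaled differences, so the reconstruction is exact; the function name `predict` and the data are hypothetical.

```python
import numpy as np

def predict(reference, motion_vector):
    """Hypothetical prediction step: shift the whole reference by (dy, dx)."""
    return np.roll(reference, shift=motion_vector, axis=(0, 1))

# Encoder side: form the grouping differences (prediction error).
reference = np.arange(16).reshape(4, 4)
current = np.roll(reference, shift=(1, 0), axis=(0, 1)).copy()
current[2, 2] += 5                                  # one change the motion cannot explain
motion_vector = (1, 0)
grouping_differences = current - predict(reference, motion_vector)

# Decoder side: same reference, same motion vector, add the differences back.
decoder_reference = reference.copy()
new_reference = predict(decoder_reference, motion_vector) + grouping_differences
print(np.count_nonzero(grouping_differences), np.array_equal(new_reference, current))
```

Because the decoder applies the same motion vector to the same reference, only the single unexplained coefficient has to be carried in the differences, and the rebuilt subbands become the reference for the next frame.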
As previously described and illustrated in
Referring to
Referring to
After filtering, the low pass components and high pass components are scaled and decimated, or downsampled, at each stage by decimators 112 and 114, respectively, whereby components of the sample values comprising a discrete signal are eliminated. In the illustrated embodiment, the input image is downsampled by a factor of two (2) so to discard every other sample. Decimating by two (2) ultimately results in the necessary and sufficient transform coefficients to enable an exact reconstruction of the input. Thereafter, the downsampled values of the low pass components and high pass components are normalized at each stage in a manner that will be described in more detail herein below with respect to
The row outputs D0R, D1R, D2R, and A2R of the row stages shown in
Referring now to
Referring now to
For the filters L={−1, 2, 6, 2, −1} and H={−1, 2, −1}, an embodiment of the functions of the remainders are: f(rl,rh)=floor(rh+½) with nh=¼; and g(rl,rh)=rh. The above described manipulation of the remainders is repeated for each filter pair, resulting in reduced bit allocation at the transform output.
An embodiment of a tensor product wavelet pair is of the form:
-
- where: X2i=input data;
- X2i−1=data that precedes input data X2i;
- X2i+1=data that follows input data X2i;
- Di=detail term (decimated high pass filter output);
- Dm=detail term that follows detail term Di; and
- Ai=approximation term (decimated low pass filter output).
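For illustration, a well-known integer wavelet pair that uses exactly the terms listed above is the reversible 5/3 lifting pair. The sketch below assumes that form, namely Di = X2i − floor((X2i−1 + X2i+1)/2) and Ai = X2i+1 + floor((Di + Di+1 + 2)/4), with mirrored samples at the edges. Both the formulas and the edge handling are assumptions made for this example and are not asserted to be the pair of the embodiment.

```python
def forward_pair(x):
    """One stage of an assumed reversible 5/3 lifting pair.

    x must have even length. Returns (approximations A, details D).
    """
    n = len(x) // 2

    def odd(i):
        # X_{2i+1}; mirror at the left edge so that X_{-1} = X_{1}.
        return x[2 * i + 1] if i >= 0 else x[1]

    # Detail terms: even samples minus the floor-average of their odd neighbors.
    d = [x[2 * i] - (odd(i - 1) + odd(i)) // 2 for i in range(n)]
    # Approximation terms: odd samples plus a rounded quarter of adjacent details
    # (the last detail term is repeated at the right edge).
    a = [x[2 * i + 1] + (d[i] + d[min(i + 1, n - 1)] + 2) // 4 for i in range(n)]
    return a, d


def inverse_pair(a, d):
    """Exactly undo forward_pair using the same integer arithmetic."""
    n = len(a)
    odd = [a[i] - (d[i] + d[min(i + 1, n - 1)] + 2) // 4 for i in range(n)]
    x = []
    for i in range(n):
        left = odd[i - 1] if i > 0 else odd[0]
        x.append(d[i] + (left + odd[i]) // 2)   # recover X_{2i}
        x.append(odd[i])                        # recover X_{2i+1}
    return x
```

Because the inverse repeats the same floor operations on the same integers, the round trip is exact even though every intermediate value is an integer.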
The above description of the tensor product wavelet transform illustrates a two-way split into high pass (details) and low pass (approximations) components. In addition, the description illustrates the possibility of propagating remainders from a first band to a second band, from the second band to the first band, or both from the first band to the second band and from the second band to the first. The embodiment described above is intended to illustrate the basic concepts of the invention and should in no way be interpreted to limit the scope of the invention.
For example, a tensor product wavelet transform can have a first stage where a three-way split includes a high pass filter, a medium pass filter, and a low pass filter. The output of the low pass filter can then be iterated, i.e., a second stage having a three-way split can be applied to the output of the low pass filter, resulting in a total of five (5) subbands. In such an embodiment the remainders could be propagated from the low pass filter and the high pass filter to the medium pass filter. This embodiment is just one example of how the tensor product wavelet transform can be varied and still remain in keeping with the scope and spirit of the disclosed invention. Those skilled in the art will readily understand that there are numerous other ways in which the input can be split at each stage and iterated, and also that there are numerous other ways in which the remainders can be propagated between subbands.
In addition, the above description of the propagation of remainders is not intended to limit its use to a tensor product wavelet transform. It can be used with any transform. For example, the propagation of remainders can be used with a Discrete Cosine Transform (DCT). Also, the propagation of remainders can be used in a loss-less or lossy manner.
As discussed herein above, the output of forward wavelet transform 32 can be a complete representation or an over-complete representation of QCIF image 30. A complete representation of QCIF image 30 includes a collection of subbands that are just enough to represent the contents of the image. An over-complete representation of QCIF image 30 includes the complete representation and redundant, alternative, or additional subband representations to facilitate motion compensation as will be described herein below. Each representation has value in the disclosed embodiment. For example, the over-complete representation can include a variety of image changes, such as translational movement, rotational movement, and scaling. These changes can be recalled as necessary during motion compensation, reducing the problem of representing image changes to one of indexing.
It should be noted with regard to the forward wavelet transform described above that although the transformed image frame structures illustrated herein are for the luma components, the structures also hold for the chroma components and, therefore, have not been separately described.
Regarding change-detect block 46 described herein above with respect to
A change detection metric may take the more specific form:
-
- where: ec=measurement of distortion relative to reference R;
- Wi=applied weight;
- G=transform coefficients of the second subband macro-block grouping; and
- R=reference (e.g., first subband macro-block grouping).
A more specific form of the equation for evaluating distortion is of the form:
$e_c = W_0 \lVert G - R \rVert_2^2$
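The squared-norm form of the distortion measure can be sketched as follows. The function name `change_metric` and the flat-list representation of the coefficients are assumptions made for this example.

```python
def change_metric(g, r, w0=1.0):
    """Weighted squared L2 distance between coefficients G and reference R.

    g, r : equal-length sequences of transform coefficients
    w0   : applied weight W0
    """
    if len(g) != len(r):
        raise ValueError("G and R must have the same number of coefficients")
    return w0 * sum((gi - ri) ** 2 for gi, ri in zip(g, r))
```

A macro-block grouping could then be treated as changed when the returned value exceeds a chosen threshold; the threshold itself is outside the scope of this sketch.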
In addition, change-detect 46 can take advantage of information provided by a feed-back 132 (
As described herein above and illustrated in
Referring to
With continued reference to
For the examples of motion estimation provided herein below, the P×P search region 107 of
The basic size (P×P) of the search region can be determined by empirical or statistical analysis taking into consideration, for example, the amount of movement anticipated between frames. In addition, consideration should be given to the computational effort needed to carry out a search in a given search region. It is readily understood by those skilled in the art that larger search regions require more computational resources and, hence, more interframe delay for a fixed processor. Conversely, smaller search regions require less computational resources but sacrifice image quality. This is especially true during high image-movement periods. That is, the quality of the image is reduced since part of the motion may be located out of the search region, thus preventing accurate motion vector selection.
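The trade-off described above can be seen in a minimal sketch of an exhaustive incremental search. The matching cost used here, a sum of absolute differences, and the names `best_incremental_match` and `radius` are assumptions for illustration, since the cost measure is not fixed by the passage above. The number of candidate positions grows with the square of the search radius, which is the source of the added computational effort.

```python
def best_incremental_match(reference, block, top, left, radius):
    """Exhaustively search `reference` around (top, left) for `block`.

    reference : 2-D list of coefficients (the 'reference' frame data)
    block     : 2-D list of coefficients (from the 'present' frame)
    radius    : search extent in whole positions on each side
    Returns (dy, dx, cost) of the lowest-cost whole-position offset,
    or None if no candidate fits inside `reference`.
    """
    bh, bw = len(block), len(block[0])
    rh, rw = len(reference), len(reference[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidates that fall outside the reference data.
            if y0 < 0 or x0 < 0 or y0 + bh > rh or x0 + bw > rw:
                continue
            cost = sum(
                abs(reference[y0 + r][x0 + c] - block[r][c])
                for r in range(bh)
                for c in range(bw)
            )
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```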
As described above, ranked subband grouping 53 and ranked subband macro-block grouping 54 are fed from block 52 to delay block 62 and motion estimation block 56, respectively, over line 55. For the example herein below, a search region is placed about subband block 2,4 of SB00 in delayed ranked subband grouping 57 (
Referring now to
Referring now in particular to
Referring to
Referring to
With continued reference to
For example, because SB00 is one-quarter (¼) the size of the related macro-block in the original image (see IMB2,4 of
With continued reference to
Referring to
In the example, the resulting motion vector identifying the movement of the subband blocks in SMB2,4 is x=−7 and y=−4 (MV2,4). MV2,4 is stored in memory with the collection of motion vectors 58. MV2,4 therefore represents the movement of certain collections of coefficients from each subband in delayed ranked subband grouping 57 (the ‘reference’ frame) to their new positions so as to predict ranked subband grouping 53 (the ‘present’ frame). The above process is repeated for each significant subband block in, for example, SB00. Processing typically proceeds in the order of ranking, that is, from the macro-blocks having the greatest amount of movement to those having the least amount of movement. Entirely insignificant subband blocks will not be considered at all and therefore will have no motion vector assigned. This will occur, for example, when there is insignificant or no change at those locations between frames. It can also occur when subband blocks are zero (0) referenced as described herein above.
If a different subband is to be used to calculate the motion vectors, incremental and fractional movements would be determined in a manner analogous to that described above using the proportional relationship of the particular subband with respect to the QCIF image 30. For example, if subband blocks in SB01 are used to develop the motion vectors, the following criteria would apply: search region size=16×8 coefficients; x fractional masks=±¼, ±½, and ±¾ increments; y fractional masks=±½ increments; x scaling=4; and y scaling=2.
An advantage of using the above method is that separable filters can be employed. In other words, filters used for incremental and fractional movement of one subband block can be used for incremental and fractional movement of another subband block. For example, subband blocks in SB00 have four (4) possible fractional movements of x=±½ and y=±½, and subband blocks in SB01 have eight (8) possible fractional movements of x=±¼, ±½, and ±¾, and y=±½. Because of the common fractional movements of x=±½ and y=±½ in SB00 and SB01, single separable filters can be used for fractional movements of x=+½, x=−½, y=+½, and y=−½ in both subbands. This method can be used for all common fractional movements in delayed ranked subband grouping 57. The same advantageous use of separable filters can be carried out in motion prediction block 60.
Referring to
To determine which masks to use to produce such shifts, the x and y components are multiplied by the reciprocal of the corresponding modulo of each subband block. For example, to determine the x and y components for shifting the 8×8 collection of coefficients 148 that have been determined to have moved to the 2,4 position in SB00, the x and y components of MV2,4 are each multiplied by the reciprocal of the corresponding modulo two (2). This calculation results in x=−3½ and y=−2. Therefore, a mask for incremental movement of x=−3, a mask for fractional movement of x=−½, and a mask for incremental movement of y=−2 are applied to the 8×8 coefficients 148.
As a second example, to determine the x and y components for shifting the 8×4 collection of coefficients 149 that have been determined to have moved to the 2,4 position in SB01, the x component of MV2,4 is multiplied by the reciprocal of modulo four (4) and the y component of MV2,4 is multiplied by the reciprocal of modulo two (2). This calculation results in x=−1¾ and y=−2. Therefore, a mask for incremental movement of x=−1, a mask for fractional movement of x=−¾, and a mask for incremental movement of y=−2 are applied.
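The two examples above amount to dividing each motion vector component by the scaling factor of the subband and splitting the quotient into a whole (incremental) part and a fractional part. A minimal sketch follows. The function names are hypothetical, and rounding the incremental part toward zero is inferred from the examples (−3½ splitting into −3 and −½) rather than stated explicitly.

```python
from fractions import Fraction
import math


def split_component(mv_component, scale):
    """Scale one motion vector component and split it.

    Returns (incremental, fractional), where incremental is a whole number
    of coefficient positions and fractional is the remaining part.
    """
    scaled = Fraction(mv_component, scale)
    incremental = math.trunc(scaled)      # whole positions, rounded toward zero
    return incremental, scaled - incremental


def subband_shift(mv_x, mv_y, x_scale, y_scale):
    """Return ((x_inc, x_frac), (y_inc, y_frac)) for one subband."""
    return split_component(mv_x, x_scale), split_component(mv_y, y_scale)


# MV2,4 is x=-7, y=-4.
sb00 = subband_shift(-7, -4, x_scale=2, y_scale=2)   # x = -3 and -1/2, y = -2
sb01 = subband_shift(-7, -4, x_scale=4, y_scale=2)   # x = -1 and -3/4, y = -2
```

Using exact fractions avoids floating-point rounding when selecting among the ±¼, ±½, and ±¾ masks.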
An alternate embodiment of the above described masking process for determining fractional movement between frames includes the use of 3×3 coefficient masks. These masks take a weighted average of the coefficients surrounding a selected coefficient. In the alternate approach, a collection of motion vectors 58 that include only incremental movements is determined as described above and illustrated in
In motion prediction block 60, the collection of motion vectors 58 is applied in a manner analogous to that illustrated in
After all of the motion vectors from the collection of motion vectors 58 have been applied to delayed ranked subband grouping 57 and all of the coefficients that were shifted by the motion vectors have had the 3×3 mask applied to them, the result is output from motion prediction block 60 as predicted grouping 66. Of course, the process is repeated in prediction block 98 of decoder 82 to replicate the masking process carried out in motion prediction block 60.
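A 3×3 mask of the kind used in the alternate embodiment can be sketched as a weighted average over each coefficient and its eight neighbors. The mask weights passed in, the normalization by the sum of the weights, and the clamping of neighbors at the block edge are all assumptions made for this example; the text above does not specify them.

```python
def apply_mask_3x3(coeffs, mask):
    """Replace each coefficient by a weighted average of its 3x3 neighborhood.

    coeffs : 2-D list of coefficients (already shifted by whole positions)
    mask   : 3x3 list of weights with a non-zero sum (hypothetical values)
    """
    rows, cols = len(coeffs), len(coeffs[0])
    total = sum(sum(row) for row in mask)
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp neighbor indices so edge coefficients reuse
                    # the nearest available value.
                    yy = min(max(y + dy, 0), rows - 1)
                    xx = min(max(x + dx, 0), cols - 1)
                    acc += mask[dy + 1][dx + 1] * coeffs[yy][xx]
            out[y][x] = acc / total
    return out
```

Since the encoder and the decoder must produce the same prediction, both would need to apply the same weights and the same edge rule.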
After the prediction is determined by either of the methods described above, predicted grouping 66 is passed to difference block 68 wherein the difference between ranked subband grouping 53 and predicted grouping 66 is determined. As described above, difference block 68 produces grouping differences 70.
Although the motion compensation methods described herein are illustrated as functioning in conjunction with a tensor product wavelet, it is important to note that the methods can be utilized with other types of transforms. This includes utilizing the motion compensation methods with other transforms in either the time domain or the transform domain. For example, data transformed in a DCT can be motion compensated in a manner similar to that described above. That is, the 64 transform coefficients in each of the 8×8 blocks of the DCT can be motion compensated in a manner similar to that used to motion compensate the 64 transform coefficients in each of the 8×8 subband blocks in SB00 of the tensor product wavelet transform.
Referring now to
Also similar to the embodiment illustrated in
To develop error corrected subband grouping 171, a copy of ranked subband grouping 53 is passed unchanged through difference block 68 and stored in memory when the system is referenced to zero (0), for example, when the system is initiated or when the reference in the decoder is to be refreshed. Thereafter, prediction errors 70 are accumulated, i.e., added to the reference, as the prediction errors 70 of each subsequent frame pass quantize block 158. The updated reference image is fed to delay block 156, thereby producing delayed subband grouping 172. By utilizing this method the reference in the encoder remains synchronized with the reference in the decoder. Those skilled in the art will recognize that such an arrangement is useful in maintaining synchronization between the encoder and decoder when significant amounts of scaling and/or quantization are carried out between motion prediction and positional encoding.
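The synchronization property can be sketched as follows. For brevity the sketch omits motion prediction (the prediction is simply the previous reference) and uses a hypothetical uniform quantizer with step size `step`; neither simplification is taken from the embodiment. The point illustrated is that the encoder accumulates the quantized errors, which are the same values the decoder receives, so both references stay identical.

```python
def encode_frame(frame, reference, step):
    """Return (quantized errors to send, updated encoder reference)."""
    errors = [f - r for f, r in zip(frame, reference)]
    sent = [step * (e // step) for e in errors]          # hypothetical quantizer
    new_reference = [r + s for r, s in zip(reference, sent)]
    return sent, new_reference


def decode_frame(sent, reference):
    """Update the decoder reference from the received quantized errors."""
    return [r + s for r, s in zip(reference, sent)]


def run_sequence(frames, step):
    """Send the first frame unchanged, then accumulate quantized errors."""
    enc_ref = list(frames[0])      # system referenced to zero: copy passes through
    dec_ref = list(frames[0])
    for frame in frames[1:]:
        sent, enc_ref = encode_frame(frame, enc_ref, step)
        dec_ref = decode_frame(sent, dec_ref)
    return enc_ref, dec_ref
```

Had the encoder accumulated the unquantized errors instead, its reference would drift away from the decoder's by the quantization error of every frame.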
After motion estimation block 150 and motion prediction block 152 receive the delayed subband grouping 172 from delay block 156, motion estimation and motion prediction are determined by a procedure similar to that described herein above and illustrated in
Referring now to
In
The QCIF image 30, also referred to as the ‘present’ frame, is also fed to motion estimation block 160 and delay block 166 for determining a collection of motion vectors 162. More specifically, an image frame 30 is delayed in delay 166 producing a delayed image frame 167, also referred to as the ‘reference’ frame. With reference to
In motion estimation block 160, each significant image macro-block (IMBX,X) of the present QCIF image 30 frame is located within the corresponding search region in the delayed image frame 167 for determining the motion vectors. For example, IMB2,4 is retrieved from QCIF image 30 and located within search region 107 of delayed image frame 167. This process is analogous to that carried out in the transform domain as described above and illustrated in
In a manner analogous to that described above and illustrated in
Referring now to
In
Alternatively, instead of reconstructing the delayed subband grouping 172 in its entirety, a portion of the grouping can be reconstructed to effectuate efficiencies. For example, a 3, 5 filter can be used to obtain a reconstructed region having 48×48 pels. Regions are selected based on the significance of, i.e., the detected changes within, the image macro-blocks (16×16) about which the regions are centered.
In motion prediction block 164, the collection of motion vectors 162 is applied to the reconstructed image 176 (or the reconstructed 48×48 pels regions if only regions were inverse wavelet transformed). The collection of motion vectors 162 is applied to the reconstructed reference image 176 in a manner analogous to that described above and illustrated in
Although illustrated herein as a software implementation, the principles of the embodiments of the invention could also be implemented in hardware, for example, by means of an application-specific integrated circuit (ASIC). Preferably, the ASIC implementation, including the necessary memory requirements, should operate at the pel rate in order to (i) minimize power consumption consistent with the embodiment, and (ii) permit compression of full color video, such as, for example, full CCIR601, at a data rate of less than 13.5 MHz. It is foreseen that power consumption will be reduced by a factor of ten (10) by utilizing an ASIC in comparison to the conventional software and processor implementation.
Alternatively, optical methods can be employed to produce even further power savings. As described above, an approximation to the image is created at each stage of the wavelet transform and the details lost by making this approximation are recorded.
In a photo-electronic or an optical implementation, how the light is gathered and related charge is sensed can be adjusted to gather samples of each of the approximation images. If these approximation images are co-registered in parallel, the detail terms can be calculated from these intermediate values by either analog or digital means. Preferably, analog means are used to calculate the detail terms as the output of an analog stage.
The detail terms can be quantized through the use of a bit serial analog-to-digital converter which implements the quantization strategy. The resulting bit stream is compressed. In this manner, the photonic/optical device operates at the compressed data rate, i.e., the number of digital transitions which occur is set by the compressed data rate rather than by the image data rate (as in the case of an ASIC) or the processor data rate (as in the case of a conventional processor). This will result in an implementation which consumes very little current, thus requiring less power. It is foreseen that the implementation of an optical method will further reduce power consumption by a factor of approximately ten (10) relative to an ASIC implementation.
It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
Claims
1-3. (canceled)
4. A method of encoding a data set into transform coefficients comprising transforming the data set utilizing an encoding technique and propagating remainders derived during encoding from a first filter path to a second filter path of the encoder.
5. The method of encoding as recited in claim 4, further including propagating remainders from the second filter path to the first filter path.
6. The method of encoding as recited in claim 4, wherein the encoding technique is a tensor product wavelet transform.
7. The method of encoding as recited in claim 4, wherein the encoding technique is a discrete cosine transform.
8. A method of encoding a data set comprising:
- determining a first filter component of the data set in a first filter path; determining a second filter component of the data set in a second filter path; normalizing the first filter component to generate a normalized output and a remainder, and propagating the remainder to the second filter path.
9. A method of estimating changes occurring between a first data set and a second data set comprising: generating a search region about a data subset from one of the first and second data sets; applying a related data subset from the other of the first and second data sets to the search region; and traversing incrementally the related data subset within the search region to a position representing a best incremental match.
10. The method of estimating changes occurring between a first data set and a second data set as recited in claim 9, further including traversing fractionally the related data subset within the search region to a position representing a best fractional match.
Type: Application
Filed: Mar 11, 2010
Publication Date: Jul 1, 2010
Applicant: General Dynamics Information Technology, Inc. (Fairfax, VA)
Inventors: Truong Q. Nguyen (Burlington, MA), Joel Rosiene (Colchester, CT)
Application Number: 12/722,188
International Classification: H04N 7/30 (20060101); G06K 9/68 (20060101);