Fast method and apparatus for filtering compressed images in the DCT domain

Info

Patent number: 5832135
Type: Grant
Filed: Mar 6, 1996
Date of Patent: Nov 3, 1998
Assignee: Hewlett-Packard Company (Palo Alto, CA)
Inventors: Neri Merhav (Haifa), Vasudev Bhaskaran (Mountain View, CA)
Primary Examiner: Scott Rogers
Application Number: 8/612,513

Abstract

A method is described for filtering compressed images represented in the discrete-cosine-transform (DCT) domain. The filter includes three sparse vertical submatrices, which are versions of the vertical filter components (VFCs) of a desired filter function that have been combined in such a way as to eliminate many of the nonzero elements. The filter also includes three sparse horizontal transpose submatrices, which are formed in the same way from the horizontal filter components of the filter function. The sparseness of these submatrices significantly reduces the number of computations required to filter the image in the DCT domain. To take advantage of this sparseness, the input DCT data blocks are "butterflied" so that the filtered output data blocks can be expressed in terms of the butterflied input blocks and the sparse submatrices. The sparseness of the DCT data blocks themselves can be used to reduce the number of computations further.

Claims

1. A filter for a compressed image represented in a discrete cosine transform (DCT) domain and including a plurality of input DCT data blocks organized as: ##EQU20## the filter comprising: a first computational module having a first input for receiving X.sub.SE, a second input for receiving X.sub.NE, and a third input for receiving X.sub.E, the first computational module further including first, second and third vertical matrix memories for storing first (V.sub.-), second (V.sub.++) and third (V.sub.-+) sparse vertical submatrices, respectively, the first computational module also having an output for providing an output Z.sub.3 that is a predetermined arithmetic combination of the input DCT blocks X.sub.SE, X.sub.NE, and X.sub.E and the sparse vertical submatrices V.sub.-, V.sub.++, and V.sub.-+;

a first delay memory having an input coupled to the output of the first computational module for receiving the output Z.sub.3 and an output for providing a delayed output Z.sub.2, which is the output Z.sub.3 delayed by a predetermined period;

a second delay memory having an input coupled to the output of the first delay memory for receiving the delayed output Z.sub.2 and an output for providing a delayed output Z.sub.1, which is delayed output Z.sub.2 delayed by the predetermined period;

a second computational module having a first input coupled to the output of the first computational module for receiving output Z.sub.3, a second input coupled to the output of the second delay memory for receiving delayed output Z.sub.1, and a third input for receiving delayed output Z.sub.2, the second computational module further including first, second and third horizontal matrix memories for storing first (H.sub.-.sup.t), second (H.sub.++.sup.t) and third (H.sub.-+.sup.t) sparse horizontal transpose submatrices, respectively, the second computational module also having an output for providing an output Y that is a predetermined arithmetic combination of the outputs Z.sub.3, Z.sub.1 and Z.sub.2, and the sparse horizontal transpose submatrices H.sub.-.sup.t, H.sub.++.sup.t, and H.sub.-+.sup.t,

whereby the output Y is a filtered version of the image represented by the input DCT data blocks according to a desired filtering output.

2. A filter according to claim 1 wherein the first computational module includes:

a first arithmetic circuit having a first input for receiving X.sub.SE, a second input for receiving X.sub.NE, and a third input for receiving X.sub.E, and having a first output for providing X.sub.E.sup.-, a second output for providing X.sub.E.sup.++, and a third output for providing X.sub.E.sup.+-;

a first matrix multiplier having a first input coupled to the first vertical matrix memory for receiving the first sparse vertical submatrix V.sub.-, a second input coupled to the first output of the first arithmetic circuit and an output for producing the product of X.sub.E.sup.- and V.sub.-;

a second matrix multiplier having a first input coupled to the second vertical matrix memory for receiving the second sparse vertical submatrix V.sub.++, a second input coupled to the second output of the first arithmetic circuit and an output for producing the product of X.sub.E.sup.++ and V.sub.++; and

a third matrix multiplier having a first input coupled to the third vertical matrix memory for receiving the third sparse vertical submatrix V.sub.-+, a second input coupled to the third output of the first arithmetic circuit and an output for producing the product of X.sub.E.sup.+- and V.sub.-+; and

a summer having a first input coupled to the output of the first matrix multiplier, a second input coupled to the output of the second matrix multiplier, a third input coupled to the output of the third matrix multiplier, and an output for providing a sum Z.sub.3 equal to the sum of the outputs of the three matrix multipliers.

3. A filter according to claim 2 wherein the first arithmetic circuit includes:

a first summing circuit having a first input for receiving X.sub.SE, a second input for receiving X.sub.NE, and an output coupled to the second input of the first matrix multiplier for producing a sum X.sub.E.sup.- that is equal to X.sub.NE minus X.sub.SE;

a second summing circuit having a first input for receiving X.sub.SE, a second input for receiving X.sub.NE, and an output for producing a sum X.sub.NSE that is equal to X.sub.NE plus X.sub.SE;

a divider circuit having an input coupled to the output of the second summing circuit for receiving X.sub.NSE and an output for providing a result X.sub.E.sup.+ that is equal to X.sub.NSE divided by two;

a third summing circuit having a first input coupled to the output of the divider circuit for receiving X.sub.E.sup.+, a second input for receiving X.sub.E, and an output coupled to the second input of the second matrix multiplier for producing a sum X.sub.E.sup.++ that is equal to X.sub.E.sup.+ plus X.sub.E; and

a fourth summing circuit having a first input for receiving X.sub.E.sup.+, a second input for receiving X.sub.E, and an output coupled to the second input of the third matrix multiplier for producing a sum X.sub.E.sup.+- that is equal to X.sub.E.sup.+ minus X.sub.E.
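
The arithmetic recited in claim 3 can be illustrated with a minimal sketch. It assumes 2x2 toy blocks in place of real 8x8 DCT blocks, and the function and variable names (`butterfly`, `x_pp`, `x_pm`) are hypothetical; nothing here is taken from the patent's drawings or description.

```python
# Sketch of the claim 3 butterfly stage on 2x2 toy blocks.
# ASSUMPTION: names and block sizes are illustrative only.

def add(a, b):
    """Elementwise sum of two equally sized blocks."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Elementwise difference a - b of two equally sized blocks."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def halve(a):
    """Divide every element of a block by two."""
    return [[x / 2 for x in row] for row in a]

def butterfly(x_ne, x_e, x_se):
    """Return (X_E^-, X_E^++, X_E^+-) following the four summing circuits."""
    x_minus = sub(x_ne, x_se)        # first summing circuit: X_NE - X_SE
    x_plus = halve(add(x_ne, x_se))  # second summing circuit, then divider
    x_pp = add(x_plus, x_e)          # third summing circuit: X_E^+ + X_E
    x_pm = sub(x_plus, x_e)          # fourth summing circuit: X_E^+ - X_E
    return x_minus, x_pp, x_pm

x_ne = [[8, 4], [2, 0]]
x_e = [[1, 1], [1, 1]]
x_se = [[4, 0], [2, 2]]
x_minus, x_pp, x_pm = butterfly(x_ne, x_e, x_se)
print(x_minus, x_pp, x_pm)
```

The three returned blocks are the quantities fed to the first, second and third matrix multipliers of claim 2.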

4. A filter according to claim 3 wherein the divider circuit and the second summing circuit comprise a shift-and-add circuit.
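
One possible reading of the shift-and-add circuit of claim 4 is an addition followed by a one-bit arithmetic right shift. The sketch below assumes integer (quantized) coefficients; the helper name `half_sum` is hypothetical, and how the patented circuit rounds odd sums is not stated in the claims.

```python
# Sketch: halving a sum with a shift instead of a division.
# ASSUMPTION: coefficients are integers; rounding behaviour is illustrative.

def half_sum(a, b):
    """Add two integers, then shift right by one bit (floor of the half-sum)."""
    return (a + b) >> 1

def half_sum_block(x_ne, x_se):
    """Apply half_sum elementwise to two equally sized integer blocks."""
    return [[half_sum(p, q) for p, q in zip(rn, rs)]
            for rn, rs in zip(x_ne, x_se)]

x_plus = half_sum_block([[8, 4], [2, 0]], [[4, 0], [2, 2]])
print(x_plus)
```

In Python the shift rounds toward negative infinity, so an odd sum such as -3 yields -2 rather than -1.5.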

5. A filter according to claim 1 wherein the second computational module includes:

a second arithmetic circuit having a first input coupled to the output of the first computational module for receiving Z.sub.3, a second input coupled to the output of the second delay memory for receiving Z.sub.1, and a third input coupled to the output of the first delay memory for receiving Z.sub.2, and having a first output for providing Z.sub.-, a second output for providing Z.sub.++, and a third output for providing Z.sub.+-;

a first matrix multiplier having a first input coupled to the first horizontal matrix memory for receiving the first sparse horizontal transpose submatrix H.sub.-.sup.t, a second input coupled to the first output of the second arithmetic circuit and an output for producing the product of Z.sub.- and H.sub.-.sup.t;

a second matrix multiplier having a first input coupled to the second horizontal matrix memory for receiving the second sparse horizontal transpose submatrix H.sub.++.sup.t, a second input coupled to the second output of the second arithmetic circuit and an output for producing the product of Z.sub.++ and H.sub.++.sup.t; and

a third matrix multiplier having a first input coupled to the third horizontal matrix memory for receiving the third sparse horizontal transpose submatrix H.sub.-+.sup.t, a second input coupled to the third output of the second arithmetic circuit and an output for producing the product of Z.sub.+- and H.sub.-+.sup.t; and

a summer having a first input coupled to the output of the first matrix multiplier, a second input coupled to the output of the second matrix multiplier, a third input coupled to the output of the third matrix multiplier, and an output for providing a sum Y equal to the sum of the outputs of the three matrix multipliers.

6. A filter according to claim 5 wherein the second arithmetic circuit includes:

a first summing circuit having a first input for receiving Z.sub.3, a second input coupled to the output of the second delay memory for receiving Z.sub.1, and an output coupled to the second input of the first matrix multiplier for producing a sum Z.sub.- that is equal to Z.sub.1 minus Z.sub.3;

a second summing circuit having a first input coupled to the output of the summer for receiving Z.sub.3, a second input coupled to the output of the second delay memory for receiving Z.sub.1, and an output for producing a sum Z.sub.13 that is equal to Z.sub.1 plus Z.sub.3;

a divider circuit having an input coupled to the output of the second summing circuit for receiving Z.sub.13 and an output for providing a result Z.sub.+ that is equal to Z.sub.13 divided by two;

a third summing circuit having a first input coupled to the output of the divider circuit for receiving Z.sub.+, a second input coupled to the output of the first delay memory for receiving Z.sub.2, and an output coupled to the second input of the second matrix multiplier for producing a sum Z.sub.++ that is equal to Z.sub.+ plus Z.sub.2; and

a fourth summing circuit having a first input coupled to the output of the divider circuit for receiving Z.sub.+, a second input coupled to the output of the first delay memory for receiving Z.sub.2, and an output coupled to the second input of the third matrix multiplier for producing a sum Z.sub.+- that is equal to Z.sub.+ minus Z.sub.2.
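
The second stage described in claims 5 and 6 can be sketched in the same way. This is a minimal illustration on 2x2 toy matrices; it assumes that the horizontal transpose submatrices multiply the butterflied Z blocks from the right, which the claims do not state explicitly, and all names are hypothetical.

```python
# Sketch of the second computational module (claims 5 and 6) on toy data.
# ASSUMPTION: Y = Z_- * H_-^t + Z_++ * H_++^t + Z_+- * H_-+^t (right products).

def matmul(a, b):
    """Plain dense matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def combine(a, b, sign=1, scale=1):
    """Elementwise (a + sign * b) * scale."""
    return [[(x + sign * y) * scale for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def second_stage(z1, z2, z3, h_minus_t, h_pp_t, h_mp_t):
    z_minus = combine(z1, z3, sign=-1)   # Z_-  = Z_1 - Z_3
    z_plus = combine(z1, z3, scale=0.5)  # Z_+  = (Z_1 + Z_3) / 2
    z_pp = combine(z_plus, z2)           # Z_++ = Z_+ + Z_2
    z_pm = combine(z_plus, z2, sign=-1)  # Z_+- = Z_+ - Z_2
    y = combine(matmul(z_minus, h_minus_t), matmul(z_pp, h_pp_t))
    return combine(y, matmul(z_pm, h_mp_t))  # summer: add the third product

z1 = [[2, 0], [0, 2]]
z2 = [[1, 0], [0, 1]]
z3 = [[0, 0], [0, 0]]
y = second_stage(z1, z2, z3,
                 h_minus_t=[[1, 0], [0, 0]],
                 h_pp_t=[[0, 1], [0, 0]],
                 h_mp_t=[[5, 5], [5, 5]])
print(y)
```

With these toy inputs Z_+- is all zeros, so the third product contributes nothing regardless of the contents of the third matrix.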

7. A filter according to claim 5 wherein the filter can be described by a vertical matrix V in the DCT domain wherein V can further be represented as:

8. A filter according to claim 7 wherein the first vertical matrix memory includes the first sparse vertical submatrix V.sub.-, wherein the first sparse vertical submatrix can be represented as: ##EQU21## wherein the second vertical matrix memory includes the second sparse vertical submatrix V.sub.++, wherein the second sparse vertical submatrix can be represented as: ##EQU22## and wherein the third vertical matrix memory includes the third sparse vertical submatrix V.sub.-+, wherein the third sparse vertical submatrix can be represented as: ##EQU23##

9. A filter according to claim 8 wherein the first sparse vertical submatrix V.sub.- includes 32 nonzero elements and second and third sparse vertical submatrices V.sub.++ and V.sub.-+ each include 20 or fewer nonzero elements.

10. A filter according to claim 7 wherein the first horizontal matrix memory includes the first sparse horizontal transpose submatrix H.sub.-.sup.t, wherein the first sparse horizontal transpose submatrix can be represented as: ##EQU24## wherein the second horizontal matrix memory includes the second sparse horizontal transpose submatrix H.sub.++.sup.t, wherein the second sparse horizontal transpose submatrix can be represented as: ##EQU25## and wherein the third horizontal matrix memory includes the third sparse horizontal transpose submatrix H.sub.-+.sup.t, wherein the third sparse horizontal transpose submatrix can be represented as: ##EQU26##

11. A method of filtering a compressed image represented in a discrete cosine transform (DCT) domain and including a plurality of DCT data blocks, the method comprising the steps of:

providing a desired filter represented by a vertical component matrix V of a filter matrix and by a transpose horizontal component matrix H.sup.t of the filter matrix, where H.sup.t is a transpose matrix of a horizontal component matrix H, the filter receiving the DCT data blocks as input data and producing filtered data as output represented by an output block Y in the DCT domain;

partitioning the vertical matrix V into vertical submatrices;

combining the vertical submatrices to form sparse vertical submatrices;

partitioning the transpose matrix H.sup.t into horizontal transpose submatrices;

combining the horizontal transpose submatrices to form sparse horizontal transpose submatrices;

combining the DCT data blocks in a predetermined manner to form butterflied DCT data blocks so that the output block Y is expressed only in terms of the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to a reduced expression; and

combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the output block Y, wherein the number of operations required to perform this step is less than required to filter the image in the spatial domain.

13. A method of filtering a compressed image according to claim 12 wherein the step of combining the vertical submatrices to form sparse vertical submatrices includes combining the vertical submatrices V.sub.1, V.sub.2, and V.sub.3 to form three sparse vertical submatrices V.sub.++, V.sub.-+, and V.sub.-, wherein ##EQU27##
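
The equation referenced in claim 13 is not reproduced in this text. As a hedged illustration only, the sketch below derives one combination of V.sub.1, V.sub.2 and V.sub.3 that is algebraically consistent with the butterflied blocks of claim 3, and checks numerically that it reproduces the direct three-term product. It assumes the direct vertical filter has the form Z = V1*X_NE + V2*X_E + V3*X_SE; that pairing, the combined matrices, and all names are assumptions, and the toy matrices are not sparse.

```python
# Numerical check of one butterfly decomposition on 2x2 toy matrices.
# ASSUMPTION: direct form Z = V1*X_NE + V2*X_E + V3*X_SE (pairing not in claims).

def matmul(a, b):
    """Plain dense matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lin(*terms):
    """Elementwise sum of coeff * matrix over (coeff, matrix) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(cols)]
            for i in range(rows)]

v1, v2, v3 = [[1, 2], [3, 4]], [[0, 1], [1, 0]], [[1, 0], [1, 2]]
x_ne, x_e, x_se = [[8, 4], [2, 0]], [[1, 1], [1, 1]], [[4, 0], [2, 2]]

# Combined submatrices, derived for this sketch (not copied from the patent).
v_minus = lin((0.5, v1), (-0.5, v3))
v_pp = lin((0.5, v1), (0.5, v2), (0.5, v3))
v_mp = lin((0.5, v1), (-0.5, v2), (0.5, v3))

# Butterflied data blocks as recited in claim 3.
x_minus = lin((1, x_ne), (-1, x_se))
x_pp = lin((0.5, x_ne), (0.5, x_se), (1, x_e))
x_pm = lin((0.5, x_ne), (0.5, x_se), (-1, x_e))

direct = lin((1, matmul(v1, x_ne)), (1, matmul(v2, x_e)),
             (1, matmul(v3, x_se)))
reduced = lin((1, matmul(v_minus, x_minus)), (1, matmul(v_pp, x_pp)),
              (1, matmul(v_mp, x_pm)))
print(direct == reduced)
```

The check shows only that the rearrangement preserves the result. Whether the combined matrices are sparse depends on the structure of the actual filter, which this sketch does not model.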

14. A method of filtering a compressed image according to claim 13 wherein the DCT data blocks are organized as follows: ##EQU28## wherein the step of combining the DCT data blocks in a predetermined manner to form butterflied DCT data blocks so that the output block Y is expressed only in terms of the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to a reduced expression includes the steps of:

computing X.sub.E.sup.- according to the following formula:

15. A method of filtering a compressed image according to claim 14 wherein the step of computing X.sub.E.sup.++ includes the step of computing X.sub.E.sup.+ according to the following formula: ##EQU31## and computing X.sub.E.sup.++ according to the following formula:

16. A method of filtering a compressed image according to claim 15 wherein the step of computing X.sub.E.sup.+- includes the step of computing X.sub.E.sup.+ according to the following formula: ##EQU32## and computing X.sub.E.sup.+- according to the following formula:

17. A method of filtering a compressed image according to claim 16 wherein the step of partitioning the transpose matrix H.sup.t into horizontal transpose submatrices includes partitioning the transpose matrix H.sup.t into three horizontal transpose submatrices H.sub.1.sup.t, H.sub.2.sup.t, and H.sub.3.sup.t, where

18. A method of filtering a compressed image according to claim 17 wherein the step of combining the horizontal transpose submatrices to form sparse horizontal transpose submatrices includes the step of combining the three horizontal transpose submatrices H.sub.1.sup.t, H.sub.2.sup.t, and H.sub.3.sup.t to form three sparse horizontal transpose submatrices H.sub.++.sup.t, H.sub.-+.sup.t, and H.sub.-.sup.t,

19. A method of filtering a compressed image according to claim 18 wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the step of computing Z.sub.3 according to the following formula:

20. A method of filtering a compressed image according to claim 19 wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the steps of:

storing Z.sub.3 for one period to produce Z.sub.2; and

storing Z.sub.2 for one period to produce Z.sub.1.

21. A method of filtering a compressed image according to claim 19 wherein the step of combining the DCT data blocks to form butterflied DCT data blocks so that the output block Y is expressed only in terms of the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to a reduced expression includes the steps of:

computing Z.sub.- according to the following formula:

22. A method of filtering a compressed image according to claim 21 wherein the step of computing Z.sub.++ includes the step of computing Z.sub.+ according to the following formula: ##EQU36## and computing Z.sub.++ according to the following formula:

23. A method of filtering a compressed image according to claim 21 wherein the step of computing Z.sub.+- includes the step of computing Z.sub.+ according to the following formula: ##EQU37## and computing Z.sub.+- according to the following formula:

24. A method of filtering a compressed image according to claim 21 wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the following formula:

25. A method of filtering a compressed image according to claim 11 wherein the sparse vertical submatrices include nonzero elements and zero elements and wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the step of combining the butterflied DCT data blocks, only the nonzero elements of the sparse vertical submatrices, and the sparse horizontal transpose submatrices.

26. A method of filtering a compressed image according to claim 11 wherein the sparse horizontal transpose submatrices include nonzero elements and zero elements and wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and only the nonzero elements of the sparse horizontal transpose submatrices.

27. A method of filtering a compressed image according to claim 11 wherein the butterflied DCT data blocks include nonzero elements and zero elements and wherein the step of combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y includes the step of combining only the nonzero elements of the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices.
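
Claims 25 through 27 rely on skipping zero elements. A minimal sketch of that idea follows: the sparse matrix is stored as a list of its nonzero entries, zero entries of the data block are also skipped, and the number of multiplications actually performed is counted. The storage layout, the function names, and the 3x3 and 3x2 toy sizes are assumptions made for illustration.

```python
# Sketch: multiply using only nonzero elements and count the multiplications.
# ASSUMPTION: coordinate-list storage and left multiplication V * X.

def nonzeros(m):
    """List the (row, col, value) triples of the nonzero entries of m."""
    return [(i, j, v) for i, row in enumerate(m)
            for j, v in enumerate(row) if v != 0]

def sparse_left_multiply(v_nz, x, n_rows):
    """Compute V * X from the nonzero entries of V, skipping zeros in X."""
    n_cols = len(x[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    mults = 0
    for i, k, v in v_nz:
        for j, xv in enumerate(x[k]):
            if xv != 0:  # zero data coefficients contribute nothing
                out[i][j] += v * xv
                mults += 1
    return out, mults

v = [[2, 0, 0], [0, 0, 3], [0, 0, 0]]
x = [[1, 0], [5, 5], [0, 4]]
out, mults = sparse_left_multiply(nonzeros(v), x, n_rows=len(v))
print(out, mults)
```

A dense product of these toy matrices would take 3 x 3 x 2 = 18 multiplications; here only the nonzero pairs are multiplied.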

28. A method of filtering a compressed image according to claim 11 further including the step of storing the sparse vertical submatrices and the sparse horizontal transpose submatrices on a general purpose computer.

29. A method of filtering a compressed image according to claim 28 wherein the steps of combining the DCT data blocks in a predetermined manner to form butterflied DCT data blocks so that the output block Y is expressed only in terms of the butterflied DCT data blocks and combining the butterflied DCT data blocks, the sparse vertical submatrices, and the sparse horizontal transpose submatrices according to the reduced expression to produce the filtered data Y are executed by the general purpose computer programmed according to the reduced expression.

30. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain, the general purpose computer comprising:

a microprocessor;

a storage device coupled to the microprocessor;

first (V.sub.++), second (V.sub.-) and third (V.sub.-+) sparse vertical submatrices stored on the storage device, wherein each of the sparse vertical submatrices includes a predetermined number of zero and nonzero elements;

first (H.sub.++.sup.t), second (H.sub.-.sup.t) and third (H.sub.-+.sup.t) sparse horizontal transpose submatrices stored on the storage device, wherein each of the sparse horizontal transpose submatrices includes a predetermined number of zero and nonzero elements;

a compressed image stored on the storage device, the compressed image including a plurality of DCT data blocks organized in columns;

computer software stored on the storage device and executable by the microprocessor, the microprocessor performing the following steps under control of the software:

(a) fetching a first column of DCT data blocks from the storage device, the first column including DCT data blocks X.sub.1, X.sub.2 and X.sub.3; (b) computing X.sub.++ according to the following formula: ##EQU38## (c) computing X.sub.+- according to the following formula: ##EQU39## (d) computing X.sub.- according to the following formula:

(e) computing Z.sub.3 according to the following formula:

(f) retrieving a variable Z.sub.2;

(g) retrieving a variable Z.sub.1;

(h) computing Z.sub.++ according to the following formula: ##EQU40## (i) computing Z.sub.+- according to the following formula: ##EQU41## (j) computing Z.sub.- according to the following formula:

(k) computing Y according to the following formula:

31. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(l) computing X.sub.+ according to the following formula: ##EQU42## and (m) computing X.sub.++ according to the following formula:

32. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 31, the microprocessor further performing the following steps under control of the software:

(n) computing X.sub.+- according to the following formula:

33. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 32, the microprocessor further performing the following steps under control of the software:

(o) computing X.sub.+ by performing a shift-and-add function.

34. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(p) computing Z.sub.+ according to the following formula: ##EQU43## and (q) computing Z.sub.+- according to the following formula:

35. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(r) in steps (b) through (e), ignoring zero elements in the DCT data blocks X.sub.1, X.sub.2 and X.sub.3 in order to further reduce the number of computations required to filter the image.

36. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(s) in step (e), ignoring zero elements in the first (V.sub.++), second (V.sub.-) and third (V.sub.-+) sparse vertical submatrices in order to further reduce the number of computations required to filter the image.

37. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(t) in step (k), ignoring zero elements in the first (H.sub.++.sup.t), second (H.sub.-.sup.t) and third (H.sub.-+.sup.t) sparse horizontal transpose submatrices in order to further reduce the number of computations required to filter the image.

38. A general purpose computer programmed to filter a spatial domain image that has been compressed and is represented in a discrete cosine transform (DCT) domain according to claim 30, the microprocessor further performing the following steps under control of the software:

(u) storing Z.sub.2 as Z.sub.1; and, then

(v) storing Z.sub.3 as Z.sub.2.
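
The ordering of steps (u) and (v) matters: Z.sub.2 must be copied into Z.sub.1 before it is overwritten with Z.sub.3. The sketch below shows that update inside a loop over columns. The string labels stand in for per-column results, and the function name and the use of `None` for not-yet-available values are assumptions for illustration.

```python
# Sketch of the two-stage delay update in steps (u) and (v).
# ASSUMPTION: column results are opaque values; None marks an empty delay slot.

def stream_windows(column_results, initial=None):
    """Yield the (Z1, Z2, Z3) triple seen by the second stage for each column."""
    z1 = z2 = initial
    windows = []
    for z3 in column_results:
        windows.append((z1, z2, z3))
        z1 = z2  # step (u): store Z2 as Z1 first
        z2 = z3  # step (v): then store Z3 as Z2
    return windows

windows = stream_windows(["Za", "Zb", "Zc", "Zd"])
for w in windows:
    print(w)
```

Reversing the two assignments would copy the newest result into both slots, so Z.sub.1 and Z.sub.2 would always be equal.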