Apparatus, methods, and computer program products for reducing the number of computations and number of required stored values for information processing methods
Apparatus, methods, and computer program products are provided for generating a second set of equations requiring reduced numbers of computations from a first set of general equations, wherein each general equation defines coefficients in terms of a set of samples and a plurality of functions having respective values. A first set of tokens is initially assigned to the plurality of functions such that every value of the functions that has a different magnitude is assigned a different token, thereby permitting each general equation to be defined by the set of samples and their associated tokens. Each general equation is then evaluated and the samples having the same associated token are grouped together. A second set of tokens is then assigned to represent a plurality of unique combinations of the samples. The second set of equations is then generated based at least on the first and second sets of tokens.
The present application claims priority from U.S. provisional patent application Ser. No. 60/210,661, entitled: METHODS AND APPARATUS FOR PROCESSING INFORMATION USING SPECIAL CASE PROCESSING, filed on Jun. 9, 2000, the contents of which are incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates generally to the determination of coefficients of a function. More particularly, the methods and computer program products of the present invention relate to reducing the number of computations and values that must be stored in the determination of the coefficients.
BACKGROUND OF THE INVENTION
Signal processing is an important function of many electronic systems. In particular, in many electronic systems, data is transmitted in signal form. Further, some electronic systems analyze and monitor the operation of mechanical or chemical systems by observing the characteristics of signals, such as vibration signals and other types of signals, that are output from these systems. In light of this, methods have been developed to characterize signals such that information or data in the signal is available for data processing.
As one example, in many electronic systems, time domain signals are typically transformed to the frequency domain prior to signal processing. A typical method for converting signals to the frequency domain is performed using Fourier Transforms. The Fourier Transform of a signal is based on a plurality of samples of the time domain signal taken over a selected time period; the reciprocal of this period is known as the base frequency. Based on these samples of the signal, the Fourier Transform provides a plurality of coefficients, where the coefficients respectively represent the amplitude of a frequency that is a multiple of the base frequency. These coefficients of the Fourier Transform, which represent the signal in the frequency domain, are then used by electronic systems in processing the signal.
Although Fourier Transforms are among some of the most widely used functions for processing signals, there are other functions that are either currently used or will be used in the future, as a better understanding of their applicability is recognized. These functions include Bessel functions, Legendre Polynomials, Tschebysheff Polynomials of First and Second Kind, Jacoby Polynomials, Generalized Laguerre Polynomials, Hermite Polynomials, Bernoulli Polynomials, Euler Polynomials, and a variety of Matrices used in Quantum Mechanics, Linear Analysis functions, wavelets, and fractals, to name just a few.
Although Fourier transforms and the other functions mentioned above are useful in determining characteristics of signals for use in data processing, there are some drawbacks to their use. Specifically, application of these functions to signals is typically computationally intensive. This is disadvantageous as it may require the use of specialized processors in order to perform data processing. Further, and even more importantly, the time required to perform the number of computations using these functions may cause an unacceptable delay for many data processing applications. In fact, a goal of many data processing systems is the ability to process data signals in real time, with no delay.
For example, the Fourier Series is defined as an infinite series of coefficients representing a signal. To transform a signal using a Fourier Series would require an infinite number of computations. To remedy this problem, many conventional data processing systems use the Discrete Fourier Transform (DFT), as opposed to the infinite Fourier Series. The DFT is the digital approximation to the Fourier Series and is used to process digitized analog information. Importantly, the DFT replaces the infinite series of the Fourier Series with a finite set of N evenly spaced samples taken over a finite period. The computation of the DFT therefore provides the same number of coefficients as the samples received, instead of the infinite number of coefficients required by the Fourier Series. As such, use of the DFT provides the most satisfactory current means to process the signal.
Because of the importance of reducing the time required to process signals, however, methods have been developed to further reduce the number of computations required to perform a DFT of a signal. Specifically, the DFT procedure computes each coefficient by a similar process. The process for a general coefficient is: multiply each sample by the sine or cosine of the normalized value of the independent variable times the angular rate, and sum over all of the samples. This procedure defines N multiply-add steps for each of N coefficients, which in turn equates to N² multiply-add computations per DFT. As many samples of a signal are typically required to perform an adequate approximation of the signal, the DFT of a signal is typically computation- and time-intensive.
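As a concrete sketch of this N² multiply-add procedure (the function and variable names here are illustrative, not from the original), the DFT coefficients can be computed as follows:

```python
import math

def dft_coefficients(samples):
    """Naive DFT: multiply every sample by the appropriate cosine or sine
    value and accumulate -- N multiply-adds for each of the N cosine (a)
    and N sine (b) coefficients."""
    N = len(samples)
    a = [0.0] * N  # cosine coefficients
    b = [0.0] * N  # sine coefficients
    for j in range(N):                    # one pass per coefficient
        for i, s in enumerate(samples):   # N multiply-adds per pass
            angle = 2.0 * math.pi * i * j / N
            a[j] += s * math.cos(angle)
            b[j] += s * math.sin(angle)
    return a, b
```

Every coefficient repeats the same multiply-and-sum pattern over the same samples, which is the redundancy that the reduction methods discussed in this document exploit.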
One of the methods developed to reduce the number of computations is the Butterfly method, which reduces the number of computations from N² to N times log(N). The Butterfly method is based on the fact that many of the trigonometric values of the DFT are the same due to periodicity of the functions. As such, the Butterfly method reduces the matrix associated with the DFT into N/2 two-point transforms (i.e., the transforms representing each coefficient aₙ and bₙ). The Butterfly method further reduces the redundant trigonometric values of the DFT. Although the Butterfly method reduces the number of computations over the more traditional DFT method, it also adds complexity to the Fourier transformation of a signal. Specifically, the Butterfly method uses a complex method for addressing the samples of the signal and the matrix containing the functions. This complexity can require the use of specialized processors and increase time for computation of the Fourier Transform. By its nature, the Butterfly is a batch process, which does not begin determination of the coefficients until after all of the samples have been received. Consequently, this method causes latency in the determination of the coefficients of the function, where the time between the arrival of the last sample and the availability of the coefficients is defined as the latency of the system.
An improved approach to reducing the time required to process signals is described in U.S. Application Ser. No. 09/560,221 entitled: APPARATUS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR DETERMINING THE COEFFICIENTS OF A FUNCTION WITH DECREASED LATENCY, filed Apr. 28, 2000, and corresponding PCT Application Number WO 00/67146, entitled: Computation of Discrete Fourier Transform, publication date Nov. 9, 2000. These applications are assigned to the inventor of the present application, and are incorporated herein by reference. The approach in WO 00/67146 reduces or eliminates the problem of latency for processing coefficients by using the property of independence of samples of functions like the DFT. The approach updates at least one of the coefficients of the function prior to receipt of the last sample of a sample set, thereby reducing latency.
Despite the improvements in data processing accomplished by the apparatus and methods of U.S. application Ser. No. 09/560,221, there are continuing needs to reduce the number of calculations and stored terms required for determining the coefficients of a function.
SUMMARY OF THE INVENTION
As set forth below, the apparatus, methods, and computer program products of the present invention overcome many of the deficiencies identified with processing signals using functions, such as Fourier Transforms. In particular, the present invention provides methods and computer program products that determine the coefficients of a function representative of an input signal with reduced calculation complexity, such that the coefficients of the function are made available within a decreased time from receipt of the last sample of the signal. The present invention also provides methods and computer program products that reduce the amount of calculations that must be performed in order to determine the coefficients of a function, such that less complex hardware designs can be implemented. Specifically, the time for performing computations can be conserved by reusing previously calculated terms so that terms having the same value are calculated fewer times than the number of times they appear in the original equations; preferably, each repeating term having the same value is calculated only once for the entire calculation process. In addition, some embodiments of the present invention also use a reduced number of values to represent the possible mathematical terms of a function.
BRIEF DESCRIPTION OF THE DRAWING AND APPENDICES
The figure is a block diagram illustrating the operations for generating a special case set of equations for converting input values to coefficients with reduced computation from a more general set of equations according to one embodiment of the present invention.
Appendix 1 illustrates use of the operations of the figure.
Appendix 2 illustrates use of the operations of the figure.
Appendix 3 illustrates use of the operations of the figure.
Appendix 4 illustrates use of the operations of the figure.
Appendix 6 illustrates use of the operations of the figure.
Appendix 7 illustrates use of the operations of the figure.
Appendix 8 illustrates use of the operations of the figure.
Appendix 9 illustrates use of the operations of the figure.
Appendix 10 illustrates use of the operations of the figure.
Appendix 11 illustrates use of the operations of the figure.
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout.
For illustrative purposes, the various methods and computer program products of the present invention are illustrated and described below in conjunction with the characteristics of Fourier Series. It should be apparent, however, that the methods and computer program products of the present invention can be used with many different types of functions. For instance, the methods and computer program products may be used with functions such as Bessel functions, Legendre Polynomials, Tschebysheff Polynomials of First and Second Kind, Jacoby Polynomials, Generalized Laguerre Polynomials, Hermite Polynomials, Bernoulli Polynomials, Euler Polynomials, and a variety of Matrices used in Quantum Mechanics, Linear Analysis functions, wavelets, and fractals. This list is by no means exhaustive and is provided merely as examples. The approach may be applied to any function that can be expressed as a matrix of values. The usefulness of these and other functions not listed above is quite general. The method of the present invention provides a way to develop apparatus, methods, and computer program products for parallel computing and to remove calculation redundancy in a rote manner, which is compatible with machine execution. One implementation of the present invention would be a general purpose computer program to examine each class of problem and write a minimal execution program or design an apparatus for each specific case of the function. In this application, it would be a programming aid.
In particular, the present invention provides methods and computer program products that remove redundancy from a mathematical procedure for determining coefficients of a function. The present invention is particularly suited to use in systems that employ coefficient-based mathematics.
Some of the methods and computer program products of the present invention provide a sequence of operations in which a working formulation or program is examined to identify the functional usage of each part. By classifying the various details of the program and then following rules of combination or substitution that are universal, the number of computations and/or required stored values for a given computational determination may be reduced. As with any algebra, it becomes possible to do significant reconfiguration without knowing the behavior of the actual computational system. The functionality of the entire program is preserved while the process is changed to another form. The form is then algebraically manipulated to optimize speed or whatever parameters are desired.
It is often possible to reduce one or more variables out of a system that has many variables. If only part of the possibilities are addressed at one time, this process can be repeated later to handle others. Some of the variables that yield gains when methods according to the present invention are applied include the number of bits at various locations, the number of coefficients being formed, and any other aspect that defines the structure.
The number of bits can be used in several ways. One way is simply to detail each of the values of that many bits. Another is to have a continuous function that will be represented in a specific number of bits. If the actual values are to be represented in a small number of bits, it may be that several values of the function will be represented by a single number. In this case, it becomes important to know the correctly rounded points so that no systematic bias will be introduced. Once that has been done correctly, there will be a reduction in the total number of discrete terms and individual computations that must be made available to the process.
The resulting structure is then sorted and processed to algebraically format the output in some basic manner. Typically, the individual formulas of the output terms are related to the primary input terms. Where numbers have replaced the previous functions, the additions are carried out. Once the processing is done, there may be some number of simultaneous equations that share some variables.
The next step is a three-fold set of details to be identified and collected. The first is to identify those variables that are used in identical combinations in different terms of each output function. The second is to separately determine those variables that are used in identical combinations in various output functions. Finally, the variables should be linked temporally so as to be able to find the time at which storage must be allocated or when all of the members of a group become available. This can be done by ordering them temporally in each combinatorial representation: they may be placed in the order of occurrence. It is then easy to access the first member to know when an accumulation site is required. Similarly, it is easy to access the last member to determine the time when all members have become available. The term temporal may be inapt when the samples are not from a time-oriented process, but these processes are performed on an ordered set of numbers, and the order may be artificially correlated with time.
There may be a need to minimize the number of accesses or the number of storage locations, or a limit according to specific guidelines. These rules control the next steps to form groups. In the examples supplied, it was necessary to minimize the number of multiplies first. It was then required to minimize the number of add-accumulates. Next, minimize the number of individual storage registers. It is usually beneficial that, once a register contains a variable, the variable not be transferred to other register locations, because this allows each register to be called by the name of the variable it contains and also reduces the number of processing steps defined. After such a variable is used for the last time, there may be a newly formed variable that can be assigned to the same register, in which case there is value in keeping the register/variable name but changing a prefix or suffix portion. Finally, each output term is to be formed as soon as it becomes determined by the input information. Following this rule is generally beneficial in making the final output available at the earliest time without having to use multi-pass compiling methods at this level. The procedures do not alter the eventual set of coefficients.
The result is a set of intermediate variables that are each derived from lower variables by a multiply or add process. The definition of the intermediates contains all of the information required to structure the implementation. The output equations are a sum or product of variables that are defined sequentially, descending to the primitives.
With regard to
Stated in a somewhat different way, the apparatus, methods, and computer program products of the present invention generate a second set of equations requiring reduced numbers of computations from a first set of general equations, where each general equation defines coefficients in terms of a set of samples and a plurality of functions having respective values. A first set of tokens is initially assigned to the plurality of functions such that every value of the functions that has a different magnitude is assigned a different token, thereby permitting each general equation to be defined by the set of samples and their associated tokens. Each general equation is then evaluated and the samples having the same associated token are grouped together. A second set of tokens is then assigned to represent a plurality of unique combinations of the samples. This second set of tokens is then evaluated to determine a third set of tokens that defines all the unique tokens of the second set. This is repeated until there are no more redundant tokens. The second set of equations is then generated based on the sets of tokens.
Details of the DFT Example
It is easiest to view the process as an Output-Oriented Methodology. For the DFT there are N coefficients to be derived from N samples. Each sample is going to be multiplied by a function of the sample position and the specific coefficient. Since a sample has one sample position and will be used in N coefficients, we can designate it by a generic name and two parameters. The generic name is the name we pick for the initial level of abstraction. Let us pick Alpha. The parameters may be shown as matrix indices. For the current discussion, Alpha (i, j) is used to represent the function which evaluates to become the multiplier of the ith sample position for the jth output coefficient. The indices i and j range from 1 to N. Let i represent the sample positions and j represent the coefficient being referenced.
For the DFT, the Alpha (i, j) is cos (2*Pi*i*j/N) or sin (2*Pi*i*j/N) (in here, as well as in the Appendixes, Pi is notation for π=3.14 . . . ). Once N is decided, there is a numerical value for each Alpha (i, j). The numerical value will be referred to as Beta (i, j). In the example case there are N times N numerical values designated Beta (i, j).
The first operation is to form a list of values. As a designator for the values that we will call tokens, we will use T (m). We begin by setting T (1) equal to Alpha (1,1). We may now proceed in an orderly manner to compare each Alpha (i, j) to T (1). If it is equal we add Alpha (i, j) to the list for T (1), if it is not equal we compare it to T (2) if one has been assigned a value. If it is numerically equal we add it to the list for T (2). If it does not equal any assigned T-value, we assign it to T (v) where v is the next unassigned index for T. Upon completion, we will have N squared entries. There will be tokens T (1) . . . T (v−1). Each Alpha (i, j) may now be replaced by a T (m) whose numerical value is stored in the list and a sign where the Alpha was numerically equal but opposite in value. If at least one of the values of Alpha (i, j) occurred twice, this process results in a simpler array. As a convention, we will only allow positive values or zero for T (m). All sign information will be left in the equations. Similar simplifications may be done where we check for multiples of Alpha (i, j) or functions of i and j, but in this example we will use only equal amplitudes.
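The first operation can be sketched in code as follows (a minimal sketch assuming 0-based indices and illustrative names such as assign_tokens; the original proceeds from index 1):

```python
import math

def assign_tokens(alpha, tol=1e-9):
    """First operation: walk the Alpha(i, j) array, assigning one token per
    distinct magnitude.  Returns the non-negative token values and a map
    from each (i, j) to its (token index, sign)."""
    tokens = []   # token values T(m); by convention only positive or zero
    table = {}    # (i, j) -> (token index, +1 or -1)
    for i, row in enumerate(alpha):
        for j, value in enumerate(row):
            magnitude = abs(value)            # sign stays in the equations
            sign = -1 if value < 0 else 1
            for m, t in enumerate(tokens):
                if abs(magnitude - t) < tol:  # equal magnitude: reuse token
                    table[(i, j)] = (m, sign)
                    break
            else:                             # new magnitude: new token
                tokens.append(magnitude)
                table[(i, j)] = (len(tokens) - 1, sign)
    return tokens, table
```

For the 8-by-8 cosine table discussed later, this produces N² = 64 table entries but only three tokens (for the magnitudes 1, √2/2, and 0), illustrating how the array simplifies when values of Alpha (i, j) repeat.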
The second operation is done with the tokens applied. We express each output function (coefficient C (j)) as the sum of each sample times the token linked with its sample position and the appropriate sign. Each token is multiplied by the sum of all of the samples of the sample positions whose Alpha (i, j) were listed as having the value of that token in the first operation. This operation consists of collecting all of the identical tokens by summing the associated samples into a single term and then multiplying by the token value. The form of each output function becomes a summation of token products. In this case, j is associated with any one coefficient, so that we are collecting across the i sample positions of each coefficient.
Let us choose to call the individual samples by a symbol and an index. The samples may be denoted S(i). The samples are the actual variables that represent input information in the final structure. We may then represent the jth coefficient by:
C(j)=T(1)*(ΣS(d1,k))+T(2)*(ΣS(d2,k))+ . . . +T(m)*(ΣS(dm,k)), where dm,k ranges over the sample positions grouped under token T(m).
It follows that the number of multiplies is reduced by one each time a token is reused in a coefficient.
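A sketch of forming one coefficient this way (the function name and the magnitude-rounding used to collapse numerically equal values are illustrative assumptions; the multiply count here includes one multiply per distinct token, even the trivial tokens 1 and 0):

```python
import math

def coefficient_from_tokens(samples, alpha_row, tol=1e-9):
    """Form one coefficient as T(1)*(sum of grouped samples) + T(2)*(...) + ...
    Samples whose multipliers share a magnitude are summed first, so each
    token value is multiplied only once.  Returns (coefficient, multiply count)."""
    groups = {}                                  # token magnitude -> signed sample sum
    for s, a in zip(samples, alpha_row):
        magnitude = round(abs(a) / tol) * tol    # collapse equal magnitudes
        sign = -1.0 if a < 0 else 1.0
        groups[magnitude] = groups.get(magnitude, 0.0) + sign * s
    coefficient = sum(mag * total for mag, total in groups.items())
    return coefficient, len(groups)              # one multiply per distinct token
```

For N = 8 and the cosine row of the first harmonic, only three distinct token magnitudes occur, so three multiplies replace the eight of the general equation while producing the same coefficient.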
Listing the groupings of S(i)*T(m) that occur across the various coefficients, C(j), achieves the next two operations. There are no common groupings of S(i)s within a single coefficient, because each sample is used only one time. The S(i)s are added to form groups: certain S(i)s are summed and multiplied by a token. When forming the various coefficients, the same summation will be used again with another sign or multiplied by another token. It is important to reuse both the case where the multiply is by the same token and the summation itself, which may be multiplied by another token and used multiple times.
The next operation is achieved by listing the combinations found in each summation in each coefficient in much the same way as was done in the first operation. It is useful to keep the sample symbols in the order of occurrence within each summation. We will designate the summations by the term chunk. These will be named Sum(1) through Sum(s). A list is formed with the first chunk listed under Sum(1). The second chunk will be compared with it and added to its list if it is identical, else it will be named Sum(2) and so listed. Continue by repeating steps from first operation.
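The chunk-listing operation can be sketched as follows (the name collect_chunks and the signed-index encoding of a summation are assumptions for illustration, e.g. (1, -3, -5, 7) for S1−S3−S5+S7):

```python
def collect_chunks(groupings):
    """List each distinct summation ("chunk") once, in order of first
    appearance, and record which chunk every grouping maps to."""
    chunks = []     # Sum(1), Sum(2), ... in order of first appearance
    names = {}      # grouping -> chunk index
    usage = []      # for each input grouping, the chunk it reuses
    for g in groupings:
        key = tuple(g)
        if key not in names:           # first occurrence: new chunk
            names[key] = len(chunks)
            chunks.append(key)
        usage.append(names[key])       # later occurrences reuse the chunk
    return chunks, usage
```

A repeated grouping maps back to the chunk already on the list, exactly as a repeated Alpha value mapped back to an existing token in the first operation.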
After the coefficients are represented by chunks with signs preserved and times tokens, a list will be made that groups the products of tokens and chunks. These are called terms Gamma(1) through Gamma(t). Terms again are only considered as symbols and signs are preserved as before. Once there is a list of terms which breaks down into a list of chunks, etc. we have the minimum set of elements that are required to produce the set of coefficients.
DESCRIPTION OF AN EXAMPLE IMPLEMENTATION
The first constructor is the range of each chunk. Here we want to identify the first and last sample times contained in each. This is where the value of sample order becomes apparent. Storage will be required from the first required sample time. Similarly, the chunk cannot be multiplied by any of the tokens until it is finished. Because the S(i) are ordered in the chunk definitions listed, no further processing steps are needed at this time.
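Because the samples are kept in order within each chunk, the range of a chunk is simply its first and last entries; a minimal sketch (names are illustrative, with chunks encoded as tuples of signed sample indices):

```python
def chunk_ranges(chunks):
    """For each chunk (signed sample indices kept in order of occurrence),
    report the first sample time, when storage must be allocated, and the
    last sample time, after which the chunk may be multiplied by tokens."""
    return [(abs(chunk[0]), abs(chunk[-1])) for chunk in chunks]
```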
The second constructor is the sequence of combinations from which to generate the chunks. The general procedure here is to allocate N+1 storage locations in which to fabricate the final coefficients. It will follow that the first samples can be placed in two locations each and then later samples can be accumulated in the locations. Maintaining the constraint allows the correct solution to be achieved by trial and error. There are geometric and other procedures that also produce the same answer in the case of DFT but they are equivalent and require more words to state. Examples of these procedures are to be seen in the following illustrations. There is no single formula for all types of structures, therefore the trial and error method is a suitable example.
Once the chunks are formed it is possible to produce the terms using the same storage locations but by multiplying and accumulating. The terms are then finally combined into coefficients by add-accumulate procedures. In the more general case, there are other strategies, but these are well-known and would be obvious to anyone skilled in the art.
A Computation Minimization of the Discrete Fourier Transform
The DFT has been the subject of much development recently. The FFT has been widely accepted as an efficient method of calculation. The FFT requires N*Log (N) multiplies instead of the N² of previous methods. The FFT appears to require the least computation of any easily stated paradigm. Embodiments of the present invention provide methods that can be much simpler to execute than the FFT and that use less processing power.
An example of a method according to one embodiment of the present invention will now be presented. Although the method of this example can derive a precise minimum computational structure for a certain value of N, the method does not necessarily specify the structure for another value of N. This example will show the case where N=8. The procedures described can be programmed so that it is possible to mechanically develop the equations for any given N.
Reference is now made to Appendix 1. Section 1 of Appendix 1 shows the terms that define an 8 by 8 DFT. According to the methods of the standard technologies, each sample along the top must be multiplied by each of the entries below it (i.e., sixty-four multiplications to be performed), and then the products along each row must be summed to produce the coefficient associated with the row. However, embodiments of the present invention can significantly reduce the number of multiplication steps.
Since the trigonometric functions are actually static in this structure, they have numerical values. In Section 2, the mathematical evaluations have been performed and the numerical values have been substituted. In each case, the value consists of a sign and a number. These must each be multiplied by the applicable sample value and then summed across the row to form the coefficient. (See steps 100-120).
As can be seen, there are cases where the numerical value of the entry repeats several times although the sign may reverse, and each sample may have any value in its permitted range. In Section 3, each numerical value other than zero has been assigned a token and the sign has been preserved. (See step 130). Of course, there is no significance to sign in the case of zero.
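This repetition of values is easy to verify; the following sketch (illustrative, not taken from the appendix itself) tabulates the distinct magnitudes occurring in the N=8 sine and cosine tables:

```python
import math

# Tabulate the distinct magnitudes in the 8-by-8 sine and cosine tables
# of the N = 8 DFT (cf. Sections 2 and 3 of Appendix 1).
N = 8
values = []
for i in range(N):
    for j in range(N):
        angle = 2.0 * math.pi * i * j / N
        values.append(math.cos(angle))
        values.append(math.sin(angle))

# Rounding collapses floating-point noise; only three magnitudes remain:
# 0 (which needs no token), sqrt(2)/2 (token T2), and 1 (token T1).
magnitudes = sorted({round(abs(v), 9) for v in values})
```

Of the 128 table entries, every nonzero one is, up to sign, either 1 or √2/2, which is why Section 3 needs only two tokens.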
An algebraic development will now be started. In Section 4, a new column is marked ‘Formula’. The sample numbers are grouped and multiplied by the appropriate tokens. In A0 as an example, the formula is the sum of all of the samples multiplied by the value of token T1. In A3 there is a signed group multiplied by T2 and another by T1. Since all of the variables are either input sample numbers or tokens, this representation is called level zero.
Refer to A1 and A3 in Section 4 of Appendix 1. The groups of samples multiplied by T1 differ only in that the signs are all reversed. The same is true of B1 and B3 except that not even the signs are reversed. Although the individual samples are shown here, the significance of each summation is a single number. Later they are named C6 and C4 respectively. If the sum of [S1−S3−S5+S7] is 3.256, then it follows that A1=3.256*T2−G5 and that A3=−3.256*T2−G5. In other words, we may compute the value of a sum and store it. We can then attach the sign in each case. Once we form the equations for a set of N-coefficients and group sums and tokens, we may identify all groups that are identical (other than sign) independent of where we will use them. The minimum list of groups is one part of developing the required minimum structure. (See step 140).
The next step will be illustrated graphically, but would be the result of a program in a larger development. In Section 5, there are three levels of summation shown. The variables G1 through G4 each consist of two samples added. These tokens represent unique groups of samples that are multiplied by the first set of tokens T. G1 is the (N/2)th plus the Nth sample. G2 is the sum of the sample before and the sample after the (N/2)th sample. G3 is the sum of the two samples immediately outside those of G2. This continues until G(N/2), which is composed of samples S1 and S(N−1). G(N/2+1) has the same samples as G1 but the second is subtracted from the first. The other N/2−2 GXs are likewise subtractions and have the same members as G2 through G(N/2−1). The N variables, GX, are level 1, each consisting of the sum or difference of two samples.
The level two CX variables are a third set of tokens that consist of the sum or difference of two level one GX variables. Three pairs are used to make the six CX variables; each is the sum or difference of two GX variables. Therefore, each is a sum of four samples and again is just one numerical value and a sign. (See step 150).
Level 3 DX variables are a fourth set of tokens that consist of the sum and difference of two CX variables. C1 and C2 are used to form D1 and D2. Therefore, these are each the summation of eight samples reduced to a single signed number. (See steps 160-180).
The method of forming the equations may not be optimized. It is possible to use a random matching procedure to find the simplest implementation of the required steps. Section 6 contains the same information as Section 4 except the formulas are expressed in variables that represent the actual sample combinations. These combinations are achieved purely by the addition of binary sample numbers in a pattern somewhat like the Butterfly method. It is worth mentioning that they are not identical to the Butterfly method.
Next, we observe that the value of T1 is one. Multiplying by one produces no change. In Section 7, the formulas are simplified by removing the (times T1) portion. A review of the final formulas shows that there are only four real multiplies to be made. On the other hand, two of these are C4*T2 with G7 added to or subtracted from it. The other two are C6*T2 with different signs and G5 subtracted. Therefore, only two multiplies are required to do an eight-sample (N=8) Fourier Transform (which is N/4), whereas the FFT would use 24, which is N*Log(N).
It is necessary to point out that small Fourier Transforms have a higher percentage of 1's and therefore a greater reduction of multiplies. It is estimated that the actual number of multiplies approaches N as N increases. This is for N being a power of two. Results are not as efficient for other N's.
It is also worth noting that for larger N's the tokens include ½ and ¼. These can be achieved without multiplying, by summing in a bank of memory that includes a wired offset of one or two binary places where it connects to the output bus. There are several ways to take advantage of these cases.
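A minimal software analogue of that wired offset is a fixed binary shift; the function names below are illustrative, and non-negative integer accumulators are assumed.

```python
# A token value of 1/2 or 1/4 applied to an integer accumulator is a
# fixed shift of one or two binary places, not a multiplication. In
# hardware this is the wired offset onto the output bus; in software
# it is a right shift (floor division for non-negative integers).

def apply_half(x: int) -> int:
    return x >> 1   # shift one binary place: x // 2

def apply_quarter(x: int) -> int:
    return x >> 2   # shift two binary places: x // 4

acc = 40
print(apply_half(acc), apply_quarter(acc))  # 20 10
```

Because the shift is wired rather than computed, these tokens cost nothing at run time in a hardware implementation.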
The method of defining level 1 terms appears to be optimal. Formation of the intermediate terms for each N may be accurately done by a ‘modify and evaluate’ program that has freedom to substitute equivalents and test the merit of each case. What is best is a matter of the design requirements and the technology being used. Given a well-chosen set of requirements, it is reasonable to devote a sizeable effort to the solution because it is a one-time task. Extension of these principles to N=16, 32, and 64 are good steps in developing various ‘rules’ and insights for extending to larger N's with less reliance on the ‘modify and evaluate’ strategy. Once an approach is shown to find the simplest procedure, various parallel approaches can be phrased as simply variants of what is disclosed herein.
A further strategy that is useful when reducing the number of multiplications in a DFT that is composed of a large number of finite-precision samples is achieved by grouping the tokens. This is accomplished as shown in the following example.
An 8192-sample DFT is to be evaluated with samples that are eight-bit binary numbers. Suppose it is decided to use trigonometric function values rounded correctly to eight bits. The sixteen-bit products are summed in 32-bit registers. There are 2048 first-level tokens that represent the trigonometric function values. These values are rounded correctly to eight bits and assigned to 256 second-level tokens. All of the components of each coefficient that are multiplied by any of the 2048 first-level tokens now represented by a single second-level token are thus summed and handled in a single multiplication by the appropriate second-level token value.
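The token counts in this example can be sanity-checked in a few lines. The sketch below assumes the first-level tokens are the distinct cosine magnitudes |cos(2πm/8192)| for m = 0 … 2048; rounding each to eight fractional bits collapses them to at most 257 distinct values (0/256 through 256/256), i.e. roughly the 256 second-level tokens described above.

```python
import math

# First-level tokens: distinct cosine magnitudes for an 8192-point DFT
# (about N/4 of them, under the assumption stated above).
N = 8192
first_level = {abs(math.cos(2 * math.pi * m / N)) for m in range(N // 4 + 1)}

# Second-level tokens: the same values rounded to eight fractional
# bits, so many first-level tokens share one second-level token.
second_level = {round(v * 256) / 256 for v in first_level}

print(len(first_level), len(second_level))
```

Every coefficient component whose first-level tokens round to the same second-level value can be summed first and multiplied once, which is the reduction claimed in the text.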
This has the effect of reducing the number of multiplies while introducing a noise source. A noise source was already present due to quantization of the original samples to eight-bit accuracy. Matching it by eight-bit rounding of the multipliers will increase the noise by a factor of the square root of two. If the original noise source was acceptable in the application, general engineering practice holds that there will be no serious degradation due to the use of eight-bit multipliers.
It therefore follows that it is possible to further group and sum according to the principles described above using the actual individual properties and bit-lengths that apply to each and every case. This is valuable where a specific case will be used many times such as in real-time processing, modulation or image compression to name a few applications.
A Computational Minimization of the Discrete Cosine Transform (DCT)
The DCT is also of interest for a variety of applications. Appendix 2 Sections 1 to 6 show that the calculation requirements for the Discrete Cosine Transform can be reduced according to embodiments of the present invention.
An example of a method according to one embodiment of the present invention will now be presented for the DCT. This example will show the case where N=8. As mentioned earlier, the procedures described here can be programmed so that it is possible to mechanically develop the equations for any given N.
Appendix 2 Section 1 shows the terms that define an 8 by 8 DCT. Applying essentially the same procedure described in the sequence presented above and illustrated in
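The first tokenizing step for the DCT can be sketched in the same way as for the Fourier transform. The standard DCT-II kernel cos(π(2k+1)n/2N) is assumed here for illustration; the exact formulation in Appendix 2 may differ in normalization.

```python
import math

# For N = 8, collect the distinct magnitudes of the DCT-II kernel
# values: each distinct nonzero, non-unit magnitude becomes one
# first-level token, and equal magnitudes share a token.
N = 8
mags = set()
for n in range(N):
    for k in range(N):
        mags.add(round(abs(math.cos(math.pi * (2 * k + 1) * n / (2 * N))), 12))

tokens = sorted(m for m in mags if m not in (0.0, 1.0))
print(len(tokens))  # prints 7
```

Sharing tokens across equal magnitudes is what allows sample groups to be summed before each of the seven multiplier values is applied.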
Additional descriptions of embodiments of the present invention can be found in Appendices 3-11. An embodiment of the present invention for a sixteen sample one-dimensional discrete cosine transform is presented in Appendix 3. The equations according to an embodiment of the present invention are presented in Appendix 4. The equations in Appendix 4 are derived for an eight sample Fourier transform. Appendix 4 also shows example calculations using those equations. It is to be understood that the invention may be practiced in the form of executable steps for processing on an information processor such as a computer. Appendix 4 clearly illustrates an important advantage of the present invention. Specifically, only two multiplication steps are required in order to calculate the Fourier transform coefficients for the eight input values used in the calculations. Appendix 5 shows calculations similar to those shown in Appendix 4, except that a different starting function is assumed from which coefficients are derived.
Appendix 6 shows an embodiment of the present invention for calculating two-dimensional discrete cosine transforms for eight sets of eight samples. Appendix 7 also shows an embodiment of the present invention for the two-dimensional discrete cosine transform for 64 samples. However, the embodiment shown in Appendix 7 accomplishes the calculations as a single-step special case process. Appendix 8 shows an embodiment of the present invention for calculating discrete cosine transforms for eight samples.
Process Examples of Fourier Transform and Discrete Cosine Transform
A simulation program was developed to determine the amount of storage and the number of adds and multiplies that would be required to process the Fourier Transform (see Appendix 9, Process Example of SFT) and the Discrete Cosine Transform (see Appendix 10, Process Example of SDCT). Since there are eight samples and eight coefficients to be found, we can rule out any storage count below eight. The simulation started with storage elements G1, G2, . . . G8. Each of the first four samples was loaded into two locations according to a map of what combinations were needed. The process went as follows: S1: G4, G6; S2: G3, G7; S3: G2, G8; S4: G1, G5. All eight storage elements thus had information. The next three samples were then add-accumulated to the initial contents as follows: S5: G2, G8; S6: G3, G7; S7: G4, G6.
It is now optional to process or load the eighth sample. The process steps are done by putting G2−G4 in a new storage element, TEMP, putting G2+G4 in G4, and then putting TEMP in G2. It is then possible to put G6+G8 in TEMP, G6−G8 in G6, and then TEMP in G8. B2 is now in G6.
The next step is to multiply G8 by T2 (T2=0.7071067812 . . . ) and place G8*T2 in TEMP. Then place TEMP−G7 in G8 and TEMP+G7 in G7. B1 is now in G7 and B3 is in G8. We therefore see that it is possible to compute all of the Bn after the (N−1)th sample, as they are independent of the Nth sample.
The process has used 16 add-accumulate procedures and one multiply. A0 is now in G4, A2 is in G1 and A4 is in G3.
The remaining coefficients require that G2 be multiplied by T2. First place G2*T2 in TEMP. Now place G5−TEMP in G2 and G5+TEMP in G5. Now A1 is in G2 and A3 is in G5.
The coefficients are complete, and the entire process used a total of twenty add-accumulate procedures and two multiplications. The Butterfly method by comparison uses 24 multiply-add-accumulate procedures.
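The register-level process above can be reconstructed as a short program. This is a hedged reconstruction: the register names and the sequence of TEMP steps follow the narrative, but the sign choices in the add-accumulate steps and the k = 1..8 sample indexing (with A_n = Σ S_k·cos(2πnk/8) and B_n = Σ S_k·sin(2πnk/8)) are assumptions filled in where the text is silent. As described, only two multiplications occur, both by T2.

```python
import math

T2 = math.sqrt(2) / 2

def sft8(S):
    """Reconstruction of the register-level N = 8 process described
    above. Sign conventions in the accumulate steps are assumptions."""
    S1, S2, S3, S4, S5, S6, S7, S8 = S
    # Load/accumulate the first seven samples into paired registers.
    G1, G5 = S4, -S4            # eighth sample folded in below
    G2, G8 = S3 + S5, S3 - S5
    G3, G7 = S2 + S6, S2 - S6
    G4, G6 = S1 + S7, S1 - S7
    # TEMP shuffle steps from the narrative.
    TEMP = G2 - G4; G4 = G2 + G4; G2 = TEMP
    TEMP = G6 + G8; G6 = G6 - G8; G8 = TEMP
    B2 = G6
    # First multiply: B1 and B3, independent of the eighth sample.
    TEMP = G8 * T2
    B3 = TEMP - G7
    B1 = TEMP + G7
    # Fold in the eighth sample and finish the A terms with adds only.
    G1 = G1 + S8; G5 = G5 + S8
    TEMP = G1 + G3
    A2 = G1 - G3
    A4 = TEMP - G4
    A0 = TEMP + G4
    # Second multiply: A1 and A3.
    TEMP = G2 * T2
    A1 = G5 - TEMP
    A3 = G5 + TEMP
    return A0, A1, A2, A3, A4, B1, B2, B3
```

The reconstruction can be checked against the direct defining sums for any sample set; under the stated indexing it agrees to floating-point accuracy while performing exactly two multiplications.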
The total storage requirement is N+1 words. In a practical situation there is usually a requirement to be saving one sample set while processing the next. It follows that some amount of storage (up to N words) will be added to store the samples that would be received before the coefficients are transmitted.
Embodiments of the present invention are of value when the original program is dedicated to a specific purpose such as evaluating the actual values of functions at selected points of evaluation. Once this information is determined, the entire solution is formed with the real variables being unrestricted. The program is then structured for optimum behavior and made ready for the input values to be applied, with either no change or only controlled changes in the functional behavior. The process can improve the understandability of mathematical relationships and identify cases where variables change their effects.
In general, the process of generating coefficients from input values is parallel to the process of generating input values from coefficients. Therefore, the methods shown to do the former are equally methods to do the latter.
Appendices 1-11 are method steps, tables, examples, and control flow illustrations of methods and program products according to the invention. It will be understood that each step, flowchart and control flow illustrations, and combinations thereof can be implemented by computer program instructions. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine, such that the instructions which execute on the computer or other programmable apparatus create means for implementing the functions specified in the block diagram, flowchart or control flow block(s) or step(s). These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the block diagram, flowchart or control flow block(s) or step(s). The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the block diagram, flowchart or control flow block(s) or step(s).
Accordingly, blocks or steps of the block diagram, flowchart or control flow illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block or step of the block diagram, flowchart or control flow illustrations, and combinations of blocks or steps in the block diagram flowchart or control flow illustrations, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
Many modifications and other embodiments of the invention will come to mind to one skilled in the art to which this invention pertains, having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims
1. A method of generating a second set of equations requiring reduced numbers of computations from a first set of general equations, wherein each general equation defines a coefficient in terms of a set of samples and a plurality of functions having respective values dependent upon each sample, said method comprising the steps of:
- assigning a first set of tokens to the plurality of functions such that every value of the plurality of functions having a different magnitude is assigned a different token, thereby permitting each general equation to be defined by the set of samples and their associated tokens;
- evaluating each of the general equations as defined by the set of samples and associated tokens and grouping the samples having the same associated token together into separate groups;
- assigning a second set of tokens to represent a plurality of unique combinations of the samples; and
- generating the second set of equations based on at least the first and second sets of tokens.
2. A method according to claim 1 further comprising after said assigning a second set of tokens step the step of assigning an nth set of tokens to represent a plurality of unique combinations of the (n−1)th set of tokens, and wherein said generating step comprises generating the second set of equations based on at least the first through the nth sets of tokens.
3. A method according to claim 1 wherein the general equation defines a discrete Fourier transform, and wherein said generating step generates a second set of equations that define a discrete Fourier transform.
4. A method according to claim 1 wherein the general equation defines a discrete cosine transform, and wherein said generating step generates a second set of equations that define a discrete cosine transform.
5. A method according to claim 1 wherein the general equation defines a function selected from the group consisting of Fourier transform, two-dimensional Fourier transform, cosine transform, two-dimensional cosine transform, Bessel functions, Legendre Polynomials, Tschebysheff Polynomials of First and Second Kind, Jacoby Polynomials, Generalized Laguerre Polynomials, Hermite Polynomials, Bernoulli Polynomials, Euler Polynomials, Matrices used in Quantum Mechanics, Linear Algebra and wavelets, and wherein said generating step generates a second set of equations that define the function.
6. A method according to claim 1 wherein the method is developed using universal approximators.
7. A method according to claim 1 further comprising the step of using the second set of equations generated in said generating step to determine the coefficients based on a set of samples.
8. An apparatus for generating a second set of equations requiring reduced numbers of computations from a first set of general equations, wherein each general equation defines a coefficient in terms of a set of samples and a plurality of functions having respective values dependent upon each sample, said apparatus comprising a processor capable of performing the following functions:
- assigning a first set of tokens to the plurality of functions such that every value of the plurality of functions having a different magnitude is assigned a different token, thereby permitting each general equation to be defined by the set of samples and their associated tokens;
- evaluating each of the general equations as defined by the set of samples and associated tokens and grouping the samples having the same associated token together into separate groups;
- assigning a second set of tokens to represent a plurality of unique combinations of samples; and
- generating the second set of equations based on at least the first and second sets of tokens.
9. An apparatus according to claim 8 wherein said processor is further capable of after assigning a second set of tokens, assigning an nth set of tokens to represent a plurality of unique combinations of the (n−1)th set of tokens and generating the second set of equations based on at least the first through the nth sets of tokens.
10. An apparatus according to claim 8 wherein the general equation defines a discrete Fourier transform, and wherein said processor is capable of generating a second set of equations that define a discrete Fourier transform.
11. An apparatus according to claim 8 wherein the general equation defines a discrete cosine transform, and wherein said processor is capable of generating a second set of equations that define a discrete cosine transform.
12. An apparatus according to claim 8 wherein said processor is further capable of using the second set of equations generated in said generating step to determine the coefficients based on a set of samples.
13. A computer program product for generating a second set of equations requiring reduced numbers of computations from a first set of general equations, wherein each general equation defines a coefficient in terms of a set of samples and a plurality of functions having respective values dependent upon each sample, wherein the computer program product comprises:
- a computer readable storage medium having computer readable program code means embodied in said medium, said computer-readable program code means comprising:
- first computer instruction means for assigning a first set of tokens to the plurality of functions such that every value of the plurality of functions having a different magnitude is assigned a different token, thereby permitting each general equation to be defined by the set of samples and their associated tokens;
- second computer instruction means for evaluating each of the general equations as defined by the set of samples and associated tokens and grouping the samples having the same associated token together into separate groups;
- third computer instruction means for assigning a second set of tokens to represent a plurality of unique combinations of samples; and
- fourth computer instruction means for generating the second set of equations based on at least the first and second sets of tokens.
14. A computer program product according to claim 13 comprising after said third computer instruction means, fifth computer instruction means for assigning an nth set of tokens to represent a plurality of unique combinations of the (n−1)th set of tokens, and wherein said fourth computer instruction means generates the second set of equations based on at least the first through the nth sets of tokens.
15. A computer program product according to claim 13 wherein the general equation defines a discrete Fourier transform, and wherein said fourth computer instruction means generates a second set of equations that define a discrete Fourier transform.
16. A computer program product according to claim 13 wherein the general equation defines a discrete cosine transform, and wherein said fourth computer instruction means generates a second set of equations that define a discrete cosine transform.
17. A computer program product according to claim 13 further comprising fifth computer instruction means for using the second set of equations generated in said generating step to determine the coefficients based on a set of samples.
18-23. (canceled)
Type: Application
Filed: Oct 22, 2004
Publication Date: Jun 30, 2005
Inventors: Walter Pelton (Fremont, CA), Adrian Stoica (Pasadena, CA)
Application Number: 10/971,568