MINIMIZING SYSTEM RESOURCES USED TO DECOMPRESS READ-ONLY COMPRESSED ANALYTIC DATA IN A RELATIONAL DATABASE TABLE
A method, computer program product and system for minimizing system resources used to decompress read-only compressed analytic data in a relational database table. An i-code list associated with a relational database table is converted into a programming language. The programming language is compiled into object code and stored in a module in the user's system. The object code is called with a pointer designating the particular row in the database containing the compressed data to be decompressed. The compressed data designated by the pointer is decompressed upon execution of the object code. By having the source code for decompressing the compressed data stored as object code in the user's system, the interpretation step (as used in the i-code method) is avoided, thereby reducing the number of machine cycles used to decompress the compressed data. As a result, query programs will be able to access large amounts of data more quickly.
This application is related to the following commonly owned copending U.S. patent application:
Provisional Application Ser. No. 60/668,322, “Method for Creating and Executing Compiled Access Methods for Relational Data Tables and Indexes that Achieves High Performance Access Against Compressed Data and Indexes,” filed Apr. 4, 2005, and claims the benefit of its earlier filing date under 35 U.S.C. §119(e).
TECHNICAL FIELD
The present invention relates to the field of data compression and decompression for databases, and more particularly to minimizing system resources used to decompress read-only compressed analytic data in a relational database table.
BACKGROUND INFORMATION
A database may refer to a collection of related records that is created and managed by what is commonly referred to as a database management system. One type of database is a “relational database.” A relational database may refer to a database that maintains a set of separate, related files or tables, but combines data elements from the tables for queries and reports when required.
The present invention is directed to a relational database that stores a particular type of data, referred to herein as “analytic data.” Analytic data may refer to data that is analyzed. For example, stock transaction data may be analyzed for trends such as the age group of the individuals engaged in stock transactions. In another example, insurance data may be analyzed to determine whether it is profitable to maintain particular individuals as customers. In another example, data may be analyzed for fraud.
Often the analytic data stored in a relational database is “compressed” in order to maximize the amount of data stored in a given amount of disk space. Data compression may refer to the process of encoding information, through use of specific encoding schemes, using fewer bits than an unencoded representation (the original format of the data) would use. For example, an article could be encoded with fewer bits if we accept the convention that the word “compression” be encoded as “comp.” Once the analytic data is compressed, the compressed analytic data may be “read-only.” Read-only may refer to data that will not change after it is compressed. It is noted that “compressed data,” as used herein, refers to “compressed analytic data.”
When a user desires to access the compressed data in the relational database, the compressed data needs to be “decompressed” in order to reverse the effects of data compression. Decompression may refer to the act of reversing the effects of data compression which restores the data to its original form prior to being compressed. In this manner, the user is able to retrieve the requested data in its original form.
The present invention is directed to a decompression approach that does not decompress the entire rows of compressed data in a relational database table at a single time. Instead, the present invention is directed to a decompression approach that selectively decompresses column data in relational data tables, such as decompressing the compressed data row by row and then column by column within each row as each row is needed by the relational query processor.
One such method (commonly referred to as the “control block method”) used in such a decompression approach involves a decompression program reading information from a data structure, commonly referred to as a “control block,” associated with a particular table of the relational database. The control block may store algorithms and parameters used to identify the particular subroutines to call to decompress the data in the table. The decompression program may read column by column within a row. After reading a column, the decompression program may read the information in the control block to call the appropriate subroutine to decompress the data for that column. The same process is repeated for the other columns in the row. The control block method uses a small amount of code to decompress the compressed data. However, a drawback to using the control block method is that an excessive number of computer cycles is used to decompress the compressed data, resulting in poor performance for query programs that access large amounts of data.
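The control block method described above can be sketched in C as follows. This is a minimal illustration, not the method's actual implementation: the two encodings (`ENC_RAW32`, `ENC_BASE8`), the struct layouts and all function names are assumptions made for the example, since the text does not name specific compression algorithms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encodings; the source does not name specific algorithms. */
enum encoding { ENC_RAW32, ENC_BASE8 };

/* One control-block entry per column: the algorithm and its parameter. */
struct column_desc {
    enum encoding enc;
    int32_t base;               /* used only by ENC_BASE8 */
};

struct control_block {
    size_t ncols;
    const struct column_desc *cols;
};

/* Subroutine: 4-byte little-endian value stored verbatim. */
static const uint8_t *decode_raw32(const uint8_t *p, int32_t *out)
{
    uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    *out = (int32_t)v;
    return p + 4;
}

/* Subroutine: 1-byte offset added to a per-column base value. */
static const uint8_t *decode_base8(const uint8_t *p, int32_t base, int32_t *out)
{
    *out = base + p[0];
    return p + 1;
}

/* Walk the row column by column, consulting the control block for each
 * column and calling the matching subroutine. */
void decompress_row_cb(const struct control_block *cb,
                       const uint8_t *row, int32_t *out)
{
    assert(cb != NULL && row != NULL && out != NULL);
    for (size_t i = 0; i < cb->ncols; i++) {
        const struct column_desc *c = &cb->cols[i];
        switch (c->enc) {
        case ENC_RAW32:
            row = decode_raw32(row, &out[i]);
            break;
        case ENC_BASE8:
            row = decode_base8(row, c->base, &out[i]);
            break;
        }
    }
}
```

Every column costs a control-block lookup plus a subroutine call, which is the per-row overhead the later methods aim to remove.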
Another method (“i-code method”) for decompressing compressed data used in the decompression approach discussed above is to create a string of commands, stored in a list (“i-code list”), that performs the same function as the parameters in the control block. This string of commands is created for a particular table of the relational database and is used by the decompression program to uncompress the data for each row of the table, row by row and column by column within the row. The string of commands may be referred to herein as “i-code” or “p-code,” which is pseudo-code, i.e., not machine-executable code. The i-code may be built by an “i-code builder” and executed by an “i-code interpreter.” The i-code interpreter interprets each command, one at a time, and then executes the code in-line, thereby avoiding the subroutine calls. In comparison to the control block method, many of the machine cycles used to uncompress data are eliminated. The i-code method does, however, require more programming effort, as an i-code builder and an i-code interpreter have to be built. Further, while the i-code method does reduce the number of machine cycles used to decompress compressed data, there is still room for improvement. By further reducing the number of machine cycles used to decompress compressed data, query programs will be able to access large amounts of data more quickly.
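A minimal C sketch of an i-code interpreter along the lines described above follows. The opcode set (`OP_RAW32`, `OP_BASE8`, `OP_END`), the `struct icode` layout and the function name `icode_run` are hypothetical; the text does not define an i-code instruction set. The point of the sketch is that each command is decoded and then executed in-line inside the loop, with no per-column subroutine call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the source does not define an instruction set. */
enum opcode { OP_RAW32, OP_BASE8, OP_END };

/* One i-code command; an "i-code builder" would emit a list of these
 * for a table, ending with OP_END. */
struct icode {
    enum opcode op;
    int32_t operand;            /* base value for OP_BASE8, else unused */
};

/* Interpret the i-code list against one compressed row.
 * Returns the number of column values written to out. */
size_t icode_run(const struct icode *prog, const uint8_t *row, int32_t *out)
{
    size_t n = 0;

    assert(prog != NULL && row != NULL && out != NULL);
    for (;; prog++) {
        switch (prog->op) {
        case OP_RAW32:          /* 4-byte little-endian value, decoded in-line */
            out[n++] = (int32_t)((uint32_t)row[0] | (uint32_t)row[1] << 8 |
                                 (uint32_t)row[2] << 16 | (uint32_t)row[3] << 24);
            row += 4;
            break;
        case OP_BASE8:          /* 1-byte offset from a base, decoded in-line */
            out[n++] = prog->operand + row[0];
            row += 1;
            break;
        case OP_END:
        default:                /* end of list (or unknown command): stop */
            return n;
        }
    }
}
```

The subroutine calls are gone, but the loop still pays for fetching and dispatching each command on every row; that remaining cost is the interpretation step.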
Therefore, there is a need in the art for further reducing the system resources (e.g., machine cycles) used to decompress read-only analytic data in a relational database table.
SUMMARY
The problems outlined above may at least in part be solved in some embodiments by converting the i-code list associated with a relational database table into a programming language and then compiling the programming language into object code. The object code may then be stored in the user's system and executed to decompress the compressed rows of data stored in the relational database table. By having the source code for decompressing the compressed data in the relational database table stored as object code in the user's system, the interpretation step (as used in the i-code method) is avoided, thereby reducing the number of machine cycles used to decompress the compressed data. As a result, query programs will be able to access large amounts of data more quickly.
In one embodiment of the present invention, a method for minimizing system resources used to decompress read-only compressed analytic data in a relational database table may comprise the step of generating an i-code list from a control block, where the control block stores information regarding compression algorithms used to compress data in the relational database table, where the control block further stores parameters used to decompress compressed data in the relational database table. The method may further comprise converting the i-code list associated with the relational database table into a programming language. The method may further comprise compiling the programming language into object code. The method may further comprise storing the object code in a module stored in an executable library in a user's system enabling the object code to be reused for all subsequent query accesses.
In another embodiment of the present invention, a method for minimizing system resources used to decompress read-only compressed analytic data in a relational database table may comprise the step of compiling a programming code for the relational database table on a load machine if there is a compiler installed on the load machine that is compatible with a compiler installed on a user's machine, where an i-code list associated with the relational database table is converted into the programming code. The method may further comprise compiling the programming code for the relational database table on the user's machine if there is not a compiler installed on the load machine that is compatible with the compiler installed on the user's machine and if the compiler installed on the user's machine is compatible with the programming code. The method may further comprise installing on the user's machine a first package, where the first package comprises compressed data for the relational database table. The first package may further comprise a control block for the relational database table, where the control block stores information regarding compression algorithms used to compress data in the relational database table, where the control block further stores algorithms and parameters used to decompress compressed data in the relational database table. The first package may further comprise the i-code list associated with the relational database table; the programming code; and the compiled programming code.
In another embodiment of the present invention, a method for minimizing system resources used to decompress read-only compressed analytic data in a relational database table may comprise the step of executing an i-code interpreter to execute an i-code list associated with the relational database table, where the i-code list is generated from a control block, where the control block stores information regarding compression algorithms used to compress data in the relational database table, where the control block further stores parameters used to decompress compressed data in the relational database table, where the i-code interpreter is executed to execute the i-code list if there is not a compiled programming code installed on a user's machine, where the compiled programming code is converted from the i-code list. The method may further comprise calling an operating system to load the compiled programming code if the compiled programming code is installed on the user's machine.
The foregoing has outlined rather generally the features and technical advantages of one or more embodiments of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which may form the subject of the claims of the invention.
A better understanding of the present invention can be obtained when the following detailed description is considered in conjunction with the following drawings, in which:
The present invention comprises a method, computer program product and system for minimizing system resources used to decompress read-only compressed analytic data in a relational database table. In one embodiment of the present invention, an i-code list associated with a relational database table is converted into a programming language. During an install procedure, the programming language is compiled into object code and stored in a module in the user's system. The object code may be called by all subsequent query processing with a pointer designating the particular row in the database containing the compressed data to be decompressed. The compressed data designated by the pointer may then be decompressed upon execution of the object code. By having the source code for decompressing the compressed data in the relational database table stored as object code in the user's system, the interpretation step (as used in the i-code method) is avoided, thereby reducing the number of machine cycles used to decompress the compressed data. As a result, query programs will be able to access large amounts of data more quickly.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without such specific details. In other instances, well-known circuits have been shown in block diagram form in order not to obscure the present invention in unnecessary detail. For the most part, details concerning timing considerations and the like have been omitted inasmuch as such details are not necessary to obtain a complete understanding of the present invention and are within the skills of persons of ordinary skill in the relevant art.
FIG. 1—Computer System
Referring to
Referring to
Referring to
Implementations of the invention include implementations as a computer system programmed to execute the method or methods described herein, and as a computer program product. According to the computer system implementations, sets of instructions for executing the method or methods may be resident in the random access memory 114 of one or more computer systems configured generally as described above. Until required by computer system 100, the set of instructions may be stored as a computer program product in another computer memory, for example, in disk unit 120. Furthermore, the computer program product may also be stored at another computer and transmitted when desired to the user's workstation by a network or by an external network such as the Internet. One skilled in the art would appreciate that the physical storage of the sets of instructions physically changes the medium upon which it is stored so that the medium carries computer readable information. The change may be electrical, magnetic, chemical or some other physical change.
As stated in the Background Information section, methods used in selectively decompressing column data in relational data tables, such as decompressing the compressed data row by row and then column by column within each row, use an excessive number of computer cycles to decompress the compressed data resulting in poor performance for query programs that access large amounts of data. By reducing the number of machine cycles used to decompress compressed data, query programs will be able to access large amounts of data more quickly. Therefore, there is a need in the art for further reducing the system resources (e.g., machine cycles) used to decompress read-only analytic data in a relational database table.
Decompressing read-only analytic data in a relational database table using minimal system resources may be accomplished using the methods/processes discussed below in association with
Referring to
In step 202, application 150 (running on user's machine 100 in one embodiment or running on load machine 100 in another embodiment) converts the i-code list associated with a relational database table into a programming language, e.g., C programming language. In step 203, application 150 compiles the programming language into object code, i.e., machine code.
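The conversion in step 202 can be illustrated with a small code generator. The sketch below is an assumption-laden example, not the application's actual converter: it reuses a hypothetical two-opcode i-code (`OP_RAW32`, `OP_BASE8`, `OP_END`) and emits C source text for a row-decoding function whose name is supplied by the caller. The helper macro `EMIT` and the function `icode_to_c` are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical i-code; the source does not define an instruction set. */
enum opcode { OP_RAW32, OP_BASE8, OP_END };
struct icode { enum opcode op; int32_t operand; };

/* Append formatted text to buf; make the enclosing function return -1
 * if the text would not fit in the remaining space. */
#define EMIT(...)                                              \
    do {                                                       \
        int n_ = snprintf(buf + len, cap - len, __VA_ARGS__);  \
        if (n_ < 0 || (size_t)n_ >= cap - len)                 \
            return -1;                                         \
        len += (size_t)n_;                                     \
    } while (0)

/* Convert an i-code list into C source for a function
 *   void <fn_name>(const uint8_t *row, int32_t *out);
 * Returns 0 on success, -1 on overflow or an unknown opcode. */
int icode_to_c(const struct icode *prog, const char *fn_name,
               char *buf, size_t cap)
{
    size_t len = 0;   /* characters written to buf so far */
    size_t off = 0;   /* byte offset of the current column within the row */
    size_t col = 0;   /* index of the current column */

    assert(prog != NULL && fn_name != NULL && buf != NULL);
    assert(strlen(fn_name) > 0);

    EMIT("#include <stdint.h>\n");
    EMIT("void %s(const uint8_t *row, int32_t *out)\n{\n", fn_name);
    for (; prog->op != OP_END; prog++, col++) {
        switch (prog->op) {
        case OP_RAW32:
            EMIT("    out[%zu] = (int32_t)((uint32_t)row[%zu] | "
                 "(uint32_t)row[%zu] << 8 | (uint32_t)row[%zu] << 16 | "
                 "(uint32_t)row[%zu] << 24);\n",
                 col, off, off + 1, off + 2, off + 3);
            off += 4;
            break;
        case OP_BASE8:
            EMIT("    out[%zu] = %ld + row[%zu];\n",
                 col, (long)prog->operand, off);
            off += 1;
            break;
        default:
            return -1;          /* unknown opcode */
        }
    }
    EMIT("}\n");
    return 0;
}
```

Because the generator knows each column's position and parameters when it runs, the emitted function contains only constant offsets and constants; the text produced this way is what step 203 would hand to a C compiler.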
In step 204, application 150 stores the object code in a module, which is stored in an executable library in user's system 100, thereby enabling the object code to be reused for all subsequent query accesses. In one embodiment, the module resides in RAM 114.
In one embodiment, application 150 resides in user's machine 100 and executes all of the procedures of method 200. In another embodiment, application 150 resides in a separate machine, referred to herein as the “load machine” 100, and executes only the i-code generation, c-code generation and c-code compile steps (steps 201-204).
In step 205, uncompression program 150 of user's machine 100 calls the module which stores the object code with a pointer to the compressed row of data. That is, uncompression program 150 of user's machine 100 calls the module that stores the object code, where the call includes a pointer to the compressed row of data to be decompressed.
In step 206, uncompression program 150 of user's machine 100 decompresses the compressed row of data (compressed row of data that was identified by the pointer) using the stored object code.
In this manner, uncompression program 150 simply issues calls to the module as uncompression program 150 decompresses each row in the relational database table, column by column, without interpreting commands as in the i-code method. By having the source code for decompressing the compressed data in the relational database table stored as object code in the user's machine, the interpretation step (as used in the i-code method) is avoided, thereby reducing the number of machine cycles used to decompress the compressed data. As a result, query programs will be able to access large amounts of data more quickly.
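Steps 205 and 206 can be pictured with the following C sketch. It assumes a hypothetical two-column table whose compressed rows are 5 bytes long; `decompress_row_t1` stands in for the compiled, table-specific code, and `decompress_rows_t1` for the calling side. None of these names, nor the row layout, come from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed layout of one compressed row of table "t1". */
enum { T1_ROW_BYTES = 5, T1_COLS = 2 };

/* Stand-in for the compiled, table-specific code: straight-line
 * statements with constant offsets, no loop over commands and no
 * dispatch on an opcode or a control-block entry. */
void decompress_row_t1(const uint8_t *row, int32_t *out)
{
    out[0] = (int32_t)((uint32_t)row[0] | (uint32_t)row[1] << 8 |
                       (uint32_t)row[2] << 16 | (uint32_t)row[3] << 24);
    out[1] = 1000 + row[4];
}

/* Caller side: one call per needed row, passing a pointer to that
 * compressed row. Writes T1_COLS values per row into out. */
void decompress_rows_t1(const uint8_t *table, size_t first, size_t count,
                        int32_t *out)
{
    assert(table != NULL && out != NULL);
    for (size_t r = 0; r < count; r++)
        decompress_row_t1(table + (first + r) * T1_ROW_BYTES,
                          out + r * T1_COLS);
}
```

Only the rows a query actually needs are touched, and each is decompressed by a single call that carries the row pointer.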
It is noted that method 200 may include other and/or additional steps that, for clarity, are not depicted. It is further noted that method 200 may be executed in a different order than presented and that the order presented in the discussion of
The amount of system resources used in decompressing read-only analytic data in a relational database table may also be reduced by installing a package on user's machine 100, where the package includes the necessary resources to decompress the compressed data in the relational database table, as discussed below in association with
FIG. 3—Method for Installing a Package on a User's Machine which Includes the Necessary Resources to Decompress the Compressed Data in a Relational Database Table
Referring to
If there is a compiler in load machine 100 that is compatible with the compiler in user's machine 100, then, in step 302, application 150 compiles the programming code for the relational database table on load machine 100. That is, if there is a compiler in load machine 100 that is compatible with the compiler in user's machine 100, then application 150 compiles the programming code which is converted from an i-code list as discussed above in step 202 of
In step 303, application 150 moves the install package to user's machine 100.
In step 304, application 150 installs the package in files on user's machine 100 where the package includes the following: compressed data for the relational database table; control block for the relational database table; i-code list for the relational database table; programming code for the relational database table; and the compiled programming code for the relational database table.
In step 305, application 150 installs the compiled code in an executable program module (e.g., dynamic load library such as in Windows™ operating system). By installing such a package on user's machine 100, uncompression program 150 simply issues calls to the compiled programming code (as discussed above in step 205 of method 200) as uncompression program 150 decompresses each row in the relational database table, column by column, and then proceeds to the next row. By having the source code for decompressing the compressed data in the relational database table stored as object code in user's machine 100, the interpretation step (as used in the i-code method) is avoided, thereby reducing the number of machine cycles used to decompress the compressed data. As a result, query programs will be able to access large amounts of data more quickly.
If, however, there is not a compiler in load machine 100 that is compatible with the compiler in user's machine 100, then, in step 306, application 150 moves the install package to user's machine 100.
In step 307, application 150 determines if there is a compiler installed on user's machine 100 that is compatible with the programming code. For example, if the programming code is written in C, then, application 150 may determine if there is a C compiler installed on user's machine 100.
If there is a compiler in user's machine 100 that is compatible with the programming code, then, in step 308, application 150 compiles the programming code for the relational database table on user's machine 100. That is, if there is a compiler in user's machine 100 that is compatible with the programming code, then application 150 compiles the programming code which is converted from an i-code list as discussed above in step 202 of
In step 309, application 150 installs the package in files on user's machine 100 where the package includes the following: compressed data for the relational database table; control block for the relational database table; i-code list for the relational database table; programming code for the relational database table; and the compiled programming code for the relational database table.
In step 310, application 150 installs the compiled code in an executable program module (e.g., dynamic load library such as in Windows™ operating system). By installing such a package on user's machine 100, uncompression program 150 simply issues calls to the compiled programming code (as discussed above in step 205 of method 200) as uncompression program 150 decompresses each row in the relational database table, column by column, and then proceeds to the next row. By having the source code for decompressing the compressed data in the relational database table stored as object code in user's machine 100, the number of machine cycles used to decompress the compressed data is further reduced in relation to the methods discussed in the Background Information section. As a result, query programs will be able to access large amounts of data more quickly.
If, however, there is not a compiler in user's machine 100 that is compatible with the programming code, then, in step 311, application 150 installs a package in files (e.g., stored on disk unit 120) on user's machine 100 where the package includes the following: compressed data for the relational database table; control block for the relational database table; and the i-code list for the relational database table. In this manner, an i-code interpreter stored on user's machine 100 (such as on disk unit 120) can decompress the compressed data in the relational database table by executing the i-code list (string of commands) as the uncompression program decompresses the compressed data for each row of the relational database table, row by row and column by column within the row.
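The three outcomes of method 300 can be summarized as a small decision function. The C sketch below is only an illustration of the branching described above; the type and function names (`compile_site`, `install_plan`, `plan_install`) are hypothetical, and the two boolean inputs stand for the compatibility checks made on the load machine and the user's machine.

```c
#include <assert.h>
#include <stdbool.h>

/* Where, if anywhere, the programming code gets compiled. */
enum compile_site {
    COMPILE_ON_LOAD_MACHINE,    /* steps 302-305 */
    COMPILE_ON_USER_MACHINE,    /* steps 306-310 */
    NO_COMPILE                  /* step 311 */
};

struct install_plan {
    enum compile_site site;
    /* True when the installed package also holds the programming code
     * and the compiled programming code; false when it holds only the
     * compressed data, the control block and the i-code list. */
    bool includes_compiled_code;
};

struct install_plan plan_install(bool load_has_compatible_compiler,
                                 bool user_has_compatible_compiler)
{
    struct install_plan p;

    if (load_has_compatible_compiler) {
        p.site = COMPILE_ON_LOAD_MACHINE;
        p.includes_compiled_code = true;
    } else if (user_has_compatible_compiler) {
        p.site = COMPILE_ON_USER_MACHINE;
        p.includes_compiled_code = true;
    } else {
        p.site = NO_COMPILE;    /* i-code interpreter is used at query time */
        p.includes_compiled_code = false;
    }
    return p;
}
```

The load machine is preferred whenever it can produce code compatible with the user's machine; compiling on the user's machine is the second choice, and the interpreter-only package is the last resort.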
It is noted that method 300 may include other and/or additional steps that, for clarity, are not depicted. It is further noted that method 300 may be executed in a different order than presented and that the order presented in the discussion of
The uncompression program may perform the decompression of the compressed data in the relational database table in connection with the installed package (discussed above in connection with
Referring to
If there does not exist an installed compiled programming code in user's machine 100 that is associated with a relational database table, then, in step 402, uncompression program 150 executes the i-code interpreter component to execute the i-code list. As discussed in method 300, the package installed on user's machine 100 may not include compiled programming code but instead include an i-code list as indicated in step 311. The package may not include compiled programming code if there is not a compiler installed on user's machine 100 that is compatible with the programming code. As a result, uncompression program 150 may execute the i-code list by executing the i-code interpreter which may be part of uncompression program 150. In one embodiment, the i-code interpreter may reside in disk unit 120.
If, however, there does exist an installed compiled programming code in user's machine 100 that is associated with a relational database table, then, in step 403, uncompression program 150 determines if there is an override request included in the query request. An override request may refer to a request to not execute the compiled programming code but instead execute either the i-code list or implement the control block method, as discussed above, such as for testing purposes. That is, an override request may refer to a request to either execute the i-code list or implement the control block method, as discussed above, as a fallback or diagnostic mode of processing.
If there is not an override request, then, in step 404, uncompression program 150 calls the compiled programming code for each row in the relational database table that needs to be uncompressed, row by row. In this manner, as explained above, the compressed data in the relational database table becomes decompressed as needed.
If, however, there is an override request, then, in step 405, uncompression program 150 executes either the i-code method or the control block method discussed above in the Background Information section. The package installed on user's machine 100 includes both the control block for the relational database table as well as the i-code list for the relational database table, as discussed above in connection with steps 304, 309 and 311 in method 300.
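The selection logic of method 400 (steps 402 through 405) can be sketched as a single function. This is a hedged illustration: the enums and the function name `choose_path` are hypothetical, and how an override request is actually carried in the query request is not specified in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Which decompression path is used for a query. */
enum access_path { PATH_COMPILED, PATH_ICODE, PATH_CONTROL_BLOCK };

/* Hypothetical representation of the override request in the query. */
enum override_req { OVERRIDE_NONE, OVERRIDE_ICODE, OVERRIDE_CONTROL_BLOCK };

enum access_path choose_path(bool compiled_code_installed,
                             enum override_req ov)
{
    /* Step 402: without compiled code, run the i-code interpreter. */
    if (!compiled_code_installed)
        return PATH_ICODE;

    /* Steps 403 and 405: an override selects a fallback or diagnostic
     * path even though compiled code is available. */
    if (ov == OVERRIDE_ICODE)
        return PATH_ICODE;
    if (ov == OVERRIDE_CONTROL_BLOCK)
        return PATH_CONTROL_BLOCK;

    /* Step 404: call the compiled code for each row that is needed. */
    return PATH_COMPILED;
}
```

Following the flow as described, the override is consulted only when compiled code is installed; without compiled code, the sketch always returns the i-code path.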
It is noted that method 400 may include other and/or additional steps that, for clarity, are not depicted. It is further noted that method 400 may be executed in a different order than presented and that the order presented in the discussion of
Although the method, computer program product and system are described in connection with several embodiments, it is not intended to be limited to the specific forms set forth herein, but on the contrary, it is intended to cover such alternatives, modifications and equivalents, as can be reasonably included within the spirit and scope of the invention as defined by the appended claims. It is noted that the headings are used only for organizational purposes and not meant to limit the scope of the description or claims.
Claims
1-24. (canceled)
25. A method for minimizing system resources used to decompress read-only compressed analytic data comprising the steps of:
- executing an i-code interpreter to execute an i-code list associated with said relational database table, wherein said i-code list is generated from a control block, wherein said control block stores information regarding compression algorithms used to compress data in said relational database table, wherein said control block further stores parameters used to decompress compressed data in said relational database table, wherein said i-code interpreter is executed to execute said i-code list if there is not a compiled programming code installed on a user's machine, wherein said compiled programming code is converted from said i-code list; and
- calling said compiled programming code for each row in said relational database table that needs to be uncompressed.
26. The method as recited in claim 25 further comprising the step of:
- executing one of said i-code list and said control block if a user of said user's machine requests to execute one of said i-code list and said control block.
27. A computer program product embodied in a computer readable medium for minimizing system resources used to decompress read-only compressed analytic data comprising the programming steps of:
- executing an i-code interpreter to execute an i-code list associated with said relational database table, wherein said i-code list is generated from a control block, wherein said control block stores information regarding compression algorithms used to compress data in said relational database table, wherein said control block further stores parameters used to decompress compressed data in said relational database table, wherein said i-code interpreter is executed to execute said i-code list if there is not a compiled programming code installed on a user's machine, wherein said compiled programming code is converted from said i-code list; and
- calling said compiled programming code for each row in said relational database table that needs to be uncompressed.
28. The computer program product as recited in claim 27 further comprising the programming step of:
- executing one of said i-code list and said control block if a user of said user's machine requests to execute one of said i-code list and said control block.
29. A system, comprising:
- a memory unit for minimizing system resources used to decompress read-only compressed analytic data; and
- a processor coupled to said memory unit, wherein said processor, responsive to said computer program, comprises: circuitry for executing an i-code interpreter to execute an i-code list associated with said relational database table, wherein said i-code list is generated from a control block, wherein said control block stores information regarding compression algorithms used to compress data in said relational database table, wherein said control block further stores parameters used to decompress compressed data in said relational database table, wherein said i-code interpreter is executed to execute said i-code list if there is not a compiled programming code installed on said system, wherein said compiled programming code is converted from said i-code list; and circuitry for calling said compiled programming code for each row in said relational database table that needs to be uncompressed.
30. The system as recited in claim 29, wherein said processor further comprises:
- circuitry for executing one of said i-code list and said control block if a user of said system requests to execute one of said i-code list and said control block.
Type: Application
Filed: Sep 3, 2010
Publication Date: Dec 30, 2010
Applicant: QD TECHNOLOGY, LLC (Clifton, NJ)
Inventor: Jack Edward Olson (Austin, TX)
Application Number: 12/875,376
International Classification: G06F 17/30 (20060101);