Systems And Methods For Performing A Query On A Distributed Database
Systems and methods are provided for performing a query in a distributed system. In one example, a query processor receives an instruction to perform a database operation involving a query. Based on an identification of a correlated subquery within the query, the query processor modifies the correlated subquery by replacing at least one correlated variable with a parameter or updatable constant. The modified subquery is sent to an external database for execution, where the external database is identified in the correlated subquery. The results of the modified subquery are received at the query processor from the external database and are used to execute the query. The correlated subquery includes a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in different external databases.
This application is a continuation-in-part of U.S. patent application Ser. No. 13/168,424, filed on Jun. 24, 2011, and titled “Systems and Methods for Performing Index Joins Using Auto Generative Queries, the entirety of which is incorporated herein by reference.
FIELDThe technology described in this patent document relates generally to computer-implemented database systems.
BACKGROUNDIn the field of query processing, tables can often be stored on multiple databases, as illustrated in
In accordance with the teachings described herein, systems and methods are provided for performing a query in a distributed system. In one example, a query processor receives an instruction to perform a database operation involving a query. Based on an identification of a correlated subquery within the query, the query processor modifies the correlated subquery by replacing at least one correlated variable with a parameter. The modified subquery is sent by the query processor to an external database for execution, where the external database is identified in the correlated subquery. The results of the modified subquery are received at the query processor from the external database, and the query processor uses the results to execute the query. The correlated subquery includes a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in different external databases.
In another example, a system or method for performing a query in a distributed database system may execute the following steps: (a) receiving, at a query processor, an instruction to perform a database operation involving an outer query and a subquery, the subquery including a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in separate external databases; (b) automatically modifying the subquery to generate a modified subquery by replacing the column in the first set of data with a parameter or updatable constant; (c) substituting a value from a row of the first set of data for the parameter or updatable constant; (d) causing the modified subquery to be executed by the external database to identify one or more rows from the second set of data that satisfy the modified subquery; (e) receiving, at the query processor, results of the modified subquery from the external database; and (f) repeating steps (c), (d) and (e) for each row in the first set of data
In operation, the query processor 102 utilizes parameters in order to push portions of a subquery operation to the database 106 by substituting, in place of a correlated column of the subquery, a parameter that can be updated for each row of the parent query. After the row is updated with the value from the column of the parent query, the query processor then re-executes the query against the database 106. Specifically, in the illustrated embodiment, the query processor receives an instruction 114 to perform a database operation involving an outer query between a column of data in Table A and a column of data in Table B. The received instruction 114 further includes a subquery that defines a conditional relationship between the column of data in Table A and a column of data in Table C. The query processor 102 then automatically modifies the subquery by replacing the conditional expression with a parameterized clause, and then submits the modified subquery 116 to the database 106. For instance, the query processor 102 may modify the subquery by replacing the correlated column of data in Table A with the parameter “?”.
If the database 106 accepts the modified subquery 116, then the query processor 102 causes a value from the correlated column of data in Table A 108 to be substituted for the parameter. For example, the value may be copied into a memory space allocated at the query processor 102, and the query processor 102 may provide the memory address to the database 106 so that the value may be retrieved by the database 106 and substituted for the parameter in the modified subquery 116. The subquery with the substituted value 118 is then executed against Table C and the results are returned to the query processor 102 for use in performing the outer query and generating the query results 120.
To help illustrate the operation of the system 100 shown in
-
- select * from C where ?>C.Z
The where clause in the above subquery defines a conditional relationship (>) between column C.Z and the parameter “?”. If the parameterized subquery is accepted by the database 106, then the query processor 102 can cause values from each row for the column A.X from Table A 108 to be substituted for the parameter “?”, and cause the query with each substituted value to be executed on Table C. An example of this substitution process is described below with reference toFIGS. 4-6 .
- select * from C where ?>C.Z
To help illustrate the operation of the system 200 of
-
- select * from C where <value>>C.Z
The where clause in the above subquery defines a conditional relationship (>) between column C.Z and the updatable constant <value>, where <value> points within the internal query form to an allocated memory location. The query processor 208 can then substitute values from each row of column A.X of Table A 210 for the updatable constant, and cause the subquery to be executed on Table C, only returning rows that satisfy the where clause. The results of the subquery 202 may then be used to perform the outer query and generate the query results 212.
- select * from C where <value>>C.Z
As shown in
It should be understood that in the case where the query includes an “exists” condition, as in the example query discussed herein, the query processor will only check with the database to determine if matching rows exist in C.Z. For instance, in the example shown in
-
- select B.Y from B where B.Y=?
The where clause in the above subquery defines a conditional relationship (=) between column B.Y and the parameter “?”. If the parameterized subquery is accepted by the database 408, then the query processor 402 causes values from each row from column A.Z from Table A 410 to be substituted for the parameter “?” in the processor 402, and causes the query with each substituted value to be executed on Table B.
- select B.Y from B where B.Y=?
As shown in
It should be understood that the example illustrated in
At 606, a row is fetched from the correlated column in the first set of data (e.g., A.X.), and the fetched value is substituted for the parameter or updatable constant at 608. The modified subquery is then executed by the external database at 610 to identify one or more rows from the second set of data that satisfy the modified subquery. The result(s) of the modified subquery are returned to the query processor at 612. The subquery results are used to execute the outer query at 614. Steps 606-614 are then repeated for each row in the first set of data.
At 708, the query is processed to identify one or more correlated subqueries. If no correlated subqueries are identified, the method proceeds to 706 and the query is processed in a typical fashion (e.g., without using parameters.) Otherwise, if one or more correlated subqueries are identified, then the method proceeds to 709. At 709, the method determines if the database(s) on which the query is to be executed supports parameters. If so, then the method proceeds to 710. If parameters are not supported, then the method proceeds to 711.
At 710, a parameter list is established to provide a parameter for each correlated variable that is identified within the one or more correlated subqueries, and a modified subquery is generated by replacing the one or more correlated variables in the received query with the parameters from the parameter list. For instance, in the example described above with reference to
At 711, an empty where clause is created to support an updateable constant value, and a modified subquery is generated by replacing the one or more correlated variables in the received query with the updatable constant. For instance, in the example described above with reference to
At 712, an attempt is made to push the modified subquery to the external database for processing. If the push fails, then the parameter list at is cleared at 714 (if the database supports parameters) and the method proceeds to 706 and the query is processed in a typical fashion. If the push succeeds, however, then the one or more correlated subqueries are executed on the external database at 716, for example using the method of
A disk controller 860 interfaces one or more optional disk drives to the system bus 852. These disk drives may be external or internal floppy disk drives such as 862, external or internal CD-ROM, CD-R, CD-RW or DVD drives such as 864, or external or internal hard drives 866. As indicated previously, these various disk drives and disk controllers are optional devices.
Each of the element managers, real-time data buffer, conveyors, file input processor, database index shared access memory loader, reference data buffer and data managers may include a software application stored in one or more of the disk drives connected to the disk controller 860, the ROM 856 and/or the RAM 858. Preferably, the processor 854 may access each component as required.
A display interface 868 may permit information from the bus 852 to be displayed on a display 870 in audio, graphic, or alphanumeric format. Communication with external devices may optionally occur using various communication ports 872.
In addition to the standard computer-type components, the hardware may also include data input devices, such as a keyboard 873, or other input device 874, such as a microphone, remote control, pointer, mouse and/or joystick.
This written description uses examples to disclose the invention, including the best mode, and also to enable a person skilled in the art to make and use the invention. The patentable scope of the invention may include other examples. Additionally, the methods and systems described herein may be implemented on many different types of processing devices by program code comprising program instructions that are executable by the device processing subsystem. The software program instructions may include source code, object code, machine code, or any other stored data that is operable to cause a processing system to perform the methods and operations described herein. Other implementations may also be used, however, such as firmware or even appropriately designed hardware configured to carry out the methods and systems described herein.
The systems' and methods' data (e.g., associations, mappings, data input, data output, intermediate data results, final data results, etc.) may be stored and implemented in one or more different types of computer-implemented data stores, such as different types of storage devices and programming constructs (e.g., RAM, ROM, Flash memory, flat files, databases, programming data structures, programming variables, IF-THEN (or similar type) statement constructs, etc.). It is noted that data structures describe formats for use in organizing and storing data in databases, programs, memory, or other computer-readable media for use by a computer program.
The computer components, software modules, functions, data stores and data structures described herein may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. It is also noted that a module or processor includes but is not limited to a unit of code that performs a software operation, and can be implemented for example as a subroutine unit of code, or as a software function unit of code, or as an object (as in an object-oriented paradigm), or as an applet, or in a computer script language, or as another type of computer code. The software components and/or functionality may be located on a single computer or distributed across multiple computers depending upon the situation at hand.
It should be understood that as used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise. Finally, as used in the description herein and throughout the claims that follow, the meanings of “and” and “or” include both the conjunctive and disjunctive and may be used interchangeably unless the context expressly dictates otherwise; the phrase “exclusive or” may be used to indicate situation where only the disjunctive meaning may apply.
Claims
1. A method for performing a query in a distributed database system, comprising:
- (a) receiving, at a query processor, an instruction to perform a database operation involving an outer query and a subquery, the subquery including a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in separate external databases;
- (b) automatically modifying the subquery to generate a modified subquery by replacing the column in the first set of data with a parameter or updatable constant;
- (c) substituting a value from a row of the first set of data for the parameter or updatable constant;
- (d) causing the modified subquery to be executed by the external database to identify one or more rows from the second set of data that satisfy the modified subquery;
- (e) receiving, at the query processor, results of the modified subquery from the external database; and
- (f) repeating steps (c), (d) and (e) for each row in the first set of data.
2. The method of claim 1, further comprising:
- performing the database operation at the query processor using the results of the modified subquery.
3. The method of claim 1, wherein the subquery includes a where expression that includes the conditional relationship between the column in the first set of data and the column in the second set of data.
4. The method of claim 1, wherein the outer query includes a join operation between columns in the first set of data and a third set of data.
5. The method of claim 1, wherein:
- the subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and
- the modified subquery comprises, where ?<RO>B.Y, wherein ? is the parameter.
6. The method of claim 1, wherein:
- the subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and
- the modified subquery comprises, where <value><RO>B.Y, wherein <value> is the updatable constant.
7. A method for performing a query in a distributed database system, comprising:
- receiving, at a query processor, an instruction to perform a database operation involving a query;
- based on an identification of a correlated subquery within the query, modifying the correlated subquery to generate a modified subquery by replacing at least one correlated variable with a parameter or updatable constant;
- sending the modified subquery to an external database for execution, the external database being identified in the correlated subquery;
- receiving, at the query processor, results of the modified subquery from the external database; and
- executing the query, at the query processor, using the results received from the external database.
8. The method of claim 7, wherein the correlated subquery includes a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in different external databases.
9. The method of claim 8, wherein:
- the correlated subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and
- the modified subquery comprises, where ?<RO>B.Y, wherein ? is the parameter.
10. A system for performing a query in a distributed database system, comprising:
- a processor;
- a memory;
- a database management application stored in the memory and executable by the processor, when executed, the database management application being configured to: receive an instruction to perform a database operation involving a query; based on an identification of a correlated subquery within the query, modify the correlated subquery to generate a modified subquery by replacing at least one correlated variable with a parameter; send the modified subquery to an external database for execution, the external database being identified in the correlated subquery; receive results of the modified subquery from the external database; and execute the query using the results received from the external database.
11. The system of claim 10, wherein the correlated subquery includes a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in different external databases.
12. The system of claim 11, wherein:
- the correlated subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and
- the modified subquery comprises, where ?<RO>B.Y, wherein ? is the parameter.
13. A system for performing a query in a distributed database system, comprising:
- a processor;
- a memory;
- a database management application stored in the memory and executable by the processor, when executed, the database management application being configured to: (a) receive an instruction to perform a database operation involving an outer query and a subquery, the subquery including a conditional relationship between a column in a first set of data and a column in a second set of data, wherein the first and second sets of data are stored in separate external databases; (b) automatically modify the subquery to generate a modified subquery by replacing the column in the first set of data with a parameter or updatable constant; (c) substitute a value from a row of the first set of data for the parameter or updatable constant; (d) cause the modified subquery to be executed by the external database to identify one or more rows from the second set of data that satisfy the modified subquery; (e) receive results of the modified subquery from the external database; and (f) repeat steps (c), (d) and (e) for each row in the first set of data.
14. The system of claim 13, wherein the database management system is further configured to perform the database operation at the query processor using the results of the modified subquery.
15. The system of claim 14, wherein the subquery includes a where expression that includes the conditional relationship between the column in the first set of data and the column in the second set of data.
16. The system of claim 13, wherein the outer query includes a join operation between columns in the first set of data and a third set of data.
17. The system of claim 13, wherein:
- the subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and
- the modified subquery comprises, where ?<RO>B.Y, wherein ? is the parameter.
18. The system of claim 13, wherein:
- the subquery comprises, where A.X<RO>B.Y, wherein A.X is a first variable that identifies the column in the first set of data, <RO> is a relational operator, and B.Y is a second variable that identifies the column in the second set of data; and the modified subquery comprises, where <value><RO>B.Y, wherein <value> is the updatable constant.
Type: Application
Filed: Oct 14, 2011
Publication Date: Dec 27, 2012
Inventor: Douglass Adam Christie (Morrisville, NC)
Application Number: 13/273,875
International Classification: G06F 17/30 (20060101);