FAST SELECT FOR FETCH FIRST N ROWS WITH ORDER BY
Fetching an ordered first N rows of a table, includes: reading a row in the table; determining that the read row qualifies as the first N rows of the table for rows read so far, and storing data of the read row; and determining an order of data of qualifying rows and storing the order. Only data in rows that qualify to be among the first N rows are ordered and stored. This provides a significantly more efficient processing. It eliminates tournament tree sorts, corresponding work file read and write I/O's, and associated CPU time. This reduces the time for the running of a query and improves the performance of other applications sharing the use of work files. Further, the improved performance is particular significant when the buffer sizes are within a page of records or rows, although multiple pages may be used.
Latest IBM Patents:
This application claims the benefit of priority under 35 USC 120 as a continuation application to U.S. application Ser. No. 11/222,894, entitled “FAST SELECT FOR FETCH FIRST N ROWS WITH ORDER BY,” filed on Sep. 8, 2005, the contents of which are hereby incorporated by reference in its entirety.
FIELD OF THE INVENTIONThe present invention relates to the sorting of rows in a table, and more particularly to the fetching of first N rows without sorting all the rows of the table.
BACKGROUND OF THE INVENTIONThe Fetch First N Rows Only clause in DB2™ enables optimized processing, especially for queries with potentially large result sets when only a limited number of resulting rows is requested. For example, in Fetch First 10 Rows with Order By, it would generally be more efficient to use a non-matching non-clustering index rather than a tablespace scan followed by a sort, especially for a large table with many qualifying rows. However, there can be a severe performance problem if there is no index available on an order by key. In this case, for 100 million row table, up to 100 million rows must be read and sorted, which requires many work files read and write I/O's for multiple sort/merge passes. Then only 10 rows would be read from the sorted result work file. This can consume an enormous amount of resources from both the Input/Output device and processor, which not only delays the particular query, but also potentially many other applications which may share the work file data sets.
Accordingly, there exists a need for a method for a fast fetching of ordered first N rows of a table. The method should fetch the ordered first N rows of a table without requiring work file read and write I/O's, thus providing significantly higher efficiency. The present invention addresses such a problem.
SUMMARY OF THE INVENTIONA system and computer readable medium for fetching an ordered first N rows of a table, includes: reading a row in the table; determining that the read row qualifies as the first N rows of the table for rows read so far, and storing data of the read row; and determining an order of data of qualifying rows and storing the order. Only data in rows that qualify to be among the first N rows are ordered and stored. This provides a significantly more efficient processing. It eliminates tournament tree sorts, corresponding work file read and write I/O's, and associated CPU time. This reduces the time for the running of a query and improves the performance of other applications sharing the use of work files. Further, the improved performance is particular significant when the buffer sizes are within a page of records or rows, although multiple pages may be used.
The present invention provides a method for a fast fetching of ordered first N rows of a table. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.
To more particularly describe the features of the present invention, please refer to
In this manner, only data in rows that qualify to be among the first N rows are ordered and stored. This provides a significantly more efficient processing. It eliminates tournament tree sorts, corresponding work file read and write I/O's, and associated CPU time. This not only reduces the time for the running of a query, it also improves the performance of other applications sharing the use of work files by preventing performance disruption due to the monopolization of shared work files. Further, the improved performance is particularly significant when the buffer sizes are within a page of records or rows, although multiple pages may be used Efficiency can be further increased by using a binary search of the order buffer 203 in sorting qualified rows until the search area is less than 5 entries, and then using a sequential search thereafter.
Next, a row of a table is read, via step 304. If the data buffer 201 is not yet full, via step 305, then the data and key of the current row is stored in the data buffer 201, via 306, in the next available position. If the position number of the data/key of the current row is the first entry into the order buffer 203, via step 307, then the position number for the data/key is simply stored in the order buffer 203, via step 311. If the position number of the data/key of the current row is not the first entry into the order buffer 203, via step 307, then the entries in the order buffer 303 are searched to determine the new order for the keys in the data buffer 201, via step 308. Optionally, if duplicates are to be avoided, the qualifying row is discarded if an equal key is found in the order buffer 203. The position numbers in the order buffer 303 are then shifted to reflect the new order, via step 309. If there are more rows in the table to be read, via step 310, the method repeats steps 304 through 311 until the data buffer 201 is full.
When a row is read and the data buffer is full, via steps 304 and 305, the key of the current row is compared with the key of the last element in the order buffer 203, via step 312. The last element would be the position number of the highest key in the current order. If the key of the current row is higher than the key of the last element, via step 313, then it does not qualify as one of the first N rows. This row is discarded, via step 314, and the next row is read, via step 304. If the key of the current row is lower than the key of the last element, via step 313, then the data and key in the data buffer 201 corresponding to the last element is replaced with the data and key of the current row, via step 315. The data/key of the current row thus will have the same position number as the data/key of the last element it replaced. A search of the order buffer 203 is then performed to determine the new order for the keys in the data buffer 201, via step 308, and the position numbers in the order buffer 203 are shifted accordingly, via step 309. Steps 304 through 315 are then repeated until all rows of the table have been read.
For example, as illustrated in
SELECT C1,C2 FROM T3 ORDER BY C1 FETCH FIRST 10 ROWS ONLY;.
Assume that C1 is a char(1) not null and C2 is a char(2) not null.
First, the data buffer 201 is set up, via step 301. Here, the buffer length=(3+1)×10=40 bytes, where 3 is the length of C1 and C2, 1 is the length of the key C1, and 10 is the number of rows to be fetched. The offsets are stored in the offset buffer 202, via step 302, and the order buffer 203 is set up, via step 303.
Assume that a row is read, via step 304, with C1=F, C2=01 and key=F. As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
Steps 304 through 311 are repeated for the next rows, with C1=T, A, D, and S and C2=07, 08, 09, 10 respectively. The resulting data buffer 201 and order buffer 203 are illustrated in
As illustrated in
As illustrated in
As illustrated in
Although the method is described in the context of the buffers above, one of ordinary skill in the art will understand that other implementations are possible without departing from the spirit and scope of the present invention. For example, only a data buffer may be used, where the data/keys stored in the data buffer are moved each time the order changes. Alternatively, only the data buffer and the offset buffer may be used, where the offsets moved each time the order changes.
A method for a fast fetching of ordered first N rows of a table is disclosed.
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
Claims
1. A computer readable medium encoded with program instructions for fetching an ordered first N rows of a table, wherein N is a number of rows to the fetched, the program instructions configured for:
- (a) reading a row in the table;
- (b) determining that the read row qualifies as the first N rows of the table for rows read so far, and storing data of the read row; and
- (c) determining an order of data of qualifying rows and storing the order, comprising: (c1) searching an order buffer to determine a new order of keys for the qualifying rows, wherein the order buffer stores position numbers corresponding to data and keys of the qualifying rows, and (c2) shifting the position numbers in the order buffer to reflect the new order.
2. The computer readable medium of claim 1, wherein the reading (a) comprises:
- (a1) setting up a data buffer for storing data and keys for the qualifying rows;
- (a2) storing in an offset buffer offsets to a base address of the data buffer corresponding to the data and the keys stored in the data buffer; and
- (a3) setting up an order buffer for storing position numbers corresponding to the data and the keys stored in the data buffer.
3. The computer readable medium of claim 2, wherein a length of the data buffer is equal to the lengths of the data and the key for a qualifying row multiplied by N.
4. The computer readable medium of claim 1, wherein the determining (b) comprises:
- (b1) determining that a data buffer is not full, wherein the data buffer stores data and keys for the qualifying rows; and
- (b2) storing the data and a key of the read row in the data buffer.
5. The computer readable medium of claim 1, wherein the determining (b) comprises:
- (b1) determining that a data buffer is full, wherein the data buffer stores data and keys for the qualifying rows;
- (b2) comparing a key of the read row with a key of a last element in an order buffer, wherein the order buffer stores position numbers corresponding to the data and the keys stored in the data buffer; and
- (b3) replacing a data and the key in the data buffer corresponding to the last element with the data and the key of the read row, if the key of the read row is lower than the key of the last element.
6. The computer readable medium of claim 5, wherein the determining (b) further comprises:
- (b4) discarding the read row if the key of the read row is higher than the key of the last element.
7. The computer readable medium of claim 1, wherein the determining (c) comprises:
- (c1) determining a position number is a first or second entry in an order buffer, wherein the position number corresponds to the data and a key of the read row, wherein the order buffer stores position numbers corresponding to data and keys of the qualifying rows; and
- (c2) storing the position number in the order buffer.
8. The computer readable medium of claim 1, further comprising:
- (d) discarding the read row if the read row does not qualify as the first N rows of the table for rows read so far.
Type: Application
Filed: Aug 3, 2008
Publication Date: Nov 20, 2008
Applicant: International Business Machines Corporation (Armonk, NY)
Inventors: Akiko Hoshikawa (San Jose, CA), Allan B. Lebovitz (Morgan Hill, CA), Akira Shibamiya (Los Altos, CA)
Application Number: 12/185,102
International Classification: G06F 7/06 (20060101); G06F 17/30 (20060101);