ARCHITECTURE OF HYBRID IN-MEMORY AND PAGED DICTIONARY
Disclosed herein are system, method, and computer program product embodiments for identifying and loading a relevant page of a dictionary into temporary memory. An embodiment operates by receiving a query to be executed. The query includes a value for executing the query. The server queries a dictionary to retrieve a value ID. The server executes a binary search on a helper vector of the dictionary based on the value. The helper vector includes a last value for each page of a dictionary. The server identifies a page of the dictionary including the value. The server loads the page into temporary memory and retrieves the value ID of the value from the page. The server executes the query on a column using the value ID.
This application claims priority to U.S. Provisional Application No. 62/858,693, filed on Jun. 7, 2019, the contents of which are incorporated herein in their entirety.
BACKGROUND
Hybrid in-memory and paged storage configurations allow storage of data in dynamic storage devices as well as persistent storage. However, providing on-demand access to such data requires loading the data stored in persistent storage to in-memory storage. This can leave a large memory footprint and can be burdensome on a computing system.
For example, to save space, database columns can store compressed values rather than the entire value of data. A dictionary corresponding to a database column can store the entire value. The compressed value can act as a value identifier in the dictionary. However, it can be difficult to retrieve a value from the dictionary as the dictionary can be voluminous and can include multiple different pages. Conventional systems operated to load the entire dictionary in memory when attempting to retrieve the value corresponding to a value ID. But this can be burdensome on a computing system and waste operational resources, such as memory.
The accompanying drawings are incorporated herein and form a part of the specification.
In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
DETAILED DESCRIPTION
Provided herein are system, apparatus, device, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for identifying and loading a relevant page of a dictionary into temporary memory.
In an embodiment, a server receives a query to be executed. The query includes a value for executing the query. The server executes a query on a dictionary to retrieve a value ID corresponding to the value. The dictionary includes pages including value IDs and corresponding values. The server executes a binary search on a helper vector of the dictionary based on the value. The helper vector includes the last value for each page of a dictionary. The server identifies a page of the dictionary including the value. The server loads the page into temporary memory and retrieves the value ID of the value from the page. The server executes the query on a column using the value ID.
This configuration provides for loading the relevant page which includes the requested value rather than an entire dictionary into memory (in some embodiments, plural relevant pages are loaded). This solves the technical problem of reducing the burden on the computing system. Furthermore, this reduces the memory footprint and improves operational efficiency. The described architecture also provides for a unified persistency format and the ability for transient paged and in-memory structures.
As an example, server 100, database 128, and client device 130 can be connected through a network. The network can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless wide area network (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, any other type of network, or a combination of two or more such networks.
Server 100 can include a database management system (DMS) 102, temporary memory 104, main memory 108, and secondary memory 123. As an example, main memory 108 can be Random Access Memory (RAM) or any other type of dynamic memory. Secondary memory 123 can be persistent memory such as a non-volatile storage device. Temporary memory 104 can be a buffer or cache memory. The temporary memory 104 may reside within the main memory 108.
Each of the main memory 108, secondary memory 123, and temporary memory 104 can be configured to store a portion of, or an entire copy of, a dictionary corresponding to a column of the database 128. The column of the database 128 can be configured to store data including compressed versions of certain values (i.e., value IDs) rather than the entire values. The dictionary can store the value ID and the corresponding entire value. Alternatively, the dictionary can store the value, and the value ID may be inferred based on the position of the value in the dictionary. In this regard, the dictionary can be used to correlate and retrieve values of different value IDs so that these value IDs can be looked up in the column.
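As a minimal sketch of this encoding, assume a sorted dictionary in which the value ID is inferred from the position of the value, counted from 1 to match the example value IDs used later in this description. The function names build_dictionary, encode_column, and decode_column are hypothetical and used for illustration only.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Return the sorted distinct values; position + 1 acts as the value ID."""
    return sorted(set(values))


def encode_column(values, dictionary):
    """Replace each entire value with its 1-based value ID."""
    return [bisect_left(dictionary, v) + 1 for v in values]


def decode_column(value_ids, dictionary):
    """Materialize the entire values from their value IDs."""
    return [dictionary[vid - 1] for vid in value_ids]


raw_column = [2.50, 1.56, 3.14, 2.50, 1.56]
dictionary = build_dictionary(raw_column)
encoded = encode_column(raw_column, dictionary)
```

In this sketch the column holds only small integers, and the entire values are stored once in the dictionary.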
The dictionary can include multiple pages of value IDs and values. An example dictionary 116 can be stored in secondary memory 123. Dictionary 116 can include pages 106, 118, and 120. Also, an example dictionary 124 can be stored in main memory 108. Dictionary 124 can store pages 106-a, 118-a, and 120-a, which are copies of pages 106, 118, and 120, separately or in contiguous memory space. In this regard, dictionary 124 can be an in-memory version of dictionary 116. As described above, main memory 108 can be dynamic memory. In this regard, dictionary 124 may be an in-memory dictionary which offers in-memory processing. Secondary memory 123 may be a persistent storage device. In this regard, dictionary 116 may be a paged dictionary which offers paged processing. This may be referred to as a hybrid storage configuration or hybrid column store.
This hybrid storage configuration offers in-memory processing for performance critical operations and buffer-managed paged processing for less critical, economical data access. This hybrid capability extends up from a unified persistence format which can be used to load paged or in-memory primitive structures, which in turn form paged or in-memory column store structures (data vectors, dictionaries 116 or 124, and indexes) that are ultimately arranged by the hybrid storage configuration or hybrid column store according to each column's load unit configuration. Hybrid columns can be configured to load all in-memory structures, all paged structures, or a mixture of both.
It can be appreciated that a user can control which portions of the dictionary are stored in main memory 108 or secondary memory 123, in whichever configuration is necessary. For example, a user can use client device 130 to manage the storage of the dictionaries 116 and 124 in main memory 108 and secondary memory 123. Alternatively, the storage of the dictionaries 116 and 124 can be automatically configured based on workload, data access algorithms, and intelligent algorithms. Dictionary 124 can be referred to as an in-memory dictionary and dictionary 116 can be referred to as a paged dictionary.
In an example, an entire copy of dictionary 116 can be stored in main memory 108. Alternatively, a copy of a portion of dictionary 116 can be stored in main memory 108. In another example, different portions of a dictionary can be stored in main memory 108 and secondary memory 123.
Furthermore, columns of database 128 can also be stored in a similar configuration as described above. In an example, the columns of database 128 can be stored in secondary memory 123 and an entire copy of columns can be stored in main memory 108. Alternatively, the columns of database 128 can be stored in secondary memory 123 and a copy of a portion of the columns of database 128 can be stored in main memory 108. In another example, different portions of the columns can be stored in main memory 108 and secondary memory 123.
Dictionary 116 can be a multi-page vector. The multi-page vector can be a type of paged primitive that provides a uniform interface with in-memory counterparts to read-write algorithms. This allows the codebase to seamlessly operate on either format with minimal adaptation, hiding the details of operating on data in native store extension (NSE). Furthermore, paged primitives can be enriched with auxiliary and performant search structures and indexes.
A multi-page vector is a large vector that can be stored on a single page chain with fixed-size pages. Having a fixed number of objects per page simplifies identifying the page that has the content for a given vector position. The multi-page vector is configured to store more than one vector on each page chain, with each vector having its own metadata. Once a vector is sorted, each multi-page vector can be extended with a helper structure (e.g., a helper vector) to facilitate search and to avoid loading pages that are guaranteed not to have a value that satisfies the search.
Dictionary 116 can include a value ID index 126. The value ID index 126 can be a data structure that stores the last value ID of each page of the dictionary 116 for data types of variable length. In an embodiment, dictionary 124 may not include a value ID index. The value ID can act as the identifier of the value. As described above, the columns of database 128 can store a value ID representing the value, while the dictionaries can store both the value ID and value. Each page of dictionary 116 and dictionary 124 can store value IDs and the corresponding values.
The values can be fixed data types such as integers, doubles, longs, shorts, or the like. Furthermore, in the event the values are fixed size data types, each page of dictionary 116 can include a fixed number of value IDs and values. As a non-limiting example, page 106, page 118, and page 120 can each include three value IDs and their corresponding values. Dictionary 116 and dictionary 124 can be used to retrieve a given value ID so that the value ID can be used to execute a query on a given column.
The secondary memory 123 can also include a helper vector 122. Helper vector 122 can also be a paged primitive. In particular, helper vector 122 can be an arbitrary sized item. For an arbitrary sized item, a block is fully stored on a single page. The number of blocks stored on a page depends on block sizes. For data items larger than the page size, data is internally divided into multiple blocks.
Helper vector 122 can also be referred to as a compact memory resident sorted vector. Helper vector 122 is sorted and stores the last value of each page stored in dictionary 116. Helper vector 122 can be used to quickly identify a particular page including a given value.
DMS 102 can include a wrapper 110, unification engine 112, paging engine 114, and query engine 115. DMS 102 can receive and process query requests by retrieving the value ID of a given value, or the value for a given value ID, from a given page of a dictionary and executing the query. Wrapper 110 can be configured to determine whether the dictionary to be queried is stored in main memory 108 or secondary memory 123. Unification engine 112 can generate a new helper vector and value ID index in response to an event, such as a save operation of the dictionary from main memory 108 to secondary memory 123. Paging engine 114 can be configured to load a particular page of dictionary 116 into temporary memory 104. Query engine 115 can be configured to execute queries against a given column.
As a non-limiting example, DMS 102 can receive a request to execute a query including a given value, 2.50. The query may be received from client device 130. For example, the request can be to retrieve all plants which are 2.50 inches tall. Wrapper 110 can determine whether to search dictionary 116 or dictionary 124 for the value ID corresponding to 2.50. In some instances, the request can indicate whether to query the in-memory dictionary (e.g., dictionary 124) or paged dictionary (e.g., dictionary 116). In this example, wrapper 110 can determine that dictionary 116 is to be queried.
Paging engine 114 can execute a binary search on helper vector 122 to identify a particular page of dictionary 116 on which the value, 2.50, is located. To execute the binary search, paging engine 114 compares the target value (2.50) to the middle value of the helper vector. If they are not equal, the half in which the value cannot lie is eliminated and the search continues on the remaining half. Paging engine 114 repeats this process, taking the middle element of the remaining half and comparing it to the target value, until the target value is found or a single value remains. As described above, helper vector 122 stores the last value of each page in dictionary 116 and the corresponding page. In the event there is a single value remaining, paging engine 114 can determine that the value 2.50 is on the page corresponding to the remaining value. In this example, the paging engine 114 can determine that 2.50 is included in page 106.
Paging engine 114 can load page 106 from secondary memory 123 to temporary memory 104 (which can be from the buffer cache) and retrieve the value ID, 2, for the value 2.50. By doing so, the paging engine 114 avoids having to load the entire dictionary 116 into memory. This reduces the memory footprint. Query engine 115 can execute the query for retrieving all plants that are 2.50 inches, using the value ID, 2.
In an embodiment, DMS 102 can receive a request to materialize a value corresponding to the value identifier observed in a data vector at the desired row position. The request can include the value ID. Since each page includes a fixed amount of values, the paging engine 114 can identify a particular page that includes the value based on the value ID included in the request and the fixed amount of values per page.
In an alternative embodiment, the dictionary 116 can include values that are variable size data types (e.g., VARCHAR). Each page can include a different number of value IDs and corresponding values. The value ID index 126 can store the last value ID of each page. In this regard, in the event DMS 102 receives a request to materialize a value corresponding to the value ID observed in a data vector at the desired row position, the paging engine 114 can identify a particular page that includes the value based on the value ID index 126 and the value ID included in the request. For example, the paging engine 114 may execute a binary search on the value ID index 126 using the value ID to identify the page that includes the value.
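The variable-size case can be sketched as follows. The page contents here are hypothetical, and the sketch assumes that value IDs are contiguous within a page, so that the offset of an entry can be derived from the first value ID on the page. The names find_page_for_value_id and materialize are likewise illustrative only.

```python
from bisect import bisect_left

# Hypothetical variable-length pages: each holds a different number of entries.
PAGES = [
    [(1, "ash"), (2, "birch")],
    [(3, "cedar"), (4, "elm"), (5, "fir"), (6, "oak")],
    [(7, "pine")],
]

# Value ID index: the last value ID stored on each page.
VALUE_ID_INDEX = [page[-1][0] for page in PAGES]


def find_page_for_value_id(value_id):
    """Binary search for the first page whose last value ID >= value_id."""
    idx = bisect_left(VALUE_ID_INDEX, value_id)
    if value_id < 1 or idx == len(VALUE_ID_INDEX):
        raise KeyError(value_id)
    return idx


def materialize(value_id):
    """Return the value for value_id by reading only the identified page."""
    page = PAGES[find_page_for_value_id(value_id)]
    first_id = page[0][0]
    return page[value_id - first_id][1]
```

Only the small value ID index needs to be searched before a page is touched, so a single page is read per materialization request.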
In an embodiment, wrapper 110 can receive a request to execute a query from an upper layer. The upper layer may be an API, which is agnostic to the configuration of dictionary 116 and dictionary 124. That is, the upper layer may query the dictionary 116 or 124 without distinguishing between the dictionaries 116 and 124. The lower layer includes services or APIs that are configured to distinguish between dictionary 116 and dictionary 124 when querying either dictionary. In an example, these requests may specify a particular dictionary to query. The wrapper 110 can determine which dictionary to query. This allows the upper layer to make a single API call to a dictionary, and the wrapper 110 can identify which implementation of the dictionary, in-memory (e.g., dictionary 124) or paged (e.g., dictionary 116), to query. This allows for more complex queries on dictionaries 116 and 124. For example, wrapper 110 can allow a query for a value that matches a certain pattern rather than a specific value. Furthermore, the wrapper 110 allows such algorithms to be implemented and maintained once, independent of the storage location (in-memory vs. paged).
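One way to picture the role of the wrapper is the following sketch, in which a pattern-style search is written once against a common iteration interface. The classes InMemoryDictionary, PagedDictionary, and DictionaryWrapper, and the method names iter_entries and value_ids_matching, are assumptions made for illustration and are not taken from this description.

```python
class InMemoryDictionary:
    """Hypothetical in-memory dictionary: all values held in one sorted list."""

    def __init__(self, values):
        self._values = sorted(values)

    def iter_entries(self):
        for pos, value in enumerate(self._values):
            yield pos + 1, value


class PagedDictionary:
    """Hypothetical paged dictionary: values split across fixed-size pages."""

    def __init__(self, values, per_page):
        ordered = sorted(values)
        self._per_page = per_page
        self._pages = [
            ordered[i:i + per_page] for i in range(0, len(ordered), per_page)
        ]

    def iter_entries(self):
        for page_no, page in enumerate(self._pages):  # one page at a time
            for offset, value in enumerate(page):
                yield page_no * self._per_page + offset + 1, value


class DictionaryWrapper:
    """Single entry point; callers need not know which format backs it."""

    def __init__(self, backend):
        self._backend = backend

    def value_ids_matching(self, predicate):
        """Pattern-style search implemented once for both storage formats."""
        return [
            vid for vid, value in self._backend.iter_entries() if predicate(value)
        ]


values = ["ash", "birch", "cedar", "elm", "fir"]
in_memory = DictionaryWrapper(InMemoryDictionary(values))
paged = DictionaryWrapper(PagedDictionary(values, per_page=2))
```

Both wrappers return the same value IDs for the same predicate, which is the property the upper layer relies on when it issues a single call without regard to storage location.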
In an embodiment, when dictionary 124 is being saved or persisted from main memory 108 to secondary memory 123, unification engine 112 determines that dictionary 124 is being stored in paged format and can generate auxiliary data structures. The auxiliary data structures can include metadata such as a value ID index 126 and a value index (i.e., the helper vector 122). The value ID index 126 stores the last value ID of each page. The value index stores the last value of each page. The save operation occurs any time there is a delta merge, optimized compression, or any data definition language operation. This ensures that when a write operation is performed on dictionary 124, the data persisted to the secondary memory 123 is accurately reflected in the value ID index 126 and helper vector 122, so that the next received query is processed accurately.
As a non-limiting example, page 106 can include value IDs 1, 2, 3 and corresponding values 1.56, 2.50, 3.14. Page 118 can include value IDs 4, 5, 6 and corresponding values 4.98, 6.11, and 6.50. Page 120 can include value IDs 7, 8, 9 and corresponding values 6.70, 8.88, and 10.88. Helper vector 122 can include the last value of 3.14 for page 106, 6.50 as the last value of page 118, and 10.88 as the last value for page 120.
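The lookup of the value 2.50 can be sketched against these example pages. This is a minimal illustration rather than a definitive implementation: load_page is a hypothetical stand-in for bringing a page into temporary memory, and Python's bisect_left is used as the binary search over the helper vector.

```python
from bisect import bisect_left

# Example pages as (value ID, value) pairs, keyed by page reference number.
PAGES = {
    106: [(1, 1.56), (2, 2.50), (3, 3.14)],
    118: [(4, 4.98), (5, 6.11), (6, 6.50)],
    120: [(7, 6.70), (8, 8.88), (9, 10.88)],
}
PAGE_ORDER = [106, 118, 120]
HELPER_VECTOR = [3.14, 6.50, 10.88]  # last value of each page, sorted

loaded_pages = []  # records which pages were brought into temporary memory


def load_page(page_no):
    """Hypothetical stand-in for loading one page into temporary memory."""
    loaded_pages.append(page_no)
    return PAGES[page_no]


def find_page(value):
    """Binary search for the first page whose last value is >= value."""
    idx = bisect_left(HELPER_VECTOR, value)
    if idx == len(HELPER_VECTOR):
        return None  # value is larger than every value in the dictionary
    return PAGE_ORDER[idx]


def lookup_value_id(value):
    """Return the value ID for value, loading only the one candidate page."""
    page_no = find_page(value)
    if page_no is None:
        return None
    for value_id, page_value in load_page(page_no):
        if page_value == value:
            return value_id
    return None  # value falls within the page's range but is not stored


value_id = lookup_value_id(2.50)
```

Here the search selects page 106 and the lookup yields value ID 2, with pages 118 and 120 never loaded.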
In the event, the DMS 102 as shown in
In the event the DMS 102 receives a request to materialize a value based on value ID 5, the paging engine can determine that each page stores 3 value IDs. Based on this, the paging engine can determine that page 106 stores the first three value IDs, page 118 stores the next three value IDs, and page 120 stores the last three value IDs. The paging engine can determine that value ID 5 is located on page 118. The paging engine can load page 118 into temporary memory and determine that the corresponding value for value ID 5 is 6.11.
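The page arithmetic for fixed-size values can be sketched as follows, using the same example pages. The sketch assumes value IDs start at 1, so one is subtracted before dividing by the number of values per page; the names locate and materialize are hypothetical.

```python
VALUES_PER_PAGE = 3
PAGE_ORDER = [106, 118, 120]

# Values stored on each example page, in value ID order.
PAGES = {
    106: [1.56, 2.50, 3.14],
    118: [4.98, 6.11, 6.50],
    120: [6.70, 8.88, 10.88],
}


def locate(value_id, per_page=VALUES_PER_PAGE):
    """Return (page index, offset within page) for a 1-based value ID."""
    return divmod(value_id - 1, per_page)


def materialize(value_id):
    """Return the value for value_id; only the identified page is read."""
    page_index, offset = locate(value_id)
    page = PAGES[PAGE_ORDER[page_index]]
    return page[offset]
```

For value ID 5, locate returns page index 1 (page 118) and offset 1, so materialize returns 6.11 without any search structure being consulted.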
Method 300 shall be described with reference to
In 302, DMS 102 receives a request to execute a query including a value. The request can be directed to executing a query against a particular column. As described above, in some embodiments, columns may not store entire values. Rather, columns can store value IDs representing the value. For example, the request can be received from the client device 130.
In 304, the paging engine 114 queries a dictionary for a value ID corresponding to the value received in the request. The dictionary includes multiple pages including value IDs and the corresponding values. Before executing the query, the wrapper 110 can determine whether to query an in-memory dictionary (dictionary 124) or a paged dictionary (dictionary 116). In this embodiment, the paged dictionary is queried. The paged dictionary includes a helper vector 122.
In 306, the paging engine 114 executes a binary search on the helper vector 122 of the dictionary 116 using the value. The helper vector 122 includes a last value for each page of the dictionary 116. The binary search can search for a value in the helper vector based on the last value for each page.
In 308, the paging engine 114 identifies a page of the dictionary including the value. The paging engine identifies the page by executing the binary search on the helper vector. For example, based on the binary search, the paging engine 114 can eliminate pages on which the value is not included and, by process of elimination, identify the page on which the value is included.
In 310, the paging engine 114 loads the page into temporary memory 104. Temporary memory 104 can be buffer or cache memory. The page can include a list of value IDs and corresponding values.
In 312, the paging engine 114 retrieves the value ID of the value from the page loaded in the temporary memory. When the page is loaded into memory the paging engine 114 can execute a query to retrieve the value ID from the page. Since a single page is loaded into temporary memory, executing this query is not burdensome on the operational resources.
In 314, the query engine 115 executes the requested query on the column using the value ID. The query engine 115 can receive the results from the query and transmit the results to the client device 130.
Method 400 shall be described with reference to
In 402, the DMS 102 receives a request to materialize a value using a value ID. As described above, a (paged) dictionary 116 can include multiple pages of value IDs and corresponding values. In the event the dictionary 116 includes values of a fixed data type, the dictionary 116 can store a fixed number of value IDs and corresponding values per page.
In 404, the paging engine 114 determines a number of values per page in the dictionary. As described above, each page of the dictionary can store a fixed amount of value IDs and values.
In 406, the paging engine 114 identifies a page including the requested value based on the value ID and the number of value IDs per page in the dictionary. For example, paging engine 114 can divide the value ID by the number of value IDs per page (and take the floor of the result) to identify the page including the value ID and requested value. If value IDs start at 1, as in the example above, one is subtracted from the value ID before dividing.
In 408, the paging engine 114 loads the identified page into the temporary memory 104. As described above, temporary memory 104 can be buffer or cache configured to store data for a short amount of time.
In 410, the paging engine 114 retrieves the value corresponding to the value ID from the page stored in the temporary memory 104. When the page is loaded into memory, the paging engine 114 can execute a query to retrieve the value from the page.
Method 500 shall be described with reference to
In 502, the unification engine 112 detects an event causing an update to the values and value IDs in the dictionary. For example, a merge, optimized compression, or DDL operation can be executed on the in-memory dictionary (e.g., dictionary 124). The dictionary 124 can be persisted or saved from the main memory 108 to the secondary memory 123. This can cause the values or value IDs to be updated in the dictionary 116.
In 504, the unification engine 112 identifies a change in the last values for each page in the dictionary based on the event. For example, based on a merge, more data can be added in the dictionary. This can shift the positions of the values such that the last value for each page can change.
In 506, the unification engine 112 identifies a change in the last value ID of each page in the dictionary. Continuing with the earlier example, based on a merge, more data can be added in the dictionary. This can shift the positions of the value IDs such that the last value ID for each page can change.
In 508, the unification engine 112 generates a new helper vector to reflect the changed last values for each page in the dictionary. The new helper vector can store the updated last values for each page in the dictionary. The new helper vector will be persisted in the secondary memory 123.
In 510, the unification engine 112 generates a new value ID index to reflect the changed last value IDs for each page in the dictionary. The new value ID index can store the updated last value ID for each page in the dictionary in the event the dictionary includes values of variable size data types. The new value ID index will be persisted in the secondary memory 123.
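The regeneration of the auxiliary structures can be sketched as follows, assuming fixed-size pages of three entries and a hypothetical merge that adds the value 2.00 to the example dictionary. The function names paginate, build_helper_vector, and build_value_id_index are illustrative only.

```python
def paginate(values, per_page):
    """Split sorted values into pages of (value ID, value) pairs."""
    entries = [(pos + 1, value) for pos, value in enumerate(values)]
    return [entries[i:i + per_page] for i in range(0, len(entries), per_page)]


def build_helper_vector(pages):
    """Collect the last value of each page."""
    return [page[-1][1] for page in pages]


def build_value_id_index(pages):
    """Collect the last value ID of each page."""
    return [page[-1][0] for page in pages]


before = [1.56, 2.50, 3.14, 4.98, 6.11, 6.50, 6.70, 8.88, 10.88]
old_helper = build_helper_vector(paginate(before, 3))

# Hypothetical merge: one new value shifts every later entry by one position.
after = sorted(before + [2.00])
new_pages = paginate(after, 3)
new_helper = build_helper_vector(new_pages)
new_index = build_value_id_index(new_pages)
```

A single inserted value changes the last value of every page that follows it, which is why both structures are rebuilt when the dictionary is persisted rather than patched in place.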
Various embodiments can be implemented, for example, using one or more computer systems, such as computer system 600 shown in
Computer system 600 can be any well-known computer capable of performing the functions described herein.
Computer system 600 includes one or more processors (also called central processing units, or CPUs), such as a processor 604. Processor 604 is connected to a communication infrastructure or bus 606.
One or more processors 604 can each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU can have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
Computer system 600 also includes user input/output device(s) 603, such as monitors, keyboards, pointing devices, etc., that communicate with communication infrastructure 606 through user input/output interface(s) 602.
Computer system 600 also includes a main or primary memory 608, such as random access memory (RAM). Main memory 608 can include one or more levels of cache. Main memory 608 has stored therein control logic (i.e., computer software) and/or data.
Computer system 600 can also include one or more secondary storage devices or memory 610. Secondary memory 610 can include, for example, a hard disk drive 612 and/or a removable storage device or drive 614. Removable storage drive 614 can be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
Removable storage drive 614 can interact with a removable storage unit 618. Removable storage unit 618 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 618 can be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 614 reads from and/or writes to removable storage unit 618 in a well-known manner.
According to an exemplary embodiment, secondary memory 610 can include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 600. Such means, instrumentalities or other approaches can include, for example, a removable storage unit 622 and an interface 620. Examples of the removable storage unit 622 and the interface 620 can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
Computer system 600 can further include a communication or network interface 624. Communication interface 624 enables computer system 600 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 628). For example, communication interface 624 can allow computer system 600 to communicate with remote devices 628 over communications path 626, which can be wired and/or wireless, and which can include any combination of LANs, WANs, the Internet, etc. Control logic and/or data can be transmitted to and from computer system 600 via communication path 626.
In an embodiment, a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 600, main memory 608, secondary memory 610, and removable storage units 618 and 622, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 600), causes such data processing devices to operate as described herein.
Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of this disclosure using data processing devices, computer systems and/or computer architectures other than that shown in
It is to be appreciated that the Detailed Description section, and not any other section, is intended to be used to interpret the claims. Other sections can set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit this disclosure or the appended claims in any way.
While this disclosure describes exemplary embodiments for exemplary fields and applications, it should be understood that the disclosure is not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of this disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.
Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.
References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described can include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments can be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, can also mean that two or more elements are not in direct contact with each other, but still co-operate or interact with each other.
The breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Claims
1. A computer-implemented method comprising:
- receiving, by one or more computing devices, a query including a value;
- querying, by the one or more computing devices, a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values;
- executing, by the one or more computing devices, a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of a dictionary;
- identifying, by the one or more computing devices, a page of the dictionary including the value;
- loading, by the one or more computing devices, the page into temporary memory;
- retrieving, by the one or more computing devices, the value ID of the value from the page; and
- executing, by the one or more computing devices, the query on a column using the value ID.
2. The method of claim 1, wherein each page of the dictionary includes a predetermined number of value IDs.
3. The method of claim 2, further comprising:
- receiving, by the one or more computing devices, a request to retrieve a different value, the request including a different value ID;
- identifying, by the one or more computing devices, a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
- loading, by the one or more computing devices, the different page into the temporary memory; and
- retrieving, by the one or more computing devices, the different value.
4. The method of claim 1, further comprising:
- determining, by the one or more computing devices, that the dictionary is stored in a persistent storage device.
5. The method of claim 1, wherein the value is a fixed data type.
6. The method of claim 1, wherein the dictionary is a multi-page vector.
7. The method of claim 1, further comprising:
- detecting, by the one or more computing devices, an event causing an update in the values and value IDs in the dictionary;
- identifying, by the one or more computing devices, a change in the last values for each page in the dictionary based on the event; and
- generating, by the one or more computing devices, a new helper vector to reflect the changed last values for each page in the dictionary.
8. The method of claim 1, further comprising loading, by the one or more computing devices, the helper vector into the temporary memory in response to receiving the query.
9. A system comprising:
- a memory; and
- at least one processor coupled to the memory and configured to: receive a query including a value; query a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values; execute a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of the dictionary; identify a page of the dictionary including the value; load the page into temporary memory; retrieve the value ID of the value from the page; and execute the query on a column using the value ID.
10. The system of claim 9, wherein each page of the dictionary includes a predetermined number of value IDs.
11. The system of claim 10, the at least one processor further configured to:
- receive a request to retrieve a different value, the request including a different value ID;
- identify a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
- load the different page into the temporary memory; and
- retrieve the different value.
12. The system of claim 9, the at least one processor further configured to:
- determine that the dictionary is stored in a persistent storage device.
13. The system of claim 9, wherein the dictionary is a multi-page vector.
14. The system of claim 9, the at least one processor further configured to:
- detect an event causing an update in the values and value IDs in the dictionary;
- identify a change in the last values for each page in the dictionary based on the event; and
- generate a new helper vector to reflect the changed last values for each page in the dictionary.
15. The system of claim 9, the at least one processor further configured to:
- load the helper vector into the temporary memory in response to receiving the query.
16. A non-transitory computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
- receiving a query including a value;
- querying a dictionary for a value ID corresponding to the value, using the value, wherein the dictionary includes a plurality of pages including value IDs and corresponding values;
- executing a binary search on a helper vector of the dictionary using the value, wherein the helper vector includes a last value for each page of the dictionary;
- identifying a page of the dictionary including the value;
- loading the page into temporary memory;
- retrieving the value ID of the value from the page; and
- executing the query on a column using the value ID.
17. The non-transitory computer-readable device of claim 16, wherein each page of the dictionary includes a predetermined number of value IDs.
18. The non-transitory computer-readable device of claim 17, the operations further comprising:
- receiving a request to retrieve a different value, the request including a different value ID;
- identifying a different page including the different value based on the different value ID and the predetermined number of value IDs per page in the dictionary;
- loading the different page into the temporary memory; and
- retrieving the different value.
19. The non-transitory computer-readable device of claim 16, the operations further comprising:
- determining that the dictionary is stored in a persistent storage device.
20. The non-transitory computer-readable device of claim 16, the operations further comprising:
- detecting an event causing an update in the values and value IDs in the dictionary;
- identifying a change in the last values for each page in the dictionary based on the event; and
- generating a new helper vector to reflect the changed last values for each page in the dictionary.
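By way of non-limiting illustration only, the lookups recited in the claims above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name `PagedDictionary`, its method names, the in-memory list standing in for persistent storage, and the convention that a value ID is the zero-based position of a value in the sorted dictionary are all hypothetical choices made for the example.

```python
from bisect import bisect_left


class PagedDictionary:
    """Hypothetical sorted dictionary split into fixed-size pages."""

    def __init__(self, sorted_values, ids_per_page):
        self.ids_per_page = ids_per_page  # predetermined number of value IDs per page
        self.loaded_pages = []            # log of page loads, for illustration
        self.apply_update(sorted_values)

    def apply_update(self, sorted_values):
        """Repaginate after an update and generate a new helper vector."""
        n = self.ids_per_page
        # Assumption: this list stands in for pages held in persistent storage.
        self._stored_pages = [
            sorted_values[i:i + n] for i in range(0, len(sorted_values), n)
        ]
        # The helper vector holds the last value of each page.
        self.helper_vector = [page[-1] for page in self._stored_pages]

    def _load_page(self, page_no):
        """Simulate loading a single page into temporary memory."""
        self.loaded_pages.append(page_no)
        return list(self._stored_pages[page_no])

    def value_id_for(self, value):
        """Value -> value ID: binary search on the helper vector, then one page load."""
        page_no = bisect_left(self.helper_vector, value)
        if page_no == len(self.helper_vector):
            return None  # value is greater than every stored value
        page = self._load_page(page_no)
        offset = bisect_left(page, value)
        if offset == len(page) or page[offset] != value:
            return None  # value is not in the dictionary
        return page_no * self.ids_per_page + offset

    def value_for(self, value_id):
        """Value ID -> value: the page number follows from the fixed page size."""
        page_no, offset = divmod(value_id, self.ids_per_page)
        return self._load_page(page_no)[offset]


dictionary = PagedDictionary(
    ["apple", "banana", "cherry", "date", "fig", "grape", "kiwi"], ids_per_page=3
)
fig_id = dictionary.value_id_for("fig")
```

In this sketch, only the small helper vector is searched in full; each lookup then loads a single page rather than the entire dictionary. The reverse lookup needs no search at all, because a fixed number of value IDs per page lets the page number be computed by integer division.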
Type: Application
Filed: May 13, 2020
Publication Date: Dec 10, 2020
Inventors: Reza SHERKAT (Waterloo), Colin FLORENDO (Marlborough, MA), Chaitanya GOTTIPATI (Pune), Bernhard SCHEIRLE (Leimen), Carsten THIEL (Heidelberg), Prasanta GHOSH (San Ramon, CA)
Application Number: 15/931,063