FIXED STRING DICTIONARY

The subject matter described herein relates to implementation of a dictionary in a column-based, in-memory database where values are not stored directly; rather, for each column, a dictionary is created with all distinct values. For each row, a reference to the corresponding value in the dictionary is stored. In one aspect, data is stored in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows. A dictionary for each column in the memory structure is generated. The dictionary has distinct values for each column. A reference to the dictionary is generated for each column in the memory structure. The dictionary and the reference to the dictionary are stored in the memory structure.

Description
RELATED APPLICATION

This application claims priority to U.S. Patent Application Ser. No. 61/640,689 entitled “Fixed String Dictionary” filed on Apr. 30, 2012, the contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The subject matter described herein relates to database management, and more particularly to systems and methods employing a fixed string dictionary for faster and more efficient data access.

BACKGROUND

An in-memory database system is a database system that primarily uses main memory for data storage. One example of an in-memory database system is the HANA in-memory database system provided by SAP AG of Walldorf, Germany. As opposed to employing a disk storage mechanism for storing business data, in-memory database systems such as HANA use main memory, which is faster because storage, access, and retrieval functions execute far fewer CPU instructions than in disk-based databases. Accessing data in main memory reduces input/output reading activity when querying the data, which in turn provides faster and more predictable performance than disk-based storage.

Traditional databases are row-based, and for each row, the size of the row in bytes is known and needs to be set aside for storage. For example, the instruction:

Create Table T (City Char(10), Revenue Integer)

creates a table in which each row requires 10+4 bytes of storage (an INTEGER requires 4 bytes), and rows with identical values still require 10+4 bytes per row. Accordingly, the storage estimate is 14 bytes per row, or 42 bytes total for storing 3 rows. One challenge is to represent the data with low memory consumption, because memory is costly, and less storage means faster access (less memory bus bandwidth).
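
For illustration only, the following Python sketch reproduces this fixed-width estimate; the table layout is taken from the example above, and the code is not part of any disclosed implementation.

CHAR_WIDTH = 10   # bytes reserved for City Char(10)
INT_WIDTH = 4     # bytes for an INTEGER

def row_store_bytes(num_rows):
    # Every row reserves the full fixed width, even when values repeat.
    return num_rows * (CHAR_WIDTH + INT_WIDTH)

print(row_store_bytes(3))  # 42 bytes for three rows, as estimated above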

SUMMARY

The current subject matter relates to implementation of a dictionary in a column-based, in-memory database where values are not stored directly; rather, for each column, a dictionary is created with all distinct values. For each row, a reference to the corresponding value in the dictionary is stored.

In one aspect, data is stored in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows. A dictionary for each column in the memory structure is generated. The dictionary has distinct values for each column. A reference to the dictionary is generated for each column in the memory structure. The dictionary and the reference to the dictionary are stored in the memory structure.

In some variations, one or more of the following can additionally be implemented, individually or in any feasible combination. A block of memory is defined for each dictionary. The block of memory is compressed for each dictionary. The block includes a reference term, a bitvector representing a character position in each dictionary, a reference to a dictionary for all non-constant character positions, and a bit array having bitstrings of all coded terms associated with non-constant characters in a group of dictionaries. Further, a dictionary is built for all positions in the block other than the reference term.

Implementations of the current subject matter can include, but are not limited to, systems and methods consistent with this description, including one or more of the described features, as well as articles that comprise a tangibly embodied machine-readable medium operable to cause one or more machines (e.g., computers, etc.) to perform operations described herein. Similarly, computer systems are also described that may include one or more processors and one or more memories coupled to the one or more processors. A memory, which can include a computer-readable storage medium, may include, encode, store, or the like one or more programs that cause one or more processors to perform one or more of the operations described herein. Computer implemented methods consistent with one or more implementations of the current subject matter can be implemented by one or more data processors residing in a single computing system or multiple computing systems. Such multiple computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g. the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims. While certain features of the currently disclosed subject matter are described for illustrative purposes in relation to an enterprise resource software system or other business software solution or architecture, it should be readily understood that such features are not intended to be limiting. The claims that follow this disclosure are intended to define the scope of the protected subject matter.

DESCRIPTION OF DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings:

FIG. 1 illustrates a prior art dictionary for a traditional row-based database;

FIG. 2 illustrates a block diagram of an in-memory database system (IMDS) that includes an in-memory appliance having a main memory that forms an in-memory database having a column and/or row structure, for fast storage, access and retrieval of business data;

FIG. 3 is a diagram illustrating aspects of a system showing features consistent with implementations of the current subject matter;

FIG. 4 illustrates a storage structure for use with a system in accordance with implementations of the current subject matter;

FIG. 5 illustrates a delta merge operation;

FIG. 6 illustrates a delta to main memory merge operation;

FIG. 7 illustrates a fixed string dictionary with several special properties;

FIG. 8 illustrates term 3 encoding for the example shown in FIG. 7; and

FIG. 9 illustrates another implementation of a fixed string dictionary.

When practical, similar reference numbers denote similar structures, features, or elements.

DETAILED DESCRIPTION

To address these and potentially other issues with currently available solutions, methods, systems, articles of manufacture, and the like consistent with one or more implementations of the current subject matter can, among other possible advantages, provide high compression and yet fast search on certain types of data, which is relevant for key columns and many data columns in a business warehouse or enterprise resource planning (BW/ERP) system.

FIG. 1 illustrates a prior art dictionary for a traditional row-based database.

FIG. 2 is a block diagram of an in-memory database system (IMDS) 100 that includes an in-memory appliance 102 having a main memory 104 that forms an in-memory database having a column and/or row structure, for fast storage, access and retrieval of business data. The main memory can be provided in silicon, such as random access memory. The in-memory appliance 102 of the IMDS 100 also includes modeling tools 106 for modeling any number of applications or visualizations of the data from the main memory 104 (i.e. executing various business applications or the like), as well as real-time replication services 108 and data services 110 for the business data received for storage by the in-memory appliance 102.

The in-memory appliance 102 can be a computing system such as a server, or a set of computing systems distributed across a network. The in-memory appliance 102 receives data for storage according to one or more business objects 112, which can be retrieved and used by one or more business intelligence (BI) applications 114 or other applications 116. Other consumers of business data from the in-memory appliance 102 can include a business warehouse 118 or similar application framework.

FIG. 4 illustrates a storage structure 200 that can be used with an in-memory database according to implementations of the current subject matter. The storage structure includes a unified table structure 202 having a row store structure 204 for storing, as an example, level 1 delta data based on operations executed according to a number of physical operators. The unified table structure 202 can include column store structures 206 and 208, for storing level 2 delta data and main store data, respectively. A typical setup of the storage structure 200 includes the following three stages for records within a regular table.

L1-delta: The L1-delta structure accepts all incoming data requests and stores them in a write-optimized manner, i.e. the L1-delta preserves the logical row format of the record. The data structure is optimized for fast insert and delete, field update, and record projection. Moreover, the L1-delta structure does not perform any data compression. As a rule of thumb, the L1-delta structure may hold 10,000 to 100,000 rows per single-node database instance depending on the workload characteristics and the amount of available memory.
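
As an illustration of the properties listed above, and not an implementation from this disclosure, a write-optimized row structure might look like the following sketch; the class and method names are assumptions chosen for exposition.

class L1Delta:
    # Assumed shape for illustration; the disclosure gives no implementation.
    def __init__(self):
        self.rows = []  # logical row format, uncompressed

    def insert(self, row):
        self.rows.append(row)  # fast append-only insert

    def delete(self, row_id):
        self.rows[row_id] = None  # mark as deleted, no reorganization

    def update_field(self, row_id, column, value):
        fields = list(self.rows[row_id])
        fields[column] = value  # in-place field update
        self.rows[row_id] = tuple(fields)

    def project(self, row_id, columns):
        return tuple(self.rows[row_id][c] for c in columns)  # record projection

delta = L1Delta()
delta.insert(("Rome", 100))
delta.update_field(0, 1, 150)
print(delta.project(0, [0, 1]))  # ('Rome', 150)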

L2-delta: The L2-delta structure represents the second stage of the record life cycle and is organized in the column store format. In contrast to the L1-delta, the L2-delta employs dictionary encoding to achieve better memory usage. However, for performance reasons, the dictionary is unsorted, requiring secondary index structures to optimally support point query access patterns, e.g. fast execution of unique constraint checks. The L2-delta is well suited to store up to 10 million rows.

Main Store: The main store represents the core data format with the highest compression rate, exploiting a variety of different compression schemes. By default, all values within a column are represented via their position in a sorted dictionary and stored in a bit-packed manner for a tight packing of the individual values. While the dictionary is always compressed using a variety of prefix-coding schemes, a combination of different compression techniques, ranging from simple run-length coding to more complex schemes such as cluster or indirect coding, can be applied to the bit-packed value positions to further reduce memory consumption.
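
The bit-packing of dictionary positions can be illustrated with the following sketch; the dictionary contents and helper names are assumptions for exposition, not the main store's actual format.

import math

def bit_pack(positions, bits):
    # Pack dictionary positions into one integer, 'bits' bits per value.
    packed = 0
    for i, p in enumerate(positions):
        packed |= p << (i * bits)
    return packed

def bit_unpack(packed, bits, count):
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]

dictionary = ["Berlin", "Paris", "Rome"]           # sorted main dictionary
values = ["Rome", "Berlin", "Rome", "Paris"]
bits = max(1, math.ceil(math.log2(len(dictionary))))
positions = [dictionary.index(v) for v in values]  # [2, 0, 2, 1]
packed = bit_pack(positions, bits)
print(bit_unpack(packed, bits, len(values)))       # [2, 0, 2, 1]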

The different data structures share a set of common data types. Access is exposed through a common abstract interface with row and column iterators, both optionally dictionary-based. Moreover, some of the physical operators may pull records one by one or in a vectorized way (i.e. block by block) following a classical open-next-close (ONC) protocol to enable pipelined operation and reduce the memory requirements for intermediate results as much as possible. Other physical operators can implement a "materialize all" strategy to avoid operator switching costs during query execution. The optimizer decides on a mixture of the different operator types depending on the logical calculation model, i.e. the different operator types are seamlessly integrated within a final query execution plan.

For the operators leveraging sorted dictionaries, the unified table access interface also exposes the table content via a global sorted dictionary. Dictionaries of the two delta structures are computed (the L1-delta, which stores uncompressed rows, has no dictionary of its own), sorted (both the L1-delta and the unsorted L2-delta dictionaries), and merged with the main dictionary on the fly. In order to implement efficient validations of uniqueness constraints, the unified table provides inverted indexes for the delta and main structures.
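
For exposition only, the on-the-fly merge of sorted dictionaries can be sketched as follows; the use of Python's heapq is an assumption for illustration, not the disclosed mechanism.

import heapq
from itertools import groupby

def merged_dictionary(*sorted_dictionaries):
    # Lazily yield the distinct values of several sorted dictionaries
    # without materializing their union.
    for value, _ in groupby(heapq.merge(*sorted_dictionaries)):
        yield value

main_dict = ["Berlin", "Paris", "Rome"]
l2_dict = ["Madrid", "Paris"]   # sorted copy of the unsorted L2-delta dictionary
l1_dict = ["Berlin", "Lisbon"]  # computed and sorted from the L1-delta rows
print(list(merged_dictionary(main_dict, l2_dict, l1_dict)))
# ['Berlin', 'Lisbon', 'Madrid', 'Paris', 'Rome']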

The record life cycle is organized in a way to asynchronously propagate individual records through the system without interfering with currently running database operations within their transactional sphere of control. The database system provides two transformations, called “merge steps”:

L1-to-L2-delta Merge: FIG. 5 further illustrates this merge operation. The transformation from L1-delta to L2-delta uses a pivoting step from row to column organization. Rows of the L1-delta are split into their corresponding columnar values and inserted column by column into the L2-delta structure. At the receiving side, the system performs a lookup to identify potentially missing values in the dictionary structure and optionally inserts new entries at the end of the dictionary to avoid any major restructuring operations within the dictionary. In the second step, the corresponding column values are added to the value vector using the dictionary encodings (an append-only structure). Both steps can be performed in parallel, because the number of tuples to be moved is known in advance, enabling the reservation of encodings in the new dictionary before actually inserting them. In a third step, the propagated entries are removed from the L1-delta. All running operations see either the full L1-delta and the old end-of-delta border or the truncated version of the L1-delta structure together with the expanded version of the L2-delta. The transition from L1-delta to L2-delta is incremental in nature, i.e. the transition of records does not require reorganizing the data of the target structure.
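
The first two merge steps can be sketched as follows; the class and method names are hypothetical and chosen for illustration, with a hash map standing in for the secondary index on the unsorted dictionary.

class L2Column:
    # Hypothetical names; illustrates the merge steps, not HANA's code.
    def __init__(self):
        self.dictionary = []    # append-only, unsorted dictionary
        self.index = {}         # secondary index: value -> value id
        self.value_vector = []  # dictionary-encoded row values

    def encode(self, value):
        # Step 1: look up the value; append a new entry only if it is missing,
        # so the dictionary is never restructured.
        value_id = self.index.get(value)
        if value_id is None:
            value_id = len(self.dictionary)
            self.dictionary.append(value)
            self.index[value] = value_id
        return value_id

    def append_rows(self, column_values):
        # Step 2: append the encodings to the append-only value vector.
        self.value_vector.extend(self.encode(v) for v in column_values)

column = L2Column()
column.append_rows(["Rome", "Paris", "Rome"])  # values pivoted from L1-delta
print(column.dictionary, column.value_vector)  # ['Rome', 'Paris'] [0, 1, 0]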

L2-delta-to-main Merge: FIG. 6 illustrates this merge operation. A new main structure is created out of the L2-delta and the existing main. While the L1-to-L2-delta merge is minimally invasive with respect to running transactions, an L2-delta-to-main merge is a resource-intensive task that has to be carefully scheduled and highly optimized on a physical level. As soon as an L2-delta-to-main merge is started, the current L2-delta is closed for updates and a new empty L2-delta structure is created, serving as the new target for the L1-to-L2-delta merge. If a merge fails, the system still operates with the new L2-delta and retries the merge with the previous versions of the L2-delta and main. The core algorithms can be further refined with optimization techniques such as column-wise or partial merge.

In preferred implementations, the dictionary consists of fixed strings. The dictionary can be implemented depending on the data stored: either the dictionary with special compression is chosen (when the data allows it), or a "standard" dictionary is used. The special-compression dictionary also includes optimized search, implemented as a binary search on the bit-coded data, and uses special central processing unit (CPU) instructions to perform the compression efficiently.

The dictionary stores a sorted list of strings, e.g. AAA, ABA, ABB, ABC. This special dictionary is used only when all the strings have the same length (here, length 3). In this case, for each column position a "mini dictionary" is built containing the distinct values at that position:


Position 0: [A]


Position 1: [A, B]


Position 2: [A, B, C]

The strings are then coded using these dictionaries. While exactly 1 byte was needed for each character of an original string, this changes with the new "mini" dictionaries: the first character of each string can be omitted, since it is the same for all strings and needs to be stored only once, in the mini dictionary; the second character in each string (position 1) has only two distinct values and can therefore be coded with 1 bit; and the third character can be stored with 2 bits. This can lead to substantial savings.
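
A minimal sketch of building these mini dictionaries and deriving the per-position bit widths follows, assuming the four example terms above; the function names are chosen here for illustration.

import math

terms = ["AAA", "ABA", "ABB", "ABC"]

def mini_dictionaries(terms):
    # One sorted dictionary of distinct characters per string position.
    return [sorted(set(t[i] for t in terms)) for i in range(len(terms[0]))]

def bits_per_position(dicts):
    # ceil(log2(#values)) bits per character; 0 bits for constant positions.
    return [math.ceil(math.log2(len(d))) for d in dicts]

dicts = mini_dictionaries(terms)
print(dicts)                     # [['A'], ['A', 'B'], ['A', 'B', 'C']]
print(bits_per_position(dicts))  # [0, 1, 2]: 3 bits per term instead of 24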

As noted herein, FIG. 5 illustrates a delta merge operation.

In implementations of the present subject matter, values are not stored directly. Rather, for each column, a dictionary is created with all distinct values, and for each row, a reference to the corresponding value in the dictionary is stored. FIG. 6 illustrates the data structure from FIG. 1, but in a column store with dictionaries. The storage estimate for this dictionary-encoded representation is: 2×10 bytes for the city dictionary; 2×4 bytes for the revenue dictionary; 3×1 bits for the city column referencing the city dictionary (since only the values 0 and 1 need to be stored, one bit suffices); and 3×1 bits for the revenue column. This yields a storage estimate of about 30 bytes, compared to 42 bytes for the traditional row-based storage. The savings increase with the number of rows.
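
For illustration, the following sketch dictionary-encodes a single column and derives the per-row reference width; the example data is hypothetical and not taken from FIG. 1.

import math

cities = ["Rome", "Rome", "Paris"]  # hypothetical 3-row column
revenues = [100, 200, 100]

def encode_column(values):
    # Return (dictionary of distinct values, per-row references).
    dictionary = sorted(set(values))
    position = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [position[v] for v in values]

def reference_bits(dictionary):
    # Bits per row reference: enough to address every dictionary entry.
    return max(1, math.ceil(math.log2(len(dictionary))))

city_dict, city_refs = encode_column(cities)
print(city_dict, city_refs, reference_bits(city_dict))
# ['Paris', 'Rome'] [1, 1, 0] 1  -> 2x10 bytes of dictionary + 3x1 bits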

However, data in the dictionary itself is still not compressed. This is not a problem for numbers, but some dictionaries, especially string dictionaries, can get quite large. Therefore, there are two special dictionary implementations for strings in HANA:

    • 1) prefix-coded dictionary, in which common prefixes are removed. Example:

3 terms: [Wall, Wallclock, Walldorf]
Dictionary: [Wall, (4, clock), (4, dorf)]
The first entry in the dictionary is stored uncompressed (Wall). Each subsequent entry references the preceding entry: "4" means take the first 4 characters from the preceding entry (Wall), and "clock" is then appended, yielding "Wallclock". (A code sketch of this scheme follows the list.)
    • 2) fixed string dictionary, where each character is encoded with a minimum number of bits. For example, GUIDs or TIMESTAMP-like strings need only a small subset of ASCII (0-9, A-F). It would be wasteful to use 8 bits per character when 2-4 bits are often enough, and in some cases almost 0 bits. This is described in further detail below.
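
A minimal sketch of the prefix coding from example 1) above; the (shared-prefix-length, suffix) entry layout follows the example, while the function names are chosen here for illustration.

def prefix_encode(sorted_terms):
    # Store each term as (shared prefix length with predecessor, suffix).
    encoded, prev = [], ""
    for term in sorted_terms:
        shared = 0
        while shared < min(len(prev), len(term)) and prev[shared] == term[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        prev = term
    return encoded

def prefix_decode(encoded):
    terms, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        terms.append(prev)
    return terms

enc = prefix_encode(["Wall", "Wallclock", "Walldorf"])
print(enc)                 # [(0, 'Wall'), (4, 'clock'), (4, 'dorf')]
print(prefix_decode(enc))  # ['Wall', 'Wallclock', 'Walldorf']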

FIG. 7 illustrates a fixed string dictionary, which has several special properties. Characters at position 1 are constant (0), and the prefix dictionary can compress those. Characters at the last two positions are also constant (0); the prefix dictionary cannot compress those, but the fixed string dictionary can. Lastly, only a limited set of characters, e.g. [0, 1, 5, 6, A], can appear at the second position. The prefix dictionary needs 1 byte to encode each such character, whereas the fixed string dictionary requires only 3 bits.

The terms in the fixed string dictionary are compressed in blocks, e.g. 512 terms per block. For each block, there is a reference string (the first string of the block). Characters that are constant among all the terms of the block are marked and not stored for terms 2 . . . 512 of the block (positions 1, 4, and 5 in the example shown in FIG. 7). For all remaining character positions (2 and 3), a separate dictionary is built. The non-constant characters of the remaining 511 terms in the block are coded according to these dictionaries. As an example:

Reference string: 00000
Bitvector denoting constant characters: [10011] (these characters need not be encoded for terms 2 . . . 5)
Dictionary for character position 2: [0, 1, 5, 6, A]; 5 possible values, so 3 bits are required to encode one character
Dictionary for character position 3: [0, A]; 2 possible values, so 1 bit
Encoding the non-constant characters of term 2, "01000": character "1" is at position 1 in its dictionary (3 bits, bitstring 001); character "0" is at position 0 (1 bit, bitstring 0); together: 0010
Encoding term 3: character "5" is at position 2 (bitstring 010); character "A" is at position 1 (bitstring 1); together: 0101
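
The walk-through above can be reproduced with the following sketch. The block is assumed to contain five terms: the reference string, term 2, and term 3 come from the example, while terms 4 and 5 ("06A00", "0AA00") are hypothetical values chosen so that the position-2 dictionary is [0, 1, 5, 6, A].

import math

block = ["00000", "01000", "05A00", "06A00", "0AA00"]

def encode_block(terms):
    reference = terms[0]
    width = len(reference)
    # Mark positions whose character is constant across the whole block.
    constant = [all(t[i] == reference[i] for t in terms) for i in range(width)]
    # Build a dictionary only for the non-constant positions.
    dicts = {i: sorted(set(t[i] for t in terms))
             for i in range(width) if not constant[i]}
    bits = {i: math.ceil(math.log2(len(d))) for i, d in dicts.items()}
    # Code the non-constant characters of terms 2..N against the dictionaries.
    coded = ["".join(format(dicts[i].index(t[i]), "0%db" % bits[i])
                     for i in sorted(dicts)) for t in terms[1:]]
    return reference, constant, dicts, coded

reference, constant, dicts, coded = encode_block(block)
print(constant)  # [True, False, False, True, True], i.e. bitvector 10011
print(dicts)     # {1: ['0', '1', '5', '6', 'A'], 2: ['0', 'A']}
print(coded)     # ['0010', '0101', '0111', '1001'] for terms 2 . . . 5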

FIG. 8 illustrates the encoding of term 3 for the example shown in FIG. 7. Stored uncompressed, the five 5-character terms of this example would require 25 bytes.

The use of a prefix-coded dictionary can save at most 4 bytes here, whereas the fixed string dictionary implementation uses only 17 bytes. Further, in typical use cases such as GUIDs, dates, and material numbers, the memory saved by using a fixed string dictionary can be twice the amount saved by a prefix-coded dictionary.

FIG. 9 illustrates another implementation of a fixed string dictionary. The fixed string dictionary includes a reference term (the first term in the block) and a bitvector denoting the character positions that are identical in every term; the constant characters themselves can be retrieved from the reference term. A reference to a dictionary for all non-constant character positions is also included. The fixed string dictionary further includes a bit array with the bitstrings of all coded terms; only non-constant characters are coded.

The fixed string dictionary is well suited for terms of constant length that use only a subset of ASCII. Otherwise, the storage and decompression overhead is too high, no memory savings result, and the prefix-coded dictionary should be used instead. Therefore, a central wrapper, called "StringDict", exists for string-like attribute types and dynamically chooses the best dictionary depending on the data. When a dictionary is constructed, the StringDict receives all terms, analyzes them, and builds either a prefix-coded or a fixed string dictionary. If all terms have the same length, consist only of ASCII characters, and are either all lower case or all upper case, the fixed string dictionary is chosen; otherwise, a prefix-coded dictionary is used. Advantages of the fixed string dictionary are that search performance is constant and that the memory savings are independent of the data type used by the database, depending only on the data in the table.
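
The StringDict's decision rule, as described above, can be sketched as follows; the predicate details (e.g., the ASCII and case tests) are assumptions about how such a check might be written, not the disclosed implementation.

def choose_dictionary(terms):
    # Pick the fixed string dictionary only when its preconditions hold.
    same_length = len(set(len(t) for t in terms)) == 1
    all_ascii = all(t.isascii() for t in terms)
    one_case = all(t == t.lower() for t in terms) or \
               all(t == t.upper() for t in terms)
    if terms and same_length and all_ascii and one_case:
        return "fixed string dictionary"
    return "prefix-coded dictionary"

print(choose_dictionary(["00AF", "12B3", "77C0"]))  # fixed string dictionary
print(choose_dictionary(["Wall", "Wallclock"]))     # prefix-coded dictionary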

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive trackpads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

Claims

1. A method comprising:

storing, by one or more processors, data in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows;
generating, by the one or more processors, a dictionary for each column in the memory structure, the dictionary having distinct values for each column;
generating, by the one or more processors, a reference to the dictionary for each column in the memory structure; and
storing the dictionary and the reference to the dictionary in the memory structure.

2. The method in accordance with claim 1, further comprising:

defining, by the one or more processors, a block of memory for each dictionary; and
compressing, by the one or more processors, the block of memory for each dictionary.

3. The method in accordance with claim 2, wherein the block comprises:

a reference term;
a bitvector representing a character position in each dictionary;
a reference to a dictionary for all non-constant character positions; and
a bit array having bitstrings of all coded terms associated with non-constant characters in a group of dictionaries.

4. The method in accordance with claim 3, further comprising building, by the one or more processors, a dictionary for all positions in the block other than the reference term.

5. A computer program product comprising a machine-readable medium storing instructions that, when executed by at least one programmable processor, cause the at least one programmable processor to perform operations comprising:

storing data in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows;
generating a dictionary for each column in the memory structure, the dictionary having distinct values for each column;
generating a reference to the dictionary for each column in the memory structure; and
storing the dictionary and the reference to the dictionary in the memory structure.

6. The computer program product in accordance with claim 5, wherein the instructions further cause the at least one programmable processor to perform operations comprising:

defining a block of memory for each dictionary; and
compressing the block of memory for each dictionary.

7. The computer program product in accordance with claim 6, wherein the block comprises:

a reference term;
a bitvector representing a character position in each dictionary;
a reference to a dictionary for all non-constant character positions; and
a bit array having bitstrings of all coded terms associated with non-constant characters in a group of dictionaries.

8. The computer program product in accordance with claim 7, wherein the instructions further cause the at least one programmable processor to perform operations comprising building a dictionary for all positions in the block other than the reference term.

9. A system comprising:

at least one programmable processor; and
a machine-readable medium storing instructions that, when executed by the at least one programmable processor, cause the at least one programmable processor to perform operations comprising:
store data in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows;
generate a dictionary for each column in the memory structure, the dictionary having distinct values for each column;
generate a reference to the dictionary for each column in the memory structure; and
store the dictionary and the reference to the dictionary in the memory structure.

10. The system in accordance with claim 9, wherein the instructions further cause the at least one programmable processor to:

define a block of memory for each dictionary; and
compress the block of memory for each dictionary.

11. The system in accordance with claim 10, wherein the block comprises:

a reference term;
a bitvector representing a character position in each dictionary;
a reference to a dictionary for all non-constant character positions; and
a bit array having bitstrings of all coded terms associated with non-constant characters in a group of dictionaries.

12. The system in accordance with claim 11, wherein the instructions further cause the at least one programmable processor to build a dictionary for all positions in the block other than the reference term.

Patent History
Publication number: 20140222418
Type: Application
Filed: Apr 30, 2013
Publication Date: Aug 7, 2014
Patent Grant number: 11010415
Inventors: Martin Richtarsky (Bad Schoenborn), Franz Faerber (Walldorf), Juchang Lee (Seoul), Ivan Schreter (Malsch)
Application Number: 13/874,327
Classifications
Current U.S. Class: Dictionary Building, Modification, Or Prioritization (704/10)
International Classification: G06F 17/27 (20060101);