B-Tree Based Data Model for File Systems
Methods and systems for organizing data are provided. An example method includes providing an object store to store objects. Each of the objects represents a fragment of the data and is associated with an address. The method further includes associating a B-tree with the object store. The B-tree includes nodes, wherein each of the nodes includes keys, and wherein each of the keys is associated with at least one object from the object store. Values for each of the keys are generated based at least partially on objects from the object store. If the size of an object from the object store is less than a pre-determined size, a value of the object is stored in a particular node of the B-tree, the particular node including a particular key associated with the object. Otherwise, the address associated with the object is stored in the particular node of the B-tree.
The present application claims benefit of U.S. provisional application No. 62/210,385 filed on Aug. 26, 2015. The disclosure of the aforementioned application is incorporated herein by reference for all purposes.
TECHNICAL FIELD
This disclosure relates generally to data processing and, more specifically, to methods and systems for providing a B-tree based data model for organizing file systems.
BACKGROUND
The approaches described in this section could be pursued but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
In computer systems, user data are typically organized as file systems. In general, a file system may be viewed as a directed graph of objects, wherein nodes and leaves of the directed graph represent files and directories. Directories may further include subdirectories and files. In a multi-user computer system, each file or directory is assigned attributes that regulate user permissions for viewing, editing, and creating files and directories. Attributes of directories and files are kept in the directed graph as objects. Large files in a file system may be split into a chain of blocks. Therefore, over the lifetime of a file system, the directed graph may develop very long paths, or chains of referring objects, from the root of the directed graph to the leaves. Any modification to the file system, such as modifying an existing file or directory or creating a new one, requires traversing the path through the nodes to find the place where a node is to be added or modified. Long and unbalanced paths from the root to the leaves may require excessive input/output operations, which may be time consuming and may lead to redundant storage consumption.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The technology described herein includes methods for organizing data. An example embodiment can provide a B-tree based data model constructed on top of an object store. The object store may include immutable, content-addressable, distributed objects representing fragments of user content.
According to the example embodiment, a method includes providing an object store to store objects. The objects can represent fragments of the data. Each object can be associated with an address. The method can further include associating a B-tree with the object store. The B-tree can include nodes, with each of the nodes including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each key.
In some embodiments, the address of an object in the object store is based on content of the fragment of the data. In some embodiments, the method determines whether a size of an object from the object store is less than a pre-determined size. If the result of the determination is positive, a value of the object is stored in a particular node of the B-tree. The particular node includes a particular key associated with the object. If the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
In some embodiments, each object includes one of a metadata object or a data object. Metadata objects store at least a number of references to further objects from the data objects and an identification number associated with a file or a directory. The data object represents a continuous fragment of the file or a directory entry.
In some embodiments, the key associated with the metadata object includes at least three fields. The first field includes an indication of the metadata object. The second field includes the identification number. The third field includes a metadata index representing a distinct type of the metadata object.
In some embodiments, the type of the metadata object includes one of: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory. In some embodiments, the key associated with the fragment of the file includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes an offset of the continuous fragment of the file from the beginning of the file.
In some embodiments, the key associated with the directory entry includes at least three fields. The first field includes an indication of the data object. The second field includes the identification number. The third field includes a hash calculated based on a literal name of the directory entry.
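The three-field key layouts described above can be illustrated with a minimal Python sketch. The numeric type codes KEY_META and KEY_DATA and the 17-byte big-endian packing are hypothetical; the disclosure states only that keys carry a type indication, an identification number, and a third field (metadata index, file offset, or name hash), and that keys may have a fixed size. Fixed-width big-endian packing makes the byte-wise order of packed keys match the order of the field tuples, so the chunks of one file sort by offset.

```python
import struct
from typing import NamedTuple

# Hypothetical type codes: the text says only that the first key field
# indicates a metadata object or a data object.
KEY_META = 1
KEY_DATA = 2

# Hypothetical fixed-size layout: 1-byte type, 8-byte inum, 8-byte third field.
# Big-endian, unsigned, no padding, so byte order matches numeric order.
KEY_FORMAT = ">BQQ"
KEY_SIZE = struct.calcsize(KEY_FORMAT)  # 17 bytes


class BtreeKey(NamedTuple):
    kind: int   # KEY_META or KEY_DATA
    inum: int   # identification number of the file or directory
    third: int  # metadata index, file offset, or directory-entry name hash

    def pack(self) -> bytes:
        return struct.pack(KEY_FORMAT, self.kind, self.inum, self.third)


def metadata_key(inum: int, metadata_index: int) -> BtreeKey:
    return BtreeKey(KEY_META, inum, metadata_index)


def file_chunk_key(inum: int, offset: int) -> BtreeKey:
    return BtreeKey(KEY_DATA, inum, offset)


def dirent_key(inum: int, name_hash: int) -> BtreeKey:
    return BtreeKey(KEY_DATA, inum, name_hash)
```

Under this layout, all metadata keys sort before all data keys, and within each group the keys of one inode are contiguous.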
In some embodiments, calculating the hash includes applying a hash function to the literal name to obtain a preliminary hash. The hash function can include, for example, crc32c, SipHash, or SipHash-order3. The preliminary hash can be shifted by a pre-determined base number to obtain the hash. If the hash matches an existing hash, then the hash is incremented by 1.
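The hash computation summarized above can be sketched as follows. Python's standard library provides none of crc32c, SipHash, or SipHash-order3, so zlib.crc32 (plain CRC-32, not the Castagnoli variant) stands in for the hash function. NAME_HASH_BASE is a hypothetical value, and "shifting" is interpreted here as adding the base as an offset, which the text does not specify.

```python
import zlib
from typing import Set

# Hypothetical base number added to every preliminary hash.
NAME_HASH_BASE = 1 << 32


def preliminary_hash(name: str) -> int:
    # Stand-in for crc32c/SipHash: plain CRC-32 of the UTF-8 encoded name.
    return zlib.crc32(name.encode("utf-8"))


def entry_hash(name: str, used: Set[int]) -> int:
    """Shift the preliminary hash by the base, then step by 1 past existing hashes."""
    h = NAME_HASH_BASE + preliminary_hash(name)
    while h in used:
        h += 1
    return h
```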
According to another example embodiment of the present disclosure, the steps of the method for organizing data are stored on a machine-readable medium comprising instructions, which, when implemented by one or more processors, perform the recited steps.
Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings illustrate exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
The technology described herein allows organizing data using a B-tree built on top of an object store. In some embodiments, the object store includes immutable content-addressable distributed objects. The objects may represent fragments of data (for example, fragments of files and directories and metadata including attributes of files and directories).
According to an example embodiment, the method for organizing data includes providing an object store to store objects. The objects represent fragments of the data. Each object is associated with an address. The method can further include associating a B-tree with the object store. The B-tree includes nodes, with each node including keys. Each key can be associated with at least one object from the object store. The method allows generating, based at least partially on objects from the object store, values for each of the keys.
In various embodiments, the object store 130 includes key-value entries (objects). Each of the key-value entries includes an identifier (a key) and a payload (also referred to as a value or an object content) representing a chunk of the data (for example, a chunk of user content). In various embodiments, objects are designated as either “data” or “metadata.” Payloads of “data” objects include only uninterpreted bytes. Payloads of “metadata” objects have an internal structure and may refer to other objects. In some embodiments, the payloads of “metadata” objects include keys of the objects to which the “metadata” objects refer.
In some embodiments, objects in the object store 130 are organized by a graph structure 120 (also referred to as a directed graph or a collection). The graph structure 120 is a directed graph in which each node is an immutable content-addressable object. The collection includes a specific object designated as a root object.
In various embodiments, objects are at least partially content-addressable. This implies that some portions of the identifiers of objects are functions of object contents. A key for the object representing a chunk of data can be calculated using a smart hash (Smash) which is a function of bytes in the chunk. In some embodiments, each version of data (a snapshot) corresponds to a new graph structure. Two graph structures representing two different snapshots can share mutual objects.
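Content addressing, and the sharing of objects between snapshots, can be illustrated with a toy in-memory object store. SHA-256 from hashlib is used only as a stand-in for Smash, whose definition is not given here; the class and method names are hypothetical.

```python
import hashlib
from typing import Dict, List


class ObjectStore:
    """Toy content-addressable store: an object's address is a hash of its bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @staticmethod
    def address_of(payload: bytes) -> str:
        # SHA-256 stands in for Smash.
        return hashlib.sha256(payload).hexdigest()

    def put(self, payload: bytes) -> str:
        addr = self.address_of(payload)
        # Identical content maps to the same address, so it is stored once.
        self._objects.setdefault(addr, payload)
        return addr

    def get(self, addr: str) -> bytes:
        return self._objects[addr]

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
snapshot_1: List[str] = [store.put(b"chunk A"), store.put(b"chunk B")]
snapshot_2: List[str] = [store.put(b"chunk A"), store.put(b"chunk C")]
```

Both snapshots reference the same address for "chunk A", so the store holds three objects rather than four.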
In some embodiments, the objects in object store 130 include entities such as an identifier node (“inode”), extended attributes (“xattr”), a symbolic link (“symlink”), and chunks of files and directories. In a collection (for example, collection 100 or 200), each file or directory is represented by a chain of at least three objects: an “inode” object, an “xattr” object, and a “data” object. The “inode” object refers to the “xattr” object and the “data” object. The “inode” object includes an identification number (“inum”) associated with the file or directory, the number of links, the file type and permissions, the creation and modification times of the file or directory, and other attributes. The “xattr” object includes extended file attributes that depend on the type of file system, such as information concerning the author of a text, a checksum, an encoding, and so forth. An extended attribute may include a name of the attribute and a value associated with the attribute's name. The “data” object is an object containing a chunk of data (for example, a chunk of a file). In some embodiments, the address of an object in object store 130 is a Smash, which is a hash function of the content of the object.
In some embodiments, the graph structure 120 includes a B-tree (also referred to as a B-tree model). In various embodiments, using a B-tree model for referencing metadata objects and data objects may allow accessing files by “inode” number in order to support persistent file handles. The B-tree may provide tools for efficiently managing large directories having up to 1,000,000 entries, hard links, and large numbers of small files. The B-tree model also allows flexible data layouts for non-streaming workloads.
As a result, B-tree collections tend to have fewer, larger objects, with many smaller file system entities such as “inodes,” “symlinks,” “xattrs,” and small amounts of data packed together. The downside is increased conceptual complexity: the resulting structure is unlike the user-visible file system namespace and, in some cases, increases input/output (I/O) resource consumption due to the additional read-modify-write operations needed to repack updates within an otherwise unchanged object.
All leaves in a B-tree are at the same distance from the root, and the nodes have wide fanouts. The leaf-to-root path length is therefore bounded to a small number, typically no more than 4 to 5 levels, even for large file systems. This property results in efficient lookup access patterns and minimal I/O amplification when writing.
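The bound on path length follows from the fanout: with fanout f, h levels can address up to f^h entries. The helper below computes the smallest such h. The fanout of 256 used in the example is a hypothetical figure, and real B-trees whose nodes are only partly full have a lower effective fanout and may need an extra level.

```python
def btree_levels(num_entries: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout ** h >= num_entries."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout
        levels += 1
    return levels
```

For example, btree_levels(10**9, 256) evaluates to 4, because 256^3 is about 16.8 million and 256^4 is about 4.3 billion.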
In some embodiments, a “libebtree” library is used for the B-tree implementation. In some embodiments, keys in B-tree nodes have a fixed size. In some embodiments, values in B-tree nodes have variable sizes. In some embodiments, much of the overall structure of the file system is encoded in the design of the key space. In various embodiments, the values associated with keys are used to store small items efficiently packed into larger storage objects, while larger objects are stored directly in object store 130.
Packing small items such as “inode” attributes, extended attributes, “symlinks,” and small file content into B-tree nodes can significantly reduce the number of objects required. For example, a small file with an “xattr” can be represented by 3 objects (“inode,” “xattr,” and “data”) in a collection described in
In some embodiments, data storage is not limited by a fixed block size. Block sizes can be chosen to match the workload. Non-sequential I/O patterns can be matched with small blocks to avoid excess I/O amplification, whereas large streaming writes are better served by large blocks. The B-tree model design supports file block sizes from 1 kilobyte to 1 megabyte.
An example B-tree 300 is shown in
In some embodiments, each distinct type of metadata has its own metadata index. A list of metadata indexes is shown in table 610 in
In some embodiments, the root directory has a constant “Inode” number of 1000. The root directory has no parent, so its “..” entry points back to the root directory itself. All new directories, including the root directory, start with the attribute “nlink” equal to 2.
Referring back to table 610, in some embodiments, “xattrs” are indexed by name using a “namehash” scheme which is further described in
In some embodiments, the value of a file chunk can be stored either inline in a B-tree or externally in object store 130. Internal form 820 is used when the chunk is stored directly in the B-tree. Internal form 820 includes a “BTC_DATA_INLINE” field 822 set to 1 and a second field 824 containing the literal byte data of the chunk. The file chunk is stored inline in a node of the B-tree if the size of the chunk is less than “maxinlinesize” (described in
External form 830 is used when the file chunk is stored externally in object store 130. External form 830 includes a “BTC_DATA_EXTERN” field 832 set to 2, a “size” field 834 representing the size of the file chunk, and the Smash of the object in object store 130 that corresponds to the file chunk. The external form is stored in a node of the B-tree if the size of the chunk is larger than “maxinlinesize” (described in
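The choice between internal form 820 and external form 830 can be sketched as below. The tag values 1 and 2 come from the text; MAXINLINESIZE = 4096 reflects the typical value of about 4 KB given for maxinlinesize and is an assumption, SHA-256 stands in for Smash, and a plain dict stands in for object store 130. The text does not say what happens when a chunk is exactly maxinlinesize bytes; this sketch stores such a chunk externally.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Union

BTC_DATA_INLINE = 1   # tag value from the text (field 822)
BTC_DATA_EXTERN = 2   # tag value from the text (field 832)
MAXINLINESIZE = 4096  # assumption: "typically about 4k"


@dataclass(frozen=True)
class InlineChunk:  # internal form: tag plus literal bytes
    data: bytes
    tag: int = BTC_DATA_INLINE


@dataclass(frozen=True)
class ExternChunk:  # external form: tag, chunk size, and object address
    size: int
    smash: str
    tag: int = BTC_DATA_EXTERN


ChunkValue = Union[InlineChunk, ExternChunk]


def encode_chunk(chunk: bytes, store: Dict[str, bytes]) -> ChunkValue:
    """Build the B-tree value for a file chunk, spilling large chunks to the store."""
    if len(chunk) < MAXINLINESIZE:
        return InlineChunk(chunk)
    addr = hashlib.sha256(chunk).hexdigest()  # stand-in for Smash
    store.setdefault(addr, chunk)
    return ExternChunk(len(chunk), addr)


def decode_chunk(value: ChunkValue, store: Dict[str, bytes]) -> bytes:
    if isinstance(value, InlineChunk):
        return value.data
    return store[value.smash]
```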
In some embodiments, dividing files into chunks (also referred to herein as chunking) is performed in accordance with one or more policies. When designing a chunking policy, the following facts may be considered:
- Large chunks may amortize per-chunk or per-object overhead at the cost of increased I/O amplification when the whole object is not needed.
- The object store may perform deduplication of objects based on object ID. Boundaries of an object determine the object ID. Therefore, if two identical byte sequences are chunked differently, then these two byte sequences may not deduplicate against each other.
- Compression of file data may cause a wide variance between the logical file content and the resulting objects if the file data is highly compressible. For example, for common patterns such as “all zeros,” the difference between the logical file content and the resulting objects can be a hundredfold.
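The effect of chunk boundaries on deduplication can be demonstrated in a few lines. SHA-256 stands in for the object ID function, and the pseudo-random test data is generated deterministically; the function name is hypothetical.

```python
import hashlib
from typing import List, Sequence


def chunk_ids(data: bytes, boundaries: Sequence[int]) -> List[str]:
    """Object IDs for `data` split at the given increasing offsets."""
    ids, start = [], 0
    for end in list(boundaries) + [len(data)]:
        ids.append(hashlib.sha256(data[start:end]).hexdigest())
        start = end
    return ids


# 4 KiB of deterministic pseudo-random bytes (128 SHA-256 digests).
data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))

aligned = chunk_ids(data, [1024, 2048, 3072])
shifted = chunk_ids(data, [512, 1536, 2560, 3584])
```

Re-chunking the data at the same boundaries reproduces `aligned` exactly, whereas `aligned` and `shifted` share no IDs, so the two copies of identical bytes would not deduplicate against each other.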
In some embodiments, a “block-aligned chunking” policy is applied. The block-aligned policy causes a chunk structure of a file to reflect write patterns to the file. If the file is written with streaming sequential writes then the chunks end up being as large as possible, which minimizes the per-object overhead and metadata. If the file is written non-sequentially, the chunks reflect the IO sizes of the writes to minimize read-modify-write overhead of updating a portion of a chunk's object.
The following two parameters are relevant to the “block-aligned chunking” policy:
- a maxdataobjsize parameter defined per collection, which constrains the maximum size of an object used to store data; and
- a block size parameter defined per file, which controls how chunks are split and aggregated. The block size is a power of 2.
In some embodiments, when data is written to a file, the data is accumulated in memory in anticipation of a subsequent write. Adjacent writes are merged into an object of up to maxdataobjsize bytes.
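Merging buffered adjacent writes up to maxdataobjsize might look like the sketch below. The function name and the (offset, bytes) representation are hypothetical; the sketch assumes the buffered writes do not overlap and does not apply the optional constraint of splitting at maxdataobjsize boundaries in the file.

```python
from typing import List, Tuple


def coalesce(writes: List[Tuple[int, bytes]], maxdataobjsize: int) -> List[Tuple[int, bytes]]:
    """Merge exactly adjacent (offset, data) writes into objects of at most maxdataobjsize bytes."""
    objects: List[Tuple[int, bytearray]] = []
    for offset, data in sorted(writes):
        pos = 0
        while pos < len(data):
            last = objects[-1] if objects else None
            # Extend the previous object if this write continues it and it has room.
            if (last is not None
                    and last[0] + len(last[1]) == offset + pos
                    and len(last[1]) < maxdataobjsize):
                buf = last[1]
            else:
                buf = bytearray()
                objects.append((offset + pos, buf))
            take = min(maxdataobjsize - len(buf), len(data) - pos)
            buf += data[pos:pos + take]
            pos += take
    return [(off, bytes(buf)) for off, buf in objects]
```

With maxdataobjsize = 4, writes of 3 bytes at offsets 0, 3, and 6 become objects at offsets 0, 4, and 8, while a separate write at offset 20 starts its own object.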
In some embodiments, for non-sequential writes, any existing data is removed and replaced with the new data. The block size logically subdivides the file into a sequence of block-sized, block-aligned segments. A chunk can span multiple blocks, but no more than one chunk can be located within a block. Therefore, if a write is not block-aligned, any existing chunk is split at the block boundaries, and a new chunk containing a combination of the old and new data is inserted. If the overwrite is already block-aligned, the new data is simply written. If the overwrite aligns with an existing chunk, the chunk is simply replaced without affecting the surrounding chunks.
In some embodiments, if it is determined that a write results in more than one chunk within a single block region, the chunks are merged to maintain the invariant of no more than one chunk per block.
In some embodiments, an additional constraint is introduced that forces chunks to always be split at “maxdataobjsize” boundaries in the file. This constraint may allow two substantially similar files to deduplicate against each other, because chunking resynchronizes at “maxdataobjsize” boundaries.
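The offset arithmetic behind block-aligned overwrites, together with the optional split at maxdataobjsize boundaries, can be sketched as two small helpers. The names are hypothetical, and the sketch computes only the ranges involved; it does not merge old and new bytes or update the B-tree.

```python
from typing import List, Tuple


def aligned_span(offset: int, length: int, block_size: int) -> Tuple[int, int]:
    """Smallest block-aligned [start, end) range covering a write."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of 2")
    start = offset - offset % block_size
    end = -(-(offset + length) // block_size) * block_size  # round up
    return start, end


def split_at_object_boundaries(start: int, end: int, maxdataobjsize: int) -> List[Tuple[int, int]]:
    """Cut [start, end) so that no chunk crosses a multiple of maxdataobjsize."""
    pieces, pos = [], start
    while pos < end:
        nxt = min(end, (pos // maxdataobjsize + 1) * maxdataobjsize)
        pieces.append((pos, nxt))
        pos = nxt
    return pieces
```

Under these assumptions, an unaligned 100-byte write at offset 5000 in a file with 4 KiB blocks would rewrite the single chunk covering [4096, 8192), combining old bytes outside the write with the new bytes inside it.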
In some embodiments, a “fingerprint chunking” policy can be applied. The “fingerprint chunking” policy is intended to maximize opportunities for deduplication by making the chunk structure a function of the file content rather than of the write patterns. When “fingerprint chunking” is applied, the same byte sequence may result in the same chunk structure and, therefore, the same object IDs. In some embodiments, the Rabin fingerprint algorithm can be used to select content-dependent chunk division points.
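Content-defined chunk boundaries can be sketched with a rolling hash over a sliding window: a boundary is placed wherever the low bits of the window hash are zero. For brevity, this sketch uses a polynomial (Rabin-Karp style) rolling hash modulo a Mersenne prime rather than a true Rabin fingerprint over GF(2) polynomials, and it omits the minimum and maximum chunk sizes a practical implementation would enforce; the window size and mask are hypothetical.

```python
import hashlib
from typing import List

WINDOW = 48   # bytes in the rolling window (hypothetical)
MASK = 0x3FF  # boundary when the low 10 bits are zero: about 1 KiB average chunks
BASE = 257
MOD = (1 << 61) - 1


def content_defined_chunks(data: bytes) -> List[bytes]:
    """Split data at positions chosen by a rolling hash of the preceding WINDOW bytes."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h, start, chunks = 0, 0, []
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # append the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


data = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))
original = content_defined_chunks(data)
edited = content_defined_chunks(b"X" + data)  # one byte inserted at the front
```

Because each boundary depends only on the bytes in the window, inserting a byte at the front affects only the first chunk of `original`; every later chunk reappears unchanged in `edited` and would map to the same object IDs.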
The “fingerprint chunking” policy may work best for streaming sequential writes. Non-sequential overwrites are especially expensive, as the new writes need to be merged with existing data, and the new data re-chunked according to the fingerprinting.
In some embodiments, compression can be applied while chunking the file data. In some embodiments, the user data (in runs as large as possible) are fed into a compression algorithm at write time in order to produce output of a specific size. Compression can reduce not only the number of bytes but also the number of objects. Like fingerprinting, compression is a content-dependent transformation and is easiest to apply to streaming writes. Using compression for non-sequential writes, and especially for overwrites, is expensive because the read-modify-write cycle also requires decompression and recompression.
In some embodiments, an “inlining” policy is applied. Inlining allows small data to be embedded directly within the B-tree rather than requiring small external objects in the object store. Inlining precludes deduplication of small files, but the cost is small because deduplicating small files yields only minimal space savings.
A B-tree collection has a global “maxinlinesize” setting. The maxinlinesize setting is defined when the B-tree collection is created and sets the upper bound on the largest chunk that may be inlined. Typically, maxinlinesize is about 4 KB. As mentioned above, each file is also associated with an “inlinesize” that defines the chunk size that is inlined. Inlining is useful for small files or for large sparse files with many small spans. The default “inlinesize” is 1 KB.
Each entry in slots has a form 930 as shown in
In some embodiments, when a new directory or extended attribute entry is inserted in the B-tree, the name of the entry is hashed with a hash function to find a corresponding key, which is a slot among the reserved slots. If the located slot is occupied by another name due to a hash collision, the key is incremented until an available slot is found, which can be either a completely unused slot or a slot occupied by a tombstone.
In some embodiments, when an entry is looked up in the B-tree, the name is hashed to determine the corresponding slot. If the determined slot is unused, the entry does not exist. If the determined slot is occupied and the name does not match, the key is incremented until either the name or an unused slot is found. Any tombstones encountered along the way are skipped.
In some embodiments, when an entry is deleted, the lookup algorithm is used to find the slot for the name in the B-tree. If the slot exists and the next slot is unused, the entry is deleted. If the next slot is occupied, it is assumed that a hash collision has occurred, and the entry is replaced by a tombstone (a zero-length name and no payload). If a series of tombstones is followed by an empty entry, the tombstones can be deleted.
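The insert, lookup, and delete rules above amount to linear probing with tombstones. The sketch below models the reserved slots as a dict from integer key to (name, payload). The class name is hypothetical, zlib.crc32 is only a default stand-in hash, and names are assumed to be non-empty so that a zero-length name can mark a tombstone.

```python
import zlib
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[str, Optional[bytes]]
TOMBSTONE: Entry = ("", None)  # zero-length name, no payload


class HashedSlots:
    def __init__(self, hash_fn: Callable[[str], int] = lambda n: zlib.crc32(n.encode("utf-8"))):
        self.hash_fn = hash_fn
        self.slots: Dict[int, Entry] = {}

    def _find(self, name: str) -> Optional[int]:
        key = self.hash_fn(name)
        while key in self.slots:            # an unused slot ends the probe
            if self.slots[key][0] == name:  # tombstones ("") never match
                return key
            key += 1
        return None

    def insert(self, name: str, payload: bytes) -> int:
        if self._find(name) is not None:
            raise FileExistsError(name)
        key = self.hash_fn(name)
        # Skip live entries; stop at an unused slot or a tombstone.
        while key in self.slots and self.slots[key] != TOMBSTONE:
            key += 1
        self.slots[key] = (name, payload)
        return key

    def lookup(self, name: str) -> Optional[bytes]:
        key = self._find(name)
        return None if key is None else self.slots[key][1]

    def delete(self, name: str) -> None:
        key = self._find(name)
        if key is None:
            raise KeyError(name)
        if key + 1 in self.slots:
            self.slots[key] = TOMBSTONE  # later entries may have probed past this slot
            return
        del self.slots[key]
        prev = key - 1  # tombstones now followed by an empty slot can go too
        while self.slots.get(prev) == TOMBSTONE:
            del self.slots[prev]
            prev -= 1
```

Passing a constant hash function forces every name into the same probe chain, which makes the tombstone behavior easy to observe.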
In some embodiments, three types of hash function can be used:
1) crc32c. It is a computationally inexpensive 32-bit hash. crc32c is neither strong nor collision resistant. Resulting names have no apparent order and can be attacked to cause collisions, resulting in a denial-of-service attack. crc32c is used for “xattrs” since “xattrs” are not generally writable from untrusted sources, and the number of “xattrs” is not large.
2) SipHash. It is an efficient 64-bit keyed hash. Each directory has its own randomly generated key which is not exposed outside of the filesystem. Any attacker would need to know the key to be able to cause collisions at a higher rate than a random chance.
3) SipHash-order3. It is a variant of SipHash that truncates the hash to 4 bytes and prepends 3 bytes of the name to the start of the hash. With a “normal” mix of names in a directory, this may result in a directory that is nearly lexically sorted, with entries having a common prefix being mixed randomly. This may help to accommodate applications which tend to operate on directories in an alphabetical order.
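The layout of a SipHash-order3 key, three name bytes followed by a 4-byte truncated keyed hash, can be sketched as below. Python's standard library does not expose SipHash, so keyed BLAKE2b (hashlib.blake2b with an 8-byte digest and a 16-byte per-directory key) is substituted; the big-endian packing and zero padding of short names are assumptions.

```python
import hashlib


def order3_key(name: str, dir_key: bytes) -> int:
    """3-byte name prefix in the high bits, 4-byte truncated keyed hash in the low bits."""
    raw = name.encode("utf-8")
    prefix = raw[:3].ljust(3, b"\0")  # pad names shorter than 3 bytes
    digest = hashlib.blake2b(raw, key=dir_key, digest_size=8).digest()
    return (int.from_bytes(prefix, "big") << 32) | int.from_bytes(digest[:4], "big")
```

Names with different first three bytes sort lexically by key, while names sharing a prefix, such as "report1" and "report2", are ordered by the hash.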
In some embodiments, a case-insensitive variant of a hash function can be used to support case-insensitive names in directories. For example, the hash function may be marked as “CICP” (that is, “case-insensitive, case-preserving”). Using a case-insensitive hash function allows looking up names without regard to the case of their characters while keeping the original capitalization of the names. Applying a case-insensitive hash function makes case-insensitive filesystem operations on large directories efficient. For example, without case-insensitive lookups, the Samba package needs to scan the whole directory to perform case-insensitive matching. Names in such directories are required to be properly formed UTF-8.
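A case-insensitive, case-preserving lookup can be sketched by hashing a case-folded form of the name while storing the name exactly as given. str.casefold() is used as the folding rule, which is an assumption (the text does not specify one, and real implementations may also apply Unicode normalization); zlib.crc32 is a stand-in hash, and collisions are kept in small per-hash lists rather than probed slots, for brevity.

```python
import zlib
from typing import Dict, List, Optional


def cicp_hash(name: str) -> int:
    # Hash the case-folded UTF-8 bytes so "README" and "readme" share a slot.
    return zlib.crc32(name.casefold().encode("utf-8"))


class CICPDirectory:
    def __init__(self) -> None:
        self._slots: Dict[int, List[str]] = {}  # hash -> names as originally spelled

    def add(self, name: str) -> None:
        bucket = self._slots.setdefault(cicp_hash(name), [])
        if any(n.casefold() == name.casefold() for n in bucket):
            raise FileExistsError(name)
        bucket.append(name)  # preserve the original capitalization

    def lookup(self, name: str) -> Optional[str]:
        for n in self._slots.get(cicp_hash(name), []):
            if n.casefold() == name.casefold():
                return n
        return None
```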
In some embodiments, when an Inode loses its last name (that is, nlink=0 or the Inode is “deleted”), the Inode may still be in use. While the file is still open, it functions normally. This means that the bulk of the work related to deleting the file is deferred until the file is closed.
If a crash occurs before the file is closed, the Inode could be left as stray garbage, that is, an Inode with no names. Such an Inode is effectively inaccessible. At the same time, the Inode can still be present within the B-tree, so garbage collection keeps the Inode data alive.
In some embodiments, to solve the issue of an Inode that has lost its last name, an “orphan directory” is provided. The “orphan directory” is a nameless directory with Inum=1. When a file is unlinked and the file's “nlink” count goes to zero, an entry is added to the orphan directory, and the file's nlink remains 0. When the file system is mounted, the entries in the orphan directory are traversed and all associated data is released.
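The orphan-directory protocol can be modeled in memory to show why no data is leaked across a crash. ORPHAN_DIR_INUM = 1 comes from the text; the class and method names are hypothetical, and persistence is reduced to keeping the orphan entries in the same object that survives the simulated crash.

```python
from typing import Dict, Set

ORPHAN_DIR_INUM = 1  # nameless orphan directory, per the text


class ToyFS:
    def __init__(self) -> None:
        self.nlink: Dict[int, int] = {}
        self.data: Dict[int, bytes] = {}
        self.dirs: Dict[int, Set[int]] = {ORPHAN_DIR_INUM: set()}

    def create(self, inum: int, content: bytes) -> None:
        self.nlink[inum] = 1
        self.data[inum] = content

    def unlink(self, inum: int) -> None:
        self.nlink[inum] -= 1
        if self.nlink[inum] == 0:
            # Record the nameless inode; its data stays until close or mount.
            self.dirs[ORPHAN_DIR_INUM].add(inum)

    def close(self, inum: int) -> None:
        if self.nlink.get(inum) == 0:
            self._release(inum)

    def mount(self) -> None:
        # Replay: release every inode still listed in the orphan directory.
        for inum in sorted(self.dirs[ORPHAN_DIR_INUM]):
            self._release(inum)

    def _release(self, inum: int) -> None:
        self.data.pop(inum, None)
        self.nlink.pop(inum, None)
        self.dirs[ORPHAN_DIR_INUM].discard(inum)
```

If close() is never called because of a crash, the next mount() still finds the inode in the orphan directory and releases its data.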
The method 1000 may commence in block 1010 with providing an object store to store objects. The objects represent fragments of the data and can be assigned addresses. In block 1020, method 1000 can proceed with associating a B-tree with the object store. The B-tree includes nodes. Each of the nodes includes keys. Each of the keys is associated with at least one object from the object store. In block 1030, method 1000 generates, based at least partially on objects from the object store, values for each of the keys. In block 1040, method 1000 determines whether a size of an object from the object store is less than a pre-determined size. In block 1050, if the result of the determination is positive, a value of the object is stored in a particular node of the B-tree, wherein the particular node includes a particular key associated with the object. In block 1060, if the result of the determination is negative, the address of the object is stored in the particular node of the B-tree.
The example computer system 1100 includes a processor or multiple processors 1102, a hard disk drive 1104, a main memory 1106, and a static memory 1108, which communicate with each other via a bus 1110. The computer system 1100 may also include a network interface device 1112. The hard disk drive 1104 may include a computer-readable medium 1120, which stores one or more sets of instructions 1122 embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1122 can also reside, completely or at least partially, within the main memory 1106 and/or within the processors 1102 during execution thereof by the computer system 1100. The main memory 1106 and the processors 1102 also constitute machine-readable media.
While the computer-readable medium 1120 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, NAND or NOR flash memory, digital video disks, RAM, ROM, and the like.
The exemplary embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, Python, JavaScript, Go, or other compilers, assemblers, interpreters or other computer languages or platforms.
Thus, systems and methods for organizing data are disclosed. Although embodiments have been described with reference to specific example embodiments, it may be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims
1. A computer-implemented method for organizing data, the method comprising:
- providing an object store to store objects, each of the objects representing a fragment of the data and being associated with an address;
- associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store; and
- generating, based at least partially on objects from the object store, values for each of the keys.
2. The method of claim 1, wherein the address of an object in the object store is based on content of the fragment of the data.
3. The method of claim 1, further comprising:
- determining that a size of an object from the object store is less than a pre-determined size;
- if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
- if the result of the determination is negative, storing the address of the object in the particular node of the B-tree.
4. The method of claim 1, wherein each of the objects includes at least one of the following:
- a metadata object storing at least a number of references to further objects from the data objects and an identification number associated with a file or a directory; and
- a data object representing at least one of the following: a continuous fragment of the file or a directory entry.
5. The method of claim 4, wherein the key associated with the metadata object includes at least a first field including an indication of the metadata object, a second field including the identification number, and a third field including a metadata index representing a distinct type of the metadata object.
6. The method of claim 5, wherein the type of the metadata object includes at least one of the following: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
7. The method of claim 4, wherein the key associated with the fragment of the file includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including an offset of the continuous fragment of the file from a beginning of the file.
8. The method of claim 4, wherein the key associated with the directory entry includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including a hash calculated based on a literal name of the directory entry.
9. The method of claim 8, wherein calculating the hash includes:
- applying a hash function to the literal name to obtain a preliminary hash, the hash function including at least one of the following: crc32c, SipHash, and SipHash-order3; and
- shifting the preliminary hash by a pre-determined base number to obtain the hash.
10. The method of claim 9, further comprising:
- determining that the hash matches an existing hash; and
- based on the determination, incrementing the hash by 1.
11. A system for organizing data, the system comprising:
- at least one processor; and
- a memory communicatively coupled to the at least one processor, the memory storing instructions, which, when executed by the at least one processor, perform a method comprising: providing an object store to store objects, each of the objects representing a fragment of the data and associated with an address; associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store; and generating, based at least partially on objects from the object store, values for each of the keys.
12. The system of claim 11, wherein the address of an object in the object store is based on content of the fragment of the data.
13. The system of claim 11, wherein the method further comprises:
- determining that a size of an object from the object store is less than a pre-determined size;
- if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
- if a result of the determination is negative, storing the address of the object in the particular node of the B-tree.
14. The system of claim 11, wherein each of the objects includes at least one of the following:
- a metadata object storing at least a number of references to further objects from the data objects and an identification number associated with a file or a directory; and
- a data object representing at least one of the following: a continuous fragment of the file or a directory entry.
15. The system of claim 14, wherein the key associated with the metadata object includes at least a first field including an indication of the metadata object, a second field including the identification number, and a third field including a metadata index representing a distinct type of the metadata object.
16. The system of claim 15, wherein the type of the metadata object includes at least one of the following: attributes associated with the file or the directory, a symbolic link to the file or the directory, and an extended file attribute associated with the file or the directory.
17. The system of claim 14, wherein the key associated with the fragment of the file includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including an offset of the continuous fragment of the file from a beginning of the file.
18. The system of claim 14, wherein the key associated with the directory entry includes at least a first field including an indication of the data object, a second field including the identification number, and a third field including a hash calculated based on a literal name of the directory entry.
19. The system of claim 18, wherein calculating the hash includes:
- applying a hash function to the literal name to obtain a preliminary hash, the hash function including at least one of the following: crc32c, SipHash, and SipHash-order3;
- shifting the preliminary hash by a pre-determined base number to obtain the hash;
- determining that the hash matches an existing hash; and
- based on the determination, incrementing the hash by 1.
20. A non-transitory computer-readable storage medium having embodied thereon instructions, which, when executed by one or more processors, perform a method for organizing data, the method comprising:
- providing an object store to store objects, each of the objects representing a fragment of the data and associated with an address;
- associating a B-tree with the object store, the B-tree including nodes, wherein each of the nodes includes keys, wherein each of the keys is associated with at least one object from the object store;
- generating, based at least partially on objects from the object store, values for each of the keys;
- determining that a size of an object from the object store is less than a pre-determined size;
- if a result of the determination is positive, storing a value of the object in a particular node of the B-tree, the particular node including a particular key associated with the object; and
- if a result of the determination is negative, storing the address of the object in the particular node of the B-tree.
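The key layouts of claims 5 to 10 and 15 to 19, together with the inline-versus-address rule of claims 13 and 20, can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The tag values METADATA and DATA, the constants HASH_BASE and INLINE_LIMIT, the helper names, and the use of SHA-256 as the content-based address are all hypothetical. zlib.crc32 (standard CRC-32) substitutes for crc32c, which the Python standard library does not provide. "Shifting" the preliminary hash is read here as adding a base offset, although a bitwise shift is another possible reading.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

# Hypothetical tags for the key's first field (kind of object).
METADATA, DATA = 0, 1

# Hypothetical values for the claims' "pre-determined" constants.
HASH_BASE = 16      # base number added to the preliminary name hash
INLINE_LIMIT = 64   # objects smaller than this many bytes are stored inline

# (first field: type, second field: identification number, third field)
Key = Tuple[int, int, int]


def metadata_key(ino: int, meta_index: int) -> Key:
    """Key for a metadata object; meta_index selects its distinct type."""
    return (METADATA, ino, meta_index)


def extent_key(ino: int, offset: int) -> Key:
    """Key for a continuous file fragment, located by its byte offset."""
    return (DATA, ino, offset)


def dirent_key(dir_ino: int, name: str, used: Set[int]) -> Key:
    """Key for a directory entry, derived from a hash of its literal name.

    `used` holds the hashes already taken in this directory (hypothetical
    bookkeeping; the claims do not say how existing hashes are tracked).
    """
    preliminary = zlib.crc32(name.encode("utf-8"))  # stand-in for crc32c
    h = preliminary + HASH_BASE                     # "shift" read as an offset
    while h in used:                                # collision: increment by 1
        h += 1
    used.add(h)
    return (DATA, dir_ino, h)


@dataclass
class Node:
    """A B-tree leaf, simplified to a mapping from key to value."""
    entries: Dict[Key, Union[bytes, str]]


def put(node: Node, store: Dict[str, bytes], key: Key, obj: bytes) -> None:
    """Keep small objects in the node itself; otherwise keep only an address."""
    if len(obj) < INLINE_LIMIT:
        node.entries[key] = obj                 # value stored in the node
    else:
        addr = hashlib.sha256(obj).hexdigest()  # content-based address
        store[addr] = obj                       # object lives in the store
        node.entries[key] = addr


store: Dict[str, bytes] = {}
leaf = Node({})
put(leaf, store, extent_key(42, 0), b"hello")
put(leaf, store, extent_key(42, 4096), b"x" * 4096)
```

In this sketch the small fragment at offset 0 is held directly in the leaf, while the 4096-byte fragment is placed in the object store and the leaf records only its address. The apparent trade-off is that tiny files and attributes avoid a second lookup, while large fragments keep node entries small and fixed in size.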
Type: Application
Filed: Mar 29, 2016
Publication Date: Mar 2, 2017
Inventor: Jeremy Fitzhardinge (San Francisco, CA)
Application Number: 15/084,401