METHOD AND SYSTEM OF SNAPSHOT GENERATION AND MANAGEMENT
In one aspect, a computerized method, useful for providing and managing scalable snapshots of a storage entity while avoiding the reference counts that lead to write amplification issues, includes the steps of providing a base image; generating a scalable snapshot of the base image; and setting an incremental layer identifier for the scalable snapshot.
This application relates generally to snapshot generation and management.
DESCRIPTION OF THE RELATED ART
Existing snapshotting technologies such as reference counting and chaining have drawbacks. Reference counting leads to write amplification. With chaining of snapshots, IO performance becomes inversely proportional to the number of snapshots in the chain. In the layer identifier approach, IO performance does not depend on the length of the chain, nor does it suffer from write amplification.
SUMMARY
In one aspect, a computerized method, useful for providing and managing scalable snapshots of a storage entity while avoiding the reference counts that lead to write amplification issues, includes the steps of providing a base image; generating a scalable snapshot of the base image; and setting an incremental layer identifier for the scalable snapshot.
Optionally, the computerized method can include the step of generating a chain of scalable snapshots, wherein each layer of the chain comprises an incremental layer identifier corresponding to that layer. The computerized method can also include the steps of providing a reference to a set of metadata for the base image; representing the set of metadata as a tree data structure; and, within the tree data structure, representing the scalable snapshots with the incremental layer identifier. Each time a chain of scalable snapshots is created, the new scalable snapshots can be assigned a new incremental layer identifier.
The Figures described above are a representative set and are not exhaustive with respect to embodying the invention.
DESCRIPTION
Disclosed are a system, method, and article of manufacture for snapshot generation and management. The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein can be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments.
Reference throughout this specification to ‘one embodiment,’ ‘an embodiment,’ ‘one example,’ or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases ‘in one embodiment,’ ‘in an embodiment,’ and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
Furthermore, the described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of programming, software modules, user selections, network transactions, database queries, database structures, hardware modules, hardware circuits, hardware chips, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art can recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The schematic flow chart diagrams included herein are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of one embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
Definitions
Example definitions for some embodiments are now provided.
Base image can be a dataset that is used as the base for creating another image (which can itself serve as a base image) by adding, removing, or modifying data.
Block can be a unit used to store electronic data.
B-tree can be a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree can be a generalization of a binary search tree in that a node can have more than two children.
Container can be a server virtualization instance used in operating system-level virtualization.
DOCKER is an open-source project that automates the deployment of applications inside software containers, by providing an additional layer of abstraction and automation of operating-system-level virtualization on LINUX. DOCKER uses the resource isolation features of the LINUX kernel such as cgroups and kernel namespaces, and a union-capable file system such as aufs and others to allow independent “containers” to run within a single LINUX instance, avoiding the overhead of starting and maintaining virtual machines.
Key-value database can be a data storage paradigm designed for storing, retrieving, and managing associative arrays (e.g. a dictionary or hash). Dictionaries may contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
Image can be the state of a computer system stored in some form.
Pointer can be an object whose value refers to another value stored elsewhere in the computer memory using its address.
Read operation can be used to retrieve data from a storage device/entity.
Root node can be the node in a tree data structure from which every other node is accessible.
Snapshot can be a set of computer files and directories kept in storage as they were at some time in the past. Each iteration of a snapshot can be a clone.
Tree can be an abstract data type (ADT), or a data structure implementing an ADT, that simulates a hierarchical tree structure with a root value and subtrees of children with a parent node, represented as a set of linked nodes.
Virtual machine (VM) can be an emulation of a particular computer system. VMs operate based on the computer architecture and functions of a real or hypothetical computer, and their implementations may involve specialized hardware, software, or a combination of both.
Virtual disk can be a set of software components that emulate an actual disk storage device.
Write operation can be creating or altering digital data in a storage device/entity.
Example Methods
In one example, references to the metadata can be provided. The metadata can be represented in a tree form (e.g. as a B-tree, etc.). A snapshot copy of a root node (e.g. a portion of the storage entity, etc.) can be created. In the tree, scalable snapshots can be represented with a layer identifier. The layer identifier can be an incremental number. Each time a chain of scalable snapshots is created, the new snapshots can be assigned a new layer identifier. In this context, creating a snapshot does not degrade IO performance, so a very large number of snapshots can be created. This is unlike snapshot technologies that employ chaining to link the snapshots, where IO performance degrades as the chain grows.
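As a minimal sketch, assuming a simplified in-memory node type in place of a B-tree and assuming that a snapshot's layer identifier is its source's identifier plus one (in line with the level numbering given for tree 200), snapshot creation reduces to copying the source's root node. `MetaNode` and `create_snapshot` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class MetaNode:
    """Hypothetical metadata node: a layer identifier plus child pointers."""
    layer: int
    children: list = field(default_factory=list)
    frozen: bool = False


def create_snapshot(source: MetaNode) -> MetaNode:
    """Create a snapshot by copying only the source's root node.

    The copied child list still points at the source's subtrees, so nothing
    below the root is duplicated and no per-node reference counts are updated.
    The cost is proportional to the root's fan-out, not to the number of
    snapshots that already exist.
    """
    source.frozen = True  # assumption: the source becomes read-only once snapshotted
    return MetaNode(layer=source.layer + 1, children=list(source.children))


# Usage: a base image whose root has two subtrees, then two snapshots of it.
base = MetaNode(layer=1, children=[MetaNode(layer=1), MetaNode(layer=1)])
s1 = create_snapshot(base)
s2 = create_snapshot(base)
print(s1.layer, s2.layer, s1.children[0] is base.children[0])  # prints: 2 2 True
```

Because only the root is copied, the cost of taking a snapshot stays the same however many snapshots already exist, which is the property the paragraph above relies on.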
More specifically, in step 102 a base image can be provided. A base image (e.g. a DOCKER base image) can be a basic image on which additional layers (e.g. filesystem changes) are added to create a final image containing an application. In step 104, a snapshot can be generated. The snapshot can be of the base image or of another previously generated snapshot. In step 106, an incremental level (i.e. the layer identifier) can be set for the snapshot.
It is noted that multiple snapshots can be created from the base image. For example, snapshot S2 can be generated as a read/write snapshot of the base image at a specified time. Any number ‘n’ of additional snapshots (e.g. snapshot S3, etc.) can be created in this way. When a snapshot is created, the base image is ‘frozen’: it is no longer writeable and becomes read-only. To write to the base image, a clone of the base image, B′, can be generated. This base-image clone B′ can receive and implement write operations. Additionally, snapshots can be cloned into other snapshots to create various chains of snapshots. For example, a snapshot S11 can be made from snapshot S1, a snapshot S21 can be made from snapshot S11, and a snapshot S12 can be made from snapshot S2. Tree 200 is provided by way of example and not of limitation. In other examples, other tree structures can be generated with other chains of scalable snapshots.
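The freeze-and-clone rule can be sketched as follows, assuming that ‘frozen’ means writes are rejected with an exception and that a clone such as B′ is created the same way as any other snapshot. The `Image` class and its methods are hypothetical, and the per-image `writes` dictionary is only a stand-in for the metadata tree.

```python
class Image:
    """Hypothetical image or snapshot with a parent link and a frozen flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.frozen = False
        self.writes = {}  # blocks written by this image itself (stand-in for metadata)

    def snapshot(self, name):
        """Create a child snapshot; the source becomes read-only."""
        self.frozen = True
        return Image(name, parent=self)

    def write(self, block, data):
        if self.frozen:
            raise PermissionError(f"{self.name} is frozen; write to a clone instead")
        self.writes[block] = data


base = Image("B")
s1 = base.snapshot("S1")
s2 = base.snapshot("S2")
b_clone = base.snapshot("B′")  # clone B′ receives the writes meant for the base image

try:
    base.write(0, b"new")
except PermissionError as err:
    print(err)  # the frozen base image rejects the write

b_clone.write(0, b"new")
s11 = s1.snapshot("S11")  # chains of snapshots: S1 -> S11 -> S21 and S2 -> S12
s21 = s11.snapshot("S21")
s12 = s2.snapshot("S12")
```

Freezing every image that has been snapshotted is what lets descendants share its data safely: nothing they point at can change underneath them.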
The method of forming tree 200 can be utilized to generate a representation of the metadata related to the base image and snapshots of tree 200. Each level of the depth of the tree can be numbered. For example, the base image can be level 0->1. The layer of B′, S1 and S2 can be level 1->2, and so on, as provided in
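Assuming each level is simply the parent's level plus one, with the base image at 1, the numbering for the example tree can be computed with a breadth-first walk. The parent map reproduces the relationships named above (B′, S1 and S2 from the base image B; S11 from S1; S21 from S11; S12 from S2); the map format and `assign_levels` are illustrative assumptions.

```python
from collections import deque

# child -> parent, mirroring the example tree described in the text
PARENT = {
    "B′": "B",
    "S1": "B",
    "S2": "B",
    "S11": "S1",
    "S21": "S11",
    "S12": "S2",
}


def assign_levels(parent_of, root):
    """Breadth-first walk assigning level(child) = level(parent) + 1, root at 1."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    levels = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            levels[child] = levels[node] + 1
            queue.append(child)
    return levels


levels = assign_levels(PARENT, "B")
print(levels)
```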
Metadata information about the base image and snapshots can also include information about the relevant layer identifiers. Accordingly,
However, S3 306 forms a single chain of snapshots (i.e. no divergent snapshots) and can be deleted. S2 in
When the base image creates metadata nodes that record operation histories (e.g. write operations, etc.), the associated metadata nodes can have a layer identifier of ‘1’. Continuing the present example, when S1 creates a root entry, its metadata nodes can have layer identifier ‘2’. Each member of tree 200 creates metadata nodes with its corresponding layer identifier, as provided in
When a write operation to S1 occurs, the metadata tree can be traversed from S1's root node (which has layer identifier ‘2’) until the leaf node is reached. Any node traversed on the way to the leaf node that is not already labeled with layer identifier ‘2’ can be copied and relabeled with layer identifier ‘2’. Thus, for write operations, only the nodes that are traversed are copied to the current layer identifier and modified; the rest of the tree remains shared.
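The write path can be sketched as copy-on-write path copying. This is an illustration under stated assumptions rather than the described implementation: a fixed-depth tree with a fan-out of 4 and two node levels (block addresses 0 to 15) stands in for a B-tree, block payloads sit directly in the bottom-level slots, and `Node`, `digits`, `write_block` and `read_block` are hypothetical names.

```python
from dataclasses import dataclass

FANOUT = 4
DEPTH = 2  # node levels: the root plus one bottom level, so addresses 0..FANOUT**DEPTH - 1


@dataclass
class Node:
    layer: int   # layer identifier of the image that created this node
    slots: list  # child Nodes, or block payloads in the bottom level


def digits(block):
    """Split a block address into one slot index per tree level, root first."""
    out = []
    for _ in range(DEPTH):
        block, d = divmod(block, FANOUT)
        out.append(d)
    return out[::-1]


def write_block(root, block, data):
    """Copy-on-write: copy and relabel every traversed node from another layer."""
    node = root  # the root already carries the writer's layer identifier
    *inner, last = digits(block)
    for d in inner:
        child = node.slots[d]
        if child.layer != root.layer:
            child = Node(root.layer, list(child.slots))  # copy and relabel
            node.slots[d] = child                         # re-point the owned parent
        node = child
    node.slots[last] = data


def read_block(root, block):
    node = root
    *inner, last = digits(block)
    for d in inner:
        node = node.slots[d]
    return node.slots[last]


# Base image (layer 1) holding "b0".."b15", and snapshot S1 (layer 2) as a root copy.
base = Node(1, [Node(1, [f"b{4 * i + j}" for j in range(FANOUT)]) for i in range(FANOUT)])
s1 = Node(2, list(base.slots))

write_block(s1, 6, "s1-data")
print(read_block(s1, 6), read_block(base, 6))  # prints: s1-data b6
```

Only the nodes on the traversed path are copied; the untouched subtree `s1.slots[0]` is still the same object as `base.slots[0]`, and ownership is decided by comparing layer identifiers rather than by maintaining reference counts.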
When there is a read operation, the nodes of tree 300 can include pointers to the addressed block. For example, the dotted line shows a path from snapshot S1 to a leaf-node block of the base image; that block is reached through the base image's node because S1 never modified it. Layer deletion operations within a block can be performed directly on the blocks with the same layer identifier. It is noted that snapshots within a chain (i.e. not at the end of the chain) cannot be deleted. In the present example of tree 300, S1 cannot be deleted, but S2 can be deleted, because here there is a single chain of snapshots with no divergent snapshots from it. In the event S2 is deleted, its nodes' layer identifiers can be dropped down to layer identifier ‘3’ and its entries can be merged with the next dependent snapshot.
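Reads and end-of-chain deletion can be sketched under a similarly simplified fixed-depth tree (fan-out 2, two node levels; `Node`, `read_with_origin` and `delete_leaf_snapshot` are hypothetical). A read follows pointers from the snapshot's root and may end in a node labeled with an older layer, as on the dotted path in the example. For a snapshot with no dependents, the nodes it exclusively owns are exactly the reachable nodes carrying its layer identifier, so they can be found without reference counts; how they are then reclaimed is left open, and the sketch only collects them.

```python
from dataclasses import dataclass

FANOUT = 2
DEPTH = 2  # addresses 0..3


@dataclass
class Node:
    layer: int
    slots: list  # child Nodes, or block payloads in the bottom level


def digits(block):
    out = []
    for _ in range(DEPTH):
        block, d = divmod(block, FANOUT)
        out.append(d)
    return out[::-1]


def read_with_origin(root, block):
    """Follow pointers from a snapshot root; report the layer of the bottom node reached."""
    node = root
    *inner, last = digits(block)
    for d in inner:
        node = node.slots[d]
    return node.slots[last], node.layer


def delete_leaf_snapshot(root):
    """Collect the nodes owned by a snapshot with no dependents (those with its layer id)."""
    owned, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.layer != root.layer:
            continue  # belongs to an ancestor layer: shared, so left alone
        owned.append(node)
        stack.extend(s for s in node.slots if isinstance(s, Node))
    return owned


# Base (layer 1) with blocks "b0".."b3"; S1 (layer 2) has rewritten only block 0.
left = Node(1, ["b0", "b1"])
right = Node(1, ["b2", "b3"])
base = Node(1, [left, right])
s1 = Node(2, [Node(2, ["s1-b0", "b1"]), right])

print(read_with_origin(s1, 3))  # block 3 was never modified by S1: served by layer 1
print(len(delete_leaf_snapshot(s1)))
```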
A key-value database can be a database system that uses key-value pairs: the data is represented as pairs in which a key maps to a value. This paradigm can be used to build a data-storage system. A data-storage system based on key-value pairs may not have a hierarchy as in a ‘traditional’ filesystem. For example, given an entity's key, the key-value system can determine how the data is stored (e.g. how it is stored on an SSD, in a cloud-computing platform, etc.). The key-value pairs can support a set of constructs; for example, given a key, the corresponding value can be obtained. These constructs can be used to implement various operations such as, inter alia: get operations, put operations, etc. In this way, a particular key-value database can depend on its own key-value pairs, which may not be transferable to other types of key-value databases.
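The get and put constructs can be sketched with an in-memory dictionary as the backing store. `KeyValueStore` is a hypothetical class, not the API of any particular database; it also shows the absence of a hierarchy, since a path-like key is just an opaque string.

```python
class KeyValueStore:
    """Hypothetical flat key-value store: no directories, just key -> value."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # insert or overwrite

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is a no-op


store = KeyValueStore()
store.put("users/42/avatar", b"\x89PNG")  # the slashes are part of the key, not a path
print(store.get("users/42/avatar"), store.get("users/42"))  # no "directory" users/42 exists
```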
In one example, key-value database A 402 can be a Cassandra®-type database. The metadata of key-value database A 402 can be decoupled from its data. For example, the keys of key-value database A 402 can be queried and stored as metadata 404. The metadata 404 can be represented based on the structures and methods of trees 200 and 300 in
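Decoupling the metadata from the data can be sketched as follows, assuming the database's keys can be enumerated. A plain dictionary stands in for key-value database A 402 (no Cassandra driver is involved), and `extract_metadata`, `snapshot_metadata` and the `(store, position)` location tuples are hypothetical. The metadata is kept in a two-level index so that, as in the trees above, a snapshot copies only the root and shares the buckets.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    layer: int
    entries: dict = field(default_factory=dict)  # key -> hypothetical value location


@dataclass
class MetadataRoot:
    layer: int
    buckets: dict = field(default_factory=dict)  # first character of key -> Bucket


def extract_metadata(database, layer=1):
    """Query only the keys and record where each value lives; values are not copied."""
    root = MetadataRoot(layer)
    for position, key in enumerate(sorted(database)):
        bucket = root.buckets.setdefault(key[0], Bucket(layer))
        bucket.entries[key] = ("database-A-402", position)
    return root


def snapshot_metadata(root):
    """Root-copy snapshot of the metadata: buckets are shared, not duplicated."""
    return MetadataRoot(root.layer + 1, dict(root.buckets))


database_a = {"alice": b"v1", "bob": b"v2", "carol": b"v3"}
meta = extract_metadata(database_a)
snap = snapshot_metadata(meta)
print(sorted(meta.buckets), snap.layer, snap.buckets["a"] is meta.buckets["a"])
```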
Additional Computer Architecture
Although the present embodiments have been described with reference to specific example embodiments, various modifications and changes can be made to these embodiments without departing from the broader spirit and scope of the various embodiments. For example, the various devices, modules, etc. described herein can be enabled and operated using hardware circuitry, firmware, software or any combination of hardware, firmware, and software (e.g., embodied in a machine-readable medium).
In addition, it can be appreciated that the various operations, processes, and methods disclosed herein can be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and can be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. In some embodiments, the machine-readable medium can be a non-transitory form of machine-readable medium.
Claims
1. A computerized method, useful for providing and managing scalable snapshots of a storage entity that avoids reference counts that lead to amplification issues, comprising:
- providing a base image;
- generating a scalable snapshot of the base image; and
- setting an incremental layer identifier for the scalable snapshot.
2. The computerized method of claim 1 further comprising:
- generating a chain of scalable snapshots, wherein each layer of the chain of scalable snapshots comprises an incremental layer identifier correlating to each respective layer.
3. The computerized method of claim 2 further comprising:
- providing a reference of a set of metadata for the base image;
- representing the set of metadata as a tree data structure; and
- within the tree data structure: representing the scalable snapshots with the incremental layer identifier, and wherein each time a chain of scalable snapshots is created, a set of new scalable snapshots is assigned a new incremental layer identifier.
4. The computerized method of claim 3, wherein the incremental layer identifier comprises an incremental number.
5. The computerized method of claim 4, wherein the tree data structure comprises a B-tree data structure.
6. The computerized method of claim 5 further comprising:
- creating a snapshot copy of a root node, wherein the root node comprises a portion of the storage entity.
7. The computerized method of claim 5, wherein the storage entity comprises a virtual disk system.
8. The computerized method of claim 5, wherein the storage entity comprises a physical disk system.
9. A computer system, useful for providing and managing scalable snapshots of a storage entity that avoids reference counts that lead to amplification issues, comprising:
- at least one processor configured to execute instructions;
- a memory containing instructions that, when executed on the at least one processor, cause the at least one processor to perform operations that: provide a base image; generate a scalable snapshot of the base image; and set an incremental layer identifier for the scalable snapshot.
10. The computerized system of claim 9, wherein the memory contains instructions that, when executed on the at least one processor, cause the at least one processor to perform operations that:
- generate a chain of scalable snapshots, wherein each layer of the chain of scalable snapshots comprises an incremental layer identifier correlating to each respective layer.
11. The computerized system of claim 10, wherein the memory contains instructions that, when executed on the at least one processor, cause the at least one processor to perform operations that:
- provide a reference of a set of metadata for the base image;
- represent the set of metadata as a tree data structure; and
- within the tree data structure: represent the scalable snapshots with the incremental layer identifier, and wherein each time a chain of scalable snapshots is created, a set of new scalable snapshots is assigned a new incremental layer identifier.
12. The computerized system of claim 11, wherein the incremental layer identifier comprises an incremental number.
13. The computerized system of claim 12, wherein the tree data structure comprises a B-tree data structure.
14. The computerized system of claim 13, wherein the memory contains instructions that, when executed on the at least one processor, cause the at least one processor to perform operations that:
- create a snapshot copy of a root node, wherein the root node comprises a portion of the storage entity.
15. The computerized system of claim 14, wherein the storage entity comprises a virtual disk system.
16. The computerized system of claim 15, wherein the storage entity comprises a physical disk system.
Type: Application
Filed: Apr 30, 2018
Publication Date: Oct 31, 2019
Inventor: ASHISH PURI (PUNE)
Application Number: 15/965,979