METHOD AND SYSTEM FOR FULL RESTORE POINT CREATION USING INCREMENTAL BACKUPS

Info

Publication number: 20200349012
Type: Application
Filed: Sep 5, 2019
Publication Date: Nov 5, 2020
Inventors: Shelesh Chopra (Bangalore), Yasemin Ugur-Ozekinci (Oakville), Kenneth William Owens (Burlington), Amith Ramachandran (Bangalore), Navneet Upadhyay (Ghaziabad), Trichy Krishnamurthi Premkumar (Mount Waverley), Hui Yang (Carlsbad, CA)
Application Number: 16/562,160

Abstract

A method and system for full restore point creation using incremental backups. Specifically, the method and system disclosed herein entail performing a modified synthetic full backup operation during processes directed to the protection of database data. The modified synthetic full backup operation, in contrast to traditional synthetic full backup operations, provides a mechanism through which a history of successively stored or synthesized full backups that may be maintained as full restore (or rollback) points for a given database.

Description

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/841,764, filed on May 1, 2019, which is incorporated herein by reference in its entirety.

BACKGROUND

In traditional synthetic full backup operations, full backups are synthesized on the backup storage system through the merging of one or more incremental backups with the current full backup, which had previously been stored or synthesized thereon. The merging of the current full backup with the incremental backup(s) produces a new current full backup, which replaces the previous current full backup.

SUMMARY

In general, in one aspect, the invention relates to a method for performing backup operations. The method includes receiving a first remote write request including a first incremental backup, identifying a first full backup relevant to the first incremental backup, making a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag, cloning, based on the determination, the first full backup to obtain a first full backup copy, and merging the first incremental backup with the first full backup to produce a second full backup.

In general, in one aspect, the invention relates to a system. The system includes a backup storage system including a first computer processor configured to receive a remote write request including an incremental backup, identify a first full backup relevant to the incremental backup, make a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag, clone, based on the determination, the first full backup to obtain a first full backup copy, and merge the incremental backup with the first full backup to produce a second full backup.

In general, in one aspect, the invention relates to a non-transitory computer readable medium (CRM). The non-transitory CRM includes computer readable program code, which when executed by a computer processor, enables the computer processor to receive a first remote write request including a first incremental backup, identify a first full backup relevant to the first incremental backup, make a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag, clone, based on the determination, the first full backup to obtain a first full backup copy, and merge the first incremental backup with the first full backup to produce a second full backup.

Other aspects of the invention will be apparent from the following description and the appended claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1A shows a system in accordance with one or more embodiments of the invention.

FIG. 1B shows a client device in accordance with one or more embodiments of the invention.

FIG. 1C shows a backup storage system in accordance with one or more embodiments of the invention.

FIG. 2 shows a flowchart describing a method for processing a remote write request in accordance with one or more embodiments of the invention.

FIGS. 3A and 3B show flowcharts describing a method for processing a remote write request in accordance with one or more embodiments of the invention.

FIG. 4 shows an exemplary computing system in accordance with one or more embodiments of the invention.

DETAILED DESCRIPTION

Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. In the following detailed description of the embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.

In the following description of FIGS. 1A-4, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.

Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to necessarily imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and a first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.

In general, embodiments of the invention relate to a method and system for full restore point creation using incremental backups. Specifically, one or more embodiments of the invention entails performing a modified synthetic full backup operation during processes directed to the protection of database data. The modified synthetic full backup operation, in contrast to traditional synthetic full backup operations, provides a mechanism through which a history of successively stored or synthesized full backups that may be maintained as full restore (or rollback) points for a given database.

FIG. 1A shows a system in accordance with one or more embodiments of the invention. The system (100) may include one or more client devices (102A-102N) operatively connected to a backup storage system (106). Each of these system (100) components is described below.

In one embodiment of the invention, the above-mentioned system (100) components may operatively connect to one another through a network (104) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, etc.). The network may be implemented using any combination of wired and/or wireless connections. Further, the network may encompass various interconnected, network-enabled subcomponents (or systems) (e.g., switches, routers, gateways, etc.) that may facilitate communications between the above-mentioned system (100) components. Moreover, the above-mentioned system (100) components may communicate with one another using any combination of wired and/or wireless communication protocols.

In one embodiment of the invention, a client device (102A-102N) may represent any physical appliance or computing system designed and configured to receive, generate, process, store, and/or transmit digital data, as well as to provide an environment in which one or more computer programs may execute thereon. The computer programs (not shown) may, for example, implement large-scale and complex data processing; or implement one or more services offered locally or over the network (104). Further, in providing an execution environment for any computer programs installed thereon, a client device (102A-102N) may include and allocate various resources (e.g., computer processors, memory, storage, virtualization, network bandwidth, etc.), as needed, to the computer programs and the tasks (or processes) instantiated thereby. One of ordinary skill will appreciate that a client device (102A-102N) may perform other functionalities without departing from the scope of the invention. Examples of a client device (102A-102N) may include, but are not limited to, a desktop computer, a laptop computer, a server, a mainframe, or any other computing system similar to the exemplary computing system shown in FIG. 4. Moreover, client devices (102A-102N) are described in further detail below with respect to FIG. 1B.

In one embodiment of the invention, the backup storage system (106) may represent a data backup, archiving, and/or disaster recovery storage system. The backup storage system (106) may be implemented using one or more servers (not shown). Each server may refer to a physical server, which may reside in a datacenter, or a virtual server, which may reside in a cloud computing environment. Additionally or alternatively, the backup storage system (106) may be implemented using one or more computing systems similar to the exemplary computing system shown in FIG. 4. Furthermore, the backup storage system (106) is described in further detail below with respect to FIG. 1C.

While FIG. 1A shows a configuration of components, other system (100) configurations may be used without departing from the scope of the invention.

FIG. 1B shows a client device in accordance with one or more embodiments of the invention. The client device (102) may include one or more user programs (110A-110N), a client protection agent (112), a client dedupe agent (114), a client operating system (116), and a client storage array (124). Each of these client device (102) components is described below.

In one embodiment of the invention, a user program (110A-110N) may refer to a computer program that may execute on the underlying hardware of the client device (102). Specifically, a user program (110A-110N) may be designed and configured to perform one or more functions, tasks, and/or activities instantiated by a user of the client device (102). Accordingly, towards performing these operations, a user program (110A-110N) may include functionality to request and consume client device (102) resources (e.g., computer processors, memory, storage (124), virtualization, network bandwidth, etc.) by way of service calls to the client operating system (116). One of ordinary skill will appreciate that a user program (110A-110N) may perform other functionalities without departing from the scope of the invention. Examples of a user program (110A-110N) may include, but are not limited to, a word processor, an email client, a database client, a web browser, a media player, a file viewer, an image editor, a simulator, a computer game, or any other computer executable application.

In one embodiment of the invention, the client protection agent (112) may refer to a computer program that may execute on the underlying hardware of the client device (102). Specifically, the client protection agent (112) may be designed and configured to perform client-side database backup and recovery operations. To that extent, the client protection agent (112) may protect one or more databases (also referred herein as assets (126A-126N)) on the client device (102) against data loss (i.e., backup the database(s)); and reconstruct one or more databases on the client device (102) following such data loss (i.e., recover the database(s)). One of ordinary skill will appreciate that the client protection agent (112) may perform other functionalities without departing from the scope of the invention.

In one embodiment of the invention, the client dedupe agent (114) may refer to a computer program that may execute on the underlying hardware of the client device (102). Specifically, the client dedupe agent (114) may be designed and configured to perform client- or source-side data deduplication. Source-side data deduplication may refer to the identification and subsequent elimination of redundant data prior to transmission of the data to the backup storage system (106). To that extent, the client dedupe agent (114) may include functionality to: obtain database data selected for backup from and by the client protection agent (112); apply data deduplication on the obtained database data to render deduplicated database data; and remotely write the deduplicated database data onto the backup storage system (106) by way of the backup file system interface (122) (described below). One of ordinary skill will appreciate that the client dedupe agent (114) may perform other functionalities without departing from the scope of the invention.

In one embodiment of the invention, the client operating system (116) may refer to a computer program that may execute on the underlying hardware of the client device (102). Specifically, the client operating system (116) may be designed and configured to oversee client device (102) operations. To that extent, the client operating system (116) may include functionality to, for example, support fundamental client device (102) functions; schedule tasks; mediate interactivity between logical (e.g., software) and physical (e.g., hardware) client device (102) components; allocate client device (102) resources; and execute or invoke other computer programs executing on the client device (102). One of ordinary skill will appreciate that the client operating system (116) may perform other functionalities without departing from the scope of the invention.

For example, the client operating system (116) may facilitate user program (110A-110N) interaction with asset (126A-126N) data stored locally on the client device (102) or remotely over the network (104). In facilitating the aforementioned interaction, the client operating system (116) may implement a client logical file system (118). The client logical file system (118) may represent a collection of in-memory data structures maintained, by the client operating system (116), to manage the various accessible asset (126A-126N) data stored locally on the client device (102) and/or remotely on the backup storage system (106). Further, the client logical file system (118) may expose an application programming interface (API) through which the user program(s) (110A-110N) may manipulate—i.e., via one or more file operations—any granularity of locally and/or remotely stored asset (126A-126N) data. These file operations, requested by the user program(s) (110A-110N), may subsequently be delivered to the client file system (120) or backup file system interface (122) for processing.

In one embodiment of the invention, the client file system (120) may represent a physical file system (also referred to as a file system implementation). A physical file system may refer to a collection of subroutines concerned with the physical operation of one or more physical storage devices (described below). The client file system (120), in this respect, may be concerned with the physical operation of the client storage array (124). Accordingly, the client file system (120) may employ client storage array (124) device drivers (or firmware) to process requested file operations from the user program(s) (110A-110N). Device drivers enable the client file system (120) to manipulate physical storage or disk blocks as appropriate.

In one embodiment of the invention, the backup file system interface (122) may represent a computer program that may execute on the underlying hardware of the client device (102). Specifically, the backup file system interface (122) may be designed and configured to facilitate the access and manipulation of remotely stored database data as if the aforementioned database data were stored locally on the client device (102). Accordingly, the backup file system interface (122) may, in part, implement a distributed file system (DFS), which may employ any known distributed file system protocol (e.g., the network file system (NFS) protocol). A distributed file system may refer to a mechanism through which files (e.g., database data) may be stored and accessed based on client-server architecture over a network (104). Particularly, in a distributed file system, one or more central appliances (e.g., the backup storage system (106)) store files that can be accessed, with proper authorization permissions, by any number of remote clients (e.g., the client device (102)) across the network (104). Furthermore, the backup file system interface (122) may include functionality to issue remote procedure calls (RPCs) directed to accessing and manipulating any granularity of database data remotely stored on the backup storage system (106).

In one embodiment of the invention, the client storage array (124) may refer to a collection of one or more physical storage devices (not shown) on which various forms of digital data—e.g., one or more assets (126A-126N) (described below)—may be consolidated. Each physical storage device may encompass non-transitory computer readable storage media on which data may be stored in whole or in part, and temporarily or permanently. Further, each physical storage device may be designed and configured based on a common or different storage device technology—examples of which may include, but are not limited to, flash based storage devices, fibre-channel (FC) based storage devices, serial-attached small computer system interface (SCSI) (SAS) based storage devices, and serial advanced technology attachment (SATA) storage devices. Moreover, any subset or all of the client storage array (124) may be implemented using persistent (i.e., non-volatile) storage. Examples of persistent storage may include, but are not limited to, optical storage, magnetic storage, NAND Flash Memory, NOR Flash Memory, Magnetic Random Access Memory (M-RAM), Spin Torque Magnetic RAM (ST-MRAM), Phase Change Memory (PCM), or any other storage defined as non-volatile Storage Class Memory (SCM).

In one embodiment of the invention, an asset (126A-126N) may represent a database, or a logical container to and from which related digital data, or any granularity thereof, may be stored and retrieved, respectively. An asset (126A-126N) may occupy a portion of a physical storage device or, alternatively, may span across multiple physical storage devices, of the client storage array (124). Furthermore, an asset (126A-126N) may refer to a composite of various database objects including, but not limited to, one or more data files, one or more control files, and one or more redo log files (all not shown). Each of these asset (126A-126N) subcomponents is described below.

In one embodiment of the invention, a data file may refer to a database object for storing database data. Database data may encompass computer readable content (e.g., images, text, video, audio, machine code, any other form of computer readable content, or a combination thereof), which may be generated, interpreted, and/or processed by any given user program (110A-110N). Further, a data file may store database data in (a) undeduplicated form or (b) deduplicated form. In brief, the latter form of database data may be produced through the application of data deduplication on the former form of the database data. That is, undeduplicated database data may entail computer readable content that may or may not include redundant information. In contrast, deduplicated database data may result from the elimination of any redundant information found throughout the undeduplicated computer readable content and, accordingly, may instead reflect a content recipe of the undeduplicated computer readable content. A content recipe may refer to a sequence of chunk identifiers (or pointers) associated with (or directed to) unique database data chunks consolidated in physical storage. Collectively, the sequence of chunk identifiers (or pointers)—representative of the deduplicated database data—may be used to reconstruct the corresponding undeduplicated database data. Moreover, a given chunk identifier for a given database data chunk may encompass a cryptographic fingerprint or hash of the given database data chunk.

In one embodiment of the invention, a control file may refer to a database object for storing asset (126A-126N) metadata (also referred to as database metadata). Database metadata may encompass information descriptive of the database (or asset (126A-126N)) status and structure. By way of examples, database metadata may include, but are not limited to, a database name assigned to the asset (126A-126N), the name(s) and storage location(s) of one or more data files and redo log files associated with the asset (126A-126N), a creation timestamp encoding the date and/or time marking the creation of the asset (126A-126N), a log sequence number associated with a current redo log file, etc.

In one embodiment of the invention, a redo log file may refer to a database object for storing a history of changes made to the database data. A redo log file may include one or more redo entries (or redo records), which may include a set of change vectors. Each change vector subsequently describes or represents a modification made to a single asset (126A-126N) data block. Furthermore, a redo log file may serve to recover the asset (126A-126N) should a failover occur, or to apply recent changes to a recovered asset (126A-126N) which may have transpired during the database recovery process.

While FIG. 1B shows a configuration of components, other client device (102) configurations may be used without departing from the scope of the invention.

FIG. 1C shows a backup storage system in accordance with one or more embodiments of the invention. The backup storage system (106) may include a backup operating system (140), a backup protection agent (148), and a backup storage array (150). Each of these backup storage system (106) components is described below.

In one embodiment of the invention, the backup operating system (140) may refer to a computer program that may execute on the underlying hardware of the backup storage system (106). Specifically, the backup operating system (140) may be designed and configured to oversee backup storage system (106) operations. To that extent, the backup operating system (140) may include functionality to, for example, support fundamental backup storage system (106) functions; schedule tasks; mediate interactivity between logical (e.g., software) and physical (e.g., hardware) backup storage system (106) components; allocate backup storage system (106) resources; and execute or invoke other computer programs executing on the backup storage system (106). One of ordinary skill will appreciate that the backup operating system (140) may perform other functionalities without departing from the scope of the invention.

For example, the backup operating system (140) may facilitate backup asset (156A-156N) access and manipulation by one or more computer programs (e.g., backup protection agent (148)) executing locally on the backup storage system (106) or, alternatively, by one or more remote computing systems (e.g., client device(s) (102A-102N)) over the network (104). In facilitating the aforementioned interaction, the backup operating system (140) may implement a backup logical file system (142). The backup logical file system (142) may represent a collection of in-memory data structures maintained, by the backup operating system (140), to manage the various accessible backup asset (156A-156N) data stored locally on the backup storage system (106). Further, the backup logical file system (142) may expose an application programming interface (API) through which the local computer programs and/or remote computing systems may manipulate—i.e., via one or more file operations—any granularity of locally stored backup asset (156A-156N) data. File operations, requested by the local computer programs, may be delivered to the backup file system (146) for processing, whereas file operations, requested by the remote computing systems, may be received and processed by the backup file system service (144).

In one embodiment of the invention, the backup file system service (144) may represent a computer program that may execute on the underlying hardware of the backup storage system (106). Specifically, the backup file system service (144) may be designed and configured to facilitate the authorized, remote access and manipulation of locally stored backup database data. Accordingly, the backup file system service (144) may, in part, implement a distributed file system (DFS), which may employ any known distributed file system protocol (e.g., the network file system (NFS) protocol). A distributed file system may refer to a mechanism through which files (e.g., database data) may be stored and accessed based on client-server architecture over a network (104). Particularly, in a distributed file system, one or more central appliances (e.g., the backup storage system (106)) store files that can be accessed, with proper authorization permissions, by any number of remote clients (e.g., the client device(s) (102A-102N)) across the network (104). Furthermore, the backup file system service (144) may include functionality to service remote procedure calls (RPCs) directed to accessing and manipulating any granularity of backup database data locally stored on the backup storage system (106).

In one embodiment of the invention, the backup file system (146) may represent a physical file system (also referred to as a file system implementation). A physical file system may refer to a collection of subroutines concerned with the physical operation of one or more physical storage devices (described below). The backup file system (146), in this respect, may be concerned with the physical operation of the backup storage array (150). Accordingly, the backup file system (146) may employ backup storage array (150) device drivers (or firmware) to process requested file operations from the local computer programs or the remote computing systems (via the backup file system service (144)). Device drivers enable the backup file system (146) to manipulate physical storage or disk blocks as appropriate.

In one embodiment of the invention, the backup protection agent (148) may refer to a computer program that may execute on the underlying hardware of the backup storage system (106). Specifically, the backup protection agent (148) may be designed and configured to perform server-side database backup and recovery operations. To that extent, the backup protection agent (148) may receive database data, submitted by the client device(s) (102A-102N), to store as backup assets (156A-156N) on the backup storage array (150) during database backup operations; and, conversely, may retrieve backup database data from the backup storage array (150) during database recovery operations. One of ordinary skill will appreciate that the backup protection agent (148) may perform other functionalities without departing from the scope of the invention.

In one embodiment of the invention, the backup storage array (150) may refer to a collection of one or more physical storage devices (not shown) on which various forms of digital data—e.g., one or more backup assets (156A-156N) (described below)—may be consolidated. Each physical storage device may encompass non-transitory computer readable storage media on which data may be stored in whole or in part, and temporarily or permanently. Further, each physical storage device may be designed and configured based on a common or different storage device technology—examples of which may include, but are not limited to, flash based storage devices, fibre-channel (FC) based storage devices, serial-attached small computer system interface (SCSI) (SAS) based storage devices, and serial advanced technology attachment (SATA) storage devices. Moreover, any subset or all of the backup storage array (150) may be implemented using persistent (i.e., non-volatile) storage. Examples of persistent storage may include, but are not limited to, optical storage, magnetic storage, NAND Flash Memory, NOR Flash Memory, Magnetic Random Access Memory (M-RAM), Spin Torque Magnetic RAM (ST-MRAM), Phase Change Memory (PCM), or any other storage defined as non-volatile Storage Class Memory (SCM).

In one embodiment of the invention, the backup storage array (150) may include a fingerprint store (152) and a chunk store (154), which may collectively consolidate deduplicated database data. Recall from above (see e.g., FIG. 1B), that deduplicated database data may result from the elimination of any redundant information found throughout the database data in undeduplicated form. Accordingly, instead of reflecting the binary composition of the undeduplicated database data in its entirety, deduplicated database data may alternatively reflect reduced information in the form of a content recipe of the representative, undeduplicated computer readable content. The aforementioned content recipe may refer to a sequence of chunk identifiers (or pointers) associated with (or directed to) unique database data chunks identified throughout the undeduplicated database data. Any unique database data chunks, along with their respective chunk identifiers (i.e., cryptographic fingerprints or hashes), may be indexed in appropriate physical storages—e.g., the chunk store (154) and the fingerprint store (152), respectively.

In one embodiment of the invention, the fingerprint store (152) may represent a repository for maintaining chunk identifiers. Each chunk identifier may be indexed by way of a fingerprint store (152) entry (not shown), which may store a mapping relating the chunk identifier to a storage identifier. A chunk identifier (also referred to as a fingerprint or hash) may represent a digital signature that uniquely identifies an associated database data chunk. Further, a chunk identifier may be produced by submitting the associated database data chunk through a hash function, which may employ any existing cryptographic mapping algorithm As such, a chunk identifier may be outputted by the hash function given the associated database data chunk as input. Meanwhile, a storage identifier may represent a character or bit string that uniquely identifies a storage location in the backup storage array (150). By way of an example, a storage identifier may encompass a tuple reflecting (a) a storage device identifier uniquely assigned to a given physical storage device (not shown) of the backup storage array (150); and (b) a binary address assigned to a starting byte (or storage block) in the given physical storage device at which the database data chunk may be physically stored.

On the other hand, in one embodiment of the invention, the chunk store (154) may represent a repository for maintaining unique database data chunks. Each unique database data chunk may be indexed by way of a chunk store (154) entry (not shown), which may store a mapping relating a storage identifier (described above) to the unique database data chunk. A database data chunk may refer to a fragment or a partition of undeduplicated database data. More specifically, a database data chunk may capture a unique byte pattern that may occur or recur throughout the undeduplicated database data.

In one embodiment of the invention, a backup asset (156A-156N) may refer to a deduplicated backup copy of a given asset (126A-126N) (see e.g., FIG. 1B). Accordingly, a backup asset (156A-156N) may similarly represent a database, or a logical container to and from which related digital data, or any granularity thereof, may be stored and retrieved, respectively. A backup asset (156A-156N) may occupy a portion of a physical storage device or, alternatively, may span across multiple physical storage devices, of the backup storage array (150). Furthermore, similar to an asset (126A-126N), a backup asset (156A-156N) may refer to a composite of various database objects including, but not limited to, one or more data files, one or more control files, and one or more redo log files (all described above).

While FIG. 1C shows a configuration of components, other backup storage system (106) configurations may be used without departing from the scope of the invention.

FIG. 2 shows a flowchart describing a method for processing a remote write request in accordance with one or more embodiments of the invention. The various steps outlined below may be performed by the backup storage system (see e.g., FIGS. 1A and 1C). Further, while the various steps in the flowchart are presented and described sequentially, one of ordinary skill will appreciate that some or all steps may be executed in different orders, may be combined or omitted, and some or all steps may be executed in parallel.

Turning to FIG. 2, in Step 200, a remote write request is received from a client device. In one embodiment of the invention, the remote write request may include a full backup data file and a corresponding backup tag. The full backup data file may reflect all database data for a given asset (i.e., database) stored on the client device. Further, the full backup data file may arrive in deduplicated form and, thus, may include a full backup content recipe. A content recipe may refer to a sequence of chunk identifiers (or pointers) associated with (or directed to) unique database data chunks indexed in physical storage—e.g., the chunk store of the backup storage array (see e.g., FIG. 1C). In turn, a given chunk identifier for a given database data chunk may encompass a cryptographic fingerprint or hash of the given database data chunk. Accordingly, in one embodiment of the invention, the full backup content recipe may convey a sequence of fingerprints collectively representative of the undeduplicated full backup content. Meanwhile, the backup tag may represent a text attribute (i.e., metadata) that may be associated with or descriptive of the full backup data file.

In Step 202, the full backup content recipe (received in Step 200) is examined. In one embodiment of the invention, the examination may result in the identification of one or more unique full backup fingerprints (i.e., a set of unique full backup fingerprints), which may occur or recur throughout the full backup content recipe.

In Step 204, a full backup fingerprint set difference is obtained. In one embodiment of the invention, the full backup fingerprint set difference may refer to zero or more unique fingerprints, included in the set of unique full backup fingerprints (identified in Step 202), that may be excluded from a superset of known (i.e., pre-stored) unique fingerprints indexed in physical storage—e.g., the fingerprint store of the backup storage array (see e.g., FIG. 1C). For example, if the set of unique full backup fingerprints includes a first fingerprint, a second fingerprint, a third fingerprint, and a fourth fingerprint, whereas the superset of known unique fingerprints (indexed in the aforementioned fingerprint store) includes the first fingerprint, the second fingerprint, and the third fingerprint, then the fingerprint set difference of the two sets of fingerprints includes the fourth fingerprint. Alternatively, the superset of known unique fingerprints (indexed in the aforementioned fingerprint store) may include the fourth fingerprint as well and, in such a scenario, the fingerprint set difference of the two sets of fingerprints may result in an empty set.

In Step 206, a determination is made as to whether the full backup fingerprint set difference (obtained in Step 204) reflects an empty set. Accordingly, in one embodiment of the invention, if it is determined that the full backup fingerprint set difference is indeed an empty set, then the process proceeds to Step 216. On the other hand, in another embodiment of the invention, if it is alternatively determined that the full backup fingerprint set difference includes at least one unknown, unique fingerprint, then the process alternatively proceeds to Step 208.

In Step 208, after determining (in Step 206), that the full backup fingerprint set difference (obtained in Step 204) includes at least one unknown, unique fingerprint, the aforementioned full backup fingerprint set difference is transmitted to the client device (from which the remote write request had been received in Step 200).

In Step 210, in response to the above-mentioned transmission (performed in Step 208), a chunk package is received from the client device. In one embodiment of the invention, the chunk package may represent a logical container within which one or more database data chunks may be enclosed. The cardinality of enclosed database data chunks may match the cardinality of unknown, unique fingerprints transmitted to the client device via the full backup fingerprint set difference. Accordingly, each database data chunk, received via the chunk package, may map to a respective unknown, unique fingerprint of the full backup fingerprint set difference, where the unknown, unique fingerprint uniquely identifies the database data chunk. Furthermore, each database data chunk may reflect a different fragment (or byte pattern) that occurs or recurs throughout the undeduplicated full backup content.

In Step 212, the database data chunk(s) (received in Step 210) is/are indexed in physical storage—e.g., the chunk store of the backup storage array (see e.g., FIG. 1C). In one embodiment of the invention, each database data chunk may be indexed in the chunk store within a new chunk store entry, which may include a mapping relating the database data chunk to a storage identifier (described above) (see e.g., FIG. 1C).

In Step 214, the unknown, unique fingerprint(s), representative of the full backup fingerprint set difference (obtained in Step 204), is/are subsequently indexed in physical storage—e.g., the fingerprint store of the backup storage array (see e.g., FIG. 1C). Specifically, each unknown, unique fingerprint may be indexed in the fingerprint store within a new fingerprint store entry, which may include a mapping relating the unknown, unique fingerprint to the storage identifier mapped to the database data chunk that which the unknown, unique fingerprint uniquely identifies.

In Step 216, following the determination that the full backup fingerprint set difference (obtained in Step 204) is an empty set, or following the indexing of the unknown, unique fingerprint(s) (in Step 214), the full backup data file (or more, specifically, the associated full backup content recipe) (received in Step 200) is stored in the backup file system at a prescribed directory path. Thereafter, in Step 218, the backup tag (also received in Step 200) is recorded and stored as associative metadata describing the full backup data file.

FIGS. 3A and 3B show flowcharts describing a method for processing a remote write request in accordance with one or more embodiments of the invention. The various steps outlined below may be performed by the backup storage system (see e.g., FIGS. 1A and 1C). Further, while the various steps in the flowcharts are presented and described sequentially, one of ordinary skill will appreciate that some or all steps may be executed in different orders, may be combined or omitted, and some or all steps may be executed in parallel.

Turning to FIG. 3A, in Step 300, a remote write request is received from a client device. In one embodiment of the invention, the remote write request may include an incremental backup data file. The incremental backup data file may reflect any changes made to database data for a given asset (i.e., database), stored on the client device, since a previously submitted remote write request—which may have included another incremental backup data file or a full backup data file (described above) (see e.g., FIG. 2). Further, the incremental backup data file may arrive in deduplicated form and, thus, may include an incremental backup content recipe. A content recipe may refer to a sequence of chunk identifiers (or pointers) associated with (or directed to) unique database data chunks indexed in physical storage—e.g., the chunk store of the backup storage array (see e.g., FIG. 1C). In turn, a given chunk identifier for a given database data chunk may encompass a cryptographic fingerprint or hash of the given database data chunk. Accordingly, in one embodiment of the invention, the incremental backup content recipe may convey a sequence of fingerprints collectively representative of the undeduplicated incremental backup content.

In Step 302, the incremental backup content recipe (received in Step 300) is examined. In one embodiment of the invention, the examination may result in the identification of one or more unique incremental backup fingerprints (i.e., a set of unique incremental backup fingerprints), which may occur or recur throughout the incremental backup content recipe.

In Step 304, an incremental backup fingerprint set difference is obtained. In one embodiment of the invention, the incremental backup fingerprint set difference may refer to zero or more unique fingerprints, included in the set of unique incremental backup fingerprints (identified in Step 302), that may be excluded from a superset of known (i.e., pre-stored) unique fingerprints indexed in physical storage—e.g., the fingerprint store of the backup storage array (see e.g., FIG. 1C). For example, if the set of unique incremental backup fmgerprints includes a first fingerprint, a second fingerprint, a third fingerprint, and a fourth fingerprint, whereas the superset of known unique fingerprints (indexed in the aforementioned fingerprint store) includes the first fingerprint, the second fingerprint, and the third fingerprint, then the fingerprint set difference of the two sets of fingerprints includes the fourth fingerprint. Alternatively, the superset of known unique fingerprints (indexed in the aforementioned fingerprint store) may include the fourth fingerprint as well and, in such a scenario, the fingerprint set difference of the two sets of fingerprints may result in an empty set.

In Step 306, a determination is made as to whether the incremental backup fingerprint set difference (obtained in Step 304) reflects an empty set. Accordingly, in one embodiment of the invention, if it is determined that the incremental backup fingerprint set difference is indeed an empty set, then the process proceeds to Step 316. On the other hand, in another embodiment of the invention, if it is alternatively determined that the incremental backup fingerprint set difference includes at least one unknown, unique fingerprint, then the process alternatively proceeds to Step 308.

In Step 308, after determining (in Step 306), that the incremental backup fingerprint set difference (obtained in Step 304) includes at least one unknown, unique fingerprint, the aforementioned incremental backup fingerprint set difference is transmitted to the client device (from which the remote write request had been received in Step 300).

In Step 310, in response to the above-mentioned transmission (performed in Step 308), a chunk package is received from the client device. In one embodiment of the invention, the chunk package may represent a logical container within which one or more database data chunks may be enclosed. The cardinality of enclosed database data chunks may match the cardinality of unknown, unique fingerprints transmitted to the client device via the incremental backup fingerprint set difference. Accordingly, each database data chunk, received via the chunk package, may map to a respective unknown, unique fingerprint of the incremental backup fingerprint set difference, where the unknown, unique fingerprint uniquely identifies the database data chunk. Furthermore, each database data chunk may reflect a different fragment (or byte pattern) that occurs or recurs throughout the undeduplicated incremental backup content.

In Step 312, the database data chunk(s) (received in Step 310) is/are indexed in physical storage—e.g., the chunk store of the backup storage array (see e.g., FIG. 1C). In one embodiment of the invention, each database data chunk may be indexed in the chunk store within a new chunk store entry, which may include a mapping relating the database data chunk to a storage identifier (described above) (see e.g., FIG. 1C).

In Step 314, the unknown, unique fingerprint(s), representative of the incremental backup fingerprint set difference (obtained in Step 304), is/are subsequently indexed in physical storage—e.g., the fingerprint store of the backup storage array (see e.g., FIG. 1C). Specifically, each unknown, unique fingerprint may be indexed in the fingerprint store within a new fingerprint store entry, which may include a mapping relating the unknown, unique fingerprint to the storage identifier mapped to the database data chunk that which the unknown, unique fingerprint uniquely identifies.

In Step 316, following the determination that the incremental backup fingerprint set difference (obtained in Step 304) is an empty set, or following the indexing of the unknown, unique fingerprint(s) (in Step 314), a previously stored full backup data file is identified. In one embodiment of the invention, the identified full backup data file may be relevant to the incremental backup data file (received in Step 300). That is, the incremental backup data file may relate to the full backup data file such that the former reflects database data changes to the latter, which may have transpired since a last database backup operation.

Turning to FIG. 3B, in Step 320, a backup tag associated with the full backup data file (identified in Step 316) is obtained. In one embodiment of the invention, the backup tag may represent a text attribute (i.e., metadata) that may be associated with or descriptive of the full backup data file.

In Step 322, a determination is made as to whether the backup tag (obtained in Step 320) matches a reserved incremental merge tag (described below). Accordingly, in one embodiment of the invention, if it is determined that the backup tag matches the reserved incremental merge tag, then the process proceeds to Step 324. On the other hand, in another embodiment of the invention, if it is alternatively determined that the backup tag does not match the reserved incremental merge tag, then the process alternatively proceeds to Step 330.

In one embodiment of the invention, the reserved incremental merge tag may represent a unique, prescribed text attribute, which when matched, instructs the backup storage system to perform a modified synthetic full backup operation. In traditional synthetic full backup operations, full backups are synthesized on the backup storage system through the merging of one or more incremental backups with the current full backup, which had previously been stored or synthesized thereon. The aforementioned merging of the current full backup with the incremental backup(s) produces a new current full backup, which replaces the previous current full backup. Subsequently, in traditional synthetic full backup operations, only the new current full backup for any given time may be maintained, whereas any previous current full backups are discarded.

In one embodiment of the invention, the above-mentioned modified synthetic full backup operation, triggered by a backup tag matching the reserved incremental merge tag, includes the performance of an additional step prior to the execution of a traditional synthetic full backup operation. More specifically, the additional step may entail cloning the current full backup prior to its merger with one or more incremental backups to produce a new current full backup. Accordingly, with the performance of the additional step, a history of successively stored or synthesized full backups may be maintained as full restore (or rollback) points for a given asset.

In Step 324, after determining (in Step 322) that the backup tag (obtained in Step 320) matches the above-described reserved incremental merge tag, the full backup data file (identified in Step 316) is cloned. In one embodiment of the invention, cloning of the full backup data file may entail generating a pointer-based snapshot of the full backup data file. That is, rather than cloning or copying the physical database data itself, associated with the full backup data file, generation of a pointer-based snapshot exercises the cloning or copying of the full backup content recipe (described above). Subsequently, the cloning process of the full backup data file is rapid despite the physical storage space consumed by the associated physical database data; and, further, the cloning process results in the obtaining of a full backup data file copy, which consumes little to no physical storage capacity.

In Step 326, the full backup data file copy (or more, specifically, a copy of the associated full backup content recipe) (obtained in Step 324) is stored in the backup file system. More specifically, in one embodiment of the invention, whereas the full backup data file had been stored in a first directory path, the full backup data file copy may be stored in a second (or different) directory path following the cloning process.

In Step 328, the full backup data file copy (obtained in Step 324) is catalogued with the client device (from which the remote write request had been received in Step 300). In one embodiment of the invention, cataloguing of the full backup data file copy may entail informing the client device of the existence of the full backup data file copy. Further, informing the client device may include relaying metadata pertaining to the full backup data file copy—e.g., a timestamp encoding a date and/or time during which the full backup data file (from which the full backup data file copy depends) had been stored or synthesized; a backup identifier or name associated with the full backup data file copy; a directory path in the backup file system at which the full backup data file copy may be found, etc.

In Step 330, following the determination that the backup tag (obtained in Step 320) does not match the reserved incremental merge tag, or following the cataloguing of the full backup data file copy with the client device (performed in Step 328), the incremental backup file (received in Step 300) is merged into/with the full backup data file (identified in Step 316). In one embodiment of the invention, the aforementioned merger may entail applying any database data changes, recorded in the incremental backup data file, onto the full backup data file, thereby creating a new full backup data file. Further, the new full backup data file may be retained in deduplicated form, similar to the full backup data file or the incremental backup data file. Accordingly, the new full backup data file may include a new full backup content recipe representative of the new full backup content.

In Step 332, the full backup data file (identified in Step 316) is overwritten by (or replaced with) the new full backup data file (obtained in Step 330). Thereafter, in Step 334, the backup tag (obtained in Step 320) is recorded as metadata associated with the new full backup data file. That is, in one embodiment of the invention, association of the backup tag—that had been associated with the full backup data file—may be extended to the new full backup data file, which stems from at least the full backup data file.

FIG. 4 shows an exemplary computing system in accordance with one or more embodiments of the invention. The computing system (400) may include one or more computer processors (402), non-persistent storage (404) (e.g., volatile memory, such as random access memory (RAM), cache memory), persistent storage (406) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface (412) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), input devices (410), output devices (408), and numerous other elements (not shown) and functionalities. Each of these components is described below.

In one embodiment of the invention, the computer processor(s) (402) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a central processing unit (CPU) and/or a graphics processing unit (GPU). The computing system (400) may also include one or more input devices (410), such as a touchscreen, keyboard, mouse, microphone, touchpad, electronic pen, or any other type of input device. Further, the communication interface (412) may include an integrated circuit for connecting the computing system (400) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, mobile network, or any other type of network) and/or to another device, such as another computing device.

In one embodiment of the invention, the computing system (400) may include one or more output devices (408), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, touchscreen, cathode ray tube (CRT) monitor, projector, or other display device), a printer, external storage, or any other output device. One or more of the output devices may be the same or different from the input device(s). The input and output device(s) may be locally or remotely connected to the computer processor(s) (402), non-persistent storage (404), and persistent storage (406). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.

Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims

1. A method for performing backup operations, comprising:

receiving a first remote write request comprising a first incremental backup;

identifying a first full backup relevant to the first incremental backup;

making a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag;

cloning, based on the determination, the first full backup to obtain a first full backup copy; and

merging the first incremental backup with the first full backup to produce a second full backup.

2. The method of claim 1, wherein the first remote write request is a remote procedure call (RPC) employed in a distributed file system protocol.

3. The method of claim 1, wherein the first full backup copy is a pointer-based snapshot of the first full backup.

4. The method of claim 1, wherein the first full backup is stored in a first directory of a physical file system, wherein the first full backup copy is stored in a second directory of the physical file system following the cloning of the first full backup.

5. The method of claim 4, wherein the second full backup replaces the first full backup in the first directory of the physical file system following the cloning of the first full backup.

6. The method of claim 1, wherein the first full backup and the first incremental backup each exist in deduplicated form.

7. The method of claim 6, wherein the first full backup, in deduplicated form, comprises a full backup content recipe comprising a first sequence of cryptographic fingerprints, wherein the first incremental backup, in deduplicated form, comprises an incremental backup content recipe comprising a second sequence of cryptographic fingerprints.

8. The method of claim 1, further comprising:

receiving a second remote write request comprising a second incremental backup;

identifying the second full backup as being relevant to the second incremental backup;

cloning the second full backup to obtain a second full backup copy; and

merging the second incremental backup with the second full backup to produce a third full backup.

9. A system, comprising:

a backup storage system comprising a first computer processor configured to: receive a remote write request comprising an incremental backup; identify a first full backup relevant to the incremental backup; make a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag; clone, based on the determination, the first full backup to obtain a first full backup copy; and merge the incremental backup with the first full backup to produce a second full backup.

10. The system of claim 9, wherein the backup storage system further comprises a backup file system service executing on the first computer processor, wherein the backup file system service implements a portion of a distributed file system.

11. The system of claim 10, further comprising:

a client device operatively connected to the backup storage system and comprising a second computer processor configured to: submit the remote write request to the backup storage system using a distributed file system protocol employed by the distributed file system.

12. The system of claim 11, wherein the client device further comprises a backup file system interface executing on the second computer processor, wherein the backup file system interface implements another portion of the distributed file system.

13. A non-transitory computer readable medium (CRM) comprising computer readable program code, which when executed by a computer processor, enables the computer processor to:

receive a first remote write request comprising a first incremental backup;

identify a first full backup relevant to the first incremental backup;

make a determination that a backup tag associated with the first full backup matches a reserved incremental merge tag;

clone, based on the determination, the first full backup to obtain a first full backup copy; and

merge the first incremental backup with the first full backup to produce a second full backup.

14. The non-transitory CRM of claim 13, wherein the first remote write request is a remote procedure call (RPC) employed in a distributed file system protocol.

15. The non-transitory CRM of claim 13, wherein the first full backup copy is a pointer-based snapshot of the first full backup.

16. The non-transitory CRM of claim 13, wherein the first full backup is stored in a first directory of a physical file system, wherein the first full backup copy is stored in a second directory of the physical file system following the cloning of the first full backup.

17. The non-transitory CRM of claim 16, wherein the second full backup replaces the first full backup in the first directory of the physical file system following the cloning of the first full backup.

18. The non-transitory CRM of claim 13, wherein the first full backup and the first incremental backup each exist in deduplicated form.

19. The non-transitory CRM of claim 18, wherein the first full backup, in deduplicated form, comprises a full backup content recipe comprising a first sequence of cryptographic fingerprints, wherein the first incremental backup, in deduplicated form, comprises an incremental backup content recipe comprising a second sequence of cryptographic fingerprints.

20. The non-transitory CRM of claim 13, comprising computer readable program code, which when executed by the computer processor, further enables the computer processor to:

receive a second remote write request comprising a second incremental backup;

identify the second full backup as being relevant to the second incremental backup;

clone the second full backup to obtain a second full backup copy; and

merge the second incremental backup with the second full backup to produce a third full backup.