Method and apparatus for data recovery using storage based journaling
A storage system maintains a journal and a snapshot of one or more data volumes. Two journal entry types are maintained, an AFTER journal entry and a BEFORE journal entry. Two modes of data recovery are provided: “fast” recovery and “undo-able” recovery. A combination of both recovery modes allows the user to quickly recover a targeted data state.
This application is related to the following commonly owned and co-pending U.S. applications:
- “Method and Apparatus for Data Recovery Using Storage Based Journaling,” Attorney Docket Number 16869B-082700US, and
- “Method and Apparatus for Synchronizing Applications for Data Recovery Using Storage Based Journaling,” Attorney Docket Number 16869B-082900US, both of which are herein incorporated by reference for all purposes.
The present invention is related to computer storage and in particular to the recovery of data.
Several methods are conventionally used to prevent the loss of data. Typically, data is backed up in a periodic manner (e.g., once a day) by a system administrator. Many systems are commercially available which provide backup and recovery of data; e.g., Veritas NetBackup, Legato/Networker, and so on. Another technique is known as volume shadowing. This technique produces a mirror image of data onto a secondary storage system as it is being written to the primary storage system.
Journaling is a backup and restore technique commonly used in database systems. An image of the data to be backed up is taken. Then, as changes are made to the data, a journal of the changes is maintained. Recovery of data is accomplished by applying the journal to an appropriate image to recover data at any point in time. Typical database systems, such as Oracle, can perform journaling.
Except for database systems, however, there is no way to recover data at any point in time. Even for database systems, applying a journal takes time, because the procedure includes:
- reading the journal data from storage (e.g., disk);
- analyzing the journal to determine where in the journal the desired data can be found; and
- applying the journal data to a suitable image of the data to reproduce the activities performed on the data, which usually involves accessing the image and writing out data as the journal is applied.
Also, if an application running on the database system interacts with another application (regardless of whether it is a database system or not), then there is no way to recover its data at any point in time. This is because there is no coordination mechanism to recover the data of the other application.
Recovering data at any point in time addresses the following types of administrative requirements. For example, a typical request might be, “I deleted a file by mistake at around 10:00 am yesterday. I have to recover the file just before it was deleted.”
If the data is not in a database system, this kind of request cannot be conveniently, if at all, serviced. A need therefore exists for processing data in a manner that facilitates recovery of lost data. In particular, a need exists for data processing that facilitates data recovery in user environments other than a database application, or a database application interacting with other applications.
SUMMARY OF THE INVENTION
The invention is directed to a method and apparatus for data recovery and comprises performing a fast recovery mode operation in conjunction with an undo-able recovery mode operation. In the fast recovery mode operation, after-journal entries are applied to a snapshot to update the snapshot. In the undo-able recovery mode operation, a before-journal entry is taken of the snapshot before an after-journal entry is applied to it. A user can perform one or more undo operations when a snapshot has been updated in the undo-able recovery mode.
BRIEF DESCRIPTION OF THE DRAWINGS
Aspects, advantages and novel features of the present invention will become apparent from the following description of the invention presented in conjunction with the accompanying drawings:
The backup and recovery system shown in
The host 110 typically will have one or more user applications (APP) 112 executing on it. These applications will read and/or write data to storage media contained in the data volumes 101 of storage system 100. Thus, applications 112 and the data volumes 101 represent the target resources to be protected. It can be appreciated that data used by the user applications can be stored in one or more data volumes.
In accordance with the invention, a journal group (JNLG) 102 is defined. The data volumes 101 are organized into the journal group. In accordance with the present invention, a journal group is the smallest unit of data volumes for which journaling of the write operations from the host 110 to the data volumes is guaranteed. The associated journal records the write operations from the host to the data volumes in their proper sequence. The journal data produced by the journaling activity can be stored in one or more journal volumes (JVOL) 106.
The host 110 also includes a recovery manager (RM) 111. This component provides high-level coordination of the backup and recovery operations. The recovery manager is discussed further below.
The storage system 100 provides a snapshot (SS) 105 of the data volumes comprising a journal group. For example, the snapshot 105 is representative of the data volumes 101 in the journal group 102 at the point in time that the snapshot was taken. Conventional methods are known for producing the snapshot image. One or more snapshot volumes (SVOL) 107, which contain the snapshot data, are provided in the storage system. A snapshot can be contained in one or more snapshot volumes. Though the disclosed embodiment illustrates separate storage components for the journal data and the snapshot data, it can be appreciated that other implementations can provide a single storage component for storing the journal data and the snapshot data.
A management table (MT) 108 is provided to store the information relating to the journal group 102, the snapshot 105, and the journal volume(s) 106.
A controller component 140 is also provided which coordinates the journaling of write operations and snapshots of the data volumes, and the corresponding movement of data among the different storage components 101, 106, 107. It can be appreciated that the controller component is a logical representation of a physical implementation which may comprise one or more sub-components distributed within the storage system 100.
The Journal Header 219 comprises an offset number (JH_OFS) 211. The offset number identifies a particular data volume 101 in the journal group 102. In this particular implementation, the data volumes are ordered as the 0th data volume, the 1st data volume, the 2nd data volume and so on. The offset numbers might be 0, 1, 2, etc.
The starting address in the data volume (identified by the offset number 211) to which the write data is to be written is stored in an address field (JH_ADR) 212 of the Journal Header 219. For example, the address can be represented as a block number (LBA, Logical Block Address).
A field in the Journal Header 219 stores a data length (JH_LEN) 213, which represents the data length of the write data. Typically it is represented as a number of blocks.
A field in the Journal Header 219 stores the write time (JH_TIME) 214, which represents the time when the write request arrives at the storage system 100. The write time can include the calendar date, hours, minutes, seconds and even milliseconds. This time can be provided by the disk controller 140 or by the host 110. For example, in a mainframe computing environment, two or more mainframe hosts share a timer, called the Sysplex Timer, and can provide the time in a write command when it is issued.
A sequence number (JH_SEQ) 215 is assigned to each write request and stored in a field in the Journal Header 219. Every sequence number within a given journal group 102 is unique. The sequence number is assigned to a journal entry when the entry is created.
A journal volume identifier (JH_JVOL) 216 is also stored in the Journal Header 219. The volume identifier identifies the journal volume 106 associated with the Journal Data 225. The identifier is indicative of the journal volume containing the Journal Data. It is noted that the Journal Data can be stored in a journal volume that is different from the journal volume which contains the Journal Header.
A journal data address (JH_JADR) 217 stored in the Journal Header 219 contains the beginning address of the Journal Data 225 in the associated journal volume 106 that contains the Journal Data.
A journal type field (JH_TYPE) 218 identifies the type of journal entry. In accordance with the invention, two types of journal entries are kept: (1) an AFTER journal and (2) a BEFORE journal. An AFTER journal entry contains the data that is contained in the write operation for which a journal entry is made. A BEFORE journal entry contains the original data of the area in storage that is the target of a write operation. A BEFORE journal entry therefore represents the contents “before” the write operation is performed. The purpose of maintaining BEFORE journal entries will be discussed below.
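The header fields described above can be pictured as a single record. The following Python sketch is illustrative only: the attribute names mirror the field labels in the text, while the Python types (integers for volume identifiers and addresses, `datetime` for the write time) and the class names `JournalHeader` and `JournalType` are assumptions, not an on-disk format defined by the storage system.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class JournalType(Enum):
    AFTER = "AFTER"    # carries the data written by the host's write operation
    BEFORE = "BEFORE"  # carries the data that the write operation overwrote


@dataclass(frozen=True)
class JournalHeader:
    jh_ofs: int            # JH_OFS 211: offset number of the target data volume
    jh_adr: int            # JH_ADR 212: starting LBA of the write in that data volume
    jh_len: int            # JH_LEN 213: length of the write data, in blocks
    jh_time: datetime      # JH_TIME 214: arrival time of the write request
    jh_seq: int            # JH_SEQ 215: sequence number, unique within the journal group
    jh_jvol: int           # JH_JVOL 216: journal volume holding the Journal Data
    jh_jadr: int           # JH_JADR 217: start of the Journal Data in that volume
    jh_type: JournalType   # JH_TYPE 218: AFTER or BEFORE


header = JournalHeader(
    jh_ofs=1, jh_adr=2048, jh_len=8,
    jh_time=datetime(2003, 6, 1, 10, 30),
    jh_seq=42, jh_jvol=0, jh_jadr=4096,
    jh_type=JournalType.AFTER,
)
```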
Journal Header 219 and Journal Data 225 are contained in chronological order in their respective areas in the journal volume 106. Thus, the order in which the Journal Header and the Journal Data are stored in the journal volume is the same order as the assigned sequence number. As will be discussed below, an aspect of the present invention is that the journal information 219, 225 wrap within their respective areas 210, 220.
A journal attribute (GRATTR) 312 is associated with the journal group 102. In accordance with this particular implementation, two attributes are defined: MASTER and RESTORE. The MASTER attribute indicates the journal group is being journaled. The RESTORE attribute indicates that the journal group is being restored from a journal.
A journal status (GRSTS) 315 is associated with the journal group 102. There are two statuses: ACTIVE and INACTIVE.
The management table includes a field to hold a sequence counter (SEQ) 313. This counter serves as the source of the sequence numbers used in the Journal Header 219. When a new journal entry is created, the current value of the sequence counter 313 is read and assigned to the new entry. The counter is then incremented and written back into the management table.
The number (NUM_DVOL) 314 of data volumes 101 contained in a given journal group 102 is stored in the management table.
A data volume list (DVOL_LIST) 320 lists the data volumes in a journal group. In a particular implementation, DVOL_LIST is a pointer to the first entry of a data structure which holds the data volume information. This can be seen in
The management table includes a field to store the number of journal volumes (NUM_JVOL) 330 that are being used to contain the data (journal header and journal data) associated with a journal group 102.
As described in
The management table includes fields to store pointers to different parts of the data areas 210, 220 to facilitate wrapping. Fields are provided to identify where the next journal entry is to be stored. A field (JI_HEAD_VOL) 331 identifies the journal volume 106 that contains the Journal Header Area 210 which will store the next new Journal Header 219. A field (JI_HEAD_ADR) 332 identifies an address on the journal volume of the location in the Journal Header Area where the next Journal Header will be stored. The journal volume that contains the Journal Data Area 220 into which the journal data will be stored is identified by information in a field (JI_DATA_VOL) 335. A field (JI_DATA_ADR) 336 identifies the specific address in the Journal Data Area where the data will be stored. Thus, the next journal entry to be written is “pointed” to by the information contained in the “JI_” fields 331, 332, 335, 336.
The management table also includes fields which identify the “oldest” journal entry. The use of this information will be described below. A field (JO_HEAD_VOL) 333 identifies the journal volume which stores the Journal Header Area 210 that contains the oldest Journal Header 219. A field (JO_HEAD_ADR) 334 identifies the address within the Journal Header Area of the location of the journal header of the oldest journal. A field (JO_DATA_VOL) 337 identifies the journal volume which stores the Journal Data Area 220 that contains the data of the oldest journal. The location of the data in the Journal Data Area is stored in a field (JO_DATA_ADR) 338.
The management table includes a list of journal volumes (JVOL_LIST) 340 associated with a particular journal group 102. In a particular implementation, JVOL_LIST is a pointer to a data structure of information for journal volumes. As can be seen in
The management table includes a list (SS_LIST) 350 of snapshot images 105 associated with a given journal group 102. In this particular implementation, SS_LIST is a pointer to snapshot information data structures, as indicated in
Each snapshot information data structure also includes a list of snapshot volumes 107 (
Further in accordance with the invention, a single sequence of numbers (SEQ) 313 is shared by the snapshots and the journal entries: as each snapshot or journal entry is created, it is assigned the next number in that sequence. The purpose of associating the same sequence of numbers with both the snapshots and the journal entries will be discussed below.
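A minimal sketch of the shared counter, assuming (hypothetically) that it starts at zero and ignoring wraparound. Journal entries and snapshots draw from the same `next_seq` method, so comparing two numbers is enough to tell which event came first. `ManagementTable` and `next_seq` are illustrative names, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class ManagementTable:
    seq: int = 0  # SEQ 313: the next sequence number for this journal group

    def next_seq(self) -> int:
        """Read the counter, then increment it and write it back."""
        assigned = self.seq
        self.seq += 1
        return assigned


mt = ManagementTable()
write_1 = mt.next_seq()     # journal entry for a write
snapshot_a = mt.next_seq()  # snapshot taken after that write
write_2 = mt.next_seq()     # journal entry for a later write

# Because the numbers come from one counter, only write_2 postdates snapshot_a.
later_than_snapshot = [s for s in (write_1, write_2) if s > snapshot_a]
```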
Continuing with
In a step 420, the recovery manager 111 will initiate the journaling process. Suitable communication(s) are made to the storage system 100 to perform journaling. In a step 425, the storage system will make a journal entry (also referred to as an “AFTER journal”) for each write operation that issues from the host 110.
With reference to
The fields JI_DATA_VOL 335 and JI_DATA_ADR 336 in the management table identify the journal volume and the address in the Journal Data Area 220 at which the data associated with the write operation is to be stored. The JI_DATA_VOL and JI_DATA_ADR fields are copied to JH_JVOL 216 and to JH_JADR 217, respectively, of the Journal Header, thus providing the Journal Header with a pointer to its corresponding Journal Data. The data of the write operation is then stored at that location.
The JI_HEAD_VOL 331 and JI_HEAD_ADR 332 fields are updated to point to the next Journal Header 219 for the next journal entry. This involves taking the next contiguous Journal Header entry in the Journal Header Area 210. Likewise, the JI_DATA_ADR field (and perhaps JI_DATA_VOL field) is updated to reflect the beginning of the Journal Data Area for the next journal entry. This involves advancing to the next available location in the Journal Data Area. These fields therefore can be viewed as pointing to a list of journal entries. Journal entries in the list are linked together by virtue of the sequential organization of the Journal Headers 219 in the Journal Header Area 210.
When the end of the Journal Header Area 210 is reached, the Journal Header 219 for the next journal entry wraps to the beginning of the Journal Header Area. The Journal Data 225 wraps in the same way within the Journal Data Area 220. To prevent earlier journal entries from being overwritten, the present invention provides a procedure to free up entries in the journal volume 106. This aspect of the invention is discussed below.
For the very first journal entry, the JO_HEAD_VOL field 333, JO_HEAD_ADR field 334, JO_DATA_VOL field 337, and JO_DATA_ADR field 338 are set to the contents of their corresponding “JI_” fields. As will be explained, the “JO_” fields point to the oldest journal entry. Thus, as new journal entries are made, the “JI_” fields advance while the “JO_” fields do not. Updating of the “JO_” fields is discussed below.
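The wrapping behavior of the “JI_” and “JO_” pointers can be sketched as a ring buffer. This Python sketch simplifies heavily and is not the described implementation: it models one journal area as a fixed number of equal-sized slots on a single volume, whereas the text allows variable-length Journal Data spread over several journal volumes. `JournalArea`, `append`, and `free_oldest` are hypothetical names.

```python
from typing import List, Optional


class JournalArea:
    """A fixed-size circular journal area with "JI_" and "JO_" style pointers."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[object]] = [None] * capacity
        self.ji = 0                     # like the "JI_" fields: slot for the next new entry
        self.jo: Optional[int] = None   # like the "JO_" fields: slot of the oldest entry
        self.count = 0

    def append(self, entry: object) -> int:
        if self.count == len(self.slots):
            # Writing now would overwrite the oldest entry, so refuse.
            raise RuntimeError("journal area full; free old entries first")
        slot = self.ji
        self.slots[slot] = entry
        if self.jo is None:
            self.jo = slot              # very first entry: copy the "JI_" position
        self.ji = (self.ji + 1) % len(self.slots)  # advance, wrapping at the end
        self.count += 1
        return slot

    def free_oldest(self) -> object:
        if self.count == 0 or self.jo is None:
            raise RuntimeError("journal area is empty")
        entry = self.slots[self.jo]
        self.slots[self.jo] = None
        self.jo = (self.jo + 1) % len(self.slots)  # the oldest pointer wraps too
        self.count -= 1
        return entry


area = JournalArea(capacity=3)
for write in ("w1", "w2", "w3"):
    area.append(write)
oldest = area.free_oldest()         # frees "w1"
wrapped_slot = area.append("w4")    # lands in slot 0 again
```

Refusing to append when the area is full mirrors the requirement that entries be freed before the “JI_” pointer wraps onto the oldest entry.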
Continuing with the flowchart of
The snapshot is stored in one (or more) snapshot volumes (SVOL) 107. A suitable amount of memory is allocated for fields 355-357. The information relating to the SVOLs for storing the snapshot are then stored into the fields 355-357. If additional volumes are required to store the snapshot, then additional memory is allocated for fields 355-357.
Recovering data typically requires recovering the data state of at least a portion of the data volumes 101 at a specific time. Generally, this is accomplished by applying one or more journal entries to a snapshot that was taken earlier in time relative to the journal entries. In the disclosed illustrative embodiment, the sequence number SEQ 313 is incremented each time it is assigned to a journal entry or to a snapshot. Therefore, it is a simple matter to identify which journal entries can be applied to a selected snapshot: those journal entries whose associated sequence numbers (JH_SEQ, 215) are greater than the sequence number (SS_SEQ, 351) associated with the selected snapshot.
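The selection rule reduces to a comparison of sequence numbers. A minimal Python sketch, assuming entries are held in memory as `(seq, data)` pairs and ignoring sequence-number wraparound; `Entry` and `applicable_entries` are hypothetical names.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    seq: int     # JH_SEQ 215
    data: bytes


def applicable_entries(journal: List[Entry], ss_seq: int) -> List[Entry]:
    """Entries whose sequence number exceeds the snapshot's SS_SEQ, oldest first."""
    return sorted((e for e in journal if e.seq > ss_seq), key=lambda e: e.seq)


journal = [Entry(3, b"c"), Entry(5, b"e"), Entry(7, b"g")]
print([e.seq for e in applicable_entries(journal, ss_seq=4)])  # [5, 7]
```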
For example, the administrator may specify some point in time (the “target time”), presumably a time that is earlier than the time at which the data in the data volume was lost or otherwise corrupted. The time fields SS_TIME 352 of the snapshots are searched until a snapshot whose time is earlier than the target time is found. Next, the Journal Headers 219 in the Journal Header Area 210 are searched, beginning from the “oldest” Journal Header. The oldest Journal Header can be identified by the “JO_” fields 333, 334, 337, and 338 in the management table. The Journal Headers are searched sequentially in the area 210 for the first header whose sequence number JH_SEQ 215 is greater than the sequence number SS_SEQ 351 associated with the selected snapshot. The selected snapshot is then incrementally updated by applying each journal entry, one at a time and in sequential order, thus reproducing the sequence of write operations. This continues as long as the time field JH_TIME 214 of the journal entry is prior to the target time. The update ceases with the first journal entry whose time field 214 is past the target time.
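The point-in-time procedure above can be sketched as follows, under stated assumptions: times are plain integers, a data volume is modeled as a dictionary from block address to block contents, and each AFTER entry writes a single block. The most recent snapshot earlier than the target time is chosen here to minimize replay, which is an assumption. Unlike the text, which updates the selected snapshot in place, this sketch rolls forward a copy so the stored snapshot stays intact. An entry stamped exactly at the target time is applied here; the text does not settle that boundary case. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Snapshot:
    ss_seq: int                 # SS_SEQ
    ss_time: int                # SS_TIME (integer time assumed)
    image: Dict[int, bytes]     # block address -> block contents


@dataclass
class AfterEntry:
    jh_seq: int                 # JH_SEQ
    jh_time: int                # JH_TIME
    jh_adr: int                 # block written (single-block writes assumed)
    data: bytes


def recover(snapshots: List[Snapshot], journal: List[AfterEntry],
            target_time: int) -> Dict[int, bytes]:
    # Choose the most recent snapshot taken before the target time.
    candidates = [s for s in snapshots if s.ss_time < target_time]
    if not candidates:
        raise ValueError("no snapshot precedes the target time")
    base = max(candidates, key=lambda s: s.ss_seq)
    volume = dict(base.image)   # roll forward a copy; the stored snapshot is untouched
    for entry in sorted(journal, key=lambda e: e.jh_seq):
        if entry.jh_seq <= base.ss_seq:
            continue            # already reflected in the snapshot
        if entry.jh_time > target_time:
            break               # first entry past the target time ends the update
        volume[entry.jh_adr] = entry.data
    return volume


snapshots = [
    Snapshot(ss_seq=0, ss_time=100, image={0: b"A", 1: b"B"}),
    Snapshot(ss_seq=3, ss_time=200, image={0: b"A2", 1: b"B"}),
]
journal = [
    AfterEntry(jh_seq=1, jh_time=110, jh_adr=0, data=b"A1"),
    AfterEntry(jh_seq=2, jh_time=150, jh_adr=0, data=b"A2"),
    AfterEntry(jh_seq=4, jh_time=210, jh_adr=1, data=b"B1"),
    AfterEntry(jh_seq=5, jh_time=260, jh_adr=1, data=b"B2"),
]
state_at_250 = recover(snapshots, journal, target_time=250)
```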
In accordance with one aspect of the invention, a single snapshot is taken. All journal entries subsequent to that snapshot can then be applied to reconstruct the data state at a given time. In accordance with another aspect of the present invention, multiple snapshots can be taken. This is shown in
If the free space falls below a predetermined threshold, then in a step 720 some of the journal entries are applied to a snapshot to update the snapshot. In particular, the oldest journal entry(ies) are applied to the snapshot.
Referring to
As an observation, those of ordinary skill will appreciate that the sequence numbers will eventually wrap and start counting from zero again. It is well within the level of ordinary skill to provide a suitable mechanism for accounting for this when comparing sequence numbers.
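One well-known mechanism of this kind, substituted here for illustration since the text does not prescribe one, is serial number arithmetic as defined in RFC 1982: a number is treated as newer if it lies less than half the number space ahead of the other. A minimal Python sketch, assuming 32-bit unsigned sequence numbers; `seq_newer` is a hypothetical helper.

```python
SEQ_BITS = 32                  # assumption: 32-bit unsigned sequence numbers
SEQ_MOD = 1 << SEQ_BITS
HALF = SEQ_MOD >> 1


def seq_newer(a: int, b: int) -> bool:
    """True if sequence number a was issued after b, allowing for wraparound."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < HALF


wrapped = seq_newer(2, SEQ_MOD - 3)  # True: the counter wrapped past zero
```

The comparison is only meaningful while the live journal entries and snapshots span less than half of the sequence space.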
Continuing with
Thus, in step 730, if the threshold for stopping the process is met (i.e., free space exceeds threshold), then the process stops. Otherwise, step 720 is repeated for the next oldest journal entry. Steps 730 and 720 are repeated until the free space level meets the threshold criterion used in step 730.
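The loop formed by steps 720 and 730 can be sketched as below. This is a simplification under several assumptions: free space is counted in journal entries rather than bytes, a single snapshot precedes every entry, each entry writes one block, and the snapshot's sequence number is advanced as entries are folded into it. The two thresholds `start_below` and `stop_at` are hypothetical parameters; the text does not say whether the starting and stopping thresholds are the same.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass
class Entry:
    seq: int      # JH_SEQ
    adr: int      # block written
    data: bytes


@dataclass
class Snapshot:
    ss_seq: int
    image: Dict[int, bytes] = field(default_factory=dict)


def reclaim(journal: Deque[Entry], snap: Snapshot, capacity: int,
            start_below: int, stop_at: int) -> int:
    """Fold the oldest entries into the snapshot until enough space is free.

    Returns the number of journal entries freed.
    """
    free = capacity - len(journal)
    if free >= start_below:                 # enough free space: nothing to do
        return 0
    freed = 0
    while journal and free < stop_at:       # step 730: stop once the threshold is met
        oldest = journal.popleft()          # step 720: take the oldest entry
        if oldest.seq > snap.ss_seq:        # apply it only if it postdates the snapshot
            snap.image[oldest.adr] = oldest.data
            snap.ss_seq = oldest.seq        # assumption: snapshot now reflects this entry
        free += 1
        freed += 1
    return freed


journal = deque(Entry(seq=i, adr=i % 2, data=bytes([i])) for i in range(1, 6))
snap = Snapshot(ss_seq=0)
freed = reclaim(journal, snap, capacity=6, start_below=2, stop_at=4)
```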
If such a snapshot can be found in step 721, then the earlier journal entries can be removed without having to apply them to a snapshot. Thus, in a step 722, the “JO_” fields (JO_HEAD_VOL 333, JO_HEAD_ADR 334, JO_DATA_VOL 337, and JO_DATA_ADR 338) are simply moved to a point in the list of journal entries that is later in time than the selected snapshot. If no such snapshot can be found, then in a step 723 the oldest journal entry is applied to a snapshot that is earlier in time than the oldest journal entry, as discussed for step 720.
Still another alternative for step 721 is simply to select the most recent snapshot. All the journal entries whose sequence numbers are less than that of the most recent snapshot can be freed. Again, this simply involves updating the “JO_” fields so they point to the first journal entry whose sequence number is greater than that of the most recent snapshot. Recall that an aspect of the invention is being able to recover the data state for any desired point in time. This can be accomplished by storing as many journal entries as possible and then applying the journal entries to a snapshot to reproduce the write operations. This last embodiment can remove large numbers of journal entries, thus reducing the range of time within which the data state can be recovered. Nevertheless, in some operating environments it may be desirable to remove large numbers of journal entries.
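Freeing entries without applying them is just a pointer move. A minimal sketch, assuming the journal is a linear Python list of JH_SEQ values (no wraparound) and that the “JO_” fields are represented by a single index; `advance_oldest_pointer` is a hypothetical helper.

```python
from typing import List


def advance_oldest_pointer(journal_seqs: List[int], jo: int, newest_ss_seq: int) -> int:
    """Return the new oldest-entry index after freeing entries older than a snapshot.

    journal_seqs holds JH_SEQ values in journal order; jo is the current
    oldest-entry index. Entries before the returned index are freed without
    being applied to any snapshot.
    """
    i = jo
    while i < len(journal_seqs) and journal_seqs[i] < newest_ss_seq:
        i += 1
    return i


new_jo = advance_oldest_pointer([1, 2, 4, 5, 7], jo=0, newest_ss_seq=3)
```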
In another aspect of the present invention, recovery of the production volume(s) 101 can be facilitated by allowing the user to interact with the recovery process. A “fast recovery” can be performed which quickly recovers the data state to a point in time prior to a target time. A more granular recovery procedure can then be performed which allows a user to hone in on the target data state. The user can perform “undo-able recoveries” to inspect the data state in a trial and error manner by allowing the user to step forward and backward (undo operation) in time. This aspect of the invention allows a user to be less specific as to the time of the desired data state. The target time specified by the user need only be a time that he is certain is prior to the time of the target data state. It is understood that “the target data state” can refer to any desired state of the data.
The fields related to the AFTER journal entries include a field to store the number of journal volumes (NUM_JVOLa) 330 that are used to contain the data (journal header and journal data) associated with the AFTER journal entries for a journal group 102.
As described in
The management table includes fields to store pointers to different parts of the data areas 210, 220 to facilitate wrapping. Pointer-type information is provided to facilitate identifying where the next journal entry is to be stored. A set of such information (“AFTER journal pointers”) is provided for the AFTER journal entries. A field (JVOL_PTRa) 331 in the management table identifies the location of the AFTER journal pointers.
The AFTER journal entries are stored in one or more journal volumes, separate from the BEFORE journal entries. A field (JI_HEAD_VOL) 331a identifies the journal volume 106 that contains the Journal Header Area 210 from which the next Journal Header 219 will be obtained. A field (JI_HEAD_ADR) 331b identifies where in the Journal Header Area the next Journal Header is located. The journal volume that contains the Journal Data Area 220 into which the journal data will be stored is identified by information in a field (JI_DATA_VOL) 331e. A field (JI_DATA_ADR) 331f identifies the specific address in the Journal Data Area where the data will be stored. Thus, the next AFTER journal entry to be written is “pointed” to by the information contained in the “JI_” fields 331a, 331b, 331e, 331f.
The AFTER journal pointers also include fields which identify the “oldest” AFTER journal entry. The use of this information will be described below. A field (JO_HEAD_VOL) 331c identifies the journal volume which stores the Journal Header Area 210 that contains the oldest Journal Header 219. A field (JO_HEAD_ADR) 331d identifies the address within the Journal Header Area of the location of the journal header of the oldest journal entry. A field (JO_DATA_VOL) 331g identifies the journal volume which stores the Journal Data Area 220 that contains the data of the oldest journal entry. The location of the data in the Journal Data Area is stored in a field (JO_DATA_ADR) 331h.
The management table includes a list of journal volumes (JVOL_LISTa) 340 associated with the AFTER journal entries of a journal group 102. In a particular implementation, JVOL_LISTa is a pointer to a data structure of information for journal volumes. As can be seen in
The management table also includes a set of similar fields for managing the BEFORE journal entries. The fields related to the BEFORE journal entries include a field to store the number of journal volumes (NUM_JVOLb) 332 that are being used to contain the data (journal header and journal data) associated with the BEFORE journal entries for a journal group 102.
As discussed above for the AFTER journal entries, an aspect of the invention is that the data areas 210, 220 wrap. The management table includes fields to store pointers to different parts of the data areas 210, 220 to facilitate wrapping. Pointer-type information is provided to facilitate identifying where the next BEFORE journal entry is to be stored. A set of such information (“BEFORE journal pointers”) is provided for the BEFORE journal entries. A field (JVOL_PTRb) 333 in the management table identifies the location of the BEFORE journal pointers.
The BEFORE journal entries are stored in one or more journal volumes, separate from the journal volume(s) used to store the AFTER journal entries. A field (JI_HEAD_VOL) 332a identifies the journal volume 106 that contains the Journal Header Area 210 from which the next Journal Header 219 will be obtained. A field (JI_HEAD_ADR) 332b identifies where in the Journal Header Area the next Journal Header is located. The journal volume that contains the Journal Data Area 220 into which the journal data will be stored is identified by information in a field (JI_DATA_VOL) 332e. A field (JI_DATA_ADR) 332f identifies the specific address in the Journal Data Area where the data will be stored. Thus, the next BEFORE journal entry to be written is “pointed” to by the information contained in the “JI_” fields 332a, 332b, 332e, 332f.
The BEFORE journal pointers also include fields which identify the “oldest” BEFORE journal entry. The use of this information will be described below. A field (JO_HEAD_VOL) 332c identifies the journal volume which stores the Journal Header Area 210 that contains the oldest Journal Header 219. A field (JO_HEAD_ADR) 332d identifies the address within the Journal Header Area of the location of the journal header of the oldest journal entry. A field (JO_DATA_VOL) 332g identifies the journal volume which stores the Journal Data Area 220 that contains the data of the oldest journal entry. The location of the data in the Journal Data Area is stored in a field (JO_DATA_ADR) 332h.
The management table includes a list of journal volumes (JVOL_LISTb) 341 associated with the BEFORE journal entries of a journal group 102. In a particular implementation, JVOL_LISTb is a pointer to a data structure of information for journal volumes. As can be seen in
The recovery manager 111 provides the following interface to the storage system for the aspect of the invention which provides for “fast” and “undo-able” recovery modes. The interface is shown in the form of an application programming interface (API). The functionality and needed information (parameters) are described. It can be appreciated that any suitable programming language can be used.
BACKUP journal_volume
- This initiates backup processing in the storage system 100. More specifically, the logging of AFTER journal entries is initiated for each write operation to the data volumes 101. The parameter journal_volume identifies the journal volume 106 that contains the journal entries. As discussed above, an initial snapshot is taken after journaling commences.
RECOVER_PH1 journal_volume target_time
- This initiates a PHASE I recovery process. This recovery is the procedure discussed above. Briefly, AFTER journal entries are applied to an appropriate snapshot. The journal entries are contained in the volume(s) identified by journal_volume. The desired data state is specified by target_time. The target_time can be a time format (e.g., year:month:date:hh:mm). Alternatively, the target_time can be a journal sequence number 215, so that journal entries subsequent to the sequence number associated with the snapshot and up to the specified sequence number are applied. Still another alternative is that the target_time is simply the number of journal entries to be applied to a snapshot (e.g., apply the next one hundred journal entries).
- Depending on configuration and storage resources, the snapshot can be copied to the production volume. Data recovery can then proceed on the production volume.
RECOVER_PH2 journal_volume_1 journal_volume_2 target_time
- This initiates a PHASE II recovery process. As will be discussed in more detail below, this procedure involves making a BEFORE journal entry before applying each AFTER journal entry to a snapshot. This recovery process therefore allows for “un-doing” an update operation on a snapshot. The AFTER journal entries are located in the volume identified by journal_volume_1. The BEFORE journal entries are located in the volume identified by journal_volume_2. The desired data state is specified by target_time. The target_time can be a time format (e.g., year:month:date:hh:mm).
- Alternatively, the target_time can be a journal sequence number 215, so that journal entries subsequent to the sequence number associated with the snapshot and up to the specified sequence number are applied. Still another alternative is that the target_time is simply the number of journal entries to be applied to a snapshot (e.g., apply the next one hundred journal entries).
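The three accepted forms of target_time can be told apart with a tagged value. In this Python sketch a `datetime` means a time, while the hypothetical wrapper types `BySeq` and `ByCount` stand for a sequence number and an entry count; the text does not specify how the storage system distinguishes the forms, so this encoding is an assumption. The journal is assumed to be in sequence order with non-decreasing times.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import takewhile
from typing import List, NamedTuple, Union


class Entry(NamedTuple):
    seq: int         # JH_SEQ
    time: datetime   # JH_TIME


@dataclass(frozen=True)
class BySeq:
    seq: int         # apply entries up to and including this sequence number


@dataclass(frozen=True)
class ByCount:
    count: int       # apply this many entries after the snapshot


Target = Union[datetime, BySeq, ByCount]


def entries_to_apply(journal: List[Entry], ss_seq: int, target: Target) -> List[Entry]:
    later = [e for e in journal if e.seq > ss_seq]   # entries newer than the snapshot
    if isinstance(target, datetime):
        return list(takewhile(lambda e: e.time <= target, later))
    if isinstance(target, BySeq):
        return list(takewhile(lambda e: e.seq <= target.seq, later))
    if isinstance(target, ByCount):
        return later[:target.count]
    raise TypeError(f"unsupported target: {target!r}")


# Entries 1..4 written at 09:00, 10:00, 11:00 and 12:00.
journal = [Entry(s, datetime(2003, 6, 1, 8 + s)) for s in range(1, 5)]
```

The same resolution applies to RECOVER_PH1, RECOVER_PH2, and UNDO_RECOVER, since all three accept the same forms of target_time.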
STOP_RECOVER
- This will cause the storage system to cease recovery processing. Thus, a PHASE I recovery operation or a PHASE II recovery operation will be terminated. In addition, BEFORE journaling is initiated. This will cause BEFORE journal entries to be made each time the host 110 issues a write operation, in addition to making an AFTER journal entry.
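The effect of STOP_RECOVER on the host write path can be sketched as follows. In this illustrative Python model (all names hypothetical), once BEFORE journaling is on, a BEFORE entry capturing the old block contents is recorded ahead of each write. Giving the BEFORE and AFTER entries for one write the same sequence number is an assumption, as is using `None` for a block that was never written.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Volume:
    blocks: Dict[int, bytes] = field(default_factory=dict)
    before_journaling: bool = False   # switched on by STOP_RECOVER
    seq: int = 0
    # AFTER entries: (sequence number, block address, new data)
    after_journal: List[Tuple[int, int, bytes]] = field(default_factory=list)
    # BEFORE entries: (sequence number, block address, old data or None)
    before_journal: List[Tuple[int, int, Optional[bytes]]] = field(default_factory=list)

    def host_write(self, adr: int, data: bytes) -> None:
        seq = self.seq
        self.seq += 1
        if self.before_journaling:
            # Capture what this write is about to overwrite.
            self.before_journal.append((seq, adr, self.blocks.get(adr)))
        self.after_journal.append((seq, adr, data))
        self.blocks[adr] = data


vol = Volume(blocks={0: b"old"})
vol.host_write(0, b"x")          # only an AFTER entry is made
vol.before_journaling = True     # effect of STOP_RECOVER
vol.host_write(0, b"y")          # a BEFORE entry saves b"x" first
```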
UNDO_RECOVER journal_volume_1 journal_volume_2 target_time
- As will be discussed in more detail below, this operation will revert an updated snapshot to an earlier point in time. This is accomplished by “undoing” one or more applications of an AFTER journal entry. The target_time can be any of the forms previously discussed.
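The core of PHASE II recovery and UNDO_RECOVER is a pair of inverse steps: save the current contents as a BEFORE entry, then apply an AFTER entry; undoing restores the saved contents. A minimal Python sketch, assuming single-block entries and keeping the BEFORE entries in an in-memory stack rather than in the journal volume named by journal_volume_2. `UndoableRecovery`, `apply_next`, and `undo` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AfterEntry:
    seq: int
    adr: int
    data: bytes


@dataclass
class UndoableRecovery:
    volume: Dict[int, bytes]            # recovery volume, starting as a snapshot copy
    after: List[AfterEntry]             # AFTER entries newer than that snapshot, in order
    next_index: int = 0                 # next AFTER entry to apply
    before: List[Tuple[int, Optional[bytes]]] = field(default_factory=list)

    def apply_next(self) -> bool:
        """Record a BEFORE entry for the target block, then apply one AFTER entry."""
        if self.next_index >= len(self.after):
            return False
        entry = self.after[self.next_index]
        self.before.append((entry.adr, self.volume.get(entry.adr)))
        self.volume[entry.adr] = entry.data
        self.next_index += 1
        return True

    def undo(self) -> bool:
        """Revert the most recently applied AFTER entry from its BEFORE entry."""
        if not self.before:
            return False
        adr, old = self.before.pop()
        if old is None:
            del self.volume[adr]        # block was absent before the entry was applied
        else:
            self.volume[adr] = old
        self.next_index -= 1
        return True


r = UndoableRecovery(volume={0: b"snap"},
                     after=[AfterEntry(1, 0, b"a"), AfterEntry(2, 1, b"b")])
r.apply_next()
r.apply_next()   # volume is now {0: b"a", 1: b"b"}
r.undo()         # step back one entry: block 1 is removed again
```

Undoing an entry and then applying it again returns the volume to the same state, which is what lets a user step forward and backward around the desired data state.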
Referring now to
It can be appreciated that the recovery manager 111 can include a suitable interface for interaction with a user. An appropriate interface might be a graphical user interface, or a command line interface. It can be appreciated that voice recognition technology and even virtual reality technology can be used as input and output components of the interface for interacting with a user. Alternatively, the “user” can be a machine (such as a data processing system) rather than a human. In such a case, a suitable machine-machine interface can be readily devised and implemented.
The first phase of the recovery process is referred to as “fast” recovery. The idea is to quickly access the data state of the recovery volume at a point in time that is “close” in time to the desired data state, but prior in time to the desired data state. Thus, in a step 810, the recovery manager 111 obtains from the user a “target time” that specifies a point in time that is close to the time of the desired data state. A suitable query to the user might inform the user as to the nature of this target time. For example, if the user interacted with a system administrator, she might tell the administrator that she was sure her files were not deleted until after 10:30 AM. The target time would then be 10:30 AM, or earlier. Likewise, a user interface can obtain such information from a user by presenting a suitable set of queries or prompts. Given the target time, the recovery manager can then issue a RECOVER_PH1 operation to the storage system (e.g., system 100,
In response, the storage system would initiate phase I recovery. Referring to
- (1) A good snapshot exists: a snapshot must have been taken between the oldest journal entry and the newest journal entry. As discussed above, every snapshot has a sequence number, which can be used to identify a suitable snapshot. If the sequence number of a candidate snapshot is greater than that of the oldest journal entry and smaller than that of the newest journal entry, then the snapshot is suitable.
- (2) The recovery target time is in scope: the target time that the user specifies must lie between the times of the oldest journal entry and the newest journal entry.
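The two preconditions can be expressed as simple checks. This sketch assumes the journal is held as a list sorted from oldest to newest, that each entry carries a sequence number comparable with snapshot sequence numbers, and that timestamps are plain numbers; `JournalEntry`, `snapshot_is_suitable`, and `target_in_scope` are hypothetical names. The snapshot test is strict, as the text states, while treating the time bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class JournalEntry:
    seq: int          # sequence number shared with snapshots (assumed monotonic)
    timestamp: float  # time of the journaled write (assumed, e.g. epoch seconds)


def snapshot_is_suitable(snapshot_seq: int, journal: Sequence[JournalEntry]) -> bool:
    """Condition (1): the snapshot lies strictly between the oldest and newest entries."""
    if not journal:
        return False
    return journal[0].seq < snapshot_seq < journal[-1].seq


def target_in_scope(target_time: float, journal: Sequence[JournalEntry]) -> bool:
    """Condition (2): the target time lies between the oldest and newest entries."""
    if not journal:
        return False
    return journal[0].timestamp <= target_time <= journal[-1].timestamp
```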
Then, in a step 920, the recovery volume is set to an offline state. In the context of the present invention, “offline” is taken to mean that the user, and more generally the host device 110, cannot access the recovery volume. For example, in the case that the production volume is being used as the recovery volume, it is likely to be desirable that the host 110 be prevented at least from issuing write operations to the volume. Typically, the host will not be permitted to perform read operations either. Of course, the storage system itself has full access to the recovery volume in order to perform the recovery task.
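The access rule for the offline state can be sketched as a small state model. It assumes that host reads and writes are both refused while the volume is offline and that the storage system's own writes are never gated; `RecoveryVolume`, `VolumeState`, and the method names are hypothetical.

```python
from enum import Enum
from typing import Dict


class VolumeState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


class RecoveryVolume:
    """Hypothetical model: host access is gated by state; storage access is not."""

    def __init__(self) -> None:
        self.state = VolumeState.ONLINE
        self.blocks: Dict[int, bytes] = {}

    def set_offline(self) -> None:  # step 920
        self.state = VolumeState.OFFLINE

    def set_online(self) -> None:  # e.g. when recovery is stopped
        self.state = VolumeState.ONLINE

    def _check_host_access(self) -> None:
        if self.state is VolumeState.OFFLINE:
            raise PermissionError("recovery volume is offline to the host")

    def host_read(self, address: int) -> bytes:
        self._check_host_access()
        return self.blocks[address]

    def host_write(self, address: int, data: bytes) -> None:
        self._check_host_access()
        self.blocks[address] = data

    def storage_write(self, address: int, data: bytes) -> None:
        # The storage system keeps full access so it can perform recovery.
        self.blocks[address] = data
```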
In a step 930, the snapshot is copied to the recovery volume in preparation for phase I recovery. The production volume itself can be the recovery volume. However, the recovery manager 111 can also allow the user to specify a volume other than the production volume to serve as the target of the data recovery operation. For example, the recovery volume can be the volume on which the snapshot is stored. Using a volume other than the production volume to perform the recovery operation may be preferred where continued use of the production volume is desired.
In a step 940, one or more AFTER journal entries are applied to update the snapshot volume in the manner discussed previously. Enough AFTER journal entries are applied to bring the snapshot up to, but not past, the user-specified target time.
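Steps 930 and 940 can be sketched as a copy followed by a roll-forward. This assumes the volume is modeled as a dictionary of block addresses, that every AFTER entry carries a sequence number, a timestamp, a target address, and the written data, and that entries at or below the snapshot's sequence number are already reflected in the snapshot. `AfterEntry` and `phase1_recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Volume = Dict[int, bytes]  # hypothetical model: block address -> block contents


@dataclass(frozen=True)
class AfterEntry:
    seq: int
    timestamp: float
    address: int
    data: bytes  # new data written by the host


def phase1_recover(snapshot: Volume, snapshot_seq: int,
                   journal: List[AfterEntry], target_time: float) -> Tuple[Volume, int]:
    """Copy the snapshot (step 930), then roll it forward (step 940)."""
    recovery = dict(snapshot)  # work on a copy; the original snapshot is untouched
    applied = 0
    for entry in sorted(journal, key=lambda e: e.seq):
        if entry.seq <= snapshot_seq:
            continue  # this write is already reflected in the snapshot
        if entry.timestamp > target_time:
            break  # never roll forward past the target time
        recovery[entry.address] = entry.data
        applied += 1
    return recovery, applied
```

Because no BEFORE entries are taken, the only way to back out of this roll-forward is to start again from the unmodified snapshot, which is why the function leaves its `snapshot` argument untouched.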
Returning to
Next, in a step 840, the user is given the opportunity to review the state of the data on the recovery volume to determine whether the desired data state has been recovered. At this point, the data state has been recovered to some point in time prior to the time of the desired data state, so additional recovery might be needed to reach the desired data state. If the desired data state has been achieved, then the recovery process is stopped. If it has not, then a determination is made whether another phase I recovery operation or a phase II recovery operation is to be performed.
Recall that phase I recovery involves updating the snapshot by applying the AFTER journal entries to it to reproduce the sequence of write operations made since the snapshot was taken. A phase II recovery operation additionally involves taking a BEFORE journal entry for each AFTER journal entry that is applied. Phase II recovery is therefore a slower process than phase I recovery. The decision whether to proceed in phase I recovery mode or phase II recovery mode can be made by the user after she has inspected the recovered data state. For example, she may learn from inspecting the recovered data state that an additional few hours of recovery is needed, in which case she may instruct the recovery manager 111 to perform the faster phase I recovery and provide a refined target time. If the recovered data state seems close to the desired data state, then the user may want to perform the slower phase II recovery to take advantage of the “undo” aspect (see below) provided by a phase II recovery operation.
Alternatively, the user interface can algorithmically determine whether to perform phase I or phase II recovery. The interface can accept the user's refined target time and compare it against the initial target time. Based on the comparison, the interface can choose an appropriate recovery mode. For example, if the difference in time is X minutes or greater, then a phase I recovery is performed; otherwise, a phase II recovery is commenced.
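The threshold rule above can be written as a one-line decision. In this sketch, `threshold` stands in for the unspecified X minutes, the absolute difference between the two target times is used, and `RecoveryMode` and `choose_mode` are hypothetical names.

```python
from datetime import datetime, timedelta
from enum import Enum


class RecoveryMode(Enum):
    PHASE_I = "fast"
    PHASE_II = "undo-able"


def choose_mode(initial_target: datetime, refined_target: datetime,
                threshold: timedelta) -> RecoveryMode:
    """Large remaining gap -> fast phase I; small gap -> undo-able phase II."""
    gap = abs(refined_target - initial_target)
    return RecoveryMode.PHASE_I if gap >= threshold else RecoveryMode.PHASE_II
```

For example, with a 30-minute threshold, a refined target three hours after the initial one selects phase I, and a refined target ten minutes later selects phase II.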
A factor to consider at this decision point (step 840) is that phase I recovery cannot be conveniently “undone.” If the recovered data state is beyond the desired data state, then the only way to reverse the data recovery action is to start again from the original snapshot. This can be time consuming. A phase II recovery in accordance with the present invention, on the other hand, can be undone. Thus, if a recovered data state is close to the user's refined time estimate, then a phase II recovery operation may be preferred.
In a step 860, a STOP_RECOVER operation is issued to put the recovery volume in an online state. The user is then able to inspect the recovery volume. Based on the inspection, if the user determines in a step 870 that the desired data state of the recovery volume is achieved, then the recovery process is complete. If the user determines that the desired data state is not achieved, then a further determination is made whether the data recovery has gone beyond the desired data state. If so, then the snapshot updates are “undone” (step 880) by accessing one or more BEFORE journal entries. This combination of taking BEFORE journal entries while applying AFTER journal entries constitutes a phase II recovery.
Continuing to the next AFTER journal entry 1014a, a BEFORE journal entry 1014 is again created to record the original data in the area of the production volume that is the target of the AFTER journal entry, before the AFTER journal entry is applied to the snapshot 1020. Again, a pair of journal entries results: an AFTER journal entry 1014a and its corresponding BEFORE journal entry 1014. Similar BEFORE journal entries 1016 and 1018 are created for the AFTER journal entries 1016a and 1018a.
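The pairing of AFTER and BEFORE entries during phase II can be sketched as follows. This assumes the BEFORE entry captures the current contents of the volume being updated at the AFTER entry's address (with `None` marking a block that did not yet exist), that the two entries share a sequence number and timestamp, and that `from_seq` marks the last AFTER entry already applied. `AfterEntry`, `BeforeEntry`, and `phase2_recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Volume = Dict[int, bytes]  # hypothetical model: block address -> contents


@dataclass(frozen=True)
class AfterEntry:
    seq: int
    timestamp: float
    address: int
    data: bytes


@dataclass(frozen=True)
class BeforeEntry:
    seq: int                # pairs with the AFTER entry of the same seq
    timestamp: float
    address: int
    data: Optional[bytes]   # prior contents; None means the block was never written


def phase2_recover(volume: Volume, journal: List[AfterEntry], from_seq: int,
                   target_time: float, before_log: List[BeforeEntry]) -> int:
    """Roll forward past from_seq up to target_time, taking a BEFORE entry per write."""
    applied = 0
    for entry in sorted(journal, key=lambda e: e.seq):
        if entry.seq <= from_seq:
            continue  # already applied by an earlier recovery pass
        if entry.timestamp > target_time:
            break
        # First record the data that is about to be overwritten ...
        before_log.append(BeforeEntry(entry.seq, entry.timestamp, entry.address,
                                      volume.get(entry.address)))
        # ... and only then apply the AFTER journal entry.
        volume[entry.address] = entry.data
        applied += 1
    return applied
```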
Now, with reference to
Referring to
- procedure includes applying the information contained in the BEFORE journals to the updated snapshots. The BEFORE journal entries are applied in timewise reverse order. Thus, to restore the snapshot from its state in 1012d to its previous state in 1120c, the BEFORE journal entry 1018 is applied to the snapshot 1020d to reproduce the snapshot 1120c. To perform another “undo” iteration, the BEFORE journal entry 1016 is applied to the snapshot 1120c to reproduce the snapshot 1120b. From this discussion, it can be appreciated that in order to “undo” a snapshot that has been updated by a set of AFTER journals, a BEFORE journal is needed that exists earlier in time than any of the AFTER journals in the set. Phase II processing provides the requisite BEFORE journal entries in order to perform the undo operation.
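The undo procedure, which applies BEFORE entries in timewise reverse order, can be sketched as popping from the end of a BEFORE log. This assumes the log was appended in application order, so its last element is the most recent. It also assumes each BEFORE entry carries the timestamp of the AFTER entry it was taken for, and that undoing stops once no remaining update is later than the requested target time. `BeforeEntry` and `undo_recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Volume = Dict[int, bytes]  # hypothetical model: block address -> contents


@dataclass(frozen=True)
class BeforeEntry:
    seq: int
    timestamp: float       # time of the AFTER entry this BEFORE entry was taken for
    address: int
    data: Optional[bytes]  # None: block did not exist before the paired write


def undo_recover(volume: Volume, before_log: List[BeforeEntry], target_time: float) -> int:
    """Apply BEFORE entries newest-first until no remaining update is after target_time."""
    undone = 0
    while before_log and before_log[-1].timestamp > target_time:
        entry = before_log.pop()  # most recent BEFORE entry first
        if entry.data is None:
            volume.pop(entry.address, None)  # block did not exist before that write
        else:
            volume[entry.address] = entry.data
        undone += 1
    return undone
```

Only updates that were preceded by a BEFORE entry can be reversed this way, which matches the text's point that phase I updates, which take no BEFORE entries, cannot be undone.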
Returning to
Phase II processing will be slower than phase I recovery because a BEFORE journal entry must be created before each AFTER journal entry is applied to update the snapshot. For this reason, phase I recovery is also referred to as “fast recovery.” Since phase II recovery permits the user to undo an updated snapshot, it can be referred to as “undo-able” recovery.
The foregoing disclosed embodiments typically can be provided using a combination of hardware and software implementations; e.g., combinations of software, firmware, and/or custom logic such as ASICs (application specific ICs) are possible. One of ordinary skill can readily appreciate that the underlying technical implementation will be determined based on factors including but not limited to system cost, system performance, the existence of legacy software and legacy hardware, operating environment, and so on. The disclosed embodiments can be readily reduced to specific implementations without undue experimentation by those of ordinary skill in the relevant art.
Claims
1. A method for processing data in a data store comprising:
- obtaining a snapshot of a data store;
- updating the snapshot with one or more first after-journal entries; and
- after updating the snapshot with one or more first after-journal entries, performing one or more subsequent updates of the snapshot with one or more second after-journal entries, each subsequent update of the snapshot including: storing a before-journal entry; and after storing the before-journal entry, applying one of the second after-journal entries to the snapshot, wherein the subsequent updates of the snapshot can be undone.
2. The method of claim 1 further comprising, after performing one or more subsequent updates, applying one or more before-journal entries to the snapshot, wherein one or more updates of the snapshot by the second after-journal entries can be undone.
3. The method of claim 2 further comprising receiving information indicative of an undo request, and in response thereto performing the step of applying one or more before-journal entries to the snapshot.
4. The method of claim 1 wherein the number of first after-journal entries is determined based on a user-provided target time.
5. The method of claim 1 wherein the second after-journal entries are applied in increasing order of time.
6. The method of claim 1 wherein the step of updating the snapshot with one or more first after-journal entries includes further updating the snapshot with one or more additional after-journal entries, wherein the step of further updating is performed in response to receiving information indicative of a fast recovery request.
7. The method of claim 1 wherein the step of obtaining a snapshot includes making a copy of the snapshot on the data store, wherein the updating steps are performed on the copy of the snapshot stored on the data store.
8. The method of claim 1 further comprising receiving information indicative of a user-specified data store, wherein the step of obtaining a snapshot includes making a copy of the snapshot on the user-specified data store, wherein the updating steps are performed on the copy of the snapshot stored on the user-specified data store.
9. A data processing device comprising:
- a data store;
- a controller;
- a data storage component configured to store after-journal entries and before-journal entries, and further configured to provide access to the after-journal entries and the before-journal entries,
- the controller configured to access the data store and to access the data storage component,
- the controller further configured to perform the method steps of claim 1.
10. A method for processing data comprising:
- obtaining a snapshot of at least a portion of a data store;
- applying a plurality of first after-journal entries to update the snapshot, including receiving a first time indication from a user, the number of first after-journal entries being based on the first time indication;
- providing access to the snapshot so that the user can access the snapshot;
- receiving a recovery mode indication and a second time indication from the user;
- applying a plurality of second after-journal entries to further update the snapshot, the number of second after-journal entries being based on the second time indication; and
- if the recovery mode indication is indicative of an undo-able recovery mode, then for each second after-journal entry, taking a before-journal entry of the snapshot before applying the second after-journal entry to the snapshot.
11. The method of claim 10 further comprising receiving a third time indication from the user and applying one or more before-journal entries to the snapshot, the number of before-journal entries that are applied to the snapshot being dependent on the third time indication.
12. A data processing system comprising:
- a host component comprising at least one host processing unit;
- a storage component comprising at least one storage control unit;
- first program control means contained in the host component for controlling operation of the host processing unit; and
- second program control means contained in the storage component for controlling operation of the storage control unit,
- the first program control means and the second program control means further for operating, respectively, the host processing unit and the storage control unit to perform the method steps of claim 10.
13. The data processing system of claim 12 wherein the first program control means comprises first program code and the second program control means comprises second program code.
14. A method for processing data on a data store comprising:
- receiving input from a user indicative of a first data volume;
- receiving input from the user indicative of a second data volume;
- obtaining a snapshot of at least a portion of the first data volume;
- storing the snapshot on the second data volume;
- a first step of updating the snapshot with a plurality of first after-journal entries;
- providing user-access to the second data volume;
- receiving a first indication from the user, wherein if the first indication is indicative of a fast recovery operation, then repeating the first step of updating the snapshot with a plurality of second after-journal entries; and
- subsequent to the first step of updating, a second step of updating the snapshot with a plurality of third after-journal entries, including for each third after-journal entry taking a before-journal entry of the snapshot prior to updating the snapshot with the third after-journal entry,
- the first, second, and third after-journal entries being representative of write operations previously performed on the first data volume.
15. The method of claim 14 further comprising receiving input from the user indicative of a target time wherein the number of first after-journal entries is based on the target time.
16. The method of claim 15 further comprising receiving input from the user indicative of a refined target time wherein the number of second after-journal entries is based on the refined target time.
17. The method of claim 15 further comprising receiving input from the user indicative of a refined target time wherein the number of third after-journal entries is based on the refined target time.
18. The method of claim 14 further comprising applying one or more before-journal entries to the snapshot to undo snapshot updates produced by the application of one or more of the third after-journal entries.
19. The method of claim 14 further comprising receiving a second indication from the user and in response thereto, applying one or more before-journal entries to the snapshot to undo snapshot updates produced by the application of one or more of the third after-journal entries.
20. The method of claim 19 further comprising receiving input from the user indicative of a time, wherein the number of before-journal entries is based on the time.
21. The method of claim 19 wherein the one or more before-journal entries are applied sequentially beginning with the most recent before-journal entry.
22. The method of claim 14 wherein the first data volume and the second data volume refer to the same data volume, wherein the snapshot represents a data state of at least a portion of the first data volume at a first point in time.
23. The method of claim 14 wherein the first data volume is a production volume and the second data volume refers to a data volume different from the production volume.
Type: Application
Filed: Jul 16, 2003
Publication Date: Jan 20, 2005
Applicant: HITACHI, LTD. (Tokyo)
Inventor: Kenji Yamagami (Los Gatos, CA)
Application Number: 10/621,791