CONSTRUCTION OF AN INFORMATION MANAGEMENT DATABASE IN A HIERARCHICAL DATA STORAGE SYSTEM
Systems and methods for constructing a database in an information management system. The systems and methods may include using metadata stored in non-production storage devices to construct an information management system database. In some implementations the methods are used to consolidate multiple information management systems. In other implementations, the systems and methods are used for reviewing content of archived storage media.
This application is a continuation of U.S. application Ser. No. 16/387,488, filed Apr. 17, 2019, which is a continuation of U.S. application Ser. No. 14/733,250, filed Jun. 8, 2015, which is a continuation of U.S. application Ser. No. 13/727,981, filed Dec. 27, 2012 (U.S. Pat. No. 9,069,799), which are hereby incorporated by reference in their entireties.
BACKGROUND
Information management systems organize and back up information, i.e., “production data”, generated during the operation of client computing systems. Information management systems enable companies, and other computing system users, to comply with legal requirements and other business needs by providing retrievable copies, i.e., “secondary copies”, of the production data for each client computing system. Information management systems make various types of secondary copies, such as a backup copy, a snapshot copy, a hierarchical storage management (“HSM”) copy, an archive copy, and other types of copies. Each of the types of secondary copies has advantages and weaknesses as compared to each other type of secondary copy, but each type of secondary copy generally enables a company to restore settings or data of a computing system to a particular past point-in-time.
To illustrate, an example company that would use an information management system might be a clothing manufacturer, e.g., of denim jeans, that is based out of San Francisco, Calif. The jeans manufacturer uses hundreds of computers to conduct business operations, i.e., to generate production data. The generated production data includes, among other things, reports generated by accountants, benefits records maintained by human resources (“HR”), spreadsheets predicting the future of fashion trends, communications between internal departments, orders placed with third-party material distributors, records showing compliance with international manufacturing laws, and other business-critical information. The jeans manufacturer would be at a tremendous loss if all production data were lost, so the jeans manufacturer uses an information management system to organize and create secondary copies of production data. Information management systems generally use at least one managing computing device to transfer the non-production copies of production data to non-production storage media, such as magnetic drives, magnetic tapes, optical media, solid-state media, or cloud storage devices. With each transfer to the non-production storage media, the managing computing device keeps record of which client computing device information is stored at which non-production storage media, device or location. The managing computing device may compile the records into a table or other data structure (“managing computing device database”) to keep track of where the non-production data for each client computing device is stored. In the case that one or more of the client computing devices experiences a failure, the managing computing device uses its database as a reference to restore lost production data using the non-production data stored at the non-production storage media. 
However, if the storage manager database suffers from a disaster, e.g., the storage manager hard drive is damaged in an earthquake or flood, the mappings of non-production data of the hundreds of client computing devices to specific locations on non-production storage media may all be lost. Thus, the loss of the managing computing device database may render all of the non-production data effectively unusable because the ability to restore the non-production data to a particular one of the hundreds of client computing devices is lost.
Currently, to prevent such catastrophic losses, information management systems use techniques that require a company to make various types of secondary copies of the managing computing device database before the managing computing device database encounters a disaster. However, creation and management of the secondary copies of portions of the information management system require additional resources and therefore raise the complexity and overall cost of the information management system. Some of the current techniques include: maintaining a second managing computing device, and creating a secondary copy of the managing computing device database in the non-production storage media. To maintain a second managing computing device, the information management system causes the second managing computing device to mirror the database of the primary managing computing device. The information management system then brings the second managing computing device online to replace the primary managing computing device if the primary managing computing device encounters a disaster.
As an alternative to maintaining a second managing computing device, the information management system can store a secondary copy of the managing computing device database on non-production storage media or at a different or offsite location. If the managing computing device encounters a disaster, the information management system uses the secondary copy of the managing computing device database to create a replacement managing computing device database.
In practice, the jeans manufacturer would have to install a second managing computing device or configure the information management system to install a secondary copy of the managing computing device database in the non-production storage media because these techniques would enable the jeans manufacturer to preserve its valued data. However, if the jeans manufacturer fails to install a second storage manager or fails to configure the storage manager to store secondary copies of the storage manager database, the jeans manufacturer would, according to currently used techniques, lose access to the non-production data until customized software scripts or other time-intensive software could be written to extract the desired information. Further, even if a secondary copy were maintained, e.g. at an offsite and offline location, it can take an intolerable amount of time to restore a failed managing computing device.
The need exists for systems and methods that overcome the above problems, as well as systems and methods that provide additional benefits. Overall, the examples herein of some prior or related systems and methods and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems and methods will become apparent to those of skill in the art upon reading the following detailed description.
The techniques disclosed in this document are useful, in one aspect, in solving the above-discussed problems.
The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the disclosure.
Overview
Disclosed are systems and methods for restoring a storage manager database. If a company, such as the jeans manufacturer in the example above, fails to create secondary copies of portions of its information management system before encountering a storage manager database disaster, the jeans manufacturer loses the ability to restore client computing devices with the non-production data stored in non-production storage media. That is, the jeans manufacturer loses access to its backup data. The jeans manufacturer loses the ability to restore the client computing devices because the mapping, between the client computing devices and the respective non-production data on the non-production storage media, is contained in the storage manager database. The techniques disclosed herein enable the jeans manufacturer to restore a storage manager database in order to regain access to non-production data without having to make secondary copies of the information management system.
Notably, a restore agent may be used to restore databases for other computing devices in an information management cell, rather than simply restoring secondary copies or backup data. In some implementations of the systems and methods for restoring the storage manager database, a replacement storage manager executes the restore agent. The restore agent scans magnetic tapes, or other non-production storage media, and retrieves portions of the content of the magnetic tapes. The content includes headers and other metadata stored on the magnetic tapes. The restore agent restores the damaged or otherwise unusable storage manager database by building a database on the replacement storage manager based on the headers and other metadata retrieved from the magnetic tapes. Notably, the restore agent builds the database by retrieving some, but not necessarily all, of the content of the magnetic tapes. Moreover, the restore agent restores the storage manager database without the information management system having created a secondary copy of the storage manager database prior to the database disaster.
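By way of illustration only, the header-driven rebuild described above can be sketched as follows. This is a minimal sketch and not the disclosed implementation: the `MediaHeader` fields, the `rebuild_catalog` function, and the sample tape identifiers are all hypothetical names chosen for the example. The sketch assumes each non-production object carries a header naming the originating client computing device, so that the mapping can be recovered without reading the object payloads.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaHeader:
    """Hypothetical header stored next to each non-production object."""
    media_id: str   # tape (or other medium) holding the object
    offset: int     # position of the object on that medium
    client_id: str  # client computing device that produced the data
    job_time: str   # ISO-8601 timestamp of the copy operation


def rebuild_catalog(headers):
    """Build client -> [(job_time, media_id, offset)] from headers alone."""
    catalog = defaultdict(list)
    for header in headers:
        catalog[header.client_id].append(
            (header.job_time, header.media_id, header.offset)
        )
    for entries in catalog.values():
        entries.sort()  # ISO timestamps sort chronologically as strings
    return dict(catalog)


# Headers as they might be returned by scanning two tapes (illustrative data).
scanned = [
    MediaHeader("tape-02", 0, "hr-laptop", "2012-03-01T02:00:00"),
    MediaHeader("tape-01", 0, "acct-server", "2012-02-01T02:00:00"),
    MediaHeader("tape-01", 4096, "hr-laptop", "2012-02-01T02:30:00"),
]
catalog = rebuild_catalog(scanned)
```

In this sketch only the small header records are read, which mirrors the point that some, but not necessarily all, of the media content needs to be retrieved.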
Similar to a storage manager, a media agent may include a database that maps non-production data on the non-production storage media to client computing devices that are directly managed by the media agent. If the database of the media agent is damaged or rendered unusable, the restore agent can be used to restore the database of the media agent. The restore agent can restore the database of the media agent by retrieving content from non-production storage media or by requesting from the storage manager those portions of the storage manager database that are associated with the media agent.
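The two restoration paths for a media agent database may be sketched, under stated assumptions, as a single function that prefers the storage manager's records and otherwise falls back to scanning media. The function name, the row layout (a dictionary with a `media_agent` key), and the `scan_media` callable are hypothetical and are used here only for illustration.

```python
def restore_media_agent_db(media_agent_id, storage_manager_rows=None, scan_media=None):
    """Return database rows for one media agent.

    Uses the storage manager's rows when they are available; otherwise
    calls scan_media, a hypothetical callable that reads the media the
    agent manages and yields rows.
    """
    if storage_manager_rows is not None:
        return [row for row in storage_manager_rows
                if row["media_agent"] == media_agent_id]
    if scan_media is not None:
        return list(scan_media(media_agent_id))
    raise ValueError("no source available to rebuild the media agent database")


# Illustrative data: rows held by a storage manager for two media agents.
sm_rows = [
    {"media_agent": "ma-1", "client": "hr-laptop", "media": "tape-01"},
    {"media_agent": "ma-2", "client": "acct-server", "media": "tape-07"},
]
from_manager = restore_media_agent_db("ma-1", storage_manager_rows=sm_rows)
from_media = restore_media_agent_db(
    "ma-2",
    scan_media=lambda agent: iter(
        [{"media_agent": agent, "client": "acct-server", "media": "tape-07"}]
    ),
)
```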
In other implementations, the replacement storage manager and restore agent consolidate multiple information management cells into a single information management cell. During times of economic prosperity, the jeans manufacturer may have grown its employee base and corresponding information technology (IT) resources to include multiple subsets of an information management system, i.e., information management cells. In an economic downturn, the jeans manufacturer may downsize employees and have reduced IT resources and reduced information management needs. To reduce costs and maintenance, the jeans manufacturer may use the replacement storage manager and restore agent to consolidate multiple information management cells.
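As a hedged illustration of consolidation, the sketch below merges per-cell catalogs into one. It assumes, purely for the example, that each cell's catalog has already been rebuilt as a mapping from a client identifier to a list of `(job_time, media_id, offset)` tuples; the function name and sample data are hypothetical.

```python
def consolidate_cells(cell_catalogs):
    """Merge per-cell catalogs into one, dropping duplicate entries."""
    merged = {}
    for catalog in cell_catalogs:
        for client_id, entries in catalog.items():
            known = merged.setdefault(client_id, [])
            for entry in entries:
                if entry not in known:
                    known.append(entry)
    for entries in merged.values():
        entries.sort()  # oldest copy first
    return merged


# Two illustrative cells that both know about the same client.
cell_a = {"hr-laptop": [("2012-02-01", "tape-01", 4096)]}
cell_b = {
    "hr-laptop": [("2012-03-01", "tape-09", 0), ("2012-02-01", "tape-01", 4096)],
    "design-ws": [("2012-03-02", "tape-09", 8192)],
}
merged = consolidate_cells([cell_a, cell_b])
```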
In other implementations, the replacement storage manager and restore agent enable a user to access and review magnetic tapes, e.g., from an obsolete or no longer functioning information management system. This implementation advantageously enables a user, for example, a bankruptcy lawyer or trustee, to review boxes of records stored on magnetic tapes for information that may not be available on present systems of the bankrupt company.
Various examples of the systems and methods will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the invention may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the invention may include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.
Information Management Environment
Aspects of the technologies described herein may be practiced in an information management environment 100, which will now be described while referencing
The environment 100 may include virtualized computing resources, such as a virtual machine 120 provided to the organization by a third-party cloud service vendor or a virtual machine 125 running on a virtual machine host 130 operated by the organization. For example, the organization may use one virtual machine 125A as a database server and another virtual machine 125B as a mail server. The environment 100 may also include mobile or portable computing devices, such as laptops 135, tablet computers 140, personal data assistants 145, mobile phones 152 (such as smartphones), and other mobile or portable computing devices such as embedded computers, set top boxes, vehicle-mounted devices, wearable computers, etc.
Of course, other types of computing devices may form part of the environment 100. As part of their function, each of these computing devices creates, accesses, modifies, writes, and otherwise uses production copies of data and metadata that are typically stored in a persistent storage medium having fast I/O times. For example, each computing device may regularly access and modify data files and metadata stored on semiconductor memory, a local disk drive or a network-attached storage device. Each of these computing devices may access data and metadata via a file system supported by an operating system of the computing device.
The environment 100 may also include hosted services 122 that provide various online services to the organization or its constituent members (e.g., the organization's departments, employees, independent contractors, etc.) such as social networking services (e.g., FACEBOOK®, TWITTER®, PINTEREST®), hosted email services (e.g., GMAIL®, YAHOO®, HOTMAIL®), or hosted productivity applications or other hosted applications (e.g., MICROSOFT OFFICE 365®, GOOGLE DOCS®, SALESFORCE.COM). Hosted services may include software-as-a-service (SaaS), platform-as-a-service (PaaS), application service providers (ASPs), cloud services, and all manner of delivering computing or functionality via a network. As it provides services to users, each hosted service may generate additional “hosted data and metadata” that is associated with each user. For example, FACEBOOK® may generate and store photos, wall posts, notes, videos, and other content that are associated with a particular FACEBOOK® user's account.
The organization directly or indirectly employs an information management system 150 to protect and manage the data and metadata used by the various computing devices in the environment 100 and the data and metadata that is maintained by hosted services on behalf of users associated with the organization. One example of an information management system is the COMMVAULT SIMPANA® system, available from CommVault Systems, Inc. of Oceanport, N.J. The information management system creates and manages non-production copies of the data and metadata to meet information management goals, such as: permitting the organization to restore data, metadata or both data and metadata if an original copy of the data/metadata is lost (e.g., by deletion, corruption, or disaster, or because of a service interruption by a hosted service); allowing data to be recovered from a previous time; complying with regulatory data retention and electronic discovery (“e-discovery”) requirements; reducing the amount of data storage media used; facilitating data organization and search; improving user access to data files across multiple computing devices and/or hosted services; and implementing information lifecycle management (“ILM”) or other data retention policies for the organization. The information management system 150 may create the additional non-production copies of the data and metadata on any suitable non-production storage medium such as magnetic disks 155, magnetic tapes 160, other storage media 165 such as solid-state storage devices or optical disks, or on cloud data storage sites 170 (e.g., those operated by third-party vendors). Further details on the information management system may be found in the assignee's U.S. patent application Ser. No. 12/751,850, filed Mar. 31, 2010 entitled DATA OBJECT STORE AND SERVER FOR A CLOUD STORAGE ENVIRONMENT, INCLUDING DATA DEDUPLICATION AND DATA MANAGEMENT ACROSS MULTIPLE CLOUD STORAGE SITES, now U.S. Patent Publication Number 2010/0332456 (attorney docket 606928075US2), which is hereby incorporated herein by reference in its entirety.
The information management system 150 accesses or receives copies of the various production copies of data objects and metadata, and via an information management operation (such as a backup operation, archive operation, or snapshot operation), creates non-production copies of these data objects and metadata, often stored in one or more non-production storage mediums 265 different than the production storage medium 218 where the production copies of the data objects and metadata reside. A non-production copy of a data object represents the production data object and its associated metadata at a particular point in time (non-production objects 260A-C). Since a production copy of a data object or metadata changes over time as it is modified by an application 215, hosted service 122, or the operating system 210, the information management system 150 may create and manage multiple non-production copies of a particular data object or metadata, each representing the state of the production data object or metadata at a particular point in time. Moreover, since a production copy of a data object may eventually be deleted from the production data storage medium and the file system from which it originated, the information management system may continue to manage point-in-time representations of that data object, even though a production copy of the data object itself no longer exists.
For virtualized computing devices, such as virtual machines, the operating system 210 and applications 215A-D may be running on top of virtualization software, and the production data storage medium 218 may be a virtual disk created on a physical medium such as a physical disk. The information management system may create non-production copies of the discrete data objects stored in a virtual disk file (e.g., documents, email mailboxes, and spreadsheets) and/or non-production copies of the entire virtual disk file itself (e.g., a non-production copy of an entire .vmdk file).
Each non-production object 260A-C may contain copies of or otherwise represent more than one production data object. For example, non-production object 260A represents three separate production data objects 255C, 230 and 245C (represented as 245C′, 230′ and 245′, respectively). Moreover, as indicated by the prime mark (′), a non-production object may store a representation of a production data object or metadata differently than the original format of the data object or metadata, e.g., in a compressed, encrypted, deduplicated, or otherwise optimized format. Although
Non-production copies include backup copies, archive copies, and snapshot copies. Backup copies are generally used for shorter-term data protection and restoration purposes and may be in a native application format or in a non-native format (e.g., compressed, encrypted, deduplicated, and/or otherwise modified from the original application format). Archive copies are generally used for long-term data storage purposes and may be compressed, encrypted, deduplicated and/or otherwise modified from the original application format. In some examples, when an archive copy of a data object is made, a logical reference or stub may be used to replace the production copy of the data object in the production storage medium 218. In such examples, the stub may point to or otherwise reference the archive copy of the data object stored in the non-production storage medium so that the information management system can retrieve the archive copy if needed. The stub may also include some metadata associated with the data object, so that a file system and/or application can provide some information about the data object and/or a limited-functionality version (e.g., a preview) of the data object. A snapshot copy represents a data object at a particular point in time. A snapshot copy can be made quickly and without significantly impacting production computing resources because large amounts of data need not be copied or moved. A snapshot copy may include a set of pointers derived from the file system or an application, where each pointer points to a respective stored data block, so collectively, the set of pointers reflects the storage location and state of the data object at a particular point in time when the snapshot copy was created. In “copy-on-write”, if a block of data is to be deleted or changed, the snapshot process writes the block to a particular data storage location, and the pointer for that block is now directed to that particular location. The set of pointers and/or the set of blocks pointed to by a snapshot may be stored within the production data storage medium 218.
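The copy-on-write behavior described above may be illustrated with the following minimal sketch. It is an in-memory model under simplifying assumptions, not the disclosed snapshot mechanism: the `Volume` class and its methods are hypothetical, blocks are held in a Python list, and a snapshot is modeled as a dictionary that receives the original contents of a block only when that block is about to be overwritten.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live production blocks
        self.snapshots = []         # one dict per snapshot: index -> old data

    def snapshot(self):
        """Create a snapshot; no block data is copied at this point."""
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Preserve the old block for snapshots that still share it, then write."""
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Read a block as it was when the snapshot was taken."""
        return snap.get(index, self.blocks[index])


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")
```

After the write, the live volume holds the new block while the snapshot still resolves block 1 to its original contents; unchanged blocks continue to be read from the live volume, which is why creating the snapshot required no bulk copy.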
Non-production copies of a data object or metadata may be distinguished from a production copy of a data object or metadata in several ways. First, a non-production copy of a data object is created to meet the different information management goals described above and is not directly used or modified by applications 215A-D, hosted services 122, or the operating system 210. Second, a non-production copy of a data object is stored as one or more non-production objects 260 that may have a format different from the native application format of the production copy of the data object, and thus often cannot be directly used by the native application or a hosted service 122 without first being modified. Third, non-production objects are often stored on a non-production storage medium 265 that is inaccessible to the applications 215A-D running on computing devices and hosted services 122. Also, some non-production copies may be “offline copies,” in that they are not readily available (e.g., not on a mounted tape or disk). Offline copies include copies of data that the information management system can access without any human intervention (e.g., tapes within an automated tape library, but not yet mounted in a drive), and copies that the information management system 150 can access only with at least some human intervention (e.g., tapes located at an offsite storage site).
The information management system 150 also generates information management data 275, such as indexing information, that permits the information management system to perform its various information management tasks. As shown in
The storage manager 402 may be a software module or other application that coordinates and controls information management operations performed by one or more information management cells 350 to protect and control copies of non-production data objects and metadata. As shown by the dashed lines 416 and 422, the storage manager may communicate with some or all elements of the information management cell 350, such as the media agents 410 and computing devices 205, to initiate and manage backup operations, snapshot operations, archive operations, data replication operations, data migrations, data distributions, data recovery, and other information management operations. The storage manager may control additional information management operations including ILM, deduplication, content indexing, data classification, data mining or searching, e-discovery management, collaborative searching, encryption, and compression. Alternatively or additionally, a storage manager may control the creation and management of disaster recovery copies, which are often created as secondary, high-availability disk copies, using auxiliary copy or replication technologies.
The storage manager 402 may include a jobs agent 455, a management agent 450, a network agent 445, and an interface agent 460, all of which may be implemented as interconnected software modules or application programs. The jobs agent 455 monitors the status of information management operations previously performed, currently being performed, or scheduled to be performed by the information management cell 350. The management agent 450 provides an interface that allows various management agents 450 in multiple information management cells 350 (or in a global storage manager 305) to communicate with one another. This allows each information management cell 350 to exchange status information, routing information, capacity and utilization information, and information management operation instructions or policies with other cells. In general, the network agent 445 provides the storage manager 402 with the ability to communicate with other components within the information management cell and the larger information management system, e.g., via proprietary or non-proprietary network protocols and application programming interfaces (“APIs”) (including HTTP, HTTPS, FTP, REST, virtualization software APIs, cloud service provider APIs, hosted service provider APIs). The interface agent 460 includes information processing and display software, such as a graphical user interface (“GUI”), an API, or other interactive interface through which users and system processes can retrieve information about the status of information management operations or issue instructions to the information management cell and its constituent components. The storage manager 402 may also track information that permits it to select, designate, or otherwise identify content indices, deduplication databases, or similar databases within its information management cell (or another cell) to be searched in response to certain queries.
The storage manager 402 may also maintain information management data, such as a database 465 of management data and policies. The database 465 may include a management index that stores logical associations between components of the system, user preferences, user profiles (that among other things, map particular information management users to computing devices or hosted services), management tasks, or other useful data. The database 465 may also include various “information management policies,” which are generally data structures or other information sources that each include a set of criteria and rules associated with performing an information management operation. The criteria may be used to determine which rules apply to a particular data object, system component, or information management operation, and may include:
- frequency with which a production or non-production copy of a data object or metadata has been or is predicted to be used, accessed, or modified;
- access control lists or other security information;
- the sensitivity (e.g., confidentiality) of a data object as determined by its content and/or metadata;
- time-related factors;
- deduplication information;
- the computing device, hosted service, computing process, or user that created, modified, or accessed a production copy of a data object; and
- an estimated or historic usage or cost associated with different components.
The rules may specify, among other things:
- a schedule for performing information management operations,
- a location (or a class or quality of storage media) for storing a non-production copy,
- preferences regarding the encryption, compression, or deduplication of a non-production copy,
- resource allocation between different computing devices or other system components (e.g., bandwidth, storage capacity),
- whether and how to synchronize or otherwise distribute files or other data objects across multiple computing devices or hosted services,
- network pathways and components to utilize (e.g., to transfer data) during an information management operation, and
- retention policies (e.g., the length of time a non-production copy should be retained in a particular class of storage media).
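One way to picture an information management policy as a data structure is sketched below. This is an assumption-laden illustration rather than the format used by the database 465: the `Policy` fields, the `select_policy` function, and the sample criteria are hypothetical, and only a few of the criteria and rules listed above are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    """Hypothetical policy: one criterion plus a handful of rules."""
    name: str
    applies_to: Callable[[dict], bool]  # criterion evaluated per data object
    schedule: str                       # when to run the operation
    storage_class: str                  # class of non-production storage media
    encrypt: bool                       # encryption preference
    retention_days: int                 # how long to retain the copy


def select_policy(policies, data_object) -> Optional[Policy]:
    """Return the first policy whose criterion matches the data object."""
    for policy in policies:
        if policy.applies_to(data_object):
            return policy
    return None


policies = [
    Policy("confidential", lambda o: o.get("sensitivity") == "confidential",
           "daily", "disk", True, 2555),
    Policy("default", lambda o: True, "weekly", "tape", False, 365),
]
chosen = select_policy(policies, {"owner": "hr", "sensitivity": "confidential"})
fallback = select_policy(policies, {"owner": "design"})
```

Ordering the list from most to least specific lets a catch-all policy act as the default; a real system may combine criteria differently.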
As noted above, each computing device 205 may include one or more data management agents 280. Each data management agent is a software module or component that helps govern communications with other system components. For example, the data management agent receives commands from the storage manager 402 and sends to and receives from media agents 410 copies of data objects, metadata, and other payload (as indicated by the heavy arrows). Each data management agent accesses data and/or metadata stored in a production data storage medium 218 and arranges or packs the data and metadata in a certain format (e.g., backup or archive format) before it is transferred to another component. Each data management agent can also restore a production copy of a data object or metadata in a production data storage medium 218 from a non-production copy. A data management agent may perform some functions provided by a media agent, which are described further herein, such as compression, encryption, or deduplication. Each data management agent may be specialized for a particular application (e.g. a specified data management agent customized to handle data generated or used by Exchange by Microsoft Corp.). Alternatively or additionally, a more generic data management agent may handle data generated or used by two or more applications.
Each computing device 205 may also include a data distribution and live browsing client module 405 (herein “distribution client module”). The distribution client module 405 is responsible for, inter alia, associating mobile devices and/or hosted service accounts with users of the information management system, setting information management policies for mobile and other computing devices, pushing data objects to a distribution module for distribution to other computing devices, providing unified access to a user's data via an interface, and providing live browsing features. The various functions of the distribution client module are described in greater detail herein.
A media agent 410, which may be implemented as a software module, conveys data, as directed by the storage manager 402, between a computing device 205 (or hosted service 122) and one or more non-production storage mediums 155-170. Each media agent 410 may control one or more intermediary storage devices 418, such as a cloud server or a tape or magnetic disk library management system, to read, write, or otherwise manipulate data stored in a non-production storage medium 155-170. Each media agent 410 may be considered to be “associated with” a storage device and its related non-production storage media if that media agent is capable of routing data to and storing data in the storage media managed by the particular storage device. A media agent may communicate with computing devices 205, hosted services 122, storage devices 418A-D, and the storage manager 402 via any suitable communications path, including SCSI, a Storage Area Network (“SAN”), a Fibre Channel communications link, or a wired, wireless, or partially wired/wireless computer or telecommunications network, including the Internet.
To perform its functions, the media agent 410 may include a media file system module 425, a data classification module 435, a content indexing module 420, a deduplication module 430, an encryption module 475, a compression module 485, a network module 415, a distribution module 490, and a media agent database 440. The media file system module 425 is responsible for reading, writing, archiving, copying, migrating, restoring, accessing, moving, sparsifying, deleting, sanitizing, destroying, or otherwise performing file system operations on various non-production storage devices of disparate types. The media file system module may also instruct the storage device to use a robotic arm or other retrieval means to load or eject certain storage media such as a tape.
The network module 415 permits the media agent to communicate with other components within the system and hosted services 122 via one or more proprietary and/or non-proprietary network protocols or APIs (including cloud service provider APIs, virtual machine management APIs, and hosted service provider APIs). The deduplication module 430 performs deduplication of data objects and/or data blocks to reduce data redundancy in the cell. The deduplication module may generate and store data structures to manage deduplicated data objects, such as deduplication tables, in the media agent database 440. The encryption module 475 performs encryption of data objects, data blocks, or non-production objects to ensure data security in the cell. The compression module 485 performs compression of data objects, data blocks, or non-production objects to reduce the data capacity needed in the cell.
The content indexing module 420 analyzes the contents of production copies or non-production copies of data objects and/or their associated metadata and catalogues the results of this analysis, along with the storage locations of (or references to) the production or non-production copies, in a content index stored within a media agent database 440. The results may also be stored elsewhere in the system, e.g., in the storage manager 402, along with a non-production copy of the data objects, and/or an index cache. Such index data provides the media agent 410 or another device with an efficient mechanism for locating production copies and/or non-production copies of data objects that match particular criteria. The index data or other analyses of data objects or metadata may also be used by the data classification module 435 to associate data objects with classification identifiers (such as classification tags) in the media agent database 440 (or other indices) to facilitate information management policies and searches of stored data objects.
The distribution module 490 may be a set of instructions that coordinates the distribution of data objects and indices of data objects. The distribution may occur from one computing device 205 to another computing device 205 and/or from hosted services 122 to computing devices 205. As a first example, the distribution module may collect and manage data and metadata from hosted services 122 or mobile devices 205. As another example, the distribution module may synchronize data files or other data objects that are modified on one computing device so that the same modified files or objects are available on another computing device. As yet another example, the distribution module may distribute indices of data objects that originated from multiple computing devices and/or hosted services, so a user can access all of their data objects through a unified user interface or a native application on their computing device. The distribution module may also initiate “live browse” sessions to permit communications between different computing devices so that the devices can interchange data and metadata or so the devices can provide computing resources, such as applications, to each other. The functions performed by the distribution module are described in greater detail herein.
Storage Manager Restoration
As described, the storage manager database 465 includes information useful for operating an entire information management cell 350. Returning to the illustrative example of the jeans manufacturer that was introduced in the background, the following discussion initially assumes that the jeans manufacturer has implemented an information management cell 350, though the present system may be implemented in simpler (or more complex) data storage environments. In order to prevent the loss of access to non-production data stored in non-production storage media 155-170, the jeans manufacturer may apply one or more techniques to its information management cell 350. Of particular interest are techniques that do not require a secondary copy of the storage manager database to be created or maintained prior to the occurrence of a storage manager disaster, or that remain usable where even the secondary copy is lost or unusable. In a first implementation, if the storage manager encounters a disaster, the jeans manufacturer replaces the old storage manager with a replacement storage manager that includes an installation of a restore agent. The restore agent enables the replacement storage manager to generate the storage manager database, without using a secondary copy of the storage manager database, by reading portions of the content stored by the non-production storage media. In a second implementation, the restore agent enables the replacement storage manager to restore the storage manager database by gathering and combining information that may be stored in media agent databases 440. Systems and methods related to the first and second implementations are described in greater detail below, followed by additional advantageous uses of the restore agent.
In a third implementation, the jeans manufacturer uses the restore agent to consolidate multiple information management cells 350, i.e., multiple storage managers, into a single information management cell. In a fourth implementation, the jeans manufacturer uses the restore agent to identify, review, and/or analyze non-production data that is stored on magnetic tapes, or other non-production storage media, and that is associated with an obsolete or no longer operating information management system or cell. These additional advantageous implementations are discussed in more detail after the first and second implementations.
Unfortunately, the database 465 may become lost, corrupt, or otherwise inaccessible to the storage manager 402 through a variety of “disasters” (indicated by a large “X” in
According to various implementations, the information management cell 500 includes a replacement storage manager 505 that restores the information of the database 465 into a database 510 by using restore agent 515. Using the restore agent 515, the replacement storage manager 505 is able to restore the database 465 even without a secondary copy of the database 465 stored somewhere within the information management cell 500. As a result, in response to a loss of the database 465, the jeans manufacturer may communicatively connect or couple the replacement storage manager 505 to the other systems and components of the information management cell 500 to restore or rebuild the contents of the database 465 into the database 510, i.e., to build the database 510.
The jeans manufacturer can directly or indirectly connect the replacement storage manager 505 to the magnetic tapes 160, or other non-production storage media 155, 165, 170, to build the database 510. In a direct connection implementation, replacement storage manager 505 connects to the magnetic tapes 160 via connection path 520B, where connection path 520B is a subset of connection paths 520 (inclusive of paths 520A, 520B, 520C, 520D). In an indirect connection implementation, replacement storage manager 505 connects to the magnetic tapes 160 via connection path 525. Depending on the location of the magnetic tapes 160, or other non-production storage media, connection paths 520 and 525 may be intranet connections, Internet connections, or a combination of both.
The magnetic tapes 160 include multiple sets of magnetic tapes, such as magnetic tapes 160a-n, as part of a magnetic tape library 530. The magnetic tape library 530 may be located at the facilities of the jeans manufacturer, may be located at a data center, or may be located at a long-term storage facility, such as Iron Mountain. The magnetic tape library 530 includes a tape library management system, represented by intermediary storage devices 418B. Through the tape library management system, the replacement storage manager 505 and/or one or more media agents 410 access the magnetic tapes 160 of the magnetic tape library 530.
Each magnetic tape 160a-n of the magnetic tape library 530 stores one or more headers or other metadata to identify: the magnetic tape 160, the software and hardware sources of the non-production data, the version of the non-production data, the storage and/or retention policy associated with the non-production data, the media agent, the storage manager, the date and time of the creation of the non-production data, and file markers. Some of the headers or metadata are on-media labels (OML) which identify the magnetic tape 160 and distinguish one magnetic tape 160 from another.
To provide background information as to how the restore agent 515 operates, additional information regarding the headers and file markers on the magnetic tape 160 is now provided. Each used magnetic tape 160 includes headers and file markers, i.e. data/metadata that delineates and identifies stored data segments. The headers, file markers, and data segments include a main header 535, file markers 540 (inclusive of 540A, 540B, 540C), file trailer 545, and data segments 550 (inclusive of 550A, 550B).
The main header 535 identifies the magnetic tape 160 and includes information associated with the magnetic tape library 530. The main header 535 occupies a predetermined length or storage capacity at the beginning of each magnetic tape 160. The main header 535 may be an OML that is useful for identifying the magnetic tape 160. The main header 535 may include:
- an information management cell ID;
- tape and/or volume ID;
- a slot ID that indicates a slot assignment within the magnetic tape library 530;
- a storage capacity of the tape;
- an available capacity of the tape;
- an index of other headers stored on the tape;
- storage manager identification;
- media agent identification;
- date and time of first and last use;
- which storage policy is being used for the data segments 550;
- the storage policy name;
- client ID, e.g., ID of computing device 205A;
- software agent ID;
- operating system of the client;
- location on the tape of each chunk header;
- location on the tape of each file trailer 545, e.g., chunk trailer; and
- information related to the encryption used on the content.
The main header 535 may include an encryption key, symmetrical or asymmetrical, stored in a predetermined location within the main header 535. The encryption key is stored within the main header 535 to enable decryption of the content of the magnetic tape 160 by a system or person with knowledge of the location of the encryption key within the main header 535. More details regarding magnetic tape headers, such as OML, are found in commonly-assigned U.S. patent application Ser. No. 10/663,383, entitled “System and Method For Blind Media Support,” filed Sep. 16, 2003, now U.S. Pat. No. 7,162,496, which is hereby incorporated by reference herein in its entirety. The tape library management system, media agent 410, and/or replacement storage manager 505 rewrite the main header 535 each time new information is stored on the magnetic tape 160 so the main header 535 is up-to-date and facilitates each subsequent read or write.
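As a non-limiting illustration, the main header 535 could be modeled as a record that is updated each time a chunk is written. The sketch below is in Python; the class name `MainHeader`, its field names, and the `record_chunk` method are assumptions made for illustration and do not describe an actual on-tape format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MainHeader:
    """Hypothetical in-memory form of a main header (field names are assumptions)."""
    cell_id: str
    tape_id: str
    slot_id: int
    capacity_bytes: int
    available_bytes: int
    storage_manager_id: str
    media_agent_id: str
    storage_policy_name: str
    client_ids: List[str] = field(default_factory=list)
    chunk_header_offsets: List[int] = field(default_factory=list)
    chunk_trailer_offsets: List[int] = field(default_factory=list)

    def record_chunk(self, header_offset: int, trailer_offset: int,
                     size: int, client_id: str) -> None:
        """Bring the header up to date after a chunk has been written."""
        if size > self.available_bytes:
            raise ValueError("chunk does not fit on the tape")
        self.chunk_header_offsets.append(header_offset)
        self.chunk_trailer_offsets.append(trailer_offset)
        self.available_bytes -= size
        if client_id not in self.client_ids:
            self.client_ids.append(client_id)


# Illustrative values only; sizes and offsets are made up.
header = MainHeader(
    cell_id="cell-350", tape_id="160A", slot_id=3,
    capacity_bytes=1_000, available_bytes=1_000,
    storage_manager_id="402", media_agent_id="410A",
    storage_policy_name="example-policy",
)
header.record_chunk(header_offset=64, trailer_offset=364, size=300, client_id="205A")
```

Keeping the chunk header and trailer locations in one record is what would allow a later reader to jump directly to the metadata without scanning the stored data.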
File markers 540 precede each data segment 550, and file trailer 545 succeeds each chunk 555, where each chunk comprises a grouping of the data segments 550. The file markers 540 are headers that indicate where on the tape each data segment 550 begins. The file markers 540 include date and time stamps, encryption information, and compression information and do not include all of the information included in the file trailers 545. The file trailer 545 is metadata that indicates where on the tape the chunk 555 ends. The file trailer 545 identifies information that cannot be included in the file markers 540 because the file trailer 545 identifies information that is written to the magnetic tape 160 after the file markers 540 are created. For example, the file trailer 545 includes identification of applications 215 (shown in
The chunk 555 includes one or more data segments 550 because the information management cell 500 enables a multiplexed transfer of information to the magnetic tapes 160. In other words, once one of the media agents 410 begins transferring information to the magnetic tape 160 for one of the computing devices 205, the media agent 410 or other media agents 410 will queue up information from other computing devices 205 for storage on the magnetic tape 160 within the same chunk. Returning briefly to the jeans manufacturer example, the computing device 205A may be an email server, e.g., an Exchange server, and the computing device 205B may be a materials orders database. While media agent 410A is storing a secondary copy of email messages from the computing device 205A, the media agent 410A may receive a secondary copy of a materials orders database from the computing device 205B. In response, the media agent 410A stores the secondary copy of the materials orders database as data segment 550B after or during the creation of data segment 550A. The chunk 555 is illustrated as including only two data segments 550; however, many more or fewer data segments 550 may be included in the chunk 555 from more or fewer than two data management agents and/or applications. The dynamic and multiplexing characteristics of data storage to the magnetic tape 160 underscore the importance of the content of the file trailer 545 because the file trailers 545 associated with all chunks 555 collectively index the contents of the magnetic tape 160.
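By way of example only, the layout of a multiplexed chunk can be sketched as a sequence of records in which a file marker precedes each data segment and a single trailer closes the chunk. The Python below is a simplified model; the `Segment` class, the `write_chunk` function, and the tuple-based record layout are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    segment_id: str
    client_id: str
    payload: bytes


def write_chunk(segments: List[Segment]) -> Tuple[List[tuple], Dict]:
    """Lay out one chunk as marker, data, marker, data, ..., trailer.

    The trailer is written last, so it can index every segment in the chunk.
    """
    records: List[tuple] = []
    index = []
    for seg in segments:
        marker_pos = len(records)
        records.append(("marker", {"segment_id": seg.segment_id}))
        records.append(("data", seg.payload))
        index.append({
            "segment_id": seg.segment_id,
            "client_id": seg.client_id,
            "marker_pos": marker_pos,
        })
    trailer = {"segments": index}
    records.append(("trailer", trailer))
    return records, trailer


# Two clients multiplexed into the same chunk (payloads are placeholders).
records, trailer = write_chunk([
    Segment("550A", "205A", b"email messages"),
    Segment("550B", "205B", b"materials orders"),
])
```

Because the trailer is assembled only after every segment is known, it can carry information that the individual file markers cannot.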
Having described the information included on the magnetic tapes 160, a description of the restore agent 515 functionality will now be provided. In response to the failure of the storage manager 402 and/or the disaster of the database 465, the jeans manufacturer connects the replacement storage manager 505 to the information management cell 500. Connecting the replacement storage manager 505 to the information management cell 500 enables the restore agent 515 to build the database 510, i.e., restore the database 465, from the headers and metadata stored in the magnetic tapes 160. The restore agent 515 is a software module that may be installed on the replacement storage manager 505 at the same time as the other storage manager agents or may be installed independent of the installation of the other storage manager agents.
The restore agent 515 uses one or more of the storage manager agents, e.g., the network agent 445, the management agent 450, and the interface agent 460, to build the database 510. The interaction between the restore agent 515 and the other storage manager agents will be discussed in combination with the functionality of the restore agent 515.
Location module 605 determines where, within the information management cell 500, the non-production data is stored. The location module 605 automatically scans the information management cell 500 to locate and identify each of non-production storage media 155-170. The location module 605 uses the network agent 445 to locate and identify the non-production storage media 155-170 through direct connection paths 520. Through the direct connection paths 520, the location module 605, using the network agent 445, identifies the non-production storage media using their corresponding intermediary storage devices 418. The intermediary storage devices 418 may be tape, disk, or other library management systems. Through the indirect connection path 525, the location module 605, via the network agent 445, identifies each media agent 410 in the information management cell 500. The location module 605, via various APIs, can receive identification handles or network addresses of the various non-production storage media 155-170 from the media agents 410, by requesting that each media agent 410 report the desired information to the location module 605. Thus, the location module 605 enables the restore agent 515 to gather information from all non-production storage media 155-170 within the information management cell 500.
The interface module 610 enables a user to control the functions of the restore agent 515. The interface module 610 generates a graphical user interface (GUI) and includes one or more user interface objects, such as buttons, menus, dialog boxes, and the like. An example GUI is illustrated in
The fetch module 615 retrieves the headers and metadata from the non-production storage media 155-170 needed to build the database 510. The fetch module 615 uses a handle identifier and network address for the magnetic tapes 160. The fetch module 615 reads the main header 535, the file markers 540, and the file trailer 545. The fetch module 615 reads each of the magnetic tapes 160a-160n by transmitting instructions to the magnetic tape library 530 so that each of the magnetic tapes is systematically loaded and read. Importantly, the fetch module 615 does not read or retrieve the contents of data segments 550. Instead, the fetch module 615 retrieves less than all of the content stored on each of the magnetic tapes 160. Advantageously, because the fetch module 615 retrieves less than all of the content stored on each of the magnetic tapes 160, the fetch module 615 is able to retrieve the information needed to build the database 510 without performing time- and resource-intensive operations, such as reading all the data or content indexing each magnetic tape 160. Rather, the fetch module 615 is able to quickly scan the contents of the magnetic tapes 160 for the main headers 535, the file markers 540, and/or file trailers 545 to obtain the information used for building the database 510.
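The selective read performed by the fetch module can be illustrated with a minimal Python sketch. Here a tape is modeled as a list of `(kind, size, body)` records; that model, the record kind names, and the byte sizes are assumptions chosen only to show that metadata is read while data segments are skipped.

```python
from typing import List, Tuple

Record = Tuple[str, int, object]  # (kind, size in bytes, body)

# Record kinds treated as metadata in this sketch.
METADATA_KINDS = {"main_header", "file_marker", "file_trailer"}


def fetch_metadata(tape: List[Record]) -> Tuple[List[Record], int, int]:
    """Return the metadata records plus the bytes read and the bytes skipped."""
    kept: List[Record] = []
    read = 0
    skipped = 0
    for kind, size, body in tape:
        if kind in METADATA_KINDS:
            kept.append((kind, size, body))
            read += size
        else:
            # Data segment: advance past it without reading its contents.
            skipped += size
    return kept, read, skipped


tape = [
    ("main_header", 512, {"tape_id": "160A"}),
    ("file_marker", 64, {"segment_id": "550A"}),
    ("data", 1_000_000, None),
    ("file_marker", 64, {"segment_id": "550B"}),
    ("data", 2_000_000, None),
    ("file_trailer", 128, {"segments": ["550A", "550B"]}),
]
metadata, bytes_read, bytes_skipped = fetch_metadata(tape)
```

In this made-up example only 768 bytes of metadata are read while 3,000,000 bytes of data are skipped, which reflects why the scan may be fast relative to a full read.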
The data stored on the magnetic tapes or other media can also include information identifying individual clients and types of clients, such as information that a client is a SQL server. Indeed, each tape may store data for different clients within an organization, such as the jeans manufacturer. The tape can likewise store data regarding when data from each client was copied onto that piece of media. Further, each tape can include information regarding the configuration parameters of each client. All of this information can then be used to re-create a table or other data structure representing a logical mapping or architecture for the system. The table can include granular information including data regarding each client, files/folders and other data copied from each client, and the date and time of each copy or backup job.
The mapper module 620 creates an interim table from the headers and other metadata retrieved from the non-production storage media 155-170. The mapper module 620 extracts content from the retrieved headers and metadata and organizes the information based on common characteristics. An example interim table is shown below as Table 1.
Table 1
Table 1 includes columns for device ID, media agent, tape ID, network address, storage type, and data segment ID. Table 1 may include many more or fewer columns than are shown. Table 1 is sorted by device ID but may be sorted by content of other columns as well. As shown, the table includes two entries for computing device 205A and one entry for computing device 205B. The first row of Table 1 illustrates examples of information that is extracted from headers and metadata stored on the magnetic tapes 160. The first row of Table 1 indicates that an entry for computing device 205A was found on magnetic tape 160C located at network address 10.108.1.123. The media agent 410A created data segment 550G as a full backup copy. Table 1 shows that data segments 550G and 550A for computing device 205A are stored on different magnetic tapes having different network addresses. Table 1 also shows that media agent 410A stored data segments 550A and 550B for computing devices 205A and 205B on the same magnetic tape using different types of secondary copies. By extracting the information from the retrieved headers and metadata, the mapper module 620 enables the build module 630 to restore the database 465 by building the database 510.
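As a hedged illustration of the mapping step, the following Python sketch flattens per-tape metadata into interim rows carrying the six columns named above and sorts them by device ID. The function name `map_to_interim`, the input dictionary layout, the tape labels `tape-1` and `tape-2`, the second network address, and the "incremental backup" and "archive" storage types are all hypothetical.

```python
from typing import Dict, List


def map_to_interim(tapes: List[Dict]) -> List[Dict[str, str]]:
    """Build one interim row per data segment, sorted by device ID."""
    rows = []
    for tape in tapes:
        for entry in tape["trailer_entries"]:
            rows.append({
                "device_id": entry["device_id"],
                "media_agent": entry["media_agent"],
                "tape_id": tape["tape_id"],
                "network_address": tape["network_address"],
                "storage_type": entry["storage_type"],
                "segment_id": entry["segment_id"],
            })
    return sorted(rows, key=lambda r: (r["device_id"], r["segment_id"]))


# Hypothetical metadata as it might arrive from the fetch step.
tapes = [
    {"tape_id": "tape-1", "network_address": "10.108.1.123", "trailer_entries": [
        {"device_id": "205A", "media_agent": "410A",
         "storage_type": "full backup", "segment_id": "550G"},
    ]},
    {"tape_id": "tape-2", "network_address": "10.108.1.124", "trailer_entries": [
        {"device_id": "205B", "media_agent": "410A",
         "storage_type": "archive", "segment_id": "550B"},
        {"device_id": "205A", "media_agent": "410A",
         "storage_type": "incremental backup", "segment_id": "550A"},
    ]},
]
interim = map_to_interim(tapes)
```

Sorting on device ID groups all segments for a given computing device together, regardless of which tape holds them.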
The build module 630 creates and/or populates the database 510 based on the interim tables created by the mapper module 620. As new interim tables are created by the mapper module 620, the build module 630 transmits read and write commands to the database 510 to insert the rows of the interim tables into the database 510. The build module 630 may mark or delete the portions of the interim tables that have been inserted into the database 510 in order to free up or reallocate memory for use by the mapper module 620. The mapper module 620 and the build module 630 continue to build or update the database 510 until all identified or selected non-production storage media have been scanned.
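One possible way to realize the build step, shown here only as a sketch, uses Python's standard `sqlite3` module as a stand-in for the database 510. The table name `secondary_copies`, its columns, and the uniqueness rule are assumptions; the description does not specify a database engine or schema.

```python
import sqlite3
from typing import Dict, List


def build_database(conn: sqlite3.Connection, interim: List[Dict[str, str]]) -> int:
    """Insert interim rows, then clear the interim table to free memory.

    Returns the number of rows actually inserted; rows already present are ignored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS secondary_copies ("
        "device_id TEXT, media_agent TEXT, tape_id TEXT, segment_id TEXT, "
        "UNIQUE (tape_id, segment_id))"
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO secondary_copies VALUES "
        "(:device_id, :media_agent, :tape_id, :segment_id)",
        interim,
    )
    conn.commit()
    inserted = conn.total_changes - before
    interim.clear()  # release the rows that have been written
    return inserted


conn = sqlite3.connect(":memory:")
batch = [
    {"device_id": "205A", "media_agent": "410A", "tape_id": "160A", "segment_id": "550A"},
    {"device_id": "205B", "media_agent": "410A", "tape_id": "160A", "segment_id": "550B"},
]
first = build_database(conn, batch)
# Re-submitting a row that is already stored inserts nothing.
second = build_database(conn, [
    {"device_id": "205A", "media_agent": "410A", "tape_id": "160A", "segment_id": "550A"},
])
count = conn.execute("SELECT COUNT(*) FROM secondary_copies").fetchone()[0]
```

Ignoring duplicates is a design choice of this sketch; it lets the same tape be rescanned without creating repeated entries.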
Upon completion of building the database 510, the interface module 610 may update the user interface to indicate completion of the task. For example, the interface module 610 may cause a GUI to display a dialog box which reads, “The storage manager restoration is complete.” Once restored, the database 510 includes indexes of content of portions of, or all of, the non-production storage media 155-170 within the information management cell 500. Using the restored database 510, the replacement storage manager 505 may resume the operations of storage manager 402. In other words, because the jeans manufacturer connected the replacement storage manager 505 to the information management cell 500 and executed the restore agent 515, the jeans manufacturer can continue to review, schedule, and manage the creation and restoration of secondary copies of production data associated with the computing devices 205. Since the database 510 now mirrors at least some of the content of the database 465, all accounting records, HR records, materials orders databases, and any other information that was transferred to the non-production storage media 155-170 can be used to restore any one or more of the computing devices 205 to a particular point in time, in accordance with the storage policies associated with the stored versions of the non-production data.
Referring back to
The restore agent 515 can be used to provide beneficial services other than post-disaster restoration of a storage manager database. For example, the restore agent 515 can be used to replace multiple storage managers with a single storage manager as an information management cell consolidation process. The restore agent 515 can also be used for reviewing archived non-production data of an obsolete or non-operating information management cell. However, prior to discussing these alternative uses for the restore agent, methods for recovering a storage manager database are discussed. The methods provide further enablement for operating the restore agent 515.
At block 805, a user, such as the jeans manufacturer, installs a restore agent on a replacement storage manager. The user may install the restore agent using a number of techniques known to those of ordinary skill in the art. For example, the user may install the restore agent from a floppy disk, a CD, a DVD, a USB drive, or a network drive, may install the restore agent from the Internet as part of a purchased software package, or may otherwise access the agent via network connections. Installing the restore agent on the replacement storage manager enables the restore agent to interact with other software agents or modules on the storage manager.
At block 810, the restore agent searches for the locations of all non-production data within an information management cell. The restore agent uses a location module to perform the search. The location module identifies locations of all non-production data by scanning every magnetic tape and magnetic tape library within the information management cell. The location module may also scan all other storage media in the information management cell to identify other storage media which include names, handles, network IDs, or other information that identify the storage media as being non-production storage media. The location module may also identify locations of all non-production data within the information management cell by reading a portion of every storage medium in the information management cell, e.g., the first 5 MB of information where a main header may be stored. Alternatively or additionally, the location module communicates with the media agents within the information management cell to request that each media agent identify all non-production storage media in use by or identified by the media agent.
The restore agent performs automated searches, manual searches, or a combination of automated searches and manual searches of the information management cell. In an automated search, the restore agent scans all storage media in the information management cell or searches all storage media identified by the media agents. In a manual search, the restore agent searches in storage media or in network locations of storage media that are identified by a user. In the combined automated and manual search, the restore agent searches through all storage media of a particular network location that is identified by a user.
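The three search modes can be contrasted with a short Python sketch. The `inventory` mapping of network locations to media identifiers, the mode names, and the `select_media` function are illustrative assumptions rather than a described interface.

```python
from typing import Dict, Iterable, List


def select_media(inventory: Dict[str, List[str]], mode: str,
                 user_choice: Iterable[str] = ()) -> List[str]:
    """Pick the storage media to search.

    inventory maps a network location to the media found at that location.
    user_choice holds media IDs (manual) or network locations (combined).
    """
    chosen = set(user_choice)
    if mode == "automated":
        return sorted(m for media in inventory.values() for m in media)
    if mode == "manual":
        return sorted(m for media in inventory.values() for m in media if m in chosen)
    if mode == "combined":
        return sorted(m for loc, media in inventory.items() if loc in chosen for m in media)
    raise ValueError(f"unknown search mode: {mode}")


# Hypothetical inventory; the second address is made up.
inventory = {
    "10.108.1.123": ["160A", "160B"],
    "10.108.1.124": ["160C"],
}
automated = select_media(inventory, "automated")
manual = select_media(inventory, "manual", ["160B"])
combined = select_media(inventory, "combined", ["10.108.1.124"])
```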
At block 815, the restore agent retrieves a portion, but need not retrieve all, of the non-production storage media content. The restore agent retrieves the portion of the non-production storage media content that includes headers and other metadata that identifies characteristics of the non-production data and that identifies the location of the non-production data on the storage media. The restore agent uses the fetch module to retrieve the headers and metadata. The headers and metadata include main headers, OML, file markers, file trailers, tables of contents, allocation tables, and other file system data or metadata useful for identifying content or characteristics of the non-production data. The headers and other metadata provide information such as the type of secondary copy used on the non-production data, the ID of media agent that created the secondary copy, the ID of the storage manager of the information management cell, the ID of the magnetic tape or other non-production storage medium, storage capacity, the remaining storage capacity, start and stop markers for each file or segment of data, and the like. To retrieve this information, the fetch module may send commands and requests from the replacement storage manager via a network agent of the storage manager because the network agent provides a communication link with all other parts of the information management cell.
At block 820, the restore agent extracts content from the retrieved headers and metadata and organizes the extracted content in an interim table or other data structure. The restore agent uses a mapper module to extract and organize the extracted content. The mapper module creates and populates an interim table with a structure that is similar to or the same as the database that is being restored. The interim table includes information that is associated with each data file or data segment stored on the magnetic tape. The interim table includes information such as an ID of the computing device associated with the non-production data, the media agent associated with the computing device, an ID of the magnetic tape or other non-production storage medium, the format of the secondary copy, etc.
At block 823, the restore agent checks the consistency of the metadata elements and synthesizes any missing attributes. For example, if a computing device name is missing from the metadata, the restore agent may parse or extract the computing device name from a file system directory for the computing device. Other techniques may be used to synthesize or fill in missing information, such as using a network identifier to request the information directly from the computing device for which the metadata is missing, or accessing a network accessible database of metadata (a metabase).
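A minimal sketch of the consistency check follows, assuming that a missing computing device name can be recovered from a stored directory path of the form `/clients/<device name>/...`. That path layout, the `REQUIRED` field list, and the function name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Fields that every interim row is expected to carry in this sketch.
REQUIRED = ("device_id", "media_agent", "tape_id")


def synthesize_missing(row: Dict[str, Optional[str]]) -> List[str]:
    """Fill a missing device name from the directory path, if possible.

    Returns the names of required fields that are still missing afterwards.
    """
    if not row.get("device_id") and row.get("source_path"):
        parts = PurePosixPath(row["source_path"]).parts
        # Assumed layout: ('/', 'clients', '<device name>', ...)
        if len(parts) > 2 and parts[1] == "clients":
            row["device_id"] = parts[2]
    return [name for name in REQUIRED if not row.get(name)]


repairable = {"device_id": None, "media_agent": "410A", "tape_id": "160A",
              "source_path": "/clients/205A/mail/inbox.pst"}
still_missing = synthesize_missing(repairable)

incomplete = {"device_id": None, "media_agent": None, "tape_id": "160A"}
unresolved = synthesize_missing(incomplete)
```

Fields reported as unresolved could then be filled by the other techniques described above, such as querying the computing device directly.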
At block 825, the restore agent creates or populates the restored database based on the interim tables or data structures created by the mapper module. The restore agent uses a build module to update the restored database. The build module transmits read and write commands to the database to insert the rows of the interim tables into the database.
At block 830, the restore agent determines if the fetch module has completed retrieving information from all identified or selected magnetic tapes. If the fetch module indicates that all of the identified or selected magnetic tapes have been scanned, the restore agent indicates that the restore operation is complete at block 835. If the fetch module indicates that additional magnetic tapes or other non-production media need to be scanned, the process returns to block 815.
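The loop formed by blocks 815 through 835 can be summarized in a few lines of Python. The callables `fetch`, `map_rows`, and `build` stand in for the fetch, mapper, and build modules; their signatures and the sample data are assumptions of this sketch.

```python
from typing import Callable, Iterable, List


def run_restore(tape_ids: Iterable[str],
                fetch: Callable[[str], list],
                map_rows: Callable[[list], list],
                build: Callable[[list], None]) -> List[str]:
    """Process each identified tape until none remain, then report completion."""
    pending = list(tape_ids)
    scanned: List[str] = []
    while pending:                     # block 830: more media to scan?
        tape_id = pending.pop(0)
        metadata = fetch(tape_id)      # block 815: retrieve headers and metadata
        rows = map_rows(metadata)      # block 820: organize into interim rows
        build(rows)                    # block 825: insert rows into the database
        scanned.append(tape_id)
    return scanned                     # block 835: restore operation complete


# Stand-in data: each tape maps to the segment IDs listed in its metadata.
tape_metadata = {"160A": ["550A", "550B"], "160B": ["550G"]}
database: list = []
scanned = run_restore(
    tape_ids=["160A", "160B"],
    fetch=lambda tape_id: tape_metadata[tape_id],
    map_rows=lambda segments: [{"segment_id": s} for s in segments],
    build=database.extend,
)
```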
As described, the restore agent can be used to create or restore a storage manager database that indexes or maps all secondary copies or non-production data stored in an information management cell. The restore agent can be used to benefit computing devices other than a storage manager, such as a media agent.
At block 905, a user installs a restore agent on the media agent. As discussed in the method 800, many techniques can be used to install the restore agent on the media agent, e.g., CD, DVD, USB drive, or the like.
At block 910, the restore agent identifies the storage manager from which the media agent database will be rebuilt. The restore agent identifies the storage manager automatically or with manual assistance from a user. If the restore agent identifies the storage manager automatically, the restore agent searches the information management cell for the storage manager that controls the cell. If the restore agent identifies the storage manager manually, a user enters a handle or other network ID for the storage manager, or the user browses through a GUI to identify the storage manager.
At block 915, the restore agent requests, from the storage manager, all of the storage manager database entries that are associated with the media agent sending the request. As part of the request, the restore agent sends an identifier of the media agent. The identifier of the media agent can be a network address within the information management cell or the identifier can be an alphanumeric code. If the media agent does not know its own identifier within the information management cell, the media agent transmits an identification of one or more of the computing devices to which the media agent is connected and responsible for managing. In response to the request from the restore agent of the media agent, the storage manager queries the storage manager database and transmits the pertinent portions of the storage manager database to the media agent.
At block 920, the restore agent receives the storage manager database entries associated with the requesting media agent. The restore agent subsequently creates and populates a media agent database based on the received storage manager database entries.
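The query performed by the storage manager at block 915, including the fallback used when the media agent does not know its own identifier, might look like the following Python sketch. The entry layout, the function name, and the identifiers `410B`, `205C`, and `550K` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def entries_for_media_agent(
    storage_manager_db: List[Dict[str, str]],
    media_agent_id: Optional[str] = None,
    device_ids: Iterable[str] = (),
) -> List[Dict[str, str]]:
    """Select the storage manager entries that belong to one media agent.

    If the media agent identifier is unknown, fall back to the computing
    devices that the media agent reports it is responsible for.
    """
    if media_agent_id is not None:
        return [e for e in storage_manager_db if e["media_agent"] == media_agent_id]
    known = set(device_ids)
    return [e for e in storage_manager_db if e["device_id"] in known]


storage_manager_db = [
    {"media_agent": "410A", "device_id": "205A", "segment_id": "550A"},
    {"media_agent": "410A", "device_id": "205B", "segment_id": "550B"},
    {"media_agent": "410B", "device_id": "205C", "segment_id": "550K"},
]
by_identifier = entries_for_media_agent(storage_manager_db, media_agent_id="410A")
by_devices = entries_for_media_agent(storage_manager_db, device_ids=["205C"])
```

The returned entries are what the restore agent would use at block 920 to create and populate the media agent database.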
At block 925, the restore agent notifies the user that the media agent database has been restored. With the restored media agent database, the media agent verifies that each of the non-production storage media identified in the media agent database is accessible. The restore agent provides a warning or error notification if components within the information management cell conflict with the contents of the restored media agent database.
At block 1005, the restore agent identifies all media agents in the information management cell. The restore agent identifies all media agents in the information management cell by identifying all computing devices within the information management cell and by enabling a user to select or otherwise determine which of all the computing devices are configured as media agents. Alternatively, the restore agent identifies all media agents in the information management cell by, as an example, sending echo request packets to software network modules that operate on the media agents.
At block 1010, the restore agent requests copies of each media agent database in the information management cell. The request can be a request for all entries in the media agent database, a request for all active entries in the media agent database, or can be a request for all entries within the media agent database that have been modified or updated within a defined time frame, e.g., within the last 2 years. In response to the request, the media agent transmits all or part of the entries of the media agent database to the storage manager.
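The three request types described for block 1010 could be modeled, under stated assumptions, as a simple filter. The `active` and `modified` fields, the mode names, and the 730-day default window (standing in for the "last 2 years" example) are hypothetical.

```python
from datetime import datetime, timedelta


def select_entries(entries, mode="all", now=None, window=timedelta(days=730)):
    """Return entries per the request type: 'all', 'active', or 'recent'."""
    if mode == "all":
        return list(entries)
    if mode == "active":
        return [e for e in entries if e["active"]]
    if mode == "recent":
        # Keep entries modified within the window ending at `now`.
        now = now or datetime.now()
        cutoff = now - window
        return [e for e in entries if e["modified"] >= cutoff]
    raise ValueError(f"unknown request mode: {mode!r}")
```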
At block 1015, the restore agent merges and validates the received entries of each of the media agent databases to populate the restored storage manager database. While merging the received entries, the restore agent adds information to the entries to distinguish the entries of one media agent database from another media agent database. In one implementation, the restore agent adds a column to the media agent database entries that identifies the media agent that transmitted the media agent database entries. The restore agent may also validate the received entries by pinging, or otherwise sending an echo request, to the devices included with each entry.
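A minimal sketch of the merge in block 1015 follows, assuming each media agent database arrives as a list of dictionaries. The column name `source_media_agent`, the `device` key, and the caller-supplied `is_reachable` probe (standing in for the echo request) are hypothetical.

```python
def merge_media_agent_entries(databases):
    """Merge per-media-agent entry lists into one list of tagged rows.

    `databases` maps a media agent identifier to its list of entry dicts.
    """
    merged = []
    for media_agent_id, entries in databases.items():
        for entry in entries:
            row = dict(entry)  # copy so the received entry is not mutated
            row["source_media_agent"] = media_agent_id  # the added column
            merged.append(row)
    return merged


def validate(merged, is_reachable):
    """Split rows into (valid, unreachable) using a caller-supplied probe."""
    valid, unreachable = [], []
    for row in merged:
        (valid if is_reachable(row["device"]) else unreachable).append(row)
    return valid, unreachable
```

Tagging each row with its origin keeps otherwise identical entries from different media agents distinguishable after the merge.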
Information Management Cell Consolidation
A replacement storage manager and a restore agent can be used to provide beneficial services other than disaster recovery of a database. For example, the replacement storage manager and restore agent can be used to replace multiple storage managers with a single storage manager to consolidate multiple information management cells into a single information management cell. In an economy where companies grow and shrink rapidly, infrastructure needed to support those companies also grows and shrinks rapidly. Returning to the example of the jeans manufacturer, while demand for jeans was high, the jeans manufacturer may have added significant numbers of employees to support production. The jeans manufacturer may have also expanded its information technology (IT) resources to accommodate the execution of storage and retention policies for the computing devices in use by the employees. However, with a downturn in the economy, such as after the Great Recession of 2008, the jeans manufacturer may have significantly reduced its employment force, reduced the number of computing devices in operation for the business, and as a result may have a need to scale back its information management resources.
The replacement storage manager 1105 consolidates the operations of the storage managers 402 of the information management cells 350 by transferring the content of the storage manager databases to the replacement storage manager database 1110 (“the database 1110”). The replacement storage manager 1105 transfers the content of the storage manager databases of the information management cells 350 with the restore agent 1115.
The restore agent 1115 includes one or more of the software modules of the restore agent 515 (shown in
The disclosed consolidation of the multiple storage managers 402 by the replacement storage manager 1105 includes benefits for the computing environment 1100 that are not appreciated by a mere merger or consolidation of standard databases. The consolidation performed by the replacement storage manager 1105 enables the computing environment 1100 to seamlessly continue performing information management policy operations during the transition from multiple information management cells to a single information management cell. Both employers and employees are regularly interrupted by IT department notifications stating that one or more important networks will be down for a specified or unspecified duration of time. Such notifications from IT departments are both unsettling and inconvenient. In the case where intentionally-downed computing devices or networks may result in the loss of information that is critical to the operations of the business, the ability of the replacement storage manager 1105 to enable continued information management policy operations, while seamlessly transitioning the storage managers 402 off-line, can be an important and invaluable feature of the system to businesses.
Although the computing environment 1100 illustrates the replacement storage manager 1105 replacing multiple storage managers 402, the replacement storage manager 1105 enables even further consolidation of computing devices within the computing environment 1100. A media agent and a storage manager can be installed on a single computing device and share resources for the operation of an information storage policy for a number of computing devices. In some implementations, where the storage needs of computing devices in an information management cell exceed the resource capacity of a single computing device, one or more media agents may be separated from a storage manager and installed on independent computing devices. If a company, such as the jeans manufacturer, has sufficiently reduced the number of computing devices in operation or has reduced the frequency, quantity, or quality of secondary copies transferred to non-production storage media, the functions of all of the media agents 410 of all of the information management cells 350 may be consolidated into the replacement storage manager 1105 using the techniques described above. In particular, the restore agent 1115 updates the database 1110 with the media agent database entries of the media agents 410 in addition to the storage manager database entries of the storage managers 402. The software modules associated with the media agents 410 are also installed onto the replacement storage manager 1105 to enable the replacement storage manager 1105 to perform the various operations of the media agent, such as deduplication, encryption, and compression of production data received from computing devices 205.
At block 1205, the replacement storage manager identifies all of the storage managers to be consolidated. The replacement storage manager may automatically consolidate all other identified information management cells, or may receive instructions from a user, e.g., via a user interface. The system may also automatically access a table of accessible storage managers, or automatically crawl the network to locate such storage managers.
At block 1210, the replacement storage manager requests copies of each storage manager database from each information management cell that has been identified for consolidation. The replacement storage manager includes a management agent which provides an interface to enable various management agents in multiple information management cells (or in a global storage manager) to communicate with one another. This allows each information management cell to exchange status information, routing information, capacity and utilization information, and information management operation instructions or policies with other cells. The replacement storage manager uses the management agent, or another inter-cell communication technique, to request and receive copies of the storage manager databases from the identified storage managers.
At block 1215, the replacement storage manager aggregates the metadata of each of the received databases into a single replacement storage manager database. The replacement storage manager may test its connections with each computing device and storage device listed in the consolidated database using, for example, a network agent. The replacement storage manager may notify an administrator of unresponsive devices and may remove obsolete or unresponsive devices from the database. Once connections to the computing devices of the consolidated information management cell have been tested, the replacement storage manager notifies an administrator that the consolidated storage managers may be removed from the system.
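The aggregation and connection test of block 1215 might look like the following sketch. The `device` and `origin_cell` keys and the caller-supplied `is_responsive` probe (standing in for the network agent) are hypothetical, and dropping unresponsive devices is shown as one of the optional behaviors described above.

```python
def consolidate(storage_manager_dbs, is_responsive):
    """Aggregate several storage manager databases and drop dead devices.

    `storage_manager_dbs` maps a cell name to a list of entry dicts, each with
    a 'device' key. Returns (consolidated_entries, unresponsive_devices).
    """
    consolidated, unresponsive = [], set()
    probe_cache = {}  # probe each device only once
    for cell, entries in storage_manager_dbs.items():
        for entry in entries:
            device = entry["device"]
            if device not in probe_cache:
                probe_cache[device] = is_responsive(device)
            if probe_cache[device]:
                consolidated.append({**entry, "origin_cell": cell})
            else:
                unresponsive.add(device)
    return consolidated, sorted(unresponsive)
```

The returned list of unresponsive devices is what an implementation along these lines would report to the administrator.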
As a result, a smaller, more energy efficient, and more cost efficient system replaces its conglomerate predecessor. However, if needed, a reverse process can be used to divide the replacement information management system into multiple information management systems by selectively copying portions of the replacement storage manager database over to newly added storage managers.
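The reverse process could be approximated by partitioning the consolidated entries according to some assignment rule. The `assign_cell` callback is a hypothetical stand-in for whatever selection criteria an administrator chooses.

```python
from collections import defaultdict


def split_database(entries, assign_cell):
    """Partition consolidated entries among newly added storage managers.

    `assign_cell` maps an entry to the name of the cell that should own it.
    """
    cells = defaultdict(list)
    for entry in entries:
        cells[assign_cell(entry)].append(entry)
    return dict(cells)
```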
Information Management for Obsolete Systems
A replacement storage manager and restore agent can be used to provide beneficial services in addition to disaster recovery of a database and information management cell consolidation. A replacement storage manager and restore agent can also be used in a system for accessing and reviewing archived non-production data of an obsolete or no-longer operating information management cell. There are many useful scenarios and applications for a system that accesses and reviews archived non-production data. In a first scenario, the jeans manufacturer may have acquired a small textile business having years of records stored in boxes of magnetic tapes. A system that accesses and reviews archived non-production data enables the jeans manufacturer to review the content of the magnetic tapes and quickly determine what type of content is stored on the magnetic tapes, without reviewing all of the information stored on the magnetic tapes. In a second scenario, a court, attorney, bank, or trustee may have a need to review records from a company that is being liquidated after bankruptcy. The company may have lost its IT support and may only have records that are in a magnetic tape library or that are in boxes of magnetic tapes. Rather than restoring all of the data of all of the magnetic tapes, the court, attorney, or trustee may use the disclosed restore agent to identify the locations of email messages, accounting records, or other information associated with a particular date or time frame.
Referring to
The restore agent 1320 operates in a manner similar to the previously disclosed restore agents. Namely, the restore agent 1320 scans the magnetic tapes that are identified by a user for metadata that includes characteristics of the magnetic tapes and the non-production data stored on the magnetic tapes. The restore agent 1320, in response to selections made by the user, builds the database 1335 or the database 1340 using the information retrieved from the scans and enables the user to review the dates, type of content, sizes, and other information related to the non-production data stored on the magnetic tapes. The restore agent 1320 enables the user, via the computing device 1305, to restore very specific information from the magnetic tapes, by displaying the entries of the database 1335 or 1340 and by instructing the replacement media agent 1315 to restore selected database entries to memory in the computing device 1305. For example, using the computing device 1305, a user can inventory the magnetic tapes 1345 of the box of magnetic tapes 1350 and restore all email messages backed up on, for example, Mar. 19, 1999 for further review or analysis. In this way, the system in this implementation is able to employ a replacement storage manager and restore agent to access and review archived non-production data from, e.g., an obsolete or no-longer operating information management cell.
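The review step described above amounts to querying the rebuilt database by content type and date. The following is a sketch only; the entry keys (`tape`, `content_type`, `backup_date`) are hypothetical and do not reflect any schema stated in the disclosure.

```python
from datetime import date


def find_entries(database, content_type=None, backup_date=None):
    """Return database entries matching the given type and backup date.

    A criterion left as None is not applied.
    """
    results = []
    for entry in database:
        if content_type is not None and entry["content_type"] != content_type:
            continue
        if backup_date is not None and entry["backup_date"] != backup_date:
            continue
        results.append(entry)
    return results
```

For instance, `find_entries(db, "email", date(1999, 3, 19))` would return only the entries a user might then select for restoration, without reading the non-production data itself.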
CONCLUSION
Systems and modules described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, or hardware suitable for the purposes described herein. Software and other modules may reside on servers, workstations, personal computers, computerized tablets, PDAs, smart phones, and other devices suitable for the purposes described herein. Modules described herein may be executed by a general-purpose computer, e.g., a server computer, wireless device, or personal computer. Those skilled in the relevant art will appreciate that aspects of the invention can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, hand-held devices (including personal digital assistants (PDAs)), wearable computers, all manner of cellular or mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, mini-computers, mainframe computers, and the like. Indeed, the terms “computer,” “server,” “host,” “host system,” and the like, are generally used interchangeably herein and refer to any of the above devices and systems, as well as any data processor. Furthermore, aspects of the invention can be embodied in a special purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable instructions explained in detail herein.
Software and other modules may be accessible via local memory, a network, a browser, or other application in an ASP context, or via another means suitable for the purposes described herein. Examples of the technology can also be practiced in distributed computing environments where tasks or modules are performed by remote processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. Data structures described herein may comprise computer files, variables, programming arrays, programming structures, or any electronic information storage schemes or methods, or any combinations thereof, suitable for the purposes described herein. User interface elements described herein may comprise elements from graphical user interfaces, command line interfaces, and other interfaces suitable for the purposes described herein.
Examples of the technology may be stored or distributed on computer-readable media, including magnetically or optically readable computer disks, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Indeed, computer-implemented instructions, data structures, screen displays, and other data under aspects of the invention may be distributed over the Internet or over other networks (including wireless networks), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analog or digital network (packet switched, circuit switched, or other scheme).
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” As used herein, the terms “connected,” “coupled,” or any variant thereof, means any connection or coupling, either direct or indirect, between two or more elements; the coupling or connection between the elements can be physical, logical, or a combination thereof. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
The above Detailed Description is not intended to be exhaustive or to limit the invention to the precise form disclosed above. While specific examples for the invention are described above for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative implementations may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternative or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed or implemented in parallel, or may be performed at different times. Further, any specific numbers noted herein are only examples: alternative implementations may employ differing values or ranges.
The teachings of the invention provided herein can be applied to other systems, not necessarily the systems described herein. The elements and acts of the various examples described above can be combined to provide further implementations of the invention.
Any patents and applications and other references noted above, including any that may be listed in accompanying filing papers, are incorporated herein by reference. Aspects of the invention can be modified, if necessary, to employ the systems, functions, and concepts of the various references described above to provide yet further implementations of the invention.
These and other changes can be made to the invention in light of the above Detailed Description. While the above description describes certain examples of the invention and describes the best mode contemplated, no matter how detailed the above appears in text, the invention can be practiced in many ways. Details of the system may vary considerably in its specific implementation, while still being encompassed by the invention disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the invention should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the invention with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the invention to the specific examples disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the invention encompasses not only the disclosed examples, but also all equivalent ways of practicing or implementing the invention under the claims.
While certain examples are presented below in certain forms, the applicant contemplates the various aspects of the invention in any number of claim forms. Accordingly, the applicant reserves the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention.
Claims
1. A method for constructing an information management database that stores metadata for an information management network, the method comprising:
- receiving instructions to identify non-production storage media, wherein the non-production storage media comprise secondary copies of application data generated by client computing devices in an information management network, wherein an information management server provides information management services to client computing devices within the information management network;
- identifying non-production storage media within the information management network in accordance with the received instructions;
- scanning the identified non-production storage media to identify metadata associated with the secondary copies of the application data;
- retrieving the identified metadata from the non-production storage media without retrieving the secondary copies of the application data; and
- building an information management database by populating the information management database with the retrieved metadata from the non-production storage media, wherein the information management database comprises metadata identifying files stored by the information management server and the client computing devices associated with those files, and wherein the retrieved metadata maps the client computing devices to the secondary copies of the application data contained by the non-production storage media.
2. The method of claim 1, wherein the metadata comprises headers at beginning of a file within at least one of the secondary copies of the application data, trailing file markers appended to end of the file, or both headers and trailing file markers.
3. The method of claim 1, wherein the information management network is a first network that is a subnetwork of a second network.
4. The method of claim 1, wherein the information management network is a private network,
- wherein the client computing devices have private IP addresses so that computing devices that are not connected to the private network cannot access the client computing devices without logging into the private network, and
- wherein at least some of the client computing devices are associated with a storage management cell that is hierarchically subordinate to the information management server.
5. The method of claim 1, further comprising:
- providing a user interface, with the information management server, to enable a user to provide instructions associated with identifying the non-production storage media;
- receiving, via the user interface, the instructions associated with identifying the non-production storage media,
- wherein the instructions indicate whether the information management server performs an automated search of the information management network or a manual search of the information management network,
- wherein when the instructions indicate a manual search, the instructions further include one or more of a network address for at least one of the non-production storage media and an identifier of at least one of the non-production storage media; and
- displaying at least some of the non-production storage media identified by the information management server,
- wherein displaying comprises displaying a device ID for the at least some of the non-production storage media,
- wherein the user interface receives selections from the user of one or more of the displayed non-production storage media, and
- wherein scanning the identified non-production storage media comprises scanning the selections and not from non-production storage media not selected via the user interface.
6. The method of claim 1, further comprising:
- displaying at least some of the non-production storage media identified by the information management server,
- wherein displaying comprises displaying a device ID for the at least some of the non-production storage media,
- wherein a user interface receives selections from a user of one or more of the non-production storage media,
- wherein scanning the identified non-production storage media comprises scanning the non-production storage media and not from non-production storage media that has not been selected via the user interface.
7. The method of claim 1, wherein the information management server is a first information management server that receives information management instructions associated with the client computing devices from a second information management server.
8. The method of claim 7, wherein the first information management server created at least some of the secondary copies of application data on at least some of the identified non-production storage media.
9. The method of claim 1, wherein, for the non-production storage media that are magnetic tapes, scanning comprises:
- reading, into temporary memory of the information management server, portions of content on the non-production storage media; and,
- writing over the portions of the content in the temporary memory with additional portions of the content until metadata is found.
10. A system for constructing an information management database that stores metadata for an information management network, the system comprising:
- at least one data processor; and
- at least one non-transitory memory, coupled to the at least one data processor and storing instructions, which when executed by the at least one data processor, perform a method comprising: receiving instructions to identify non-production storage media, wherein the non-production storage media comprise secondary copies of application data generated by client computing devices in an information management network, wherein an information management server provides information management services to client computing devices within the information management network; identifying non-production storage media within the information management network in accordance with the received instructions; scanning the identified non-production storage media to identify metadata associated with the secondary copies of the application data; retrieving the identified metadata from the non-production storage media without retrieving the secondary copies of the application data; and building an information management database by populating the information management database with the retrieved metadata from the non-production storage media, wherein the information management database comprises metadata identifying files stored by the information management server and the client computing devices associated with those files, and wherein the retrieved metadata maps the client computing devices to the secondary copies of the application data contained by the non-production storage media.
11. The system of claim 10, wherein the metadata comprises headers at beginning of a file within at least one of the secondary copies of the application data, trailing file markers appended to end of the file, or both headers and trailing file markers.
12. The system of claim 10, wherein the metadata comprises information regarding files recorded in an original information management database, and information regarding the client computing devices associated with each of the files.
13. The system of claim 10, wherein the information management network is a private network,
- wherein the client computing devices have private IP addresses so that computing devices that are not connected to the private network cannot access the client computing devices without logging into the private network, and
- wherein at least some of the client computing devices are associated with a storage management cell that is hierarchically subordinate to the information management server.
14. The system of claim 10, wherein the method further comprises:
- providing a user interface, with the information management server, to enable a user to provide instructions associated with identifying the non-production storage media;
- receiving, via the user interface, the instructions associated with identifying the non-production storage media, wherein the instructions indicate whether the information management server performs an automated search of the information management network or a manual search of the information management network, wherein when the instructions indicate a manual search, the instructions further include one or more of a network address for at least one of the non-production storage media and an identifier of at least one of the non-production storage media; and
- displaying at least some of the non-production storage media identified by the information management server, wherein displaying comprises displaying a device ID for the at least some of the non-production storage media, wherein a user interface receives selections from the user of one or more of the displayed non-production storage media, and wherein scanning the identified non-production storage media comprises scanning the selections of the non-production storage media and not from non-production storage media not selected via the user interface.
15. The system of claim 10, wherein the method further comprises:
- displaying at least some of the non-production storage media identified by the information management server,
- wherein displaying comprises displaying a device ID for the at least some of the non-production storage media,
- wherein the user interface receives selections from a user of one or more of the displayed ones of the non-production storage media,
- wherein scanning the identified non-production storage media comprises scanning the selections of the non-production storage media and not from non-production storage media that has not been selected via the user interface.
16. The system of claim 10, wherein the information management network is a first network that is a subnetwork of a second network.
17. The system of claim 10, wherein the information management server is a first information management server that receives information management instructions associated with the client computing devices from a second information management server.
18. The system of claim 17,
- wherein the first information management server created at least some of the secondary copies of application data on at least some of the identified non-production storage media.
19. The system of claim 10, wherein, for each of the non-production storage media that are magnetic tapes, scanning comprises:
- reading, into temporary memory of the information management server, portions of content on the non-production storage media; and,
- writing over the portions of the content in the temporary memory with additional portions of the content until metadata is found.
Type: Application
Filed: Dec 28, 2021
Publication Date: Aug 18, 2022
Inventor: Manoj Kumar Vijayan (Marlboro, NJ)
Application Number: 17/564,066