Distributed storage resource management in a storage area network
A method and system for managing storage resources associated with a network having at least one storage resource coupled to at least one server and at least one client over at least one data path. The method and system include servers managing the storage resource over the data path, and clients directing I/O requests to the storage resources and redirecting I/O requests to the servers upon the detection of a failure condition.
This invention relates to storage resource management in a computer network, and more particularly to distributed storage management in a storage area network (SAN).
BACKGROUND
The emergence of fibre channel as a networking technology designed specifically for storage resources has been a primary impetus in the ongoing development of SAN technology in enterprise computing environments. These technologies, coupled with the changing needs of users, are causing the demand for storage to accelerate. Consequently, this has increased the basic requirement of managing, storing, and accessing storage resources in a SAN.
A SAN interconnects different kinds of storage resources with associated data servers on behalf of a larger network of users represented by client computers. Typically, the SAN uses fibre channel technology to facilitate high bandwidth communication between the storage resources and the data servers. The storage resources are usually implemented using physical data storage configurations such as Redundant Arrays of Inexpensive Disks (RAID), simple disk arrays, and complex disk subsystems. The data servers manage the storage resources using a traditional volume manager comprising a data access portion and a data management portion. The data management portion is responsible for managing the physical data storage devices including abstracting the physical device and presenting to the client computer user a logical unit of storage called a volume. The data management portion also is responsible for backup and restore, data migration from one storage device to another, and the sharing of data. In contrast, the data access portion of the volume manager is responsible for converting the logical data requests issued by the clients into data transfer operations directed to the physical storage corresponding to the logical device. Once the physical data blocks corresponding to the requested logical blocks have been retrieved, the server handles the data transfer over the fibre channel and delivers the blocks to the client computer.
However, sharing storage resources using a SAN infrastructure is currently limited. A typical SAN may interconnect to other computer systems including other networks and servers. While these interconnections allow these systems to share data, they can also lead not only to the possibility of data corruption but also to an increase in the complexity of managing these storage resources. System administrators responsible for managing the SAN and its storage resources are faced with a time-consuming and costly management task.
One solution involves the use of zoning, in which a fibre channel switch is placed between storage resources and a computer system. The switch is programmed to grant to the computer system access to the storage resource that has been configured for the port. However, this solution is severely limited because in a large “fabric” effective zoning may require the programming of several layers of switches to represent the correct grouping, which can be difficult and prone to error. Moreover, if it becomes necessary to rearrange the cables associated with the fibre channel, this can impact the current access of storage to other computer systems, because the port numbers can change.
Another solution might include placing an intermediate computer between the storage resource and the other computer systems to mediate access to the storage. The intermediate computer intercepts all input/output (I/O) requests flowing to the disks and routes the requests as required. The intermediate computer must be capable of storing and forwarding the requests. To avoid a loss in performance, the intermediate computer must have twice the bandwidth of the incoming fibre channel connection. However, in a multi-port storage topology, the bandwidth requirement increases dramatically, leading to an increase in cost. The intermediate computer does alleviate the management problem by providing the system administrator with a single management console for zoning and virtual volume management. Although the intermediate computer provides attractive management capabilities, it lacks scalability and is costly to implement.
In many enterprise computer environments, the storage resource typically is implemented using different levels of RAID. Although RAID configurations provide improved I/O performance and/or reliability, management can be complex. For example, if an enterprise is running heterogeneous host computer systems, then a system administrator must deal with multiple management interfaces. The RAID volume may need to be modified if any of the components of the RAID have failed or if the administrator has changed the configuration. To avoid downtime when the RAID configuration is modified, the RAID volume must be rebuilt while online, which may impact the I/O performance of the running host computer system and client systems. In light of the foregoing, a SAN infrastructure that is able to share storage resources by distributing the volume management functions between server computers responsible for data management and client computers responsible for data access would be an improvement in the art.
SUMMARY
In a first aspect, the invention provides a method of managing storage resources associated with a computer network. The method includes managing storage resources associated with a network having at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein the client directs I/O requests to the storage resources and redirects I/O requests to the server upon the detection of a failure condition.
In one implementation, the method includes the authentication of a client and the communication of volume information associated with the storage resource to the client based on the results of the authentication. In yet another implementation, the method includes the allocation of storage space from the storage resource in response to a client request, and the communication of volume information associated with the allocated space to the requesting client. In another implementation, the method comprises the allocation of a new storage space from the storage resource in response to a receipt of a failure condition, wherein the new storage space includes a new virtual disk associated with a new physical storage resource; the initiation of the recovery of the contents associated with the failure condition in cooperation with the new storage space; and the communication of a recovery status to the client, wherein the client and the server continue the recovery based on the recovery status. The method also comprises changing the volume configuration corresponding to the storage resource; committing the changes to the changed configuration during which time the client is excluded from accessing the storage resource; and communicating the new state of the configuration to the client. In another implementation, the method comprises providing a copy of unmodified data blocks before modifying the data blocks; and communicating a list of the modified data blocks to a backup process residing on the server, wherein the backup process uses a pseudo-device to read the unmodified and modified data blocks.
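The last implementation above, in which unmodified data blocks are copied before they are modified and a list of modified blocks is handed to a backup process, resembles a copy-on-write snapshot. The following is a minimal sketch under that assumption; the class name SnapshotVolume and its methods are hypothetical and are not taken from the disclosure.

```python
class SnapshotVolume:
    """Toy volume that keeps the original copy of each block it overwrites."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data blocks seen by clients
        self.preserved = {}          # block index -> contents before first write

    def write(self, index, data):
        # Preserve the unmodified block once, before its first overwrite.
        if index not in self.preserved:
            self.preserved[index] = self.blocks[index]
        self.blocks[index] = data

    def modified_blocks(self):
        # The list that would be communicated to the backup process.
        return sorted(self.preserved)

    def read_snapshot(self, index):
        # Pseudo-device view: preserved copy if the block changed, live block otherwise.
        return self.preserved.get(index, self.blocks[index])


vol = SnapshotVolume([b"a", b"b", b"c"])
vol.write(1, b"B")
vol.write(1, b"BB")  # second write must not replace the preserved original
snapshot_view = [vol.read_snapshot(i) for i in range(3)]
```

In this sketch the backup reader sees the volume as it stood when the snapshot began, while clients continue to read and write the live blocks.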
In a second aspect, the invention provides a distributed shared resource management system. This system includes at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein the server manages the storage resource over the data path, and the client directs I/O requests to the storage resource and redirects the I/O requests to the server upon the detection of a failure condition. This system is adapted to operate according to the method described above.
In a third aspect, the invention provides an article comprising a computer-readable medium that stores computer executable instructions for controlling a computer in a distributed shared storage resource management system, which system comprises at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein the computer executable instructions cause the system to operate according to the methods described above.
With the methods and systems for managing storage resources associated with a storage network disclosed in the present invention, the management of volumes may be advantageously centralized using a common management interface. In addition, storage resources can be shared in a secure environment without requiring an intermediate node. Furthermore, the more complex volume management functions associated with managing the storage resources may be allocated to at least one server, which relieves the client system from performing these tasks. Moreover, client systems may now concentrate on accessing data from the storage resources and offload the recovery process onto the servers which are capable of performing this process efficiently.
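The division of labor described in this summary, in which a client issues I/O directly to the storage resource and redirects a request to a server only when a failure condition is detected, can be sketched as follows. This is an illustrative sketch only: the names StorageFailure, DirectPath, Server, and Client are hypothetical, and how the server actually satisfies a redirected request (here, from its own copy of the blocks) is an assumption.

```python
class StorageFailure(Exception):
    """Raised when a direct I/O request to the storage resource fails."""


class DirectPath:
    """Stand-in for the data path from a client to the storage resource."""

    def __init__(self, blocks, failed=False):
        self.blocks = blocks
        self.failed = failed

    def read(self, index):
        if self.failed:
            raise StorageFailure(f"direct read of block {index} failed")
        return self.blocks[index]


class Server:
    """Stand-in for the managing server; records the requests redirected to it."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.redirected = []

    def read(self, index):
        self.redirected.append(index)
        return self.blocks[index]


class Client:
    def __init__(self, direct, server):
        self.direct = direct
        self.server = server

    def read(self, index):
        try:
            # Normal case: I/O goes straight to the storage resource.
            return self.direct.read(index)
        except StorageFailure:
            # Failure condition detected: redirect the request to the server.
            return self.server.read(index)


data = [b"blk0", b"blk1"]
server = Server(data)
healthy_client = Client(DirectPath(data), server)
degraded_client = Client(DirectPath(data, failed=True), server)
first = healthy_client.read(0)
second = degraded_client.read(1)
```

Only the second read reaches the server, which reflects the intent that the server stays out of the data path during normal operation.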
The details of various embodiments of the invention including certain preferred embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description, drawings, and claims.
DESCRIPTION OF DRAWINGS
Like reference symbols in the various drawings indicate like elements.
DETAILED DESCRIPTION
The client computers 12, 13, 14 view the data on the storage resources 19, 20, 21 as a logical representation of data called a volume where each volume contains logical data units called data blocks. The client systems 12, 13, 14 access the volumes containing logical data blocks without knowledge of the underlying structure of the storage resource. The client systems 12, 13, 14 access data blocks directly from the storage resource 19, 20, 21 over the fibre channel 18 coupled to the SAN 17. Embodiments of the client systems 12, 13, 14 may include, but are not limited to, personal computers (PC), workstations, portable handheld computing devices such as a personal digital assistant (PDA), and other types of electronic devices adapted with a processor. Additional non-limiting embodiments also include a low-cost PC designed for Internet access and specialized business purposes called a network appliance, but which does not have the full capabilities of PCs.
The VMC 15 is running a volume manager (VM) 15A which is a module responsible for managing the storage resources 19, 20, 21. Although the VMC 15 is shown as a single server, in a preferred embodiment, the VMC 15 may be implemented as a cluster of servers. In the cluster arrangement, the VM 15A module may be duplicated on each of the servers across the cluster. So if one server in the cluster fails, the VM 15A is still accessible from an operational server. Embodiments of servers that can be deployed as a VMC include high-end enterprise host computers, special purpose computers such as data-warehousing data servers, or other types of host computers.
In a preferred embodiment, the storage resources 19, 20, 21 can comprise different levels of RAID. For example, in a RAID 0 configuration, the data on a storage device is distributed across several devices but does not provide redundant information. This technique is called “striping” and improves I/O performance, but lacks reliability. In contrast, under a RAID 1 implementation, the data is “mirrored” or duplicated onto other storage devices. In this technique the data is duplicated, thus increasing reliability since data can be recovered if a disk fails. On the other hand, in a RAID 4 implementation, the data is distributed in a similar fashion to RAID 0 but redundant information is stored onto a dedicated parity disk. The parity information allows a RAID 4 subsystem to recover data after a single disk failure. RAID 5 is similar to RAID 4 except the redundant information is interspersed with user data across all the disks. In other embodiments, the storage resources may be configured to include disk subsystems containing tape, CD-ROM, removable storage, optical drives, or other types of storage resource devices.
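The single-disk recovery property of RAID 4 and RAID 5 described above follows from XOR parity: the parity block is the byte-wise XOR of the data blocks in a stripe, so any one missing block equals the XOR of the surviving blocks and the parity. The following is a minimal sketch; the helper name xor_blocks and the sample byte values are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data disks, plus a parity block.
data_disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data_disks)

# Simulate losing the second disk and rebuild its block from the survivors.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
```

The same arithmetic applies whether the parity lives on a dedicated disk (RAID 4) or is interspersed with user data (RAID 5); only the placement differs.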
The “mirroring” portion of the “RAID X/mirroring-layer” 58 manages the task of reading from and writing to multiple storage resources to access the same data. As discussed above, the storage resources can be arranged according to any one of the conventional RAID configurations known in the art. The selection of a particular configuration depends on several factors including cost, reliability, and performance tradeoffs. The “RAID X” portion of layer 58 is responsible for handling storage resources configured in a RAID arrangement. The ‘X’ in “RAID X” refers to the different possible RAID levels that can be implemented. For example, in a RAID 4 configuration, parity information associated with the data is stored in the storage resources for reliability purposes.
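The mirroring behavior described above, in which the same data is written to and read from multiple storage resources, can be illustrated with a small sketch. The class MirrorSet and its read policy (first healthy replica) are hypothetical choices made for illustration and are not details of layer 58.

```python
class MirrorSet:
    """Toy mirror: every write goes to all healthy replicas."""

    def __init__(self, replica_count, block_count):
        self.replicas = [[b""] * block_count for _ in range(replica_count)]
        self.healthy = [True] * replica_count

    def write(self, index, data):
        # Duplicate the block onto each replica that is still in service.
        for replica, ok in zip(self.replicas, self.healthy):
            if ok:
                replica[index] = data

    def read(self, index):
        # Serve the read from the first healthy replica.
        for replica, ok in zip(self.replicas, self.healthy):
            if ok:
                return replica[index]
        raise IOError("no healthy replica available")


mirror = MirrorSet(replica_count=2, block_count=4)
mirror.write(0, b"payload")
mirror.healthy[0] = False          # simulate failure of the first replica
recovered = mirror.read(0)
```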
In a preferred embodiment, a CVM framework should contain all five layers; however, the minimum CVM configuration requires the presence of the client-management-layer 48 and the LUN-management-layer 50. The CVM provides the client with the flexibility of selecting the layers that are necessary for the specific application environment. For example, if reliability were critical to the application, then layer 58 would be necessary to include in the CVM. On the other hand, layer 56 might be included if data performance is an issue. Alternatively, both layers 56 and 58 may be incorporated in the CVM framework if both high reliability and increased I/O performance are required by the application as a whole.
For example, the LUN-management-layer informs the concatenation-layer of the CVM of the new volume information. In turn, the concatenation-layer makes the storage resources appear as one logical device to the client based on the new volume information. The client system processes the new storage space without needing to know the details of the underlying storage space. For example, if the client system is running Windows-NT®, then the new logical device is now visible under the Disk Administrator; however, the physical disk resources are still hidden from the client system. Other behavior appropriate to each client OS may occur.
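One way to picture how a concatenation layer can make several storage resources appear as one logical device is a table of extents that maps each logical block number to a device and an offset within that device. The sketch below assumes such a table; the class ConcatenatedVolume, the device names, and the extent sizes are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """Present several extents as one contiguous range of logical blocks."""

    def __init__(self, extents):
        # extents: list of (device_name, block_count) in concatenation order
        self.extents = extents
        # starts[i] is the first logical block of extent i; starts[-1] is the total size
        self.starts = [0] + list(accumulate(count for _, count in extents))

    @property
    def size(self):
        return self.starts[-1]

    def locate(self, logical_block):
        """Map a logical block to (device_name, block offset within that device)."""
        if not 0 <= logical_block < self.size:
            raise IndexError(f"logical block {logical_block} out of range")
        i = bisect_right(self.starts, logical_block) - 1
        return self.extents[i][0], logical_block - self.starts[i]


volume = ConcatenatedVolume([("disk_a", 100), ("disk_b", 50)])
last_on_a = volume.locate(99)
first_on_b = volume.locate(100)
```

When new volume information arrives, only the extent table needs to change; the client continues to address a single range of logical blocks.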
Certain embodiments according to the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, the backup method can be configured to create several snapshots of the storage resources and then allow the backup applications in the server to process the backups in parallel. Accordingly, other embodiments are within the scope of the following claims.
Claims
1. A method of managing storage resources associated with a network having at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein said server manages said storage resources over said data path, and wherein said client receives a description of the storage resources from the at least one server, said client directs I/O requests directly to said storage resources using the description of the storage resources and redirects I/O requests to said server upon the detection of a failure condition.
2. The method of claim 1 which further includes:
- authentication of said client; and
- communication of volume information associated with said storage resource to said client based on the results of said authentication.
3. The method of claim 1 which further includes:
- allocation of storage space from said storage resource in response to a client request; and
- communication of volume information associated with said allocated space to said client.
4. (canceled)
5. (canceled)
6. The method of claim 1 which further includes:
- changing the volume configuration corresponding to said storage resource;
- committing the changes to said changed configuration, during which time said client is excluded from accessing said storage resource; and
- communicating the new state of said configuration to said client.
7. (canceled)
8. The method of claim 1 which further includes:
- communication between said clients and said servers over at least a second data path.
9. A distributed shared storage resource management system comprising:
- at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein said server manages said storage resource over said data path, and said client receives a description of the storage resources from the at least one server, said client directs I/O requests directly to said storage resources using the description of the storage resources and redirects I/O requests to said server upon the detection of a failure condition.
10. The system of claim 9 wherein said server is configured to:
- authenticate each client; and
- communicate volume information associated with said storage resource to said client based on the results of said authentication.
11. The system of claim 9 wherein said server is configured to:
- allocate space from said storage resource in response to a request from a client; and communicate volume information associated with said allocated space to said client.
12. (canceled)
13. (canceled)
14. The system of claim 9 wherein said server is configured to:
- change volume configuration associated with said storage resource;
- commit the changes to said changed configuration during which time said client is excluded from accessing said storage resource; and
- communicate the new state of said configuration to said client.
15. (canceled)
16. The system of claim 9 further includes:
- at least a second data path configured to allow communication between said client and said server.
17. An article comprising a computer-readable medium that stores computer executable instructions for causing a computer in a distributed shared storage resource management system which comprises at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein said computer executable instructions cause said server to manage said storage resource over said data path, and said client receives a description of the storage resources from the at least one server, said client directs I/O requests directly to said storage resources using the description of the storage resources and redirects I/O requests to said server upon the detection of a failure condition.
18. The article of claim 17 further includes instructions to:
- authenticate each client; and
- communicate volume information associated with said storage resource to said client based on the results of said authentication.
19. The article of claim 17 further comprising instructions to:
- allocate space from said storage resource in response to a request from a client; and communicate volume information associated with said allocated space to said client.
20. (canceled)
21. (canceled)
22. The article of claim 17 further comprising instructions to:
- change configuration associated with said storage resource;
- commit said changes to said changed configuration during which time said client is excluded from accessing said storage resource; and
- communicate the new state of the changed configuration to said client.
23. (canceled)
24. The article of claim 17 further comprising instructions to:
- provide at least a second data path to facilitate communication between said client and said server.
25. A method of managing storage resources associated with a network having at least one storage resource coupled to at least one server and at least one client over at least one data path, wherein said server manages said storage resources over said data path, and wherein said client receives a description of the storage resources from the at least one server, said client directs I/O requests directly to said storage resources using the description of the storage resources and redirects I/O requests to said server upon the detection of a failure condition, said method comprising:
- changing the volume configuration corresponding to said storage resource;
- committing the changes to said changed configuration, during which time said client is excluded from accessing said storage resource; and
- communicating the new state of said configuration to said client.
Type: Application
Filed: Nov 12, 2004
Publication Date: Apr 28, 2005
Inventors: Gordon Harris (Somerset, NJ), Stephen Rago (Berkeley Heights, NJ), Timothy Williams (Spring Lake, NJ)
Application Number: 10/987,389