External encapsulation of a volume into a LUN to allow booting and installation on a complex volume


A system for external encapsulation of a volume into a logical unit (LUN) to allow booting and installation on a complex volume may include a host, one or more physical storage devices, and an off-host virtualizer. The off-host virtualizer (i.e., a device external to the host, capable of providing block virtualization functionality) may be configured to aggregate storage within the one or more physical storage devices into a logical volume and to generate metadata to emulate the logical volume as a bootable target device. The off-host virtualizer may make the metadata accessible to the host, allowing the host to boot off a file system resident in the logical volume.

Description

This application is a continuation-in-part of U.S. patent application Ser. No. 10/722,614, entitled “SYSTEM AND METHOD FOR EMULATING OPERATING SYSTEM METADATA TO PROVIDE CROSS-PLATFORM ACCESS TO STORAGE VOLUMES”, filed Nov. 26, 2003.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to computer systems and, more particularly, to off-host virtualization of bootable devices within storage environments.

2. Description of the Related Art

Many business organizations and governmental entities rely upon applications that access large amounts of data, often exceeding a terabyte of data, for mission-critical applications. Often such data is stored on many different storage devices, which may be heterogeneous in nature, including many different types of devices from many different manufacturers.

Configuring individual applications that consume data, or application server systems that host such applications, to recognize and directly interact with each different storage device that may possibly be encountered in a heterogeneous storage environment would be increasingly difficult as the environment scaled in size and complexity. Therefore, in some storage environments, specialized storage management software and hardware may be used to provide a more uniform storage model to storage consumers. Such software and hardware may also be configured to present physical storage devices as virtual storage devices (e.g., virtual SCSI disks) to computer hosts, and to add storage features not present in individual storage devices to the storage model. For example, features to increase fault tolerance, such as data mirroring, snapshot/fixed image creation, or data parity, as well as features to increase data access performance, such as disk striping, may be implemented in the storage model via hardware or software. The added storage features may be referred to as storage virtualization features, and the software and/or hardware providing the virtual storage devices and the added storage features may be termed “virtualizers” or “virtualization controllers”. Virtualization may be performed within computer hosts, such as within a volume manager layer of a storage software stack at the host, and/or in devices external to the host, such as virtualization switches or virtualization appliances. Such external devices providing virtualization may be termed “off-host” virtualizers, and may be utilized in order to offload processing required for virtualization from the host. Off-host virtualizers may be connected to the external physical storage devices for which they provide virtualization functions via a variety of interconnects, such as Fiber Channel links, Internet Protocol (IP) networks, and the like.

In many corporate data centers, as the application workload increases, additional hosts may need to be provisioned to provide the required processing capabilities. The internal configuration (e.g., file system layout and file system sizes) of each of these additional hosts may be fairly similar, with just a few features unique to each host. Booting and installing each newly provisioned host manually may be a cumbersome and error-prone process, especially in environments where a large number of additional hosts may be required fairly quickly. A virtualization mechanism that allows hosts to boot and/or install operating system software off a virtual bootable target device may be desirable to support consistent booting and installation for multiple hosts in such environments. In addition, in some storage environments it may be desirable to be able to boot and/or install off a snapshot volume or a replicated volume, for example in order to be able to re-initialize a host to a state as of a previous point in time (e.g., the time at which the snapshot or replica was created).

SUMMARY

Various embodiments of a system and method for external encapsulation of a volume into a logical unit (LUN) to allow booting and installation on a complex volume are disclosed. According to a first embodiment, a system may include a host, one or more physical storage devices, and an off-host virtualizer. The off-host virtualizer (i.e., a device external to the host, capable of providing block virtualization functionality) may be configured to aggregate storage within the one or more physical storage devices into a logical volume and to generate metadata to emulate the logical volume as a bootable target device. The off-host virtualizer may make the metadata accessible to the host, allowing the host to boot off the logical volume, e.g., off a file system resident in the logical volume.

The metadata generated by the off-host virtualizer may include such information as the layouts or offsets of various boot-related partitions that the host may need to access during the boot process, for example to load a file system reader, an operating system kernel, or additional boot software such as one or more scripts. The metadata may be operating system-specific, i.e., the location, format and contents of the metadata may differ from one operating system to another. In one embodiment, a number of different logical volumes, each associated with a particular boot-related partition or file system, may be emulated as part of the bootable target device. In another embodiment, the off-host virtualizer may be configured to present an emulated logical volume as an installable partition (i.e., a partition in which at least a portion of an operating system may be installed). In such an embodiment, the host may also be configured to boot installation software (e.g., off external media), install at least a portion of the operating system on the installable partition, and then boot from a LUN containing the encapsulated volume.

The logical volume aggregated by the off-host virtualizer may support a number of different virtualization features in different embodiments. In one embodiment, the logical volume may be a snapshot volume (i.e., a point-in-time copy of another logical volume) or a replicated volume. The logical volume may span multiple physical storage devices, and may be striped, mirrored, or a virtual RAID volume. In some embodiments, the logical volume may include a multi-layer hierarchy of logical devices, for example implementing mirroring at a first layer and striping at a second layer below the first. In one embodiment, the host may be configured to access the logical volumes directly (i.e., without using the metadata) subsequent to an initial phase of the boot process. For example, during a later phase of the boot process, a volume manager or other virtualization driver may be activated at the host. The volume manager or virtualization driver may be configured to obtain configuration information for the logical volumes (such as volume layouts), e.g., from the off-host virtualizer or some other volume configuration server, to allow direct access.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating one embodiment of a computer system.

FIG. 2 is a block diagram illustrating one embodiment of a system where an off-host virtualizer is configured to present one or more logical volumes as a bootable target device for use by a host during a boot operation.

FIG. 3a is a block diagram illustrating the mapping of blocks within a logical volume to a virtual LUN according to one embodiment.

FIG. 3b is a block diagram illustrating an example of a virtual LUN including a plurality of partitions, where each partition is mapped to a volume, according to one embodiment.

FIG. 4 is a flow diagram illustrating aspects of the operation of a system configured to support off-host virtualization and emulation of a bootable target device, according to one embodiment.

FIG. 5 is a block diagram illustrating a logical volume comprising a multi-layer hierarchy of virtual block devices according to one embodiment.

FIG. 6 is a block diagram illustrating an embodiment where physical storage devices include fibre channel LUNs accessible through a fibre channel fabric, and an off-host virtualizer includes a virtualizing switch.

FIG. 7 is a block diagram illustrating one embodiment where the Internet SCSI (iSCSI) protocol is used to access the physical storage devices.

FIG. 8 is a block diagram illustrating an embodiment where physical storage devices may be accessible via storage servers configured to communicate with an off-host virtualizer and a host using an advanced storage protocol.

FIG. 9 is a block diagram illustrating an embodiment where some physical storage devices may be accessible via a target-mode host bus adapter.

FIG. 10 is a block diagram illustrating a computer accessible medium according to one embodiment.

While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION

FIG. 1 illustrates a computer system 100 according to one embodiment. In the illustrated embodiment, system 100 includes a host 101 and a bootable target device 120. The host 101 includes a processor 110 and a memory 112 containing boot code 114. Boot code 114 may be configured to read operating system-specific boot metadata 122 at a known location or offset within bootable target device 120, and to use boot metadata 122 to access one or more partitions 130 (e.g., a partition from among partitions 130A, 130B, . . . , 130N) of bootable target device 120 in order to bring up or boot host 101. Partitions 130 may be referred to herein as boot partitions, and may contain additional boot code that may be loaded into memory 112 during the boot process.

The process of booting a host 101 may include several distinct phases. In a first phase, for example, the host 101 may be powered on or reset, and may then perform a series of “power on self test (POST)” operations to test the status of various constituent hardware elements, such as processor 110, memory 112, peripheral devices such as a mouse and/or a keyboard, and storage devices including bootable target device 120. In general, memory 112 may comprise a number of different memory modules, such as a programmable read only memory (PROM) module containing boot code 114 for early stages of boot, as well as a larger random access memory for use during later stages of boot and during post-boot or normal operation of host 101. One or more memory caches associated with processor 110 may also be tested during POST operations. In traditional systems, bootable target device 120 may typically be a locally attached physical storage device such as a disk, or in some cases a removable physical storage device such as a CD-ROM. In systems employing the Small Computer System Interface (SCSI) protocol to access storage devices, for example, the bootable target device may be associated with a SCSI “logical unit” identified by a logical unit number or LUN. (The term LUN may be used herein to refer to both the identifier for a SCSI target device, as well as the SCSI target device itself.) During POST, one or more SCSI buses attached to the host may be probed, and SCSI LUNs accessible via the SCSI buses may be identified.

In some operating systems, a user such as a system administrator may be allowed to select a bootable target device from among several choices as a preliminary step during boot, and/or to set a particular target as the device from which the next boot should be performed. If the POST operations complete successfully, boot code 114 may proceed to access the designated bootable target device 120. That is, boot code 114 may read the operating system-specific boot metadata 122 from a known location in bootable target device 120. The specific location and format of boot-related metadata may vary from system to system; for example, in many operating systems, boot metadata 122 is stored in the first few blocks of bootable target device 120.

Operating system-specific boot metadata 122 may include the location or offsets of one or more partitions (e.g., in the form of a partition table), such as partitions 130A-130N (which may be generically referred to herein as partitions 130), to which access may be required during subsequent phases of the boot process. In some environments the boot metadata 122 may also include one or more software modules, such as a file system reader, that may be required to access one or more partitions 130. The file system reader may then be read into memory at the host 101 (such as memory 112), and used to load one or more additional or secondary boot programs (i.e., additional boot code) from a partition 130. The additional or secondary boot programs may then be executed, resulting, for example, in an initialization of an operating system kernel, followed by an execution of one or more scripts in a prescribed sequence, ultimately leading to the host reaching a desired “run level” or mode of operation. Various background processes (such as network daemon processes in operating systems derived from UNIX, volume managers, etc.) and designated application processes (e.g., a web server or a database management server configured to restart automatically upon reboot) may also be started up during later boot phases. When the desired mode of operation is reached, host 101 may allow a user to log in and begin desired user-initiated operations, or may begin providing a set of preconfigured services (such as web server or database server functionality). The exact nature and sequence of operations performed during boot may vary from one operating system to another.
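By way of illustration only, the role of boot code 114 in reading boot metadata 122 from a fixed location on the bootable target device can be sketched as follows. Python is used for the sketch; the four-byte magic value, table layout, and block size are hypothetical stand-ins, since, as noted above, the actual location and format of boot metadata are operating system-specific.

```python
import struct

BLOCK_SIZE = 512          # assumed block size for this sketch
MAX_PARTITIONS = 4        # hypothetical table capacity

def parse_boot_metadata(device_image: bytes):
    """Parse a hypothetical partition table stored in the first block of a
    bootable target device (real formats are operating system-specific)."""
    block = device_image[:BLOCK_SIZE]
    magic, count = struct.unpack_from("<4sI", block, 0)
    if magic != b"BOOT":
        raise ValueError("not a recognized bootable target device")
    partitions = []
    for i in range(min(count, MAX_PARTITIONS)):
        # Each entry: starting block and length in blocks (a partition 130).
        start, length = struct.unpack_from("<QQ", block, 8 + i * 16)
        partitions.append({"start": start, "length": length})
    return partitions

# Example: a device image whose first block advertises two partitions.
header = struct.pack("<4sI", b"BOOT", 2)
header += struct.pack("<QQ", 64, 2048)      # e.g., partition 130A
header += struct.pack("<QQ", 2112, 8192)    # e.g., partition 130B
image = header.ljust(BLOCK_SIZE, b"\x00")
print(parse_boot_metadata(image))
```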

If host 101 is a newly provisioned host without an installed operating system, or if host 101 is being reinstalled or upgraded with a new version of its operating system, the boot process may be followed by installation of desired portions of the operating system. For example, the boot process may end with a prompt being displayed to the user or administrator, allowing the user to specify a device from which operating system modules may be installed, and to select from among optional operating system modules. In some environments, the installation of operating system components on a newly provisioned host may be automated—e.g., one or more scripts run during (or at the end of) the boot process may initiate installation of desired operating system components from a specified device.

As noted above, computer hosts 101 have traditionally been configured to boot off a local disk (i.e., a disk attached to the host) or local removable media. For example, hosts configured to use a UNIX™-based operating system may be configured to boot off a “root” file system on a local disk, while hosts configured with a version of the Windows™ operating system from Microsoft Corporation may be configured to boot off a “system partition” on a local disk. However, in some storage environments it may be possible to configure a host 101 to boot off a virtual bootable target device, that is, a device that has been aggregated from one or more backing physical storage devices by a virtualizer or virtualization coordinator, where the backing physical storage may be accessible via a network instead of being locally accessible at the host 101. The file systems and/or partitions expected by the operating system at the host may be emulated as being resident in the virtual bootable target device. FIG. 2 is a block diagram illustrating one embodiment of a system 200, where an off-host virtualizer 210 is configured to present one or more logical volumes 240 as a bootable target device 250 for use by host 101 during a boot operation. Off-host virtualizer 210 may be coupled to host 101 and to one or more physical storage devices 220 (i.e., physical storage devices 220A-220N) over a network 260. As described below in further detail, network 260 may be implemented using a variety of physical interconnects and protocols, and in some embodiments may include a plurality of independently configured networks, such as fibre channel fabrics and/or IP-based networks.

In general, virtualization refers to a process of creating or aggregating logical or virtual devices out of one or more underlying physical or logical devices, and making the virtual devices accessible to device consumers for storage operations. The entity or entities that perform the desired virtualization may be termed virtualizers. Virtualizers may be incorporated within hosts (e.g., in one or more software layers within host 101) or at external devices such as one or more virtualization switches, virtualization appliances, etc., which may be termed off-host virtualizers. In FIG. 2, for example, off-host virtualizer 210 may be configured to aggregate storage from physical storage devices 220 (i.e., physical storage devices 220A-220N) into logical volumes 240 (i.e., logical volumes 240A-240M). In the illustrated embodiment, each physical storage device 220 and logical volume 240 may be configured as a block device, i.e., a device that provides a collection of linearly addressed data blocks that can be read or written. In such an embodiment, off-host virtualizer 210 may be said to perform block virtualization. A variety of advanced storage functions may be supported by a block virtualizer such as off-host virtualizer 210 in different embodiments, such as the ability to create snapshots or point-in-time copies, replicas, and the like. In one embodiment of block virtualization, one or more layers of software may rearrange blocks from one or more physical block devices, such as disks. The resulting rearranged collection of blocks may then be presented to a storage consumer, such as an application or a file system at host 101, as one or more aggregated devices with the appearance of one or more basic disk drives. That is, the more complex structure resulting from rearranging blocks and adding functionality may be presented as if it were one or more simple arrays of blocks, or logical block devices. In some embodiments, multiple layers of virtualization may be implemented. That is, one or more block devices may be mapped into a particular virtualized block device, which may be in turn mapped into still another virtualized block device, allowing complex storage functions to be implemented with simple block devices. Further details on block virtualization, and advanced storage features supported by block virtualization, are provided below.

In addition to aggregating storage into logical volumes, off-host virtualizer 210 may also be configured to emulate storage within one or more logical volumes 240 as a bootable target device 250. That is, off-host virtualizer 210 may be configured to generate operating system-specific boot metadata 122 to make a range of storage within the one or more logical volumes 240 appear as a bootable partition (e.g., a partition 130) and/or file system to host 101. The generation and presentation of operating system specific metadata, such as boot metadata 122, for the purpose of making a logical volume appear as an addressable storage device (e.g., a LUN) to a host may be termed “volume tunneling”. The virtual addressable storage device presented to the host using such a technique may be termed a “virtual LUN”. Volume tunneling may be employed for other purposes in addition to the emulation of bootable target devices, e.g., to support dynamic mappings of logical volumes to virtual LUNs, to provide an isolating layer between front-end virtual LUNs and back-end or physical LUNs, etc.

FIG. 3a is a block diagram illustrating the mapping of blocks within a logical volume to a virtual LUN according to one embodiment. In the illustrated embodiment, a source logical volume 305 comprising N blocks of data (numbered from 0 through (N−1)) may be encapsulated or tunneled through a virtual LUN 310 comprising (N+H) blocks. Off-host virtualizer 210 may be configured to logically insert operating system specific boot metadata in a header 315 comprising the first H blocks of the virtual LUN 310, and the remaining N blocks of virtual LUN 310 may map to the N blocks of source logical volume 305. A host 101 may be configured to boot off virtual LUN 310, for example by setting the boot target device for the host to the identifier of the virtual LUN 310. Metadata contained in header 315 may be set up to match the format and content expected by boot code 114 at a LUN header of a bootable device for a desired operating system, and the contents of logical volume 305 may include, for example, the contents expected by boot code 114 in one or more partitions 130. In some embodiments, the metadata and/or the contents of the logical volume may be customized for the particular host being booted: for example, some of the file system contents or scripts accessed by the host 101 during various boot phases may be modified to support requirements specific to the particular host 101. Examples of such customization may include configuration parameters for hardware devices at the host (e.g., if a particular host employs multiple Ethernet network cards, some of the networking-related scripts may be modified), customized file systems, or customized file system sizes. In general, the generated metadata required for volume tunneling may be located at a variety of different offsets within the logical volume address space, such as within a header 315, a trailer, at some other designated offset within the virtual LUN 310, or at a combination of locations within the virtual LUN 310. The number of data blocks dedicated to operating system specific metadata (e.g., the length of header 315), as well as the format and content of the metadata, may vary with the operating system in use at host 101.
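The block mapping of FIG. 3a, H metadata blocks followed by N blocks that pass through to the source volume, might be modeled as in the following sketch (the class and method names are illustrative and not part of the disclosed embodiments):

```python
class VirtualLUN:
    """Encapsulates a source logical volume of N blocks behind a virtual
    LUN of N + H blocks, where the first H blocks hold emulated operating
    system-specific boot metadata (header 315 in FIG. 3a)."""

    def __init__(self, metadata_blocks, source_volume):
        self.metadata_blocks = metadata_blocks    # list of H header blocks
        self.source_volume = source_volume        # list of N data blocks

    def read_block(self, lun_block_number):
        h = len(self.metadata_blocks)
        if lun_block_number < h:
            # Header range: serve generated metadata (which could also be
            # produced on the fly, as described in the text).
            return self.metadata_blocks[lun_block_number]
        # Remaining blocks map one-to-one onto the source volume.
        return self.source_volume[lun_block_number - h]

# Example: H = 2 metadata blocks in front of a 4-block source volume.
lun = VirtualLUN([b"meta0", b"meta1"], [b"v0", b"v1", b"v2", b"v3"])
assert lun.read_block(1) == b"meta1"
assert lun.read_block(2) == b"v0"      # LUN block H maps to volume block 0
```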

The metadata inserted within virtual LUN 310 may be stored in persistent storage, e.g., within some blocks of a physical storage device 220 or at off-host virtualizer 210, in some embodiments, and logically concatenated with the mapped blocks 320. In other embodiments, the metadata may be generated on the fly, whenever a host 101 accesses the virtual LUN 310. In some embodiments, the metadata may be generated by an external agent other than off-host virtualizer 210. The external agent may be capable of emulating metadata in a variety of formats for different operating systems, including operating systems that may not have been known when the off-host virtualizer 210 was deployed. In one embodiment, off-host virtualizer 210 may be configured to support more than one operating system; i.e., off-host virtualizer 210 may logically insert metadata blocks corresponding to any one of a number of different operating systems when presenting virtual LUN 310 to a host 101, thereby allowing hosts intended to use different operating systems to share virtual LUN 310. In some embodiments, a plurality of virtual LUNs emulating bootable target devices, each corresponding to a different operating system, may be set up in advance, and off-host virtualizer 210 may be configured to select a particular virtual LUN for presentation to a host for booting. In large data centers, a set of relatively inexpensive servers (which may be termed “boot servers”) may be designated to serve as a pool of off-host virtualizers dedicated to providing emulated bootable target devices for use as needed throughout the data center. Whenever a newly provisioned host in the data center needs to be booted and/or installed, a bootable target device presented by one of the boot servers may be used, thus supporting consistent configurations at the hosts of the data center as the data center grows.

For some operating systems, off-host virtualizer 210 may emulate a number of different boot-related volumes using a plurality of partitions within the virtual LUN 310. FIG. 3b is a block diagram illustrating an exemplary virtual LUN 310 according to one embodiment, where the virtual LUN includes three emulated partitions 341A-341C. An off-host virtualizer 210 (not shown in FIG. 3b) may be configured to present virtual LUN 310 to a host bus adapter 330 and/or disk driver 325 at host 101. Each partition 341 may be mapped to a respective volume 345 that may be accessed during boot and/or operating system installation. In the depicted example, three partitions are shown corresponding to volumes 345A-345C, used respectively for a “/” (root) file system, a “/usr” file system and a “/swap” file system, each of which may be accessed by a host 101 employing a UNIX-based operating system. In such embodiments, where multiple volumes and/or file systems are emulated within the same virtual LUN, additional operating system specific metadata identifying the address ranges within the virtual LUN where the corresponding partitions are located may be provided by off-host virtualizer 210 to host 101. In the example depicted in FIG. 3b, the address ranges for partitions 341A-341C are provided in a virtual table of contents (VTOC) structure 340. The additional metadata may be included with boot metadata 122 in some embodiments. In other embodiments, the additional metadata may be provided at some other location within the address space of the virtual LUN, or provided to the host 101 using another mechanism, such as extended SCSI mode pages or messages sent over a network from off-host virtualizer 210 to host 101. In some embodiments, the additional metadata may also be customized to suit the specific requirements of a particular host 101; e.g., not all hosts may require the same modules of an operating system to be installed and/or upgraded.
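A VTOC-style structure of the kind shown in FIG. 3b essentially records, for each emulated partition, the address range it occupies within the virtual LUN. A minimal sketch follows, assuming illustrative partition names, offsets, and sizes:

```python
# Hypothetical VTOC-like table: partition name -> (start block, length)
# within the virtual LUN, each range backed by a separate emulated volume.
vtoc = {
    "root": (16, 4096),      # partition 341A -> volume 345A ("/")
    "usr":  (4112, 8192),    # partition 341B -> volume 345B ("/usr")
    "swap": (12304, 2048),   # partition 341C -> volume 345C ("/swap")
}

def locate(lun_block):
    """Map a virtual-LUN block number to (partition, offset in partition)."""
    for name, (start, length) in vtoc.items():
        if start <= lun_block < start + length:
            return name, lun_block - start
    return None, None        # falls in header/VTOC space or a gap

print(locate(20))    # ('root', 4)
print(locate(5000))  # ('usr', 888)
```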

As noted above and illustrated in FIG. 3b, in some embodiments, off-host virtualizer 210 may be configured to present an emulated logical volume 240 as an installable partition or volume to host 101 (i.e., a partition or volume in which at least a portion of an operating system may be installed). The host 101 may be configured to boot installation software (e.g., off removable media such as a CD provided by the operating system vendor), and then install desired portions of the operating system onto the installable partition or volume. After the desired installation is completed, in some embodiments the host 101 may be configured to boot from the LUN containing the encapsulated volume.

FIG. 4 is a flow diagram illustrating aspects of the operation of a system (such as system 200) supporting off-host virtualization and emulation of a bootable target device, according to one embodiment. Off-host virtualizer 210 may be configured to aggregate storage within physical storage devices 220 into one or more logical volumes 240 (block 405 of FIG. 4). The logical volumes 240 may be configured to implement a number of different virtualization functions, such as snapshots or replication. Off-host virtualizer 210 may then emulate the logical volumes as a bootable target device 250 (block 415), for example by logically inserting operating system-specific boot metadata into a header 315 of a virtual LUN 310, as described above. In some embodiments, as noted above, a subset of the blocks of the logical volumes and/or the metadata may be modified to provide data specific to the host being booted (e.g., a customized boot process may be supported). The emulated bootable target device may be made accessible to a host 101 (block 425), e.g., by setting the host's target bootable device address to the address of the virtual LUN 310. The host 101 may then boot off the emulated bootable target device (block 435), for example, off a file system or partition resident in the logical volume (such as a “root” file system in the case of hosts employing UNIX-based operating systems, or a “system partition” in the case of Windows operating systems). That is, the virtualizer may emulate the particular file system or partition expected for booting by the host as being resident in the logical volume in such embodiments.
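The ordering of blocks 405 through 435 of FIG. 4 can be summarized in the following sketch; the in-memory lists and function names are purely illustrative stand-ins for the aggregation, emulation, and presentation steps.

```python
def aggregate_storage(physical_devices):
    """Block 405: aggregate storage within physical devices into a logical
    volume (illustrated here as simple concatenation, i.e., spanning)."""
    return [block for device in physical_devices for block in device]

def emulate_bootable_target(logical_volume, os_name):
    """Block 415: logically prepend operating system-specific boot metadata
    so the volume appears as a bootable target device (a virtual LUN)."""
    header = [f"{os_name}-boot-metadata-{i}" for i in range(2)]
    return header + logical_volume

def make_accessible_and_boot(host, virtual_lun):
    """Blocks 425/435: point the host's boot target at the virtual LUN;
    the host then boots off a file system resident in the volume."""
    host["boot_target"] = virtual_lun
    return f"{host['name']} booting from {len(virtual_lun)}-block target"

devices = [["a0", "a1"], ["b0", "b1", "b2"]]       # physical devices 220
volume = aggregate_storage(devices)                 # logical volume 240
lun = emulate_bootable_target(volume, "unix")       # bootable target 250
print(make_accessible_and_boot({"name": "host101"}, lun))
```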

As noted earlier, the boot process at host 101 may include several phases. During each successive phase, additional modules of the host's operating system and/or additional software modules may be activated, and various system processes and services may be started. During one such phase, in some embodiments a virtualization driver or volume manager capable of recognizing and interacting with logical volumes may be activated at host 101. In such embodiments, after the virtualization driver or volume manager is activated, it may be possible for the host to switch to direct interaction with the logical volumes 240 (block 455 of FIG. 4), e.g., over network 260, instead of performing I/O to the logical volumes through the off-host virtualizer 210. Direct interaction with the logical volumes 240 may support higher levels of performance than indirect interaction via off-host virtualizer 210, especially in embodiments where off-host virtualizer 210 has limited processing capabilities. In order to facilitate a transition to direct access, off-host virtualizer 210 or some other volume configuration server may be configured to provide configuration information (such as volume layouts) related to the logical volumes 240 to the virtualization driver or volume manager. Once the transition to direct access occurs, the emulated bootable target device 250 and the off-host virtualizer 210 may no longer be used by host 101 until the next time host 101 is rebooted. During the next reboot, host 101 may switch back to accessing logical volumes 240 via the emulated bootable target device 250. In later boot phases, when the virtualization driver or volume manager is activated, direct access to the logical volumes may be resumed. Such an ability to transition to direct access to logical volumes 240 may allow off-host virtualizers 210 to be implemented using relatively low-end processors, since off-host virtualizers may be utilized heavily only during boot-related operations in system 200, and boot-related operations may be rare relative to production application processing operations.
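The switch from tunneled I/O to direct volume access described above might be structured as in the following sketch; the configuration-server interface (get_volume_layout) and the layout representation are assumptions made for illustration only.

```python
class StubDevice:
    """Stand-in for directly addressable physical storage (devices 220)."""
    def read(self, offset):
        return f"direct:{offset}"

class StubVirtualizer:
    """Stand-in for the off-host virtualizer's tunneled (virtual LUN) path."""
    def read_tunneled(self, block):
        return f"tunneled:{block}"

class StubConfigServer:
    """Stand-in for a volume configuration server providing volume layouts."""
    def get_volume_layout(self):
        device = StubDevice()
        return {i: (device, i) for i in range(8)}

class HostVolumeClient:
    """Host-side I/O path that starts out tunneling through the off-host
    virtualizer and switches to direct access once a volume manager or
    virtualization driver has obtained the volume layout."""
    def __init__(self, virtualizer):
        self.virtualizer = virtualizer
        self.layout = None                    # unknown during early boot

    def read(self, volume_block):
        if self.layout is None:
            return self.virtualizer.read_tunneled(volume_block)
        device, offset = self.layout[volume_block]
        return device.read(offset)

    def activate_volume_manager(self, config_server):
        # Later boot phase: fetch volume layouts and switch to direct access.
        self.layout = config_server.get_volume_layout()

client = HostVolumeClient(StubVirtualizer())
print(client.read(3))                          # tunneled:3 (early boot)
client.activate_volume_manager(StubConfigServer())
print(client.read(3))                          # direct:3 (after the switch)
```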

As noted previously, a number of different virtualization functions may be implemented at a logical volume 240 by off-host virtualizer 210 in different embodiments. In one embodiment, a logical volume 240 may be aggregated from storage on multiple physical storage devices 220, e.g., by striping successive blocks of data across multiple physical storage devices, by spanning multiple physical storage devices (i.e., concatenating physical storage from multiple physical storage devices into the logical volume), or by mirroring data blocks at two or more physical storage devices. In another embodiment, a logical volume 240 that is used by off-host virtualizer 210 to emulate a bootable target device 250 may be a replicated volume. For example, the logical volume 240 may be a replica or copy of a source logical volume that may be maintained at a remote data center. Such a technique of replicating bootable volumes may be useful for a variety of purposes, such as to support off-site backup or to support consistency of booting and/or installation in distributed enterprises where hosts at a number of different geographical locations may be required to be set up with similar configurations. In some embodiments, a logical volume 240 may be a snapshot volume, such as an instant snapshot or a space-efficient snapshot, i.e., a point-in-time copy of some source logical volume. Using snapshot volumes to boot and/or install systems may support the ability to revert a host back to any desired previous configuration from among a set of configurations for which snapshots have been created. Support for automatic roll back (e.g., to a desired point in time) on boot may also be implemented in some embodiments. In one embodiment, a logical volume 240 used to emulate a bootable target device may be configured as a virtual RAID (“Redundant Array of Independent Disks”) device or RAID volume, where parity based redundancy computations are implemented to provide high availability. Physical storage from a plurality of storage servers may be aggregated to form the RAID volume, and the redundancy computations may be implemented via a software protocol. A bootable target device emulated from a RAID volume may be recoverable in the event of a failure at one of its backing storage servers, thus enhancing the availability of boot functionality supported by the off-host virtualizer 210. A number of different RAID levels (e.g., RAID-3, RAID-4, or RAID-5) may be implemented in the RAID volume.
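As a worked example of the parity-based redundancy mentioned for RAID volumes, the following sketch computes a parity block as the bytewise XOR of the data blocks in a stripe and uses it to reconstruct a block lost with a failed backing device (a RAID-3/4/5 style computation; stripe sizes and layout policy are not specified here and are illustrative):

```python
def xor_parity(blocks):
    """Parity block: bytewise XOR of all data blocks in a stripe."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def reconstruct(surviving_blocks, parity):
    """Recover a single missing data block by XORing the surviving data
    blocks of the stripe with the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # data on three devices
parity = xor_parity(stripe)
# Suppose the device holding stripe[1] fails:
assert reconstruct([stripe[0], stripe[2]], parity) == stripe[1]
```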

In some embodiments, a logical volume 240 may include multiple layers of virtual storage devices. FIG. 5 is a block diagram illustrating a logical volume 240 comprising a multi-layer hierarchy of virtual block devices according to one embodiment. In the illustrated embodiment, logical volume 240 includes logical block devices 504 and 506. In turn, logical block device 504 includes logical block devices 508 and 510, while logical block device 506 includes logical block device 512. Logical block devices 508, 510, and 512 map to physical block devices 220A-C of FIG. 2, respectively.

After host 101 has booted, logical volume 240 may be configured to be mounted within a file system or presented to an application or other volume consumer. Each block device within logical volume 240 that maps to or includes another block device may include an interface whereby the mapping or including block device may interact with the mapped or included device. For example, this interface may be a software interface whereby data and commands for block read and write operations are propagated from lower levels of the virtualization hierarchy to higher levels and vice versa.

Additionally, a given block device may be configured to map the logical block spaces of subordinate block devices into its logical block space in various ways in order to realize a particular virtualization function. For example, in one embodiment, logical volume 240 may be configured as a mirrored volume, in which a given data block written to logical volume 240 is duplicated, and each of the multiple copies of the duplicated data block is stored in a respective block device. In one such embodiment, logical volume 240 may be configured to receive an operation to write a data block from a consumer, such as an application running on host 101. Logical volume 240 may duplicate the write operation and issue the write operation to both logical block devices 504 and 506, such that the block is written to both devices. In this context, logical block devices 504 and 506 may be referred to as mirror devices. In various embodiments, logical volume 240 may read a given data block stored in duplicate in logical block devices 504 and 506 by issuing a read operation to one mirror device or the other, for example by alternating devices or defaulting to a particular device. Alternatively, logical volume 240 may issue a read operation to multiple mirror devices and accept results from the fastest responder.

In some embodiments, it may be the case that underlying physical block devices 220A-C have dissimilar performance characteristics; specifically, devices 220A-B may be slower than device 220C. In order to balance the performance of the mirror devices, in one embodiment, logical block device 504 may be implemented as a striped device in which data is distributed between logical block devices 508 and 510. For example, even- and odd-numbered blocks of logical block device 504 may be mapped to logical block devices 508 and 510 respectively, each of which may be configured to map in turn to all or some portion of physical block devices 220A-B respectively. In such an embodiment, block read/write throughput may be increased over a non-striped configuration, as logical block device 504 may be able to read or write two blocks concurrently instead of one. Numerous striping arrangements involving various distributions of blocks to logical block devices are possible and contemplated; such arrangements may be chosen to optimize for various data usage patterns such as predominantly sequential or random usage patterns. In another aspect illustrating multiple layers of block virtualization, in one embodiment physical block device 220C may employ a different block size than logical block device 506. In such an embodiment, logical block device 512 may be configured to translate between the two block sizes and to map the logical block space defined by logical block device 506 to the physical block space defined by physical block device 220C.
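The hierarchy of FIG. 5 can be pictured as composable block-device objects: a mirror whose first plex is striped across two devices. The sketch below uses the two-way even/odd striping described above and omits the block-size translation performed by logical block device 512; all class names are illustrative.

```python
class BlockDevice:
    """A flat array of fixed-size blocks (stands in for devices 220A-C)."""
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks
    def write(self, i, data):
        self.blocks[i] = data
    def read(self, i):
        return self.blocks[i]

class Striped:
    """Logical block device 504: even blocks on one child, odd on the other."""
    def __init__(self, even, odd):
        self.children = (even, odd)
    def write(self, i, data):
        self.children[i % 2].write(i // 2, data)
    def read(self, i):
        return self.children[i % 2].read(i // 2)

class Mirrored:
    """Logical volume 240: every write is duplicated to both mirror devices;
    reads may be served from either copy."""
    def __init__(self, plex_a, plex_b):
        self.plexes = (plex_a, plex_b)
    def write(self, i, data):
        for plex in self.plexes:
            plex.write(i, data)
    def read(self, i):
        return self.plexes[0].read(i)     # e.g., default to the first plex

# Devices 508/510 backed by 220A/220B; the second plex stands in for 506.
striped = Striped(BlockDevice(8), BlockDevice(8))
volume = Mirrored(striped, BlockDevice(16))
volume.write(5, b"hello")
assert volume.read(5) == b"hello"
```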

The technique of volume tunneling to emulate a bootable target device may be implemented using a variety of different storage and network configurations in different embodiments. FIG. 6 is a block diagram illustrating an embodiment where the physical storage devices include fibre channel LUNs 610 accessible through a fibre channel fabric 620, and off-host virtualizer 210 includes a virtualizing switch. A “fibre channel LUN”, as used herein, may be defined as a unit of storage addressable using a fibre channel address. For example, a fibre channel address for storage accessible via a fibre channel fabric may consist of a fabric identifier, a port identifier, and a logical unit identifier. The virtual LUN presented by off-host virtualizer 210 to host 101 as a bootable target device 250 in such an embodiment may be a virtual fibre channel LUN. Fibre channel fabric 620 may include additional switches in some embodiments, and host 101 may be coupled to more than one switch. Some of the additional switches may also be configured to provide virtualization functions. That is, in some embodiments off-host virtualizer 210 may include a plurality of cooperating virtualizing switches. In one embodiment, multiple independently-configurable fibre channel fabrics may be employed: e.g., a first set of fibre channel LUNs 610 may be accessible through a first fabric, and a second set of fibre channel LUNs 610 may be accessible through a second fabric.

FIG. 7 is a block diagram illustrating one embodiment where the Internet SCSI (iSCSI) protocol is used to access the physical storage devices. iSCSI is a protocol used by storage initiators (such as hosts 101 and/or off-host virtualizers 210) to send SCSI storage commands to storage targets (such as disks or tape devices) over an IP (Internet Protocol) network. The physical storage devices accessible in an iSCSI-based storage network may be addressable as iSCSI LUNs, just as SCSI devices locally attached to a host may be addressable as SCSI LUNs, and physical storage devices attached via fibre channel fabrics may be addressable as fibre channel LUNs. In one embodiment, for example, an iSCSI address may include an IP address or iSCSI qualified name (iqn), a target device identifier, and a logical unit number. As shown in FIG. 7, one or more iSCSI LUNs 710 may be attached directly to the off-host virtualizer 210. For example, in one embodiment, the off-host virtualizer 210 may itself be a computer system, comprising its own processor, memory and physical storage devices (e.g., iSCSI LUN 710A). The remaining iSCSI LUNs 710B-710N may be accessible through other hosts or through iSCSI servers. In some embodiments, all the physical storage devices may be attached directly to the off-host virtualizer 210 and may be accessible via iSCSI. In general, a host 101 may require an iSCSI-enabled network adapter to participate in the iSCSI protocol. In some embodiments where the physical storage devices include iSCSI LUNs, a network boot protocol similar to BOOTP (a protocol that is typically used to allow diskless hosts to boot using boot code provided by a boot server) may be used to support a first phase boot of a host 101 that does not have an iSCSI-enabled adapter. Additional boot code loaded during the first phase may allow the host to mount a file system over iSCSI, and/or to perform further boot phases, despite the absence of an iSCSI-enabled network card. That is, software provided to the host 101 during an early boot phase (e.g., by off-host virtualizer 210) may be used later in the boot process to emulate iSCSI transactions without utilizing an iSCSI-enabled network adapter at the host.
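The two addressing schemes just described, a fibre channel address (fabric identifier, port identifier, logical unit identifier) and an iSCSI address (IP address or iSCSI qualified name, target, logical unit number), can be represented as simple structures. The field names and example values below are illustrative rather than actual wire formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FibreChannelLUN:
    """Storage addressable through a fibre channel fabric (FIG. 6)."""
    fabric_id: str
    port_id: str
    logical_unit: int

@dataclass(frozen=True)
class ISCSILUN:
    """Storage addressable over an IP network via iSCSI (FIG. 7)."""
    target_name: str       # IP address or iSCSI qualified name (iqn)
    target_device: str
    logical_unit: int

boot_target = ISCSILUN("iqn.2005-06.com.example:boot", "target0", 0)
data_volume = FibreChannelLUN("fabric-a", "port-3", 7)
print(boot_target, data_volume, sep="\n")
```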

In some embodiments, host 101 may be configured to boot from an emulated volume using a first network type such as iSCSI, and to then switch to directly accessing the volume using a second network type such as fibre channel. iSCSI-based booting may be less expensive and/or easier to configure than fibre-channel based booting in some embodiments. An off-host virtualizer 210 that uses iSCSI (such as an iSCSI boot appliance) and at the same time accesses fibre-channel based storage devices may allow such a transition between the network type that is used for booting and the network type that is used for subsequent I/O (e.g., for I/Os requested by production applications).

In one embodiment, illustrated in FIG. 8, physical storage devices 220 may be accessible via storage servers (e.g., 850A and 850B) configured to communicate with off-host virtualizer 210 and host 101 using an advanced storage protocol. The advanced storage protocol may support features, such as access security and tagged directives for distributed I/O operations, that may not be adequately supported by the traditional storage protocols (such as SCSI or iSCSI) alone. In such an embodiment, a storage server 850 may translate data access requests from the advanced storage protocol to a lower level protocol or interface (such as SCSI) that may be presented by the physical storage devices 220 managed at the storage server. While the advanced storage protocol may provide enhanced functionality, it may still allow block-level access to physical storage devices 220. Storage servers 850 may be any device capable of supporting the advanced storage protocol, such as a computer host with one or more processors and one or more memories.
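The translation performed by a storage server 850 might look like the sketch below, which accepts a request in a hypothetical advanced-protocol form, applies a simple access-security check, and issues SCSI-style block reads and writes. The request fields and method names are assumptions for illustration, since the text does not specify the protocol.

```python
class PhysicalDevice:
    """Stand-in for a SCSI-level block device managed by the storage server."""
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
    def scsi_read(self, lba):
        return self.blocks[lba]
    def scsi_write(self, lba, data):
        self.blocks[lba] = data

class StorageServer:
    """Translates advanced-protocol requests into SCSI-style block I/O
    while still exposing block-level access (storage server 850)."""
    def __init__(self, device, allowed_hosts):
        self.device = device
        self.allowed_hosts = allowed_hosts       # simple access-security check

    def handle(self, request):
        if request["host"] not in self.allowed_hosts:
            raise PermissionError("host not authorized for this device")
        if request["op"] == "read":
            return self.device.scsi_read(request["lba"])
        if request["op"] == "write":
            self.device.scsi_write(request["lba"], request["data"])
            return None
        raise ValueError("unsupported operation")

server = StorageServer(PhysicalDevice(64), allowed_hosts={"host101"})
server.handle({"host": "host101", "op": "write", "lba": 3, "data": b"x"})
print(server.handle({"host": "host101", "op": "read", "lba": 3}))   # b'x'
```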

FIG. 9 is a block diagram illustrating an embodiment where some physical storage devices 220 may be accessible via a target-mode host bus adapter 902. A host bus adapter (HBA) is a hardware device that acts as an interface between a host 101 and an I/O interconnect, such as a SCSI bus or fibre channel link. Typically, an HBA is configured as an “initiator”, i.e., a device that initiates storage operations on the I/O interconnect, and receives responses from other devices (termed “targets”) such as disks, disk array devices, or tape devices, coupled to the I/O interconnect. However, some host bus adapters may be configurable (e.g., by modifying the firmware on the HBA) to operate as targets rather than initiators, i.e., to receive commands such as iSCSI commands sent by initiators requesting storage operations. Such host bus adapters may be termed “target-mode” host bus adapters, and may be incorporated within off-host virtualizers 210 as shown in FIG. 9 in some embodiments. The I/O operations corresponding to the received commands may be performed at the physical storage devices, and the response returned to the requesting initiator. In some embodiments, all the physical storage devices 220 used to back logical volumes 240 may be accessible via target-mode host bus adapters.

As noted above, an off-host virtualizer 210 may comprise a number of different types of hardware and software entities in different embodiments. In some embodiments, an off-host virtualizer 210 may itself be a host with its own processor, memory, peripheral devices and I/O devices, running an operating system and a software stack capable of providing the block virtualization features described above. In other embodiments, the off-host virtualizer 210 may include one or more virtualization switches and/or virtualization appliances. A virtualization switch may be an intelligent fibre channel switch, configured with sufficient processing capacity to perform desired virtualization operations in addition to supporting fibre channel connectivity. A virtualization appliance may be an intelligent device programmed to perform virtualization functions, such as providing mirroring, striping, snapshot capabilities, etc. Appliances may differ from general purpose computers in that their software is normally customized for the function they perform, pre-loaded by the vendor, and not alterable by the user. In some embodiments, multiple devices or systems may cooperate to provide off-host virtualization; e.g., multiple cooperating virtualization switches may form a single off-host virtualizer. In one embodiment, the aggregation of storage within physical storage devices 220 into logical volumes 240 may be performed by one off-host virtualizing device or host, while another off-host virtualizing device may be configured to emulate the logical volumes as bootable target devices and present the bootable target devices to host 101.

FIG. 10 is a block diagram illustrating a computer accessible medium 1000 including virtualization software 1010 configured to provide the functionality of off-host virtualizer 210 and host 101 described above. Virtualization software 1010 may be provided to a computer system using a variety of computer-accessible media including electronic media (e.g., flash memory), volatile or non-volatile memory media such as RAM (e.g., SDRAM, RDRAM, SRAM, etc.), magnetic media such as hard disks, optical storage media such as CD-ROM, etc., as well as transmission media or signals such as electrical, electromagnetic or digital signals, conveyed via a communication medium such as a network and/or a wireless link.

Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Claims

1. A system comprising:

a host;
one or more physical storage devices;
an off-host virtualizer;
wherein the off-host virtualizer is configured to:
aggregate storage within the one or more physical storage devices into a logical volume;
generate metadata to emulate the logical volume as a bootable target device; and
make the metadata accessible to the host; and
wherein the host is configured to use the metadata to boot off a file system residing in the logical volume.

2. The system as recited in claim 1, wherein the logical volume is a snapshot volume.

3. The system as recited in claim 1, wherein the logical volume is a replicated volume.

4. The system as recited in claim 1, wherein the logical volume is a striped volume.

5. The system as recited in claim 1, wherein the one or more physical storage devices include a first and a second physical storage device, and wherein the logical volume spans the first and the second physical storage devices.

6. The system as recited in claim 1, wherein the logical volume is a RAID volume.

7. The system as recited in claim 1, wherein the logical volume maps to a boot partition of a designated operating system.

8. The system as recited in claim 7, wherein the designated operating system is configured to access a plurality of additional boot-related partitions during a boot operation, and wherein the off-host virtualizer is further configured to:

generate additional metadata to emulate the logical volume as the plurality of additional boot-related partitions; and
make the additional metadata accessible to the host.

9. The system as recited in claim 1, wherein, subsequent to an initial phase of a boot process, the host is configured to access the logical volume directly without performing I/O through the off-host virtualizer.

10. The system as recited in claim 9, wherein the host is configured to use a first network type for the initial phase of the boot process, and wherein the host is configured to access the logical volume directly using a second network type.

11. The system as recited in claim 1, wherein a physical storage device of the one or more physical storage devices includes a fiber channel logical unit (LUN).

12. The system as recited in claim 1, wherein a physical storage device of the one or more physical storage devices includes an iSCSI LUN.

13. The system as recited in claim 1, further comprising a storage server, wherein the storage server is configured to provide access to a physical storage device of the one or more physical storage devices.

14. The system as recited in claim 1, wherein a physical storage device of the one or more physical storage devices is accessed using a target-mode host bus adapter of the off-host virtualizer.

15. The system as recited in claim 1, wherein the off-host virtualizer is further configured to:

present the logical volume to the host as an installable partition;
and wherein the host is further configured to:
boot installation software for the operating system from removable media; and
install at least a portion of the operating system on the installable partition.

16. A method comprising:

aggregating storage within one or more physical storage devices into a logical volume;
generating metadata to emulate the logical volume as a bootable target device;
making the metadata accessible to a host; and
the host using the metadata to boot off a file system resident in the logical volume.

17. The method as recited in claim 16, wherein the logical volume is a snapshot volume.

18. The method as recited in claim 16, wherein the logical volume is a replicated volume.

19. The method as recited in claim 16, wherein a storage device of the one or more physical storage devices includes a fibre channel logical unit (LUN).

20. The method as recited in claim 16, wherein a storage device of the one or more physical storage devices includes an iSCSI (Internet SCSI) LUN.

21. The method as recited in claim 16, further comprising:

the host accessing the logical volume subsequent to the boot operation without performing I/O through the off-host virtualizer.

22. A computer accessible medium comprising program instructions, wherein the instructions are executable to:

aggregate storage within one or more physical storage devices into a logical volume;
generate metadata to emulate the logical volume as a bootable target device;
make the metadata accessible to a host; and
use the metadata to boot the host off a file system resident in the logical volume.

23. The computer accessible medium as recited in claim 22, wherein the logical volume is a snapshot volume.

24. The computer accessible medium as recited in claim 22, wherein the logical volume is a replicated volume.

25. The computer accessible medium as recited in claim 22, wherein a storage device of the one or more physical storage devices includes a fibre channel logical unit (LUN).

26. The computer accessible medium as recited in claim 22, wherein a storage device of the one or more physical storage devices includes an iSCSI (Internet SCSI) LUN.

27. The computer accessible medium as recited in claim 22, wherein the instructions are further executable to:

access the logical volume from the host subsequent to the boot operation without performing I/O through the off-host virtualizer.
Patent History
Publication number: 20050228950
Type: Application
Filed: Jun 20, 2005
Publication Date: Oct 13, 2005
Applicant:
Inventor: Ronald Karr (Palo Alto, CA)
Application Number: 11/156,636
Classifications
Current U.S. Class: 711/114.000; 711/170.000; 711/162.000