ADDING SINGLE DISKS TO AN ARRAY BY RELOCATING RAID MEMBERS
Protection group members from a cluster of W baseline size disks with RAID (D+P) protection groups associated with W partition indices, where W=D+P, are selected and relocated to a new baseline size disk using a W-by-W relocation sequence matrix. The same relocation sequence matrix is used to select and relocate protection group members from M clusters of baseline size disks to a new disk that has M times the storage capacity of each baseline size disk. A new cluster of multiple size disks is formed when W multiple size disks have been added, after which the W-by-W relocation sequence matrix is used to select and relocate protection group members from the new cluster to additional multiple size disks.
The subject matter of this disclosure is generally related to electronic data storage, and more particularly to single disk scaling of a storage system that implements RAID protection groups.
BACKGROUND
The disk drives in a typical mass data storage system are configured to be members of protection groups known as redundant arrays of independent disks (RAID). A RAID protection group helps to avoid data loss by enabling a failed protection group member to be rebuilt on a spare disk using the remaining non-failed members. A RAID (D+P) protection group has D data members and P parity members. The data members store data. The parity members store parity information such as XORs of the data values on the data members. The parity information enables reconstruction of the data in the event that a data member fails. Parity information can be reconstructed from the data on the data members in the event that a parity member fails. A variety of different RAID levels with different numbers, types, and configurations of members are known. A typical data storage system includes multiple RAID protection groups of the same level, with individual disks serving as protection group members.
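By way of illustration only, the XOR-based reconstruction described above can be sketched as follows for a single-parity (P=1) protection group. The helper name `xor_blocks` and the byte values are assumptions introduced for this example and do not appear in the disclosure.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# RAID-5 (4+1) style stripe: four data members and one parity member.
# The byte values are arbitrary illustrative data.
data_members = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(data_members)

# Simulate the loss of data member 2 and rebuild it from the
# surviving data members plus the parity member.
survivors = data_members[:2] + data_members[3:]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data_members[2]
```

The same function rebuilds a lost parity member when it is applied to the D data members alone, which mirrors the two failure cases described above.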
Most data storage systems enable storage capacity to be increased to accommodate a greater amount of data by adding new disks. The storage capacity of a data storage system that uses individual disks as RAID (D+P) protection group members is increased by adding W new disks, where W=(D+P). All the disks typically have the same storage capacity. For example, a storage system that implements RAID-5 (4+1) is scaled-up in increments of five new disks of the same size as the installed disks. Similarly, a RAID-5 (3+1) is scaled-up in increments of four new disks of the same size as the installed disks. However, as the storage capacity of individual disk drives increases because of technological advances, an increment of W new disks may be undesirably large and inefficient. Moreover, it may be desirable to add new disks that have greater capacity than the installed disks.
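The effect of growing disk capacity on the minimum scaling increment can be shown with simple arithmetic. The sketch below is illustrative only; the function name `scaling_increment` and the 2 TB and 16 TB disk sizes are assumptions chosen for the example, not values taken from the disclosure.

```python
def scaling_increment(d, p, disk_tb):
    """Return (W, raw capacity in TB added by one W-disk increment)."""
    w = d + p
    return w, w * disk_tb

# Compare the two RAID-5 examples above at two assumed disk sizes.
for d, p in [(4, 1), (3, 1)]:
    for disk_tb in (2, 16):  # assumed disk capacities in TB
        w, step_tb = scaling_increment(d, p, disk_tb)
        print(f"RAID-5 ({d}+{p}): W={w}, {disk_tb} TB disks, "
              f"minimum step {step_tb} TB raw")
```

Under these assumed sizes, the minimum step for RAID-5 (4+1) grows from 10 TB to 80 TB of raw capacity, which is the kind of increment that single disk scaling is intended to avoid.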
SUMMARY
In accordance with some aspects an apparatus comprises a storage array comprising: at least one compute node comprising at least one processor and non-transitory computer-readable memory; a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions; and a disk manager configured to: create a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and responsive to addition of a new baseline size disk, configure the new baseline size disk with W partitions and relocate selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
In accordance with some aspects a method is implemented by a storage array having at least one compute node with at least one processor and non-transitory computer-readable memory, and a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions, the method comprising: creating a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and responsive to addition of a new baseline size disk, configuring the new baseline size disk with W partitions and relocating selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
In accordance with some aspects, a computer-readable storage medium stores instructions that when executed by a storage array having at least one compute node with at least one processor and non-transitory computer-readable memory, and a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions, cause the storage array to add storage capacity, the method comprising: creating a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and responsive to addition of a new baseline size disk, configuring the new baseline size disk with W partitions and relocating selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
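The relocation of selected protection group members to a new baseline size disk can be sketched as follows. This is a minimal illustration under stated assumptions and not the disclosed implementation: this text does not give the contents of the W-by-W relocation sequence matrix, so the diagonal pattern below is a hypothetical matrix chosen only because it keeps every protection group on W distinct disks. The names `sequence`, `layout`, `add_disk`, and `disks_per_group` are likewise introduced for the example.

```python
W = 4  # e.g., RAID-5 (3+1); the value is illustrative

# HYPOTHETICAL relocation sequence matrix: entry [disk][partition] is the
# ordinal of the new-disk addition that relocates the member stored there.
sequence = [[(p - d) % W + 1 for p in range(W)] for d in range(W)]

# Initial cluster: protection group g occupies partition index g on every
# disk. layout[disk][partition] holds a protection group id.
layout = {d: list(range(W)) for d in range(W)}

def add_disk(layout, sequence, n, new_disk, new_group):
    """Relocate the members tagged with sequence number n to new_disk and
    create new_group in the partitions they vacate."""
    w = len(sequence)
    layout[new_disk] = [None] * w
    for d in range(w):
        for p in range(w):
            if sequence[d][p] == n:
                layout[new_disk][p] = layout[d][p]  # same partition index
                layout[d][p] = new_group            # vacated partition
    return layout

def disks_per_group(layout):
    """Map each protection group id to the disks that hold its members."""
    groups = {}
    for disk, partitions in layout.items():
        for group in partitions:
            groups.setdefault(group, []).append(disk)
    return groups

# First single-disk addition: the new disk is disk W, the new group is W.
add_disk(layout, sequence, n=1, new_disk=W, new_group=W)

# Every group must still have exactly W members on W distinct disks.
for group, disks in disks_per_group(layout).items():
    assert len(disks) == W and len(set(disks)) == W
```

With this assumed matrix, each addition moves one member from each of the W original disks, so the vacated partitions fall on W distinct disks and can hold a complete new RAID (D+P) protection group.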
All examples, aspects, implementations, and features mentioned in this disclosure can be combined in any technically possible way. Other aspects, features, and implementations may become apparent in view of the detailed description and figures.
The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “disk” and “drive” are used interchangeably to refer to non-volatile storage media and are not intended to refer to any specific type of non-volatile storage media. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, e.g., and without limitation, abstractions of tangible features. The term “physical” is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term “logic” is used to refer to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, and any combinations thereof. Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e., physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
Data associated with instances of the hosted applications running on the host servers 103 is maintained on the managed disks 101. The managed disks 101 are not discoverable by the host servers but the storage array creates a logical storage object known as a production volume 140 that can be discovered and accessed by the host servers. Without limitation, the storage object may be referred to as a source device, production device, or production LUN, where the logical unit number (LUN) is a number used to identify logical storage volumes in accordance with the small computer system interface (SCSI) protocol. From the perspective of the host servers 103, the production volume 140 is a single disk having a set of contiguous fixed-size logical block addresses (LBAs) on which data used by the instances of the host application resides. However, the host application data is stored at non-contiguous addresses on various managed disks 101. The compute nodes maintain metadata that maps between the production volume 140 and the managed disks 101 in order to process IOs from the hosts.
Two clusters of baseline size disks are required to support an initial addition of a double size disk. Similarly, three clusters of baseline size disks are required to support an initial addition of a triple size disk. Within those constraints, when a disk whose size is a multiple M of the baseline disk size is added, as determined in step 506, that disk is divided into M*W partitions that are the same size as the partitions of the baseline disk clusters. Step 518 is using the next sequence number in multiple conceptually overlaid relocation sequence matrices to select and relocate RAID members to the new disk. In the case of initial addition of the multiple size disks, the RAID members are relocated from M baseline disk clusters. In the case of addition of the multiple size disks after creation of a multiple size disk cluster, the RAID members are relocated from the multiple size disk cluster. Step 520 is creating M new protection groups in the partitions vacated as a result of the relocations. Step 522 is determining whether W new multiple size disks have been added. If fewer than W new multiple size disks have been added, then steps 504 through 522 are repeated until W new multiple size disks have been added. When W new multiple size disks have been added, a new cluster of multiple size disks is created, as indicated in step 524. Steps 516 through 524 are implemented separately for each multiple size, e.g., double size disks are not added to the same cluster as triple size disks.
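The initial addition of a multiple size disk can be sketched in the same spirit. The example below is illustrative only and covers just the case in which the RAID members are relocated from M baseline disk clusters. It assumes a hypothetical diagonal relocation sequence matrix, since this text does not give the matrix contents, and it models the "conceptually overlaid" matrices by applying the same matrix to each of the M clusters. The names `add_multiple_size_disk`, `clusters`, and `sequence` are introduced for the example.

```python
def add_multiple_size_disk(clusters, sequence, n, next_group):
    """Relocate members tagged with sequence number n from each of the M
    clusters to a new disk with M*W partitions, and create one new
    protection group per cluster in the vacated partitions.

    clusters: list of M layouts, each a list of W disks holding W group ids.
    Returns the new disk's partitions and the next unused group id.
    """
    w = len(sequence)
    m = len(clusters)
    new_disk = [None] * (m * w)  # M*W partitions of baseline partition size
    for k, cluster in enumerate(clusters):
        for d in range(w):
            for p in range(w):
                if sequence[d][p] == n:
                    # Cluster k maps onto partitions k*W .. k*W + W - 1.
                    new_disk[k * w + p] = cluster[d][p]
                    cluster[d][p] = next_group
        next_group += 1  # one new protection group per cluster
    return new_disk, next_group

W, M = 4, 2  # double size disk added to two baseline clusters (illustrative)

# HYPOTHETICAL relocation sequence matrix, as an assumption for this sketch.
sequence = [[(p - d) % W + 1 for p in range(W)] for d in range(W)]

# Cluster k initially holds groups k*W .. k*W + W - 1, one per partition index.
clusters = [[[k * W + p for p in range(W)] for _ in range(W)] for k in range(M)]

new_disk, next_group = add_multiple_size_disk(
    clusters, sequence, n=1, next_group=M * W)
assert len(new_disk) == M * W and None not in new_disk
```

In this sketch the double size disk receives W members from each of the two clusters, and the two sets of vacated partitions hold the M new protection groups described for step 520.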
Specific examples have been presented to provide context and convey inventive concepts. The specific examples are not to be considered as limiting. A wide variety of modifications may be made without departing from the scope of the inventive concepts described herein. Moreover, the features, aspects, and implementations described herein may be combined in any technically possible way. Accordingly, modifications and combinations are within the scope of the following claims.
Claims
1. An apparatus, comprising:
- a storage array comprising: at least one compute node comprising at least one processor and non-transitory computer-readable memory; a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions; and a disk manager configured to: create a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and responsive to addition of a new baseline size disk, configure the new baseline size disk with W partitions and relocate selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
2. The apparatus of claim 1 wherein the disk manager is configured to create a new RAID (D+P) protection group in partitions vacated by the selected members of the protection groups.
3. The apparatus of claim 1 wherein the disk manager is configured to create a second cluster of baseline disks using W new baseline size disks.
4. The apparatus of claim 1 wherein the disk manager is configured to use M clusters of baseline size disks to add a multiple size new disk having M times the baseline disk size.
5. The apparatus of claim 4 wherein the disk manager is configured to organize the multiple size new disk into M*W partitions.
6. The apparatus of claim 5 wherein the disk manager is configured to relocate selected members of the protection groups from the M clusters of baseline size disks to the new multiple size disk based on the W-by-W relocation sequence matrix.
7. The apparatus of claim 6 wherein the disk manager is configured to create a new cluster of multiple size disks using W new multiple size disks.
8. The apparatus of claim 7 wherein the disk manager is configured to use the new cluster of multiple size disks to add a subsequent multiple size new disk by relocating selected members of the protection groups from the new cluster of multiple size disks to the subsequent multiple size disk based on the W-by-W relocation sequence matrix.
9. A method implemented by a storage array having at least one compute node with at least one processor and non-transitory computer-readable memory, and a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions, the method comprising:
- creating a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and
- responsive to addition of a new baseline size disk, configuring the new baseline size disk with W partitions and relocating selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
10. The method of claim 9 comprising creating a new RAID (D+P) protection group in partitions vacated by the selected members of the protection groups.
11. The method of claim 9 comprising creating a second cluster of baseline disks using W new baseline size disks.
12. The method of claim 9 comprising using M clusters of baseline size disks to add a multiple size new disk having M times the baseline disk size.
13. The method of claim 12 comprising organizing the multiple size new disk into M*W partitions.
14. The method of claim 13 comprising relocating selected members of the protection groups from the M clusters of baseline size disks to the new multiple size disk based on the W-by-W relocation sequence matrix.
15. The method of claim 14 comprising creating a new cluster of multiple size disks using W new multiple size disks.
16. The method of claim 15 comprising using the new cluster of multiple size disks to add a subsequent multiple size new disk by relocating selected members of the protection groups from the new cluster of multiple size disks to the subsequent multiple size disk based on the W-by-W relocation sequence matrix.
17. A computer-readable storage medium stores instructions that when executed by a storage array having at least one compute node with at least one processor and non-transitory computer-readable memory, and a plurality of baseline size non-volatile disks that are accessed by the at least one compute node and used to store data that is accessed via the at least one compute node, each disk configured with W indexed partitions, each partition having a same fixed-size amount of storage capacity equal to storage capacity of other partitions, cause the storage array to add storage capacity, the method comprising:
- creating a cluster of W of the baseline size disks with RAID (D+P) protection groups associated with ones of the partition indices, where W=D+P; and
- responsive to addition of a new baseline size disk, configuring the new baseline size disk with W partitions and relocating selected members of the protection groups from the cluster of baseline disks to the new baseline size disk based on a W-by-W relocation sequence matrix.
18. The computer-readable storage medium of claim 17 wherein the method comprises using M clusters of baseline size disks to add a multiple size new disk having M times the baseline disk size.
19. The computer-readable storage medium of claim 18 wherein the method comprises relocating selected members of the protection groups from the M clusters of baseline size disks to the new multiple size disk based on the W-by-W relocation sequence matrix and creating a new cluster of multiple size disks using W new multiple size disks.
20. The computer-readable storage medium of claim 19 wherein the method comprises using the new cluster of multiple size disks to add a subsequent multiple size new disk by relocating selected members of the protection groups from the new cluster of multiple size disks to the subsequent multiple size disk based on the W-by-W relocation sequence matrix.
Type: Application
Filed: May 26, 2021
Publication Date: Dec 1, 2022
Applicant: EMC IP HOLDING COMPANY LLC (Hopkinton, MA)
Inventors: Kuolin Hua (Natick, MA), Kunxiu Gao (Boxborough, MA)
Application Number: 17/330,974