CREATING AND DISTRIBUTING SPARE CAPACITY OF A DISK ARRAY
A subset of drives with protection groups that have D data members and P parity members is created with (W+1) drives each having W partitions where W=(D+P). A single partition protection group is created in the lowest numbered partition of the W lowest numbered drives. Spares are created at drive X partition Y that satisfy X+Y=W+2. Members of additional protection groups with W members are symmetrically distributed on remaining partitions such that the protection group member at drive X partition index Y belongs to protection group N: if (X+Y)<(W+2), then N=(X+Y−2); and if (X+Y)>(W+2), then N=(X+Y−W−2). The spares are used to rebuild partitions in the event of drive failure. When a new drive is added, the first W protection group members in the lowest numbered unrotated partition are rotated onto the new drive. The single partition protection group is excluded from rotation. Partitions vacated by rotated protection group members and a rotated spare are used to create a new protection group. The drive subset is split after enough new drives have been added.
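The rotation step can be sketched under one possible reading, because the text does not spell out where each rotated member lands on the new drive. This sketch assumes that the cell (group member or spare) at the chosen partition index on drive i moves to partition i of the new drive, that the caller passes the lowest numbered unrotated partition (starting at partition 2, since partition 1 of drives 1 through W holds the single partition group), and that the single partition group is labeled W. The names build_layout, add_drive, is_valid, SPARE and the new group label are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

SPARE = "S"  # hypothetical spare marker


def build_layout(w: int) -> dict[tuple[int, int], int | str]:
    """Initial layout on W + 1 drives; the vertical group is labeled W (assumed)."""
    grid = {}
    for x in range(1, w + 2):
        for y in range(1, w + 1):
            s = x + y
            if y == 1 and x <= w:
                grid[(x, y)] = w
            elif s == w + 2:
                grid[(x, y)] = SPARE
            else:
                grid[(x, y)] = s - 2 if s < w + 2 else s - w - 2
    return grid


def add_drive(grid, w: int, partition: int, new_group: int):
    """Move the cells at `partition` on drives 1..W (group members and any
    spare) onto a new drive, drive i's cell landing in new-drive partition i
    (assumed mapping), then assign the W vacated cells to `new_group`."""
    if partition == 1:
        raise ValueError("partition 1 holds the vertical group, which is not rotated")
    new_drive = max(x for x, _ in grid) + 1
    out = dict(grid)
    for i in range(1, w + 1):
        out[(new_drive, i)] = grid[(i, partition)]  # row of cells becomes a column
        out[(i, partition)] = new_group             # vacated cell joins the new group
    return out


def is_valid(grid, w: int) -> bool:
    """Every group has W members, each on a different drive."""
    drives_of = defaultdict(list)
    for (x, _), g in grid.items():
        if g != SPARE:
            drives_of[g].append(x)
    return all(len(ds) == w and len(set(ds)) == w for ds in drives_of.values())


grid = add_drive(build_layout(4), 4, partition=2, new_group=5)
print([grid[(6, y)] for y in range(1, 5)])  # [1, 2, 3, 'S']
print(is_valid(grid, 4))                    # True
```

Under these assumptions, with W=4 the first added drive receives the members of groups 1, 2 and 3 plus the spare from partition 2, and the four vacated cells form the new group. A second addition that rotates partition 3 likewise leaves every group with W members on distinct drives.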
The subject matter of this disclosure is generally related to electronic data storage and more particularly to providing scalable drive subsets with protection groups and spare capacity.
BACKGROUND
Protection groups help to avoid data loss by enabling a failing or failed protection group member to be reconstructed. Individual disk drives are protection group members in a typical data storage system, e.g. members of a redundant array of independent drives (RAID) protection group. A RAID (D+P) protection group has D data members and P parity members. The data members store data. The parity members store parity information such as XORs of data values. The parity information enables reconstruction of data in the event that a data member fails. Parity information can be reconstructed from the data on the data members in the event that a parity member fails. A failed protection group member is typically reconstructed on a spare drive.
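The XOR reconstruction described above can be illustrated with a minimal sketch that assumes single parity (P=1, as in RAID-5) computed byte by byte over equal-length member blocks. The function names and sample bytes are illustrative only.

```python
from __future__ import annotations

from functools import reduce
from operator import xor


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single parity, as in RAID-5)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all members must be the same non-empty set of lengths")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_member(survivors: list[bytes]) -> bytes:
    """With single parity, the XOR of all surviving members (data and parity)
    regenerates the one missing member, whether it held data or parity."""
    return xor_parity(survivors)


# RAID (4+1): four data members and one parity member.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
parity = xor_parity(data)

# Simulate failure of the third data member and reconstruct it.
rebuilt = rebuild_member(data[:2] + data[3:] + [parity])
assert rebuilt == data[2]
```

The same call with only the data members as survivors regenerates a lost parity member.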
It is sometimes necessary to increase the total storage capacity of a data storage system, for example when the existing storage capacity becomes fully utilized. The storage capacity of a data storage system that uses individual drives as protection group members is increased by adding a new protection group plus a spare, i.e. W drives for the RAID (D+P) protection group and one spare drive, for a total of (W+1) drives, where W=(D+P). A storage system that implements RAID-5 (4+1), for example, may be scaled up in increments of five new drives plus one spare drive. Similarly, a RAID-5 (3+1) system may be scaled up in increments of four new drives plus one spare drive. One drawback of scaling storage capacity in increments of (W+1) new drives is that it may introduce excess storage capacity that will not be utilized within a reasonable timeframe. This drawback is becoming more troublesome as the storage capacity of individual drives increases due to technological advancements. More specifically, as the storage capacity and cost of drives increase, the amount of excess storage capacity and cost associated with adding (W+1) drives to a storage system also increases, particularly for larger values of W.
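The increment arithmetic can be made concrete with a small sketch. The 20 TB drive size is a hypothetical input and the function name is illustrative; "data-member TB" counts only the D data members of the new group, not parity or spare capacity.

```python
from __future__ import annotations


def scale_up_increment(d: int, p: int, drive_tb: float) -> tuple[int, float, float]:
    """Return (drives added, raw TB added, data-member TB added) for one
    scale-up step when whole drives are RAID (D+P) protection group members.

    Each step adds one new protection group of W = D + P drives plus one
    spare drive, i.e. W + 1 drives.
    """
    w = d + p
    drives = w + 1
    return drives, drives * drive_tb, d * drive_tb


# Hypothetical 20 TB drives.
print(scale_up_increment(4, 1, 20.0))  # RAID-5 (4+1): (6, 120.0, 80.0)
print(scale_up_increment(3, 1, 20.0))  # RAID-5 (3+1): (5, 100.0, 60.0)
```

Because the increment is fixed at (W+1) drives, the capacity added per step grows in direct proportion to drive size, which is the drawback described above.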
SUMMARY
All examples, aspects and features mentioned in this document can be combined in any technically possible way.
In accordance with some implementations a method of creating and distributing spare capacity on a scalable drive subset on which protection groups are maintained comprises: creating W=(D+P) partitions that are equal in size and number on W+1 drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives; creating and distributing W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distributing members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
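The layout steps summarized above can be sketched as follows, assuming 1-based drive and partition indices and the group-numbering rule given for the symmetric distribution. The names build_layout, is_valid and SPARE are hypothetical, and labeling the vertical group as group W is an assumption chosen so that it does not collide with the additional groups, which are numbered 1 through W−1.

```python
from __future__ import annotations

from collections import defaultdict

SPARE = "S"  # hypothetical marker for a spare partition


def build_layout(d: int, p: int) -> dict[tuple[int, int], int | str]:
    """Map (drive X, partition Y), both 1-based, to a protection group or SPARE.

    Drives run 1..W+1 and partitions 1..W, where W = D + P. The vertical
    (single-partition) group is labeled W, an assumed numbering.
    """
    w = d + p
    layout = {}
    for x in range(1, w + 2):          # W + 1 drives
        for y in range(1, w + 1):      # W partitions on each drive
            s = x + y
            if y == 1 and x <= w:
                layout[(x, y)] = w     # vertical group: lowest partition of the W lowest drives
            elif s == w + 2:
                layout[(x, y)] = SPARE  # spares lie on the diagonal X + Y = W + 2
            elif s < w + 2:
                layout[(x, y)] = s - 2
            else:
                layout[(x, y)] = s - w - 2
    return layout


def is_valid(layout: dict[tuple[int, int], int | str], w: int) -> bool:
    """Each group has W members and no drive holds two members of one group."""
    drives_of = defaultdict(list)
    for (x, _y), group in layout.items():
        if group != SPARE:
            drives_of[group].append(x)
    return all(len(ds) == w and len(set(ds)) == w for ds in drives_of.values())


layout = build_layout(3, 1)  # RAID-5 (3+1), so W = 4
print(is_valid(layout, 4))   # True
```

is_valid checks only the per-group placement invariants; it does not model data placement or rebuild.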
In accordance with some implementations an apparatus comprises: a plurality of non-volatile drives; a plurality of interconnected compute nodes that manage access to the drives; and a drive manager configured to: create W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; create a first vertical protection group that has D data members and P parity members in one partition of W of the drives; create and distribute W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distribute members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
In accordance with some implementations a computer-readable storage medium stores instructions that when executed by a computer cause the computer to perform a method for using a computer system to create and distribute spare capacity on a scalable drive subset on which protection groups are maintained, the method comprising: creating W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives; creating and distributing W spares at values of drive X, partition Y that satisfy (X+Y)=(W+2); and symmetrically distributing members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
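The spare-relocation property stated in the "whereby" clauses can be checked by brute force for small W. The sketch assumes a simple relocation rule: each member on the failed drive moves to a spare on a surviving drive that holds no other member of the same group. The disclosure states the property rather than a specific assignment algorithm, and build_layout and relocation_plan are hypothetical names.

```python
from __future__ import annotations

from itertools import permutations


def build_layout(w: int) -> dict[tuple[int, int], int | str]:
    """Vertical group (labeled W, an assumption) in partition 1 of drives 1..W,
    spares where X + Y = W + 2, other cells numbered by the symmetric rule."""
    grid = {}
    for x in range(1, w + 2):
        for y in range(1, w + 1):
            s = x + y
            if y == 1 and x <= w:
                grid[(x, y)] = w
            elif s == w + 2:
                grid[(x, y)] = "S"
            else:
                grid[(x, y)] = s - 2 if s < w + 2 else s - w - 2
    return grid


def relocation_plan(grid, w: int, failed: int):
    """Map each member on the failed drive to a spare on a surviving drive that
    holds no member of the same group; return None if no mapping exists.
    Brute force over spare orderings, adequate only for small W."""
    members = [(cell, g) for cell, g in grid.items() if cell[0] == failed and g != "S"]
    spares = [cell for cell, g in grid.items() if g == "S" and cell[0] != failed]
    groups_on = {x: {g for (dx, _), g in grid.items() if dx == x} for x in range(1, w + 2)}
    for choice in permutations(spares, len(members)):
        if all(g not in groups_on[spare[0]] for (_, g), spare in zip(members, choice)):
            return {cell: spare for (cell, _), spare in zip(members, choice)}
    return None


print(relocation_plan(build_layout(4), 4, failed=1))
# {(1, 1): (5, 1), (1, 2): (2, 4), (1, 3): (3, 3), (1, 4): (4, 2)}
```

For W=4 and W=5 this finds a valid plan for every possible failed drive. Drive 1 holds no spare, so its failure uses all W spares.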
The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “disk” and “drive” are used interchangeably herein and are not intended to refer to any specific type of non-volatile storage media. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, e.g. and without limitation abstractions of tangible features. The term “physical” is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term “logic,” if used herein, refers to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, alone or in any combination. Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e. physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
Data associated with the hosted application instances running on the hosts 103 is maintained on the managed drives 101. The managed drives 101 are not discoverable by the hosts, but the storage array creates a logical storage device referred to herein as a production volume 140 that can be discovered and accessed by the hosts. Without limitation, the production volume may also be referred to as a storage object, source device, production device, or production LUN, where the logical unit number (LUN) is a number used to identify logical storage volumes in accordance with the small computer system interface (SCSI) protocol. From the perspective of the hosts 103, the production volume 140 is a single drive having a set of contiguous fixed-size logical block addresses (LBAs) on which data used by the instances of the host application resides. However, the host application data is stored at non-contiguous addresses on various managed drives 101. The compute nodes maintain metadata that maps between the production volume 140 and the managed drives 101 in order to process IOs from the hosts.
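The role of that mapping metadata can be illustrated with a toy extent table. This is a minimal sketch, not the storage array's actual metadata format: the Extent and VolumeMap names, the extent granularity, and the sample addresses are all hypothetical.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    start_lba: int   # first production-volume LBA covered by this extent
    length: int      # number of blocks in the extent
    drive: int       # managed-drive identifier (hypothetical)
    drive_lba: int   # first block address of the extent on that drive


class VolumeMap:
    """Toy model of metadata that maps a contiguous production-volume LBA space
    onto non-contiguous managed-drive addresses."""

    def __init__(self, extents: list[Extent]) -> None:
        self._extents = sorted(extents, key=lambda e: e.start_lba)
        self._starts = [e.start_lba for e in self._extents]

    def resolve(self, lba: int) -> tuple[int, int]:
        """Return (drive, drive_lba) backing a production-volume LBA."""
        i = bisect.bisect_right(self._starts, lba) - 1
        if i < 0 or lba >= self._starts[i] + self._extents[i].length:
            raise KeyError(f"LBA {lba} is not mapped")
        e = self._extents[i]
        return e.drive, e.drive_lba + (lba - e.start_lba)


vol = VolumeMap([Extent(0, 100, drive=3, drive_lba=5000),
                 Extent(100, 50, drive=7, drive_lba=120)])
print(vol.resolve(99))   # (3, 5099)
print(vol.resolve(100))  # (7, 120)
```

Adjacent production-volume LBAs 99 and 100 resolve to different drives, which is the non-contiguity the paragraph describes. The sorted list with binary search is a design choice for this sketch only.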
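Members of the additional protection groups are symmetrically distributed on the remaining partitions such that the protection group member at drive X partition index Y belongs to protection group N:
- a. If (X+Y)<(W+2), then N=(X+Y−2); and
- b. If (X+Y)>(W+2), then N=(X+Y−W−2).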
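Combining the two cases with the spare condition gives a single piecewise rule for every cell outside the single partition protection group (a restatement, with N(X,Y) as assumed notation):

```latex
N(X,Y) =
\begin{cases}
X + Y - 2,     & X + Y < W + 2,\\
\text{spare},  & X + Y = W + 2,\\
X + Y - W - 2, & X + Y > W + 2.
\end{cases}
```

For W=4, drive 5 partition 4 gives X+Y=9>6, so N=9−4−2=3. For partitions Y≥2, N always falls between 1 and W−1, because X+Y ranges from 3 to 2W+1.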
In the illustrated example, W=4, and a spare partition S is located at partition index P1 of drive D5 because (X+Y)=(5+1)=(4+2)=(W+2). A member of protection group 2 is located at partition index 3 of drive D1 because (X+Y)=(1+3)=(2+2)=(N+2). The resulting distribution of spare partitions S is along a diagonal of the matrix, with adjacent spare partitions located on incrementally decreasing drive numbers and incrementally increasing partitions. Apart from the single partition protection group 4, the protection groups are symmetrically distributed. In contrast, it is typical in previous designs for protection group members to be located on single partitions and for all spare capacity to be on a spare drive.
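The example can be reproduced mechanically. This sketch, with hypothetical function names, recomputes each cell for W=4 from the rule above, labels the single partition protection group 4 as in the text, and prints the matrix with drives as rows and partitions as columns (the orientation is an assumption).

```python
def cell(x: int, y: int, w: int) -> str:
    """Label of drive x, partition y (1-based) in the example layout."""
    if y == 1 and x <= w:
        return str(w)                # single-partition group ("group 4" when W = 4)
    s = x + y
    if s == w + 2:
        return "S"                   # spare diagonal
    return str(s - 2 if s < w + 2 else s - w - 2)


def render(w: int) -> str:
    """Text grid with one row per drive D1..D(W+1) and one column per partition."""
    header = "      " + " ".join(f"P{y}" for y in range(1, w + 1))
    rows = [
        f"D{x:<4} " + " ".join(f"{cell(x, y, w):>2}" for y in range(1, w + 1))
        for x in range(1, w + 2)
    ]
    return "\n".join([header, *rows])


print(render(4))
# ->
#       P1 P2 P3 P4
# D1     4  1  2  3
# D2     4  2  3  S
# D3     4  3  S  1
# D4     4  S  1  2
# D5     S  1  2  3
```

The spares run from D5/P1 to D2/P4, and each of groups 1, 2 and 3 appears exactly once on four different drives.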
The resulting drive subset is configured for scaling and use of spare capacity.
Specific examples have been presented to provide context and convey inventive concepts. The specific examples are not to be considered as limiting. A wide variety of modifications may be made without departing from the scope of the inventive concepts described herein. Moreover, the features, aspects, and implementations described herein may be combined in any technically possible way. Accordingly, modifications and combinations are within the scope of the following claims.
Claims
1. A method of creating and distributing spare capacity on a scalable drive subset on which protection groups are maintained, comprising:
- creating W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered;
- creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives;
- creating spares at drive X partition Y that satisfy X+Y=W+2; and
- symmetrically distributing members of additional protection groups with W members on remaining partitions;
- whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
2. The method of claim 1 comprising symmetrically distributing members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)<(W+2), then N=(X+Y−2).
3. The method of claim 1 comprising symmetrically distributing members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)>(W+2), then N=(X+Y−W−2).
4. The method of claim 1 comprising creating the first protection group in a lowest numbered partition of the W lowest numbered drives.
5. The method of claim 1 comprising scaling the drive subset by adding new drives in single drive increments.
6. The method of claim 5 comprising, responsive to addition of a new drive, rotating the first W protection group members in the lowest numbered unrotated partition, excluding members of the vertical protection group, onto the new drive.
7. The method of claim 6 comprising utilizing partitions vacated by rotated protection group members and a rotated spare to create a new protection group.
8. An apparatus, comprising:
- a plurality of non-volatile drives;
- a plurality of interconnected compute nodes that manage access to the drives; and
- a drive manager comprising program code on a non-transitory, computer-readable medium, the drive manager configured to: create W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered; create a first vertical protection group that has D data members and P parity members in one partition of W of the drives; create spares at drive X partition Y that satisfy X+Y=W+2; and symmetrically distribute members of additional protection groups with W members on remaining partitions; whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
9. The apparatus of claim 8 wherein the drive manager distributes members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)<(W+2), then N=(X+Y−2).
10. The apparatus of claim 8 wherein the drive manager symmetrically distributes members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)>(W+2), then N=(X+Y−W−2).
11. The apparatus of claim 8 wherein the drive manager creates the first protection group in a lowest numbered partition of the W lowest numbered drives.
12. The apparatus of claim 8 wherein the drive subset is scaled by adding new drives in single drive increments and wherein the drive manager is configured to rotate the first W protection group members in the lowest numbered unrotated partition, excluding members of the vertical protection group, onto a new drive.
13. The apparatus of claim 12 wherein the drive manager is configured to utilize partitions vacated by rotated protection group members and a rotated spare to create a new protection group.
14. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for using a computer system to create and distribute spare capacity on a scalable drive subset on which protection groups are maintained, the method comprising:
- creating W=(D+P) partitions that are equal in size and number on (W+1) drives, wherein the partitions are sequentially ordered, and the drives are sequentially ordered;
- creating a first vertical protection group that has D data members and P parity members in one partition of W of the drives;
- creating spares at drive X partition Y that satisfy X+Y=W+2; and
- symmetrically distributing members of additional protection groups with W members on remaining partitions;
- whereby the spares are distributed such that all protection group members on a failed one of the drives can be relocated to ones of the spares such that no more than one member of any of the protection groups is located on a single one of the drives.
15. The non-transitory computer-readable storage medium of claim 14 wherein the method further comprises symmetrically distributing members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)<(W+2), then N=(X+Y−2).
16. The non-transitory computer-readable storage medium of claim 14 wherein the method further comprises symmetrically distributing members of additional protection groups such that the protection group member at drive X partition index Y belongs to protection group N, where if (X+Y)>(W+2), then N=(X+Y−W−2).
17. The non-transitory computer-readable storage medium of claim 14 wherein the method further comprises creating the first protection group in a lowest numbered partition of the W lowest numbered drives.
18. The non-transitory computer-readable storage medium of claim 14 wherein the method further comprises scaling the drive subset by adding new drives in single drive increments.
19. The non-transitory computer-readable storage medium of claim 18 wherein the method further comprises, responsive to addition of a new drive, rotating the first W protection group members in the lowest numbered unrotated partition, excluding members of the vertical protection group, onto the new drive.
20. The non-transitory computer-readable storage medium of claim 19 wherein the method further comprises utilizing partitions vacated by rotated protection group members and a rotated spare to create a new protection group.
Type: Application
Filed: Oct 2, 2020
Publication Date: Apr 7, 2022
Applicant: EMC IP HOLDING COMPANY LLC (Hopkinton, MA)
Inventors: Kuolin Hua (Natick, MA), Kunxiu Gao (Boxborough, MA)
Application Number: 17/061,922