SYSTEMS, METHODS, AND DEVICES FOR FAULT RESILIENT STORAGE
A method of operating a storage device may include determining a fault condition of the storage device, selecting a fault resilient mode based on the fault condition of the storage device, and operating the storage device in the selected fault resilient mode. The selected fault resilient mode may include one of a power cycle mode, a reformat mode, a reduced capacity read-only mode, a reduced capacity mode, a reduced performance mode, a read-only mode, a partial read-only mode, a temporary read-only mode, a temporary partial read-only mode, or a vulnerable mode. The storage device may be configured to perform a namespace capacity management command received from a host. The namespace capacity management command may include a resize subcommand and/or a zero-size namespace subcommand. The storage device may report the selected fault resilient mode to the host.
This application is a continuation of U.S. patent application Ser. No. 17/232,144, filed Apr. 15, 2021, which claims priority to, and the benefit of, U.S. Provisional Patent Application Ser. No. 63/023,243, filed May 11, 2020, which is incorporated by reference; U.S. Provisional Patent Application Ser. No. 63/128,001, filed Dec. 18, 2020, which is incorporated by reference; U.S. Provisional Patent Application Ser. No. 63/051,158, filed Jul. 13, 2020, which is incorporated by reference; U.S. Provisional Patent Application Ser. No. 63/052,854, filed Jul. 16, 2020, which is incorporated by reference; and U.S. Provisional Patent Application Ser. No. 63/057,744, filed Jul. 28, 2020, which is incorporated by reference.
TECHNICAL FIELD

This disclosure relates generally to storage, and more specifically to systems, methods, and devices for fault resilient storage.
BACKGROUND

A storage device may encounter a fault condition that may affect the ability of the storage device to operate in a storage system.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not constitute prior art.
SUMMARY

A method of operating a storage device may include determining a fault condition of the storage device, selecting a fault resilient mode based on the fault condition of the storage device, and operating the storage device in the selected fault resilient mode. The selected fault resilient mode may include a power cycle mode. The selected fault resilient mode may include a reformat mode. The selected fault resilient mode may include a reduced capacity read-only mode. The selected fault resilient mode may include a reduced capacity mode. The selected fault resilient mode may include a reduced performance mode. The selected fault resilient mode may include a read-only mode. The selected fault resilient mode may include a partial read-only mode. The selected fault resilient mode may include a temporary read-only mode. The selected fault resilient mode may include a temporary partial read-only mode. The selected fault resilient mode may include a vulnerable mode. The selected fault resilient mode may include a normal mode. The storage device may be configured to perform a command received from a host. The command may include a namespace capacity management command. The namespace capacity management command may include a resize subcommand. The namespace capacity management command may include a zero-size namespace subcommand.
A storage device may include a storage medium, and a storage controller, wherein the storage controller is configured to determine a fault condition of the storage device, select a fault resilient mode based on the fault condition of the storage device, and operate the storage device in the selected fault resilient mode. The selected fault resilient mode may include one of a power cycle mode, a reformat mode, a reduced capacity read-only mode, a reduced capacity mode, a reduced performance mode, a read-only mode, a partial read-only mode, a temporary read-only mode, a temporary partial read-only mode, or a vulnerable mode. The storage device may be configured to perform a namespace capacity management command received from a host.
A system may include a host, and at least one storage device coupled to the host, wherein the storage device is configured to determine a fault condition of the storage device, select a fault resilient mode based on the fault condition of the storage device, operate in the selected fault resilient mode, and report the selected fault resilient mode to the host.
A method of operating a storage array may include determining a first fault resilient operating mode of a first fault resilient storage device of the storage array, determining a second fault resilient operating mode of a second fault resilient storage device of the storage array, allocating one or more rescue spaces of one or more additional fault resilient storage devices of the storage array, mapping user data from the first fault resilient storage device to the one or more rescue spaces, and mapping user data from the second fault resilient storage device to the one or more rescue spaces. The method may further include reassigning at least one device identifier (ID) of the one or more additional fault resilient storage devices to a device ID of the first fault resilient storage device. The at least one device ID of the one or more additional fault resilient storage devices may be reassigned based on a current unaffected device ID and a current faulty device ID. The method may further include redirecting one or more inputs and/or outputs (IOs) from the first fault resilient storage device to the one or more additional fault resilient storage devices. The user data may include a strip of data. The strip of data may be redirected to a target storage device of the one or more additional fault resilient storage devices based on a stripe ID of the user data. Mapping user data from the first fault resilient storage device to the one or more rescue spaces may include maintaining a first mapping table. Mapping user data from the second fault resilient storage device to the one or more rescue spaces may include maintaining a second mapping table. The one or more rescue spaces may have a rescue space percentage ratio of a storage device capacity. The rescue space percentage ratio may be greater than or equal to a number of failed storage devices accommodated by the storage array, divided by the total number of storage devices in the storage array. The one or more rescue spaces may be allocated statically. The one or more rescue spaces may be allocated dynamically.
A system may include a storage array including a first fault resilient storage device, a second fault resilient storage device, one or more additional fault resilient storage devices, and a volume manager configured to: determine a first fault resilient operating mode of the first fault resilient storage device, determine a second fault resilient operating mode of the second fault resilient storage device, allocate one or more rescue spaces of one or more additional fault resilient storage devices of the storage array, map user data from the first fault resilient storage device to the one or more rescue spaces, and map user data from the second fault resilient storage device to the one or more rescue spaces. The volume manager may be further configured to reassign at least one device identifier (ID) of the one or more additional fault resilient storage devices to a device ID of the first fault resilient storage device. The volume manager may be further configured to redirect one or more inputs and/or outputs (IOs) from the first fault resilient storage device to the one or more additional fault resilient storage devices. The user data may include a strip of data, and the volume manager may be further configured to redirect the strip of data to a target storage device of the one or more additional fault resilient storage devices based on a stripe ID of the user data. The one or more rescue spaces may have a rescue space percentage ratio of a storage device capacity, and the rescue space percentage ratio may be based on a number of failed storage devices accommodated by the storage array, divided by a total number of storage devices in the storage array.
An apparatus may include a volume manager for a storage array, the volume manager may include logic configured to: determine a first fault resilient operating mode of a first fault resilient storage device of the storage array, determine a second fault resilient operating mode of a second fault resilient storage device of the storage array, allocate one or more rescue spaces of one or more additional fault resilient storage devices of the storage array, map user data from the first fault resilient storage device to the one or more rescue spaces, and map user data from the second fault resilient storage device to the one or more rescue spaces. The user data may include a strip of data, and the strip of data may be redirected to a target storage device of the one or more additional fault resilient storage devices based on a stripe identifier (ID) of the user data. The one or more rescue spaces may have a rescue space percentage ratio of a storage device capacity, and the rescue space percentage ratio may be based on a number of failed storage devices accommodated by the storage array, divided by a total number of storage devices in the storage array.
A method of operating a storage array may include allocating a first rescue space of a first fault resilient storage device of the storage array, allocating a second rescue space of a second fault resilient storage device of the storage array, determining a fault resilient operating mode of a third fault resilient storage device of the storage array, and mapping user data from the third fault resilient storage device to the first rescue space and the second rescue space based on determining the fault resilient operating mode. A first block of the user data may be mapped to the first rescue space, and a second block of the user data may be mapped to the second rescue space. The user data may include a strip of data. A first portion of the strip of data may be mapped to the first rescue space, and the first portion of the strip of data may include a number of data blocks based on a size of the strip of data and a size of the data blocks. The number of data blocks may be further based on a total number of storage devices in the storage array. The method may further include reassigning at least one device identifier (ID) of the first fault resilient storage device to a device ID of the third fault resilient storage device. The method may further include redirecting one or more inputs and/or outputs (IOs) from the third fault resilient storage device to the first rescue space and the second rescue space. The first rescue space may have a capacity based on a capacity of the first fault resilient storage device and a total number of storage devices in the storage array. The first rescue space may have a capacity, in strips, based on a size of the first rescue space and a block size.
A system may include a storage array including a first fault resilient storage device, a second fault resilient storage device, a third fault resilient storage device, and a volume manager configured to allocate a first rescue space of the first fault resilient storage device, allocate a second rescue space of the second fault resilient storage device, determine a fault resilient operating mode of the third fault resilient storage device, and map user data from the third fault resilient storage device to the first rescue space and the second rescue space based on determining the fault resilient operating mode. The volume manager may be further configured to map a first block of the user data to the first rescue space, and map a second block of the user data to the second rescue space. The user data may include a strip of data, and the volume manager may be further configured to map a first portion of the strip of data to the first rescue space. The first portion of the strip of data may include a number of data blocks based on a size of the strip of data and a size of the data blocks. The number of data blocks may be further based on a total number of storage devices in the storage array.
A method of operating a storage array may include determining a first parameter of a first fault resilient storage device of the storage array, determining a second parameter of a second fault resilient storage device of the storage array, and determining a quality-of-service (QoS) of the storage array based on the first parameter and the second parameter. The method may further include adjusting the first parameter based on the QoS. The first parameter may be adjusted automatically based on monitoring the first parameter. The first parameter may be adjusted automatically based on monitoring the second parameter. The first parameter may be adjusted by configuring a component of the storage array. The first parameter may be adjusted by controlling the operation of a component of the storage array. The first parameter may include one of a number of storage devices in the storage array, a number of data blocks in a strip of user data for the first fault resilient storage device, a write method for redirecting data from the first fault resilient storage device to the second fault resilient storage device, a number of faulty storage devices supported by the storage array, or a storage capacity of the first fault resilient storage device.
A system may include a storage array including a first fault resilient storage device, a second fault resilient storage device, and a volume manager configured to determine a first parameter of a first fault resilient storage device, determine a second parameter of a second fault resilient storage device, and determine a quality-of-service (QoS) of the storage array based on the first parameter and the second parameter. The volume manager may be further configured to adjust the first parameter based on the QoS. The volume manager may be further configured to adjust the first parameter automatically based on monitoring the first parameter. The volume manager may be further configured to adjust the first parameter automatically based on monitoring the second parameter.
The figures are not necessarily drawn to scale and elements of similar structures or functions may generally be represented by like reference numerals or portions thereof for illustrative purposes throughout the figures. The figures are only intended to facilitate the description of the various embodiments described herein. The figures do not describe every aspect of the teachings disclosed herein and do not limit the scope of the claims. To prevent the drawing from becoming obscured, not all of the components, connections, and the like may be shown, and not all of the components may have reference numbers. However, patterns of component configurations may be readily apparent from the drawings. The accompanying drawings, together with the specification, illustrate example embodiments in accordance with the disclosure, and, together with the description, serve to explain the principles of the present disclosure.
Some of the principles in accordance with example embodiments of the disclosure relate to storage devices that may continue to operate in one or more fault resilient modes in case of a fault of the storage device. For example, a storage device may continue to operate in a limited manner that may enable a storage system to recover quickly and/or efficiently from the fault of the storage device.
In some embodiments, a storage device may implement any number of the following fault resilient (FR) modes:
Some embodiments may implement a power cycle mode which may involve self-healing based on power cycling the storage device.
Some embodiments may implement a reformat mode which may involve self-healing based on formatting all or a portion of the storage device.
Some embodiments may implement a reduced capacity read-only mode in which a first portion of the storage space of the storage device may operate normally, and a second portion may operate as read-only storage space.
Some embodiments may implement a reduced capacity mode in which a first portion of the storage space of the storage device may operate normally, and a second portion may not be available for input and/or output (I/O) operations.
Some embodiments may implement a reduced performance mode in which one or more aspects of the performance of the storage device may be reduced.
Some embodiments may implement a read-only mode in which data may be read from, but not written to, the storage device.
Some embodiments may implement a partial read-only mode in which a first portion of the storage space of the storage device may operate as read-only storage space, and a second portion may not be available for normal input and/or output (I/O) operations.
Some embodiments may implement a temporary read-only mode in which data may be read from, but not written to, the storage space of the storage device, which may be temporarily valid, and may become invalid.
Some embodiments may implement a temporary partial read-only mode in which data may be read from, but not written to, a first portion of the storage space of the storage device, which may be temporarily valid, and may become invalid. A second portion may not be available for input and/or output (I/O) operations.
Some embodiments may implement a vulnerable mode in which the storage device may not be available for input and/or output (I/O) operations.
Some embodiments may implement a normal mode in which the storage device may operate normally.
In some embodiments, a storage device may implement one or more commands which may be used by a host to determine and/or manage one or more features of the storage device. For example, in some embodiments, a storage device may implement a namespace capacity management command which may include a resize and/or zero-size subcommand.
The principles disclosed herein have independent utility and may be embodied individually, and not every embodiment may utilize every principle. However, the principles may also be embodied in various combinations, some of which may amplify the benefits of the individual principles in a synergistic manner.
Storage Systems

The one or more storage devices 110 may be implemented with any type of storage apparatus and associated storage media including solid state drives (SSDs), hard disk drives (HDDs), optical drives, drives based on any type of persistent memory such as cross-gridded nonvolatile memory with bulk resistance change, and/or the like, and/or any combination thereof. Data in each storage device may be arranged as blocks, key-value structures, and/or the like, and/or any combination thereof. Each storage device 110 may have any form factor such as 3.5 inch, 2.5 inch, 1.8 inch, M.2, MO-297, MO-300, Enterprise and Data Center SSD Form Factor (EDSFF) and/or the like, using any connector configuration such as Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), U.2, and/or the like, and using any storage interface and/or protocol such as Peripheral Component Interconnect (PCI), PCI express (PCIe), Nonvolatile Memory Express (NVMe), NVMe-over-Fabrics (NVMe-oF), Ethernet, InfiniBand, Fibre Channel, and/or the like. Some embodiments may be implemented entirely or partially with, and/or used in connection with, a server chassis, server rack, dataroom, datacenter, edge datacenter, mobile edge datacenter, and/or any combinations thereof, and/or the like.
Any or all of the host 105, volume manager 115, storage controller 120, and/or any other components disclosed herein may be implemented with hardware, software, or any combination thereof, including combinational logic, sequential logic, one or more timers, counters, registers, state machines, complex programmable logic devices (CPLDs), field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), complex instruction set computer (CISC) processors and/or reduced instruction set computer (RISC) processors, and/or the like executing instructions stored in volatile memories such as dynamic random access memory (DRAM) and/or static random access memory (SRAM), nonvolatile memory such as flash memory and/or the like, as well as graphics processing units (GPUs), neural processing units (NPUs), and/or the like.
Although the inventive principles are not limited to any particular implementation details, for purposes of illustration, in some embodiments, each storage device 110 may be implemented as an SSD in which the storage media may be implemented, for example, with not AND (NAND) flash memory, and each storage controller 120 may implement any functionality associated with operating the SSD including a flash translation layer (FTL), a storage interface, and any functionality associated with implementing the fault resilient features disclosed herein. The smallest erasable unit in the storage device 110 may be referred to as a “block” and the smallest writeable unit in the storage device 110 may be referred to as a “page”.
The storage media 125 may have a retention period (which may depend on the usage history of the storage media 125, and, as such, may vary within the storage media 125); data that has been stored longer than the retention period (i.e., data having an age exceeding the retention period) may become unreliable and may be said to have expired. Data may be stored in the storage media 125 using an error correcting code, which may be, e.g., a block code. When data is read from the storage media 125, a quantity of raw data, referred to as a code block, may be read from the storage media 125, and an attempt to decode it may be made. If the attempt fails, additional attempts (e.g., read retrials) may be made. With use, a portion, e.g., a block, of the storage media 125 may degrade to a point that the retention period becomes unacceptably short, and the block may be classified as a “bad block”. To avoid allowing this circumstance to render the entire storage media 125 inoperable, reserve space, referred to as “bad block management reserve space” may be present (e.g., included in each flash memory die or in each flash memory plane), and the controller 120, or another controller internal to the flash memory die or to the flash memory plane may begin to use a block in the reserve and cease to use the bad block.
The operations and/or components described with respect to the embodiment illustrated in
Case 1 may include a fault condition in which the storage device 110 may no longer be capable of performing read or write operations, and that may not be resolved by cycling power and/or reformatting the storage media. A state in which the storage device 110 behaves in this manner may have various sub-states, with, e.g., each sub-state corresponding to a different failure mechanism. Such a state, or fault condition (in which the storage device 110 is no longer capable of performing read or write operations, and that may not be resolved by cycling power or reformatting the storage media) may be caused, for example, by a portion of the controller's firmware becoming corrupted (in which case it may be possible for the controller to restart into a safe mode, in which the corrupted instructions may not be executed) or by a failure of a processing circuit in the storage device 110 (e.g., the failure of a processing circuit that manages interactions with the storage media but is not responsible for communications with the host 105). When a fault condition of this type occurs, the storage device 110 may respond to a read or write command from the host 105 with an error message.
Case 2 may include a fault condition (i) in which the storage device 110 may no longer be capable of performing read or write operations and (ii) from which recovery may be possible by cycling the power of the storage device 110, by reformatting the storage media (e.g., nonvolatile memory (NVM)), by re-loading firmware, and/or the like. Such a fault condition may be caused, for example, by a program execution error of the controller 120 of the storage device 110 (e.g., a pointer that is out of range as a result of a bit flip in random-access memory (RAM) of the controller 120, or an instruction that is incorrect, as a result of a bit flip). If the program execution error has not caused the controller 120 to write incorrect data to the storage media 125 (e.g., if the program execution error occurred since the most recent write to storage media by the controller), then power cycling the storage device may be sufficient to restore the storage device 110 to normal operation. If the program execution error has caused the controller 120 to write erroneous data to the storage media 125, then reformatting the storage media 125, and/or re-loading firmware may be sufficient to restore the storage device 110 to normal operation.
Case 3 may include a fault condition that may be mitigated by operating the storage device 110 in a read-only mode, and for which reformatting the storage media 125 may not restore full functionality. Examples of such faults may include (i) a temperature sensor failure, and (ii) a portion of the storage media 125 having transitioned to a read-only mode. In the case of the temperature sensor failure, the failure may be detected by determining that a temperature sensor reading is out of range (e.g., has exceeded a threshold temperature), and in such a case the risk of overheating of the storage device 110 may be reduced by avoiding write operations, which may dissipate more power than read operations. The transitioning to a read-only mode of a portion of the storage media 125 may occur, for example, for flash memory storage media 125, if a flash memory plane or die exhausts a bad block management reserve space used for run time bad block management. For example, the storage device 110 may, while attempting to perform a read operation, make an unsuccessful attempt to decode a data item, determine that the block storing the data is a bad block, and, upon moving the data from the bad block to the bad block management reserve space, determine that the remaining bad block management reserve space is less than a threshold size and therefore insufficient to ensure the reliability of the plane or die. The storage device 110 may then determine that bad block management is no longer being performed, and transition to a read-only mode. In some embodiments, a data item may refer to any quantity of data being processed in one operation, e.g., the data resulting from decoding a code block may be a data item.
Case 4 may include a fault condition that may be mitigated by operating the storage device 110 in a write-through mode. For example, if a power supply backup capacitor in the storage device 110 fails, the device may, in response to a write command received from the host, complete the write to the storage media 125 before sending a command completion to the host 105, so that if power fails before the write to the storage media 125 has been completed, the host is not incorrectly informed that the write was completed successfully. Operating in the write-through mode may result in a reduction of performance (e.g., in terms of throughput and/or latency).
Case 5 may include a fault condition that may be mitigated by operating the storage device 110 in a manner that reduces power dissipation. For example, in the case of a temperature sensor failure, the storage device 110 may operate in a read-only mode as mentioned above, or it may reduce the rate at which operations (e.g., write operations, which may dissipate more power than read operations) may be performed, to reduce power dissipation in the storage device 110. For example, the storage device 110 may perform a first write to the storage media, then wait, during an interval corresponding to the reduced performance (the waiting resulting in a decrease in the rate at which write operations are performed), and then perform another (e.g., a second) write to the storage media.
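As an illustrative sketch (not part of the original description), the write throttling described above might be approximated as a simple delay inserted between successive writes; the function names, arguments, and delay value below are assumptions for illustration only.

```python
import time

def throttled_writes(write_fn, chunks, delay_s=0.05, reduce_power=True):
    """Minimal sketch of rate-limited writes: perform a write, then wait before
    the next write to reduce power dissipation. write_fn, chunks, and delay_s
    are hypothetical placeholders for a controller's actual write path."""
    for chunk in chunks:
        write_fn(chunk)          # perform one write to the storage media
        if reduce_power:
            time.sleep(delay_s)  # interval corresponding to the reduced performance
```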
Case 6 may include a fault condition that may be mitigated by operating the storage device 110 in a read-only mode, and for which reformatting the storage media 125 may restore full functionality.
Fault ResiliencyBased on one or more fault conditions such as those exemplified by the cases listed in
In the partially resilient mode, the storage device 110 may operate with lower performance, reduced capacity, or reduced capability, when a fault condition exists. For example, as mentioned above, if a power supply backup capacitor fails, writes may be completed (e.g., command completions may be sent to the host 105) only after data is written to the storage media 125 (i.e., synchronous writes may be performed), slowing the operation of the storage device 110, and reducing its performance. The user data may be preserved in this circumstance. As another example, the storage device 110 may operate with reduced capacity if the reserve space for run time bad block (RTBB) management is exhausted. In this circumstance, the affected dies in the storage device 110 may be excluded from the disk space and the overall disk capacity may be reduced. The user data on the lost space may be lost. For example, if a set in IO determinism or a zone in a zoned namespace is no longer capable of accepting new data writes, the set or the zone may be excluded from the disk space, but the remaining disk space may remain available for read and write operations. The user data on the zone or set may be lost.
The storage device 110 may operate with reduced capability, for example, if a storage device 110 does not allow write operations, and switches to a read-only mode. In some embodiments, the storage device 110 may be capable of operating in two types of read-only mode: a sustainable read-only mode (which may be referred to as a “first read-only mode”), and an unsustainable read-only mode (which may be referred to as a “second read-only mode”). In the sustainable read-only mode, the storage device 110 may continue to serve read requests beyond the retention period of the storage media 125. The unsustainable read-only mode may be employed, for example, when it may not be feasible to operate in the sustainable read-only mode, e.g., when there is insufficient unused storage space to set up a rescue space. When transitioning to the unsustainable read-only mode, the storage device 110 may send to the host 105 a notification that the storage device 110 is operating in the second (unsustainable) read-only mode, and that data items stored in the storage device 110 may be allowed to expire (e.g., at the end of their respective retention periods). In the unsustainable read-only mode, the storage device 110 may continue to serve read requests during the retention period of the storage media 125, and, if the storage device 110 encounters data integrity issues (as detected, for example, by one or more unsuccessful attempts to decode data during read operations), the storage device 110 may report the invalid data region.
A storage device 110 operating in the vulnerable mode may be incapable of performing normal read and/or write operations, and may perform a graceful exit, for example, by continuing to receive commands from the host and returning errors.
Thus, in some embodiments, a storage device having one or more fault resilient features in accordance with example embodiments of the disclosure may extend and/or organize the features so that a host may utilize them systematically, and the device may continue to operate in some capacity despite a fault condition. In some embodiments, for example, if the storage device is used for a RAID (Redundant Array of Independent (or Inexpensive) Drives) or RAIN (Redundant Array of Independent Nodes), and a node fails, the system may recover the data by copying the data from the accessible space of the storage device without calculating the stripe parity.
Logical Block Address Space TypesIn some embodiments, various logical block address (LBA) space types may be implemented by a storage device having fault resiliency features in accordance with example embodiments of the disclosure. These LBA space types may be used, for example, by a storage device such as that illustrated in
Performing (P) space may include LBA space containing valid data, which may be capable of being read and written in a normal manner without sacrificing performance. Data in performing space may be valid.
Underperforming (UP) space may include LBA space containing valid data, which may be capable of being read and written in a normal manner, but with degraded performance (e.g., degraded write performance).
Read-only (RO) space may include LBA space containing valid data, which may be read-only. For example, a storage device may refuse to write data received from a host and/or may respond with error messages to write commands from the host directed to this type of LBA space. The data in read-only space may remain valid for a period of time exceeding the retention period.
Volatile read-only (VRO) space may include read-only space, and the storage device may respond with error messages to write commands from a host directed to this type of LBA space. Data in this type of LBA space may be temporarily valid, and may become invalid when it expires, i.e., when the age of the data in its storage media reaches the retention period of the storage media.
Inaccessible (IA) space may include LBA space containing invalid data, which may not be accessible from the host.
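The five LBA space types above might be represented on a host, for example, as a simple enumeration. The following is a minimal sketch; the class and member names are illustrative assumptions rather than part of any standard interface.

```python
from enum import Enum

class LbaSpaceType(Enum):
    """Hypothetical host-side labels for the LBA space types described above."""
    PERFORMING = "P"             # valid data; normal reads and writes
    UNDERPERFORMING = "UP"       # valid data; reads/writes with degraded performance
    READ_ONLY = "RO"             # valid data; writes rejected; valid beyond the retention period
    VOLATILE_READ_ONLY = "VRO"   # valid data; writes rejected; expires at the retention period
    INACCESSIBLE = "IA"          # invalid data; not accessible from the host
```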
Fault Resilient Modes

In some embodiments, the LBA space types may be used, for example, to implement one or more fault resilient modes.
In some embodiments, the modes illustrated in
In some embodiments, a storage device may implement any number of the following fault resilient modes. For example, a device manufacturer may implement different combinations of these and other fault resilient modes in different products.
A power cycle mode (Mode 1) may involve self-healing based on power cycling the storage device. For example, a storage device may experience a fault condition based on one or more flipped bits in memory such as SRAM or DRAM. A flipped bit may be caused, for example, by aging, heating, and/or radiation (e.g., from an antenna or at high elevations above sea level) that may interfere with memory cells. A storage device with a fault resilient power cycle mode may have self-healing capabilities such that power cycling the storage device (e.g., removing then reapplying power) may reset the current state and restore the failed storage device to a normal state. In this case, one or more in-flight commands in a submission queue may be lost. Whether the user data of the storage device remains valid may depend on implementation details such as the partitioning of the device, the extent to which different circuits of the storage controller are reset, and/or the like. In some embodiments, in a power cycle mode, the entire storage space of the storage device (100 percent) may operate normally (e.g., as performing (P) space).
A reformat mode (Mode 2) may involve self-healing based on formatting all or a portion of the storage device. In some embodiments, formatting the storage device may reset its current state and restore the failed storage device to its normal state. However, depending on the implementation details (e.g., quick format, full format, partitioning details, and/or the like) all data on the disk may be lost. In some embodiments, in a reformat mode, the entire storage space of the storage device (100 percent) may operate normally (e.g., as performing (P) space).
In a reduced capacity read-only mode (Mode 3), a first portion of the storage space (e.g., X percent) of the storage device may operate normally (e.g., as performing (P) space), and a second portion (e.g., (100-X) percent) may operate as read-only (RO) storage space. Thus, the size of the performing (P) space in the storage device may be reduced, and the storage device may behave like a normal drive with respect to that space, but the read-only (RO) type of space may not be writable. In some embodiments, the storage device may provide a list of LBA ranges for the performing (P) and/or read-only (RO) spaces to a host, for example, in response to a get feature command. If the storage device supports IO determinism, the LBA range may represent a set. If the storage device supports Zoned Namespaces (ZNS), the LBA range may represent a zone. In some embodiments, the storage device may also provide information about address ranges for sets and/or ZNS in response to a get feature command.
In a reduced capacity mode (Mode 4), a first portion of the storage space (e.g., X percent) of the storage device may operate normally (e.g., as performing (P) space), and a second portion (e.g., (100-X) percent) may be inaccessible (IA). Thus, the size of the performing (P) space in the storage device may be reduced, and the storage device may behave like a normal drive with respect to that space, but inaccessible (IA) space may not be available for normal IOs. For example, if the RTBB reserve space is exhausted, the problematic die may be excluded from the disk space, and thus, the overall disk capacity may be reduced. The storage device may provide a list of LBA ranges for the performing (P) and/or inaccessible (IA) types of space. If the storage device supports IO determinism, the LBA range may represent a set. If the storage device supports ZNS, the LBA range may represent a zone. In some embodiments, the storage device may provide information about the LBA ranges, sets, zones, and/or the like, in response to a get feature command.
In a reduced performance mode (Mode 5), one or more aspects of the performance of the storage device may be reduced. For example, the storage device may perform normal operations, but at reduced throughput and/or increased latency. In some embodiments, a storage device may include one or more back-up capacitors that, in the event of a loss of the main power supply, may provide power to the storage device for a long enough period of time to enable the storage device to complete a write operation. If one or more of these back-up capacitors fail, the storage device may not notify a host that a write operation is complete until after the data is written to the media. (This may be referred to as a synchronous write operation.) This may reduce the input and/or output operations per second (IOPS) and/or increase latency, thereby reducing the performance of the storage device. Thus, in some embodiments, a reduced performance mode may operate with 100 percent underperforming (UP) space. Depending on the implementation details, some or all of the user data may remain valid. In some embodiments, the storage device may provide speculative performance information to a host, which may enable the host to make decisions on sending write data to the storage device in a manner that may mitigate the system-level impact of the fault condition.
In a read-only mode (Mode 6), the storage device may only allow read operations and may block external write operations. Depending on the implementation details, data in read-only space may remain valid, for example, after the retention period. Read-only mode may operate with 100 percent read-only (RO) space.
In a partial read-only mode (Mode 7), a first portion of the storage space (e.g., X percent) of the storage device may operate as read-only (RO) space, and a second portion (e.g., (100-X) percent) may be inaccessible (IA) space. Thus, the storage device may only allow read operations, and external write operations may be prohibited in the first portion of the storage space. Depending on the implementation details, data in the read-only space may still be valid, for example, after the retention period. The storage device may provide a list of LBA ranges for the read-only (RO) and/or inaccessible (IA) types of space. If the storage device supports IO determinism, the LBA range may represent a set. If the storage device supports ZNS, the LBA range may represent a zone. In some embodiments, the storage device may provide information about the LBA ranges, sets, zones, and/or the like, in response to a get feature command.
In a temporary read-only mode (Mode 8), data may be read from the storage space of the storage device, which may operate with 100 percent VRO space, but external writes may be prohibited. Data in this space may be temporarily valid but may become invalid after the retention period.
In a temporary partial read-only mode (Mode 9), data may be read from a first portion (e.g., X percent) of the storage space of the storage device, which may operate as VRO space, while external writes may be prohibited. A second portion (e.g., (100-X) percent) may be inaccessible (IA) space. Data in the first portion may be temporarily valid but may become invalid after the retention period. If the storage device supports IO determinism, the LBA range may represent a set. If the storage device supports ZNS, the LBA range may represent a zone. In some embodiments, the storage device may provide information about the LBA ranges, sets, zones, and/or the like, in response to a get feature command.
In a vulnerable mode (Mode 10), the storage device may not be available for I/O operations. However, it may continue to receive commands from the host and return errors.
In a normal mode (Mode 11), the storage device may operate normally.
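One way a host might summarize the space composition of the modes described above is as a lookup table. The sketch below simply restates the mode descriptions in data form, with X denoting the first portion of the storage space described for each mode; the names and structure are illustrative assumptions.

```python
# Illustrative summary of the fault resilient modes described above.
# "X" denotes the first portion of the storage space described for each mode.
FR_MODE_SPACE = {
    1:  {"P": "100%"},                   # power cycle: self-heal by power cycling
    2:  {"P": "100%"},                   # reformat: self-heal by formatting
    3:  {"P": "X%", "RO": "(100-X)%"},   # reduced capacity read-only
    4:  {"P": "X%", "IA": "(100-X)%"},   # reduced capacity
    5:  {"UP": "100%"},                  # reduced performance
    6:  {"RO": "100%"},                  # read-only
    7:  {"RO": "X%", "IA": "(100-X)%"},  # partial read-only
    8:  {"VRO": "100%"},                 # temporary read-only
    9:  {"VRO": "X%", "IA": "(100-X)%"}, # temporary partial read-only
    10: {},                              # vulnerable: not available for I/O
    11: {"P": "100%"},                   # normal operation
}
```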
Commands

In some embodiments, a storage device in accordance with example embodiments of the disclosure may implement one or more commands which may be used, for example, by a host to query the storage device and/or manage one or more features of the storage device.
A get feature command, which may include a subcommand as shown in the table illustrated in
A resiliency type subcommand (FR_INFO_RESILIENCY_TYPE) may return a type of fault resiliency in case of a failure. For example, the storage device may indicate which of the fault resilient modes illustrated in
A retention period subcommand (FR_INFO_RETENTION_PERIOD) may return an average retention period of the data without reprogramming the storage media. In some embodiments, this may be the upper-bound of retention time for data in the storage media from the time of the failure. This subcommand may be used, for example, with temporary read-only mode (Mode 8) and/or temporary partial read-only mode (Mode 9).
An earliest expiry subcommand (FR_INFO_EARLIEST_EXPIRY) may return a maximum time remaining for data integrity. In some embodiments, this may be the lower-bound of retention time for data in the storage media from the time of the failure. The unit of time may be determined, for example, based on a patrol period. This subcommand may be used, for example, with temporary read-only mode (Mode 8) and/or temporary partial read-only mode (Mode 9).
An IOPS subcommand (FR_INFO_IOPS) may return a percentage of the maximum available IOPS the storage device may be able to handle based on the fault condition. This subcommand may be used, for example, with reduced performance mode (Mode 5).
A bandwidth subcommand (FR_INFO_BW) may return a percentage of the maximum available bandwidth the storage device may be able to handle based on the fault condition. This subcommand may be used, for example, with reduced performance mode (Mode 5).
A space subcommand (FR_INFO_SPACE) may return an amount of storage space that may be available in the storage device based on the fault condition. This subcommand may be used, for example, with reduced capacity read-only mode (Mode 3) and/or reduced capacity mode (Mode 4).
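A host-side helper for issuing the get feature subcommands above might look like the following minimal sketch. Only the subcommand names come from the description above; the send_get_feature transport function and its signature are assumptions rather than a defined interface.

```python
# Hypothetical host-side helper; send_get_feature(device, subcommand) is a
# placeholder for whatever get feature transport the host actually uses.
FR_INFO_SUBCOMMANDS = [
    "FR_INFO_RESILIENCY_TYPE",   # which fault resilient mode was selected
    "FR_INFO_RETENTION_PERIOD",  # average (upper-bound) retention time from the failure
    "FR_INFO_EARLIEST_EXPIRY",   # maximum (lower-bound) time remaining for data integrity
    "FR_INFO_IOPS",              # percentage of maximum IOPS still available
    "FR_INFO_BW",                # percentage of maximum bandwidth still available
    "FR_INFO_SPACE",             # amount of storage space still available
]

def query_fault_resilience(device, send_get_feature):
    """Collect the fault resilient status of a device into a dictionary."""
    return {sub: send_get_feature(device, sub) for sub in FR_INFO_SUBCOMMANDS}
```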
A namespace capacity management command, which may include a subcommand as shown in the table illustrated in
A resize subcommand (FR_NAMESPACE_RESIZE) may cause the storage device to resize a namespace based on one or more parameters that may be included with the command. In some embodiments, this subcommand may apply to a storage device that may support two or more namespaces. In some embodiments, the namespaces may support NVMe resizing.
A zero-size namespace subcommand (FR_NAMESPACE_ZERO_SIZE) may cause the storage device to reduce the size of a rescue space to zero.
Application Programming Interface

In some embodiments, as mentioned above, a storage device in accordance with example embodiments of the disclosure may implement an API to enable a host to query the storage device and/or manage one or more features of the storage device.
A feature command (FAULT_RESILIENT_FEATURE) may return the fault resilient classes and features in each class that the storage device may support.
A status command (FAULT_RESILIENT_STATUS) may return the status of the storage device after a fault resilient recovery is performed.
A volatile blocks command (FAULT_RESILIENT_VOLATILE_BLOCKS (H)) may return a list of LBA ranges that reach the retention period within the next H hours. In some embodiments, this may be used to determine the blocks that may need to be relocated in the unsustainable read-only mode.
An invalid data blocks command (FAULT_RESILIENT_INVALID_DATA_BLOCKS) may return a list of LBA ranges that may become invalid after switching to a fault resilient mode.
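As a sketch of how a host might use the API above in the unsustainable read-only case, the following assumes hypothetical device.volatile_blocks(hours), read_lba_range, and write_elsewhere helpers standing in for FAULT_RESILIENT_VOLATILE_BLOCKS (H) and the host's own I/O paths.

```python
def relocate_expiring_data(device, hours, read_lba_range, write_elsewhere):
    """Copy data out of LBA ranges that will reach their retention period
    within the next `hours` hours, before the data expires.

    device.volatile_blocks(hours) stands in for FAULT_RESILIENT_VOLATILE_BLOCKS (H);
    read_lba_range and write_elsewhere are host-supplied placeholders."""
    for lba_range in device.volatile_blocks(hours):
        data = read_lba_range(device, lba_range)  # read while the data is still valid
        write_elsewhere(lba_range, data)          # store the data on a healthy device
```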
Additional Embodiments

The operations and/or components described with respect to the embodiment illustrated in
Any number of embodiments and/or variations on the embodiments disclosed herein may also be constructed. A storage controller such as a field programmable gate array (FPGA) or embedded processor may perform internal block checks and send asynchronous updates to the host 105 on the status of the storage device 110. Events may occur and be transmitted to the host 105 (e.g., temperature, or other parameters internal to the device). The host 105 may poll the storage devices 110 on a predetermined schedule, for example, if there is no device driver feature for providing notification. A storage controller may monitor the historical performance of the storage device 110 and use machine learning to provide predictive analytics (e.g., a likelihood of the storage device being in a given fault resilient state). Commands (e.g., NVMe commands) may be implemented and/or expanded, for example, to report the state of the storage device 110.
In some embodiments, the host may: (i) send different data types (e.g., file types such as image, video, text, or high-priority or low-priority data), based on the status of the storage device 110 (for instance, high priority data or real-time data may not be written to a device that is considered in the partially vulnerable mode); (ii) reduce the transmission rate if the storage device 110 is in a partially vulnerable state and in a lower performance state; (iii) send a reduced total amount of data if the storage device 110 is in a partially vulnerable and lower capacity state; (iv) read data at the greatest rate possible, and/or store the data elsewhere, if the storage device 110 is in a partially vulnerable unsustainable read-only mode, so as to avoid exceeding the retention period (in such a circumstance, the host may calculate the needed data rate based on the amount of data to be copied and on the retention period); (v) ignore data “read” from a vulnerable storage device 110 since it may be erroneous, and delete the data as it is received by the host 105; (vi) temporarily reroute read/write input and output to a cache in a fully resilient storage device 110 that is being power cycled and/or formatted, based on messages that control the timing of such events between the host and the storage devices 110. A storage controller on a partially vulnerable storage device that has had a capacity decrease may filter incoming data writes and only write a portion of that data to the storage device 110. In some cases, the filtering may include compression. Such a storage controller may receive various types of data (e.g., file types such as image, video, text, or high-priority or low-priority data) from a host 105 and filter based on the status of the storage device 110. For instance, the storage controller may determine that high priority data may not be written to a storage device 110 that is in the partially vulnerable mode. The storage controller may send a rejection message to the host 105 and give a reason for the rejection. Alternatively, the storage controller may filter out a certain type of data (e.g., image data) for writing to a partially resilient lower-capacity state storage device 110. For example, if a storage device 110 loses performance (e.g., operates at a reduced write rate), latency-sensitive reads and writes may be rejected.
Fault Resilient System with Fault Resilient Storage Devices

In some embodiments, a RAID-0 system including an array of storage devices 110 and a volume manager 115 may be constructed to accommodate a transition of any of the fault resilient storage devices 110 of the RAID-0 system to a read-only mode. In normal operation, the volume manager 115 may be responsible for striping data across the array of storage devices 110, e.g., writing one strip of each stripe to a respective storage device 110 of the array of storage devices 110. In such a system, when any of the array of storage devices 110 transitions to a read-only mode (indicated as 110A), the RAID-0 system may transition to a second operating mode (which may also be referred to as an emergency mode), and the volume manager 115 for the array of storage devices 110 may (i) allocate a rescue space on each of the remaining, unaffected storage devices 110B (e.g., those that remain in a read-write state) for metadata and rescued user data from the faulty storage device 110A, and/or (ii) create and/or maintain a mapping table (which may also be referred to as an emergency mapping table). Rescue space may be pre-allocated statically prior to system operation, dynamically during operation, or in any combination thereof.
The rescue space (which may be indicated as R) on each unaffected storage device 110B may be capable of storing n strips, where n=R/(strip size), R=C/M, C may be the capacity of each of the storage devices of the array of storage devices 110, and M may be the total number of storage devices. In some embodiments, the volume manager 115 may be implemented as an independent component, or may be partially or fully integrated into the host, a RAID controller of the RAID-0 system (which may, for example, be housed in a separate enclosure from the host), or in any other configuration. In some embodiments, the volume manager 115 may be implemented, for example, with an FPGA. The RAID-0 system may be self-contained and may virtualize the array of storage devices 110 so that from the perspective of the host the RAID-0 system may appear as a single storage device. In some embodiments, the volume manager may be implemented as a processing circuit (discussed in further detail below) configured (e.g., by suitable software or firmware) to perform the operations described herein as being performed by the volume manager.
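The relationships above (R = C/M and n = R/(strip size)) can be restated as a short worked sketch; the capacities and strip size in the example comment are illustrative assumptions.

```python
def rescue_space_capacity(device_capacity, num_devices, strip_size):
    """Rescue space per device, R = C / M, and the number of strips it can
    hold, n = R / (strip size), per the relationships described above."""
    rescue_space = device_capacity // num_devices     # R = C / M
    strips_per_rescue = rescue_space // strip_size    # n = R / (strip size)
    return rescue_space, strips_per_rescue

# Example with assumed values: four 1 TB devices and 128 KiB strips give
# rescue_space_capacity(1_000_000_000_000, 4, 128 * 1024)
# -> (250_000_000_000, 1_907_348), i.e., 250 GB of rescue space per device.
```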
When the RAID-0 system is operating in an emergency mode, and a write command is received from the host 105 requesting that data be written to a stripe of the array of storage devices 110, the volume manager 115 may check the emergency mapping table to determine whether the stripe is registered, e.g., whether an entry has already been made for the stripe. If no entry has been made yet (e.g., the stripe is not registered, which may also be referred to as open-mapped), the volume manager 115 may create an entry in the emergency mapping table to indicate where a strip, that ordinarily would have been written to the faulty storage device 110A (the storage device that has transitioned to read-only mode), is to be written. If the emergency mapping table already contains an entry for the stripe, then the entry may be used to determine where to write the strip that ordinarily would have been written to the faulty storage device 110A. In either case, the volume manager 115 may then write each strip, as illustrated in
When a read command is received from the host 105 requesting that data of a stripe be read from the array of storage devices 110, the volume manager 115 may check the emergency mapping table to determine whether an entry has been made for the stripe. If no entry has been made, then, as illustrated in
The remapping of strips that ordinarily would have been written to the faulty storage device 110A may be accomplished, for example, as follows. Each storage device 110 of the array of storage devices 110 may have a drive identification number (or "drive ID"), which may be a number between zero and M−1, where M may be the number of storage devices 110 in the array of storage devices 110. The volume manager 115 may reassign the drive identification numbers, e.g., assign to each unaffected storage device 110B of the array of storage devices 110 an alternate drive identification number to be used for performing read or write operations for registered stripes (read operations for unregistered stripes may continue to use the original drive identification numbers). The following formula (Formula A) may be used to generate the alternate drive identification numbers:

New Drive ID = Old Drive ID, if Old Drive ID < Faulty Drive ID; New Drive ID = Old Drive ID − 1, if Old Drive ID > Faulty Drive ID
The effect of Formula A may be (i) to assign, to each storage device having an identification number less than the original drive identification number of the faulty storage device, the respective original drive identification number, and/or (ii) to assign, to each storage device having an identification number greater than the original drive identification number of the faulty storage device, the respective original drive identification number minus one.
Using the alternate drive numbers, a target drive, to which a strip that ordinarily would have been written to the faulty storage device 110A may be written, may be identified (e.g., on a per stripe basis) using the formula Target Drive ID=sid % (M−1) where Target Drive ID may be the alternate drive identification number of the target drive, sid may be the stripe identifier of the strip that ordinarily may have been written to the faulty storage device 110, and “%” may be the modulo (mod) operator.
The target drive ID (e.g., for a read or write operation) may be implicitly determined by the equation Target Drive ID = Stripe ID % (M−1). For example, if M=4 and Stripe 1 is written, Stripe ID=1, and thus, Target Drive ID = 1 % 3 = 1. That is, the target drive may be the storage device 110B with alternate (New) drive identification number 1 (i.e., previous Drive 2). Within the storage device, the rescue space may be split into strips (which may be referred to as rescue strips, or R-Strips), the size of which may be the same as the strip size. In some embodiments, the emergency mapping table may contain an entry for each strip having the format (Stripe ID, R-Strip ID) in which the first element may be the Stripe ID, and the second element may be the R-strip ID on the target drive. For example, an entry of (1,0) in the emergency mapping table may indicate that Strip (1,1) is mapped to R-Strip (1,0) as shown in
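The reassignment described for Formula A and the target selection Target Drive ID = sid % (M − 1) might be expressed as the following minimal sketch; the function names are illustrative, and the example mirrors the M = 4, Stripe 1 case above (assuming the faulty device is original Drive 1).

```python
def alternate_drive_id(original_id, faulty_id):
    """Effect of Formula A for an unaffected device: IDs below the faulty
    device's ID are unchanged; IDs above it are reduced by one."""
    return original_id if original_id < faulty_id else original_id - 1

def target_drive_for_strip(stripe_id, num_devices):
    """Alternate drive ID that receives the strip which would have gone to
    the faulty device: Target Drive ID = sid % (M - 1)."""
    return stripe_id % (num_devices - 1)

# Example: M = 4, faulty original Drive 1 (assumed), Stripe 1:
# alternate_drive_id(2, 1) -> 1 and target_drive_for_strip(1, 4) -> 1 % 3 = 1,
# so the strip would be written to the rescue space of previous Drive 2.
```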
In some embodiments, a RAID-0 system may be constructed to accommodate the failure of multiple (e.g., N) fault resilient storage devices 110. An example embodiment of such a system may in some ways be constructed and operate in a manner similar to the embodiment described above with respect to
In a system that may accommodate N fault resilient storage device failures, M′ may represent the number of unaffected (e.g., live) storage devices such that M′<=M. In some embodiments, the drive IDs of the unaffected storage devices 110B may be reassigned according to the following formula (Formula B):
Using the alternate drive numbers, a target storage device for a write operation may be implicitly identified (e.g., on a per stripe basis) using the formula Target Drive ID=sid % (M′−1) where Target Drive ID may be the alternate (new) drive identification number of the target storage device, and sid may be the stripe identifier of the strip that ordinarily may have been written to the faulty storage device 110A, and which may now be written to the target storage device having the Target Drive ID.
Also, using the formula Target Drive ID = sid % (M′−1), if Stripe 1 is written, Stripe ID=1, and thus, Target Drive ID = 1 % 2 = 1. That is, the target drive may be the storage device 110B with alternate (New) drive identification number 1 (i.e., previous Drive 2).
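Since Formula B itself is not reproduced above, the following sketch assumes the natural generalization of the single-failure case: each surviving drive's alternate ID is its original ID minus the number of faulty drives with smaller original IDs, with targets chosen by sid % (M′ − 1). This is an assumption for illustration, not a statement of the actual formula.

```python
def alternate_drive_ids(original_ids, faulty_ids):
    """Assumed generalization for multiple failures: subtract, from each
    surviving drive's original ID, the count of faulty drives with smaller IDs."""
    faulty = set(faulty_ids)
    return {
        d: d - sum(1 for f in faulty if f < d)
        for d in original_ids
        if d not in faulty
    }

def target_drive_for_strip_multi(stripe_id, num_live_devices):
    """Target Drive ID = sid % (M' - 1), per the formula above."""
    return stripe_id % (num_live_devices - 1)

# Example consistent with the text: alternate_drive_ids(range(4), [1]) -> {0: 0, 2: 1, 3: 2};
# target_drive_for_strip_multi(1, 3) -> 1 % 2 = 1, i.e., previous Drive 2.
```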
In some embodiments, when a first faulty storage device 110A transitions to a read-only mode, the RAID-0 system may transition to an emergency mode in which the volume manager 115 may (i) allocate a rescue space on each of the remaining, unaffected storage devices 110B (if adequate rescue space has not been allocated already, or if insufficient space has been allocated) for metadata and rescued user data from a faulty storage device 110A, and/or (ii) create and/or maintain a first mapping table for the first faulty storage device 110A. The RAID-0 system may then operate in a manner similar to the single device failure embodiment described above.
In some embodiments, if a second faulty storage device 110A transitions to a read-only mode, the RAID-0 system may once again allocate a rescue space on each of the remaining, unaffected storage devices 110B (if adequate rescue space has not been allocated already, or if insufficient space has been allocated) for metadata and rescued user data from a faulty storage device 110A. In some embodiments, the RAID-0 system may then create and/or maintain a second mapping table for the second faulty storage device 110A. Each of the mapping tables may be designated as the Lth mapping table, where L=1 . . . M′, and the Lth mapping table corresponds to the Lth faulty storage device. In other embodiments, a RAID-0 system may create and/or modify a single mapping table to map data stripes and/or strips of all of the faulty storage devices 110A to the unaffected storage devices 110B. In some embodiments, one or more mapping tables may be stored in a reserved rescue space, for example, before a Disk Data Format (DDF) structure for a RAID configuration.
The RAID-0 system may then reassign drive IDs of the unaffected storage devices 110B, for example, based on Formula B, and proceed to operate with the two faulty storage devices 110A operating in read-only mode.
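As an illustrative sketch only (Formula B itself is not reproduced here, and the class and method names are hypothetical), the per-failure bookkeeping described above, with one mapping table per faulty storage device and alternate IDs assigned over the surviving drives, might be expressed as:

class EmergencyState:
    # Illustrative bookkeeping for an FR-RAID-0 array in which multiple storage
    # devices have transitioned to read-only mode.

    def __init__(self, total_drives):
        self.total_drives = total_drives
        self.faulty_drives = []    # original IDs of drives now in read-only mode
        self.mapping_tables = []   # mapping_tables[L] serves the (L+1)th faulty drive
        self.alternate_ids = {}    # original drive ID -> alternate (new) drive ID

    def on_drive_read_only(self, drive_id):
        # Record the failure, create the next emergency mapping table, and
        # renumber the surviving drives in order (consistent, for a single
        # failure, with the effect of Formula A described above).
        self.faulty_drives.append(drive_id)
        self.mapping_tables.append({})   # Stripe ID -> rescue location
        live = [d for d in range(self.total_drives) if d not in self.faulty_drives]
        self.alternate_ids = {d: i for i, d in enumerate(live)}

state = EmergencyState(total_drives=4)
state.on_drive_read_only(1)   # first faulty drive: first mapping table created
state.on_drive_read_only(3)   # second faulty drive: second mapping table created
print(state.alternate_ids)    # {0: 0, 2: 1}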
When a read command is received from the host, the volume manager 115 may check the one or more emergency mapping tables to determine whether an entry has been made for the stripe to be read. If no entry has been made, then the volume manager 115 may read the stripe as it would have, in ordinary operation, reading a strip from each of the storage devices 110, including the two faulty storage devices 110A. If the one or more emergency mapping tables contain an entry for the stripe, then the entry may be used to determine where to read the strip that ordinarily would have been read from one or both of the faulty storage devices 110A.
When a write command is received from the host, the volume manager 115 may check the one or more emergency mapping tables to determine whether an entry has been made for the stripe. If no entry has been made yet (e.g., the stripe is not registered) the volume manager 115 may create an entry in the one or more emergency mapping tables to indicate where the strips that ordinarily would have been written to the faulty storage devices 110A (the storage devices that have transitioned to read-only mode), are to be written. If the one or more emergency mapping tables already contain an entry for the stripe, then the entry may be used to determine where to write the strips that ordinarily would have been written to the faulty storage devices 110A. In either case, the volume manager 115 may then write the strips to the array of storage devices 110, writing the strips that ordinarily would have been written to the faulty (e.g., read-only) storage devices 110A to rescue space in the other storage devices 110B.
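A minimal sketch of the read and write checks described in the two preceding paragraphs is shown below; the function names and the placeholder rescue-space locations are assumptions for illustration and do not represent a specific implementation:

def resolve_read(stripe_id, emergency_map):
    # Returns where the strip that would ordinarily come from a faulty (read-only)
    # drive should be read from: rescue space if the stripe is registered,
    # otherwise the faulty drive itself, as in ordinary operation.
    if stripe_id in emergency_map:
        return ("rescue", emergency_map[stripe_id])
    return ("ordinary", stripe_id)

def resolve_write(stripe_id, emergency_map, allocate_rescue):
    # Registers the stripe on its first write; in either case returns the
    # rescue-space location where the redirected strip(s) are to be written.
    if stripe_id not in emergency_map:
        emergency_map[stripe_id] = allocate_rescue(stripe_id)
    return emergency_map[stripe_id]

table = {}
loc = resolve_write(1, table, allocate_rescue=lambda sid: ("R-Strip", 0))
print(loc)                     # ('R-Strip', 0)
print(resolve_read(1, table))  # ('rescue', ('R-Strip', 0))
print(resolve_read(2, table))  # ('ordinary', 2) -- unregistered stripe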
Rescue Space Management with Data Block Write
Within each storage device 110, some or all of the rescue space may be split into rescue blocks (which may be referred to as R-blocks). The size of R-blocks may be set, for example, to the same size as a data block size used generally by the storage device.
In some embodiments, the volume manager 115 may maintain an emergency mapping table in which each entry may simply be a stripe ID to indicate that the stripe has been mapped to the rescue space in the unaffected storage devices 110B. For example, in the embodiment illustrated in
In some embodiments, the portion of the strip from the faulty storage device that may be stored in the rescue space of each storage device (which may be referred to as a chunk) may be equal to (strip size/block size)/(M−1) blocks. To accommodate a strip size and block size that may not be evenly divided into the number of unaffected storage devices 110B, the chunk stored in the rescue space of a target storage device 110B that satisfies the formula Target Drive ID<(strip size/block size) mod (M−1) may include an extra block. Thus, for the example illustrated in
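The chunk-size calculation described above, including the extra block for low-numbered target drives, may be illustrated with the following sketch; the 32K strip and 4K block sizes are illustrative values only:

def chunk_sizes(strip_size, block_size, m):
    # Split one strip of (strip_size / block_size) blocks across the (m - 1)
    # unaffected drives; drives whose Target Drive ID is less than the remainder
    # receive one extra block.
    blocks_per_strip = strip_size // block_size
    base, extra = divmod(blocks_per_strip, m - 1)
    return [base + (1 if drive_id < extra else 0) for drive_id in range(m - 1)]

# Example: 32K strip, 4K blocks, M = 4 -> 8 blocks split as chunks of [3, 3, 2]
print(chunk_sizes(strip_size=32 * 1024, block_size=4 * 1024, m=4))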
In the embodiment illustrated in
In the embodiment illustrated in
In some embodiments, the size of rescue space R in each storage device 110 may be set, for example, to R=C/M, where C may be the capacity of each of the storage devices of the array of storage devices 110, and M may be the total number of storage devices. The rescue space R in each storage device 110 may be capable of storing n blocks, where n=R/(block size).
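As a small worked example of the sizing above (the drive capacity and block size used here are illustrative assumptions):

def rescue_blocks_per_drive(capacity, total_drives, block_size):
    # Rescue space per drive R = C / M, expressed as n = R / (block size) R-blocks.
    rescue_space = capacity // total_drives
    return rescue_space // block_size

# Example: 1 TB drives, M = 4 storage devices, 4K blocks
print(rescue_blocks_per_drive(capacity=10**12, total_drives=4, block_size=4096))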
When a read command is received from the host, the volume manager 115 may check the emergency mapping table to determine whether an entry has been made for the stripe of the strip to be read. If no entry has been made, then the volume manager 115 may read the stripe as it would have in ordinary operation, reading a strip from each of the storage devices 110, including the faulty storage device 110A. If the emergency mapping table contains an entry for the stripe, the chunks of the strip corresponding to the faulty storage device 110A (in this example, Drive 1) may be read from the rescue space of the unaffected storage devices 110B (in this example, the storage devices with new Drive IDs 0, 1, and 2) and reassembled into Strip (1,1).
When a write command is received from the host, the volume manager 115 may check the emergency mapping table to determine whether an entry has been made for the stripe of the strip to be written. If no entry has been made yet (e.g., the stripe is not registered) the volume manager 115 may create an entry in the emergency mapping table to indicate that chunks of the strip that ordinarily would have been written to the faulty storage device 110A (the storage device that has transitioned to read-only mode), are to be written to the unaffected storage devices 110B. If the emergency mapping table already contains an entry for the stripe, then the entry may be used to determine that chunks of the strip that ordinarily would have been written to the faulty storage device 110A (the storage device that has transitioned to read-only mode), are to be written to the unaffected storage devices 110B. In either case, the volume manager 115 may then write the chunks of the strip originally intended for Drive 1, to the rescue spaces of the unaffected storage devices 110B as illustrated in
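To make the chunk-level read path above concrete, a small sketch of reassembling a redirected strip from the chunks read out of the unaffected drives' rescue spaces might look as follows; the chunk contents and sizes are placeholders consistent with the illustrative 32K strip example:

def reassemble_strip(chunks):
    # Concatenate the chunks read from the unaffected drives, in alternate
    # drive ID order, back into the original strip.
    return b"".join(chunks)

# Three chunks of 3, 3, and 2 blocks (4K each) rebuild one 32K strip.
chunks = [bytes(3 * 4096), bytes(3 * 4096), bytes(2 * 4096)]
print(len(reassemble_strip(chunks)))   # 32768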
In some embodiments, a fault resilient (FR) storage system such as an FR-RAID-0 system may implement one or more quality-of-service (QoS) management features in accordance with example embodiments of the disclosure. For example, a user and/or volume manager may adjust the size of strips in a RAID striping configuration, and/or the writing technique used to write data to a rescue space on one or more storage devices in the RAID configuration, to provide a specific QoS level.
The QoS manager 802 may include QoS logic 808 that may receive, utilize, control, configure, direct, notify, and/or the like, any number of parameters relating to QoS such as the number of storage devices 811A in the storage array 804, the number of data blocks in a strip 811B, one or more write methods 811C used in a rescue space, the number of faulty storage devices 811D that may be accommodated by the storage array 804, the capacity or capacities 811E of storage devices used in the storage array 804, and/or the like.
For example, in some embodiments, a QoS metric such as performance may be influenced by the parameters 811A-811E in any number of the following manners. Increasing the number of storage devices 811A in the storage array 804 may increase performance, for example, in terms of storage capacity, latency, throughput, and/or the like. The number of data blocks 811B in a strip may be tuned based on the type of anticipated storage transactions. For example, using larger data blocks may provide greater throughput with larger, less frequent transactions, whereas smaller data blocks may provide a greater number of input and/or output operations per second (IOPS) with smaller, more frequent transactions. The write method 811C may also be tuned, for example, because writing data blocks to rescue spaces on multiple storage devices may take less time than writing a strip to the rescue space of a single storage device. Increasing the number of faulty storage devices 811D that may be accommodated by the storage array 804 may reduce performance, for example, because accommodating more faulty devices may involve allocating a greater percentage of storage device capacity to rescue space.
The QoS manager 802 may operate automatically, manually, or in any combination thereof. The QoS manager 802 may operate automatically, for example, in response to monitoring one or more parameters 812 from the storage array 804. The QoS manager 802 may operate manually, for example, in response to one or more parameters 814 received through a user interface 816. Additionally, the QoS manager 802 may provide one or more outputs 818 through the user interface 816 that may instruct a user to take one or more specific actions, for example, to add and/or remove one or more storage devices 810.
In some embodiments, given system requirements from a user, the QoS manager 802 may determine one or more parameters based on storage performance information. For example, a user may specify that the storage array 804 may operate as an FR-RAID-0 configuration that may accommodate one storage device failure with 500K IOPS for 32K blocks and a total storage capacity of 8 TB. Based on these inputs, the QoS manager 802 may determine the following parameters to arrive at a number of storage devices that may be used to provide the specified performance:
Storage device capacity: 1 TB;
4K write IOPS per storage device: 400K;
32K write IOPS per storage device: 200K; and
RAID strip size: 32K.
Solving for capacity: (1−1/M)*2(M−1)>=8, which gives M^2−6M+1>0, (M−3)^2>8, and thus, M=6.
Solving for performance: 200K*(M−1)/2>=500K, and thus, M=6.
Therefore, six storage devices may be used to provide the specified performance.
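The calculation above may be reproduced with a short sketch that searches for the smallest M satisfying both the capacity inequality (1−1/M)*2(M−1)>=8 and the performance inequality 200K*(M−1)/2>=500K; the search loop and parameter names are illustrative only:

def minimum_drive_count(required_tb=8, required_iops=500_000,
                        drive_tb=1, drive_32k_write_iops=200_000):
    # Smallest M meeting both the capacity and performance constraints from the
    # worked example above.
    m = 2
    while True:
        capacity_ok = (1 - 1 / m) * 2 * (m - 1) * drive_tb >= required_tb
        performance_ok = drive_32k_write_iops * (m - 1) / 2 >= required_iops
        if capacity_ok and performance_ok:
            return m
        m += 1

print(minimum_drive_count())   # 6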
In some embodiments, the QoS manager 802 and/or QoS logic 808 may be implemented with hardware, software, or any combination thereof, including combinational logic, sequential logic, one or more timers, counters, registers, state machines, CPLDs, FPGAs, ASICs, CISC processors and/or RISC processors, and/or the like executing instructions stored in volatile memories such as DRAM and/or SRAM, nonvolatile memory such as flash memory and/or the like, as well as GPUs, NPUs, and/or the like. The QoS manager 802 and/or QoS logic 808 may be implemented as one or more separate components, integrated with one or more other components such as the volume manager 815, a host, and/or any combination thereof.
The operations and/or components described with respect to the embodiments illustrated in
The embodiments described above have been described in the context of various implementation details, but the principles of this disclosure are not limited to these or any other specific details. For example, some storage arrays have been described in the context of systems in which the capacity and/or size of storage devices and/or rescue spaces may be the same for each storage device, but different capacities and/or sizes of storage devices and/or rescue spaces may be used. As another example, some embodiments have been described in the context of RAID systems such as RAID-0, but the principles may also be applied to any other type of storage array.
As another example, some functionality has been described as being implemented by certain components, but in other embodiments, the functionality may be distributed between different systems and components in different locations and having various user interfaces. Certain embodiments have been described as having specific processes, operations, etc., but these terms also encompass embodiments in which a specific process, step, etc. may be implemented with multiple processes, operations, etc., or in which multiple processes, operations, etc. may be integrated into a single process, step, etc. A reference to a component or element may refer to only a portion of the component or element. For example, a reference to an integrated circuit may refer to all or only a portion of the integrated circuit, and a reference to a block may refer to the entire block or one or more subblocks. The use of terms such as “first” and “second” in this disclosure and the claims may only be for purposes of distinguishing the things they modify and may not indicate any spatial or temporal order unless apparent otherwise from context. In some embodiments, “based on” may refer to “based at least in part on.” In some embodiments, “disabled” may refer to “disabled at least in part.” A reference to a first element may not imply the existence of a second element. Various organizational aids such as section headings and the like may be provided as a convenience, but the subject matter arranged according to these aids and the principles of this disclosure are not defined or limited by these organizational aids.
The various details and embodiments described above may be combined to produce additional embodiments according to the inventive principles of this patent disclosure. Since the inventive principles of this patent disclosure may be modified in arrangement and detail without departing from the inventive concepts, such changes and modifications are considered to fall within the scope of the following claims.
Claims
1. A method of operating a storage array, the method comprising:
- determining a first fault resilient operating mode of a first fault resilient storage device of the storage array;
- determining a second fault resilient operating mode of a second fault resilient storage device of the storage array;
- allocating one or more rescue spaces of one or more additional fault resilient storage devices of the storage array;
- mapping user data from the first fault resilient storage device to the one or more rescue spaces; and
- mapping user data from the second fault resilient storage device to the one or more rescue spaces.
2. The method of claim 1, further comprising reassigning at least one device identifier (ID) of the one or more additional fault resilient storage devices to a device ID of the first fault resilient storage device.
3. The method of claim 2, wherein the at least one device ID of the one or more additional fault resilient storage devices is reassigned based on a current unaffected device ID and a current faulty device ID.
4. The method of claim 1, further comprising redirecting one or more inputs and/or outputs (IOs) from the first fault resilient storage device to the one or more additional fault resilient storage devices.
5. The method of claim 1, wherein the user data comprises a strip of data.
6. The method of claim 5, wherein the strip of data is redirected to a target storage device of the one or more additional fault resilient storage devices based on a stripe ID of the user data.
7. The method of claim 1, wherein mapping user data from the first fault resilient storage device to the one or more rescue spaces comprises maintaining a first mapping table.
8. The method of claim 7, wherein mapping user data from the second fault resilient storage device to the one or more rescue spaces comprises maintaining a second mapping table.
9. The method of claim 1, wherein the one or more rescue spaces have a rescue space percentage ratio of a storage device capacity.
10. The method of claim 9, wherein the rescue space percentage ratio is greater than or equal to a number of failed storage devices accommodated by the storage array, divided by the total number of storage devices in the storage array.
11. The method of claim 1, wherein the one or more rescue spaces are allocated statically.
12. The method of claim 1, wherein the one or more rescue spaces are allocated dynamically.
13. A system comprising a storage array comprising:
- a first fault resilient storage device;
- a second fault resilient storage device;
- one or more additional fault resilient storage devices; and
- a volume manager configured to: determine a first fault resilient operating mode of the first fault resilient storage device; determine a second fault resilient operating mode of the second fault resilient storage device; allocate one or more rescue spaces of one or more additional fault resilient storage devices of the storage array; map user data from the first fault resilient storage device to the one or more rescue spaces; and map user data from the second fault resilient storage device to the one or more rescue spaces.
14. The system of claim 13, wherein the volume manager is further configured to reassign at least one device identifier (ID) of the one or more additional fault resilient storage devices to a device ID of the first fault resilient storage device.
15. The system of claim 13, wherein the volume manager is further configured to redirect one or more inputs and/or outputs (IOs) from the first fault resilient storage device to the one or more additional fault resilient storage devices.
16. The system of claim 13, wherein:
- the user data comprises a strip of data; and
- the volume manager is further configured to redirect the strip of data to a target storage device of the one or more additional fault resilient storage devices based on a stripe ID of the user data.
17. The system of claim 13, wherein:
- the one or more rescue spaces have a rescue space percentage ratio of a storage device capacity; and
- the rescue space percentage ratio is based on a number of failed storage devices accommodated by the storage array, divided by a total number of storage devices in the storage array.
18. An apparatus comprising a volume manager for a storage array, the volume manager comprising logic configured to:
- determine a first fault resilient operating mode of a first fault resilient storage device of the storage array;
- determine a second fault resilient operating mode of a second fault resilient storage device of the storage array;
- allocate one or more rescue spaces of one or more additional fault resilient storage devices of the storage array;
- map user data from the first fault resilient storage device to the one or more rescue spaces; and
- map user data from the second fault resilient storage device to the one or more rescue spaces.
19. The apparatus of claim 18, wherein:
- the user data comprises a strip of data; and
- the strip of data is redirected to a target storage device of the one or more additional fault resilient storage devices based on a stripe identifier (ID) of the user data.
20. The apparatus of claim 18, wherein:
- the one or more rescue spaces have a rescue space percentage ratio of a storage device capacity; and
- the rescue space percentage ratio is based on a number of failed storage devices accommodated by the storage array, divided by a total number of storage devices in the storage array.
Type: Application
Filed: May 10, 2022
Publication Date: Aug 25, 2022
Inventors: Yang Seok KI (Palo Alto, CA), Sungwook RYU (Palo Alto, CA), Alain TRAN (Hwasung-si), Changho CHOI (San Jose, CA)
Application Number: 17/741,440