ALLOCATION OF OVERPROVISIONED BLOCKS FOR MINIMIZING WRITE AMPLIFICATION IN SOLID STATE DRIVES
Systems and methods for allocation of overprovisioned blocks for minimizing write amplification in solid state drives are disclosed. An example system comprises: a plurality of memory devices and a controller operatively coupled to the memory devices, the controller configured to: determine a first value of a data stream attribute associated with a first data stream and a second value of the data stream attribute associated with a second data stream; determine, based on the first value and the second value, a first overprovisioning factor associated with the first data stream and a second overprovisioning factor associated with the second data stream; and allocate, based on the first overprovisioning factor and the second overprovisioning factor, a first plurality of overprovisioned blocks to the first data stream and a second plurality of overprovisioned blocks to the second data stream.
The present disclosure generally relates to storage systems, and more specifically, relates to allocation of overprovisioned blocks for minimizing write amplification in solid state drives.
BACKGROUND
A storage device, such as a solid-state drive (SSD), may include one or more volatile and non-volatile memory devices. The SSD may further include a controller that may manage allocation of data on the memory devices and provide an interface between the memory devices and the host computer system.
The present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various implementations of the disclosure.
Aspects of the present disclosure are directed to methods of allocation of overprovisioned blocks for minimizing write amplification in solid state drives.
An example storage system may be represented by a solid state drive (SSD) including multiple memory devices having various storage media types, including negative-and (NAND) flash memory (utilizing single-level cell (SLC), triple-level cell (TLC), and/or quad-level cell (QLC) blocks). A NAND flash memory device may comprise multiple memory blocks, such that each block includes a fixed number of memory pages (e.g., 64 pages of 4 kilobytes). “Overprovisioning” herein shall refer to allocating a number of physical memory blocks which exceeds the logical capacity presented as available memory to the host.
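By way of a purely illustrative sketch, the relationship between the block geometry and the overprovisioned capacity described above may be expressed as follows; all values are hypothetical and are used only for scale:

```python
# Hypothetical numeric sketch of the block geometry and overprovisioning
# concept described above; every value is an illustrative assumption.

PAGE_SIZE_BYTES = 4 * 1024            # 4-kilobyte pages
PAGES_PER_BLOCK = 64                  # fixed number of pages per block

logical_blocks_advertised = 1000      # capacity presented to the host
physical_blocks_present = 1250        # blocks actually present on the media

overprovisioned_blocks = physical_blocks_present - logical_blocks_advertised
block_size_bytes = PAGE_SIZE_BYTES * PAGES_PER_BLOCK
print(overprovisioned_blocks, block_size_bytes)   # 250 262144
```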
A page is an elementary unit for read or write operations. However, a page should be erased before it can be written to, and a memory erase operation may only be performed on a whole block, even if a single page of data needs to be erased. This difference in the granularity of write and erase operations leads to the phenomenon referred to as “write amplification,” which manifests itself by the amount of physical data to be written to the storage media being a multiple of the logical amount of data manipulated by the host.
For improving the memory device performance, the SSD controller may implement the relocate-on-write strategy, according to which a new data item replacing or modifying an existing data item is written to a new physical location, thus invalidating the old physical location holding the data being overwritten. However, while improving the memory device performance, the relocate-on-write strategy necessitates reclaiming the invalidated memory locations, which is also referred to as “garbage collection.” The garbage collection may result in additional read and write operations (as valid memory pages from a block being reclaimed would need to be moved to another block), thus contributing to the write amplification. Therefore, the garbage collection strategy implemented by the device controller may be directed to minimizing the write amplification. In certain implementations, the garbage collection strategy may be further directed at optimizing the wear leveling, by yielding a uniform distribution of programming and erasing cycles across the storage media.
In certain implementations, the garbage collection process may select a block having the minimum, among all blocks, number of valid pages. Valid memory pages of the selected “victim” block may be relocated to another block, and the whole victim block may be erased. However, selecting the block having the minimum number of valid pages may not always yield the optimal efficiency (e.g., measured by the write amplification), since the data invalidation rate may differ significantly among blocks allocated to different data streams.
“Data stream” herein shall refer to a group of data items sharing one or more data attributes, including attributes that reflect expected or actual media usage patterns. Examples of such attributes include the data retention time or the data invalidation rate (also referred to as the “stream temperature,” such that a “cold” stream mostly includes data items having a relatively large retention time (and, consequently, a relatively low data invalidation rate), while a “hot” stream mostly includes data items having a relatively small retention time (and, consequently, a relatively high data invalidation rate)).
In an illustrative example, the data invalidation rate of data items of a cold data stream may be significantly lower than the data invalidation rate of data items of a hot data stream. In a hypothetical example, if each of the two data streams has a block with the lowest count of valid pages among all blocks, the garbage collection process may base the decision of which of the two blocks to choose on the respective data stream temperatures. As the data invalidation rate is higher for the hot data stream, the likelihood of a block of the hot data stream becoming free (due to eventual invalidation of the remaining few valid pages) within a certain period of time is higher than the likelihood of a block of the cold data stream becoming free within the same period of time. Accordingly, an efficient garbage collection strategy choosing between two blocks having the same or similar number of valid pages would choose the victim block from the cold data stream.
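The tiebreak described above may be sketched as follows; this is an illustrative simplification (greedy selection with a stream-temperature tiebreak) over hypothetical block records, rather than the controller's actual garbage collection logic:

```python
# Illustrative greedy victim selection with a stream-temperature tiebreak:
# among blocks with equally few valid pages, prefer the block of the colder
# stream, since a nearly empty hot block is likely to free itself soon.
# The block records and their fields are hypothetical.

def select_victim_block(blocks):
    """blocks: dicts with 'valid_pages' and 'invalidation_rate' (stream temperature)."""
    return min(blocks, key=lambda b: (b["valid_pages"], b["invalidation_rate"]))

blocks = [
    {"id": 7,  "valid_pages": 3, "invalidation_rate": 0.9},   # hot stream
    {"id": 12, "valid_pages": 3, "invalidation_rate": 0.1},   # cold stream
]
print(select_victim_block(blocks)["id"])   # 12: the cold stream's block is chosen
```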
The garbage collection process may be initiated responsive to determining that the number of available physical blocks (including the overprovisioned blocks) allocated to a particular data stream falls below a pre-defined threshold. Therefore, increasing the number of overprovisioned blocks effectively increases the number of invalid pages which may exist before the garbage collection needs to be performed. Accordingly, a uniform distribution of valid page counts (also referred to as valid translation unit (TU) counts, or VTCs) across all media blocks may be achieved by allocating overprovisioned blocks on a per-stream basis, thus yielding consistent write amplification by all currently active data streams. In an illustrative example, the storage device controller may determine overprovisioning factors associated with each of two or more data streams that would minimize the differences between the expected write amplification factors associated with the data streams, and may allocate overprovisioned blocks to the data streams in accordance with calculated overprovisioning factors. “Overprovisioning factor” herein shall refer to the ratio of the number of physical memory blocks to the number of logical memory blocks presented as available memory to the host.
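The per-stream trigger condition described above may be sketched as follows; the threshold value and the per-stream bookkeeping record are assumptions made only for illustration:

```python
# Illustrative per-stream garbage-collection trigger: collection for a stream
# starts when its count of available physical blocks (including the
# overprovisioned blocks) falls below a pre-defined threshold (hypothetical here).

GC_FREE_BLOCK_THRESHOLD = 8

def needs_garbage_collection(stream_state):
    """stream_state: dict with a 'free_blocks' count of available physical blocks."""
    return stream_state["free_blocks"] < GC_FREE_BLOCK_THRESHOLD

print(needs_garbage_collection({"free_blocks": 5}))    # True
print(needs_garbage_collection({"free_blocks": 20}))   # False
```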
Thus, aspects of the present disclosure represent significant improvements over various common implementations of storage systems, by allocating overprovisioned blocks to data streams based on data attributes (such as the expected retention time or data invalidation rate), in order to optimize performance, endurance, and/or other operational aspects of the storage media. Various aspects of the above referenced methods and systems are described in detail herein below by way of examples, rather than by way of limitation.
As shown in
The controller 111 may communicate with the memory devices 112A-112N to perform operations such as reading data, writing data, or erasing data at the memory devices 112A-112N and other such operations. Furthermore, the controller 111 may include hardware such as one or more integrated circuits and/or discrete components, a processing device, a buffer memory, software such as firmware or other instructions, or a combination thereof. In general, the controller 111 may receive commands or operations from the host system 120 and may convert the commands or operations into instructions or appropriate commands to achieve the desired access to the memory devices 112A-112N. The controller 111 may further implement wear leveling, garbage collection, error detection and error-correcting code (ECC), encryption, caching, and address translations between logical block addresses (LBAs) and physical block addresses (PBAs). The controller 111 may further include host interface circuitry to communicate with the host system 120 via the physical host interface. The host interface circuitry may convert the commands received from the host system into command instructions to access the memory devices 112A-112N as well as convert responses associated with the memory devices 112A-112N into information for the host system 120. In certain implementations, the controller 111 may be responsible for moving the data that is stored on the volatile memory devices to non-volatile memory devices (e.g., responsive to detecting a power failure or other pre-defined event), in order to provide persistent storage of all data written to the storage system 110. Responsive to detecting a symmetric pre-defined event (e.g., the storage system power-up), the controller 111 may move the data back to the volatile memory devices.
In order to implement the systems and methods of the present disclosure, the controller 111 may include a block allocation functional component 115 that may be employed to allocate physical blocks (including overprovisioned blocks) and maintain mappings of logical block addresses (LBAs) to physical block addresses (PBAs) referencing memory blocks residing on memory devices 112A-112N. It should be noted that the component designation is of a purely functional nature, i.e., the functions of the block allocation component 115 may be implemented by one or more hardware components and/or firmware modules of the controller 111, as described in more detail herein below. Furthermore, the storage system 110 may include additional circuitry or components that are omitted from
Conceptually, the entire memory space of the memory devices 112A-112N may be represented by a set of blocks, such that each block comprises a fixed number (e.g., 64) of memory pages (e.g., 4 kilobyte pages). User data pages, addressable by LBAs, may be written to free memory pages, addressable by PBAs. The controller 111 may maintain a memory data structure comprising a plurality of LBA-PBA mappings. The storage driver 230 may expose, to the applications running on the host system 120, the block storage model, which may implement “read” and “write” commands for storing and retrieving blocks of data identified by LBAs.
As noted herein above, the controller 111 may implement the relocate-on-write strategy, according to which a new data item replacing or modifying an existing data item is written to a new physical location, thus invalidating the old physical location holding the data being overwritten. Accordingly, when a user data page addressed by an LBA is updated, a free physical memory page on one of memory devices 112A-112N is allocated, the updated user data is stored to the newly allocated physical memory page, and the corresponding LBA-PBA mapping is updated accordingly: the LBA is mapped to the PBA of the newly allocated memory page, and the old PBA (referencing the physical memory page storing the user data before the update) is marked as an invalidated physical memory page. In certain implementations, the controller 111 may allocate sequential PBAs in stripes that span several memory devices 112, thus implementing data redundancy and error correction methods similar to those implemented by Redundant Array of Independent Disks (RAID).
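The mapping update described above may be sketched as follows; the simplified flash translation layer state (Python dictionaries and a free-page list) is an illustrative assumption rather than the controller's actual data structures:

```python
# Illustrative relocate-on-write bookkeeping: an updated logical page is written
# to a freshly allocated physical page, the LBA-to-PBA mapping is repointed, and
# the physical page holding the previous copy is marked as invalidated.

class SimpleFTL:
    def __init__(self, free_pbas):
        self.lba_to_pba = {}           # LBA -> PBA mapping
        self.invalid_pbas = set()      # physical pages holding stale data
        self.free_pbas = list(free_pbas)

    def write(self, lba, data):
        new_pba = self.free_pbas.pop()         # allocate a free physical page
        old_pba = self.lba_to_pba.get(lba)
        if old_pba is not None:
            self.invalid_pbas.add(old_pba)     # the old copy becomes garbage
        self.lba_to_pba[lba] = new_pba         # repoint the mapping
        # (the media write of `data` to new_pba would be issued here)
        return new_pba

ftl = SimpleFTL(free_pbas=range(100))
ftl.write(lba=5, data=b"v1")
ftl.write(lba=5, data=b"v2")                   # the overwrite relocates the page
print(len(ftl.invalid_pbas))                   # 1 invalidated physical page
```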
As schematically illustrated by
W=((1+ρ)/ρ)/2,
where ρ is the overprovisioning factor.
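A numeric illustration of this estimate follows; it is assumed here that ρ denotes the fraction of spare physical capacity (e.g., ρ = 0.25 for 25% overprovisioning), and the sample values are hypothetical:

```python
# Evaluating W = ((1 + ρ)/ρ)/2 for a few hypothetical overprovisioning factors;
# a larger spare fraction yields a smaller estimated write amplification.

def write_amplification(rho):
    return ((1.0 + rho) / rho) / 2.0

for rho in (0.07, 0.25, 0.50):
    print(f"rho={rho:.2f} -> W={write_amplification(rho):.2f}")
# rho=0.07 -> W=7.64
# rho=0.25 -> W=2.50
# rho=0.50 -> W=1.50
```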
However, real life workloads may differ significantly by their data invalidation rates. Therefore, selecting the victim block having the minimum number of valid TUs may not always yield the optimal efficiency (e.g., measured by the write amplification). The present disclosure provides a method of allocating overprovisioned blocks on a per-stream basis, thus yielding consistent write amplification by all currently active data streams.
Referring again to
As noted herein above, selecting for garbage collection the victim block having the minimum number of valid TUs may not always yield the optimal efficiency (e.g., measured by the write amplification), since the data invalidation rate may differ significantly among blocks allocated to different data streams. The invalidation rate may be estimated as follows:
I=(1−((1+OP)/TUph))^(TUblock*Nbl),
where I is the invalidation rate,
OP is the overprovisioning factor,
TUph is the number of physical translation units,
TUblock is the number of translation units per block, and
Nbl is the input workload (in blocks).
The write amplification may be defined as follows:
WA=1/(1−I),
where WA is the write amplification, and
I is the invalidation rate.
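The two estimates above may be evaluated together as follows; the device geometry and workload values are hypothetical and serve only to show the direction of the dependency on the overprovisioning factor:

```python
# Illustrative evaluation of I = (1 - (1 + OP)/TUph)^(TUblock*Nbl) and
# WA = 1/(1 - I); under this model a larger overprovisioning factor yields
# a smaller I and hence a smaller estimated write amplification.

def invalidation_rate(op, tu_ph, tu_block, n_bl):
    return (1.0 - (1.0 + op) / tu_ph) ** (tu_block * n_bl)

def write_amplification(i):
    return 1.0 / (1.0 - i)

tu_ph, tu_block, n_bl = 4096, 64, 4     # hypothetical geometry and workload
for op in (0.1, 0.3):
    i = invalidation_rate(op, tu_ph, tu_block, n_bl)
    print(f"OP={op:.1f}: I={i:.3f}, WA={write_amplification(i):.2f}")
```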
Therefore, the differences in VTC distributions in the simulated scenarios depicted by graphs 420-430 would necessarily manifest themselves in noticeably different write amplification factors. Conversely, equal or similar data invalidation rates would produce equal or similar write amplification factors. Accordingly, a uniform distribution of VTC across all media blocks may be achieved by allocating overprovisioned blocks on a per-stream basis, thus yielding consistent write amplification by all currently active data streams. In an illustrative example, the controller may determine overprovisioning factors associated with each of two or more data streams that would minimize the differences between the write amplification factors associated with the data streams:
OP=argmin Σ|WAi(OPi)−WAj(OPj)|, i=1, . . . , N, j=i+1, . . . , N,
where OP=(OP1, OP2, . . . , OPN) is the vector of overprovisioning factors for the data streams,
N is the number of the data streams, and
WAi is the expected write amplification factor of the data blocks storing the data items of the i-th data stream.
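An illustrative way to carry out this selection is a brute-force search over candidate factors, as sketched below; the grid of candidates, the fixed spare-capacity budget, and the reuse of the write-amplification model given above are assumptions made for illustration rather than the claimed implementation:

```python
# Illustrative search for per-stream overprovisioning factors minimizing the
# sum of pairwise differences between expected write-amplification factors,
# subject to a hypothetical total spare-capacity budget.

from itertools import product

def expected_wa(op, tu_ph, tu_block, n_bl):
    i = (1.0 - (1.0 + op) / tu_ph) ** (tu_block * n_bl)
    return 1.0 / (1.0 - i)

def choose_op_factors(streams, candidates, budget):
    """streams: list of (tu_ph, tu_block, n_bl) tuples, one tuple per data stream."""
    best, best_cost = None, float("inf")
    for ops in product(candidates, repeat=len(streams)):
        if abs(sum(ops) - budget) > 1e-9:     # spend exactly the spare budget
            continue
        was = [expected_wa(op, *s) for op, s in zip(ops, streams)]
        cost = sum(abs(was[i] - was[j])
                   for i in range(len(was)) for j in range(i + 1, len(was)))
        if cost < best_cost:
            best, best_cost = ops, cost
    return best

streams = [(4096, 64, 2),    # "cold" stream: small input workload
           (4096, 64, 8)]    # "hot" stream: large input workload
candidates = [round(0.05 * k, 2) for k in range(1, 9)]    # 0.05 .. 0.40
print(choose_op_factors(streams, candidates, budget=0.45))
# the colder stream ends up with the larger overprovisioning factor
```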
The controller may then allocate overprovisioned blocks to the data streams in accordance with calculated overprovisioning factors. Allocation of the overprovisioned data blocks may be reflected in a memory data structure comprising a plurality of records, such that each record would map a data stream identifier to the identifiers of overprovisioned blocks allocated to the identified data stream.
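Such a record may be sketched, purely for illustration, as a mapping from hypothetical stream identifiers to hypothetical block identifiers:

```python
# Illustrative allocation record: data stream identifier -> identifiers of the
# overprovisioned blocks allocated to that stream (all identifiers hypothetical).

overprovisioned_allocation = {
    "stream_0": [101, 102, 103, 104, 105, 106],   # cold stream: larger share
    "stream_1": [107, 108],                       # hot stream: smaller share
}

def overprovisioned_blocks_for(stream_id):
    return overprovisioned_allocation.get(stream_id, [])
```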
As noted herein above, increasing the number of overprovisioned blocks effectively increases the number of invalid pages which may exist before the garbage collection needs to be performed. Taking into account that the data invalidation rate of data items of a cold data stream may be significantly lower than the data invalidation rate of data items of a hot data stream, the overprovisioning factor associated with a given data stream may be chosen based on the data stream temperature, such that the overprovisioning factor of a “cold” data stream would exceed the overprovisioning factor of a hot data stream, in order to yield a uniform distribution of VTC across all data streams.
As shown in
At block 620, the processing logic may calculate, based on the data stream attribute values, overprovisioning factors to be associated with each data stream. In certain implementations, the processing logic may determine overprovisioning factors that would minimize the differences between the expected write amplification factors associated with the data streams. In an illustrative example, the overprovisioning factor of a “cold” data stream may be chosen to exceed the overprovisioning factor of a hot data stream, in order to yield a uniform distribution of VTC across all data streams, as described in more detail herein above.
At block 630, the processing logic may allocate overprovisioned blocks to each data stream based on the calculated overprovisioning factors. Allocation of the overprovisioned data blocks may be reflected in a memory data structure comprising a plurality of records, such that each record would map a data stream identifier to identifiers of physical blocks, including the overprovisioned blocks, allocated to the identified data stream, as described in more detail herein above.
At block 640, the processing logic may receive a write command specifying a data item associated with one of the data streams.
At block 650, the processing logic may identify, based on the data structure reflecting the block allocation to the data streams, a physical block for storing the received data item.
At block 660, the processing logic may transmit, to the memory device holding the identified physical block, an instruction specifying the data item to be stored, as described in more detail herein above.
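The operations of blocks 620-660 may be sketched end to end as follows; the proportional allocation rule, the attribute format, and the helper names are assumptions made for this illustration rather than the claimed method:

```python
# Illustrative end-to-end flow: derive per-stream overprovisioning shares from
# the data stream attribute values, allocate spare blocks accordingly, and route
# an incoming write to a physical block allocated to the issuing stream.

def allocate_overprovisioned_blocks(stream_attrs, spare_block_ids):
    """stream_attrs: {stream_id: expected_invalidation_rate}; colder streams
    (lower invalidation rate) receive a larger share of the spare blocks."""
    weights = {s: 1.0 - rate for s, rate in stream_attrs.items()}
    total = sum(weights.values())
    allocation, cursor = {}, 0
    for stream_id, weight in weights.items():
        count = int(round(len(spare_block_ids) * weight / total))
        allocation[stream_id] = spare_block_ids[cursor:cursor + count]
        cursor += count
    return allocation

def handle_write(stream_id, data_item, allocation, open_block):
    """Identify a physical block for the data item based on the per-stream allocation."""
    block = open_block.get(stream_id)
    if block is None:
        block = allocation[stream_id][0]
    # ... an instruction specifying `data_item` would be transmitted to the
    # memory device holding `block` here ...
    return block

attrs = {"cold": 0.1, "hot": 0.9}                                       # stream attribute values
alloc = allocate_overprovisioned_blocks(attrs, list(range(200, 220)))   # blocks 620-630
print({s: len(b) for s, b in alloc.items()})    # the cold stream receives more spare blocks
print(handle_write("hot", b"payload", alloc, open_block={}))            # blocks 640-660
```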
Therefore, as shown by the foregoing description, aspects of the present disclosure represent significant improvements over various common implementations of storage systems, by allocating overprovisioned blocks to data streams based on data attributes (such as expected retention time or data invalidation rate), in order to optimize performance, endurance, and/or other operational aspects of the storage media.
The example computer system 700 includes a processing device 702, a main memory 704 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 706 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage system 718, which communicate with each other via a bus 730. Processing device 702 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 702 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 702 is configured to execute instructions 726 for performing the operations and steps discussed herein. The computer system 700 may further include a network interface device 708 to communicate over the network 720.
The data storage system 718 may include a machine-readable storage medium 724 (also known as a computer-readable medium) on which is stored one or more sets of instructions or software 726 embodying any one or more of the methodologies or functions described herein. The instructions 726 may also reside, completely or at least partially, within the main memory 704 and/or within the processing device 702 during execution thereof by the computer system 700, the main memory 704 and the processing device 702 also constituting machine-readable storage media. The machine-readable storage medium 724, data storage system 718, and/or main memory 704 may correspond to the storage system 110 of
In one implementation, the instructions 726 include instructions to implement functionality corresponding to a block allocation component (e.g., block allocation component 115 of
Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. The present disclosure may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the intended purposes, or it may include a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will appear as set forth in the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.
The present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as a read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, etc.
In the foregoing specification, implementations of the disclosure have been described with reference to specific example implementations thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of implementations of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
Claims
1. A storage system, comprising:
- a plurality of memory devices;
- a controller operatively coupled to the memory devices, the controller to: determine a first value of a data stream attribute associated with a first data stream, wherein the first value of the data stream attribute reflects an expected data invalidation rate of the first data stream; determine a second value of the data stream attribute associated with a second data stream; determine, based on the first value and the second value, a first overprovisioning factor associated with the first data stream and a second overprovisioning factor associated with the second data stream; and allocate, based on the first overprovisioning factor and the second overprovisioning factor, a first plurality of overprovisioned blocks to the first data stream and a second plurality of overprovisioned blocks to the second data stream.
2. The storage system of claim 1, wherein at least one memory device of the plurality of memory devices is provided by a negative-and (NAND) flash memory device.
3. The storage system of claim 1, wherein the first overprovisioning factor and the second overprovisioning factor are calculated to minimize a difference between a first write amplification factor associated with the first data stream and a second write amplification factor associated with the second data stream.
4. The storage system of claim 1, wherein the first overprovisioning factor is represented by a ratio of a number of physical memory blocks to a number of logical memory blocks presented as available memory to a host in communication with the storage system.
5. (canceled)
6. The storage system of claim 1, wherein the controller is further to:
- maintain a memory data structure associating a plurality of physical blocks with the first data stream, wherein the plurality of physical blocks includes a plurality of overprovisioned blocks allocated based on the first overprovisioning factor.
7. The storage system of claim 6, wherein the controller is further to:
- receive a data item associated with the first data stream;
- identify, using the memory data structure, a physical block for storing the data item; and
- transmit, to a memory device holding the identified physical block, an instruction specifying the data item.
8. The storage system of claim 1, wherein an expected data invalidation rate of the first data stream exceeds an expected data invalidation rate of the second data stream, and wherein the first overprovisioning factor is less than the second overprovisioning factor.
9. The storage system of claim 1, wherein the controller is further to:
- identify, among a plurality of physical blocks of the first data stream, a first physical block having a minimum number of valid pages;
- copy valid pages of the first physical block to a second physical block; and
- erase the first physical block.
10. A method, comprising:
- determining, by a storage system controller, a first value of a data stream attribute associated with a first data stream and a second value of the data stream attribute associated with a second data stream;
- determining, based on the first value and the second value, a first overprovisioning factor associated with the first data stream and a second overprovisioning factor associated with the second data stream, wherein the first overprovisioning factor and the second overprovisioning factor are calculated to minimize a difference between a first write amplification factor associated with the first data stream and a second write amplification factor associated with the second data stream; and
- allocating, based on the first overprovisioning factor and the second overprovisioning factor, a first plurality of overprovisioned blocks to the first data stream and a second plurality of overprovisioned blocks to the second data stream.
11. (canceled)
12. The method of claim 10, wherein the first overprovisioning factor is represented by a ratio of a number of physical memory blocks to a number of logical memory blocks presented as available memory to a host in communication with the storage system controller.
13. The method of claim 10, wherein the first value of the data stream attribute reflects an expected data invalidation rate of the first data stream.
14. The method of claim 10, further comprising:
- maintaining a memory data structure associating a plurality of physical blocks with the first data stream, wherein the plurality of physical blocks includes a plurality of overprovisioned blocks allocated based on the first overprovisioning factor.
15. The method of claim 14, further comprising:
- receiving a data item associated with the first data stream;
- identifying, using the memory data structure, a physical block for storing the data item; and
- transmitting, to a memory device holding the identified physical block, an instruction specifying the data item.
16. The method of claim 10, wherein an expected data invalidation rate of the first data stream exceeds an expected data invalidation rate of the second data stream, and wherein the first overprovisioning factor is less than the second overprovisioning factor.
17. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a processor, cause the processor to:
- determine a first value of a data stream attribute associated with a first data stream and a second value of the data stream attribute associated with a second data stream;
- determine, based on the first value and the second value, a first overprovisioning factor associated with the first data stream and a second overprovisioning factor associated with the second data stream, wherein an expected data invalidation rate of the first data stream exceeds an expected data invalidation rate of the second data stream, and wherein the first overprovisioning factor is less than the second overprovisioning factor; and
- allocate, based on the first overprovisioning factor and the second overprovisioning factor, a first plurality of overprovisioned blocks to the first data stream and a second plurality of overprovisioned blocks to the second data stream.
18. The computer-readable non-transitory storage medium of claim 17, wherein the first overprovisioning factor and the second overprovisioning factor are calculated to minimize a difference between a first write amplification factor associated with the first data stream and a second write amplification factor associated with the second data stream.
19. The computer-readable non-transitory storage medium of claim 17, wherein the first overprovisioning factor is represented by a ratio of a number of physical memory blocks to a number of logical memory blocks presented as available memory to a host.
20. (canceled)
Type: Application
Filed: May 21, 2018
Publication Date: Nov 21, 2019
Inventors: Shirish D. Bahirat (Longmont, CO), William Akin (Morgan Hill, CA), Aditi P. Kulkarni (Boulder, CO)
Application Number: 15/985,167