DATA MIGRATION FOR COMPOSITE NON-VOLATILE STORAGE DEVICE
In one embodiment, a method for managing a composite storage device made up of fast non-volatile storage, such as a solid state device, and slower non-volatile storage, such as a traditional magnetic hard drive, can include maintaining a first data structure, which stores instances of recent access to each unit in a set of units in the fast non-volatile storage device, such as the SSD device and also maintaining a second data structure that indicates whether or not units in the slower storage device, such as the HDD, have been accessed at least a predetermined number of times. In one embodiment, the second data structure can be a probabilistic hash table, which has a low required memory overhead but is not guaranteed to always provide a correct answer with respect to whether a unit or block in the slower storage device has been referenced recently.
The present application claims the benefit of provisional application Ser. No. 61/599,927, filed on Feb. 16, 2012, and this provisional application is hereby incorporated by reference. The present application is also related to co-pending application Ser. No. 61/599,930, which was also filed on Feb. 16, 2012, and which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
The present invention relates to methods for managing storage of data in a composite non-volatile memory that is a composite of a slow memory device and a fast memory device. In a composite disk system, a large, slow, and inexpensive magnetic hard drive can be combined with a small, fast but expensive storage device, such as a solid state drive, to form a logical volume. This can provide the advantage of fast access through the solid state drive (SSD) while providing the large capacity of the magnetic hard disk drive (HDD). Prior techniques for managing such a composite disk have used algorithms such as a least recently used (LRU) algorithm or a CLOCK algorithm or the ClockPro algorithm described by Song Jiang. These prior techniques can improve the allocation of the data between the fast and the slow portions of the composite disk, but they tend not to be space efficient, in that they require large amounts of main memory, such as large amounts of DRAM, in order to implement the data structures used in these techniques for allocating data between the two parts of the composite disk. Hence there is a need for an improved, space efficient technique, which does not require as much memory to store the data structures used in allocating or migrating data between the two or more components of the composite disk.
SUMMARY OF THE DESCRIPTION
In one embodiment, a method for managing access to a fast non-volatile storage device, such as a solid state device, and a slower non-volatile storage device, such as a magnetic hard drive, can include maintaining a first data structure which indicates a recency of access to each unit in a set of units in the fast non-volatile storage device, such as the SSD device and also maintaining a second data structure that indicates whether or not units or blocks in the slower storage device, such as the HDD device, have been referenced recently (such as the units or blocks that have been referenced only once recently). In one embodiment, the second data structure can be a probabilistic hash table, which is space efficient, and reduces the required memory overhead. The probabilistic hash table is correct most of the time with respect to whether a unit or block in the slower storage device has been referenced recently, but is not guaranteed to always provide a correct answer.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description, which follows.
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, and also those disclosed in the Detailed Description below.
The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which like references indicate similar elements.
Approaches to improving the management of a composite, non-volatile data storage device are described. Various embodiments and aspects of the invention will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative of the invention and are not to be construed as limiting the invention. Numerous specific details are described to provide a thorough understanding of various embodiments of the present invention. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in conjunction with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment. The processes depicted in the figures that follow are performed by processing logic that comprises hardware (e.g. circuitry, dedicated logic, etc.), software (as instructions on a non-transitory machine-readable storage medium), or a combination of both. Although the processes are described below in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.
For example, location 302 corresponds to unit zero on the SSD and the next unit to the right corresponds to unit one on the SSD, and location 303 corresponds to another unit on the SSD. Each location stores a value indicating the state of the corresponding storage unit within the SSD. In one embodiment, a two-bit value can be used, such that a value of zero can indicate that the one or more blocks or other components in a particular unit on the SSD are free, a value of one can indicate that a particular unit on the SSD has not been referenced recently, and a value of two can indicate that the unit in the SSD has been referenced recently. A value of three can indicate that a unit is pinned to the SSD and cannot be demoted to the HDD. Alternatively, in one embodiment, a three-bit value can be used, which can track the specific number of accesses to a unit. In this embodiment, a value of zero can also indicate that the unit is free, a value of one can indicate that the unit has not been referenced recently, and the maximum value of seven can indicate that the unit is pinned. Other values can indicate the number of times a unit has been recently referenced, such as a value of six, which would indicate five recent references.
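The per-unit values described above can be illustrated with a minimal Python sketch. The names `TwoBitState` and `decode_three_bit` are hypothetical and chosen only for illustration; the numeric values follow the two-bit and three-bit encodings given in the text.

```python
from enum import IntEnum


class TwoBitState(IntEnum):
    """Two-bit per-unit values as described in the text (names are illustrative)."""
    FREE = 0        # the unit on the SSD holds no data
    NOT_RECENT = 1  # the unit has not been referenced recently
    RECENT = 2      # the unit has been referenced recently
    PINNED = 3      # the unit cannot be demoted to the HDD


def decode_three_bit(value: int) -> tuple[str, int]:
    """Return (status, recent_reference_count) for a three-bit value."""
    if not 0 <= value <= 7:
        raise ValueError("a three-bit value must be in the range 0..7")
    if value == 0:
        return ("free", 0)
    if value == 7:
        return ("pinned", 0)
    # Values 1..6 encode (value - 1) recent references, so a value of one
    # means no recent reference and a value of six means five.
    return ("in_use", value - 1)
```

Under this sketch, `decode_three_bit(6)` yields a count of five recent references, matching the example in the text.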
In one embodiment, the first data structure 301 can be managed as follows. When the algorithm needs to find a candidate to demote from the SSD to the HDD, it will use the clock pointer 304. In one embodiment, the clock pointer 304 will sweep from one unit to the next unit in a clockwise direction, until it finds a unit with a value of one, which means the unit has not been referenced recently. In another embodiment, the clock pointer 304 can sweep in a counter-clockwise direction. If the value in the unit is the maximum value, then the unit is pinned to the SSD and cannot be demoted to the HDD. If the value is larger than one, but is not the maximum value, the value is decremented by one, down to a minimum value of one, before the clock pointer moves to the next unit. When a particular unit in the SSD is accessed, a counter in the location corresponding to that unit on the SSD will be incremented. Using this method, frequently accessed units on the SSD will attain increasingly higher counts in the location of the data structure corresponding to that unit on the SSD, up to a preset count limit. However, as the clock pointer 304 sweeps from unit to unit each time a candidate for demotion is required, a count in each sequential unit (e.g. 302, 303) will decrement each time the clock pointer 304 passes that unit, down to a minimum value of one, which indicates that the unit has not been recently accessed. Further details in connection with the use of the clock algorithm relative to the second data structure, which will be next described, are provided in conjunction with
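The sweep described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `ClockQueue` and its methods are hypothetical, the preset count limit is assumed to be one below the pinned (maximum) value, and the sweep is assumed to skip free locations.

```python
class ClockQueue:
    """Circular queue of per-unit values swept by a clock pointer (illustrative)."""
    FREE = 0
    NOT_RECENT = 1

    def __init__(self, num_units: int, max_value: int = 7) -> None:
        self.values = [self.FREE] * num_units
        self.max_value = max_value  # 7 for three-bit values, 3 for two-bit values
        self.hand = 0               # current position of the clock pointer

    def record_access(self, unit: int) -> None:
        """Increment the count for an accessed unit, up to the count limit."""
        value = self.values[unit]
        if value == self.FREE:
            raise ValueError("unit holds no data")
        if value == self.max_value:
            return  # pinned units keep the maximum value
        # Assumption: the count limit is max_value - 1, so that an access
        # can never turn an ordinary unit into a pinned one.
        self.values[unit] = min(value + 1, self.max_value - 1)

    def find_demotion_candidate(self):
        """Sweep until a unit with a value of one is found; None if there is none."""
        n = len(self.values)
        # max_value full passes are enough to age any count down to one.
        for _ in range(n * self.max_value):
            unit = self.hand
            value = self.values[unit]
            self.hand = (self.hand + 1) % n
            if value == self.NOT_RECENT:
                return unit
            if value > self.NOT_RECENT and value != self.max_value:
                self.values[unit] = value - 1  # age the unit as the pointer passes
        return None
```

For example, with values `[7, 3, 2, 1]` and the pointer at location zero, one sweep skips the pinned unit, decrements the next two locations, and returns unit 3 as the demotion candidate.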
If operation 507 determines that the unit is not already in the second data structure then it proceeds to operation 509 in which the unit number or a representation of the unit number is added to the second data structure which can be the ghost table 401. Further information concerning operation 509 is provided in connection with
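One possible shape for such a ghost table is sketched below in Python, loosely following the hash, signature, and multiple-hash-function language of the claims. Everything specific here is an assumption made for illustration: the class name `GhostTable`, the use of BLAKE2b from the standard library as the hash, the 16-bit signature width, and the reading of "random location" as a random choice among the candidate slots for that unit.

```python
import hashlib
import random


class GhostTable:
    """Probabilistic hash table storing short signatures of unit numbers (illustrative)."""
    EMPTY = 0

    def __init__(self, num_slots: int, num_hashes: int = 2, seed: int = 0) -> None:
        self.slots = [self.EMPTY] * num_slots
        self.num_hashes = num_hashes
        self.rng = random.Random(seed)

    def _digest(self, unit: int, salt: bytes) -> int:
        raw = hashlib.blake2b(unit.to_bytes(8, "little"),
                              digest_size=8, salt=salt).digest()
        return int.from_bytes(raw, "little")

    def _indexes(self, unit: int) -> list[int]:
        # One candidate slot per hash function.
        return [self._digest(unit, b"index-%d" % i) % len(self.slots)
                for i in range(self.num_hashes)]

    def _signature(self, unit: int) -> int:
        # Assumed 16-bit signature in 1..0xFFFF, so it never equals EMPTY.
        return self._digest(unit, b"signature") % 0xFFFF + 1

    def contains(self, unit: int) -> bool:
        # May return a false positive when two units share a slot and a signature.
        sig = self._signature(unit)
        return any(self.slots[i] == sig for i in self._indexes(unit))

    def add(self, unit: int) -> None:
        sig = self._signature(unit)
        indexes = self._indexes(unit)
        for i in indexes:
            if self.slots[i] == self.EMPTY:
                self.slots[i] = sig
                return
        # No empty candidate slot: overwrite one at random, forgetting an older entry.
        self.slots[self.rng.choice(indexes)] = sig

    def remove(self, unit: int) -> None:
        sig = self._signature(unit)
        for i in self._indexes(unit):
            if self.slots[i] == sig:
                self.slots[i] = self.EMPTY
                return
```

Because only a short signature is kept per slot, the table stays small, at the cost of occasional wrong answers: an overwritten entry is forgotten, and a colliding signature can report a unit that was never added.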
In operation 515, data in the unit of the HDD that is being accessed is migrated from the HDD to the SSD using techniques that are known in the art. Further, the unit number for that unit of data that has been migrated or is to be migrated is removed from the second data structure, such as the ghost table 401. If in operation 511 the system determines that the SSD is full, then operation 513 precedes operation 515. It will be appreciated that the file system will still maintain conventional data structures indicating the locations of various data in response to the migration of the data in operation 515. In operation 513, the system creates space on the SSD using, in one embodiment, the clock algorithm. In this case, the clock algorithm uses the clock pointer 304 to move sequentially through the circular queue, starting with the current position of the clock pointer, to a position which indicates a unit in the SSD that has not been recently referenced; in one embodiment, this is indicated by the value of one stored in a location in the circular queue. As the clock pointer 304 is moved through the circular queue in a circular fashion, the value in each location is decremented by one. As the clock pointer 304 moves through the queue decrementing the values in each location, eventually one of the units will receive a value indicating it is an available unit. Once the clock algorithm determines a next available unit location in the SSD, then the data in that unit of the SSD can be flushed to the HDD and the accessed data on the HDD can be migrated from the HDD to that location or unit in the SSD in operation 515 which can follow operation 513. The removal of a unit number from the second data structure is further described in conjunction with
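The ordering of operations 507 through 515 can be summarized in a short Python sketch. To keep the example small, it swaps in two stand-ins that are not the structures described in the text: an exact Python `set` in place of the probabilistic ghost table, and first-in-first-out eviction from a `deque` in place of the clock algorithm. The function name `access_unit` and its return strings are hypothetical.

```python
from collections import deque


def access_unit(unit: int, ssd: deque, ghost: set, capacity: int) -> str:
    """Handle one access and report what happened (illustrative control flow only)."""
    if unit in ssd:
        return "ssd_hit"        # the unit is already on the fast device
    if unit not in ghost:       # operation 507: not yet in the second data structure
        ghost.add(unit)         # operation 509: remember this first recent reference
        return "recorded"
    if len(ssd) >= capacity:    # operation 511: is the SSD full?
        ssd.popleft()           # operation 513 stand-in: demote one unit to the HDD
    ssd.append(unit)            # operation 515: migrate the unit from HDD to SSD
    ghost.discard(unit)         # and remove its entry from the second data structure
    return "promoted"
```

In this sketch a unit on the HDD is promoted only on its second recent access: the first access records it, and the second finds it in the ghost structure and migrates it.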
The method shown in
An alternative embodiment of the present invention can employ a Bloom filter rather than the probabilistic hash table, which can be implemented as a ghost table. An example of a Bloom filter is shown in
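A Bloom filter variant can be sketched in Python as follows. Because entries are removed when a unit is migrated, the sketch uses a counting Bloom filter, which the claims name as one option; a plain bit-array Bloom filter does not support removal. The class name `CountingBloomFilter`, the use of BLAKE2b, and the default of three hash functions are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter over unit numbers (illustrative sketch)."""

    def __init__(self, num_counters: int, num_hashes: int = 3) -> None:
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, unit: int) -> list[int]:
        key = unit.to_bytes(8, "little")
        return [
            int.from_bytes(
                hashlib.blake2b(key, digest_size=8, salt=b"bloom-%d" % i).digest(),
                "little",
            ) % len(self.counters)
            for i in range(self.num_hashes)
        ]

    def add(self, unit: int) -> None:
        for i in self._indexes(unit):
            self.counters[i] += 1

    def might_contain(self, unit: int) -> bool:
        # False means definitely absent; True may be a false positive.
        return all(self.counters[i] > 0 for i in self._indexes(unit))

    def remove(self, unit: int) -> None:
        # Only decrement when the unit appears present, so counters stay non-negative.
        if self.might_contain(unit):
            for i in self._indexes(unit):
                self.counters[i] -= 1
```

As with the probabilistic hash table, a positive answer is not guaranteed to be correct, which is acceptable here because a wrong answer only causes an occasional unnecessary migration.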
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes can be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims
1. A method for managing access to a multi-device composite data storage system, the method comprising:
- managing a first data structure indicating a recency of access to each unit in a set of units on a first data storage device; and
- managing a second data structure that probabilistically indicates whether a unit on a second data storage device has received at least one recent reference, wherein the second data structure is a probabilistic hash table, or a counting Bloom filter, or another space efficient probabilistic data structure.
2. The method of claim 1 wherein managing the first data structure comprises:
- receiving a request to access a block of the composite data storage system;
- accessing the block from the first data storage device; and
- updating the first data structure to indicate that the block was recently accessed from the first data storage device.
3. The method of claim 1 wherein managing the second data structure comprises:
- receiving a request to access a block of the data storage system;
- adding, to the second data structure, data representing a unit identifier on the second data storage device containing the block of the data storage system; and
- migrating the unit to the first data storage device.
4. The method of claim 3 wherein adding data representing a unit identifier on the second data storage device to the second data structure comprises:
- calculating a hash of the unit identifier of the data storage system;
- calculating a signature for the unit; and
- storing the signature for the unit into an index on the second data structure, wherein the index is defined by the hash of the unit.
5. The method of claim 3 wherein migrating the unit on the second data storage device from the second data storage device to the first data storage device comprises:
- searching the second data structure for the signature of a unit on the second data storage device, wherein the unit contains the block of the data storage system;
- moving the unit from the second storage device to the first storage device; and
- removing, from the second data structure, the signature of the unit.
6. The method of claim 5 wherein moving the unit from the second storage device to the first storage device comprises moving multiple data blocks as a single unit.
7. A system for managing access to a composite data storage device, the system comprising:
- a first data storage device, to store data in a set of units;
- a first data structure, to indicate a recency of access to each unit in the set of units on the first data storage device;
- a second data storage device, coupled to the first data storage device, to store data in a set of units; and
- a second data structure, to probabilistically indicate whether a unit in the set of units on the second data storage device has received at least one recent access, wherein the second data structure is a probabilistic hash table.
8. The system of claim 7 wherein the first data storage device is a solid-state drive.
9. The system of claim 7 wherein the second data storage device is a magnetic hard disk drive.
10. The system of claim 7 wherein the second data structure contains an element corresponding to each of the units on the first storage device.
11. The system of claim 7 wherein the second data structure contains a number of elements corresponding to a proportion of the units on the first storage device.
12. The system of claim 7 wherein a signature for a unit on the second storage device is stored in the second data structure.
13. The system of claim 7 wherein the first data structure is a circular queue maintained by use of a clock algorithm.
14. The system of claim 13 wherein the first data structure contains an element for each unit on the first data storage device.
15. The system of claim 14 wherein an element of the first data structure indicates that a unit on the first data storage device is free.
16. The system of claim 15 wherein the first data structure stores a value to indicate a count of recent accesses to a particular unit on the first data storage device.
17. A non-transitory machine-readable storage medium having instructions stored therein, which when executed by a machine, cause a machine to perform operations for managing access to a multi-device composite data storage system, the operations comprising:
- initializing a first data structure, the first data structure to indicate if a unit in a set of units on a first data storage device is accessed, wherein the first data structure is managed via a clock algorithm;
- initializing a second data structure, the second data structure to probabilistically indicate that a unit on a second data storage device has received at least one recent access, wherein the second data structure is a probabilistic hash table;
- receiving a request to access a logical block of the composite data storage system;
- accessing the logical block from a unit on the first storage device if the logical block is contained on the first data storage device, and updating the first data structure to indicate that a block of the composite data storage system was recently accessed from a unit on the first data storage device;
- searching the second data structure for the logical block if the logical block is not found in a unit on the first data storage device;
- adding the logical block to the second data structure if the logical block is not found in the second data structure;
- migrating a unit from the second data storage device to the first data storage device if the logical block is found in the second data structure; and
- removing the logical block from the second data structure.
18. The machine-readable storage medium of claim 17 further comprising:
- halving the size of the second data structure after a number of signatures are not found within a period of time; and
- doubling the size of the second data structure after a number of signatures are found within a period of time.
19. The machine-readable storage medium of claim 18, further comprising:
- calculating a set of hash values for an address of a requested unit on the second data storage device;
- calculating a signature for the address of the requested unit on the second data storage device; and
- storing the signature of the address of the requested unit in the second data structure by using a hash value from the calculated set of hash values as an index.
20. The machine-readable storage medium of claim 19, wherein calculating a set of hash values uses a plurality of hash functions.
21. The machine-readable storage medium of claim 9, wherein storing the signature of the address of the requested unit in the second data structure comprises:
- searching, for each hash function, an index of the second data structure addressed by the hash value calculated by that hash function;
- storing the signature of the address of the requested unit in an empty location indexed by the hash value; and
- storing the signature of the address of the requested unit in a random location in the second data structure if no hash value in the set of hash values indexes an empty location.
23. A non-transitory machine-readable storage medium having instructions, which when executed, cause a data processing system to perform a method as in claim 1.
24. A non-transitory machine-readable storage medium having instructions, which when executed, cause a data processing system to perform a method as in claim 4.
Type: Application
Filed: Sep 6, 2012
Publication Date: Aug 22, 2013
Inventors: Wenguang Wang (Santa Clara, CA), Peter Macko (Liptovsky Mikulas)
Application Number: 13/605,916
International Classification: G06F 12/08 (20060101); G06F 12/14 (20060101);