Pluggable storage system for distributed file systems

- EMC Corporation

A method, article of manufacture, and apparatus for managing data. In some embodiments, this includes receiving an initial instruction for a file stored in a first storage system, determining that the initial instruction is not supported by the first storage system, identifying a combination of instructions to the first storage system after determining that the initial instruction is not supported by the first storage system, wherein the combination of instructions is based on the initial instruction, performing the identified combination of instructions on the file stored in the first storage system, and storing results of the performed identified combination of instructions.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/769,043 for INTEGRATION OF MASSIVELY PARALLEL PROCESSING WITH A DATA INTENSIVE SOFTWARE FRAMEWORK filed on Feb. 25, 2013, which is incorporated herein by reference for all purposes.

FIELD OF THE INVENTION

This invention relates generally to databases, and more particularly to systems and methods for managing datasets in databases.

BACKGROUND OF THE INVENTION

With the large amounts of data generated in recent years, data mining and machine learning are playing an increasingly important role in today's computing environment. For example, businesses may utilize either data mining or machine learning to predict the behavior of users. This predicted behavior may then be used by businesses to determine which plan to proceed with, or how to grow the business.

The data used in data mining and analytics is typically not stored in a uniform data storage system. Many data storage systems utilize different file systems, and those different file systems are typically not compatible with each other. Further, the data may reside in geographically diverse locations.

One conventional method of performing data analytics across different databases includes copying data from one database to a central database, and performing the data analytics on the central database. However, this results in an inefficient use of storage space, and creates issues with data consistency between the two databases.

There is a need, therefore, for an improved method, article of manufacture, and apparatus for managing data.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:

FIG. 1 illustrates a database system in accordance with some embodiments.

FIG. 2 is a flowchart of a method to manage data in accordance with some embodiments.

DETAILED DESCRIPTION

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. While the invention is described in conjunction with such embodiment(s), it should be understood that the invention is not limited to any one embodiment. On the contrary, the scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example, and the present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.

It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium such as a computer readable storage medium or a computer network wherein computer program instructions are sent over optical or electronic communication links. Applications may take the form of software executing on a general purpose computer or be hardwired or hard coded in hardware. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention.

An embodiment of the invention will be described with reference to a data storage system in the form of a storage system configured to store files, but it should be understood that the principles of the invention are not limited to this configuration. Rather, they are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, object, etc. may be used by way of example, the principles of the invention are not limited to any particular form of representing and storing data or other information; rather, they are equally applicable to any object capable of representing information.

FIG. 1 illustrates a database system in accordance with some embodiments. Primary Master 102 accepts queries, plans queries, and dispatches queries to Segments 106 for execution. Primary Master 102 also collects results from Segments 106. Segments 106 are each a compute unit of the database system. Rack 108 can store multiple segments as hosts (hosts not shown in FIG. 1). Standby Master 100 is a warm backup of the primary host. Primary Master 102 and Standby Master 100 communicate with Rack 108 via the network connection. The tables of the database system are stored in Storage Nodes 112. Access, including read and write, is done through Storage Abstraction Layer 110. Primary Master 102, in some embodiments, may include a meta store (not shown in FIG. 1).

The meta store includes information about the different file systems in Storage Nodes 112, such as API information of the file system's interface, and different attributes and metadata of the file system. The meta store also includes information on the binary location of the Storage Abstraction Layer 110. As new file systems are added to the database system, the new file systems are registered (e.g. provided API information, other attributes, etc.) with the meta store. Once the new file systems are added, instances of that file system may be created to store data objects, such as databases and tables.
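By way of illustration only, the registration described above might be sketched as follows. The names FileSystemInfo, MetaStore, api_module, supported_ops, and the "plugins.*" strings are hypothetical and do not appear in this specification; the sketch merely assumes that the meta store keeps, for each file system, its interface location, its attributes, and the set of commands it supports.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystemInfo:
    """Hypothetical meta store record for one registered file system."""
    name: str
    api_module: str              # assumed: where the file system's interface lives
    supported_ops: frozenset     # assumed: commands the file system accepts
    attributes: dict = field(default_factory=dict)


class MetaStore:
    """Toy in-memory meta store; a real one would persist its records."""

    def __init__(self):
        self._file_systems = {}

    def register(self, info):
        # A new file system must be registered before instances of it are used.
        if info.name in self._file_systems:
            raise ValueError(f"file system already registered: {info.name}")
        self._file_systems[info.name] = info

    def lookup(self, name):
        return self._file_systems[name]

    def supports(self, name, op):
        return op in self._file_systems[name].supported_ops


meta_store = MetaStore()
meta_store.register(FileSystemInfo(
    "append_only_fs", "plugins.append_only_fs",
    frozenset({"read", "append", "delete", "rename"}),
    {"updatable": False},
))
meta_store.register(FileSystemInfo(
    "updatable_fs", "plugins.updatable_fs",
    frozenset({"read", "append", "update", "truncate", "delete", "rename"}),
    {"updatable": True},
))
```

In such a sketch, a Storage Abstraction Layer could call meta_store.supports(name, op) before forwarding a command to a storage node.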

The storage nodes may be different file systems. For example, one storage node may be Hadoop File System (HDFS), while another storage node may be NFS. Having multiple file systems presents some challenges. One challenge is that file systems do not support all the same commands. The Storage Abstraction Layer helps address this challenge.

In some embodiments, the Storage Abstraction Layer selects a file system instance. A file system instance means a physical storage system for a specific file system. As discussed above, there may be several different file systems, and several different instances. The instances may be of the same file system, or they may be of different file systems. Different file systems may have different semantics or different performance characteristics. For example, some file systems allow you to update data, while other file systems only let you append data. The Storage Abstraction Layer chooses a file system based on the file system's attributes.

For example, in some embodiments, if a user wanted to modify or update a file that is stored on an underlying storage system which does not support file modification, the Storage Abstraction Layer may recognize the update command and move the file from the underlying storage system to another which does support file modification. The move may be temporary (e.g. move the file back after the user is finished with the file), or the move may be permanent.
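One possible sketch of this relocation follows, offered under stated assumptions rather than as the claimed implementation. InMemoryStore and update_with_relocation are hypothetical names, and dictionaries stand in for real storage systems.

```python
class InMemoryStore:
    """Toy stand-in for a storage system; not a real file system client."""

    def __init__(self, name, supports_update):
        self.name = name
        self.supports_update = supports_update
        self.files = {}


def update_with_relocation(path, new_data, home, fallback, move_back=True):
    """Apply an update, relocating the file if `home` cannot modify it.

    Returns the store that holds the file afterwards.
    """
    if home.supports_update:
        home.files[path] = new_data
        return home
    if not fallback.supports_update:
        raise RuntimeError("no available storage system supports modification")
    # Move the file to the storage system that supports modification.
    fallback.files[path] = home.files.pop(path)
    fallback.files[path] = new_data
    if move_back:
        # Temporary move: write the finished file back as a new file.
        home.files[path] = fallback.files.pop(path)
        return home
    # Permanent move: the file stays on the fallback system.
    return fallback


home = InMemoryStore("append_only_fs", supports_update=False)
fallback = InMemoryStore("updatable_fs", supports_update=True)
home.files["file_a"] = b"old contents"
location = update_with_relocation("file_a", b"new contents", home, fallback,
                                  move_back=False)
```

The move_back flag corresponds to the choice between a temporary and a permanent move.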

In some embodiments, the Storage Abstraction Layer may choose to store a data object in a file system that does not allow updating. This may be preferable in cases where the data object is only read and never modified, and the file system is efficient for retrieving data. Thus, the Storage Abstraction Layer may take into account the usage statistics of the data object to determine what file system to use to store the data.
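A minimal sketch of such a selection is given below. It assumes, purely for illustration, that each candidate file system advertises whether it supports updates and a relative read cost, and that usage statistics are simple read and update counts; Candidate, UsageStats, read_cost, and choose_file_system are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    supports_update: bool
    read_cost: float      # hypothetical relative cost; lower means faster reads


@dataclass
class UsageStats:
    reads: int
    updates: int


def choose_file_system(candidates, stats):
    """Pick the cheapest-to-read file system that fits the object's usage."""
    needs_update = stats.updates > 0
    eligible = [c for c in candidates
                if c.supports_update or not needs_update]
    if not eligible:
        raise LookupError("no registered file system fits the usage pattern")
    return min(eligible, key=lambda c: c.read_cost)


candidates = [
    Candidate("append_only_fs", supports_update=False, read_cost=1.0),
    Candidate("updatable_fs", supports_update=True, read_cost=2.0),
]
read_only_choice = choose_file_system(candidates, UsageStats(reads=100, updates=0))
mutable_choice = choose_file_system(candidates, UsageStats(reads=100, updates=5))
```

Here a data object that is only read lands on the append-only file system, while one that is also updated lands on the file system that permits updates.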

In some embodiments, the Storage Abstraction Layer may perform semantic adaptation. This may be preferable when the underlying file system may not be able to communicate directly with segments. This may occur when the interface the Storage Abstraction Layer exposes to the segment execution engine does not match with the semantics of the underlying file system. Other examples include instances where the functionality required by the segments is not supported by the underlying file system.

For example, a user may wish to truncate a file. However, the file may be stored on a segment where the underlying storage does not allow truncating files. The user is not aware of this because the user is not aware of where the files are physically stored. Typically, without a Storage Abstraction Layer, the underlying file system would not be able to understand the truncate command.

One example of semantic adaptation includes adapting the truncate command. Suppose that a segment requires a piece of data to be truncated. However, the underlying file system does not support the truncate functionality. The Storage Abstraction Layer may be able to put various commands together to mimic a truncate command. Since the Storage Abstraction Layer has access to the metadata of the file system stored in the meta store, it knows what commands are allowed in the file system, as well as how to access the file system via APIs. Suppose that the file to be truncated is File A, and File A consists of 20 bytes. The segment wants the last 10 bytes to be deleted. With this requirement, the Storage Abstraction Layer may employ semantic adaptation to complete the truncation even when the underlying file system does not support a truncate command. In some embodiments, the Storage Abstraction Layer may first copy the first 10 bytes of File A to a temporary file, called File B. Then, the original File A is deleted, leaving only the temporary File B. After the original File A is deleted, the temporary File B is renamed to File A. File A is now only half of the original File A. In other words, File A has been truncated, even though the underlying file system did not support truncation. The Storage Abstraction Layer, by understanding how to access the underlying file system via the meta store, sent a series of commands to mimic a truncate. This series of commands may be stored in the meta store so that future truncate requests may make use of it.
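The copy, delete, and rename sequence described above can be sketched as follows. This is an illustration only: it runs against the local file system through Python's standard library, emulated_truncate and the ".tmp" suffix for File B are hypothetical, and a real Storage Abstraction Layer would issue the equivalent calls through the API recorded in the meta store.

```python
import os
import tempfile


def emulated_truncate(path, keep_bytes):
    """Mimic truncate using only read, create, delete, and rename."""
    temp_path = path + ".tmp"               # hypothetical name for "File B"
    with open(path, "rb") as src, open(temp_path, "wb") as dst:
        dst.write(src.read(keep_bytes))     # copy the first keep_bytes bytes
    os.remove(path)                         # delete the original File A
    os.rename(temp_path, path)              # rename File B to File A


with tempfile.TemporaryDirectory() as workdir:
    file_a = os.path.join(workdir, "file_a")
    with open(file_a, "wb") as f:
        f.write(b"0123456789abcdefghij")    # File A holds 20 bytes
    emulated_truncate(file_a, 10)           # drop the last 10 bytes
    with open(file_a, "rb") as f:
        result = f.read()
```

Note that in this sketch a failure between the delete and the rename would leave only the temporary file, so a production implementation would likely need some form of recovery.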

Another example of semantic adaptation includes a file update command. As mentioned above, some file systems do not allow for updating a file. Suppose a segment required that a file be updated. However, the file is stored in a file system that does not allow files to be updated. In some embodiments, the Storage Abstraction Layer may record the modifications in a separate file as a new version. For example, if File A was to be modified, the separate file may be called File A_ver2. The segment (or user) will see that changes are being made to File A, but in fact, File A remains unchanged and the changes are being stored in File A_ver2. After the segment is finished modifying or updating the file, there may be two files stored: one is File A, and the other is File A_ver2. When subsequent users want to access File A, the Storage Abstraction Layer may cause the two files to be merged. With File A merged with File A_ver2 and called File A, the new File A will include all the changes made by the previous user. In other words, File A has been modified, even though the underlying file system did not support updating.
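The specification does not state how modifications are recorded in File A_ver2. The sketch below assumes, for illustration only, that each modification is an (offset, bytes) patch and that merging replays the patches over the unchanged base; VersionedFile and its methods are hypothetical names, and in-memory values stand in for the two stored files.

```python
class VersionedFile:
    """Toy model of File A plus a separate version file, File A_ver2."""

    def __init__(self, base):
        self.base = bytes(base)   # "File A": never modified in place
        self.delta = []           # "File A_ver2": recorded (offset, data) patches

    def update(self, offset, data):
        # The caller believes File A is changing; only the delta grows.
        self.delta.append((offset, bytes(data)))

    def merged(self):
        """Return the contents a subsequent reader of File A should see."""
        buf = bytearray(self.base)
        for offset, data in self.delta:
            end = offset + len(data)
            if end > len(buf):
                buf.extend(b"\x00" * (end - len(buf)))  # grow for writes past the end
            buf[offset:end] = data
        return bytes(buf)

    def merge(self):
        """Fold the delta into a new File A and discard File A_ver2."""
        self.base = self.merged()
        self.delta = []


file_a = VersionedFile(b"hello world")
file_a.update(6, b"there")
before_merge = file_a.base        # still the original bytes
visible = file_a.merged()         # what a later reader would be served
file_a.merge()
```

On a file system without in-place updates, merge() would correspond to writing a new file and renaming it to File A.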

With the Storage Abstraction Layer, many different file systems may be supported. New and different storage systems with different file systems may be “plugged” into the database, without affecting the ability for the database to run its queries or jobs, as long as the meta store is updated with information about the new file system, such as its APIs.

FIG. 2 illustrates a method to manage data in accordance with some embodiments. In step 200, an initial instruction for a file stored in a first storage system is received. In step 202, it is determined that the initial instruction is not supported by the first storage system. In step 204, a combination of instructions to the first storage system is identified after determining that the initial instruction is not supported by the first storage system, wherein the combination of instructions is based on the initial instruction. In step 206, the identified combination of instructions is performed on the file stored in the first storage system. In step 208, results of the performed identified combination of instructions are stored.
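Steps 200 through 208 might be sketched as below. The function handle_instruction, the adaptations mapping, and the operation names are hypothetical; the sketch assumes that the supported commands and the stored combinations would both be obtained from the meta store.

```python
def handle_instruction(instruction, supported_ops, adaptations, execute):
    """Run an instruction directly, or through a stored combination."""
    # Step 200: the initial instruction has been received as `instruction`.
    if instruction in supported_ops:
        return [execute(instruction)]
    # Step 202: the instruction is not supported by the storage system.
    # Step 204: identify a combination of instructions based on it.
    combination = adaptations.get(instruction)
    if combination is None:
        raise NotImplementedError(f"no adaptation known for {instruction!r}")
    unsupported = [op for op in combination if op not in supported_ops]
    if unsupported:
        raise NotImplementedError(f"adaptation needs unsupported ops: {unsupported}")
    # Step 206: perform the identified combination.
    results = [execute(op) for op in combination]
    # Step 208: the results are returned so that the caller can store them.
    return results


log = []


def execute(op):
    # Stand-in for a call to the underlying file system's API.
    log.append(op)
    return f"{op}: ok"


supported_ops = {"copy_prefix", "delete", "rename"}
adaptations = {"truncate": ["copy_prefix", "delete", "rename"]}
results = handle_instruction("truncate", supported_ops, adaptations, execute)
```

In this sketch the truncate instruction is never sent to the storage system; only the three supported commands are.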

For the sake of clarity, the processes and methods herein have been illustrated with a specific flow, but it should be understood that other sequences may be possible and that some may be performed in parallel, without departing from the spirit of the invention. Further, though the techniques herein teach creating one SwR sample in parallel, those with ordinary skill in the art will readily appreciate that the techniques are easily extendable to generate many SwR samples. Additionally, steps may be subdivided or combined. As disclosed herein, software written in accordance with the present invention may be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor.

All references cited herein are intended to be incorporated by reference. Although the present invention has been described above in terms of specific embodiments, it is anticipated that alterations and modifications to this invention will no doubt become apparent to those skilled in the art and may be practiced within the scope and equivalents of the appended claims. More than one computer may be used, such as by using multiple computers in a parallel or load-sharing arrangement or distributing tasks across multiple computers such that, as a whole, they perform the functions of the components identified herein; i.e. they take the place of a single computer. Various functions described above may be performed by a single process or groups of processes, on a single computer or distributed over several computers. Processes may invoke other processes to handle certain tasks. A single storage device may be used, or several may be used to take the place of a single storage device. The disclosed embodiments are illustrative and not restrictive, and the invention is not to be limited to the details given herein. There are many alternative ways of implementing the invention. It is therefore intended that the disclosure and following claims be interpreted as covering all such alterations and modifications as fall within the true spirit and scope of the invention.

Claims

1. A method for managing data, comprising:

storing a file in a first storage system having a first file system, wherein the first storage system is selected by a storage abstraction layer based at least in part on whether file system functionality associated with usage statistics of the file is supported by the first file system of the first storage system;
receiving an initial instruction associated with performing a first action in relation to a file stored in a first storage system;
determining that the initial instruction associated with performing the first action is not supported by the first storage system based at least in part on metadata associated with the first storage system, wherein the metadata associated with the first storage system is stored in meta store that includes information respectively associated with one or more storage systems;
in response to determining that the initial instruction associated with performing the first action is not supported by the first storage system, identifying a combination of instructions to the first storage system, wherein the identifying of the combination of instructions includes storing an update in a second file and merging the second file with the file after the update is complete, wherein the combination of instructions is based on the initial instruction and performs the first action by performing a series of actions that have a collective result that is equivalent to a result of the first action, and wherein the combination of instructions are determined based at least in part on the metadata associated with the first storage system, and wherein the metadata associated with the first storage system indicates a mechanism for accessing a file system corresponding to the first storage system;
performing the identified combination of instructions on the file stored in the first storage system; and
storing results of the performed identified combination of instructions.

2. The method as recited in claim 1, wherein the initial instruction includes a truncate instruction.

3. The method as recited in claim 2, wherein the combination of instructions includes copy a first portion of the file, delete the file, and rename the first portion of the file.

4. The method as recited in claim 1, wherein the initial instruction includes an update instruction.

5. A system for managing data, comprising a storage device and a processor configured to:

store a file in a first storage system having a first file system, wherein the first storage system is selected by a storage abstraction layer based at least in part on whether file system functionality associated with usage statistics of the file is supported by the first file system of the first storage system;
receive an initial instruction associated with performing a first action in relation to a file stored in a first storage system;
determine that the initial instruction associated with performing the first action is not supported by the first storage system based at least in part on metadata associated with the first storage system, wherein the metadata associated with the first storage system is stored in meta store that includes information respectively associated with one or more storage systems;
in response to determining that the initial instruction associated with performing the first action is not supported by the first storage system, identify a combination of instructions to the first storage system, wherein to identify the combination of instructions includes to store an update in a second file and merging the second file with the file after the update is complete, wherein the combination of instructions is based on the initial instruction and performs the first action by performing a series of actions that have a collective result that is equivalent to a result of the first action, and wherein the combination of instructions are determined based at least in part on the metadata associated with the first storage system, and wherein the metadata associated with the first storage system indicates a mechanism for accessing a file system corresponding to the first storage system;
perform the identified combination of instructions on the file stored in the first storage system; and
store results of the performed identified combination of instructions.

6. The system as recited in claim 5, wherein the initial instruction includes a truncate instruction.

7. The system as recited in claim 6, wherein the combination of instructions includes copy a first portion of the file, delete the file, and rename the first portion of the file.

8. The system as recited in claim 5, wherein the initial instruction includes an update instruction.

9. A computer program product for processing data, comprising a non-transitory computer readable medium having program instructions embodied therein for:

storing a file in a first storage system having a first file system, wherein the first storage system is selected by a storage abstraction layer based at least in part on whether file system functionality associated with usage statistics of the file is supported by the first file system of the first storage system;
receiving an initial instruction associated with performing a first action in relation to a file stored in a first storage system;
determining that the initial instruction associated with performing the first action is not supported by the first storage system based at least in part on metadata associated with the first storage system, wherein the metadata associated with the first storage system is stored in meta store that includes information respectively associated with one or more storage systems;
in response to determining that the initial instruction associated with performing the first action is not supported by the first storage system, identifying a combination of instructions to the first storage system, wherein the identifying of the combination of instructions includes storing an update in a second file and merging the second file with the file after the update is complete, wherein the combination of instructions is based on the initial instruction and performs the first action by performing a series of actions that have a collective result that is equivalent to a result of the first action, and wherein the combination of instructions are determined based at least in part on the metadata associated with the first storage system, and wherein the metadata associated with the first storage system indicates a mechanism for accessing a file system corresponding to the first storage system;
performing the identified combination of instructions on the file stored in the first storage system; and
storing results of the performed identified combination of instructions.

10. The computer program product as recited in claim 9, wherein the initial instruction includes a truncate instruction.

11. The computer program product as recited in claim 10, wherein the combination of instructions includes copy a first portion of the file, delete the file, and rename the first portion of the file.

12. The computer program product as recited in claim 9, wherein the initial instruction includes an update instruction.

13. The method of claim 1, wherein the determining that the initial instruction associated with the first action is not supported by the first storage system and the identifying of the combination of instructions are performed, or caused to be performed, by the storage abstraction layer.

14. The method of claim 13, wherein the storage abstraction layer selects a file system of the one or more storage systems in which the file is to be stored based at least in part on a usage statistic of the file and the metadata associated with the corresponding file storage system.

15. The method of claim 13, wherein in response to a second storage system being added, the storage abstraction layer updates the meta store to include information corresponding to the second storage system.

16. The method of claim 1, wherein the information respectively associated with the one or more storage systems comprises one or more of an Application Program Interface (API) information of a corresponding interface of the one or more storage systems, an attribute of the one or more storage systems, and metadata associated with the one or more storage systems.

17. The method of claim 1, wherein the combination of instructions comprises a plurality of actions that are collectively mapped to the initial instruction.

References Cited
U.S. Patent Documents
5655116 August 5, 1997 Kirk et al.
5706514 January 6, 1998 Bonola
6718372 April 6, 2004 Bober
6745385 June 1, 2004 Lupu
6996582 February 7, 2006 Daniels
7035931 April 25, 2006 Zayas et al.
7069421 June 27, 2006 Yates, Jr.
7313512 December 25, 2007 Traut
7613947 November 3, 2009 Coatney et al.
7689535 March 30, 2010 Bernard
7702625 April 20, 2010 Peterson et al.
7716261 May 11, 2010 Black
7720841 May 18, 2010 Gu et al.
7739316 June 15, 2010 Thompson
7827201 November 2, 2010 Gordon
7949693 May 24, 2011 Mason
7958303 June 7, 2011 Shuster
7978544 July 12, 2011 Bernard
8028290 September 27, 2011 Rymarczyk
8051113 November 1, 2011 Shekar
8131739 March 6, 2012 Wu
8180813 May 15, 2012 Goodson et al.
8185488 May 22, 2012 Chakravarty et al.
8195769 June 5, 2012 Miloushev et al.
8200723 June 12, 2012 Sears
8219681 July 10, 2012 Glade
8301822 October 30, 2012 Pinto et al.
8312037 November 13, 2012 Batchavachalu et al.
8352429 January 8, 2013 Mamidi et al.
8417681 April 9, 2013 Miloushev et al.
8452821 May 28, 2013 Shankar et al.
8533183 September 10, 2013 Hokanson
8577911 November 5, 2013 Stepinski et al.
8682853 March 25, 2014 Zane et al.
8682922 March 25, 2014 Boneti
8971916 March 3, 2015 Joyce et al.
20020049782 April 25, 2002 Herzenberg et al.
20020133810 September 19, 2002 Giles
20020146035 October 10, 2002 Tyndall
20030172094 September 11, 2003 Lauria et al.
20030229637 December 11, 2003 Baxter et al.
20040054748 March 18, 2004 Ackaouy et al.
20040088282 May 6, 2004 Xu et al.
20040143571 July 22, 2004 Bjornson et al.
20050165777 July 28, 2005 Hurst-Hiller et al.
20050216788 September 29, 2005 Mani-Meitav et al.
20060005188 January 5, 2006 Vega
20060010433 January 12, 2006 Neil
20060136653 June 22, 2006 Traut
20060146057 July 6, 2006 Blythe
20060149793 July 6, 2006 Kushwah et al.
20060173751 August 3, 2006 Schwarze et al.
20060248528 November 2, 2006 Oney
20080059746 March 6, 2008 Fisher
20080281802 November 13, 2008 Peterson et al.
20080313183 December 18, 2008 Cunningham et al.
20090007105 January 1, 2009 Fries
20090089344 April 2, 2009 Brown et al.
20090132609 May 21, 2009 Barsness et al.
20090265400 October 22, 2009 Pudipeddi et al.
20090328225 December 31, 2009 Chambers
20100036840 February 11, 2010 Pitts
20100042655 February 18, 2010 Tse et al.
20100145917 June 10, 2010 Bone et al.
20100241673 September 23, 2010 Wu
20100287170 November 11, 2010 Liu et al.
20110113052 May 12, 2011 Hörnkvist et al.
20110137966 June 9, 2011 Srinivasan
20110153662 June 23, 2011 Stanfill et al.
20110153697 June 23, 2011 Nickolov et al.
20110313973 December 22, 2011 Srivas et al.
20120023145 January 26, 2012 Brannon et al.
20120036107 February 9, 2012 Miloushev et al.
20120066274 March 15, 2012 Stephenson
20120095952 April 19, 2012 Archambeau et al.
20120095992 April 19, 2012 Cutting et al.
20120185913 July 19, 2012 Martinez et al.
20120310916 December 6, 2012 Abadi et al.
20120311572 December 6, 2012 Falls
20120317388 December 13, 2012 Driever et al.
20130166543 June 27, 2013 MacDonald et al.
20130185735 July 18, 2013 Farrell
20130198716 August 1, 2013 Huang
20130246347 September 19, 2013 Sorenson
20130275653 October 17, 2013 Ranade et al.
20140149392 May 29, 2014 Wang et al.
20140188845 July 3, 2014 Ah-Soon et al.
20140337323 November 13, 2014 Soep et al.
20150120711 April 30, 2015 Liensberger et al.
Patent History
Patent number: 9411832
Type: Grant
Filed: Mar 15, 2013
Date of Patent: Aug 9, 2016
Assignee: EMC Corporation (Hopkinton, MA)
Inventors: Lei Chang (Beijing), Tao Ma (Beijing), Zhanwei Wang (Beijing), Lirong Jian (Beijing), Lili Ma (Beijing), Gavin Sherry (San Mateo, CA)
Primary Examiner: Hexing Liu
Application Number: 13/843,067
Classifications
Current U.S. Class: Operation (712/30)
International Classification: G06F 7/00 (20060101); G06F 17/30 (20060101);