Method and Apparatus for Storing Data in a Peer to Peer Network

A fixed prefix peer to peer network has a number of physical nodes. The nodes are logically divided into a number of storage slots. Blocks of data are erasure coded into original and redundant data fragments and the resultant fragments of data are stored in slots on separate physical nodes such that no physical node has more than one original and/or redundant fragment. The storage locations of all of the fragments are organized into a logical virtual node (e.g., a supernode). Thus, the supernode and the original block of data can be recovered even if some of the physical nodes are lost.

Description

This application claims the benefit of U.S. Provisional Application No. 60/890,661, filed Feb. 20, 2007, which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention relates generally to peer to peer networking and more particularly to storing data in peer to peer networks.

Peer to peer networks for storing data may be overlay networks that allow data to be distributively stored in the network (e.g., at nodes). In peer to peer networks, there are links between any two peers (e.g., nodes) that communicate with each other. That is, nodes in the peer to peer network may be considered as being connected by virtual or logical links, each of which corresponds to a path in the underlying network (e.g., a path of physical links). Such a structured peer to peer network employs a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has desired data (e.g., a file, piece of data, packet, etc.). A common type of structured peer to peer network uses a distributed hash table (DHT) in which a variant of consistent hashing is used to assign ownership of each file or piece of data to a particular peer in a way analogous to a traditional hash table's assignment of each key to a particular array slot.
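By way of background illustration only, a consistent-hashing style assignment of keys to peers can be sketched in a few lines of Python; the class and function names here (Node, owner_of) are illustrative and are not taken from this disclosure:

import hashlib
from bisect import bisect_left

class Node:
    """A peer placed at a point on a circular hash space (illustrative)."""
    def __init__(self, name):
        self.name = name
        self.point = int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** 32)

def owner_of(key, nodes):
    """Assign the key to the node whose point is the first at or after the
    key's hash, wrapping around the space (consistent-hashing style)."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** 32)
    ring = sorted(nodes, key=lambda n: n.point)
    points = [n.point for n in ring]
    return ring[bisect_left(points, h) % len(ring)]

nodes = [Node("peer-a"), Node("peer-b"), Node("peer-c")]
print(owner_of("example-object", nodes).name)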

However, traditional DHTs do not readily support data redundancy and may compromise the integrity of data stored in systems using DHTs. To overcome these obstacles in existing peer to peer networks, files or pieces of data are N-way replicated, but this results in high storage overhead and often requires multiple hashing functions to locate copies of the data. Further, it is difficult to add support for monitoring data resiliency and automatic rebuilding of missing data.

Accordingly, improved systems and methods of organizing and storing data in peer to peer networks are required.

BRIEF SUMMARY OF THE INVENTION

The present invention generally provides a method of storing data in a fixed prefix peer to peer network having a plurality of physical nodes. A plurality of data fragments are generated by erasure coding a block of data and each of the data fragments are then stored in different physical nodes. In one embodiment, the erasure coding divides the block of data into a number of original fragments and a number of redundant fragments are created where the number of redundant fragments is equal to a predetermined network cardinality minus the number of original data fragments. The physical nodes in the peer to peer network are logically divided into storage slots and the data fragments are stored in different slots on different physical nodes. The storage locations of the fragments (e.g., the slots) are logically organized into a virtual node.

To generate the data fragments by erasure coding, a network cardinality is determined, the block of data is divided into a number of original fragments, and a number of redundant fragments are created wherein the number of redundant fragments is equal to the network cardinality minus the number of original data fragments.

The storage locations of the plurality of data fragments are mapped in a data structure in which the storage locations are the physical nodes in which the plurality of data fragments are stored. In some embodiments, the data structure is a distributed hash table.

These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;

FIG. 2 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;

FIG. 3 is a diagram of an exemplary peer to peer network according to an embodiment of the invention;

FIG. 4 is an exemplary supernode composition and component description table 400 according to an embodiment of the present invention;

FIG. 5 is a depiction of data to be stored in a peer to peer network;

FIG. 6 is a flowchart of a method of storing data in a fixed prefix peer to peer network according to an embodiment of the present invention; and

FIG. 7 is a schematic drawing of a controller according to an embodiment of the invention.

DETAILED DESCRIPTION

The present invention extends the concept of Distributed Hash Tables (DHTs) to create a more robust peer to peer network. The improved methods of storing data described herein allow for a simple DHT organization with built-in support for multiple classes of data redundancy which have a smaller storage overhead than previous DHTs. Embodiments of the invention also support automatic monitoring of data resilience and automatic reconstruction of lost and/or damaged data.

The present invention provides greater robustness and resiliency to the DHT-based peer to peer network known as a Fixed Prefix Network (FPN) disclosed in U.S. patent application Ser. No. 10/813,504, filed Mar. 30, 2004 and incorporated herein by reference. Unlike traditional peer to peer networks, FPNs and networks according to the present invention, known as FPNs with Supernodes (FPN/SN), are constructed such that the contributed resources (e.g., nodes) are dedicated to the peer to peer system and the systems are accordingly significantly more stable and scalable.

FIGS. 1-3 depict various illustrative embodiments of peer to peer networks utilizing FPN/SNs. FIGS. 1-3 are exemplary diagrams to illustrate the various structures and relationships described below and are not meant to limit the invention to the specific network layouts shown.

FIG. 1 is a diagram of an exemplary peer to peer network 100 for use with an embodiment of the present invention. The peer to peer network 100 has a plurality of physical nodes 102, 104, 106, and 108 that communicate with each other through an underlying transport network 110 as is known. There is no restriction on the location, grouping, or number of the physical nodes 102-108 with regards to the present invention. Though depicted in FIG. 1 as four physical nodes 102-108, it is understood that any number of nodes in any arrangement may be utilized. Similarly, the physical nodes 102-108 may vary in actual storage space, processing power, and/or other resources.

Physical nodes 102-108 each have associated memories and/or storage areas (not shown) as is known. The memories and/or storage areas of physical nodes 102-108 are each logically divided into a plurality of slots, the number of which is approximately proportional to the amount of storage available to that physical node. That is, the memory and/or storage area of physical node 102 is logically divided into approximately equivalent-sized slots 112a, 112b, 112c, and 112d, the memory and/or storage area of physical node 104 is logically divided into approximately equivalent-sized slots 114a, 114b, 114c, and 114d, the memory and/or storage area of physical node 106 is logically divided into approximately equivalent-sized slots 116a, 116b, 116c, and 116d, and the memory and/or storage area of physical node 108 is logically divided into approximately equivalent-sized (e.g., in terms of storage capacity) slots 118a, 118b, 118c, and 118d. A physical node may be logically divided in that its memory and/or storage may be allocated as different storage areas (e.g., slots). Physical nodes 102-108 may be divided into any appropriate number of slots, the slots being representative of an amount of storage space in the node. In other words, data may be stored in the nodes 102-108 in a sectorized or otherwise compartmentalized manner. Of course, any appropriate division of the storage and/or memory of physical nodes 102-108 may be used, and slots 112a-d, 114a-d, 116a-d, and 118a-d may be of unequal size. Further, slot size need not be static and may grow or shrink, and slots may be split and/or merged with other slots.
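A minimal sketch of such a logical division, assuming a uniform target slot size purely for illustration (the 250 GB figure and the naming scheme are not taken from this disclosure):

def divide_into_slots(node_capacities_gb, target_slot_gb=250):
    """Logically divide each physical node's storage into approximately
    equal slots, the slot count being roughly proportional to capacity."""
    slots = {}
    for node, capacity in node_capacities_gb.items():
        count = max(1, round(capacity / target_slot_gb))
        slots[node] = ["%s-slot-%d" % (node, i) for i in range(count)]
    return slots

print(divide_into_slots({"node-102": 1000, "node-104": 500}))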

Each physical node 102-108 is responsible for the storage and retrieval of one or more objects (e.g., files, data, pieces of data, data fragments, etc.) in the slots 112a-d, 114a-d, 116a-d, and 118a-d, respectively. Each object may be associated with a preferably fixed-size hash key of a hash function. In operation, one or more clients 120 may communicate with one or more of physical nodes 102-108 and issue a request for a particular object using a hash key.

Slots 112a-d, 114a-d, 116a-d, and 118a-d may also each be associated with a component of a virtual (e.g., logical) node (discussed in further detail below with respect to FIGS. 2 and 3). Herein, components are not physical entities, but representations of a portion of a virtual node. That is, components may be logical representations of and/or directions to or addresses for a set or subset of data that is hosted in a particular location in a node (e.g., hosted in a slot). Storage locations of data fragments (e.g., data fragments discussed below with respect to FIG. 5) are logically organized into a virtual node.

FIG. 2 is a diagram of a portion of an exemplary peer to peer network 200 for use with an embodiment of the present invention. The peer to peer network 200 is similar to peer to peer network 100 and has a plurality of physical nodes 202, 204, 206, 208, 210, and 212 similar to physical nodes 102-108. Physical nodes 202-212 are each logically divided into a plurality of slots approximately proportional to the amount of storage available to each physical node. That is, physical node 202 is divided logically into slots 214a, 214b, 214c, and 214d, physical node 204 is divided logically into slots 216a, 216b, 216c, and 216d, physical node 206 is divided logically into slots 218a, 218b, 218c, and 218d, physical node 208 is divided logically into slots 220a, 220b, 220c, and 220d, physical node 210 is divided logically into slots 222a, 222b, 222c, and 222d, and physical node 212 is divided logically into slots 224a, 224b, 224c, and 224d. For simplicity of discussion and depiction in FIG. 2, since each slot 214a-d, 216a-d, 218a-d, 220a-d, 222a-d, and 224a-d hosts a component, the component corresponding to its host slot is referred to herein with the same reference numeral. For example, the component hosted in slot 214c of physical node 202 is referred to as component 214c.

A grouping of multiple components is referred to as a virtual node (e.g., a “supernode”). In the example of FIG. 2, supernode 226 comprises components 214b, 216c, 218b, 220d, 222a, and 224a. A virtual node (e.g., supernode) is thus a logical grouping of a plurality of storage locations on multiple physical nodes. The supernode may have components associated with any number of physical nodes in a network, and a supernode need not have components from every physical node. However, each component of a supernode must be hosted in a slot on a different physical node. That is, no two components in a supernode should be hosted at the same physical node. The total number of components in a supernode is given by a predetermined constant referred to as the supernode cardinality. In some embodiments, the supernode cardinality may be in the range of 4 to 32. The supernode cardinality may also be understood as a predetermined (e.g., desired, designed, etc.) number of data fragments.
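One illustrative way to model the supernode grouping and its placement constraint in code (the class names and the cardinality of six are assumptions made for this sketch):

class Component:
    """A logical placeholder for data hosted in one slot of a physical node."""
    def __init__(self, physical_node, slot):
        self.physical_node = physical_node
        self.slot = slot

class Supernode:
    """A virtual node: a grouping of components hosted on distinct physical nodes."""
    def __init__(self, cardinality):
        self.cardinality = cardinality      # predetermined constant
        self.components = []

    def add_component(self, component):
        if len(self.components) >= self.cardinality:
            raise ValueError("supernode already holds 'cardinality' components")
        if any(c.physical_node == component.physical_node for c in self.components):
            raise ValueError("no two components may share a physical node")
        self.components.append(component)

sn = Supernode(cardinality=6)
sn.add_component(Component("node-202", "slot-214b"))
sn.add_component(Component("node-204", "slot-216c"))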

In some embodiments, a larger supernode cardinality is chosen to increase flexibility in choosing data classes. In alternative embodiments, a smaller supernode cardinality is chosen to provide greater access to storage locations (e.g., disks) in read/write operations. Here, data classes define a level of redundancy where lower data classes (e.g., data class low) have less redundancy and higher data classes (e.g., data class high) have more redundancy. There may be a number of data classes equal to the predetermined supernode cardinality. The lowest data class is defined as having no redundant fragments and the highest data class is defined as having (supernode cardinality minus one) redundant fragments.

In an exemplary embodiment, data class low may refer to a single redundant fragment and data class high may refer to four redundant fragments. Of course, any appropriate number of redundant fragments may be set for data class low and/or data class high. In this exemplary embodiment, data blocks that are classified by a user as data class low will be divided into a number of fragments equal to the supernode cardinality, where there are (supernode cardinality minus one) original fragments and one redundant fragment. Accordingly, one fragment may be lost and the data block may still be recreated. Using data class high (e.g., four redundant fragments), a block of data will be divided into fragments such that four of them are redundant. Thus, four fragments may be lost and the original block of data may still be recreated. Fragmentation, and the creation of redundant fragments in particular, is discussed in further detail below with respect to FIG. 5.
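The fragment-count arithmetic for a given data class can be sketched as follows; the cardinality of six and the sample classes mirror the examples above, while the function name is illustrative:

def fragment_counts(redundant_fragments, supernode_cardinality=6):
    """Return (original, redundant) fragment counts for a data class; the two
    counts always sum to the supernode cardinality."""
    if not 0 <= redundant_fragments <= supernode_cardinality - 1:
        raise ValueError("redundant fragments must be between 0 and cardinality - 1")
    return supernode_cardinality - redundant_fragments, redundant_fragments

print(fragment_counts(1))   # data class low in this example: 5 original, 1 redundant
print(fragment_counts(4))   # data class high in this example: 2 original, 4 redundant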

Components of the supernode may be considered peers and may similarly be associated (e.g., in a hash table, etc.), addressed, and/or contacted as peer nodes in a traditional peer to peer network.

FIG. 3 depicts a high level abstraction of an exemplary peer to peer network 300 according to an embodiment of the invention. Peer to peer network 300 is similar to peer to peer networks 100 and 200 and has multiple physical nodes 302, 304, 306, and 308. Each of the physical nodes 302-308 is divided into multiple slots as described above. In the particular example of FIG. 3, each of the physical nodes 302-308 has eight slots. As in FIG. 2, each slot 310, 312, 314, 316, 318, 320, 322, or 324 hosts a component 310, 312, 314, 316, 318, 320, 322, or 324. Components 310-324 are each associated with a corresponding supernode and are distributed among the physical nodes 302-308. In this way, eight supernodes are formed, each with one component 310-324 on each of the four physical nodes 302-308. For example, a first supernode is formed with four components—component 310 hosted on physical node 302 (e.g., in a slot 310), component 310 hosted in physical node 304 (e.g., in a slot 310), component 310 hosted in physical node 306 (e.g., in a slot 310), and component 310 hosted in physical node 308 (e.g., in a slot 310). The first supernode, comprising components 310, is shown as dashed boxes. A second supernode comprises the four components 312 hosted in physical nodes 302-308 and is shown as a trapezoid. Of course, these are merely graphical representations to highlight the different components comprising different supernodes and are not meant to be literal representations of what a slot, component, node, or supernode might look like. The remaining six supernodes are formed similarly.

To facilitate data storage using the supernodes as described and shown in FIGS. 1-3, the fixed prefix network model of DHTs (e.g., FPN) may be extended to use supernodes. Any advantageous hashing function that maps data (e.g., objects, files, etc.) to a fixed-size hash key may be utilized in the context of the present invention. The hash keys may be understood to be fixed-size bit strings (e.g., 5 bits, 6 bits, etc.) in the space containing all possible combinations of such strings. A subspace of the hash key space is associated with a group of bits of the larger bit string as is known. For example, in a 5-bit hash key space, the group of hash keys beginning with 110 includes exactly those keys whose first three bits are 110 (i.e., 11000, 11001, 11010, and 11011), and excludes keys beginning with any other three-bit prefix. That is, the prefix is 110. Such a subspace of the hash key space may be a supernode and a further specification may be a component of the supernode. The prefix may be fixed for the life of a supernode and/or component. In such embodiments, the peer to peer network is referred to as a fixed prefix peer to peer network. Other methods of hashing may be used as appropriate.
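A brief sketch of fixed-prefix membership, treating hash keys as bit strings; the use of SHA-256 and a 5-bit key are assumptions made for illustration:

import hashlib

def hash_key_bits(data, bits=5):
    """Map data to a fixed-size hash key, shown here as a short bit string."""
    digest = int(hashlib.sha256(data).hexdigest(), 16)
    return format(digest, "0256b")[:bits]

def in_prefix_subspace(hash_key, prefix):
    """True if the hash key falls in the subspace identified by the fixed prefix."""
    return hash_key.startswith(prefix)

key = hash_key_bits(b"example object")
print(key, in_prefix_subspace(key, "110"))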

FIG. 4 is an exemplary supernode composition and component description table 400 according to an embodiment of the present invention. The supernode composition and component description table 400 may be used in conjunction with the peer to peer network 200, for example. Each supernode (e.g., supernode 226) is described by a supernode composition (e.g., with supernode composition and component description table 400) comprising the supernode prefix 402, an array 404 of the component descriptions, and a supernode version 406. Since each component has a description as described below, the size of the array 404 is equal to the supernode cardinality. The supernode version 406 is a sequence number corresponding to the current incarnation of the supernode. Each supernode is identified by a fixed prefix 402 as described above and in U.S. patent application Ser. No. 10/813,504. For example, in hashing and/or storing data in peer to peer network 200 according to supernode composition and component description table 400 in which hash keys are fixed size bit strings, the supernode 226 has a fixed prefix of 01101. Therefore, any data that has a hash key beginning with 01101 will be associated with supernode 226.

In operation, each component (e.g., 214b, 216c, 218b, 220d, 222a, 224a, etc.) in the component array 404 is described by a component description comprising a fixed prefix 408, a component index 410, and a component version 412. All components of the supernode (e.g., in array 404) are also assigned the same fixed prefix for their lifetime. The component index 410 of each component corresponds to a location in the component array 404. A component's index is fixed for the component's lifetime and is an identification number pointing to the particular component. A component index is a number between 0 and (supernode cardinality minus one). A component's version is a version number sequentially increased whenever the component changes hosts (e.g., nodes). In some embodiments, described in detail in concurrently filed U.S. patent application Ser. No. ______, entitled “Methods for Operating a Fixed Prefix Peer to Peer Network”, Attorney Docket No. 06083A, incorporated by reference herein, a component may be split or moved from one physical node to another and its version is increased in such instances.
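The composition and component-description records might be modeled as in the following sketch; the field names track the table of FIG. 4, but the data layout is illustrative rather than prescribed by this disclosure:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ComponentDescription:
    fixed_prefix: str           # same fixed prefix as the owning supernode, e.g. "01101"
    index: int                  # fixed for the component's lifetime, 0..cardinality-1
    version: int = 0            # incremented whenever the component changes hosts
    host: Optional[str] = None  # physical node currently hosting the component

    def move_to(self, new_host):
        """Moving a component to another physical node bumps its version."""
        self.host = new_host
        self.version += 1

@dataclass
class SupernodeComposition:
    prefix: str                              # fixed prefix, e.g. "01101"
    version: int                             # current incarnation of the supernode
    components: List[ComponentDescription]   # array length equals supernode cardinality

composition = SupernodeComposition(
    prefix="01101",
    version=1,
    components=[ComponentDescription("01101", i, 0, "node-%d" % i) for i in range(6)],
)
composition.components[2].move_to("node-99")   # the moved component's version becomes 1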

Supernode composition and component description table 400 is an example of an organization of the information related to physical nodes, supernodes, and their respective components. Of course, one skilled in the art would recognize other methods of organizing and providing such information, such as storing the information locally on physical nodes in a database, storing the information at a remote location in a communal database, etc.

Updated indications of the supernode composition are maintained (e.g., in supernode composition and component description table 400, etc.) to facilitate communication amongst peers. Further, physical nodes associated with the components maintain compositions of neighboring physical and/or virtual nodes. To maintain such compositions, physical nodes associated with components ping peers and neighbors as is known. In this way, a physical node associated with a component may internally ping physical nodes associated with peers in the component's supernode to determine virtual node health and/or current composition. Further, a physical node associated with a component may externally ping physical nodes associated with neighbors (e.g., components with the same index, but belonging to a different supernode) to propagate and/or collect composition information. Of course, other systems and methods of organizing and/or keeping track of supernodes and their components, including version/incarnation information may be used as appropriate.
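A rough sketch of one such maintenance round follows; the message contents, the callback interface, and the rule that a higher supernode version wins are assumptions made for illustration, not details given in this disclosure:

def merge_composition(local, remote):
    """Adopt the remote supernode composition if it is newer (assumed rule:
    the higher supernode version wins; ties keep the local view)."""
    return remote if remote["version"] > local["version"] else local

def ping_round(component, peers, neighbors, send_ping):
    """One maintenance round for the physical node hosting 'component'.
    'send_ping' is an assumed transport callback returning the peer's reply."""
    # Internal pings: peers within the same supernode, to judge virtual-node health.
    healthy = [p for p in peers if send_ping(p) is not None]
    # External pings: same-index components of neighboring supernodes,
    # to propagate and collect composition information.
    for n in neighbors:
        reply = send_ping(n)
        if reply is not None:
            component["composition"] = merge_composition(component["composition"],
                                                         reply["composition"])
    return healthy

stub = lambda target: {"composition": {"version": 2}}
local_component = {"composition": {"version": 1}}
ping_round(local_component, peers=["peer-1"], neighbors=["neighbor-1"], send_ping=stub)
print(local_component["composition"])   # the newer composition (version 2) was adopted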

FIG. 5 is a generalized drawing of data that may be stored in peer to peer networks 100, 200, and/or 300. A block 502 of data may be divided into multiple pieces 504 of data according to any conventional manner. In at least one embodiment, the block of data 502 may be fragmented into multiple original pieces (e.g., fragments) 506 and a number of redundant fragments 508 may also be generated. Such fragmentation and/or fragment generation may be accomplished by erasure coding, replication, and/or other fragmentation means.

FIG. 6 depicts a flowchart of a method 600 of organizing data in a fixed prefix peer to peer network according to an embodiment of the present invention with particular reference to FIGS. 2 and 5 above. Though discussed with reference to the peer to peer network 200 of FIG. 2, the method steps described herein also may be used in peer to peer networks 100 and 300, as appropriate. The method begins at step 602.

In step 604, a network cardinality is determined. Network cardinality may be a predetermined constant for an entire system and may be determined in any appropriate fashion.

In step 606, a plurality of data fragments 506-508 are generated. In at least one embodiment, the data fragments 506-508 are generated from a block of data 502 by utilizing an erasure code. Using the erasure code transforms a block 502 of n (here, four) original pieces of data 504 into more than n fragments of data 506-508 (here, four original fragments and two redundant fragments) such that the original block 502 of n pieces of data 504 can be recovered from a subset of those fragments 506-508. The fraction of the fragments 506-508 required to recover the original n pieces of data 504 is called the rate r. In some embodiments, optimal erasure codes may be used. An optimal erasure code produces n/r fragments of data where any n fragments may be used to recover the original n pieces of data. In alternative embodiments, near-optimal erasure codes may be used to conserve system resources. In the same or alternative embodiments, the block of data 502 is divided into n original fragments 506. Based on the n original fragments 506, m redundant fragments 508 are created, where m equals the supernode cardinality minus n, and the fragment size is equal to the size of the original block of data 502 divided by n. In the example shown in FIG. 5, the erasure coding and creation of redundant fragments 508 allow recreation of the original block of data 502 from half plus one of the fragments, whether redundant fragments 508 and/or original fragments 506; that is, only four total fragments from the group of fragments 506-508 are needed to reconstruct the original block of data 502. Of course, any other erasure coding scheme may be used.
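For illustration, the sketch below encodes only the simplest single-redundancy case, appending one XOR parity fragment to n equal-size original fragments; a general erasure code (e.g., a Reed-Solomon style code) would be needed to produce the two redundant fragments of FIG. 5, and all names here are illustrative:

import functools

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def xor_encode(block, n):
    """Split a block into n equal-size original fragments (padding with zero
    bytes if needed) and append one XOR parity fragment. Single-redundancy
    sketch only; more redundant fragments would require a richer code."""
    frag_size = -(-len(block) // n)                      # ceiling division
    padded = block.ljust(n * frag_size, b"\x00")
    originals = [padded[i * frag_size:(i + 1) * frag_size] for i in range(n)]
    return originals + [functools.reduce(_xor, originals)]

fragments = xor_encode(b"example block of data", n=4)
print(len(fragments), "fragments of", len(fragments[0]), "bytes each")   # 5 fragments of 6 bytes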

In step 608, the data fragments 506-508 are stored in different physical nodes 202-212. Each of the original fragments 506, representing the original pieces of the data block 502, and each of the redundant fragments 508 is stored in a separate physical node 202-212 using any appropriate method of storing data in a peer to peer network. In at least one embodiment, data fragments 506-508 are stored in separate slots 214a-d, 216a-d, 218a-d, 220a-d, 222a-d, 224a-d of the physical nodes 202-212. For example, one fragment from the fragments 506 and 508 may be stored in each of slots 214b, 216c, 218b, 220d, 222a, and 224a.

A hash may be computed based on the original block of data 502. A virtual node (e.g., virtual node 226) is then found that has the same fixed prefix as the prefix of the computed hash. Since virtual node 226 comprises components 214b, 216c, 218b, 220d, 222a, and 224a, the data fragments 506-508 are then stored in the slots 214b, 216c, 218b, 220d, 222a, and 224a corresponding to those components.
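This placement step can be sketched as follows, combining a block-level hash with the prefix test described above; the dictionary layout of a supernode and all names are illustrative assumptions:

import hashlib

def hash_prefix(block, bits=5):
    """Fixed-size hash key of the whole block, shown here as a short bit string."""
    digest = int(hashlib.sha256(block).hexdigest(), 16)
    return format(digest, "0256b")[:bits]

def place_fragments(block, fragments, supernodes):
    """Find the supernode whose fixed prefix matches the block's hash key and
    assign fragment i to the slot behind that supernode's component i.
    A supernode is assumed to be a dict: {"prefix": str, "slots": [(node, slot), ...]}."""
    key = hash_prefix(block)
    target = next(sn for sn in supernodes if key.startswith(sn["prefix"]))
    return {i: target["slots"][i] for i in range(len(fragments))}

supernodes = [
    {"prefix": "0", "slots": [("node-%d" % (202 + 2 * i), "slot-%d" % i) for i in range(6)]},
    {"prefix": "1", "slots": [("node-%d" % (302 + 2 * i), "slot-%d" % i) for i in range(6)]},
]
print(place_fragments(b"example block of data", [b"frag"] * 6, supernodes))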

In step 610, the storage locations of the data fragments 506-508 are recorded (e.g., mapped, etc.) in a data structure. The data structure may be a hash table, a DHT, a DHT according to the FPN referenced above, the data structures described in co-pending and concurrently filed U.S. patent application Ser. No. ______, entitled “Methods for Operating a Fixed Prefix Peer to Peer Network”, Attorney Docket No. 06083A, incorporated by reference herein, or any other appropriate data structure. The data structure may facilitate organization, routing, look-ups, and other functions of peer to peer networks 100, 200, and 300. Fragments 506-508 may be numbered (e.g., from 0 to a supernode cardinality minus one) and fragments of the same number may be stored (e.g., grouped, arranged, etc.) in a logical entity (e.g., a virtual node component).

In step 612, the data structure facilitates organization of information about the data fragments 506-508 into virtual nodes (e.g., supernode 226, supernodes 310-324, etc.). That is, the storage locations (e.g., the slots in the physical nodes) storing each of the original fragments 506 and each of the redundant fragments 508 are organized into and/or recorded as a grouping (e.g., a virtual node/supernode as described above). Accordingly, the fragments 506-508 may be organized into and hosted in supernode 226 as described above so that location, index, and version information about the fragments of data 506-508 may be organized as components of supernode 226.

The method ends at step 614.

FIG. 7 is a schematic drawing of a controller 700 according to an embodiment of the invention. Controller 700 contains a processor 702 that controls the overall operation of the controller 700 by executing computer program instructions that define such operation. The computer program instructions may be stored in a storage device 704 (e.g., magnetic disk, database, etc.) and loaded into memory 706 when execution of the computer program instructions is desired. Thus, applications for performing the herein-described method steps of method 600, such as erasure coding, data storage, and DHT organization, are defined by the computer program instructions stored in the memory 706 and/or storage 704 and controlled by the processor 702 executing the computer program instructions. The controller 700 may also include one or more network interfaces 708 for communicating with other devices via a network (e.g., a peer to peer network, etc.). The controller 700 also includes input/output devices 710 (e.g., display, keyboard, mouse, speakers, buttons, etc.) that enable user interaction with the controller 700. Controller 700 and/or processor 702 may include one or more central processing units, read only memory (ROM) devices and/or random access memory (RAM) devices. One skilled in the art will recognize that an implementation of an actual controller could contain other components as well, and that the controller of FIG. 7 is a high level representation of some of the components of such a controller for illustrative purposes.

According to some embodiments of the present invention, instructions of a program (e.g., controller software) may be read into memory 706, such as from a ROM device to a RAM device or from a LAN adapter to a RAM device. Execution of sequences of the instructions in the program may cause the controller 700 to perform one or more of the method steps described herein, such as those described above with respect to method 600 and/or erasure coding as described above with respect to FIG. 5. In alternative embodiments, hard-wired circuitry or integrated circuits may be used in place of, or in combination with, software instructions for implementation of the processes of the present invention. Thus, embodiments of the present invention are not limited to any specific combination of hardware, firmware, and/or software. The memory 706 may store the software for the controller 700, which may be adapted to execute the software program and thereby operate in accordance with the present invention and particularly in accordance with the methods described in detail above. However, it would be understood by one of ordinary skill in the art that the invention as described herein could be implemented in many different ways using a wide range of programming techniques as well as general purpose hardware sub-systems or dedicated controllers.

Such programs may be stored in a compressed, uncompiled and/or encrypted format. The programs furthermore may include program elements that may be generally useful, such as an operating system, a database management system, and device drivers for allowing the controller to interface with computer peripheral devices, and other equipment/components. Appropriate general purpose program elements are known to those skilled in the art, and need not be described in detail herein.

The inventive methods of organizing a peer to peer network described herein improve network resiliency. Since each supernode includes the fragments derived from an original block of data (e.g., by erasure coding) and each of the fragments is thus stored on a separate physical node, the network is less susceptible to failure due to network changes. That is, changes to the peer physical nodes such as failures and node departures are less likely to affect the peer to peer network because of the distributed nature of the data.

Accordingly, the inventive methods may be employed on a peer to peer network. A controller (e.g., controller 700) may perform hashing functions to store and/or look up one or more pieces of data in the peer to peer network. The controller may further be configured to recover the stored data should one or more of the physical nodes be lost (e.g., through failure, inability to communicate, etc.). Of course, the physical nodes in the peer to peer network may be configured to perform one or more of the functions of the controller instead.
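As a companion to the single-parity encoding sketch above, recovery after the loss of any one fragment can be sketched as follows; a real deployment would instead recover from any sufficient subset of fragments under its chosen erasure code, and the names here are illustrative:

import functools

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def recover_missing(fragments):
    """Given n original fragments plus one XOR parity, with exactly one entry
    replaced by None (lost), rebuild the missing fragment from the survivors."""
    missing = fragments.index(None)
    survivors = [f for f in fragments if f is not None]
    restored = list(fragments)
    restored[missing] = functools.reduce(_xor, survivors)
    return restored

originals = [b"abcdef", b"ghijkl", b"mnopqr", b"stuvwx"]
parity = functools.reduce(_xor, originals)
damaged = [originals[0], originals[1], None, originals[3], parity]  # fragment 2 lost
print(recover_missing(damaged)[2])   # prints b"mnopqr"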

The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.

Claims

1. A method of storing data in a fixed prefix peer to peer network having a plurality of physical nodes comprising:

generating a plurality of data fragments by erasure coding a block of data;
storing each of the plurality of data fragments in different physical nodes.

2. The method of claim 1 wherein storing each of the plurality of data fragments in different physical nodes comprises:

logically dividing each of the physical nodes into a plurality of slots; and
storing each of the plurality of data fragments in different slots on different physical nodes.

3. The method of claim 2 further comprising:

associating the different slots on different physical nodes as a virtual node.

4. The method of claim 1 wherein generating a plurality of data fragments by erasure coding a block of data comprises:

determining a network cardinality;
dividing the block of data into a number of original fragments; and
creating a plurality of redundant fragments wherein the number of redundant fragments is equal to the network cardinality minus the number of original data fragments.

5. The method of claim 1 further comprising mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.

6. The method of claim 5 wherein the data structure is a distributed hash table.

7. A peer to peer network for storing fragments of data comprising:

a plurality of physical nodes each logically divided into a plurality of slots;
a plurality of controllers associated with each of the physical nodes and configured to associate the plurality of physical nodes as one or more logical nodes comprising a grouping of slots wherein each of the one or more logical nodes includes slots from more than one of the physical nodes.

8. The peer to peer network of claim 7 wherein the controllers are further configured to store a plurality of data fragments in the grouping of slots.

9. The peer to peer network of claim 8 wherein the controllers are further configured to map storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.

10. A machine readable medium having program instructions stored thereon, the instructions capable of execution by a processor and defining the steps of:

generating a plurality of data fragments by erasure coding a block of data;
storing each of the plurality of data fragments in different physical nodes.

11. The machine readable medium of claim 10 wherein the instructions further define the steps of:

logically dividing each of the physical nodes into a plurality of slots; and
storing each of the plurality of data fragments in different slots on different physical nodes.

12. The machine readable medium of claim 10 wherein the instructions further define the step of:

associating the different slots on different physical nodes as a virtual node.

13. The machine readable medium of claim 10 wherein the instructions further define the step of:

mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.

14. The machine readable medium of claim 13 wherein the instructions further define the step of:

mapping the storage locations of the plurality of data fragments in a distributed hash table.

15. An apparatus for storing data in a fixed prefix peer to peer network having a plurality of physical nodes comprising:

means for generating a plurality of data fragments by erasure coding a block of data;
means for storing each of the plurality of data fragments in different physical nodes.

16. The apparatus of claim 15 wherein the means for storing each of the plurality of data fragments in different physical nodes comprises:

means for logically dividing each of the physical nodes into a plurality of slots; and
means for storing each of the plurality of data fragments in different slots on different physical nodes.

17. The apparatus of claim 16 further comprising:

means for associating the different slots on different physical nodes as a virtual node.

18. The apparatus of claim 15 further comprising:

means for mapping storage locations of the plurality of data fragments in a data structure wherein the storage locations are the physical nodes in which the plurality of data fragments are stored.
Patent History
Publication number: 20080201335
Type: Application
Filed: Jan 31, 2008
Publication Date: Aug 21, 2008
Applicant: NEC LABORATORIES AMERICA, INC. (Princeton, NJ)
Inventors: Cezary Dubnicki (Monmouth Junction, NJ), Leszek Gryz (Princeton, NJ), Krzysztof Lichota (Warszawa), Cristian Ungureanu (Princeton, NJ)
Application Number: 12/023,133
Classifications
Current U.S. Class: 707/10; File Systems; File Servers (epo) (707/E17.01)
International Classification: G06F 17/30 (20060101);