Abstract: A self-healing computer storage system utilizes a proxy storage management process to service memory access requests directed to stored objects whose designated storage management process has failed. The proxy accesses the relevant parts of the stored object's fault tolerance information to service memory access requests, updating the stored object's fault tolerance information to reflect any changes. When the previously failed storage management process is restarted, it determines whether the fault tolerance information for any of the objects (or parts thereof) it manages has been modified (i.e., by a proxy). If such an indication is found, the restarting storage management process reconstructs its stored object data (and metadata) from the stored objects' fault tolerance information.
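The restart step described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the record layout, the `ft_modified` flag, the `restart` function, and the treatment of fault tolerance information as a plain mirror copy of the data are all assumptions made for the example.

```python
def restart(store: dict) -> list:
    """Rebuild objects whose fault tolerance information a proxy changed.

    `store` maps a hypothetical object id to a record with the keys
    `data`, `metadata`, `ft` (fault tolerance copy) and `ft_modified`.
    """
    rebuilt = []
    for object_id, record in store.items():
        if record["ft_modified"]:
            # Assumption: the fault tolerance information is a mirror copy,
            # so data is restored by copying and metadata is re-derived.
            record["data"] = record["ft"]
            record["metadata"] = {"size": len(record["ft"])}
            record["ft_modified"] = False
            rebuilt.append(object_id)
    return rebuilt


store = {
    "a": {"data": b"old", "metadata": {"size": 3}, "ft": b"newer", "ft_modified": True},
    "b": {"data": b"same", "metadata": {"size": 4}, "ft": b"same", "ft_modified": False},
}
rebuilt_ids = restart(store)
```

Only object "a" is rebuilt here, because only its fault tolerance information was marked as modified while the managing process was down.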
Abstract: A method to manage storage of an object in a computer system having a first and a second storage management process (wherein the stored object has a data portion, a metadata portion and a fault tolerance data portion) includes receiving a memory access request from a client process, routing the memory access request to the first storage management process, determining the first storage management process has failed, routing the memory access request to the second storage management process (having access to the fault tolerance data portion), receiving a result from the second storage management process, and returning at least a portion of the result to the client process. The second storage management process may reconstruct at least a portion of the metadata portion, modify the fault tolerance data portion in accordance with the memory access request, and store the modified fault tolerance information.
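The routing sequence in this method can be sketched as below. The class names, the `route` function, and the use of a mirror copy as the fault tolerance data portion are hypothetical choices for illustration; the abstract does not specify how failure is detected or how the fault tolerance data is encoded.

```python
from dataclasses import dataclass, field


class ProcessFailed(Exception):
    """Raised when a request reaches a storage management process that is down."""


@dataclass
class StoredObject:
    data: bytes = b""
    metadata: dict = field(default_factory=dict)
    fault_tolerance: bytes = b""   # assumption: a plain mirror of `data`
    ft_modified: bool = False      # set when a proxy changes the mirror


class PrimaryManager:
    """First storage management process: owns data, metadata and the mirror."""

    def __init__(self, obj):
        self.obj = obj
        self.alive = True

    def handle(self, op, payload=b""):
        if not self.alive:
            raise ProcessFailed("primary is down")
        if op == "write":
            self.obj.data = payload
            self.obj.metadata["size"] = len(payload)
            self.obj.fault_tolerance = payload
        return self.obj.data


class ProxyManager:
    """Second storage management process: sees only the fault tolerance portion."""

    def __init__(self, obj):
        self.obj = obj

    def handle(self, op, payload=b""):
        if op == "write":
            self.obj.fault_tolerance = payload
            self.obj.ft_modified = True
        return self.obj.fault_tolerance


def route(primary, proxy, op, payload=b""):
    """Try the first process; on failure, route the same request to the second."""
    try:
        return primary.handle(op, payload)
    except ProcessFailed:
        return proxy.handle(op, payload)


obj = StoredObject()
primary, proxy = PrimaryManager(obj), ProxyManager(obj)
route(primary, proxy, "write", b"v1")            # served by the primary
primary.alive = False                            # simulate a failure
result = route(primary, proxy, "write", b"v2")   # served by the proxy
```

After the simulated failure, the client still receives a result, while the primary's own data portion stays stale until it restarts and reconstructs from the modified fault tolerance portion.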
Abstract: The invention discloses an apparatus and process in which data files are distributed across a large-scale data processing system to protect against the loss of data due to the failure of one or more fault domains. Specifically, the invention provides significant advances in database management by distributing data across N fault domains using one or more of a multitude of deterministic functions to protect against failure.
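One possible deterministic function of the kind mentioned above is sketched here. The abstract does not name a specific function, so the hash-then-rotate scheme, the `place_replicas` name, and its parameters are assumptions chosen for illustration.

```python
import hashlib


def place_replicas(file_id: str, num_domains: int, copies: int) -> list:
    """Map a file to `copies` distinct fault domains out of `num_domains`.

    The same inputs always yield the same placement, and no two copies
    share a fault domain, so losing one domain loses at most one copy.
    """
    if not 1 <= copies <= num_domains:
        raise ValueError("copies must be between 1 and num_domains")
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % num_domains
    # Consecutive domains modulo N are distinct because copies <= N.
    return [(start + i) % num_domains for i in range(copies)]


placement = place_replicas("reports/2024.db", num_domains=5, copies=3)
```

Because the placement is computed rather than looked up, any node can locate the copies of a file without consulting a central table.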
Abstract: The invention discloses an apparatus and process in which data files are distributed across a large-scale data processing system to balance workloads and storage loads across a plurality of nodes. Specifically, the invention provides significant advances in database management by distributing metadata among a plurality of file storage nodes so that file data is stored in isolated, distributed fashion in a distributed computing environment. This provides efficient allocation of storage space and workloads among nodes.
Abstract: Method and apparatus for improved detection of multiple-bit errors which occur within a single memory circuit. In one embodiment, a computer system is described which includes a main computer and a memory system. The memory system includes a plurality of memory circuits, at least one of the plurality of memory circuits having a data interface more than two bits wide. Also included is a multiple-bit-error-detect (MBED) circuit, wherein bits from the plurality of memory circuits are coupled to the MBED circuit in an order which causes the MBED circuit to preferentially detect multiple-bit errors which occur on the data interface of any single one of the plurality of memory circuits.
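The effect of bit ordering on detection can be illustrated with a simple parity model. This is not the MBED circuit from the abstract: the four-chip, four-bit layout, the two parity schemes, and all function names are assumptions used only to show why the coupling order matters.

```python
from itertools import product

CHIPS = 4   # hypothetical number of memory circuits
WIDTH = 4   # hypothetical data-interface width of each circuit


def interleaved_check(word):
    """Parity group b covers bit b of every chip (one bit per chip per group)."""
    return [sum(word[c][b] for c in range(CHIPS)) % 2 for b in range(WIDTH)]


def per_chip_check(word):
    """Parity group c covers all bits of chip c (the unfavorable ordering)."""
    return [sum(word[c]) % 2 for c in range(CHIPS)]


def undetected_single_chip_errors(check):
    """Count multiple-bit error patterns confined to one chip that go unnoticed."""
    base = [[0] * WIDTH for _ in range(CHIPS)]
    reference = check(base)
    missed = 0
    for chip in range(CHIPS):
        for pattern in product([0, 1], repeat=WIDTH):
            if sum(pattern) < 2:
                continue  # only multiple-bit errors are of interest
            word = [row[:] for row in base]
            word[chip] = [bit ^ err for bit, err in zip(word[chip], pattern)]
            if check(word) == reference:
                missed += 1
    return missed


missed_interleaved = undetected_single_chip_errors(interleaved_check)
missed_per_chip = undetected_single_chip_errors(per_chip_check)
```

In this model the interleaved ordering misses none of the single-chip multiple-bit errors, because every flipped bit lands in a different parity group, while the per-chip ordering misses every even-weight pattern.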