Abstract: Log-based rollback recovery for system failures is disclosed. The system includes a storage medium and a component configured to transition through a series of states. The component is further configured to record its current state in the storage medium every time it communicates with another component in the system, and the system is configured to recover the most recent state recorded in the storage medium following a failure of the component.
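The abstract above ties state recording to inter-component communication. A minimal sketch of that idea follows. It assumes an in-memory list of JSON records stands in for the storage medium; `StableLog`, `Component`, `send`, `receive`, and `recover` are hypothetical names chosen for illustration, not taken from the patent.

```python
import json


class StableLog:
    """Hypothetical stand-in for the storage medium: append-only serialized states.
    A real system would write to durable storage and flush before continuing."""

    def __init__(self):
        self._records = []

    def append(self, state):
        # Serialize so later in-memory changes cannot alter the recorded copy.
        self._records.append(json.dumps(state, sort_keys=True))

    def latest(self):
        return json.loads(self._records[-1]) if self._records else None


class Component:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = {"step": 0, "inbox": []}

    def advance(self):
        # Local state transition; not recorded by itself.
        self.state["step"] += 1

    def send(self, other, payload):
        # Record the current state every time this component communicates.
        self.log.append(self.state)
        other.receive(self.name, payload)

    def receive(self, sender, payload):
        self.state["inbox"].append([sender, payload])
        self.log.append(self.state)

    @classmethod
    def recover(cls, name, log):
        # After a failure, restart from the most recent recorded state.
        component = cls(name, log)
        latest = log.latest()
        if latest is not None:
            component.state = latest
        return component


if __name__ == "__main__":
    log_a, log_b = StableLog(), StableLog()
    a, b = Component("a", log_a), Component("b", log_b)
    a.advance()
    a.advance()
    a.send(b, "hello")  # state with step 2 is recorded here
    a.advance()         # step 3 is never recorded
    print(Component.recover("a", log_a).state["step"])  # prints 2
```

In this sketch, any transitions after the last communication (step 3 above) are lost on failure, which is the trade-off of recording state only at communication points.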
Abstract: Approaches to efficiently creating a checkpoint of a process are described. In one approach, a method of performing a checkpoint operation on a process involves detecting a change in the contents of a memory page associated with the process, where the change occurred after a preceding checkpoint operation. The method also involves modifying a data structure at a location corresponding to the contents of that memory page.
Type: Grant
Filed: March 13, 2007
Date of Patent: February 22, 2011
Assignee: Librato, Inc.
Inventors: Srinidhi Varadarajan, Joseph Ruscio, Michael Heffner
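The incremental checkpoint described in the page-tracking abstract above can be illustrated with a small simulation. The abstract does not say how page changes are detected; this sketch swaps in content hashing for portability, whereas real checkpointers commonly rely on page-protection faults or hardware dirty bits. It also interprets "the data structure" as a per-page dirty bitmap indexed by page number, which may differ from the patent's meaning. `IncrementalCheckpointer`, `scan`, `checkpoint`, and the 4 KiB `PAGE_SIZE` are assumptions.

```python
import hashlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class IncrementalCheckpointer:
    """Hypothetical page-granular incremental checkpointer (hash-based detection)."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.num_pages = (len(memory) + PAGE_SIZE - 1) // PAGE_SIZE
        # Digest of each page as of the last checkpoint; None forces a full first checkpoint.
        self.digests = [None] * self.num_pages
        # The assumed data structure: one dirty flag per page, indexed by page number.
        self.dirty = [False] * self.num_pages
        self.saved = {}  # page index -> bytes; stand-in for checkpoint storage

    def _page(self, i: int) -> bytes:
        return bytes(self.memory[i * PAGE_SIZE:(i + 1) * PAGE_SIZE])

    def scan(self) -> None:
        """Detect pages changed since the preceding checkpoint and mark them."""
        for i in range(self.num_pages):
            if hashlib.sha256(self._page(i)).digest() != self.digests[i]:
                self.dirty[i] = True

    def checkpoint(self) -> list:
        """Save only the pages marked dirty and return their indices."""
        self.scan()
        written = []
        for i in range(self.num_pages):
            if self.dirty[i]:
                page = self._page(i)
                self.saved[i] = page
                self.digests[i] = hashlib.sha256(page).digest()
                self.dirty[i] = False
                written.append(i)
        return written
```

For example, after a first full checkpoint of a three-page buffer, changing one byte in the second page causes the next `checkpoint()` call to write only page 1.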
Abstract: Concurrent checkpointing for rollback recovery for system failures is disclosed. The system includes stable storage and a processor configured to receive and process a checkpoint request while a first thread performs a process and a second thread stores the contents of memory regions listed in a first list to the stable storage. Processing the checkpoint request includes write-protecting all memory regions listed in a previously initialized and populated second list, initializing an empty third list, creating a coalesced list by combining the contents of the first and second lists, and assigning the coalesced list to the second thread while the first thread proceeds with the process.
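The four steps of processing a checkpoint request can be traced in a thread-based simulation. Memory regions are modeled as dictionary keys, write protection as a set, and stable storage as a dictionary; all class and method names are hypothetical. The abstract does not describe what happens when the first thread writes to a protected region, so the sketch assumes a copy-on-write policy: the pre-write contents are saved before protection is dropped, which keeps each saved region consistent with the moment of the request.

```python
import threading


class ConcurrentCheckpointer:
    """Hypothetical sketch of three-list concurrent checkpointing."""

    def __init__(self, memory):
        self.memory = dict(memory)  # region id -> bytes
        self.stable = {}            # stand-in for stable storage
        self.protected = set()      # simulated write protection
        self.first_list = []        # regions the second (writer) thread must store
        self.second_list = []       # regions modified since the last request
        self.cond = threading.Condition()
        threading.Thread(target=self._writer, daemon=True).start()

    def write(self, region, data):
        """Called by the first (application) thread."""
        with self.cond:
            if region in self.protected:
                # Assumed fault handling: save pre-write contents, then unprotect.
                self.stable[region] = self.memory[region]
                self.protected.discard(region)
            self.memory[region] = data
            if region not in self.second_list:
                self.second_list.append(region)

    def request_checkpoint(self):
        with self.cond:
            self.protected.update(self.second_list)  # write-protect second list
            third_list = []                          # initialize empty third list
            # Coalesce first and second lists, dropping duplicates but keeping order.
            coalesced = list(dict.fromkeys(self.first_list + self.second_list))
            self.first_list = coalesced              # assign to the writer thread
            self.second_list = third_list            # third list tracks new writes
            self.cond.notify_all()
        # Returns immediately so the first thread can keep running.

    def wait_until_stored(self):
        with self.cond:
            self.cond.wait_for(lambda: not self.first_list)

    def _writer(self):
        while True:
            with self.cond:
                self.cond.wait_for(lambda: self.first_list)
                region = self.first_list.pop(0)
                if region in self.protected:  # not already saved by a fault
                    self.stable[region] = self.memory[region]
                    self.protected.discard(region)
                self.cond.notify_all()
```

Whichever of the two threads reaches a protected region first saves it, so a write issued right after `request_checkpoint()` cannot leak into that checkpoint in this model.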
Abstract: A method of identifying the source of a memory corruption error during operation of a checkpoint library includes receiving an error detection request and, in response to the request, write-protecting all memory regions allocated to the checkpoint library. The method further includes detecting when a memory region is accessed for modification during operation of the checkpoint library and, in response to the detection, identifying the source of a memory corruption error affecting the memory region.
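The detection method above can be simulated without real page protection. On POSIX systems this kind of tool would typically use `mprotect` together with a `SIGSEGV` handler that reports the faulting address; the sketch below instead assumes a wrapper class that rejects writes while protected and uses `traceback.extract_stack` to name the calling function as the source. `ProtectedRegion`, `MemoryCorruptionError`, and `enable_error_detection` are hypothetical names.

```python
import traceback


class MemoryCorruptionError(Exception):
    def __init__(self, region, function, lineno):
        super().__init__(f"write to {region!r} from {function}() at line {lineno}")
        self.region = region
        self.function = function
        self.lineno = lineno


class ProtectedRegion:
    """Simulated memory region allocated to a checkpoint library."""

    def __init__(self, name, size):
        self.name = name
        self._data = bytearray(size)
        self.protected = False

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        if self.protected:
            # Simulated protection fault: [-1] is this frame, [-2] is the writer.
            caller = traceback.extract_stack()[-2]
            raise MemoryCorruptionError(self.name, caller.name, caller.lineno)
        self._data[i] = value


def enable_error_detection(regions):
    """Handle an error detection request by write-protecting every library region."""
    for region in regions:
        region.protected = True
```

Because the stray write is rejected before it lands, the region keeps its last good contents, and the exception carries the function name and line number of the code that attempted the modification.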