Abstract: The present invention is a system and method for recovering from memory failures in computer systems. The method includes the steps of: identifying a predetermined instruction sequence; monitoring for memory access errors in response to memory requests; logging a memory access error in an error logging register; polling the register for any logged memory access error during execution of the instruction sequence; and raising an exception if a memory access error is logged. Within the system, memory access errors are stored in an error logging register, machine check abort handlers are masked, and memory controllers are under full control of the software, so that memory access errors can be intercepted and responded to without a system reboot or application restart. The present invention is particularly applicable to O/S code that cannot otherwise recover from memory errors except by rebooting.
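The steps named in the abstract (log, poll, raise) can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented implementation: `ErrorLogRegister`, `SimulatedMemory`, `run_guarded`, and `copy_with_recovery` are hypothetical names, the "register" is an ordinary Python object, and a faulty address is modeled as a member of a set rather than as a real hardware fault.

```python
class MemoryAccessError(Exception):
    """Raised when the error-logging register reports a failed access."""


class ErrorLogRegister:
    """Hypothetical stand-in for a hardware error-logging register."""

    def __init__(self):
        self.failed_address = None

    def log(self, address):
        # Keep only the first error until software polls the register.
        if self.failed_address is None:
            self.failed_address = address

    def poll_and_clear(self):
        address, self.failed_address = self.failed_address, None
        return address


class SimulatedMemory:
    """Memory whose faulty cells log an error instead of halting the system."""

    def __init__(self, size, bad_addresses=()):
        self.cells = [0] * size
        self.bad = set(bad_addresses)
        self.error_log = ErrorLogRegister()

    def load(self, address):
        if address in self.bad:
            # Assumption: machine check aborts are masked, so the access
            # completes with a dummy value and the error is only logged.
            self.error_log.log(address)
            return 0
        return self.cells[address]

    def store(self, address, value):
        if address in self.bad:
            self.error_log.log(address)
            return
        self.cells[address] = value


def run_guarded(memory, sequence):
    """Run a predetermined sequence, polling the register after each step."""
    memory.error_log.poll_and_clear()  # discard any stale error
    results = []
    for step in sequence:
        results.append(step(memory))
        failed = memory.error_log.poll_and_clear()
        if failed is not None:
            raise MemoryAccessError(f"memory access error at address {failed}")
    return results


def copy_with_recovery(memory, src, dst):
    """Copy one cell; report failure to the caller instead of rebooting."""
    sequence = [lambda m: m.store(dst, m.load(src))]
    try:
        run_guarded(memory, sequence)
        return True
    except MemoryAccessError:
        return False
```

In this sketch the caller of `copy_with_recovery` decides how to respond to a failed access (retry, skip, or report), which mirrors the abstract's point that the error is intercepted in software rather than forcing a reboot or restart.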
Type:
Grant
Filed:
April 30, 2001
Date of Patent:
February 1, 2005
Assignee:
Hewlett-Packard Development Company
Inventors:
Dejan S. Milojicic, Thomas Wylegala, Fong Pong, Stephen Hoyle, Lance W. Russell, Lu Xu, Alberto J. Munoz