System and method for error detection in a redundant memory system
A system and method is disclosed for detecting errors in memory. A memory subsystem that includes a set of parallel memory channels is disclosed. Data is saved such that a duplicate copy of data is saved to the opposite memory channel according to a horizontal mirroring scheme or a vertical mirroring scheme. A cyclic redundancy code is generated on the basis of the data bits and address bits. The generated cyclic redundancy code and a copy of the cyclic redundancy code are saved to the memory channels according to a horizontal mirroring scheme or a vertical mirroring scheme.
The present disclosure relates generally to computer systems and information handling systems, and, more particularly, to a system and method for detecting errors in mirrored memory.
BACKGROUND
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to these users is an information handling system. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may vary with respect to the type of information handled; the methods for handling the information; the methods for processing, storing or communicating the information; the amount of information processed, stored, or communicated; and the speed and efficiency with which the information is processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include or comprise a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Memory systems, including mirrored memory systems, often use Hamming error correction codes for the purpose of identifying errors in saved data. Although Hamming error correction codes may be effective at identifying single bit errors, Hamming error correction codes are less effective at identifying multiple bit errors. The inability of these memory systems to handle multi-bit errors may cause an error correction routine to be performed that is itself flawed but nonetheless recognized as being correct and yielding valid data. In addition, some multi-bit errors may not be recognized. As a result, the incorrect data in the code word will not be corrected and will be recognized as valid. In addition, if there is a fault in the memory system that causes an address failure resulting in one or more address lines being in error, the data accessed at the memory location will return a valid error correction code, but will nevertheless be the wrong data.
SUMMARY
In accordance with the present disclosure, a system and method is disclosed for detecting errors in memory. A memory subsystem that includes a set of parallel memory channels is disclosed. Data is saved such that a duplicate copy of data is saved to the opposite memory channel according to a horizontal mirroring scheme or a vertical mirroring scheme. A cyclic redundancy code is generated on the basis of the data bits and address bits. The generated cyclic redundancy code and a copy of the cyclic redundancy code are saved to the memory channels according to a horizontal mirroring scheme or a vertical mirroring scheme.
The system and method disclosed herein is technically advantageous because it provides a technique for improved error detection with the additional benefit of mirrored memory. The system and method herein is advantageous because it uses a cyclic redundancy code to identify errors in the saved data bits, with the result being improved error detection. The system and method disclosed herein is also advantageous because the cyclic redundancy code is generated on the basis of both the data bits and the address bits associated with the data bits. As such, if an error occurs in the address bits, the error will be detected.
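The benefit of including the address bits can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the dictionary-based memory model, the `faulty_read` helper, the 8-byte address encoding, and the use of the standard library's `zlib.crc32` are all assumptions made for illustration. The sketch simulates an address line stuck at zero and shows that a check computed over the data alone accepts the wrong data, while a check computed over address and data rejects it.

```python
import zlib

def crc_data_only(data: bytes) -> int:
    """Check value computed from the data bits alone (for comparison only)."""
    return zlib.crc32(data)

def crc_with_address(address: int, data: bytes) -> int:
    """Check value computed from the address bits followed by the data bits."""
    return zlib.crc32(address.to_bytes(8, "big") + data)

# Hypothetical memory model: address -> (data, data-only CRC, address+data CRC).
memory = {}

def write(address: int, data: bytes) -> None:
    memory[address] = (data, crc_data_only(data), crc_with_address(address, data))

def faulty_read(address: int, stuck_low_mask: int):
    """Simulate address lines stuck at 0: the wrong location is accessed."""
    return memory[address & ~stuck_low_mask]

write(0x00, b"record A")
write(0x40, b"record B")

# The requester asks for 0x40, but the fault makes the memory return 0x00.
data, stored_data_crc, stored_addr_crc = faulty_read(0x40, stuck_low_mask=0x40)

# The data-only check passes even though the wrong data came back.
data_only_ok = crc_data_only(data) == stored_data_crc

# The address-inclusive check fails because it is recomputed with the
# requested address, which differs from the address used when saving.
addr_ok = crc_with_address(0x40, data) == stored_addr_crc
```

In this sketch `data_only_ok` is true and `addr_ok` is false, which is the situation the address-inclusive code is meant to expose.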
The system and method disclosed herein is also advantageous because of the use of a mirrored memory for storing the data within the memory subsystem. If an error in a version of stored data is detected, the requested data can be retrieved from the copy of the data that is saved in another location in memory. The saved copy of the data can be accessed in place of the version of the data that includes the error. The system and method disclosed herein is additionally advantageous in that the cyclic redundancy code is mirrored between the parallel memory channels, thereby allowing the integrity of the duplicate copy of the data to be evaluated in the event that an error is detected in the first version of the data. The system and method disclosed herein is also advantageous because an error can be detected through the use of a cyclic redundancy code, thereby eliminating the need to perform a comparison of the data bits during each read cycle. Because a comparison step need not be performed, independent operations can occur simultaneously on each memory channel, thereby preserving the available memory bandwidth of the memory subsystem. Other technical advantages will be apparent to those of ordinary skill in the art in view of the following specification, claims, and drawings.
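The read path described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the disclosed hardware: the class `MirroredMemory`, the exception `UncorrectableError`, the dictionary-backed channels, and the use of `zlib.crc32` are hypothetical names and choices introduced for illustration. The point shown is that each copy is validated against its own stored cyclic redundancy code, so the two copies are never compared with each other on a normal read.

```python
import zlib

class UncorrectableError(Exception):
    """Raised when neither copy passes its CRC check (hypothetical name)."""

class MirroredMemory:
    def __init__(self) -> None:
        # Two parallel channels, each modeled as address -> (data, crc).
        self.channels = [{}, {}]

    @staticmethod
    def _crc(address: int, data: bytes) -> int:
        # CRC over the address bits followed by the data bits.
        return zlib.crc32(address.to_bytes(8, "big") + data)

    def write(self, address: int, data: bytes) -> None:
        # Save the data and its CRC to both channels (the mirrored copy).
        crc = self._crc(address, data)
        for channel in self.channels:
            channel[address] = (data, crc)

    def read(self, address: int, primary: int = 0):
        # Try the primary channel first; consult the mirror only on a mismatch.
        for index in (primary, 1 - primary):
            data, stored_crc = self.channels[index][address]
            if self._crc(address, data) == stored_crc:
                return data, index
        raise UncorrectableError(f"both copies failed at address {address:#x}")
```

A caller would write once and read normally; the second element of the returned tuple indicates which channel supplied the data, so a fallback to the mirror is visible to the caller.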
BRIEF DESCRIPTION OF THE DRAWINGSA more complete understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer, a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communication with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, and a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.
Shown in
Shown in
A cyclic redundancy code is a code associated with and derived from the data bits and the address location of the code word. On the basis of the bits comprising the data and the address of the code word, the cyclic redundancy code is generated in logic module 12 in memory controller 15. The thirty-two CRC bits associated with a given code word are created on the basis of an algorithm in a finite state machine in the logic module 12. Using the CRC bits for a code word, the detection of an error in the data bits of the code word can be accomplished by generating a cyclic redundancy code for the code word and comparing the generated cyclic redundancy code with the cyclic redundancy code stored in the memory lines associated with the code word.
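As a rough software analogue of such a finite state machine, the following Python sketch shifts the address bits and then the data bits, one bit per step, through a 32-bit register. Several details are assumptions, since the text above does not specify them: the generator polynomial `0x04C11DB7`, the all-ones initial register value, the 32-bit address width, the 64-bit data width, and the function names are chosen only for illustration.

```python
CRC32_POLY = 0x04C11DB7   # assumed generator polynomial; not named in the text
CRC_MASK = 0xFFFFFFFF     # keeps the register at thirty-two bits

def shift_bits(value: int, width: int, crc: int) -> int:
    """Shift `width` bits of `value`, most significant first, into the register."""
    for position in range(width - 1, -1, -1):
        bit = (value >> position) & 1
        feedback = ((crc >> 31) & 1) ^ bit   # top register bit XOR incoming bit
        crc = (crc << 1) & CRC_MASK
        if feedback:
            crc ^= CRC32_POLY
    return crc

def code_word_crc(address: int, data: int,
                  addr_width: int = 32, data_width: int = 64) -> int:
    """Generate thirty-two CRC bits from the address bits, then the data bits."""
    crc = shift_bits(address, addr_width, CRC_MASK)  # assumed all-ones start state
    return shift_bits(data, data_width, crc)

def crc_matches(address: int, data: int, stored_crc: int) -> bool:
    """Regenerate the CRC on a read and compare it with the stored CRC."""
    return code_word_crc(address, data) == stored_crc
```

On a write, `code_word_crc` would produce the bits saved alongside the data; on a read, `crc_matches` returning false indicates an error in the data bits, the address bits, or the stored code itself.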
The content of Memory Channel A of
Shown in
The mirrored copy of the code word is likewise striped across the two memory channels. In contrast with a horizontal mirroring scheme of
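The striped arrangement recited in claims 8, 12, and 13 can be sketched in Python as follows. The sketch assumes a code word whose length divides evenly into four sets and uses hypothetical names (`stripe`, `reassemble`, and the `"primary"` and `"mirror"` keys); it illustrates only the placement of the sets, not the memory line layout of the figures.

```python
def stripe(code_word: bytes):
    """Divide a code word into four sets and place them on two channels.

    Sets one and three go to channel A and sets two and four to channel B;
    the duplicates are placed on the opposite channel.
    """
    if len(code_word) % 4 != 0:
        raise ValueError("code word length must be a multiple of four")
    quarter = len(code_word) // 4
    sets = [code_word[i * quarter:(i + 1) * quarter] for i in range(4)]
    channel_a = {"primary": (sets[0], sets[2]), "mirror": (sets[1], sets[3])}
    channel_b = {"primary": (sets[1], sets[3]), "mirror": (sets[0], sets[2])}
    return channel_a, channel_b

def reassemble(first_and_third, second_and_fourth) -> bytes:
    """Interleave the two halves back into the original set order."""
    first, third = first_and_third
    second, fourth = second_and_fourth
    return first + second + third + fourth

channel_a, channel_b = stripe(b"AAAABBBBCCCCDDDD")

# Normal read: each channel supplies its primary portion.
normal = reassemble(channel_a["primary"], channel_b["primary"])

# Fallback read: the duplicates are taken from the opposite channels.
fallback = reassemble(channel_b["mirror"], channel_a["mirror"])
```

Because each channel holds its own portion plus the duplicate of the other portion, either channel alone contains all four sets of the code word.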
Shown in
Shown in
Shown in
Although the present invention has been described herein, in some instances, with respect to a computer system, it should be recognized that the system and method disclosed herein may be applied and used in any information handling system that includes single or multiple memory channels. Although the present disclosure has been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and the scope of the invention as defined by the appended claims.
Claims
1. A method for identifying errors in the memory of a computer system, comprising:
- generating a set of cyclic redundancy code bits from a set of data bits and associated address bits;
- saving the data bits and the cyclic redundancy code bits to a first memory location;
- saving a duplicate of the data bits and the cyclic redundancy code bits to a second memory location;
- retrieving the data bits and the cyclic redundancy code bits from the first memory location;
- generating a second set of cyclic redundancy code bits on the basis of the retrieved data bits and associated address bits; and
- comparing the retrieved cyclic redundancy code bits with the second set of the cyclic redundancy code bits.
2. The method for identifying errors in the memory of a computer system of claim 1, further comprising the step of retrieving the duplicate of the data bits and the cyclic redundancy code bits if the retrieved cyclic redundancy code bits are not identical to the second set of the cyclic redundancy code bits.
3. The method for identifying errors in the memory of a computer system of claim 1, wherein the step of generating a set of cyclic redundancy code bits from a set of data bits and associated address bits comprises the step of generating a set of cyclic redundancy code bits in a logic element of a memory controller.
4. The method for identifying errors in the memory of a computer system of claim 1, wherein the step of saving the data bits and the cyclic redundancy code bits to a first memory location comprises the step of saving the data bits and cyclic redundancy code bits to a first memory location associated with a first memory channel; and
- wherein the step of saving a duplicate of the data bits and the cyclic redundancy code bits to a second memory location comprises the step of saving the duplicate of the data bits and cyclic redundancy code bits to a second memory location associated with a second memory channel.
5. The method for identifying errors in the memory of a computer system of claim 4, wherein the first memory location and the second memory location are dual in-line memory modules.
6. The method for identifying errors in the memory of a computer system of claim 5, wherein the cyclic redundancy code bits are saved across multiple memory rows in the first memory location and wherein the duplicate of the cyclic redundancy code bits are saved across multiple memory rows in the second memory location.
7. The method for identifying errors in the memory of a computer system of claim 2, wherein the step of retrieving the duplicate of the data bits and the cyclic redundancy code bits is followed by the steps of:
- generating a third set of cyclic redundancy code bits on the basis of the retrieved duplicate data bits and associated address bits; and
- comparing the retrieved cyclic redundancy code bits with the third set of the cyclic redundancy code bits.
8. A method for identifying errors in the memory of a computer system, comprising:
- generating a set of cyclic redundancy code bits from a set of data bits and respective address bits;
- saving a first portion of the data bits and the cyclic redundancy bits to a first memory location;
- saving a duplicate of the first portion of the data bits and the cyclic redundancy bits to a second memory location;
- saving a second portion of the data bits and the cyclic redundancy bits to a second memory location;
- saving a duplicate of the second portion of the data bits and the cyclic redundancy bits to a first memory location;
- retrieving the first portion of the data bits and the cyclic redundancy code bits from the first memory location and the second portion of the data bits and the cyclic redundancy code bits from the second memory location;
- generating a second set of cyclic redundancy code bits on the basis of the retrieved data bits; and
- comparing the retrieved cyclic redundancy code bits with the second set of the cyclic redundancy code bits.
9. The method for identifying errors in the memory of a computer system of claim 8, further comprising the step of retrieving the duplicate of the first portion of the data bits and the cyclic redundancy code bits and the duplicate of the second portion of the data bits and the cyclic redundancy code bits if the retrieved cyclic redundancy code bits are not identical to the second set of the cyclic redundancy code bits.
10. The method for identifying errors in the memory of a computer system of claim 9, wherein the step of generating a set of cyclic redundancy code bits from a set of data bits comprises the step of generating a set of cyclic redundancy code bits in a logic element of a memory controller.
11. The method for identifying errors in the memory of a computer system of claim 10, wherein the step of generating a second set of cyclic redundancy code bits on the basis of the retrieved data bits comprises the step of generating a second set of cyclic redundancy code bits in the logic element of the memory controller.
12. The method for identifying errors in the memory of a computer system of claim 8, wherein the data bits are divided into four sets;
- wherein the first and third sets comprise the first portion of the data bits saved to a first memory location;
- wherein the second and fourth sets comprise the second portion of the data bits saved to a second memory location.
13. The method for identifying errors in the memory of a computer system of claim 8,
- wherein the duplicate data bits are divided into four sets;
- wherein the first and third sets comprise the first portion of the data bits saved to a second memory location;
- wherein the second and fourth sets comprise the second portion of the data bits saved to a first memory location.
14. The method for identifying errors in the memory of a computer system of claim 8,
- wherein the first memory location is accessible through a first memory channel;
- wherein the second memory location is accessible through a second memory channel; and
- wherein the first memory channel is logically parallel to the second memory channel.
15. The method for identifying errors in the memory of a computer system of claim 14, wherein the first memory location and the second memory location are dual in-line memory modules.
16. The method for identifying errors in the memory of a computer system of claim 9, wherein the step of retrieving the duplicate of the data bits and the cyclic redundancy code bits is followed by the steps of:
- generating a third set of cyclic redundancy code bits on the basis of the retrieved duplicate data bits; and
- comparing the retrieved cyclic redundancy code bits with the third set of the cyclic redundancy code bits.
17. A memory subsystem, comprising:
- a memory controller;
- a first memory channel coupled to the memory controller, the first memory channel comprising a plurality of memory lines for storing a code word comprising a set of data bits and a cyclic redundancy code generated on the basis of the set of data bits and corresponding address bits; and
- a second memory channel coupled to the memory controller, the second memory channel comprising a plurality of memory lines for storing a duplicate of the data bits and cyclic redundancy code of the first memory channel.
18. The memory subsystem of claim 17, wherein the memory controller includes a logic element for generating a cyclic redundancy code on the basis of a set of data bits.
19. A memory subsystem, comprising:
- a memory controller;
- a first memory channel coupled to the memory controller, the first memory channel comprising a plurality of memory lines for storing a first portion of a code word, a first portion of a cyclic redundancy code generated on the basis of the code word, a duplicate of the second portion of the code word, and a duplicate of the second portion of a cyclic redundancy code generated on the basis of the code word; and
- a second memory channel coupled to the memory controller, the second memory channel comprising a plurality of memory lines for storing a duplicate of the first portion of a code word, a duplicate of the first portion of a cyclic redundancy code generated on the basis of the code word, a second portion of the code word, and a second portion of a cyclic redundancy code generated on the basis of the code word.
20. The memory subsystem of claim 19, wherein the memory controller includes a logic element for generating a cyclic redundancy code on the basis of a set of data bits.
Type: Application
Filed: Oct 7, 2004
Publication Date: Apr 13, 2006
Applicant:
Inventor: John Pescatore (Georgetown, TX)
Application Number: 10/960,465
International Classification: G11C 8/02 (20060101);