System and Method for Simulating an Aspect of a Memory Circuit

Info

Publication number: 20120109621
Type: Application
Filed: Jan 5, 2012
Publication Date: May 3, 2012
Applicant: GOOGLE INC. (Mountain View, CA)
Inventors: Suresh Natarajan Rajan (San Jose, CA), Keith R. Schakel (San Jose, CA), Michael John Sebastian Smith (Palo Alto, CA), David T. Wang (San Jose, CA), Frederick Daniel Weber (San Jose, CA)
Application Number: 13/343,852

Abstract

A memory subsystem is provided including an interface circuit adapted for coupling with a plurality of memory circuits and a system. The interface circuit is operable to interface the memory circuits and the system for emulating at least one memory circuit with at least one aspect that is different from at least one aspect of at least one of the plurality of memory circuits. Such aspect includes a signal, a capacity, a timing, and/or a logical interface.

Description

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of, and claims priority to pending U.S. patent application Ser. No. 11/929,500, filed on Oct. 30, 2007 and entitled “System and Method for Simulating an Aspect of a Memory Circuit,” which is a continuation of U.S. patent application Ser. No. 11/762,013 (now U.S. Pat. No. 8,090,897), filed on Jun. 12, 2007 and entitled “System and Method for Simulating an Aspect of a Memory Circuit,” which is a continuation-in-part of pending U.S. patent application Ser. No. 11/461,420, filed on Jul. 31, 2006 and entitled “System and Method for Simulating a Different Number of Memory Circuits.” The contents of the prior applications are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

1. Technical Field of the Invention

This invention relates generally to digital memory such as used in computers, and more specifically to organization and design of memory modules such as DIMMs.

2. Background Art

Digital memories are utilized in a wide variety of electronic systems, such as personal computers, workstations, servers, consumer electronics, printers, televisions, and so forth. Digital memories are manufactured as monolithic integrated circuits (“ICs” or “chips”). Digital memories come in several types, such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, electrically erasable programmable read only memory (EEPROM), and so forth.

In some systems, the memory chips are coupled directly into the system such as by being soldered directly to the system's main motherboard. In other systems, groups of memory chips are first coupled into memory modules, such as dual in-line memory modules (DIMMs), which are in turn coupled into a system by means of slots, sockets, or other connectors. Some types of memory modules include not only the memory chips themselves, but also some additional logic which interfaces the memory chips to the system. This logic may perform a variety of low level functions, such as buffering or latching signals between the chips and the system, but it may also perform higher level functions, such as telling the system what are the characteristics of the memory chips. These characteristics may include, for example, memory capacity, speed, latency, interface protocol, and so forth.

Memory capacity requirements of such systems are increasing rapidly. However, other industry trends such as higher memory bus speeds, small form factor machines, etc. are reducing the number of memory module slots, sockets, connectors, etc. that are available in such systems. There is, therefore, pressure for manufacturers to use large capacity memory modules in such systems.

However, there is also an exponential relationship between a memory chip's capacity and its price. As a result, large capacity memory modules may be cost prohibitive in some systems.

What is needed, then, is an effective way to make use of low cost memory chips in manufacturing high capacity memory modules.

SUMMARY

A memory subsystem is provided including an interface circuit adapted for coupling with a plurality of memory circuits and a system. The interface circuit is operable to interface the memory circuits and the system for emulating at least one memory circuit with at least one aspect that is different from at least one aspect of at least one of the plurality of memory circuits. Such aspect includes a signal, a capacity, a timing, and/or a logical interface.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a system coupled to multiple memory circuits and an interface circuit according to one embodiment of this invention.

FIG. 2 shows a buffered stack of DRAM circuits each having a dedicated data path from the buffer chip and sharing a single address, control, and clock bus.

FIG. 3 shows a buffered stack of DRAM circuits having two address, control, and clock busses and two data busses.

FIG. 4 shows a buffered stack of DRAM circuits having one address, control, and clock bus and two data busses.

FIG. 5 shows a buffered stack of DRAM circuits having one address, control, and clock bus and one data bus.

FIG. 6 shows a buffered stack of DRAM circuits in which the buffer chip is located in the middle of the stack of DRAM chips.

FIG. 7 is a flow chart showing one method of storing information.

FIG. 8 shows a high capacity DIMM using buffered stacks of DRAM chips according to one embodiment of this invention.

FIG. 9 is a timing diagram showing one embodiment of how the buffer chip makes a buffered stack of DRAM circuits appear to the system or memory controller to use longer column address strobe (CAS) latency DRAM chips than is actually used by the physical DRAM chips.

FIG. 10 shows a timing diagram showing the write data timing expected by DRAM in a buffered stack, in accordance with another embodiment of this invention.

FIG. 11 is a timing diagram showing how write control signals are delayed by a buffer chip in accordance with another embodiment of this invention.

FIG. 12 is a timing diagram showing early write data from a memory controller or an advanced memory buffer (AMB) according to yet another embodiment of this invention.

FIG. 13 is a timing diagram showing address bus conflicts caused by delayed write operations.

FIG. 14 is a timing diagram showing variable delay of an activate operation through a buffer chip.

FIG. 15 is a timing diagram showing variable delay of a precharge operation through a buffer chip.

FIG. 16 shows a buffered stack of DRAM circuits and the buffer chip which presents them to the system as if they were a single, larger DRAM circuit, in accordance with one embodiment of this invention.

FIG. 17 is a flow chart showing a method of refreshing a plurality of memory circuits, in accordance with one embodiment of this invention.

FIG. 18 shows a block diagram of another embodiment of the invention.

DETAILED DESCRIPTION

The invention will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the invention which, however, should not be taken to limit the invention to the specific embodiments described, but are for explanation and understanding only.

FIG. 1 illustrates a system 100 including a system device 106 coupled to an interface circuit 102, which is in turn coupled to a plurality of physical memory circuits 104A-N. The physical memory circuits may be any type of memory circuits. In some embodiments, each physical memory circuit is a separate memory chip. For example, each may be a DDR2 DRAM. In some embodiments, the memory circuits may be symmetrical, meaning each has the same capacity, type, speed, etc., while in other embodiments they may be asymmetrical. For ease of illustration only, three such memory circuits are shown, but actual embodiments may use any plural number of memory circuits. As will be discussed below, the memory chips may optionally be coupled to a memory module (not shown), such as a DIMM.

The system device may be any type of system capable of requesting and/or initiating a process that results in an access of the memory circuits. The system may include a memory controller (not shown) through which it accesses the memory circuits.

The interface circuit may include any circuit or logic capable of directly or indirectly communicating with the memory circuits, such as a buffer chip, advanced memory buffer (AMB) chip, etc. The interface circuit interfaces a plurality of signals 108 between the system device and the memory circuits. Such signals may include, for example, data signals, address signals, control signals, clock signals, and so forth. In some embodiments, all of the signals communicated between the system device and the memory circuits are communicated via the interface circuit. In other embodiments, some other signals 110 are communicated directly between the system device (or some component thereof, such as a memory controller, an AMB, or a register) and the memory circuits, without passing through the interface circuit. In some such embodiments, the majority of signals are communicated via the interface circuit, such that L>M.

As will be explained in greater detail below, the interface circuit presents to the system device an interface to emulated memory devices which differ in some aspect from the physical memory circuits which are actually present. For example, the interface circuit may tell the system device that the number of emulated memory circuits is different than the actual number of physical memory circuits. The terms “emulating”, “emulated”, “emulation”, and the like will be used in this disclosure to signify emulation, simulation, disguising, transforming, converting, and the like, which results in at least one characteristic of the memory circuits appearing to the system device to be different than the actual, physical characteristic. In some embodiments, the emulated characteristic may be electrical in nature, physical in nature, logical in nature (e.g. a logical interface, etc.), pertaining to a protocol, etc. An example of an emulated electrical characteristic might be a signal, or a voltage level. An example of an emulated physical characteristic might be a number of pins or wires, a number of signals, or a memory capacity. An example of an emulated protocol characteristic might be a timing, or a specific protocol such as DDR3.

In the case of an emulated signal, such signal may be a control signal such as an address signal, a data signal, or a control signal associated with an activate operation, precharge operation, write operation, mode register read operation, refresh operation, etc. The interface circuit may emulate the number of signals, type of signals, duration of signal assertion, and so forth. It may combine multiple signals to emulate another signal.

The interface circuit may present to the system device an emulated interface to e.g. DDR3 memory, while the physical memory chips are, in fact, DDR2 memory. The interface circuit may emulate an interface to one version of a protocol such as DDR2 with 5-5-5 latency timing, while the physical memory chips are built to another version of the protocol such as DDR2 with 3-3-3 latency timing. The interface circuit may emulate an interface to a memory having a first capacity that is different than the actual combined capacity of the physical memory chips.

An emulated timing may relate to latency of e.g. a column address strobe (CAS) latency, a row address to column address latency (tRCD), a row precharge latency (tRP), an activate to precharge latency (tRAS), and so forth. CAS latency is related to the timing of accessing a column of data. tRCD is the latency required between the row address strobe (RAS) and CAS. tRP is the latency required to terminate an open row and open access to the next row. tRAS is the latency required to access a certain row of data between an activate operation and a precharge operation.

The interface circuit may be operable to receive a signal from the system device and communicate the signal to one or more of the memory circuits after a delay (which may be hidden from the system device). Such delay may be fixed, or in some embodiments it may be variable. If variable, the delay may depend on e.g. a function of the current signal or a previous signal, a combination of signals, or the like. The delay may include a cumulative delay associated with any one or more of the signals. The delay may result in a time shift of the signal forward or backward in time with respect to other signals. Different delays may be applied to different signals. The interface circuit may similarly be operable to receive a signal from a memory circuit and communicate the signal to the system device after a delay.

The interface circuit may take the form of, or incorporate, or be incorporated into, a register, an AMB, a buffer, or the like, and may comply with Joint Electron Device Engineering Council (JEDEC) standards, and may have forwarding, storing, and/or buffering capabilities.

In some embodiments, the interface circuit may perform operations without the system device's knowledge. One particularly useful such operation is a power-saving operation. The interface circuit may identify one or more of the memory circuits which are not currently being accessed by the system device, and perform the power saving operation on those. In one such embodiment, the identification may involve determining whether any page (or other portion) of memory is being accessed. The power saving operation may be a power down operation, such as a precharge power down operation.

The interface circuit may include one or more devices which together perform the emulation and related operations. The interface circuit may be coupled or packaged with the memory devices, or with the system device or a component thereof, or separately. In one embodiment, the memory circuits and the interface circuit are coupled to a DIMM.

FIG. 2 illustrates one embodiment of a system 200 including a system device (e.g. host system 204, etc.) which communicates address, control, clock, and data signals with a memory subsystem 201 via an interface.

The memory subsystem includes a buffer chip 202 which presents the host system with emulated interface to emulated memory, and a plurality of physical memory circuits which, in the example shown, are DRAM chips 206A-D. In one embodiment, the DRAM chips are stacked, and the buffer chip is placed electrically between them and the host system. Although the embodiments described here show the stack consisting of multiple DRAM circuits, a stack may refer to any collection of memory circuits (e.g. DRAM circuits, flash memory circuits, or combinations of memory circuit technologies, etc.).

The buffer chip buffers communicates signals between the host system and the DRAM chips, and presents to the host system an emulated interface to present the memory as though it were a smaller number of larger capacity DRAM chips, although in actuality there is a larger number of smaller capacity DRAM chips in the memory subsystem. For example, there may be eight 512 Mb physical DRAM chips, but the buffer chip buffers and emulates them to appear as a single 4 Gb DRAM chip, or as two 2 Gb DRAM chips. Although the drawing shows four DRAM chips, this is for ease of illustration only; the invention is, of course, not limited to using four DRAM chips.

In the example shown, the buffer chip is coupled to send address, control, and clock signals 208 to the DRAM chips via a single, shared address, control, and clock bus, but each DRAM chip has its own, dedicated data path for sending and receiving data signals 210 to/from the buffer chip.

Throughout this disclosure, the reference number 1 will be used to denote the interface between the host system and the buffer chip, the reference number 2 will be used to denote the address, control, and clock interface between the buffer chip and the physical memory circuits, and the reference number 3 will be used to denote the data interface between the buffer chip and the physical memory circuits, regardless of the specifics of how any of those interfaces is implemented in the various embodiments and configurations described below. In the configuration shown in FIG. 2, there is a single address, control, and clock interface channel 2 and four data interface channels 3; this implementation may thus be said to have a “1A4D” configuration (wherein “1A” means one address, control, and clock channel in interface 2, and “4D” means four data channels in interface 3).

In the example shown, the DRAM chips are physically arranged on a single side of the buffer chip. The buffer chip may, optionally, be a part of the stack of DRAM chips, and may optionally be the bottommost chip in the stack. Or, it may be separate from the stack.

FIG. 3 illustrates another embodiment of a system 301 in which the buffer chip 303 is interfaced to a host system 304 and is coupled to the DRAM chips 307A-307D somewhat differently than in the system of FIG. 2. There are a plurality of shared address, control, and clock busses 309A and 309B, and a plurality of shared data busses 305A and 305B. Each shared bus has two or more DRAM chips coupled to it. As shown, the sharing need not necessarily be the same in the data busses as it is in the address, control, and clock busses. This embodiment has a “2A2D” configuration.

FIG. 4 illustrates another embodiment of a system 411 in which the buffer chip 413 is interfaced to a host system 404 and is coupled to the DRAM chips 417A-417D somewhat differently than in the system of FIG. 2 or 3. There is a shared address, control, and clock bus 419, and a plurality of shared data busses 415A and 415B. Each shared bus has two or more DRAM chips coupled to it. This implementation has a “1A2D” configuration.

FIG. 5 illustrates another embodiment of a system 521 in which the buffer chip 523 is interfaced to a host system 504 and is coupled to the DRAM chips 527A-527D somewhat differently than in the system of FIGS. 2 through 4. There is a shared address, control, and clock bus 529, and a shared data bus 525. This implementation has a “1A1D” configuration.

FIG. 6 illustrates another embodiment of a system 631 in which the buffer chip 633 is interfaced to a host system 604 and is coupled to the DRAM chips 637A-637D somewhat differently than in the system of FIGS. 2 through 5. There is a plurality of shared address, control, and clock busses 639A and 639B, and a plurality of dedicated data paths 635. Each shared bus has two or more DRAM chips coupled to it. Further, in the example shown, the DRAM chips are physically arranged on both sides of the buffer chip. There may be, for example, sixteen DRAM chips, with the eight DRAM chips on each side of the buffer chip arranged in two stacks of four chips each. This implementation has a “2A4D” configuration.

FIGS. 2 through 6 are not intended to be an exhaustive listing of all possible permutations of data paths, busses, and buffer chip configurations, and are only illustrative of some ways in which the host system device can be in electrical contact only with the load of the buffer chip and thereby be isolated from whatever physical memory circuits, data paths, busses, etc. exist on the (logical) other side of the buffer chip.

FIG. 7 illustrates one embodiment of a method 700 for storing at least a portion of information received in association with a first operation, for use in performing a second operation. Such a method may be practiced in a variety of systems, such as, but not limited to, those of FIGS. 1-6. For example, the method may be performed by the interface circuit of FIG. 1 or the buffer chip of FIG. 2.

Initially, first information is received (702) in association with a first operation to be performed on at least one of the memory circuits (DRAM chips). Depending on the particular implementation, the first information may be received prior to, simultaneously with, or subsequent to the instigation of the first operation. The first operation may be, for example, a row operation, in which case the first information may include e.g. address values received by the buffer chip via the address bus from the host system. At least a portion of the first information is then stored (704).

The buffer chip also receives (706) second information associated with a second operation. For convenience, this receipt is shown as being after the storing of the first information, but it could also happen prior to or simultaneously with the storing. The second operation may be, for example, a column operation.

Then, the buffer chip performs (708) the second operation, utilizing the stored portion of the first information, and the second information.

If the buffer chip is emulating a memory device which has a larger capacity than each of the physical DRAM chips in the stack, the buffer chip may receive from the host system's memory controller more address bits than are required to address any given one of the DRAM chips. In this instance, the extra address bits may be decoded by the buffer chip to individually select the DRAM chips, utilizing separate chip select signals (not shown) to each of the DRAM chips in the stack.

For example, a stack of four ×4 1Gb DRAM chips behind the buffer chip may appear to the host system as a single ×4 4 Gb DRAM circuit, in which case the memory controller may provide sixteen row address bits and three bank address bits during a row operation (e.g. an activate operation), and provide eleven column address bits and three bank address bits during a column operation (e.g. a read or write operation). However, the individual DRAM chips in the stack may require only fourteen row address bits and three bank address bits for a row operation, and eleven column address bits and three bank address bits during a column operation. As a result, during a row operation (the first operation in the method 702), the buffer chip may receive two address bits more than are needed by any of the DRAM chips. The buffer chip stores (704) these two extra bits during the row operation (in addition to using them to select the correct one of the DRAM chips), then uses them later, during the column operation, to select the correct one of the DRAM chips.

The mapping between a system address (from the host system to the buffer chip) and a device address (from the buffer chip to a DRAM chip) may be performed in various manners. In one embodiment, lower order system row address and bank address bits may be mapped directly to the device row address and bank address bits, with the most significant system row address bits (and, optionally, the most significant bank address bits) being stored for use in the subsequent column operation. In one such embodiment, what is stored is the decoded version of those bits; in other words, the extra bits may be stored either prior to or after decoding. The stored bits may be stored, for example, in an internal lookup table (not shown) in the buffer chip, for one or more clock cycles.

As another example, the buffer chip may have four 512 Mb DRAM chips with which it emulates a single 2 Gb DRAM chip. The system will present fifteen row address bits, from which the buffer chip may use the fourteen low order bits (or, optionally, some other set of fourteen bits) to directly address the DRAM chips. The system will present three bank address bits, from which the buffer chip may use the two low order bits (or, optionally, some other set of two bits) to directly address the DRAM chips. During a row operation, the most significant bank address bit (or other unused bit) and the most significant row address bit (or other unused bit) are used to generate the four DRAM chip select signals, and are stored for later reuse. And during a subsequent column operation, the stored bits are again used to generate the four DRAM chip select signals. Optionally, the unused bank address is not stored during the row operation, as it will be re-presented during the subsequent column operation.

As yet another example, addresses may be mapped between four 1 Gb DRAM circuits to emulate a single 4 Gb DRAM circuit. Sixteen row address bits and three bank address bits come from the host system, of which the low order fourteen address bits and all three bank address bits are mapped directly to the DRAM circuits. During a row operation, the two most significant row address bits are decoded to generate four chip select signals, and are stored using the bank address bits as the index. During the subsequent column operation, the stored row address bits are again used to generate the four chip select signals.

A particular mapping technique may be chosen, to ensure that there are no unnecessary combinational logic circuits in the critical timing path between the address input pins and address output pins of the buffer chip. Corresponding combinational logic circuits may instead be used to generate the individual chip select signals. This may allow the capacitive loading on the address outputs of the buffer chip to be much higher than the loading on the individual chip select signal outputs of the buffer chip.

In another embodiment, the address mapping may be performed by the buffer chip using some of the bank address signals from the host system to generate the chip select signals. The buffer chip may store the higher order row address bits during a row operation, using the bank address as the index, and then use the stored address bits as part of the DRAM circuit bank address during a column operation.

For example, four 512 Mb DRAM chips may be used in emulating a single 2 Gb DRAM. Fifteen row address bits come from the host system, of which the low order fourteen are mapped directly to the DRAM chips. Three bank address bits come from the host system, of which the least significant bit is used as a DRAM circuit bank address bit for the DRAM chips. The most significant row address bit may be used as an additional DRAM circuit bank address bit. During a row operation, the two most significant bank address bits are decoded to generate the four chip select signals. The most significant row address bit may be stored during the row operation, and reused during the column operation with the least significant bank address bit, to form the DRAM circuit bank address.

The column address from the host system memory controller may be mapped directly as the column address to the DRAM chips in the stack, since each of the DRAM chips may have the same page size, regardless any differences in the capacities of the (asymmetrical) DRAM chips.

Optionally, address bit A[10] may be used by the memory controller to enable or disable auto-precharge during a column operation, in which case the buffer chip may forward that bit to the DRAM circuits without any modification during a column operation.

In various embodiments, it may be desirable to determine whether the simulated DRAM circuit behaves according to a desired DRAM standard or other design specification. Behavior of many DRAM circuits is specified by the JEDEC standards, and it may be desirable to exactly emulate a particular JEDEC standard DRAM. The JEDEC standard defines control signals that a DRAM circuit must accept and the behavior of the DRAM circuit as a result of such control signals. For example, the JEDEC specification for DDR2 DRAM is known as JESD79-2B. If it is desired to determine whether a standard is met, the following algorithm may be used. Using a set of software verification tools, it checks for formal verification of logic, that protocol behavior of the simulated DRAM circuit is the same as the desired standard or other design specification. Examples of suitable verification tools include: Magellan, supplied by Synopsys, Inc. of 700 E. Middlefield Rd., Mt. View, Calif. 94043; Incisive, supplied by Cadence Design Systems, Inc., of 2655 Sealy Ave., San Jose, Calif. 95134; tools supplied by Jasper Design Automation, Inc. of 100 View St. #100, Mt. View, Calif. 94041; Verix, supplied by Real Intent, Inc., of 505 N. Mathilda Ave. #210, Sunnyvale, Calif. 94085; 0-In, supplied by Mentor Graphics Corp. of 8005 SW Boeckman Rd., Wilsonville, Oreg. 97070; and others. These software verification tools use written assertions that correspond to the rules established by the particular DRAM protocol and specification. These written assertions are further included in the code that forms the logic description for the buffer chip. By writing assertions that correspond to the desired behavior of the emulated DRAM circuit, a proof may be constructed that determines whether the desired design requirements are met.

For instance, an assertion may be written that no two DRAM control signals are allowed to be issued to an address, control, and clock bus at the same time. Although one may know which of the various buffer chip/DRAM stack configurations and address mappings (such as those described above) are suitable, the verification process allows a designer to prove that the emulated DRAM circuit exactly meets the required standard etc. If, for example, an address mapping that uses a common bus for data and a common bus for address, results in a control and clock bus that does not meet a required specification, alternative designs for buffer chips with other bus arrangements or alternative designs for the sideband signal interconnect between two or more buffer chips may be used and tested for compliance. Such sideband signals convey the power management signals, for example.

FIG. 8 illustrates a high capacity DIMM 800 using a plurality of buffered stacks of DRAM circuits 802 and a register device 804, according to one embodiment of this invention. The register performs the addressing and control of the buffered stacks. In some embodiments, the DIMM may be an FB-DIMM, in which case the register is an AMB. In one embodiment the emulation is performed at the DIMM level.

FIG. 9 is a timing diagram illustrating a timing design 900 of a buffer chip which makes a buffered stack of DRAM chips mimic a larger DRAM circuit having longer CAS latency, in accordance with another embodiment of this invention. Any delay through a buffer chip may be made transparent to the host system's memory controller, by using such a method. Such a delay may be a result of the buffer chip being located electrically between the memory bus of the host system and the stacked DRAM circuits, since some or all of the signals that connect the memory bus to the DRAM circuits pass through the buffer chip. A finite amount of time may be needed for these signals to traverse through the buffer chip. With the exception of register chips and AMBs, industry standard memory protocols may not comprehend the buffer chip that sits between the memory bus and the DRAM chips. Industry standards narrowly define the properties of a register chip and an AMB, but not the properties of the buffer chip of this embodiment. Thus, any signal delay caused by the buffer chip may cause a violation of the industry standard protocols.

In one embodiment, the buffer chip may cause a one-half clock cycle delay between the buffer chip receiving address and control signals from the host system memory controller (or, optionally, from a register chip or an AMB), and the address and control signals being valid at the inputs of the stacked DRAM circuits. Data signals may also have a one-half clock cycle delay in either direction to/from the host system. Other amounts of delay are, of course, possible, and the half-clock cycle example is for illustration only.

The cumulative delay through the buffer chip is the sum of a delay of the address and control signals and a delay of the data signals. FIG. 9 illustrates an example where the buffer chip is using DRAM chips having a native CAS latency of i clocks, and the buffer chip delay is j clocks, thus the buffer chip emulates a DRAM having a CAS latency of i+j clocks. In the example shown, the DRAM chips have a native CAS latency 906 of four clocks (from t1 to t5), and the total latency through the buffer chip is two clocks (one clock delay 902 from t0 to t1 for address and control signals, plus one clock delay 904 from t5 to t6 for data signals), and the buffer chip emulates a DRAM having a six clock CAS latency 908.

In FIG. 9 (and other timing diagrams), the reference numbers 1, 2, and/or 3 at the left margin indicate which of the interfaces correspond to the signals or values illustrated on the associated waveforms. For example, in FIG. 9: the “Clock” signal shown as a square wave on the uppermost waveform is indicated as belonging to the interface 1 between the host system and the buffer chip; the “Control Input to Buffer” signal is also part of the interface 1; the “Control Input to DRAM” waveform is part of the interface 2 from the buffer chip to the physical memory circuits; the “Data Output from DRAM” waveform is part of the interface 3 from the physical memory circuits to the buffer chip; and the “Data Output from Buffer” shown in the lowermost waveform is part of the interface 1 from the buffer chip to the host system.

FIG. 10 is a timing diagram illustrating a timing design 1000 of write data timing expected by a DRAM circuit in a buffered stack. Emulation of a larger capacity DRAM circuit having higher CAS latency (as in FIG. 9) may, in some implementations, create a problem with the timing of write operations. For example, with respect to a buffered stack of DDR2 SDRAM chips with a read CAS latency of four clocks which are used in emulating a single larger DDR2 SDRAM with a read CAS latency of six clocks, the DDR2 SDRAM protocol may specify that the write CAS latency 1002 is one less than the read CAS latency. Therefore, since the buffered stack appears as a DDR2 SDRAM with a read CAS latency of six clocks, the memory controller may use a buffered stack write CAS latency of five clocks 1004 when scheduling a write operation to the memory.

In the specific example shown, the memory controller issues the write operation at t0. After a one clock cycle delay through the buffer chip, the write operation is issued to the DRAM chips at t1. Because the memory controller believes it is connected to memory having a read CAS latency of six clocks and thus a write CAS latency of five clocks, it issues the write data at time t0+5=t5. But because the physical DRAM chips have a read CAS latency of four clocks and thus a write CAS latency of three clocks, they expect to receive the write data at time t1+3=t4. Hence the problem, which the buffer chip may alleviate by delaying write operations.

The waveform “Write Data Expected by DRAM” is not shown as belonging to interface 1, interface 2, or interface 3, for the simple reason that there is no such signal present in any of those interfaces. That waveform represents only what is expected by the DRAM, not what is actually provided to the DRAM.

FIG. 11 is a timing illustrating a timing design 1100 showing how the buffer chip does this. The memory controller issues the write operation at t0. In FIG. 10, the write operation appeared at the DRAM circuits one clock later at t1, due to the inherent delay through the buffer chip. But in FIG. 11, in addition to the inherent one clock delay, the buffer chip has added an extra two clocks of delay to the write operation, which is not issued to the DRAM chips until t0+1+2=t3. Because the DRAM chips receive the write operation at t3 and have a write CAS latency of three clocks, they expect to receive the write data at t3+3=t6. Because the memory controller issued the write operation at t0, and it expects a write CAS latency of five clocks, it issues the write data at time t0+5=t5. After a one clock delay through the buffer chip, the write data arrives at the DRAM chips at t5+1=t6, and the timing problem is solved. It should be noted that extra delay of j clocks (beyond the inherent delay) which the buffer chip deliberately adds before issuing the write operation to the DRAM is the sum j clocks of the inherent delay of the address and control signals and the inherent delay of the data signals. In the example shown, both those inherent delays are one clock, so j=2.

FIG. 12 is a timing diagram illustrating operation of an FB-DIMM's AMB, which may be designed to send write data earlier to buffered stacks instead of delaying the write address and operation (as in FIG. 11). Specifically, it may use an early write CAS latency 1202 to compensate the timing of the buffer chip write operation. If the buffer chip has a cumulative (address and data) inherent delay of two clocks, the AMB may send the write data to the buffered stack two clocks early. This may not be possible in the case of registered DIMMs, in which the memory controller sends the write data directly to the buffered stacks (rather than via the AMB). In another embodiment, the memory controller itself could be designed to send write data early, to compensate for the j clocks of cumulative inherent delay caused by the buffer chip.

In the example shown, the memory controller issues the write operation at t0. After a one clock inherent delay through the buffer chip, the write operation arrives at the DRAM at t1. The DRAM expects the write data at t1+3−t4. The industry specification would suggest a nominal write data time of t0+5=t5, but the AMB (or memory controller), which already has the write data (which are provided with the write operation), is configured to perform an early write at t5−2=t3. After the inherent delay 1203 through the buffer chip, the write data arrive at the DRAM at t3+1=t4, exactly when the DRAM expects it—specifically, with a three-cycle DRAM Write CAS latency 1204 which is equal to the three-cycle Early Write CAS Latency 1202.

FIG. 13 is a timing diagram 1300 illustrating bus conflicts which can be caused by delayed write operations. The delaying of write addresses and write operations may be performed by a buffer chip, a register, an AMB, etc. in a manner that is completely transparent to the memory controller of the host system. And, because the memory controller is unaware of this delay, it may schedule subsequent operations such as activate or precharge operations, which may collide with the delayed writes on the address bus to the DRAM chips in the stack.

An example is shown, in which the memory controller issues a write operation 1302 at time t0. The buffer chip or AMB delays the write operation, such that it appears on the bus to the DRAM chips at time t3. Unfortunately, at time t2 the memory controller issued an activate operation (control signal) 1304 which, after a one-clock inherent delay through the buffer chip, appears on the bus to the DRAM chips at time t3, colliding with the delayed write.

FIGS. 14 and 15 are a timing diagram 1400 and a timing diagram 1500 illustrating methods of avoiding such collisions. If the cumulative latency through the buffer chip is two clock cycles, and the native read CAS latency of the DRAM chips is four clock cycles, then in order to hide the delay of the address and control signals and the data signals through the buffer chip, the buffer chip presents the host system with an interface to an emulated memory having a read CAS latency of six clock cycles. And if the tRCD and tRP of the DRAM chips are four clock cycles each, the buffer chip tells the host system that they are six clock cycles each in order to allow the buffer chip to delay the activate and precharge operations to avoid collisions in a manner that is transparent to the host system.

For example, a buffered stack that uses 4-4-4 DRAM chips (that is, CAS latency=4, tRCD=4, and tRP=4) may appear to the host system as one larger DRAM that uses 6-6-6 timing.

Since the buffered stack appears to the host system's memory controller as having a tRCD of six clock cycles, the memory controller may schedule a column operation to a bank six clock cycles (at time t6) after an activate (row) operation (at time t0) to the same bank. However, the DRAM chips in the stack actually have a tRCD of four clock cycles. This gives the buffer chip time to delay the activate operation by up to two clock cycles, avoiding any conflicts on the address bus between the buffer chip and the DRAM chips, while ensuring correct read and write timing on the channel between the memory controller and the buffered stack.

As shown, the buffer chip may issue the activate operation to the DRAM chips one, two, or three clock cycles after it receives the activate operation from the memory controller, register, or AMB. The actual delay selected may depend on the presence or absence of other DRAM operations that may conflict with the activate operation, and may optionally change from one activate operation to another. In other words, the delay may be dynamic. A one-clock delay (1402A, 1502A) may be accomplished simply by the inherent delay through the buffer chip. A two-clock delay (1402B, 1502B) may be accomplished by adding one clock of additional delay to the one-clock inherent delay, and a three-clock delay (1402C, 1502C) may be accomplished by adding two clocks of additional delay to the one-clock inherent delay. A read, write, or activate operation issued by the memory controller at time t6 will, after a one-clock inherent delay through the buffer chip, be issued to the DRAM chips at time t7. A preceding activate or precharge operation issued by the memory controller at time t0 will, depending upon the delay, be issued to the DRAM chips at time t1, t2, or t3, each of which is at least the tRCD or tRP of four clocks earlier than the t7 issuance of the read, write, or activate operation.

Since the buffered stack appears to the memory controller to have a tRP of six clock cycles, the memory controller may schedule a subsequent activate (row) operation to a bank a minimum of six clock cycles after issuing a precharge operation to that bank. However, since the DRAM circuits in the stack actually have a tRP of four clock cycles, the buffer chip may have the ability to delay issuing the precharge operation to the DRAM chips by up to two clock cycles, in order to avoid any conflicts on the address bus, or in order to satisfy the tRAS requirements of the DRAM chips.

In particular, if the activate operation to a bank was delayed to avoid an address bus conflict, then the precharge operation to the same bank may be delayed by the buffer chip to satisfy the tRAS requirements of the DRAM. The buffer chip may issue the precharge operation to the DRAM chips one, two, or three clock cycles after it is received. The delay selected may depend on the presence or absence of address bus conflicts or tRAS violations, and may change from one precharge operation to another.

FIG. 16 illustrates a buffered stack 1600 according to one embodiment of this invention. The buffered stack includes four 512 Mb DDR2 DRAM circuits (chips) 1602 which a buffer chip 1604 maps to a single 2 Gb DDR2 DRAM.

Although the multiple DRAM chips appear to the memory controller as though they were a single, larger DRAM, the combined power dissipation of the actual DRAM chips may be much higher than the power dissipation of a monolithic DRAM of the same capacity. In other words, the physical DRAM may consume significantly more power than would be consumed by the emulated DRAM.

As a result, a DIMM containing multiple buffered stacks may dissipate much more power than a standard DIMM of the same actual capacity using monolithic DRAM circuits. This increased power dissipation may limit the widespread adoption of DIMMs that use buffered stacks. Thus, it is desirable to have a power management technique which reduces the power dissipation of DIMMs that use buffered stacks.

In one such technique, the DRAM circuits may be opportunistically placed in low power states or modes. For example, the DRAM circuits may be placed in a precharge power down mode using the clock enable (CKE) pin of the DRAM circuits.

A single rank registered DIMM (R-DIMM) may contain a plurality of buffered stacks, each including four ×4 512 Mb DDR2 SDRAM chips and appear (to the memory controller via emulation by the buffer chip) as a single ×4 2 Gb DDR2 SDRAM. The JEDEC standard indicates that a 2 Gb DDR2 SDRAM may generally have eight banks, shown in FIG. 16 as Bank 0 to Bank 7. Therefore, the buffer chip may map each 512 Mb DRAM chip in the stack to two banks of the equivalent 2 Gb DRAM, as shown; the first DRAM chip 1602A is treated as containing banks 0 and 1, 1602B is treated as containing banks 2 and 4, and so forth.

The memory controller may open and close pages in the DRAM banks based on memory requests it receives from the rest of the host system. In some embodiments, no more than one page may be able to be open in a bank at any given time. In the embodiment shown in FIG. 16, each DRAM chip may therefore have up to two pages open at a time. When a DRAM chip has no open pages, the power management scheme may place it in the precharge power down mode.

The clock enable inputs of the DRAM chips may be controlled by the buffer chip, or by another chip (not shown) on the R-DIMM, or by an AMB (not shown) in the case of an FB-DIMM, or by the memory controller, to implement the power management technique. The power management technique may be particularly effective if it implements a closed page policy.

Another optional power management technique may include mapping a plurality of DRAM circuits to a single bank of the larger capacity emulated DRAM. For example, a buffered stack (not shown) of sixteen ×4 256 Mb DDR2 SDRAM chips may be used in emulating a single ×4 4 Gb DDR2 SDRAM. The 4 Gb DRAM is specified by JEDEC as having eight banks of 512 Mbs each, so two of the 256 Mb DRAM chips may be mapped by the buffer chip to emulate each bank (whereas in FIG. 16 one DRAM was used to emulate two banks)

However, since only one page can be open in a bank at any given time, only one of the two DRAM chips emulating that bank can be in the active state at any given time. If the memory controller opens a page in one of the two DRAM chips, the other may be placed in the precharge power down mode. Thus, if a number p of DRAM chips are used to emulate one bank, at least p−1 of them may be in a power down mode at any given time; in other words, at least p−1 of the p chips are always in power down mode, although the particular powered down chips will tend to change over time, as the memory controller opens and closes various pages of memory.

As a caveat on the term “always” in the preceding paragraph, the power saving operation may comprise operating in precharge power down mode except when refresh is required.

FIG. 17 is a flow chart 1700 illustrating one embodiment of a method of refreshing a plurality of memory circuits. A refresh control signal is received (1702) e.g. from a memory controller which intends to refresh an emulated memory circuit. In response to receipt of the refresh control signal, a plurality of refresh control signals are sent (1704) e.g. by a buffer chip to a plurality of physical memory circuits at different times. These refresh control signals may optionally include the received refresh control signal or an instantiation or copy thereof. They may also, or instead, include refresh control signals that are different in at least one aspect (format, content, etc.) from the received signal.

In some embodiments, at least one first refresh control signal may be sent to a first subset of the physical memory circuits at a first time, and at least one second refresh control signal may be sent to a second subset of the physical memory circuits at a second time. Each refresh signal may be sent to one physical memory circuit, or to a plurality of physical memory circuits, depending upon the particular implementation.

The refresh control signals may be sent to the physical memory circuits after a delay in accordance with a particular timing. For example, the timing in which they are sent to the physical memory circuits may be selected to minimize an electrical current drawn by the memory, or to minimize a power consumption of the memory. This may be accomplished by staggering a plurality of refresh control signals. Or, the timing may be selected to comply with e.g. a tRFC parameter associated with the memory circuits.

To this end, physical DRAM circuits may receive periodic refresh operations to maintain integrity of data stored therein. A memory controller may initiate refresh operations by issuing refresh control signals to the DRAM circuits with sufficient frequency to prevent any loss of data in the DRAM circuits. After a refresh control signal is issued, a minimum time tRFC may be required to elapse before another control signal may be issued to that DRAM circuit. The tRFC parameter value may increase as the size of the DRAM circuit increases.

When the buffer chip receives a refresh control signal from the memory controller, it may refresh the smaller DRAM circuits within the span of time specified by the tRFC of the emulated DRAM circuit. Since the tRFC of the larger, emulated DRAM is longer than the tRFC of the smaller, physical DRAM circuits, it may not be necessary to issue any or all of the refresh control signals to the physical DRAM circuits simultaneously. Refresh control signals may be issued separately to individual DRAM circuits or to groups of DRAM circuits, provided that the tRFC requirements of all physical DRAMs has been met by the time the emulated DRAM's tRFC has elapsed. In use, the refreshes may be spaced in time to minimize the peak current draw of the combination buffer chip and DRAM circuit set during a refresh operation.

FIG. 18 illustrates one embodiment of an interface circuit such as may be utilized in any of the above-described memory systems, for interfacing between a system and memory circuits. The interface circuit may be included in the buffer chip, for example.

The interface circuit includes a system address signal interface for sending/receiving address signals to/from the host system, a system control signal interface for sending/receiving control signals to/from the host system, a system clock signal interface for sending/receiving clock signals to/from the host system, and a system data signal interface for sending/receiving data signals to/from the host system. The interface circuit further includes a memory address signal interface for sending/receiving address signals to/from the physical memory, a memory control signal interface for sending/receiving control signals to/from the physical memory, a memory clock signal interface for sending/receiving clock signals to/from the physical memory, and a memory data signal interface for sending/receiving data signals to/from the physical memory.

The host system includes a set of memory attribute expectations, or built-in parameters of the physical memory with which it has been designed to work (or with which it has been told, e.g. by the buffer circuit, it is working). Accordingly, the host system includes a set of memory interaction attributes, or built-in parameters according to which the host system has been designed to operate in its interactions with the memory. These memory interaction attributes and expectations will typically, but not necessarily, be embodied in the host system's memory controller.

In addition to physical storage circuits or devices, the physical memory itself has a set of physical attributes.

These expectations and attributes may include, by way of example only, memory timing, memory capacity, memory latency, memory functionality, memory type, memory protocol, memory power consumption, memory current requirements, and so forth.

The interface circuit includes memory physical attribute storage for storing values or parameters of various physical attributes of the physical memory circuits. The interface circuit further includes system emulated attribute storage. These storage systems may be read/write capable stores, or they may simply be a set of hard-wired logic or values, or they may simply be inherent in the operation of the interface circuit.

The interface circuit includes emulation logic which operates according to the stored memory physical attributes and the stored system emulation attributes, to present to the system an interface to an emulated memory which differs in at least one attribute from the actual physical memory. The emulation logic may, in various embodiments, alter a timing, value, latency, etc. of any of the address, control, clock, and/or data signals it sends to or receives from the system and/or the physical memory. Some such signals may pass through unaltered, while others may be altered. The emulation logic may be embodied as, for example, hard wired logic, a state machine, software executing on a processor, and so forth.

CONCLUSION

When one component is said to be “adjacent” another component, it should not be interpreted to mean that there is absolutely nothing between the two components, only that they are in the order indicated.

The physical memory circuits employed in practicing this invention may be any type of memory whatsoever, such as: DRAM, DDR DRAM, DDR2 DRAM, DDR3 DRAM, SDRAM, QDR DRAM, DRDRAM, FPM DRAM, VDRAM, EDO DRAM, BEDO DRAM, MDRAM, SGRAM, MRAM, IRAM, NAND flash, NOR flash, PSRAM, wetware memory, etc.

The physical memory circuits may be coupled to any type of memory module, such as: DIMM, R-DIMM, SO-DIMM, FB-DIMM, unbuffered DIMM, etc.

The system device which accesses the memory may be any type of system device, such as: desktop computer, laptop computer, workstation, server, consumer electronic device, television, personal digital assistant (PDA), mobile phone, printer or other peripheral device, etc.

The various features illustrated in the figures may be combined in many ways, and should not be interpreted as though limited to the specific embodiments in which they were explained and shown.

Those skilled in the art, having the benefit of this disclosure, will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Indeed, the invention is not limited to the details described above. Rather, it is the following claims including any amendments thereto that define the scope of the invention.

Claims

1. An apparatus comprising:

an interface circuit operable to: receive, from a system, (1) write data to be stored on at least one physical memory device of a plurality of physical memory devices, and (2) a first refresh control signal; communicate the write data, after a first delay, to the at least one physical memory device; and in response to receiving the first refresh control signal, communicate within a first time span a distinct second refresh control signal to each of a first subset of the plurality of physical memory devices and a second subset of the plurality of physical memory devices, wherein the interface circuit includes: first storage to store emulated memory attributes associated with one or more emulated memory devices; and second storage to store physical memory attributes associated with at least one of the one or more physical memory devices, where the interface circuit is operable to emulate one or more memory devices based on the emulated memory attributes, where the emulated memory attributes include a first latency and the physical memory attributes include a second latency, the first latency being greater than the second latency, where the first latency includes at least one of a first row address strobe to column address strobe latency (tRCD), a first row precharge latency (tRP), a first activate to precharge latency (tRAS), or a first row cycle time (tRC), and the second latency includes at least one of a second tRCD, a second tRP, a second tRAS, or a second tRC, where the first delay is based on the first latency and the second latency, and where the first time span is specified by a refresh latency (tRFC) associated with the one or more emulated memory devices.

2. The apparatus of claim 1, wherein the first delay is based on a difference between the first latency and the second latency.

3. The apparatus of claim 1, wherein the first delay is greater than or equal to one clock cycle.

4. The apparatus of claim 1, wherein at least two of the emulated memory attributes are different from corresponding physical memory attributes.

5. The apparatus of claim 1 wherein the interface circuit is further operable to:

receive read data from at least one of the one or more physical memory devices; and

communicate the read data, after a second delay, to the system, where the second delay is based on the first latency and the second latency.

6. The apparatus of claim 1 wherein the first storage and second storage include read or write capable memory.

7. The apparatus of claim 1 wherein the first storage and second storage include hard-wired logic.

8. An apparatus comprising:

a plurality of physical memory devices;

an interface circuit operable to:

receive, from a system, (1) write data to be stored on one or more of the plurality of physical memory devices and (2) a first refresh control signal;

communicate the write data, after a first delay, to the one or more physical memory devices; and

in response to receiving the first refresh control signal, communicate within a first time span a distinct second refresh control signal to each of a first subset of the plurality of physical memory devices and a second subset of the plurality of physical memory devices, wherein the interface circuit includes: first storage to store emulated memory attributes associated with one or more emulated memory devices; and second storage to store physical memory attributes associated with one or more of the plurality of physical memory devices,

where the interface circuit is operable to emulate one or more memory devices based on the emulated memory attributes,

where the emulated memory attributes include a first latency and the physical memory attributes include a second latency, the first latency being greater than the second latency,

where the first latency includes at least one of a first row address strobe to column address strobe latency (tRCD), a first row precharge latency (tRP), a first activate to precharge latency (tRAS), or a first row cycle time (tRC), and the second latency includes at least one of a second tRCD, a second tRP, a second tRAS, or a second tRC,

where the first delay is based on the first latency and the second latency; and

where the first time span is specified by a refresh latency (tRFC) associated with the one or more emulated memory devices.

9. The apparatus of claim 8, wherein the first delay is based on a difference between the first latency and the second latency.

10. The apparatus of claim 8, wherein the first delay is greater than or equal to one clock cycle.

11. The apparatus of claim 8 wherein the interface circuit is further operable to:

receive read data from at least one of the plurality of physical memory devices; and

communicate the read data, after a second delay, to the system, where the second delay is based on the first latency and the second latency.

12. The apparatus of claim 8 wherein the first storage and second storage include read or write capable memory.

13. The apparatus of claim 8 wherein the first storage and second storage include hard-wired logic.

14. The apparatus of claim 1 wherein the distinct second refresh control signal includes a copy of the first refresh control signal.

15. The apparatus of claim 8 wherein the distinct second refresh control signal includes a copy of the first refresh control signal.

16. The apparatus of claim 1 wherein the interface circuit is further operable to communicate the distinct second refresh control signal to each of the first subset and the second subset at separate times to minimize power consumption of the plurality of physical memory devices.

17. The apparatus of claim 16 wherein the interface circuit is further operable to communicate the distinct second refresh control signal to each of the first subset and the second subset at separate times to minimize a current draw of the interface circuit and the plurality of physical memory devices.

18. The apparatus of claim 8 wherein the interface circuit is further operable to communicate the distinct second refresh control signal to each of the first subset and the second subset at separate times to minimize power consumption of the plurality of physical memory devices.

19. The apparatus of claim 18 wherein the interface circuit is further operable to communicate the distinct second refresh control signal to each of the first subset and the second subset at separate times to minimize a current draw of the interface circuit and the plurality of physical memory devices.