DYNAMIC RAM PHY INTERFACE WITH CONFIGURABLE POWER STATES
A physical memory interface (Phy) and a method of operating it are disclosed. The Phy interface includes command and status registers (CSRs) configured to receive a first power context and second power context. Selection circuitry is configured to switch between the first and second power contexts. A plurality of adjustable delay elements are provided, each having a delay time responsive to the selected power context. A first set of CSRs may be configured to store the first power context and a second set of CSRs may be configured to store the second power context. The Phy interface may also include a plurality of drivers each having a selectable drive strength responsive to the selected power context. The Phy interface may also include a plurality of receivers each having a selectable termination impedance responsive to the selected power context. Switching between power contexts may result in adjusting of the delay elements, drive strength and/or termination impedance of one or more drivers/receivers.
This application claims the benefit of U.S. provisional application No. 61/382,089, filed Sep. 13, 2010, which is incorporated by reference as if fully set forth herein.
FIELD OF INVENTION
This invention relates to memory subsystems including physical layers that directly interface with dynamic random access memory (DRAM) devices.
BACKGROUND
Typical memory systems use either an asynchronous or synchronous clocking scheme to transmit data between the memory controller and the memory device. Synchronous clocking means that the memory device waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus. Synchronous dynamic random access memory (SDRAM) is widely used since such devices typically support higher clock speeds than asynchronous memory devices.
Double data rate (DDR) SDRAM transfers data on both the rising and falling edges of the clock signal. Such memory devices use a lower clock frequency but require strict control of the timing of the electrical data and clock signals. The first version of such devices (DDR1) achieved nearly twice the bandwidth of a single data rate (SDR) SDRAM running at the same clock frequency. DDR2 and DDR3 SDRAM devices are subsequent improvements over DDR1 devices. Regardless of which type of DDR memory is used (DDR1/DDR2/DDR3), a physical interface (Phy) is coupled directly between the memory controller and the DDR SDRAM devices. The Phy interface generally includes circuitry for handling the timing requirements of the DDR SDRAM data strobes. Typical Phy interface implementations provide no mechanism to rapidly adjust memory performance level or demanded power.
SUMMARY
A physical memory interface (Phy) is provided. The Phy interfaces between a memory controller and physical memory devices. The Phy interface includes command and status registers (CSRs) configured to receive a first power context and second power context. Selection circuitry is provided. The selection circuitry is configured to switch between the first and second power contexts. The Phy interface includes a plurality of adjustable delay elements, each having a delay time responsive to the selected power context. Switching between power contexts results in an adjustment of one or more of the adjustable delay elements.
In another embodiment, the Phy interface includes a first set of CSRs configured to store the first power context and a second set of CSRs configured to store the second power context. The Phy interface may also include a plurality of drivers each having a selectable drive strength responsive to the selected power context. The Phy interface may also include a plurality of receivers each having a selectable termination impedance responsive to the selected power context. Switching between power contexts may result in adjusting of the drive strength and/or termination impedance of one or more drivers/receivers.
The first and second power context may be determined via a BIOS training procedure. Such procedures may have multiple phases. For example, the first power context may be determined via a first memory training phase and the second power context may be determined via a second memory training phase.
The Phy interface may include a configuration bus that is configured to allow read/write access to the CSRs. The Phy interface may also be configured to support multiple channels of physical memory devices. The Phy interface may be located in various locations including on a die of a central processing unit (CPU).
A Phy interface 22 resides between the memory controller 18 and the physical memory devices. The Phy interface is typically located in the central processing unit but may be located elsewhere. For purposes of clarity, the Phy interface 22 is shown as a separate block in
During a read operation, a DDR SDRAM issues DQ and DQS at the same time, a manner commonly referred to as “edge aligned.” In order for the memory controller to correctly acquire the data being sent from the DDR SDRAM, the Phy interface 22 utilizes delay circuitry, such as a delay-locked loop (DLL), to delay the DQS signal so that it may be used to correctly latch the DQ signals during a valid data window or “data eye”. Similarly, the Phy interface 22 also utilizes delay circuitry to support the writing of data to the DDR SDRAM. For reading data, the DQS 34 must be delayed. For writing data, both the DQS 34 and the DQ 32 must be delayed. The Phy aligns DQS 34 with the middle of the DQ 32 data eye rather than leaving the two edge aligned. DQS 34 is delayed for write leveling and to meet the middle-of-data-eye requirements. Other delays may also be used (e.g., for read/write tri-state control of the data bus). The Phy interface 22 includes a plurality of command and status registers (CSRs) 42 that are utilized to control delay timing, drive strengths and a variety of other parameters as described in more detail below. It should be understood that such circuitry may be duplicated on a per channel basis as well.
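As a rough illustration of the centering described above, the following sketch computes the nominal delay that moves an edge-aligned DQS to the middle of the DQ data eye. It assumes ideal DDR signaling in which each bit occupies half a clock period, so the middle of the eye lies a quarter period after the edge. The function name and the example clock rate are illustrative and are not taken from this disclosure; an actual Phy interface would arrive at its delay settings through training rather than a fixed formula.

```python
def dqs_center_delay_ps(mem_clk_mhz: float) -> float:
    """Nominal delay (in ps) that moves an edge-aligned DQS to mid-eye.

    Assumption: ideal DDR signaling, one bit per half clock period.
    """
    if mem_clk_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    period_ps = 1_000_000.0 / mem_clk_mhz  # one clock period in picoseconds
    bit_time_ps = period_ps / 2.0          # DDR: two bits per clock period
    return bit_time_ps / 2.0               # middle of one bit time


# Example: an 800 MHz memory clock has a 1250 ps period and a 625 ps bit
# time, so the nominal centering delay is 312.5 ps.
print(dqs_center_delay_ps(800.0))
```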
The Phy interface may also adjust or select transmitter drive strength and receiver termination impedance. Rather than use fixed timing delays, transmitter drive strength and receiver termination impedance, these parameters may be adjusted each time the computer system is turned on. This is typically accomplished with the assistance of a training program. The training program is typically stored in a basic input/output system (BIOS) memory device 26, but it may also be implemented within the device hardware. The training program executes an algorithm during power-on self-test (POST), which determines appropriate timing delays, drive strengths and termination impedances associated with many of the memory interface signals. These parameters are saved within the Phy interface in a plurality of registers that define the overall timing of the various signal paths to and from the Phy. In the alternative, these parameters may be stored elsewhere (e.g., in the north bridge 14 or south bridge 16).
Typical memory devices are also provided with a clock enable (self refresh) input 40. The clock enable input 40 is used to place the memory device in self refresh mode. In this mode, the memory device uses an on-chip timer to generate internal refresh cycles as necessary. External clocks may also be stopped during this time. This input is typically used in connection with power down modes since it allows the memory controller to be disabled without loss of main memory data.
As shown in
In this example, drivers 52 and 56 are associated with DLLs 72 and 74 respectively. Receiver 58 is associated with DLL 76. As described above, the DLLs are adjusted to provide the appropriate timing delays for read and write operations. The Phy interface may also be configured to perform read and write operations with or without leveling. During memory write operations to DDR3 DIMMs with leveling, the Phy interface delays the launch of each DQS going to the DIMM such that at each DRAM chip DQS is seen to coalesce with the memory clock 58. During read operations with leveling, the Phy interface may also compensate for delays introduced by fly-by topology.
Due to signal integrity issues of operation at higher data rates, the Phy interface may dynamically change the DLL settings on a burst-by-burst (or transaction) basis. The Phy interface may store an optimum tuple of delay settings for DQ and DQS for each DIMM in the system. Depending upon the DIMM being accessed, the Phy interface retrieves the appropriate DLL settings and applies them.
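The per-DIMM lookup described above can be sketched as a small table keyed by DIMM. This is only a model under stated assumptions: the class names, the field names and the numeric settings are hypothetical, and the disclosure does not specify how the stored settings are encoded.

```python
from typing import Dict, NamedTuple, Optional


class DelayTuple(NamedTuple):
    """Hypothetical pair of DLL settings stored for one DIMM."""
    dq_delay: int
    dqs_delay: int


class DllBank:
    """Holds one trained delay tuple per DIMM and applies it on access."""

    def __init__(self, trained: Dict[int, DelayTuple]) -> None:
        self._trained = dict(trained)
        self.active: Optional[DelayTuple] = None  # settings currently applied

    def select_for_access(self, dimm: int) -> DelayTuple:
        # Retrieve the stored tuple for the DIMM being accessed and apply it.
        self.active = self._trained[dimm]
        return self.active


# Illustrative values only; real settings come from memory training.
bank = DllBank({0: DelayTuple(dq_delay=12, dqs_delay=20),
                1: DelayTuple(dq_delay=14, dqs_delay=23)})
bank.select_for_access(1)
```

In this model a burst to a different DIMM simply swaps the active tuple, which mirrors the burst-by-burst adjustment described in the text.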
The Phy interface may tailor its demanded power based upon the desired level of performance. It should be understood that the determination of when to change power contexts may come from a variety of sources. For example, the operating system may determine that a context change is desired (e.g., after a set period of inactivity, by user command, time schedule or the like). In the alternative, hardware may be used to determine when a context change is desired. The context change is accomplished by switching between different sets of Phy interface parameters associated with different power states. For example, the Phy interface may support a high power state (e.g., higher memory speed) and a low power state (e.g., lower memory speed). Each power state has an associated set of Phy interface parameters or context (i.e., delay element settings, drive strengths and termination impedances for each signal line). As described in more detail hereafter, switching between power states may be accomplished in several ways. It should be understood that switching between multiple power states as disclosed herein can be applied to any memory type and is not limited to use with DDR memory as used in the examples below.
In this example, the memory controller 18 may access the CSRs 42 via a 32-bit, time interleaved, uni-directional configuration bus 80. Address and command (e.g., read, write, do nothing) are sent in the first pipestage, followed by data in the second pipestage as shown in
In this example, the CSR address space is 16 bits wide, allowing for a space of 65,536 unique 16-bit registers. Instead of allowing for such a large space, the address is mapped to allow for the following functions: chiplet identification; intra-chiplet broadcast; compensation broadcast; chiplet instance identification (the D3 DBYTE, D3CLK and D3CMP are chiplets that are placed more than once).
Only a portion of the CSRs contain values that are relevant to a given power state. In order to facilitate low latency switching between power states, a set of power context sensitive CSRs is provided for each power state. Returning to
A summary of the programmable fields for each CSR in each PhyPS is shown below in Table 1.
In this example, programming of the relevant fields of the Phy interface is accomplished by issuing commands or programming specific fields in CSRs via the configuration bus 80. For example, changing from one PhyPS to another may be accomplished with a single command issued to the DDR Phy interface indirect register space. Programming of PhyPS context sensitive CSRs may be accomplished by setting the appropriate PhyPS context and then performing normal indirect CSR writes or reads. Alternatively, direct CSR writes or reads may reach any CSR without regard to the PhyPS context.
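The banked register arrangement can be pictured with a small model. The sketch below is illustrative only: the class, method and register names are hypothetical, and "dqs_delay" is a placeholder rather than an actual Phy CSR. It models one bank of context sensitive CSRs per PhyPS, an indirect access path that follows a selected access context, and a direct path that names the bank explicitly.

```python
class BankedCsrs:
    """Two banks of context sensitive CSRs, one per PhyPS (illustrative)."""

    def __init__(self) -> None:
        self._banks = [dict(), dict()]  # bank 0 -> PhyPS0, bank 1 -> PhyPS1
        self.active_ps = 0              # context the Phy is running in
        self.access_ps = 0              # context targeted by indirect accesses

    def indirect_write(self, name: str, value: int) -> None:
        # Indirect accesses land in whichever context is selected for access.
        self._banks[self.access_ps][name] = value

    def indirect_read(self, name: str) -> int:
        return self._banks[self.access_ps][name]

    def direct_write(self, ps: int, name: str, value: int) -> None:
        # Direct accesses name the bank and ignore the access context.
        self._banks[ps][name] = value

    def direct_read(self, ps: int, name: str) -> int:
        return self._banks[ps][name]

    def change_ps(self, ps: int) -> None:
        # Switching context only changes which bank is live; nothing is copied.
        self.active_ps = ps

    def live_value(self, name: str) -> int:
        return self._banks[self.active_ps][name]


csrs = BankedCsrs()
csrs.indirect_write("dqs_delay", 20)  # programs the PhyPS0 bank
csrs.access_ps = 1
csrs.indirect_write("dqs_delay", 35)  # programs the PhyPS1 bank, no switch yet
csrs.change_ps(1)                     # PhyPS1 values become live
```

Because both banks are already populated, the context switch in this model is a single selector update, which is the property that keeps switching latency low.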
The Phy interface may be controlled via a series of commands including: Master—0x08[12]—PhyPS, Master—0x08[8]—PstateToAccess and Master—0x18[8]—PhyPSMasterChannel. The nomenclature [12], [8] and the like refers to the bit position within the command. The Master—0x08[12] command corresponds to the current Phy interface P-state (0 or 1). This command controls which power context (i.e., which set of CSRs) is currently active. The Master—0x08[8] command selects which P-State to read or write to during CSR accesses. BIOS may use this method to control which P-State to write to without having to do an actual P-State change. It should be understood that additional bits may be added to support more than two power contexts. The Master—0x18[8] command selects the master channel. In this embodiment, only the channel designated by this bit (master channel) is allowed to issue 0x0B[PhyPS change] commands. Any 0x0B[PhyPS change] commands issued from other channels will be ignored. It should be understood that additional bits may be added to support more than two channels. In this embodiment, the following 0x0B commands are defined:
The full 0x0B data packet is shown below in Table 3.
In this example, the power context is selected via a single bit (i.e., PhyPS[0] and PhyPS[1], at bit position 26). It should be understood that additional bits may be added to support more than two contexts. The PhyPSRequest bit (bit position 30) is used to indicate that the command includes a context change. The power context may generally be changed as follows. Upon receipt of a context change request, each active channel is placed in self refresh (SR) mode (0x0B[PhySR=1]). In this “safe mode”, the memory devices use an on-chip timer to generate internal refresh cycles as necessary. Depending on the SR mode selected, external clocks may be stopped during this time. The entry into SR mode may happen at different times but all channels should be in SR mode before the context change. The power context change is then initiated (0x0B[PhyPS=X], [PhyPSRequest=1]). Finally, each of the active channels is switched from SR mode to normal mode (0x0B[PhySR=0]).
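The context change sequence above can be sketched as a list of 0x0B packets. The bit positions used here (31 for PhySRRequest, 30 for PhyPSRequest, 26 for PhyPS and 23 for PhySR) are the ones given in this description; the function names are hypothetical. As simplifying assumptions, the sketch leaves the PLL behavior bits [25:24] at zero, issues the PhyPS change from the master channel, and omits the wait for CfgDone between steps.

```python
from typing import List, Sequence, Tuple

PHY_SR_REQUEST = 1 << 31  # command carries a self refresh change
PHY_PS_REQUEST = 1 << 30  # command carries a power context change
PHY_PS = 1 << 26          # requested power context (0 or 1)
PHY_SR = 1 << 23          # 1 = enter self refresh, 0 = exit


def sr_command(enter: bool) -> int:
    """Build a 0x0B packet that enters or exits Phy self refresh."""
    return PHY_SR_REQUEST | (PHY_SR if enter else 0)


def ps_command(context: int) -> int:
    """Build a 0x0B packet that requests power context 0 or 1."""
    if context not in (0, 1):
        raise ValueError("this sketch models two power contexts")
    return PHY_PS_REQUEST | (PHY_PS if context else 0)


def context_change_sequence(active_channels: Sequence[int],
                            master_channel: int,
                            new_context: int) -> List[Tuple[int, int]]:
    """Return (channel, packet) pairs in the order they would be issued."""
    steps = [(ch, sr_command(True)) for ch in active_channels]   # enter SR
    steps.append((master_channel, ps_command(new_context)))      # change PhyPS
    steps += [(ch, sr_command(False)) for ch in active_channels] # exit SR
    return steps


sequence = context_change_sequence([0, 1], master_channel=0, new_context=1)
for channel, packet in sequence:
    print(f"channel {channel}: 0x{packet:08X}")
```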
Upon receipt of the context change command, the Phy interface will change the PhyPS context, the DDR PLL multiplier and divider will be updated, and the PLL will be relocked. When this is all complete, CfgDone will be set. When SR mode is subsequently exited, the DLLs will relock. During the time the PhyPS change is occurring, the memory controller maintains control over all inputs to the Phy interface (specifically CKE, MemReset, ReadPending, WritePending and all other tri-state controls).
It is generally expected that PhyPS changes will leave the PLL powered up, because the intention is to change PhyPS as quickly as possible. Thus, the change of PhyPS will trigger the PLL relock (and wait for relock) immediately, with the DLL relock happening after the SR is exited. However, it is possible to be in a SR mode which has powered down the PLL (either VCO or regulator), and to change PhyPS while in this SR mode. In this case, the Phy interface will still wait for a PLL relock time immediately after the PhyPS change, but on the subsequent SR exit, the Phy interface will wait for both PLL relock (because the PLL will be in the process of powering back up) and DLL relock.
The 0x0B command to enter and exit Phy interface Self Refresh (PhySR) may be issued in either channel 0 or 1, making it possible to have one DRAM channel in SR mode while the other is not. In order to fully power down the PClk global grid in the Phy, both channel 0 and 1 need to be in PhySR. The 0x0B command to change PhySR is shown in Table 4 below:
Setting 0x0B[31=PhySRRequest] along with 0x0B[23=PhySR] causes the Phy interface to either enter or exit self refresh. 0x0B[25:24] control the behavior of the PLL while in SR. Entry into SR mode is very quick, taking approximately 300 ns from receipt of the 0x0B command to the time CfgDone is asserted. To allow the memory controller to move forward as quickly as possible, the memory controller may monitor CfgDone for a transition from 1 to 0. This indicates the Phy interface has closed off input from the memory controller, driving all CKE low, driving MemReset appropriately and placing all remaining DDR bus pins into tristate. At this point it is safe for the memory controller to go insane. It is not safe to drop the Vddr rail until CfgDone has asserted, indicating all necessary CSR transactions are complete.
Exit from PhySR has a much higher latency, requiring that the Phy interface clock grid (PClk) be turned back on and the DLLs relocked. The PLL may be left on or turned off during SR. In this embodiment, 0x0B[30=PhyPSRequest] must be 0 when executing a PhySR change. Setting either 0x0B[31=PhySRRequest] or 0x0B[30=PhyPSRequest] disables 0x0B[22:0], meaning it is not possible to set 0x0B[3=DdrRateRequest] to update the DDR rate field in the same command. The DDR rate is set first to load the CSRs containing the Phy interface PLL multiplier and divider. Sending 0x0B[PhyPS change] will load the DDR PLL multiplier and dividers. In other embodiments, it may be possible to execute multiple high level 0x0B commands at once. If 0x0B[31=PhySRRequest]=1 and 0x0B[23=PhySR] results in no change to the PhySR state, the DLL lock times are still obeyed before asserting CfgDone.
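The interaction between the request bits can be summarized with a small check. The bit positions come from this description; the function names are hypothetical. How the hardware reacts to a packet that sets both request bits is not stated here, so raising an error for that case is purely an assumption of this sketch.

```python
PHY_SR_REQUEST = 1 << 31        # bit 31: self refresh change request
PHY_PS_REQUEST = 1 << 30        # bit 30: power context change request
DDR_RATE_REQUEST = 1 << 3       # bit 3: DDR rate update request
LOW_FIELD_MASK = (1 << 23) - 1  # bits [22:0]


def effective_packet(packet: int) -> int:
    """Return the 0x0B packet as this embodiment would interpret it."""
    if (packet & PHY_SR_REQUEST) and (packet & PHY_PS_REQUEST):
        # Assumption: reject the combination the text says must not be used.
        raise ValueError("PhyPSRequest must be 0 when changing PhySR")
    if packet & (PHY_SR_REQUEST | PHY_PS_REQUEST):
        packet &= ~LOW_FIELD_MASK  # either request bit disables bits [22:0]
    return packet & 0xFFFFFFFF


def updates_ddr_rate(packet: int) -> bool:
    """True if the packet would still act as a DDR rate update."""
    return bool(effective_packet(packet) & DDR_RATE_REQUEST)


print(updates_ddr_rate(DDR_RATE_REQUEST))                   # rate update alone
print(updates_ddr_rate(PHY_PS_REQUEST | DDR_RATE_REQUEST))  # masked by request
```

The second call shows why the DDR rate has to be programmed in a separate, earlier command in this embodiment.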
In this embodiment, the 0x0B command for the DRAM data rate on the DDR bus is set by BIOS in channel 0 only. Sending 0x0B commands to set the DRAM data rate in channel 1 has no effect. It should be understood that other embodiments may support independent DDR data rates on each channel. The 0x0B DDR rate command is shown in Table 5 below:
Changing the actual DDR rate and PLL frequency through this 0x0B[DdrRate] command is included for legacy BIOSes. BIOSes that understand Phy interface P-States should instead program the DDR Rates for both Phy interface PStates through the direct CSRs Master 0x00[DdrRate] and Master—0x40[DdrRate]. These rate changes (through the direct CSRs) will only take effect after a subsequent 0x0B[PhyPS change].
In this embodiment, 0x0B[31=PhySRRequest] and 0x0B[30=PhyPSRequest] must both be 0 when executing an update to the DDR rate. It should be understood that other embodiments may support multiple high level 0x0B commands at once.
The PhyPS states after cold reset, warm reset, or resume from Advanced Configuration and Power Interface (ACPI) power state S3 (commonly referred to as Standby, Sleep, or Suspend to RAM) are shown in Table 6 below. All states are persistent through warm reset. Therefore, in the warm reset entries below, the values remain as they were prior to the warm reset.
When a typical computer power supply is first energized, it takes some time for the various voltages to stabilize. Before the voltages stabilize, if the computer were allowed to try to boot up, unpredictable results could occur. To prevent the computer from starting up prematurely, the power supply outputs a PwrOK signal when the power supply is ready for use. Until this signal is sent, the motherboard will refuse to start up the computer.
When cold booting (~PwrOk asserted as well as Reset asserted) the PhyPS will automatically be set to 0 and the Phy interface will not be in PhySR (all DLLs will be powered up). When warm booting (PwrOk continuously asserted, only Reset asserted) the PhyPS and PhySR state will be determined by the state just before warm reset. Resuming from a warm reset requires BIOS to set the appropriate PhyPS and PhySR states (a time optimized solution is to perform a direct CSR read of these states and then perform a write if necessary). The BIOS should ensure that the Phy interface is not in SR mode after a warm reset.
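The reset behavior described above can be modeled in a few lines. The class and function names are hypothetical, and the model tracks only the PhyPS and PhySR state; it does not attempt to sequence the self refresh entry and exit that an actual context change would require.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PhyState:
    """Minimal model of the state that survives or is cleared by reset."""
    phy_ps: int = 0
    in_self_refresh: bool = False


def after_reset(before: PhyState, cold: bool) -> PhyState:
    if cold:
        # Cold boot: PhyPS is forced to 0 and the Phy is not in PhySR.
        return PhyState(phy_ps=0, in_self_refresh=False)
    # Warm boot: the state just before the reset is preserved.
    return PhyState(before.phy_ps, before.in_self_refresh)


def writes_needed(state: PhyState, wanted_ps: int) -> Dict[str, bool]:
    """Read the state back and report which items BIOS must rewrite."""
    return {
        "PhyPS": state.phy_ps != wanted_ps,  # write only if it differs
        "PhySR": state.in_self_refresh,      # Phy must not remain in SR
    }


warm = after_reset(PhyState(phy_ps=1, in_self_refresh=True), cold=False)
print(writes_needed(warm, wanted_ps=0))
```

The read-then-write-if-necessary check corresponds to the time optimized approach mentioned in the text: when the preserved state already matches, no write is issued.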
Since DDR3 requires a two pass training procedure to have an unambiguous convergence of Write Levelization (WL) and RxEn, it is possible to train for the lower-frequency PState as part of the process of training for the higher-powered PState. The procedure also assumes that BIOS has already determined that this is a cold reset (and thus requires training), as opposed to a warm reset or S3 exit.
The first pass of memory training uses the initial DDR rate required for unambiguous training as shown in Table 7 below. It should be understood that the specific sequence of steps set out in all of the tables below may be varied without departing from the scope of this disclosure.
At this point the DDR PLL and DLLs are properly configured for use. The PhyPS context is PhyPS0. Both channels, if memory is present, are ready to begin training. Training follows the known protocols such as the AMD Generic Encapsulated Software Architecture (AGESA) bootstrap protocol as shown generally in Table 8:
At this point, the training and programming of the initial DDR rate for unambiguous training is complete. This DDR rate and other parameters established during this initial training protocol may be used as a first power context (e.g., lower speed—PhyPS[1]). It should be understood that PhyPS[1] may be set based on another DDR rate (i.e., repeat steps 4 through 15 for this frequency). The next phase trains for the higher (PhyPS[0]) DDR rate. The pass(es) above resolved aliasing in the Write Levelization and RxEn hardware training algorithm. The Write Levelization and RxEn training values are scaled by the MemClk data rate ratio PhyPS0/PhyPS1 and used as the seed for the second phase of training in PhyPS[0]. The procedure is initiated as shown in Table 9.
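The seeding step above amounts to multiplying each first-phase result by the data rate ratio PhyPS0/PhyPS1. The sketch below shows only that arithmetic. The function name, the units of the trained values and the example rates are assumptions made for illustration; the disclosure does not specify them.

```python
from typing import Dict


def scale_seeds(trained: Dict[str, float],
                rate_ps0: float,
                rate_ps1: float) -> Dict[str, float]:
    """Scale first-phase training values by the ratio PhyPS0/PhyPS1."""
    if rate_ps0 <= 0 or rate_ps1 <= 0:
        raise ValueError("data rates must be positive")
    ratio = rate_ps0 / rate_ps1
    return {name: value * ratio for name, value in trained.items()}


# Hypothetical numbers: values trained at the lower PhyPS[1] rate are
# doubled to seed training at a PhyPS[0] rate that is twice as fast.
seeds = scale_seeds({"WL": 24, "RxEn": 40}, rate_ps0=1600, rate_ps1=800)
print(seeds)
```

The scaled values are only starting points; the second training phase then refines them at the PhyPS[0] rate.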
At this point the DDR PLL and DLLs are properly configured for use. The PhyPS context is still PhyPS[0], and the Phy interface is running at the PhyPS[0] DDR Rate. Both channels, if memory is present, are ready to begin the next phase of training as shown generally in Table 8:
This completes second phase training and programming of the power context PhyPS[0]. At this stage, the rates for both PhyPS spaces have been trained, and trained values are already written to the Phy interface PhyPS[0] CSRs as part of the training. The Phy interface is currently in the PhyPS[0] context. The PhyPS[1] values have been trained but not yet written to the PhyPS[1] CSRs. The PhyPS[0] CSRs are updated as shown generally in Table 11:
In the event only one PhyPS is required, BIOS may choose which PhyPS context should be used. In order to preserve historical meaning, BIOS may configure the Phy interface for PhyPS[0].
Resumption from S3 does not involve any DRAM training, only restoring the trained values from nonvolatile state (generally in the South Bridge). Resumption from S3 will typically guarantee (because of ~PwrOk) that the PhyPS context is PhyPS0, Master—0x08[PStateToAccess] is 0, that both channels are out of PhySR, and that the master channel is channel 0 (even if memory is not present on channel 0). The procedure for resuming from S3 is generally shown in Table 12:
A warm reset resume is almost identical to a resume from S3. Resume from S3 has ~PwrOk set the PhyPS context to PhyPS0, designate channel 0 as the master channel to communicate PhyPS changes and take the Phy interface out of PhySR in both channels. In contrast, warm reset leaves the PhyPS and PhySR as well as master channel in an unknown state. It should be further noted that an architectural hole exists with a warm reset resume. If a warm reset resume is issued before the system has been able to cold boot, complete memory training and store all trained values in non-volatile memory, the resume will fail. In order to avoid this issue, BIOS should use a flag (which is reset on cold reset but persistent through warm reset) to indicate whether the training values have been calculated and stored successfully. If, during a warm reset, BIOS sees this flag set, it may resume by restoring the trained state. The trained state also includes the Master—0x18[PhyPSMasterChannel] setting, and any unpopulated channel is left in SR mode. If this flag is not set, then BIOS must (re)train the Phy. For purposes of the following disclosure, it is assumed that this flag has been set and that training values have been stored. Therefore, warm reset resume does not involve any DRAM training, only restoring the trained values from non-volatile state (generally in the South Bridge). The procedure for performing a warm reset resume is generally shown in Table 13:
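The role of the training flag in closing the architectural hole can be sketched as follows. The class and method names are hypothetical, and the sketch models only the flag and the resulting decision, not the restore or training procedures themselves.

```python
class TrainingFlag:
    """Flag that is reset on cold reset but persists through warm reset."""

    def __init__(self) -> None:
        self.is_set = False

    def on_cold_reset(self) -> None:
        self.is_set = False  # cold reset always clears the flag

    def on_training_stored(self) -> None:
        self.is_set = True   # set only after trained values are stored

    def on_warm_reset(self) -> str:
        # The flag is left untouched; BIOS only inspects it.
        return "restore" if self.is_set else "retrain"


flag = TrainingFlag()
flag.on_cold_reset()
early = flag.on_warm_reset()   # warm reset before training has completed
flag.on_training_stored()
later = flag.on_warm_reset()   # warm reset after values were stored
print(early, later)
```

A warm reset that arrives before training has completed therefore falls back to retraining instead of attempting to restore values that were never stored.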
Table 14 shows a list of all PhyPS CSRs that are duplicated for each power context:
It should be understood that many variations are possible based on the disclosure herein. For example, multiple power contexts could be stored in other memory locations (e.g., in the north bridge 14 or south bridge 16). In this scenario, a standard Phy interface could switch power contexts without the need for a dedicated set of context sensitive CSRs. Such a scenario could eliminate the need for multiple sets of CSRs but would increase the latency for the context change.
Although features and elements are described above in particular combinations, each feature or element may be used alone without the other features and elements or in various combinations with or without other features and elements. The methods or flow charts provided herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable storage medium for execution by a general purpose computer or a processor. Examples of computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors may be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions (such instructions capable of being stored on a computer readable medium). The results of such processing may be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements aspects of the present invention.
Claims
1. A method of controlling a physical memory interface for a memory device, the method comprising:
- storing a first and second power context;
- providing a plurality of adjustable delay elements configured to provide timing delays for reading data from and writing data to the memory device, each adjustable delay element having a delay time responsive to a selected one of the first and second power contexts;
- receiving a power context change request; and
- selecting one of the first and second power contexts based on the power context change request.
2. The method of claim 1, further comprising generating a self refresh output configured to select a self refresh mode associated with the memory device prior to selecting one of the first and second power contexts.
3. The method of claim 1, further comprising:
- providing a first set of registers configured to store the first power context;
- providing a second set of registers configured to store the second power context; and
- selecting one of the first and second set of registers in response to the power context change request.
4. The method of claim 1, further comprising:
- adjusting a selectable drive strength for at least one driver of the physical interface in response to the power context change request.
5. The method of claim 1, further comprising:
- adjusting a selectable termination impedance for at least one receiver of the physical interface in response to the power context change request.
6. The method of claim 1, further comprising generating the first power context via a first memory training phase and generating the second power context via a second memory training phase.
7. The method of claim 1, further comprising restoring at least one of the first and second power context upon resuming from an Advanced Configuration and Power Interface (ACPI) S3 power state.
8. The method of claim 1, further comprising retrieving the first and second power contexts from a memory location on a south bridge.
9. The method of claim 1, further comprising retrieving the first and second power contexts from a memory location on a north bridge.
10. A physical memory interface for a memory device, the physical memory interface comprising:
- a plurality of registers configured to receive a first power context and second power context;
- selection circuitry configured to select one of the first and second power contexts; and
- a plurality of adjustable delay elements configured to provide timing delays for reading data from and writing data to the memory device, each adjustable delay element having a delay time responsive to the selected one of the first and second power contexts.
11. The physical memory interface of claim 10, wherein the selection circuitry retrieves the selected one of the first and second power contexts from a memory location.
12. The physical memory interface of claim 10, further comprising:
- a first set of registers configured to store the first power context; and
- a second set of registers configured to store the second power context, wherein the selection circuitry is configured to select between the first and second set of registers.
13. The physical memory interface of claim 10, further comprising a plurality of drivers each having a selectable drive strength responsive to the selected one of the first and second power contexts.
14. The physical memory interface of claim 10, further comprising a plurality of receivers each having a selectable termination impedance responsive to the selected one of the first and second power contexts.
15. The physical memory interface of claim 10, wherein the first power context is determined via a first memory training phase and the second power context is determined via a second memory training phase.
16. The physical memory interface of claim 10, further comprising interfaces for multiple channels of physical memory devices.
17. The physical memory interface of claim 10, wherein the physical memory interface is located on a die of a central processing unit (CPU).
18. The physical memory interface of claim 10, further comprising a memory interface configured to retrieve at least one of the first and second power contexts upon resuming from an Advanced Configuration and Power Interface (ACPI) S3 power state.
19. The physical memory interface of claim 10, further comprising a memory interface configured to load the first and second power contexts from a memory location.
20. The physical memory interface of claim 10, further comprising a memory device coupled to the physical memory interface, the memory device being configured for reading and writing data using the timing delays associated with the selected one of the first and second power contexts.
Type: Application
Filed: Oct 22, 2010
Publication Date: Mar 15, 2012
Patent Grant number: 8356155
Applicant: ADVANCED MICRO DEVICES, INC. (Sunnyvale, CA)
Inventors: Shawn Searles (Austin, TX), Nicholas T. Humphries (Austin, TX), Brian W. Amick (Bedford, MA), Richard W. Reeves (Westborough, MA), Hanwoo Cho (Acton, MA), Ronald L. Pettyjohn (Concord, MA)
Application Number: 12/910,412
International Classification: G06F 12/00 (20060101); G11C 5/14 (20060101); G11C 7/22 (20060101);