CUTTING CAM PEAK POWER BY CLOCK REGIONING
A CAM device architecture where CAM cells are divided into at least two arrays and each array is operated in a different clock domain so that at no time are the arrays simultaneously drawing maximum power. By dividing the CAM array into a plurality of arrays and staggering the search operation so that every array does not simultaneously draw maximum power, the peak power consumption of the CAM device is reduced.
This application is a continuation of application Ser. No. 10/655,215, filed Sep. 5, 2003, the subject matter of which is incorporated by reference herein.
FIELD OF INVENTION
The present invention relates generally to semiconductor memory devices and, more particularly, to peak power reduction in content addressable memory (CAM) devices.
BACKGROUND OF THE INVENTION
An essential semiconductor device is semiconductor memory, such as a random access memory (RAM) device. A RAM allows a memory circuit to execute both read and write operations on its memory cells. Typical examples of RAM devices include dynamic random access memory (DRAM) and static random access memory (SRAM).
Another form of memory is the content addressable memory (CAM) device. A CAM is a memory device that accelerates any application requiring fast searches of a database, list, or pattern, such as in database machines, image or voice recognition, or computer and communication networks. CAMs provide benefits over other memory search algorithms by simultaneously comparing the desired information (i.e., data in the comparand register) against the entire list of pre-stored entries. As a result of their unique searching algorithm, CAM devices are frequently employed in network equipment, particularly routers and switches, computer systems and other devices that require rapid content searching.
In order to perform a memory search in the above-identified manner, CAMs are organized differently than other memory devices (e.g., DRAM). For example, data is stored in a RAM in a particular location, called an address. During a memory access, the user supplies an address and writes into or reads the data at the specified address.
In a CAM, however, data is stored in locations in a somewhat random fashion. The locations can be selected by an address bus, or the data can be written into the first empty memory location. Every memory location includes one or more status bits which maintain state information regarding the memory location. For example, each memory location may include a valid bit whose state indicates whether the memory location stores valid information, or whether the memory location does not contain valid information (and is therefore available for writing).
Once information is stored in a memory location, it is found by comparing every bit in a memory location with corresponding bits in a comparand register. When the content stored in the CAM memory location does not match the data in the comparand register, a local match detection circuit returns a no match indication. When the content stored in the CAM memory location matches the data in the comparand register, the local match detection circuit returns a match indication. If one or more local match detect circuits return a match indication, the CAM device returns a “match” indication. Otherwise, the CAM device returns a “no-match” indication. In addition, the CAM may return the identification of the address location in which desired data is stored or identification of one of such addresses if more than one address contained matching data. Thus, with a CAM, the user supplies the data and gets back an address if there is a match found in memory.
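The storage and search behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name SimpleCam, its method names, and the rule that the lowest matching address is reported are hypothetical and are not taken from the described device. A hardware CAM performs all comparisons in parallel, whereas the model loops over the locations.

```python
from typing import List, Optional, Tuple


class SimpleCam:
    """Toy software model of a CAM; names and sizes are illustrative only."""

    def __init__(self, num_locations: int) -> None:
        self.data: List[int] = [0] * num_locations
        # One status (valid) bit per memory location.
        self.valid: List[bool] = [False] * num_locations

    def write_first_empty(self, value: int) -> Optional[int]:
        """Store value in the first location whose valid bit is clear."""
        for address, in_use in enumerate(self.valid):
            if not in_use:
                self.data[address] = value
                self.valid[address] = True
                return address
        return None  # no location is available for writing

    def search(self, comparand: int) -> Tuple[bool, Optional[int]]:
        """Compare the comparand against every valid location."""
        # Each entry models one local match detection circuit.
        local_matches = [
            self.valid[a] and self.data[a] == comparand
            for a in range(len(self.data))
        ]
        if any(local_matches):
            # Assumption: the lowest matching address is the one reported.
            return True, local_matches.index(True)
        return False, None


cam = SimpleCam(num_locations=4)
cam.write_first_empty(0b1010)
cam.write_first_empty(0b0111)
cam.write_first_empty(0b1010)
print(cam.search(0b1010))  # (True, 0)
print(cam.search(0b0001))  # (False, None)
```

In the model, as in the description, the user supplies data and receives an address when a match exists.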
The first DRAM cell 110a includes transistor Q1 and capacitor CA, which combine to form a storage node A that receives a data value from a first bit line BL1 at node U during write operations, and applies the stored data value to the gate terminal of transistor Q2 of comparator circuit 120. Transistor Q2 is connected in series with transistor Q3 between a match line M and a ground potential. Transistor Q3 is controlled by a data signal transmitted on data line D1#. The second DRAM cell 110b includes transistor Q3 and capacitor CB, which combine to form a storage node B that receives a data value from a second bit line BL2 at node V, and applies the stored data value to the gate terminal of transistor Q4 of comparator circuit 120. Transistor Q4 is connected in series with transistor Q5 between the match line M and the ground potential. It should be noted that in some embodiments transistors Q2 and Q4 are coupled to a discharge line instead of being directly coupled to ground. Transistor Q5 is controlled by a data signal transmitted on data line D1, between the match line and the ground potential.
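The two series pull-down paths of the comparator circuit described above can be expressed as a Boolean sketch. It assumes, hypothetically, that each transistor conducts when its gate is at a logic high and that the match line is precharged high before the comparison; neither assumption is stated in the passage above, and the function and parameter names are illustrative.

```python
def match_line_discharged(node_a: bool, node_b: bool, d1: bool, d1_bar: bool) -> bool:
    """Return True if either series path pulls the match line to ground.

    node_a, node_b: values stored on storage nodes A and B.
    d1, d1_bar: values driven on data lines D1 and D1#.
    Assumption: a transistor conducts when its gate input is True.
    """
    # Path gated by storage node A in series with the D1# transistor.
    path_a = node_a and d1_bar
    # Path gated by storage node B in series with the D1 transistor.
    path_b = node_b and d1
    return path_a or path_b


# Cell storing A=1, B=0, searched with D1=1, D1#=0: no path conducts.
print(match_line_discharged(True, False, d1=True, d1_bar=False))   # False
# Same cell searched with D1=0, D1#=1: the path through node A conducts.
print(match_line_discharged(True, False, d1=False, d1_bar=True))   # True
```

Under these assumptions, a cell with both storage nodes low never discharges the match line, regardless of the search data.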
Now referring back to
The above described match operation illustrates what happens in a single CAM cell 100. In the device 200, however, the match operation is performed simultaneously on all CAM cells 100. This permits search operations to be performed much faster on a CAM device than a conventional memory device, such as a DRAM. However, CAM devices 200 consume significantly more power and produce significantly more switching noise than a conventional memory device, especially during a first portion of the search operation because the CAM cells 100 are accessed and searched simultaneously. This results in the CAM device 200 having a peak power consumption which may be significantly higher than the average power consumption during a portion of each match operation. The high peak power consumption requires the CAM device 200 to be used with a robust power supply, and also increases heat production. Both of these effects are undesirable and should be minimized. Accordingly, there is a need for a CAM device architecture that has a lesser degree of peak power consumption.
SUMMARY OF THE INVENTION
The invention provides a CAM device architecture where the CAM cells are divided into at least two arrays. Each array is operated in a different clock domain so that each array is prevented from drawing maximum power at the same time. By dividing the CAM array into a plurality of arrays and staggering the search operation so that every array does not simultaneously draw maximum power, the peak power consumption of the CAM device is reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other advantages and features of the invention will become more apparent from the detailed description of exemplary embodiments of the invention given below with reference to the accompanying drawings, in which:
Now referring to the drawings, where like reference numerals designate like elements, there is shown in
In
Referring to both
In clock cycle 1, the search command and the search data arrive at the control circuit 250′. No activity is associated with clock cycle 1′. In clock cycle 2, the control circuit 250′ decodes the search command. No activity is associated with clock cycle 2′. In clock cycle 3, the search data is loaded from the control circuit 250′ to the left side comparand register 220a. In clock cycle 3′, the search data is loaded from the control circuit 250′ to the right side comparand register 220b. In clock cycle 4, the left side array 210a executes a search.
In clock cycle 4′, the right side array 210b executes a search. Thus, in the present embodiment, there is only a narrow overlap where both the right and left side arrays 210a, 210b are simultaneously in search mode. More specifically, in the present embodiment at no time are both arrays simultaneously drawing maximum power by being in the first portion of the search operation. Thus, peak power consumption in the device 300 is reduced by avoiding a state where every CAM cell 100 is simultaneously drawing maximum power.
In clock cycle 5, the left side array 210a outputs its search hits (i.e., matches), if any, to priority encoder 240a. In clock cycle 5′, the right side array 210b outputs its search hits, if any, to priority encoder 240b.
In clock cycle 6, the priority encoder 240a outputs its result to priority encoder 240c. In clock cycle 6′, the priority encoder 240b outputs its result to priority encoder 240c. No task is associated with clock cycle 7. In clock cycle 7′, the priority encoder 240c evaluates the input it received from priority encoders 240a, 240b. No task is associated with clock cycle 8. In clock cycle 8′, the priority encoder 240c outputs its result to the control circuit 250′. In clock cycle 9, the control circuit 250′ outputs the search result (off-chip). No activity is associated with clock cycle 9′.
The first embodiment of the invention therefore operates the device 300 over two clock domains. In one exemplary embodiment, the two clock domains are separated by a half clock cycle, and each clock signal is respectively used to control a similar sequence of operations with respect to the two CAM arrays 210a, 210b. In this manner, the search operation, which in a conventional CAM device would have every CAM cell draw maximum power at the same time, is converted into an overlapping operation where only half the CAM cells in the device are drawing maximum power at any given time. As a result, peak power consumption is reduced.
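The effect of the half cycle offset on peak power can be illustrated with a simple numerical model. This is a sketch under stated assumptions, not a characterization of any real device: time is divided into half clock cycles, each array search is assumed to span two half cycles, and the relative power values in SEARCH_PROFILE are hypothetical numbers chosen only so that the first portion of a search draws more power than the remainder.

```python
from typing import List


def total_power_profile(start_half_cycles: List[int],
                        search_profile: List[float]) -> List[float]:
    """Sum the per-array power draw over time, measured in half clock cycles.

    start_half_cycles: half cycle at which each array begins its search.
    search_profile: power drawn by one array in each half cycle of a search.
    """
    length = max(start_half_cycles) + len(search_profile)
    total = [0.0] * length
    for start in start_half_cycles:
        for offset, power in enumerate(search_profile):
            total[start + offset] += power
    return total


# Hypothetical relative power: high in the first portion, lower afterwards.
SEARCH_PROFILE = [1.0, 0.25]

# Conventional operation: both arrays start searching at the same time.
simultaneous = total_power_profile([0, 0], SEARCH_PROFILE)
# Two clock domains: the second array starts one half cycle later.
staggered = total_power_profile([0, 1], SEARCH_PROFILE)

print(simultaneous, max(simultaneous))  # [2.0, 0.5] 2.0
print(staggered, max(staggered))        # [1.0, 1.25, 0.25] 1.25
```

In this model the total energy is unchanged; only the peak is lowered, because the high-power first portions of the two searches no longer coincide.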
Now referring to
The second exemplary embodiment behaves nearly identically to the first exemplary embodiment during an initial period of each search. More specifically, the two exemplary embodiments operate nearly identically during clock cycles 1-5 and 1′-5′, since during these clock cycles the same operations are performed (i.e., receipt of search command, command decode, command load, execute search, and output matches). The only difference is that four quadrants are searched in the second embodiment while two arrays are searched in the first embodiment. It should be noted that each pair of quadrants (e.g., 210a1, 210a2) in the second embodiment which corresponds to an array (e.g., 210a) of the first embodiment is operated in the same clock domain as the array of the first embodiment. That is, quadrants 210a1 and 210a2 are operated on a first clock domain while quadrants 210b1 and 210b2 are operated on a second clock domain. Thus, the second embodiment achieves a power reduction over that of a conventional four quadrant CAM device by ensuring that no more than two quadrants operate at peak power simultaneously.
The second embodiment differs more from the first embodiment subsequent to clock cycles 5 and 5′, due to the changes in the number of, and operation of, the priority encoders. As a result, the timing diagram of
In clock cycle 6, priority encoders 240a1 and 240a2 each output their results to priority encoder 240a3. In clock cycle 6′, priority encoders 240b1 and 240b2 output their results to priority encoder 240b3. In clock cycle 7, priority encoder 240a3 outputs its result to priority encoder 240c. In clock cycle 7′, priority encoder 240b3 outputs its result to priority encoder 240c. No task is associated with clock cycle 8. In clock cycle 8′, priority encoder 240c outputs its result to control circuit 250″. In clock cycle 9, the control circuit 250″ outputs the final result of the search process (off-chip). No task is associated with clock cycle 9′.
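The three-level selection performed by the priority encoders of the second embodiment can be sketched in software as follows. The sketch makes assumptions not stated above: a lower index is treated as higher priority, the quadrants are ordered 210a1, 210a2, 210b1, 210b2 in the address space, and the function names are hypothetical. It models only the selection logic, not the clock cycle timing.

```python
from typing import List, Optional, Sequence


def encode_array(match_lines: Sequence[bool]) -> Optional[int]:
    """First-level encoder: index of the first asserted match line, if any."""
    for index, hit in enumerate(match_lines):
        if hit:
            return index
    return None


def select_first(results: Sequence[Optional[int]], block_size: int) -> Optional[int]:
    """Subsequent encoder: pick the first input that reports a match.

    The selected index is offset by the input's position so that it
    identifies a location within the combined address range.
    """
    for position, result in enumerate(results):
        if result is not None:
            return position * block_size + result
    return None


def search_result(quadrants: List[Sequence[bool]], quadrant_size: int) -> Optional[int]:
    """Combine four quadrants in the order 210a1, 210a2, 210b1, 210b2."""
    a1, a2, b1, b2 = (encode_array(q) for q in quadrants)  # 240a1, 240a2, 240b1, 240b2
    left = select_first([a1, a2], quadrant_size)           # models 240a3
    right = select_first([b1, b2], quadrant_size)          # models 240b3
    return select_first([left, right], 2 * quadrant_size)  # models 240c


hits = [
    [False, False, False, False],  # 210a1: no match
    [False, True, False, False],   # 210a2: match at local index 1
    [True, False, False, False],   # 210b1: match at local index 0
    [False, False, False, True],   # 210b2: match at local index 3
]
print(search_result(hits, quadrant_size=4))  # 5
```

Here the match in quadrant 210a2 wins over those in 210b1 and 210b2 because the left half is given priority under the stated assumption.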
The memory controller 502 is also coupled to one or more memory buses 507. Each memory bus 507 accepts memory components 508 which include at least one memory device 300 (or 300′) of the present invention. The memory components 508 may be a memory card or a memory module. Examples of memory modules include single inline memory modules (SIMMs) and dual inline memory modules (DIMMs). The memory components 508 may include one or more additional devices 509. For example, in a SIMM or DIMM, the additional device 509 might be a configuration memory, such as a serial presence detect (SPD) memory. The memory controller 502 may also be coupled to a cache memory 505. The cache memory 505 may be the only cache memory in the processing system. Alternatively, other devices, for example, processors 501 may also include cache memories, which may form a cache hierarchy with cache memory 505. If the processing system 500 includes peripherals or controllers which are bus masters or which support direct memory access (DMA), the memory controller 502 may implement a cache coherency protocol. If the memory controller 502 is coupled to a plurality of memory buses 507, each memory bus 507 may be operated in parallel, or different address ranges may be mapped to different memory buses 507.
The primary bus bridge 503 is coupled to at least one peripheral bus 510. Various devices, such as peripherals or additional bus bridges, may be coupled to the peripheral bus 510. These devices may include a storage controller 511, a miscellaneous I/O device 514, a secondary bus bridge 515 communicating with a secondary bus 516, a multimedia processor 518, and a legacy device interface 520. The primary bus bridge 503 may also be coupled to one or more special purpose high speed ports 522. In a personal computer, for example, the special purpose port might be the Accelerated Graphics Port (AGP), used to couple a high performance video card to the processing system 500.
The storage controller 511 couples one or more storage devices 513, via a storage bus 512, to the peripheral bus 510. For example, the storage controller 511 may be a SCSI controller and storage devices 513 may be SCSI discs. The I/O device 514 may be any sort of peripheral. For example, the I/O device 514 may be a local area network interface, such as an Ethernet card. The secondary bus bridge 515 may be used to interface additional devices via another bus 516 to the processing system. For example, the secondary bus bridge 515 may be a universal serial bus (USB) controller used to couple USB devices 517 to the processing system 500. The multimedia processor 518 may be a sound card, a video capture card, or any other type of media interface, which may also be coupled to additional devices such as speakers 519. The legacy device interface 520 is used to couple at least one legacy device 521, for example, older styled keyboards and mice, to the processing system 500.
The processing system 500 illustrated in
While the invention has been described in detail in connection with the exemplary embodiment, it should be understood that the invention is not limited to the above disclosed embodiment. Rather, the invention can be modified to incorporate any number of variations, alterations, substitutions, or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the invention. For example, while the embodiment illustrated by
Claims
1. A content addressable memory (CAM) device, comprising:
- a plurality of CAM arrays, each having a plurality of CAM cells;
- a plurality of first priority encoders, each one of said first priority encoders being coupled to a respective one of said plurality of CAM arrays;
- at least one subsequent priority encoder coupled to said plurality of first priority encoders, said subsequent priority encoder receiving outputs from said plurality of first priority encoders and selecting one of said outputs; and
- a control circuit coupled to said plurality of CAM arrays, said plurality of first priority encoders, and said at least one subsequent priority encoder,
- wherein said control circuit operates said plurality of CAM arrays so that at least a first one of said plurality of CAM arrays is operated in accordance with a first clock domain and at least a second one of said plurality of CAM arrays is operated in accordance with a second different clock domain.
2. The device of claim 1, wherein said plurality of CAM arrays comprises first and second CAM arrays.
3. The device of claim 1, wherein said plurality of CAM arrays comprises first, second, third, and fourth CAM arrays.
4. The device of claim 3, wherein said first and second CAM arrays are operated in accordance with the first clock domain and said third and fourth CAM arrays are operated in accordance with the second clock domain.
5. The device of claim 4, wherein said at least one subsequent priority encoder comprises:
- a first subsequent priority encoder, coupled to said first and second CAM arrays;
- a second subsequent priority encoder, coupled to said third and fourth CAM arrays; and
- a third subsequent priority encoder coupled to outputs of said first and second subsequent priority encoders.
6. The device of claim 1, wherein said first clock domain and said second clock domain correspond to respective first and second clock signals which are supplied to said device.
7. The device of claim 6, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
8. The device of claim 7, wherein said second clock domain is delayed by one half clock cycle from said first clock domain.
9. The device of claim 1, wherein said first clock domain corresponds to a first clock signal and said second clock domain corresponds to a second clock signal, said first and second clock signals being generated by the control circuit from a master clock signal supplied to said device.
10. The device of claim 9, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
11. The device of claim 10, wherein said second clock domain is delayed by one half clock cycle from said first clock domain.
12. The device of claim 1, further comprising at least one comparand register, wherein each of said at least one comparand register is coupled to the control circuit and wherein said at least one comparand register supplies a match pattern to said plurality of CAM arrays.
13. A processor based system, comprising:
- a processor; and
- a memory subsystem, coupled to said processor, said memory subsystem further comprising at least one content addressable memory (CAM) device, wherein at least one of said at least one CAM device further comprises: a plurality of CAM arrays, each having a plurality of CAM cells; a plurality of first priority encoders, each one of said first priority encoders being coupled to a respective one of said plurality of CAM arrays; at least one subsequent priority encoder coupled to said plurality of first priority encoders, said subsequent priority encoder receiving outputs from said plurality of first priority encoders and selecting one of said outputs; and a control circuit coupled to said plurality of CAM arrays, said plurality of first priority encoders, and said at least one subsequent priority encoder, wherein said control circuit operates said plurality of CAM arrays so that at least a first one of said plurality of CAM arrays is operated in accordance with a first clock domain and at least a second one of said plurality of CAM arrays is operated in accordance with a second different clock domain.
14. The system of claim 13, wherein said plurality of CAM arrays comprises first and second CAM arrays.
15. The system of claim 13, wherein said plurality of CAM arrays comprises first, second, third, and fourth CAM arrays.
16. The system of claim 15, wherein said first and second CAM arrays are operated in accordance with the first clock domain and said third and fourth CAM arrays are operated in accordance with the second clock domain.
17. The system of claim 16, wherein said at least one subsequent priority encoder comprises:
- a first subsequent priority encoder, coupled to said first and second CAM arrays;
- a second subsequent priority encoder, coupled to said third and fourth CAM arrays; and
- a third subsequent priority encoder coupled to outputs of said first and second subsequent priority encoders.
18. The system of claim 13, wherein said first clock domain and said second clock domain correspond to respective first and second clock signals which are supplied to said device.
19. The system of claim 18, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
20. The system of claim 19, wherein said second clock domain is delayed by one half clock cycle from said first clock domain.
21. The system of claim 13, wherein said first clock domain corresponds to a first clock signal and said second clock domain corresponds to a second clock signal, said first and second clock signals being generated by the control circuit from a master clock signal supplied to said device.
22. The system of claim 21, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
23. The system of claim 22, wherein said second clock domain is delayed by one half clock cycle from said first clock domain.
24. The system of claim 13, further comprising at least one comparand register, wherein each of said at least one comparand register is coupled to the control circuit and wherein said at least one comparand register supplies a match pattern to said plurality of CAM arrays.
25. A method for operating a content addressable memory (CAM) device, comprising:
- controlling a search operation of at least a first CAM array in accordance with a first clock signal; and
- controlling a search operation of at least a second CAM array in accordance with a second clock signal,
- wherein said first and second clock signals are different.
26. The method of claim 25, further comprising the steps of:
- receiving said first clock signal as a master clock signal supplied to said device; and
- generating said second clock signal from said first clock signal.
27. The method of claim 26, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
28. The method of claim 27, wherein said second clock signal is said first clock signal delayed by one half cycle.
29. The method of claim 26, further comprising the steps of:
- receiving said first clock signal from an external source; and
- receiving said second clock signal from the external source.
30. The method of claim 29, wherein said second clock domain is offset from said first clock domain by any fractional clock cycle.
31. The method of claim 30, wherein said second clock signal is equal to said first clock signal delayed by one half cycle.
32. The method of claim 25, further comprising the steps of:
- selecting a first intermediate match from said at least a first CAM array;
- selecting a second intermediate match from said at least a second CAM array; and
- selecting one of said first intermediate match and said second intermediate match as an output.
33. A router, comprising:
- a processor;
- a first network interface, coupled to said processor;
- a second network interface, coupled to said processor; and
- a memory subsystem, coupled to said processor, said memory subsystem further comprising at least one content addressable memory (CAM) device, wherein at least one of said at least one CAM device further comprises: a plurality of CAM arrays, each having a plurality of CAM cells; a plurality of first priority encoders, each one of said first priority encoders being coupled to a respective one of said plurality of CAM arrays; at least one subsequent priority encoder coupled to said plurality of first priority encoders, said subsequent priority encoder receiving outputs from said plurality of first priority encoders and selecting one of said outputs; and a control circuit coupled to said plurality of CAM arrays, said plurality of first priority encoders, and said at least one subsequent priority encoder, wherein said control circuit operates said plurality of CAM arrays so that at least a first one of said plurality of CAM arrays is operated in accordance with a first clock domain and at least a second one of said plurality of CAM arrays is operated in accordance with a second different clock domain; and
- wherein said processor searches a routing table stored in said memory subsystem to route packets between said first and second network interfaces.
Type: Application
Filed: Jan 28, 2004
Publication Date: Mar 10, 2005
Inventor: William Radke (San Francisco, CA)
Application Number: 10/765,396