Circuitry and method for accessing an associative cache with parallel determination of data and data availability
A circuit for accessing an associative cache is provided. The circuit includes data selection circuitry and an outcome parallel processing circuit both in communication with the associative cache. The outcome parallel processing circuit is configured to determine whether an accessing of data from the associative cache is one of a cache hit, a cache miss, or a cache mispredict. The circuit further includes a memory in communication with the data selection circuitry and the outcome parallel processing circuit. The memory is configured to store a bank select table, whereby the bank select table is configured to include entries that define a selection of one of a plurality of banks of the associative cache from which to output data. Methods for accessing the associative cache are also described.
A cache is a collection of data duplicating original values stored elsewhere, where accessing the original data takes longer than accessing the cache.
Another type of cache is a multiple-bank associative cache. The difference between direct-mapped cache and multiple-bank associative cache is that instead of mapping to a single bank, a virtual address associated with the associative cache maps to several banks. For example,
A multiple-bank associative cache generally performs better (e.g., achieves better cache hit ratios) than a direct-mapped cache. On the other hand, a multiple-bank associative cache takes longer to access data than a direct-mapped cache because the associative cache has the added burden of comparing the addresses to determine whether there is a match. As a result, there is a need to provide methods and circuitries for accessing data from a cache that has the fast timing characteristic of a direct-mapped cache while retaining the performance advantages of an associative cache.
SUMMARY
Broadly speaking, the present invention fills these needs by providing circuitries and methods for accessing an associative cache. It should be appreciated that the present invention can be implemented in numerous ways, including as a method, a system, or a device. Several inventive embodiments of the present invention are described below.
In accordance with a first aspect of the present invention, a method for accessing an associative cache is provided. In this method, a request is received for data in the associative cache whereby the request includes an address of the data. The data is accessed out at the address of the associative cache. At the same time, an entry is read from a bank select table based on the address of the data. The entry defines a selection of one of a plurality of banks of the associative cache to output the data. A determination is made whether the accessing out the data is one of a cache hit, a cache miss, or a cache mispredict. It should be appreciated that the method operations of accessing out the data, reading an entry from a bank select table, and making the determination are processed in parallel.
In accordance with a second aspect of the present invention, a circuit for accessing an associative cache is provided. The circuit includes data selection circuitry and an outcome parallel processing circuit both in communication with the associative cache. The outcome parallel processing circuit is configured to determine whether an accessing of data from the associative cache is one of a cache hit, a cache miss, or a cache mispredict. The circuit further includes a memory in communication with the data selection circuitry and the outcome parallel processing circuit. The memory is configured to store a bank select table, whereby the bank select table is configured to include entries that define a selection of one of a plurality of banks of the associative cache from which to output data.
In accordance with a third aspect of the present invention, a method for accessing an associative cache is provided. In this method, a least recently used replacement table is provided. The least recently used replacement table is configured to include entries that define one of a plurality of banks of the associative cache to be replaced on a cache miss. On a cache access, a slice of the least recently used replacement table is replaced with entries that define a selection of one of the plurality of banks that is less recently used.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.
An invention is described for methods and circuitries for accessing an associative cache. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The embodiments described herein provide methods and circuitries for accessing a multiple-bank associative cache. In one embodiment, at the same time that data is being accessed in the associative cache, an entry from a bank select table is read. As will be explained in more detail below, the read entry defines a selection of a bank of the associative cache to output the data. An outcome parallel processing circuit in communication with the associative cache determines, in parallel with the accessing of data, whether the selection read from the bank select table results in a cache hit, a cache miss, or a cache mispredict.
Outcome parallel processing circuit 220 is configured to determine whether an accessing of data from associative cache 202 is a cache hit, a cache miss, or a cache mispredict. Outcome parallel processing circuit 220 includes comparators in communication with associative cache 202 and also includes selection circuitries (e.g., multiplexors). As shown in
Data selection circuitry 210 outputs the data being accessed from either Bank 0 or Bank 1 of associative cache 202. Specifically, data selection circuitry 210 selects one of the two banks for output based on a selection input from bank select table 208. First and second selection circuitries 214, 212, as well as data selection circuitry 210, have selection inputs that are read from bank select table 208. Here, bank select table 208 includes entries that define a selection of a particular bank, such as Bank 0 or Bank 1, to output data. For example, since associative cache 202 as shown in
First and second selection circuitries 214, 212 of outcome parallel processing circuit 220 are also driven by the selection input from bank select table 208. First and second selection circuitries 214, 212 are configured to determine whether the data access is a cache hit, a cache miss, or a cache mispredict simultaneously (i.e., in parallel) with the accessing out of the data from data selection circuitry 210. As will be explained in more detail below, first and second comparators 206, 207 compare the address of the accessed data with addresses of entries in each of the banks to determine whether the data is stored in Bank 0, Bank 1, or not stored in associative cache 202. First and second selection circuitries 214, 212 then take the comparison results from first and second comparators 206, 207 and selection input from bank select table 208 to determine whether the outputted data from data selection circuitry 210 is a cache hit, a cache miss, or a cache mispredict.
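By way of illustration only, the two-bank outcome determination described above can be modeled in software. The following is a minimal sketch, not the claimed circuit: the function names `mux2`, `outcome_signals`, and `classify` are hypothetical, and the comparator outputs are assumed to be the 1/0 values described for comparators 206, 207.

```python
def mux2(select, in0, in1):
    """Two-input multiplexor: passes in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def outcome_signals(cmp0, cmp1, bank_select):
    """Model of selection circuitries 212 and 214 for a two-bank cache.

    cmp0, cmp1  -- outputs of comparators 206 and 207 (1 on address match, else 0)
    bank_select -- value read from the bank select table (0 or 1)
    """
    # Second selection circuitry 212: comparison result for the predicted bank.
    hit = mux2(bank_select, cmp0, cmp1)
    # First selection circuitry 214: driven by the inverted selection input,
    # so it passes the comparison result for the other bank.
    in_other_bank = mux2(1 - bank_select, cmp0, cmp1)
    return hit, in_other_bank


def classify(cmp0, cmp1, bank_select):
    """Combine both signals into one of the three outcomes."""
    hit, in_other_bank = outcome_signals(cmp0, cmp1, bank_select)
    if hit:
        return "hit"
    return "mispredict" if in_other_bank else "miss"
```

Both signals depend only on the comparator outputs and the table entry, so in hardware they can be produced at the same time that the data multiplexor selects its output.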
As will be explained in more detail below, if the accessing out of the data is a cache mispredict, the corresponding entry of the bank select table used to select a bank is replaced with another entry that defines a selection of the bank that actually contains the data. If the accessing out of the data is a cache miss, the corresponding entry of the bank select table is replaced with a randomly generated entry, in accordance with one embodiment of the present invention. The cache itself is then filled into this bank. In other words, the corresponding entry is replaced with a selection of a bank that is randomly generated, and this selection value selects which bank of cache is filled. In another embodiment, if the accessing out of the data is a cache miss, the corresponding entry of the bank select table is replaced with another entry read from a least recently used (LRU) replacement table.
A. Cache Hit
When associative cache 202 is accessed, data from entries 222, 224 of the associative cache are outputted to data selection circuitry 210. Data selection circuitry 210 selects either data from entry 222 of associative cache 202 or data from entry 224 of the associative cache for output depending on a selection input read from bank select table 208. As shown in
At the same time data selection circuitry 210 outputs the data, outcome parallel processing circuit 220 makes a determination on whether the data access is a cache hit, a cache miss, or a cache mispredict. In particular, first comparator 206, which is in communication with Bank 0, compares the virtual address of the accessed data with the addresses from Bank 0 to determine whether the data is stored in Bank 0. Similarly, second comparator 207, which is in communication with Bank 1, compares the virtual address of the accessed data with the addresses from Bank 1 to determine whether the data is stored in Bank 1. The outputs of first and second comparators 206, 207 (e.g., outputs a 1 value if inputs are equal and a 0 value otherwise) are inputted into first and second selection circuitries 214, 212. As discussed above, second selection circuitry 212 is configured to determine whether the data access is a cache hit. Specifically, second selection circuitry 212 selects either the comparator result from first comparator 206 or the comparator result from second comparator 207 for output depending on the selection input read from bank select table 208. In this example, with selection input of 0 read from entry 216 of bank select table 208, second selection circuitry 212 selects comparison result from first comparator 206 for output, which identifies the data access from Bank 0 as a cache hit (as circled in
B. Cache Mispredict
At the same time data selection circuitry 210 outputs the data, outcome parallel processing circuit 220 makes a determination on whether the data access is a cache hit, a cache miss, or a cache mispredict. As discussed above, first and second comparators 206, 207 compare the virtual address of the accessed data with the addresses of their corresponding banks to determine whether the data is stored in Bank 0 or Bank 1. Second selection circuitry 212 is configured to determine whether the data access is a cache hit. In this example, with selection input of 0 read from entry 216 of bank select table 208, second selection circuitry 212 selects the comparison result from first comparator 206 for output, which identifies the data access from Bank 0 as not a cache hit. If the data access is not a cache hit, then the data access is either a cache miss or a cache mispredict (as circled below second selection circuitry 212 in
First selection circuitry 214 is configured to resolve whether the data access is a cache miss or a cache mispredict. Since first and second comparators 206, 207 are in parallel communication with first and second selection circuitries 214, 212, both of the first and second selection circuitries simultaneously receive the comparison results from the first and second comparators. Similar to second selection circuitry 212 and data selection circuitry 210, first selection circuitry 214 selects a comparison result from either first comparator 206 or second comparator 207 for output depending on a selection input read from bank select table 208. However, in this two-bank embodiment, an inverse of the selection input read from bank select table 208 is inputted into first selection circuitry 214. In one embodiment, inverter 502 may be used to invert the value read from bank select table 208. The effect of inverting the selection input is to configure first selection circuitry 214 to select the other comparison result for output. For instance, inverter 502 inverts the selection input of 0 read from entry 216 of bank select table 208 to a value of 1. Accordingly, instead of selecting the comparison result from first comparator 206 for output, first selection circuitry 214 selects the comparison result from second comparator 207 that is associated with Bank 1 for output. In this example, output from first selection circuitry 214 identifies that the requested data is in Bank 1 and not Bank 0. Accordingly, the data access of
Upon a determination that the data access is a cache mispredict, entry 216 of bank select table 208 is replaced with another entry that defines a selection of the bank that contains the data. Thus, entry 216 of bank select table 208 with a value of 0 is replaced with a value of 1. The replacement of entry 216 improves future cache hit rates by redirecting all future data accesses at the same address to Bank 1, where the data is actually stored, instead of Bank 0.
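The effect of correcting entry 216 can be seen by replaying the same request twice. The sketch below is an assumption-laden software model: the addresses, the data strings, the table index of 0, and the `access` function are all made up for illustration and do not come from the specification.

```python
# Hypothetical contents of the two entries that one address maps to,
# each stored as (stored_address, data).
bank0_entry = (0x1F40, "stale")
bank1_entry = (0x2A80, "wanted")

# Illustrative bank select table with a single entry that predicts Bank 0.
bank_select_table = {0: 0}


def access(request_address, index):
    """Return (outcome, data_out) and correct the table entry on a mispredict."""
    select = bank_select_table[index]
    entries = (bank0_entry, bank1_entry)
    # Data selection circuitry 210: output the predicted bank's data.
    data_out = entries[select][1]
    # Comparators 206, 207: one match bit per bank.
    cmp = [int(stored == request_address) for stored, _ in entries]
    hit = cmp[select]                # second selection circuitry 212
    in_other_bank = cmp[1 - select]  # first selection circuitry 214 via inverter 502
    if hit:
        return "hit", data_out
    if in_other_bank:
        # Mispredict: point the entry at the bank that actually holds the data.
        bank_select_table[index] = 1 - select
        return "mispredict", data_out
    return "miss", data_out


first = access(0x2A80, 0)   # predicted Bank 0, data is in Bank 1
second = access(0x2A80, 0)  # entry now selects Bank 1
```

In this model the first access outputs the wrong bank's data and is flagged as a mispredict, while the second access to the same address reads the corrected entry and hits.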
C. Cache Miss
In this example, the virtual address points to two entries 222, 224 of associative cache 202. However, the desired data is not stored in associative cache 202. As described above, when associative cache 202 is accessed, data from entries 222, 224 of the associative cache are outputted to data selection circuitry 210. As shown in
At the same time data selection circuitry 210 outputs the data, outcome parallel processing circuit 220 makes a determination on whether the data access is a cache hit, a cache miss, or a cache mispredict. As discussed above, first and second comparators 206, 207 compare the virtual address of the accessed data with the addresses in the banks to determine whether the data is stored in Bank 0 or Bank 1. In this example, with selection input of 0 read from entry 216 of bank select table 208, second selection circuitry 212 selects comparison result from first comparator 206 for output, which identifies the data access from Bank 0 as not a cache hit. Since the data access is not a cache hit, then the data access is either a cache miss or a cache mispredict (as circled below second selection circuitry 212 in
First selection circuitry 214 is configured to resolve whether the data access is a cache miss or a cache mispredict. First selection circuitry 214 selects a comparison result from either first comparator 206 or second comparator 207 for output depending on an inverse of a selection input read from bank select table 208. In this example, inverter 502 inverts the selection input of 0 read from entry 216 of bank select table 208 to a value of 1. Accordingly, first selection circuitry 214 selects the comparison result from second comparator 207 that is associated with Bank 1 for output. Since the requested data is not stored in entry 224 of Bank 1, the output from first selection circuitry 214 identifies that the requested data is not in Bank 1 (i.e., not a cache mispredict). Thus, the data access of
In one embodiment, when a cache miss occurs, entry 216 of bank select table 208 is replaced with a randomly generated entry. In other words, entry 216 of bank select table 208 is replaced with a randomly generated bank selection. Data is then fetched from the main memory and inserted into associative cache 202 at the bank specified by the randomly generated entry (e.g., Bank 0 or Bank 1), ready for a next access.
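The random-replacement embodiment can be sketched as follows. This is only an illustrative model: the `handle_miss` function, the list-of-lists layout for the banks, the dictionary standing in for main memory, and the use of one index for both the cache entry and the bank select table are simplifying assumptions, not details taken from the specification.

```python
import random


def handle_miss(address, index, bank_select_table, banks, main_memory, rng):
    """Fill the cache after a miss using a randomly generated bank selection.

    banks        -- list of banks, each a list of (stored_address, data) or None
    main_memory  -- mapping from address to data (stand-in for main memory)
    rng          -- random.Random instance supplying the random selection
    """
    # Randomly generated entry: a bank number in [0, number of banks).
    fill_bank = rng.randrange(len(banks))
    # The bank select table entry is replaced with the random selection...
    bank_select_table[index] = fill_bank
    # ...and the fetched data is inserted into that same bank.
    banks[fill_bank][index] = (address, main_memory[address])
    return fill_bank


# Usage with made-up values.
rng = random.Random(1)
banks = [[None] * 4, [None] * 4]
table = [0] * 4
memory = {0x30: "payload"}
chosen = handle_miss(0x30, 2, table, banks, memory, rng)
```

Because the table entry and the fill location are written with the same value, the next access to that address is predicted to the bank that now holds the data.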
In another embodiment, as shown in
As shown in operation 704, after the slice of the LRU replacement table is identified, the slice of the LRU replacement table is replaced with entries that define a selection of a bank of the associative cache that is less recently used. For example, with a two-bank associative cache, if an entry in Bank 0 is less recently used, then a slice of the LRU replacement table that maps to the entry is replaced with Bank 0 identifiers (e.g., a value of 0). On the other hand, if an entry in Bank 1 is less recently used, then a slice of the LRU replacement table that maps to the entry in Bank 1 is replaced with Bank 1 identifiers (e.g., a value of 1).
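A software model of the slice replacement may help make this concrete. The sketch below follows the row and column convention recited in the claims (a row is indexed by a Bank 0 address and filled with Bank 1 identifiers when Bank 0 is most recently used; a column is indexed by a Bank 1 address and filled with Bank 0 identifiers when Bank 1 is most recently used). The table size, the function names, and the way `victim` looks up an entry by a (row, column) pair are assumptions made for illustration.

```python
SIZE = 4  # illustrative table dimension, not taken from the specification


def make_lru_table(size=SIZE):
    """Square table of bank identifiers, initialised to Bank 0."""
    return [[0] * size for _ in range(size)]


def touch(lru, used_bank, slice_index):
    """Replace one slice of the table on a cache access."""
    if used_bank == 0:
        # Bank 0 was just used, so Bank 1 is less recently used:
        # fill the entire row with Bank 1 identifiers.
        lru[slice_index] = [1] * len(lru)
    else:
        # Bank 1 was just used, so Bank 0 is less recently used:
        # fill the entire column with Bank 0 identifiers.
        for row in lru:
            row[slice_index] = 0


def victim(lru, row_index, col_index):
    """Assumed lookup on a miss: the bank identifier stored at (row, column)."""
    return lru[row_index][col_index]


lru = make_lru_table()
touch(lru, used_bank=0, slice_index=2)  # access to Bank 0, row 2
touch(lru, used_bank=1, slice_index=3)  # later access to Bank 1, column 3
```

Under these assumptions, the entry where row 2 and column 3 intersect ends up holding 0, since the Bank 1 access was the more recent of the two and Bank 0 is therefore the less recently used bank for that pair.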
Every time a cache access occurs, a slice of LRU replacement table 602 is replaced with entries that define a selection of a bank of associative cache 202 that is less recently used. In the example of
In another example,
It should be appreciated that the above-described method operations and circuitries can be expanded and applied to an associative cache with more than two banks. The outcome parallel processing circuit can have any suitable selection circuitries in any suitable combinations configured to identify a cache hit, a cache miss, and/or a cache mispredict for any bank of the associative cache. The bank select table and LRU replacement table may be scaled based on the number of banks. For example, if the associative cache has three banks with sixteen entries each, then an embodiment of the invention would have sixteen bank select tables and LRU replacement tables with each of the tables being 16×16 in size. Each entry of the bank select table and LRU replacement table can store either a 0 value, a 1 value, or a 2 value, which identifies Bank 0, Bank 1, or Bank 2, respectively.
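One possible way to generalize the outcome determination to any number of banks is sketched below. The function `classify_n_bank` is hypothetical and models only the logical result; the specification leaves the actual arrangement of selection circuitries open.

```python
def classify_n_bank(request_address, stored_addresses, bank_select):
    """Classify an access for a cache with len(stored_addresses) banks.

    stored_addresses -- the address held by each bank's selected entry
    bank_select      -- bank number read from the bank select table
    Returns (outcome, bank), where bank is the bank holding the data, or None.
    """
    # One comparator per bank.
    matches = [stored == request_address for stored in stored_addresses]
    if matches[bank_select]:
        return "hit", bank_select
    # Not in the predicted bank: a match in any other bank is a mispredict.
    for bank, matched in enumerate(matches):
        if matched:
            return "mispredict", bank
    return "miss", None
```

With three banks, the bank select table entry takes the values 0, 1, or 2, and on a mispredict the returned bank number is the value that would replace the entry.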
In summary, the above-described invention provides methods and circuitries for accessing a multiple-bank associative cache. By making a determination of whether the data access is a cache hit, a cache miss, or a cache mispredict in parallel with data access, embodiments of the invention eliminate the additional time requirement of a conventional associative cache that sequentially compares addresses and then accesses data. Accordingly, the timing of the above-described embodiments is as good as a direct-mapped cache with one bank. Additionally, the performance of the above-described embodiments is better than a conventional two-bank associative cache. The additional implementation of the LRU replacement table can further improve the performance of the above-described invention by 5-10%.
With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.
Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The above described invention may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers and the like. Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
Claims
1. A method for accessing an associative cache, comprising method operations of:
- a. receiving a request for data in the associative cache, the request including an address of the data;
- b. accessing out the data at the address of the associative cache;
- c. reading an entry from a bank select table based on the address of the data, the entry defining a selection of one of a plurality of banks of the associative cache to output the data; and
- d. determining whether the accessing out the data is one of a cache hit, a cache miss, or a cache mispredict,
- wherein the method operations b, c, and d are processed in parallel.
2. The method of claim 1, further comprising:
- leaving the entry of the bank select table unchanged if the accessing out the data is the cache hit;
- replacing the entry of the bank select table if the accessing out the data is the cache miss;
- replacing the entry of the bank select table with another entry that defines a selection of one of the plurality of banks that contains the data if the accessing out the data is the cache mispredict.
3. The method of claim 2, wherein the replacing the entry of the bank select table for the cache miss comprises:
- replacing the entry of the bank select table with a randomly generated entry.
4. The method of claim 2, wherein the replacing the entry of the bank select table for the cache miss comprises:
- reading a corresponding entry from a least recently used replacement table, and
- replacing the entry of the bank select table with the corresponding entry from the least recently used replacement table.
5. The method of claim 1, wherein the method operation of reading the entry from the bank select table based on the address of the data includes,
- calculating hashes from the address of the data; and
- reading the entry from the bank select table using the hashes as an index into the bank select table.
6. The method of claim 1, wherein the method operation of determining whether the accessing out the data includes,
- comparing the address of the data with each of the plurality of banks of the associative cache, the comparison defining comparison results; and
- determining whether the accessing out the data is one of the cache hit, the cache mispredict, or the cache miss based on the comparison results and the entry from the bank select table.
7. A circuit for accessing an associative cache, comprising:
- data selection circuitry in communication with the associative cache;
- an outcome parallel processing circuit in communication with the associative cache, the outcome parallel processing circuit being configured to determine in parallel whether an accessing of data from the associative cache is one of a cache hit, a cache miss, or a cache mispredict, wherein the data selection circuitry and the outcome parallel processing circuit operate in parallel; and
- a first memory in communication with the data selection circuitry and the outcome parallel processing circuit, the first memory being configured to store a bank select table, the bank select table being configured to include entries that define a selection of one of a plurality of banks of the associative cache from which to output data.
8. The circuit of claim 7, further comprising:
- a second memory in communication with the first memory, the second memory being configured to store a least recently used replacement table, the least recently used replacement table being configured to include entries that define one of the plurality of banks of the associative cache to be replaced when the cache miss occurs.
9. The circuit of claim 7, wherein the outcome parallel processing circuit includes,
- comparators in communication with the associative cache; and
- selection circuitries in communication with the comparators, the selection circuitries having selection inputs being read from the bank select table.
10. The circuit of claim 9, wherein at least one of the selection inputs is inversed.
11. The circuit of claim 9, wherein at least one of the selection circuitries is configured to identify the cache hit.
12. The circuit of claim 9, wherein at least one of the selection circuitries is configured to identify one of the cache miss or the cache mispredict.
13. The circuit of claim 7, wherein the data selection circuitry is a multiplexor.
14. A method for accessing an associative cache, comprising method operations of:
- a. receiving a request for data in the associative cache, the request including an address of the data;
- b. accessing out the data at the address of the associative cache;
- c. reading an entry from a bank select table based on the address of the data, the entry defining a first selection of one of a plurality of banks of the associative cache to output the data;
- d. determining whether the accessing out the data is one of a cache hit, a cache miss, or a cache mispredict;
- e. providing a least recently used replacement table, the least recently used replacement table being configured to include entries that define one of the plurality of banks of the associative cache to be replaced on a cache miss; and
- f. replacing a slice of the least recently used replacement table on a cache access with entries that define a second selection of one of the plurality of banks that is less recently used, wherein the slice is defined as one of an entire column or an entire row of the least recently used replacement table if the associative cache is defined by two banks,
- wherein the method operations b, c, and d are processed in parallel.
15. The method of claim 14, further comprising:
- calculating hashes based on the address of data being accessed, the hashes defining an index into the least recently used replacement table.
16. The method of claim 14, further comprising:
- if the cache access is the cache miss,
- reading one of the entries from the least recently used replacement table, and
- replacing a corresponding entry of the bank select table with the one of the entries from the least recently used replacement table.
17. The method of claim 16, wherein the method operation of replacing the corresponding entry of the bank select table increases a cache hit rate.
18. The method of claim 14, wherein the method operation of replacing the slice of the least recently used replacement table is configured to fill values in the slice with a constant.
19. The method of claim 14, wherein the method operation of replacing the slice of the least recently used replacement table includes,
- providing the associative cache with a first bank and a second bank; and
- replacing an entire column of the least recently used replacement table with first bank identifiers if the second bank is most recently used, the least recently used replacement table having a horizontal index based on addresses of the second bank.
20. The method of claim 14, wherein the method operation of replacing the slice of the least recently used replacement table includes,
- providing the associative cache with a first bank and a second bank; and
- replacing an entire row of the least recently used replacement table with second bank identifiers if the first bank is most recently used, the least recently used replacement table having a vertical index based on addresses of the first bank.
6356990 | March 12, 2002 | Aoki et al. |
6678792 | January 13, 2004 | van de Waerdt |
6868471 | March 15, 2005 | Kota |
- Handy, Jim, “The Cache Memory Book,” 1998, Academic Press Limited, 2nd Edition, ISBN: 0-12-322980-4, pp. 16 & 62.
- Hennessy, John L., “Computer Organization and Design,” 1998, Morgan Kaufmann Publishers, Inc., Second Edition, ISBN: 1-55860-428-6, pp. 231 & B-46.
Type: Grant
Filed: Jun 16, 2005
Date of Patent: Dec 2, 2008
Assignee: Sun Microsystems, Inc. (Santa Clara, CA)
Inventors: Paul Caprioli (Mountain View, CA), Sherman H. Yip (San Francisco, CA), Shailender Chaudhry (San Francisco, CA)
Primary Examiner: Kevin Ellis
Assistant Examiner: Shawn Eland
Attorney: Martine Penilla & Gencarella, LLP
Application Number: 11/155,147