CACHE SYSTEMS AND CIRCUITS FOR SYNCING CACHES OR CACHE SETS
A cache system, having a first cache, a second cache, and a logic circuit coupled to control the first cache and the second cache according to an execution type of a processor. When an execution type of a processor is a first type indicating non-speculative execution of instructions and the first cache is configured to service commands from a command bus for accessing a memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache. The cache system can include a configurable data bit. The logic circuit can be coupled to control the caches according to the bit. Alternatively, the caches can include cache sets. The caches can also include registers associated with the cache sets respectively. The logic circuit can be coupled to control the cache sets according to the registers.
At least some embodiments disclosed herein relate generally to cache architecture and more specifically, but not limited to, cache architecture for main and speculative executions by computer processors.
BACKGROUNDA cache is a memory component that stores data closer to a processor than the main memory so that data stored in the cache can be accessed by the processor. Data can be stored in the cache as the result of an earlier computation or an earlier access to the data in the main memory. A cache hit occurs when the data requested by the processor using a memory address can be found in the cache, while a cache miss occurs when it cannot.
In general, a cache is memory which holds data recently used by a processor. A block of memory placed in a cache is restricted to a cache line accordingly to a placement policy. There are three generally known placement policies: direct mapped, fully associative, and set associative. In a direct mapped cache structure, the cache is organized into multiple sets with a single cache line per set. Based on the address of a memory block, a block of memory can only occupy a single cache line. With direct mapped caches, a cache can be designed as a (n*1) column matrix. In a fully associative cache structure, the cache is organized into a single cache set with multiple cache lines. A block of memory can occupy any of the cache lines in the single cache set. The cache with fully associative structure can be designed as a (1*m) row matrix.
A set associative cache is an intermediately designed cache with a structure that is a middle ground between a direct mapped cache and a fully associative cache. A set associative cache can be designed as a (n*m) matrix, where neither the n nor the m is 1. The cache is divided into n cache sets and each set contains m cache lines. A memory block can be mapped to a cache set and then placed into any cache line of the set. Set associative caches can include the range of caches from direct mapped to fully associative when considering a continuum of levels of set associativity. For example, a direct mapped cache can also be described as a one-way set associative cache and a fully associative cache with m blocks can be described as a m-way set associative cache. Directed mapped caches, two-way set associative caches, and four-way set associative caches are commonplace in cache systems.
Speculative execution is a computing technique where a processor executes one or more instructions based on the speculation that such instructions need to be executed under some conditions, before the determination result is available as to whether such instructions should be executed or not.
A memory address in a computing system identifies a memory location in the computing system. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. The length of the sequences of digits or bits can be considered the width of the memory addresses. Memory addresses can be used in certain structures of central processing units (CPUs), such as instruction pointers (or program counters) and memory address registers. The size or width of such structures of a CPU typically determines the length of memory addresses used in such a CPU.
The embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
The present disclosure includes techniques to use multiple caches or cache sets of a cache interchangeably with different types of executions by a connected processor. The types of executions can include speculative and non-speculative execution threads. Non-speculative execution can be referred to as main execution or normal execution.
For enhanced security, when a processor performs conditional speculative execution of instructions, the processor can be configured to use a shadow cache during the speculative execution of the instructions, where the shadow cache is separate from the main cache that is used during the main execution or normal execution of instructions. Some techniques of using a shadow cache to improve security can be found in U.S. patent application Ser. No. 16/028,930, filed Jul. 6, 2018 and entitled “Shadow Cache for Securing Conditional Speculative Instruction Execution,” the entire disclosure of which is here by incorporated herein by reference. The present disclosure includes techniques to allow a cache to be configured dynamically as a shadow cache or a main cache; a unified set of cache resources can be dynamically allocated for the shadow cache or for the main cache; and the allocation can be changed during the execution of instructions.
In some embodiments, a system can include a memory system (e.g., including main memory), a processor, and a cache system coupled between the processor and memory system. The cache system can have a set of caches. And, a cache of the set of caches can be designed in multiple ways. For instance, a cache in the set of caches can include cache sets through cache set associativity (which can include physical or logical cache set associativity).
In some embodiments, caches of the system can be changeable between being configured for use in a first type of execution of instructions by the processor and being configured for use in a second type of execution of instructions by the processor. The first type can be a non-speculative execution of instructions by the processor. The second type can be a speculative execution of instructions by the processor.
In some embodiments, cache sets of a cache can be changeable between being configured for use in a first type of execution of instructions by the processor and being configured for use in a second type of execution of instructions by the processor. The first type can be a non-speculative execution of instructions by the processor. And, the second type can be a speculative execution of instructions by the processor.
In some embodiments, speculative execution is where the processor executes one or more instructions based on a speculation that such instructions need to be executed under some conditions, before the determination result is available as to whether such instructions should be executed or not. Non-speculative execution (or main execution, or normal execution) is where instructions are executed in an order according to the program sequence of the instructions.
In some embodiments, the set of caches of the system can include at least a first cache and a second cache. In such examples, the system can include a command bus, configured to receive a read command or a write command from the processor. The system can also include an address bus, configured to receive a memory address from the processor for accessing memory for a read command or a write command. And, a data bus can be included that is configured to: communicate data to the processor for the processor to read; and receive data from the processor to be written in memory. The memory access requests from the processor can be defined by the command bus, the address bus, and the data bus.
In some embodiments, a common command and address bus can replace the command and address buses described herein. Also, in such embodiments, a common connection to the common command and address bus can replace the respective connections to command and address buses described herein.
The system can also include an execution-type signal line that is configured to receive an execution type from the processor. The execution type can be either an indication of a normal or non-speculative execution or an indication of a speculative execution.
The system can also include a configurable data bit that is configured to be set to a first state (e.g., “0”) or a second state (e.g., “1) to change the uses of the first cache and the second cache with respect to non-speculative execution and speculative execution.
The system can also include a logic circuit that is configured to select the first cache for a memory access request from the processor, when the configurable data bit is set to the first state and the execution-type signal line receives an indication of non-speculative execution. The logic circuit can also be configured to select the second cache for a memory access request from the processor, when the configurable data bit is set to the first state and the execution-type signal line receives an indication of speculative execution. The logic circuit can also be configured to select the second cache for a memory access request from the processor, when the configurable data bit is set to the second state and the execution-type signal line receives an indication of a non-speculative execution. The logic circuit can also be configured to select the first cache for a memory access request from the processor, when the configurable data bit is set to the second state and the execution-type signal line receives an indication of a speculative execution.
The system can also include a speculation-status signal line that is configured to receive speculation status from the processor. The speculation status can be either a confirmation or a rejection of a condition with nested instructions that are executed initially by a speculative execution and subsequently by a non-speculative execution when the speculation status is the confirmation of the condition.
The logic circuit can also be configured to select the second cache as identified by the first state of the configurable data bit and restrict the first cache from use or change as identified by the first state of the configurable data bit, when the signal received by the execution-type signal line changes from an indication of a non-speculative execution to an indication of a speculative execution.
Also, the logic circuit can be configured to change the configurable data bit from the first state to the second state and select the second cache for a memory access request when the execution-type signal line receives an indication of a non-speculative execution. This can occur when the signal received by the execution-type signal line changes from the indication of the speculative execution to the indication of the non-speculative execution and when the speculation status received by the speculation-status signal line is the confirmation of the condition.
The logic circuit can also be configured to maintain the first state of the configurable data bit and select the first cache for a memory access request when the execution-type signal line receives an indication of a non-speculative execution. This can occur when the signal received by the execution-type signal line changes from the indication of the speculative execution to the indication of the non-speculative execution and when the speculation status received by the speculation-status signal line is the rejection of the condition. Also, the logic circuit can be configured to invalidate and discard the contents of the second cache, when the signal received by the execution-type signal line changes from the indication of the speculative execution to the indication of the non-speculative execution and when the speculation status received by the speculation-status signal line is the rejection of the condition.
The system can also include a second command bus, configured to communicate a read command or a write command to a main memory connected to the cache system. The read command or the write command can be received from the processor by the cache system. The system can also include a second address bus, configured to communicate a memory address to the main memory. The memory address can be received from the processor by the cache system. The system can also include a second data bus, configured to communicate data to the main memory to be written in memory, and receive data from the main memory to be communicated to the processor to be read by the processor. Memory access requests to the main memory from the cache system can be defined by the second command bus, the second address bus, and the second data bus.
As mentioned, a cache of the set of caches can be designed in multiple ways, and one of those ways includes a cache of a set divided into cache sets through cache set associativity (which can include physical or logical cache set associativity). A benefit of cache design through set associativity is that a single cache with set associativity can have multiple cache sets within the single cache, and thus, different parts of the single cache can be allocated for use by the processor without allocating the entire cache. Therefore, the single cache can be used more efficiently. This is especially the case when the processor executes multiple types of threads or has multiple execution types. For instance, the cache sets within a single cache can be used interchangeably with different execution types instead of the use of interchangeable caches. Common examples of cache division include having two, four, or eight cache sets within a cache.
Also, set associativity cache design is advantageous over other common cache designs when the processor executes main and speculative threads. Since a speculative execution may use less additional cache capacity than the normal or non-speculative execution, the selection mechanism can be implemented at a cache set level and thus reserve less space than an entire cache (i.e., a fraction of a cache) for speculative execution. Cache with set associativity can have multiple cache sets within a set (e.g., division of two, four, or eight cache sets within a cache). For instance, as shown in
As shown in
On the cache set level of a cache, a first cache set (e.g., see cache set 702 depicted in
For example, in a first time instance, a first cache set is used for normal or non-speculative execution and a second cache set is used for speculative execution. In a second time instance, the second cache set is used for normal or non-speculative execution and the first cache set is used for speculative execution. A way of delegating/switching the cache sets for non-speculative and speculative executions can use set associativity via a cache set index within or external to a memory address tag or via a cache set indicator within a memory address tag that is different from a cache set index (e.g., see
As shown in at least
As shown in
In some embodiments, a number of cache sets can be initially allocated for use in the first type of execution (e.g., non-speculative execution). During the second type of execution (e.g., speculative execution), one of the cache sets initially used for the first type of execution or not (such as a reserved cache set) can be used in the second type of execution. Essentially, a cache set allocated for the second type of execution can be initially a free cache set waiting to be used, or selected from the number of cache sets used for the first type of execution (e.g., a cache set that is less likely to be further used in further first type executions).
In general, in some embodiments, the cache system includes a plurality of cache sets. The plurality of cache sets can include a first cache set, a second cache set, and a plurality of registers associated with the plurality of cache sets respectively. The plurality of registers can include a first register associated with the first cache set and a second register associated with the second cache set. The cache system can also include a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, and a connection to a data bus coupled between the cache system and the processor. The cache system can also include a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers.
In such embodiments, the cache system can be configured to be coupled between the processor and a memory system. And, when the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to generate a set index from at least the memory address (e.g., see set index generation 730, 732, 830, 832, 930, and 932 shown in
The cache system can also include a connection to an execution-type signal line from the processor identifying an execution type (e.g., see connection 604d depicted in
Also, when the first and second registers are in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type. Also, when the first and second registers are in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type. In such an example, each one of the plurality of registers can be configured to store a set index, and when the execution type changes from the second type to the first type, the logic circuit can be configured to change the content stored in the first register and the content stored in the second register.
In some embodiments, the first type is configured to indicate non-speculative execution of instructions by the processor; and the second type is configured to indicate speculative execution of instructions by the processor. In such embodiments, the cache system can further include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor (e.g., see connection 1002 shown in
Additionally, the cache systems described herein (e.g., cache systems 200, 400, 600, and 1000) can each include or be connected to background syncing circuitry (e.g., see background syncing circuitry 1102 shown in
For example, the content of a cache or cache set that is initially delegated for a speculative execution (e.g., an extra cache or a spare cache set delegated for a speculative execution) can be synced with a corresponding cache or cache set used by a normal or non-speculative execution (to have the cache content of the normal execution), such that if the speculation is confirmed, the cache or cache set that is initially delegated for the speculative execution can immediately join the cache sets of a main or non-speculative execution. Also, the original cache set corresponding to the cache or cache set that is initially delegated for the speculative execution can be removed from the group of cache sets used for the main or non-speculative execution. In such embodiments, a circuit, such as a circuit including the background synching circuitry, can be configured to synchronize caches or cache sets in the background to reduce the impact of cache set syncing on cache usage by the processor. Also, the synchronization of the cache or cache sets can continue either until the speculation is abandoned, or until the speculation is confirmed and the syncing is complete. The synchronization may optionally include syncing (e.g., writing back) to the memory.
In some embodiments, a cache system can include a first cache and a second cache as well as a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, a connection to a data bus coupled between the cache system and the processor, and a connection to an execution-type signal line from the processor identifying an execution type (e.g., see cache systems 200 and 400). Such a cache system can also include a logic circuit coupled to control the first cache and the second cache according to the execution type, and the cache system can be configured to be coupled between the processor and a memory system. Also, when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to copy a portion of content cached in the first cache to the second cache (e.g., see operation 1202). Further, the logic circuit can be configured to copy the portion of content cached in the first cache to the second cache independent of a current command received in the command bus.
Additionally, when the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to service subsequent commands from the command bus using the second cache in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor (e.g., see operation 1208). In such an example, the logic circuit can be configured to complete synchronization of the portion of the content from the first cache to the second cache before servicing the subsequent commands after the execution type is changed from the first type to the second type (e.g., see
In such embodiments, the cache system can also include a configurable data bit, wherein the logic circuit is further coupled to control the first cache and the second cache according to the configurable data bit. Also, in such embodiments, the cache system can further include a plurality of cache sets. For instance, the first cache and the second cache together can include the plurality of cache sets, and a plurality of cache sets can include a first cache set and a second cache set. The cache system can also include a plurality of registers associated with the plurality of cache sets respectively. The plurality of registers can include a first register associated with the first cache set and a second register associated with the second cache set. And, in such embodiments, the logic circuit can be further coupled to control the plurality of cache sets according to the plurality of registers.
In some embodiments, a cache system can include a plurality of cache sets that includes a first cache set and a second cache set. The cache system can also include a plurality of registers associated with the plurality of cache sets respectively, which includes a first register associated with the first cache set and a second register associated with the second cache set. In such embodiments, the cache system can include a plurality of caches that include a first cache and a second cache, and the first cache and the second cache together can include at least part of the plurality of cache sets. Such a cache system can also include a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, a connection to a data bus coupled between the cache system and the processor, and a connection to an execution-type signal line from the processor identifying an execution type, as well as a logic circuit coupled to control the plurality of cache sets according to the execution type.
In such embodiments, the cache system can be configured to be coupled between the processor and a memory system. And, when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to copy a portion of content cached in the first cache set to the second cache set. The logic circuit can also be configured to copy the portion of content cached in the first cache set to the second cache set independent of a current command received in the command bus.
Also, when the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to service subsequent commands from the command bus using the second cache set in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to complete synchronization of the portion of the content from the first cache set to the second cache set before servicing the subsequent commands after the execution type is changed from the first type to the second type. The logic circuit can also be configured to continue synchronization of the portion of the content from the first cache set to the second cache set while servicing the subsequent commands. And, the logic circuit can be further coupled to control the plurality of cache sets according to the plurality of registers.
In addition to using a shadow cache for securing speculative executions, and synchronizing content between a main cache and the shadow cache to save the content cached in the main cache in preparation of acceptance of the content in the shadow cache, a spare cache set can be used to accelerate the speculative executions. Also, a spare cache set can be used to accelerate the speculative executions without use of a shadow cache. Use of a spare cache set is useful with shadow cache implementations because data held in cache sets used as a shadow cache can be validated and therefore used for normal execution and some cache sets used as the main cache may not be ready to be used as the shadow cache. Thus, one or more cache sets can be used as spare cache sets to avoid delays from waiting for cache set availability. To put it another way, once a speculation is confirmed, the content of the cache sets used as a shadow cache is confirmed to be valid and up-to-date; and thus, the former cache sets used as the shadow cache for speculative execution are used for normal execution. However, some of the cache sets initially used as the normal cache may not be ready to be used for a subsequent speculative execution. Therefore, one or more cache sets can be used as spares to avoid delays from waiting for cache set availability and accelerate the speculative executions.
In some embodiments, if the syncing from a cache set in the normal cache to a corresponding cache set in the shadow cache has not yet been completed, the cache set in the normal cache cannot be freed immediately for use in the next speculative execution. In such a situation, if there is no spare cache set, the next speculative execution has to wait until the syncing is complete so that the corresponding cache set in the normal cache can be freed. This is just one example, of when a spare cache set is beneficial and can be added to an embodiment. And, there are many other situations when cache sets in the normal cache cannot be freed immediately so a spare cache set can be useful.
Also, in some embodiments, the speculative execution may reference a memory region that has no overlapping with the memory region cached in the cache sets used in the normal cache. As a result of accepting the result of the speculative execution, the cache sets in the shadow cache and the normal cache may all be in the normal cache. This can cause delays as well, because it takes time for the cache system to free a cache set to support the next speculative execution. To free one, the cache system can identify a cache set, such as a least used cache set, and synchronize the cache set with the memory system. If the cache has data that is more up to date than the memory system, the data can be written into the memory system.
Additionally, a system using a spare cache set can also use background synchronizing circuitry such as the background synchronizing circuitry 1102 depicted in
In addition to using a shadow cache, synchronizing content between a main cache and the shadow cache, and using a spare cache set, extended tags can be used to improve use of interchangeable caches and caches sets for different types of executions by a processor (such as speculative and non-speculative executions). There are many different ways to address cache sets and cache blocks within a cache system using extended tagging. Two example ways are shown in
In general, cache sets and cache blocks can be selected via a memory address. In some examples, selection is via set associativity. Both examples in
In some embodiments, including embodiments shown in
Also, as shown in
Both of the embodiments shown in
In addition to using extended tags as well as other techniques disclosed herein to improve use of interchangeable caches and caches sets for different types of executions by a processor, a circuit included in or connected to the cache system can be used to map physical outputs from cache sets of a cache hardware system to a logical main cache and a logical shadow cache for normal and speculative executions by the processor respectively. The mapping can be according to at least one control register (e.g., a physical-to-logical-set-mapping (PLSM) register).
Also, disclosed herein are computing devices having cache systems having interchangeable cache sets utilizing a mapping circuit (such as mapping circuit 1830 shown in
In a conventional cache, each cache set is statically associated with a particular value of “Index S”/“Block Index L”. In the cache systems disclosed herein, any cache set can be used for any purpose for any index value S/L and for a main cache or a shadow cache. Cache sets can be used and defined by data in cache set registers associated with the cache sets. A selection logic can then be used to select the appropriate result based on the index value of S/L and how the cache sets are used.
For example, four cache sets, a cache set 0 to set 3, can be initially used for a main cache for S/L=00, 01, 10 and 11 respectively. A fourth cache set can be used as the speculative cache for S/L=00, assuming that speculative execution does not change the cache sets defined by 01, 10 and 11. If the result of the speculative execution is required, the mapping data can be changed to indicate that the main cache for S/L=00, 01, 10 and 11 are respectively for the fourth cache set, cache set 1, cache set 2, and cache set 3. Cache set 0 can then be freed or invalidated for subsequent use in a speculative execution. If the next speculative execution needs to change the cache set S/L to 01, cache set 0 can be used as the shadow cache (e.g., copied from cache set 1 and used to look up content for addresses with S/L equaling ‘01’.
Also, the cache system and processor does not merely switch back and forth between a predetermined main thread and a predetermined speculative thread. Consider the speculative execution of the following pseudo-program.
Instructions A;
If condition=true,
-
- then Instructions B;
End conditional loop;
Instructions C; and
Instructions D.
For the pseudo-program, the processor can run two threads.
Thread A:
Instructions A;
Instructions C; and
Instructions D.
Thread B:
Instructions A;
Instructions B;
Instructions C; and
Instructions D.
The execution of Instructions B is speculative because it depends on the test result of “condition=true” instead of “condition=false”. The execution of Instructions B is required only when condition=true. By the time the result of the test “condition=true” becomes available, the execution of Thread A reached Instructions D and the execution of Thread A may reach Instructions C. If the test result requires the execution of Instructions B, cache content for thread B is correct and cache content for thread A is incorrect. Then, all changes made in the cache according to Thread B should be retained and the processor can continue the execution of Instructions C using the cache that has the results of executing Instructions B; and Thread A is terminated. Since the changes made according to Thread B is in the shadow cache, the content of the shadow cache should be accepted as the main cache. If the test result requires no execution of Instructions B, the results of the Thread B is discarded (e.g., the content of the shadow cache is discarded or invalidated).
The cache sets used for the shadow and the normal cache can be swapped or changed according to a mapping circuit and a control register (e.g., a physical-to-logical-set-mapping (PLSM) register). In some embodiments, a cache system can include a plurality of cache sets, having a first cache set configured to provide a first physical output upon a cache hit and a second cache set configured to provide a second physical output upon a cache hit. The cache system can also include a connection to a command bus coupled between the cache system and a processor and a connection to an address bus coupled between the cache system and the processor. The cache system can also include the control register, and the mapping circuit coupled to the control register to map respective physical outputs of the plurality of cache sets to a first logical cache and a second logical cache according to a state of the control register. The cache system can be configured to be coupled between the processor and a memory system.
When the connection to the address bus receives a memory address from the processor and when the control register is in a first state, the mapping circuit can be configured to: map the first physical output to the first logical cache for a first type of execution by the processor to implement commands received from the command bus for accessing the memory system via the first cache set during the first type of execution; and map the second physical output to the second logical cache for a second type of execution by the processor to implement commands received from the command bus for accessing the memory system via the second cache set during the second type of execution. And, when the connection to the address bus receives a memory address from the processor and when the control register is in a second state, the mapping circuit is configured to: map the first physical output to the second logical cache to implement commands received from the command bus for accessing the memory system via the first cache set during the second type of execution; and map the second physical output to the first logical cache to implement commands received from the command bus for accessing the memory system via the second cache set for the first type of execution.
In some embodiments, the first logical cache is a normal cache for non-speculative execution by the processor, and the second logical cache is a shadow cache for speculative execution by the processor.
Also, in some embodiments, the cache system can further include a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set. The cache system can also include a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers. When the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to generate a set index from at least the memory address, as well as determine whether the generated set index matches with a content stored in the first register or with a content stored in the second register. And, the logic circuit can be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register.
In some embodiments, the mapping circuit can be a part of or connected to the logic circuit and the state of the control register can control a state of a cache set of the plurality of cache sets. In some embodiments, the state of the control register can control the state of a cache set of the plurality of cache sets by changing a valid bit for each block of the cache set.
Also, in some examples, the cache system can further include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line can be configured to receive the status of a speculative execution, and the status of a speculative execution can indicate that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the speculative execution to a non-speculative execution, the logic circuit can be configured to change, via the control register, the state of the first and second cache sets, if the status of speculative execution indicates that a result of speculative execution is to be accepted (e.g., when the speculative execution is to become the main thread of execution). And, when the execution type changes from the speculative execution to a non-speculative execution, the logic circuit can be configured to maintain, via the control register, the state of the first and second cache sets without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
In some embodiments, the mapping circuit is part of or connected to the logic circuit and the state of the control register can control a state of a cache register of the plurality of cache registers via the mapping circuit. In such examples, the cache system can further include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line can be configured to receive the status of a speculative execution, and the status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the speculative execution to a non-speculative execution, the logic circuit can be configured to change, via the control register, the state of the first and second registers, if the status of speculative execution indicates that a result of speculative execution is to be accepted. And, when the execution type changes from the speculative execution to a non-speculative execution, the logic circuit can be configured to maintain, via the control register, the state of the first and second registers without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
Additionally, the present disclosure includes techniques to secure speculative instruction execution using multiple interchangeable caches that are each interchangeable as a shadow cache or a main cache. The speculative instruction execution can occur in a processor of a computing device. The processor can execute two different types of threads of instructions. One of the threads can be executed speculatively (such as with a condition that has not yet been evaluated). The data of the speculative thread can be in a logical cache acting as a shadow cache. The data of a main thread can be in a logical cache acting as a main cache. Subsequently, when the result of evaluating the condition becomes available, the processor can keep the results of executing the speculative thread when the condition requires the execution of the thread, or remove the results. The hardware circuit for the cache acting as a shadow cache can be repurposed as the hardware circuit for the main cache by changing the content of the register. Thus, there is no need to synchronize the main cache with the shadow cache if the execution of the speculative thread is required.
The techniques disclosed herein also relate to the use of a unified cache structure that can be used to implement, with improved performance, a main cache and a shadow cache. In the unified cache structure, results of cache sets can be dynamically remapped using a set of registers to switch being in the main cache and being in the shadow cache. When a speculative execution is successful, the cache set used with the shadow cache has the correct data and can be remapped as the corresponding cache set for the main cache. This eliminates a need to copy the data from the shadow cache to the main cache as used by other techniques using shadow and main caches.
In general, a cache can be configured as multiple sets of blocks. Each block set can have multiple blocks and each block can hold a number bytes. A memory address can be partitioned into three segments for accessing the cache: tag, block index (which can be for addressing a set within the multiple sets), and cache block (which can be for addressing a byte in a block of bytes). For each block in a set, the cache stores not only the data from the memory, but can also store a tag of the address from which the data is loaded and a field indicating whether the content in the block is valid. Data can be retrieved from the cache using the block index (e.g., set ID) and the cache block (e.g., byte ID). The tag in the retrieved data is compared with the tag portion of the address. A matched tag means the data is cached for the address. Otherwise, it means that the data can be cached for another address that is mapped to the same location in the cache.
With the techniques using multiple interchangeable caches, the physical cache sets of the interchangeable caches are not hardwired as main cache or shadow cache. A physical cache set can be used either as a main cache set or a shadow cache set. And, a set of registers can be used to specify whether the physical cache set is currently being used as a main cache set or a shadow cache set. In general, a mapping can be constructed to translate the outputs of the physical cache sets as logical outputs of the corresponding cache sets represented by the block index (e.g., set ID) and the main status or shadow status. The remapping allows any available physical cache to be used as a shadow cache.
In some embodiments, the unified cache architecture can remap a shadow cache (e.g., speculative cache) to a main cache, and can remap a main cache to a speculative cache. It is to be understood that designs can include any number of caches or cache sets that can interchange between being main or speculative caches or cache sets.
It is to be understood that there are no physical distinctions in the hardwiring of the main and speculative caches or cache sets. And, in some embodiments, there are no physical distinctions in the hardwiring of the logic units described herein. It is to be understood that interchangeable caches or cache sets do not have different caching capacity and structure. Otherwise, such caches or cache sets would not be interchangeable. Also, the physical cache sets can dynamically be configured to be main or speculative, such as with no a priori determination.
Also, it is to be understood that interchangeability occurs at the cache level and not at the cache block level. Interchangeability at cache block level may allow the main cache and the shadow cache to have different capacity; and thus, not be interchangeable.
Also, in some embodiments, when a speculation, by a processor, is successful and a cache is being used as a main cache as well as another cache is being used as a speculative or shadow cache, the valid bits associated with cache index blocks of the main cache are all set to indicate invalid (e.g., indicating invalid by a “0” bit value). In such embodiments, the initial states of all the valid bits of the speculative cache are indicative of invalid but then changed to indicate valid since the speculation was successful. In other words, the previous state of the main cache is voided, and the previous state of the speculative cache is set from invalid to valid and accessible by a main thread.
In some embodiments, a PLSM register for the main cache can be changed from indicating the main cache to indicating the speculative cache. The change in the indication, by the PLSM register, of the main cache to the speculative cache can occur by the PLSM register receiving a valid bit of the main cache which indicates invalid after a successful speculation. For example, after a successful speculation and where a first cache is initially a main cache and a second cache is initially a speculative cache, an invalid indication of bit “0” can replace a least significant bit in a 3-bit PLSM register for the first cache, which can change “011” to “010” (or “3” to “2”). And, for a 3-bit PLSM register for the second cache, a valid indication of bit “1” can replace a least significant bit in the PLSM register, which can change “010” to “011” (or “2” to “3”). Thus, as shown by the example, a PLSM register, which is initially for a first cache (e.g., main cache) and initially selecting the first cache, is changed to selecting the second cache (e.g., speculative cache) after a successful speculation. And, as shown by the example, a PLSM register, which is initially for a second cache (e.g., speculative cache) and initially selecting the second cache, is changed to selecting the first cache (e.g., main cache) after a successful speculation. With such a design, a main thread of the processor can first access a cache initially designated as a main cache and then access a cache initially designated as a speculative cache after a successful speculation by the processor. And, a speculative thread of the processor can first access a cache initially designated as a speculative cache and then access a cache initially designated as a main cache after a successful speculation by the processor.
For example, data of all memory addresses having the same block index part 106a and block offset part 108a can be stored in the same physical location in a cache for a given execution type. When the data at the memory address 102a is stored in the cache, tag part 104a is also stored for the block containing the memory address to identify which of the addresses having the same block index part 106a and block offset part 108a is currently being cached at that location in the cache.
The data at a memory address can be cached in different locations in a unified cache structure for different types of executions. For example, the data can be cached in a main cache during non-speculative execution; and subsequent cached in a shadow cache during speculative execution. Execution type 110a can be combined with the tag part 104a to select from caches that can be dynamically configured for use in main and speculative executions without restriction. There can be many different ways to implement the use of the combination of execution type 110a and tag part 104a to make the selection. For example, logic circuit 206 depicted in
In a relatively simple implementation, the execution type 110a can be combined with the tag part 104a to form an extended tag in determining whether a cache location contains the data for the memory address 102a and for the current type of execution of instructions. For example, a cache system can use the tag part 104a to select a cache location without distinction of execution types; and when the tag part 104a is combined with the execution type 110a to form an extended tag, the extended tag can be used in a similar way to select a cache location in executions that have different types (e.g., speculative execution and non-speculative execution), such that the techniques of shadow cache can be implemented to enhance security. Also, since the information about the execution type associated with cached data is shared among many cache locations (e.g., in a cache set, or in a cache having multiple cache sets), it is not necessary to store the execution type for individual locations; and a selection mechanism (e.g., a switch, a filter, or a multiplexor such as a data multiplexor) can be used to implement the selection according to the execution type). Alternatively, the physical caches or physical cache sets used for different types of executions can be remapped to logical caches pre-associated with the different types of executions respectively. Thus, the use of the logical caches can be selected according to the execution type 110a.
For example, a plurality of cache sets can be configured in a cache, where each cache set can be addressed using cache set index 112b. A data set associated with the same cache set index can be cached in a same cache set. The tag part 104b of a data block cached in the cache set can be stored in the cache in association with the data block. When the address 102b is used to retrieve data from the cache set identified using the cache set index 112b, the tag part of the data block stored in the cache set can be retrieved and compared with the tag part 104b to determine whether there is a match between the tag 104b of the address 102b of the access request and the tag 104b stored in the cache set identified by the cache set index 112b and stored for the cache block identified by the block index 106b. If there is a match (such as a cache hit), the cache block stored in the cache set is for the memory address 112b; otherwise, the cache block stored in the cache set is for another the memory address that has the same cache set index 112b and the same block index 106b as the memory address 102b, which results in a cache miss. In response to a cache miss, the cache system accesses the main memory to retrieve the data block according to the address 102b. To implement shadow cache techniques, the cache set index 112b can be combined with the execution type 110a to form an extended cache set index. Thus, cache sets used for different types of executions for different cache set indices can be addressed using the extended cache set index that identifies both the cache set index and the execution type.
In
Also, as shown in
Also, as shown in
Also, as shown in
The cache system 200 is shown including a connection 204a to a command bus 205a coupled between the cache system and the processor 201. The cache system 200 is shown including a connection 204b to an address bus 205b coupled between the cache system and the processor 201. Addresses 102a, 102b, 102c, 102d, and 102e depicted in
Not shown in
In some embodiments, the cache system 200 can include a first cache (e.g., see cache 202a) and a second cache (e.g., see cache 202b). In such embodiments, as shown in
When the configurable data bit is in a first state (e.g., see data 312 depicted in
When the configurable data bit is in a second state (e.g., see data 314 depicted in
In some embodiments, when the execution type changes from the second type to the first type, the logic circuit 206 is configured to toggle the configurable data bit.
Also, as shown in
When the configurable data bit is in a second state, the logic circuit 206 is configured to provide commands to the second command bus 209a for accessing the memory system 203 via the second cache, when the execution type is the first type. Also, when the configurable data bit is in a second state, the logic circuit 206 is configured to provide commands to the second command bus 209a for accessing the memory system 203 via the first cache, when the execution type is the second type.
In some embodiments, the connection 204a to the command bus 205a is configured to receive a read command or a write command from the processor 201 for accessing the memory system 203. Also, the connection 204b to the address bus 205b can be configured to receive a memory address from the processor 201 for accessing the memory system 203 for the read command or the write command. Also, the connection 204c to the data bus 205c can be configured to communicate data to the processor 201 for the processor to read the data for the read command. And, the connection 204c to the data bus 205c can also be configured to receive data from the processor 201 to be written in the memory system 203 for the write command. Also, the connection 204d to the execution-type signal line 205d can be configured to receive an identification of the execution type from the processor 201 (such as an identification of a non-speculative or speculative type of execution performed by the processor).
In some embodiments, the logic circuit 206 can be configured to select the first cache for a memory access request from the processor 201 (e.g., one of the commands received from the command bus for accessing the memory system), when the configurable data bit is in the first state and the connection 204d to the execution-type signal line 205d receives an indication of the first type (e.g., the non-speculative type). Also, the logic circuit 206 can be configured to select the second cache for a memory access request from the processor 201, when the configurable data bit is in the first state and the connection 204d to the execution-type signal line 205d receives an indication of the second type (e.g., the speculative type). Also, the logic circuit 206 can be configured to select the second cache for a memory access request from the processor 201, when the configurable data bit is in the second state and the connection 204d to the execution-type signal line 205d receives an indication of the first type. And, the logic circuit 206 can be configured to select the first cache for a memory access request from the processor 201, when the configurable data bit is in the second state and the connection 204d to the execution-type signal line 205d receives an indication of the second type.
The illustrated lines 320 connecting the register 306 to the caches 302 and 304 can be a part of the logic circuit 206.
In some embodiments, instead of using a configurable bit to control use of the caches of the cache system 200, another form of data may be used to control use of the caches of the cache system. For instance, the logic circuit 206 can be configured to control the first cache (e.g., see cache 202a) and the second cache (e.g., see cache 202b) based on different data being stored in the register 306 that is not the configurable bit. In such an example, when the register 306 stores first data or is in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is a second type. And, when the register 306 stores second data or is in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the second type.
Similarly, the cache system 400 is shown including connection 204a to command bus 205a coupled between the cache system and the processor 401. The system 400 also includes connection 204b to an address bus 205b coupled between the cache system and the processor 401. Addresses 102a, 102b, 102c, 102d, and 102e depicted in
In some embodiments, the cache system 400 can include a first cache (e.g., see cache 202a) and a second cache (e.g., see cache 202b). In such embodiments, as shown in
In some embodiments, such as shown in
Also, when the execution type changes from the second type or the speculative type to the first type or non-speculative type, the logic circuit 406 of system 400 can be configured to toggle the configurable data bit, if the status of speculative execution indicates that a result of speculative execution is to be accepted. Further, when the execution type changes from the second type or the speculative type to the first type or non-speculative type, the logic circuit 406 of system 400 can be configured to maintain the configurable data bit without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
Also, similarly, in
In some embodiments, instead of using a configurable bit to control use of the caches of the cache system 400, another form of data may be used to control use of the caches of the cache system 400. For instance, the logic circuit 406 in the system 400 can be configured to control the first cache (e.g., see cache 202a) and the second cache (e.g., see cache 202b) based on different data being stored in the register 306 that is not the configurable bit. In such an example, when the register 306 stores first data or is in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is a non-speculative type; and implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is a speculative type. And, when the register 306 stores second data or is in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is the non-speculative type; and implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the speculative type.
Some embodiments can include a cache system and the cache system can include a plurality of caches including a first cache and a second cache. The system can also include a connection to a command bus, configured to receive a read command or a write command from a processor connected to the cache system, for reading from or writing to a memory system. The system can also include a connection to an address bus, configured to receive a memory address from the processor for accessing the memory system for the read command or the write command. The system can also include a connection to a data bus, configured to: communicate data to the processor for the processor to read the data for the read command; and receive data from the processor to be written in the memory system for the write command. In such examples, the memory access requests from the processor and memory used by the processor can be defined by the command bus, the address bus, and the data bus). The system can also include an execution-type signal line, configured to receive an identification of execution type from the processor. The execution type is either a first execution type or a second execution type (e.g., a normal or non-speculative execution or a speculative execution).
The system can also include a configurable data bit configured to be set to a first state (e.g., “0”) or a second state (e.g., “1) to control selection of the first cache and the second cache for use by the processor).
The system can also include a logic circuit, configured to select the first cache for use by the processor, when the configurable data bit is in a first state and the execution-type signal line receives an indication of the first type of execution. The logic circuit can also be configured to select the second cache for use by the processor, when the configurable data bit is in the first state and the execution-type signal line receives an indication of the second type of execution. The logic circuit can also be configured to select the second cache for use by the processor, when the configurable data bit is in the second state and the execution-type signal line receives an indication of the first type of execution. The logic circuit can also be configured to select the first cache for use by the processor, when the configurable data bit is in the second state and the execution-type signal line receives an indication of the second type of execution.
In some embodiments, the first type of execution is a speculative execution of instructions by the processor, and the second type of execution is a non-speculative execution of instructions by the processor (e.g., a normal or main execution). In such examples, the system can further include a connection to a speculation-status signal line that is configured to receive speculation status from the processor. The speculation status can be either an acceptance or a rejection of a condition with nested instructions that are executed initially by a speculative execution of the processor and subsequently by a normal execution of the processor when the speculation status is the acceptance of the condition.
In some embodiments, the logic circuit is configured to switch the configurable data bit from the first state to the second state, when the speculation status received by the speculation-status signal line is the acceptance of the condition. The logic circuit can also be configured to maintain the state of the configurable data bit, when the speculation status received by the speculation-status signal line is the rejection of the condition.
In some embodiments, the logic circuit is configured to select the second cache for use as identified by the first state of the configurable data bit and restrict the first cache from use as identified by the first state of the configurable data bit, when the signal received by the execution-type signal line changes from an indication of a normal execution to an indication of a speculative execution. At this change, a speculation status can be ignored/bypassed by the logic circuit because the processor is in speculative execution does not know whether the instructions preformed under the speculative execution should be executed or not by the main execution.
The logic circuit can also be configured to maintain the first state of the configurable data bit and select the first cache for a memory access request when the execution-type signal line receives an indication of a normal execution, when the signal received by the execution-type signal line changes from the indication of the speculative execution to the indication of the normal execution and when the speculation status received by the speculation-status signal line is the rejection of the condition.
In some embodiments, the logic circuit is configured to invalidate and discard the contents of the second cache, when the signal received by the execution-type signal line changes from the indication of the speculative execution to the indication of the normal execution and when the speculation status received by the speculation-status signal line is the rejection of the condition.
In some embodiments, the system further includes a connection to a second command bus, configured to communicate a read command or a write command to the memory system (e.g., including main memory). The read command or the write command can be received from the processor by the cache system. The system can also include a connection to a second address bus, configured to communicate a memory address to the memory system. The memory address can be received from the processor by the cache system. The system can also include a connection to a second data bus, configured to: communicate data to the memory system to be written in the memory system; and receive data from the memory system to be communicated to the processor to be read by the processor. For instance, memory access requests to the memory system from the cache system can be defined by the second command bus, the second address bus, and the second data bus.
In some embodiments, when the configurable data bit is in a first state, the logic circuit is configured to: provide commands to the second command bus for accessing the memory system via the first cache, when the execution type is a first type; and provide commands to the second command bus for accessing the memory system via the second cache, when the execution type is a second type. And, when the configurable data bit is in a second state, the logic circuit can be configured to: provide commands to the second command bus for accessing the memory system via the second cache, when the execution type is the first type; and provide commands to the second command bus for accessing the memory system via the first cache, when the execution type is the second type.
Some embodiments can include a system including a processor, a memory system, and a cache system coupled between the processor and the memory system. The cache system of the system can include a plurality of caches including a first cache and a second cache. The cache system of the system can also include a connection to a command bus coupled between the cache system and the processor, a connection to an address bus coupled between the cache system and the processor, a connection to a data bus coupled between the cache system and the processor, and a connection to an execution-type signal line from the processor identifying an execution type.
The cache system of the system can also include a configurable data bit and a logic circuit coupled to the processor to control the first cache and the second cache based on the configurable data bit. When the configurable data bit is in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is a second type. And, when the configurable data bit is in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the second type.
In such a system, the first type can be configured to indicate non-speculative execution of instructions by the processor, and the second type can be configured to indicate speculative execution of instructions by the processor. Also, the cache system of the system can further include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line can be configured to receive the status of a speculative execution, and the status of a speculative execution can indicate that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the second type (speculative type) to the first type (non-speculative type), the logic circuit can be configured to toggle the configurable data bit, if the status of speculative execution indicates that a result of speculative execution is to be accepted. And, when the execution type changes from the second type (speculative type) to the first type (non-speculative type), the logic circuit can also be configured to maintain the configurable data bit without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
The cache system 600 is shown including a connection 604a to a command bus 605a coupled between the cache system and the processor 601. The cache system 600 is shown including a connection 604b to an address bus 605b coupled between the cache system and the processor 601. Addresses 102a, 102b, 102c, 102d, and 102e depicted in
Also, as shown in
The cache system 600 also includes a plurality of cache sets (e.g., see cache sets 610a, 610b, and 610c). The caches sets can include a first cache set (e.g., see cache set 610a) and a second cache set (e.g., see cache set 610b).
Also, as shown in
As shown in
Cache set 1 (in a conventional sense) may or may not communicate with its register 1 depending on the embodiment. Broken lines are also shown in FIGS.
In some embodiments, the logic circuit 606 can be coupled to the processor 601 to control the plurality of cache sets (e.g., cache sets 610a, 610b, and 610c) according to the plurality of registers (e.g., registers 612a, 612b, and 612c). In such embodiments, the cache system 600 can be configured to be coupled between the processor 601 and a memory system 603. And, when the connection 604b to the address bus 605b receives a memory address from the processor 601, the logic circuit 606 can be configured to generate a set index from at least the memory address and determine whether the generated set index matches with content stored in the first register (e.g., register 612a) or with content stored in the second register (e.g., register 612b). The logic circuit 606 can also be configured to implement a command received in the connection 604a to the command bus 605a via the first cache set (e.g., cache set 610a) in response to the generated set index matching with the content stored in the first register (e.g., register 612a) and via the second cache set (e.g., cache set 610b) in response to the generated set index matching with the content stored in the second register (e.g., register 612b).
In some embodiments, the cache system 600 can include a first cache (e.g., see cache 602a) and a second cache (e.g., see cache 602b). In such embodiments, as shown in
In some embodiments, in response to a determination that a data set of the memory system 603 associated with the memory address is not currently cached in the cache system 600 (such as not cached in cache 602a of the system), the logic circuit 606 is configured to allocate the first cache set (e.g., cache set 610a) for caching the data set and store the generated set index in the first register (e.g., register 612a). In such embodiments and others, the cache system can include a connection to an execution-type signal line (e.g., connection 604d to execution-type signal line 605) from the processor (e.g., processor 601) identifying an execution type. And, in such embodiments and others, the generated set index is generated further based on a type identified by the execution-type signal line. Also, the generated set index can include a predetermined segment of bits in the memory address and a bit representing the type identified by the execution-type signal line 605d.
Also, when the first and second registers (e.g., registers 612a and 612b) are in a first state, the logic circuit 606 can be configured to implement commands received from the command bus 605a for accessing the memory system 601 via the first cache set (e.g., cache set 610a), when the execution type is a first type. Also, when the first and second registers (e.g., registers 612a and 612b) are in a first state, the logic circuit 606 can be configured to implement commands received from the command bus 605a for accessing the memory system 601 via the second cache set (e.g., cache set 610b), when the execution type is a second type.
Furthermore, when the first and second registers (e.g., registers 612a and 612b) are in a second state, the logic circuit 606 can be configured to implement commands received from the command bus 605a for accessing the memory system 601 via another cache set of the plurality of cache sets besides the first cache set (e.g., cache set 610b or 610c), when the execution type is the first type. Also, when the first and second registers (e.g., registers 612a and 612b) are in a second state, the logic circuit 606 can be configured to implement commands received from the command bus 605a for accessing the memory system 601 via another other cache set of the plurality of cache sets besides the second cache set (e.g., cache set 610a or 610c or another cache set not depicted in
In some embodiments, each one of the plurality of registers (e.g., see registers 612a, 612b, and 612c) can be configured to store a set index, and when the execution type changes from the second type to the first type (e.g., from the non-speculative type to the speculative type of execution), the logic circuit 606 can be configured to change the content stored in the first register (e.g., register 612a) and the content stored in the second register (e.g., register 612b). Examples of the change of the content stored in the first register (e.g., register 612a) and the content stored in the second register (e.g., register 612b) are illustrated in
Each of
Not shown in
As illustrated by
Specifically, as shown in
Specifically, as shown in
Specifically, as shown in
Specifically, as shown in
Specifically, as shown in
Specifically, as shown in
In some embodiments implemented through the cache system illustrated in
Also, in some embodiments implemented through the cache system illustrated in
Also, in such embodiments, when the first and second registers are in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when an execution type of a processor is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type. Also, when the first and second registers are in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type. In such an example, each one of the plurality of registers can be configured to store a set index, and when the execution type changes from the second type to the first type, the logic circuit can be configured to change the content stored in the first register and the content stored in the second register.
Similarly, the cache system 1000 is shown including connection 604a to command bus 605a coupled between the cache system and the processor 1001. The system 1000 also includes connection 604b to an address bus 605b coupled between the cache system and the processor 1001. Addresses 102a, 102b, 102c, 102d, and 102e depicted in
Similarly, the cache system 1000 is also shown including logic circuit 1006 which can be similar to logic circuit 606 but for its circuitry coupled to the connection 1002 to the speculation-status signal line 1004.
In some embodiments, the logic circuit 1006 can be coupled to the processor 1001 to control the plurality of cache sets (e.g., cache sets 610a, 610b, and 610c) according to the plurality of registers (e.g., registers 612a, 612b, and 612c). Each one of the plurality of registers (e.g., see registers 612a, 612b, and 612c) can be configured to store a set index.
In such embodiments, the cache system 1000 can be configured to be coupled between the processor 1001 and a memory system 603. And, when the connection 604b to the address bus 605b receives a memory address from the processor 1001, the logic circuit 1006 can be configured to generate a set index from at least the memory address and determine whether the generated set index matches with content stored in the first register (e.g., register 612a) or with content stored in the second register (e.g., register 612b). The logic circuit 1006 can also be configured to implement a command received in the connection 604a to the command bus 605a via the first cache set (e.g., cache set 610a) in response to the generated set index matching with the content stored in the first register (e.g., register 612a) and via the second cache set (e.g., cache set 610b) in response to the generated set index matching with the content stored in the second register (e.g., register 612b).
Also, the cache system 1000 is shown including connections 608a, 608b, and 608c, which are similar to the corresponding connections shown in
Further, when the first and second registers (e.g., registers 612a and 612b) are in a second state, the logic circuit 606 or 1006 can be configured to provide commands to the second command bus 609a for accessing the memory system 603 via a cache set other than the first cache set (e.g., cache set 610b or 610c or another cache set not depicted in
In some embodiments, such as shown in
In such embodiments, each one of the plurality of registers (e.g., registers 612a, 612b, and 612c) can be configured to store a set index, and when the execution type changes from the speculative execution type to the non-speculative type, the logic circuit 1006 can be configured to change the content stored in the first register (e.g., register 612a) and the content stored in the second register (e.g., register 612b), if the status of speculative type of execution indicates that a result of the speculative execution is to be accepted. And, when the execution type changes from the speculative type to the non-speculative type, the logic circuit 1006 can be configured to maintain the content stored in the first register and the content stored in the second register without changes, if the status of speculative type of execution indicates that a result of the speculative type of execution is to be rejected.
Some embodiments can include a cache system that includes a plurality of cache sets having at least a first cache set and a second cache set. The cache system can also include a plurality of registers associated with the plurality of cache sets respectively. The plurality of registers can include at least a first register associated with the first cache set, configured to store a set index, and a second register associated with the second cache set, configured to store a set index. The cache system can also include a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, a connection to a data bus coupled between the cache system and the processor, and a connection to an execution-type signal line from the processor identifying an execution type.
The cache system can also include a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers. And, the cache system can be configured to be coupled between the processor and a memory system. When the first and second registers are in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type. Also, when the first and second registers are in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type.
The connection to the address bus can be configured to receive a memory address from the processor, and the memory address can include a set index.
In some embodiments, when the first and second registers are in a first state, a first set index associated with the first cache set is stored in the first register, and a second set index associated with the second cache set is stored in the second register. When the first and second registers are in a second state, the first set index can be stored in another register of the plurality of registers besides the first register, and the second set index can be stored in another register of the plurality of registers besides the second register. In such examples, when the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to: generate a set index from at least the memory address; and determine whether the generated set index matches with content stored in the first register or with content stored in the second register. And, the logic circuit can be further configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register.
In response to a determination that a data set of the memory system associated with the memory address is not currently cached in the cache system, the logic circuit can be configured to allocate the first cache set for caching the data set and store the generated set index in the first register.
In some embodiments, the generated set index is generated further based on an execution type identified by the execution-type signal line. In such examples, the generated set index can include a predetermined segment of bits in the memory address and a bit representing the execution type identified by the execution-type signal line.
Some embodiments can include a system, including a processor, a memory system, and a cache system. The cache system can include a plurality of cache sets, including a first cache set and a second cache set, and a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set. The cache system can also include a connection to a command bus coupled between the cache system and the processor, a connection to an address bus coupled between the cache system and the processor, and a connection to a data bus coupled between the cache system and the processor.
The cache system can also include a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers. When the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to: generate a set index from at least the memory address; and determine whether the generated set index matches with content stored in the first register or with content stored in the second register. And, the logic circuit can be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register.
The cache system can further include a connection to an execution-type signal line from the processor identifying an execution type. The generated set index can be generated further based on a type identified by the execution-type signal line. The generated set index can include a predetermined segment of bits in the memory address and a bit representing the type identified by the execution-type signal line.
In some embodiments, if a program references a variable during normal execution, the variable can be cached. In such examples, if during speculation the variable is referenced in a write-through cache, the value in main memory is valid and correct. If during speculation the variable is referenced in a write-back cache, then the aforesaid examples features described for
In the scenario illustrated in
In preparation of the cache B 1226 for use as a shadow cache in the speculative execution of a second set of instructions, the background syncing circuitry 1102 copies the cached content from cache 1124 to cache 1126 in syncing 1130. At least part of the copying operations can be performed in the background in a way independent from the processor accessing the memory via the cache system. For example, when the processor is accessing a first memory address in the non-speculative execution of the first set of instructions, the background syncing circuitry 1102 can copy the content cached in the cache 1124 for a second memory address into the cache 1126. In some instances, the copying operations can be performed in the background in parallel with the accessing the memory via the cache system. For example, when the processor is accessing a first memory address in the non-speculative execution of the first set of instructions to store a computation result, the background syncing circuitry can copy the computation result into the cache 1126 as cache content for the first memory address.
In one implementation, the background syncing circuitry 1102 is configured to complete the syncing operation before the cache 1126 is allowed to be used in the speculative execution of the second set of instructions. Thus, when the cache 1126 is enabled to be used for the speculative execution of the second set of instructions, the valid content in the cache 1124 can also be found in cache 1126. However, the syncing operation can delay the use of the cache 1126 as the shadow cache. Alternatively, the background syncing circuitry 1102 is configured to prioritize the syncing of dirty content from the cache 1124 to the cache 1126. Dirty content can be where the data in the cache has been modified and the data in main memory has not be modified.
Dirty content cached in the cache 1124 can be more up to date than the content stored in corresponding one or more addresses in the memory. For example, when the processor stores a computation result at an address, the cache 1124 can cache the computation result for the address without immediately writing the computation result into the memory at the address. When the computation result is written back to the memory at the address, the cached content is no longer considered dirty. The cache 1124 stores data to track the dirty content cached in cache 1124. The background syncing circuit 1102 can automatically copy the dirty content from cache 1124 to cache 1126 in preparation of cache 1126 to serve as a shadow cache.
Optionally, before the completion of the syncing operations, the background syncing circuitry 1102 can allow the cache 1126 to function as a shadow cache in conditional speculative execution of the second set of instructions. During the time period in which the cache 1126 is used in the speculative execution as a shadow cache, the background syncing circuit 1102 can continue the syncing operation 1130 of copying cached content from cache 1124 to cache 1126. The background syncing circuitry 1102 is configured to complete at least the syncing of the dirty content from the cache 1124 to cache 1126 before allowing the cache 1126 to be accepted as the main cache. For example, upon the indication that the execution of the second set of instructions is required, the background syncing circuitry 1102 determines whether the dirty content in the cache 1124 has been synced to the cache 1126; and if not, the use of the cache 1126 as main cache is postponed until the syncing is complete.
In some implementations, the background syncing circuitry 1102 can continue its syncing operation even after the cache 1126 is accepted as the main cache, but before the cache 1124 is used as a shadow cache in conditional speculative execution of a third set of instructions.
Before the completion of the syncing operation 1130, the cache system can configure the cache 1124 as a secondary cache between the cache 1126 and the memory during the speculative execution, such that when the content of a memory address is not found in cache 1126, the cache system checks cache 1124 to determine whether the content is in cache 1124; and if so, the content is copied from cache 1124 to cache 1126 (instead of being loaded from the memory directly). When the processor stores data at a memory address and the data is cached in cache 1126, the cache system checks invalidates the content that is cached in the cache 1124 as a secondary cache.
After the cache 1126 is reconfigured as the main cache following the acceptance of the result of the speculative execution of the second set of instructions, the background syncing circuitry 1102 can start to synchronize 1132 the cached content from the cache 1126 to the cache 1124, as illustrated in
Following the speculative execution of the second set of instructions, if the speculative status from the processor indicates that the results of the execution of the second set of instructions should be rejected, the cache 1124 remains to function as the main cache; and the content in the cache 1126 can be invalidated. The invalidation can include the cache 1126 has all its entries marked empty; thus, any subsequent speculations begin with an empty speculative cache.
The background syncing circuitry 1102 can again synchronize 1130 the cached content from the cache 1124 to the cache 1126 in preparation of the speculative execution of the third set of instructions.
In some embodiments, each of the cache 1124 and cache 1126 has a dedicated and fixed collection of cache sets; and a configurable bit is used to control use of the caches 1124 and 1126 as main cache and shadow cache respectively, as illustrated in
In other embodiments, cache 1124 and cache 1126 can share a pool of cache sets, some of the cache sets can be dynamically allocated to cache 1124 and cache 1126, as illustrated in
As shown in
At operation 1204, the cache system determines whether the current execution type is changed from non-speculative to speculative. For example, when the processor accesses the memory via the cache system 200, the processor further provides the indication of whether the current memory access is associated with conditional speculative execution. For example, the indication can be provided in a signal line 205d configured to specify execution type.
If the current execution type is not changed from non-speculative to speculative, the cache system services memory access requests from the processor using the first cache as the main cache at operation 1206. When the memory access changes the cached content in the first cache, the background syncing circuitry 1102 can copy the content cached in the first cache to the second cache in operation 1208. For example, the background syncing circuitry 1102 can be part of the logic circuit 206 in
In
Optionally, the background syncing circuitry 1102 is configured to continue copying content cached in the first cache to the second cache to finish syncing at least the dirty content from the first cache to the second cache in operation 1210 before allowing the cache system to service memory requests from the processor during the speculative execution using the second cache in operation 1212.
Optionally, the background syncing circuitry 1102 can continue the syncing operation while the cache system uses the second cache to service memory requests from the processor during the speculative execution in operation 1212.
In operation 1214, the cache system determines whether the current execution type is changed to non-speculative. If the current execution type remains as speculative, the operations 1210 and 1212 can be repeated.
In response to the determination that the current execution type is changed to non-speculative at operation 1214, the cache system determines whether the result of the speculative execution is to be accepted. The result of the speculative execution corresponds to the changes in the cached content in the second cache. For example, the processor 401 can provide an indication of whether the result of the speculative execution should be accepted via speculation-status signal line 404 illustrated in
If, in operation 1216, the cache system determines that the result of the speculative execution is to be rejected, the cache system can discard the cached content currently cached in the second cache in operation 1222 (e.g., discard via setting the invalid bits of cache blocks in the second cache). Subsequently, in operation 1244, the cache system can keep the first cache as main cache and the second cache as shadow cache; and in operation 1208, the background syncing circuitry 1102 can copy the cached content from the first cache to the second cache. When the execution remains non-speculative, operations 1204 to 1208 can be repeated.
If, in operation 1216, the cache system determines that the result of the speculative execution is to be accepted, the background syncing circuitry 1102 is configured to further copying content cached in the first cache to the second cache to complete syncing at least the dirty content from the first cache to the second cache in operation 1218 before allowing the cache system to re-configure first cache as shadow cache. In operation 1220, the cache system configures the first cache as shadow cache and the second cache as main cache, in a way somewhat similar to the operation 1202. In configuring the first cache as shadow cache, the cache system can invalidate its content and then synchronize the cached content in the second cache to the first cache, in a way somewhat similar to the operations 1222, 1224, 1208, and 1204.
For example, when dedicated caches with fixed hardware structures are used as the first cache and the second cache, a configurable bit can be changed to configure the first cache as shadow cache and the second cache as main cache in operation 1220. Alternatively, when cache sets can be allocated from a pool of cache sets using registers to from the first cache and the second cache, in a way as illustrated in
In this specification, the disclosure has been described with reference to specific exemplary embodiments thereof. However, it will be evident that various modifications can be made thereto without departing from the broader spirit and scope as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
For example, embodiments can include a cache system, including: a first cache; a second cache; a connection to a command bus coupled between the cache system and a processor; a connection to an address bus coupled between the cache system and the processor; a connection to a data bus coupled between the cache system and the processor; a connection to an execution-type signal line from the processor identifying an execution type; and a logic circuit coupled to control the first cache and the second cache according to the execution type. In such embodiments, the cache system is configured to be coupled between the processor and a memory system. Also, when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache.
In such embodiments, the logic circuit can be configured to copy the portion of content cached in the first cache to the second cache independent of a current command received in the command bus.
Also, when the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to service subsequent commands from the command bus using the second cache in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to complete synchronization of the portion of the content from the first cache to the second cache before servicing the subsequent commands after the execution type is changed from the first type to the second type. The logic circuit can also be configured to continue synchronization of the portion of the content from the first cache to the second cache while servicing the subsequent commands.
In such embodiments, the cache system can further include: a configurable data bit, and the logic circuit is further coupled to control the first cache and the second cache according to the configurable data bit. When the configurable data bit is in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is a second type. And, when the configurable data bit is in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the second type. When the execution type changes from the second type to the first type, the logic circuit can also be configured to toggle the configurable data bit.
In such embodiments, the cache system can further include: a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line is configured to receive the status of a speculative execution. The status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the second type to the first type, the logic circuit can be configured to: toggle the configurable data bit, if the status of speculative execution indicates that a result of speculative execution is to be accepted; and maintain the configurable data bit without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
Also, in such embodiments, the first cache and the second cache together include: a plurality of cache sets, including a first cache set and a second cache set; and a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set. In such examples, the logic circuit can be further coupled to control the plurality of cache sets according to the plurality of registers. Also, when the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to: generate a set index from at least the memory address; and determine whether the generated set index matches with content stored in the first register or with content stored in the second register. The logic circuit can also be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register. Furthermore, in response to a determination that a data set of the memory system associated with the memory address is not currently cached in the cache system, the logic circuit can be configured to allocate the first cache set for caching the data set and store the generated set index in the first register.
Additionally, in such embodiments having cache sets, the cache system can also include a connection to an execution-type signal line from the processor identifying an execution type, and the generated set index is generated further based on a type identified by the execution-type signal line. The generated set index can include a predetermined segment of bits in the memory address and a bit representing the type identified by the execution-type signal line. Also, when the first and second registers are in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type. And, when the first and second registers are in a second state, the logic circuit is configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type.
In such embodiments having cache sets, each one of the plurality of registers can be configured to store a set index. And, when the execution type changes from the second type to the first type, the logic circuit can be configured to change the content stored in the first register and the content stored in the second register. Also, the first type can be configured to indicate non-speculative execution of instructions by the processor and the second type can be configured to indicate speculative execution of instructions by the processor. In such examples, the cache system can further include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line is configured to receive the status of a speculative execution, and the status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the second type to the first type, the logic circuit can be configured to: change the content stored in the first register and the content stored in the second register, if the status of speculative execution indicates that a result of speculative execution is to be accepted; and maintain the content stored in the first register and the content stored in the second register without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
Also, for example, embodiments can include a cache system, including: in general, a plurality of cache sets and a plurality of registers associated with the plurality of cache sets respectively. The plurality of cache sets includes a first cache set and a second cache set, and the plurality of registers includes a first register associated with the first cache set and a second register associated with the second cache set. Similarly, in such embodiments, the cache system can include a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, a connection to a data bus coupled between the cache system and the processor, a connection to an execution-type signal line from the processor identifying an execution type, and a logic circuit coupled to control the plurality of cache sets according to the execution type. The cache system can also be configured to be coupled between the processor and a memory system. And, when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to copy a portion of content cached in the first cache set to the second cache set.
In such embodiments with cache sets, the logic circuit can be configured to copy the portion of content cached in the first cache set to the second cache set independent of a current command received in the command bus. When the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit can be configured to service subsequent commands from the command bus using the second cache set in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to complete synchronization of the portion of the content from the first cache set to the second cache set before servicing the subsequent commands after the execution type is changed from the first type to the second type. The logic circuit can also be configured to continue synchronization of the portion of the content from the first cache set to the second cache set while servicing the subsequent commands.
Also, in such embodiments with cache sets, the logic circuit can be further coupled to control the plurality of cache sets according to the plurality of registers. When the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to: generate a set index from at least the memory address; and determine whether the generated set index matches with content stored in the first register or with content stored in the second register. The logic circuit can also be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register. Also, in response to a determination that a data set of the memory system associated with the memory address is not currently cached in the cache system, the logic circuit can be configured to allocate the first cache set for caching the data set and store the generated set index in the first register.
Additionally, in such embodiments with cache sets, the cache system can further include a connection to an execution-type signal line from the processor identifying an execution type, and the generated set index can be generated further based on a type identified by the execution-type signal line. The generated set index can include a predetermined segment of bits in the memory address and a bit representing the type identified by the execution-type signal line. When the first and second registers are in a first state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type. And, when the first and second registers are in a second state, the logic circuit can be configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type.
In such embodiments with cache sets, each one of the plurality of registers is configured to store a set index, and when the execution type changes from the second type to the first type, the logic circuit can be configured to change the content stored in the first register and the content stored in the second register. Also, the first type can be configured to indicate non-speculative execution of instructions by the processor and the second type is configured to indicate speculative execution of instructions by the processor.
In such embodiments with cache sets, the cache system can also include a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line is configured to receive the status of a speculative execution, and the status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the second type to the first type, the logic circuit can be configured to: change the content stored in the first register and the content stored in the second register, if the status of speculative execution indicates that a result of speculative execution is to be accepted; and maintain the content stored in the first register and the content stored in the second register without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
Also, in such embodiments with cache sets, the cache sets can be divided amongst a plurality of caches within the cache system. For instance, the cache sets can be divided up amongst first and second caches of the plurality of caches.
In addition to using a shadow cache for securing speculative executions, as well as synchronizing content between a main cache and the shadow cache to save the content cached in the main cache in preparation of acceptance of the content in the shadow cache, a spare cache set can be used to accelerate the speculative executions (e.g., see the spare cache set 1310d as depicted in
Once a speculation is confirmed, the content of the cache sets used as a shadow cache is confirmed to be valid and up-to-date; and thus, the former cache sets used as the shadow cache for speculative execution are used for normal execution. For example, see the cache set 1310c as depicted in
In some embodiments, where the cache system has background syncing circuitry (e.g., see background synching circuitry 1102), if the syncing from a cache set in the normal cache to a corresponding cache set in the shadow cache has not yet been completed (e.g., see syncing 1130 shown in
Also, for example, the speculative execution may reference a memory region in the memory system (e.g., see memory system 603 in
Additionally, a system using a spare cache set (e.g., see the spare cache set 1310d as depicted in
In
The cache system 1000 is also shown including logic circuit 1006 that can be configured to allocate a first subset of the cache sets (e.g., see cache 602a as shown in
The logic circuit 1006 can also be configured to reconfigure the second subset for caching in caching operations (e.g., see cache 602b as shown in
In some embodiments, a cache system can include one or more mapping tables that can map the cache sets mentioned herein. And, in such embodiments, a logic circuit, such as the logic circuits mentioned herein, can be configured to allocate and reconfigure subsets of cache sets, such as caches in a cache system, according to the one or more mapping tables. The map can be an alternative to the cache set registers described herein or used in addition to such registers.
In some embodiments, as shown in at least
Also, in some embodiments, as shown in
Also, in such embodiments, the logic circuit 1006 can be configured to generate a set index (e.g., see set indexes 1504a, 1504b, 1504c, and 1504d) based on a memory address received from address bus 605b, from processor 1001 and an identification of speculative execution or non-speculative execution received from execution-type signal line 605d from the processor identifying execution type. And, the logic circuit 1006 can be configured to determine whether the set index matches with content stored in the first cache set register, the second cache set register, or the third cache set register.
Also, in such embodiments, the logic circuit 1006 can be configured to store the first cache set index in the second cache set register or another cache set register associated with another cache set in the second subset of the plurality of cache sets, so that the second cache set or the other cache set in the second subset is used for non-speculative execution, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted. For example, see
Specifically,
As shown by
In such embodiments, when the connection 604b to the address bus 605b receives a memory address from the processor 1001, the logic circuit 1006 can be configured to generate a set index from at least the memory address 102b according to this cache set index 112b of the address (e.g., see set index generations 1506a, 1506b, 1506c, and 1506d, which generate set indexes 1504a, 1504b, 1504c, and 1504d respectively). Also, when the connection 604b to the address bus 605b receives a memory address from the processor 1001, the logic circuit 1006 can be configured to determine whether the generated set index matches with content stored in one of the registers (which can be stored set index 1504a, 1504b, 1504c, or 1504d). Also, the logic circuit 1006 can be configured to implement a command received in the connection 604a to the command bus 605a via a cache set in response to the generated set index matching with the content stored in the corresponding register. Also, in response to a determination that a data set of the memory system associated with the memory address is not currently cached in the cache system, the logic circuit 1001 can be configured to allocate the cache set for caching the data set and store the generated set index in the corresponding register. The generated set index can include a predetermined segment of bits in the memory address as shown in
Also, in such embodiments, the logic circuit 1006 can be configured to generate a set index (e.g., see set indexes 1504a, 1504b, 1504c, and 1504d) based on a memory address (e.g., memory address 102b) received from address bus 605b, from processor 1001 and an identification of speculative execution or non-speculative execution received from execution-type signal line 605d from the processor identifying execution type. And, the logic circuit 1006 can be configured to determine whether the set index matches with content stored in the cache set register 1312b, the cache set register 1312c, or the cache set register 1312d.
In some embodiments, a cache system can include a plurality of cache sets, a connection to an execution-type signal line from a processor identifying an execution type, a connection to a signal line from the processor identifying a status of speculative execution, and a logic circuit. The logic circuit can be configured to: allocate a first subset of the plurality of cache sets for caching in caching operations when the execution type is a first type indicating non-speculative execution of instructions by the processor, and allocate a second subset of the plurality of cache sets for caching in caching operations when the execution type changes from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to reserve at least one cache set (or a third subset of the plurality of cache sets) when the execution type is the second type. The logic circuit can also be configured to reconfigure the second subset for caching in caching operations when the execution type is the first type, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted. And, the logic circuit can also be configured to allocate the at least one cache set (or the third subset of the plurality of cache sets) for caching in caching operations when the execution type changes from the first type to the second type, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted.
In such embodiments, the logic circuit can be configured to reserve the at least one cache set (or the third subset of the plurality of cache sets) when the execution type is the second type and the at least one cache set (or the third subset of the plurality of cache sets) includes a least used cache set in the plurality of cache sets.
Also, in such embodiments, the cache system can include one or more mapping tables mapping the plurality of cache sets. In such an example, the logic circuit is configured to allocate and reconfigure subsets of the plurality of cache sets according to the one or more mapping tables.
Also, in such embodiments, the cache system can include a plurality of cache set registers associated with the plurality of cache sets, respectively. In such an example, the logic circuit is configured to allocate and reconfigure subsets of the plurality of cache sets according to the plurality of cache set registers. In such an example, the first subset of the plurality of cache sets can include a first cache set, the second subset of the plurality of cache sets can include a second cache set, and the at least one cache set (or the third subset of the plurality of cache sets) can include a third cache set. Also, the plurality of cache set registers can include a first cache set register associated with the first cache set, configured to store a first cache set index initially so that the first cache set is used for non-speculative execution. The plurality of cache set registers can also include a second cache set register associated with the second cache set, configured to store a second cache set index initially so that the second cache set is used for speculative execution. The plurality of cache set registers can also include a third cache set register associated with the third cache set, configured to store a third cache set index initially so that the third cache set is used as a spare cache set.
In such embodiments, the logic circuit can be configured to generate a set index based on a memory address received from an address bus from a processor and identification of speculative execution or non-speculative execution received from an execution-type signal line from the processor identifying execution type. And, the logic circuit can be configured to determine whether the set index matches with content stored in the first cache set register, the second cache set register, or the third cache set register. When the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted, the logic circuit can also be configured to store the first cache set index in the second cache set register or another cache set register associated with another cache set in the second subset of the plurality of cache sets, so that the second cache set or the other cache set in the second subset is used for non-speculative execution. When the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted, the logic circuit can also be configured to store the second cache set index in the third cache set register or another cache set register associated with another cache set in the at least one cache set (or the third subset of the plurality of cache sets), so that the third cache set or the other cache set in the at least one cache set (or the third subset of the plurality of cache sets) is used for speculative execution. When the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted, the logic circuit can also be configured to store the third cache set index in the first cache set register or another cache set register associated with another cache set in the first subset of the plurality of cache sets, so that the first cache set or the other cache set in the first subset is used as a spare cache set.
In some embodiments, a cache system can include a plurality of cache sets having a first subset of cache sets, a second subset of cache sets, and a third subset of cache sets. The cache system can also include a connection to an execution-type signal line from a processor identifying an execution type, a connection to a signal line from the processor identifying a status of speculative execution, and a logic circuit. The logic circuit can be configured to allocate the first subset of the plurality of cache sets for caching in caching operations when the execution type is a first type indicating non-speculative execution of instructions by the processor and allocate the second subset of the plurality of cache sets for caching in caching operations when the execution type changes from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to reserve the third subset of the plurality of cache sets when the execution type is the second type. The logic circuit can also be configured to reconfigure the second subset for caching in caching operations when the execution type is the first type, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted. The logic circuit can also be configured to allocate the third subset for caching in caching operations when the execution type changes from the first type to the second type, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted.
In some embodiments, a cache system can include a plurality of caches including a first cache, a second cache, and a third cache. The cache system can also include a connection to an execution-type signal line from a processor identifying an execution type, a connection to a signal line from the processor identifying a status of speculative execution, and a logic circuit. The logic circuit can be configured to allocate the first cache for caching in caching operations when the execution type is a first type indicating non-speculative execution of instructions by the processor and allocate the second cache for caching in caching operations when the execution type changes from the first type to a second type indicating speculative execution of instructions by the processor. The logic circuit can also be configured to reserve the third cache when the execution type is the second type. The logic circuit can also be configured to reconfigure the second cache for caching in caching operations when the execution type is the first type, when the execution type changes from the second type to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted. And, the logic circuit can also be configured to allocate the third cache for caching in caching operations when the execution type changes from the first type to the second type.
Both examples in
In
Also, as shown in
Also, as shown in
In
With the memory address partitioning, in the examples, the extended tag from the memory address and the execution type (e.g., see extended tags 1650 and 1750) are compared with an extended tag for a cache set (e.g., see extended tags 1640a, 1640b, 1740a, and 1740b) for controlling cache operations implemented via the cache set. The tag compare circuits (e.g., tag compare circuits 1660a, 1660b, 1760a, and 1760b) can output a hit or miss depending on if the extended tags inputted into the compare circuits match or not. The extended tags for the cache sets (e.g., see extended tags 1640a, 1640b, 1740a, and 1740b) can be derived from an execution type (e.g., see the execution types 1632a, 1632b), 1732a, and 1732b) held in a register (e.g., see registers 1612a, 1612b, 1712a, and 1712b) and a block tag (e.g., see tags 1622a, 1622b, 1626a, 1626b, 1722a, 1722b, 1726a, and 1726b) from a first cache set (e.g., see cache sets 1610a, 1610b, 1710a, and 1710b). And, as shown in
In
In other words, since tags 1622a, 1622b, etc., have the same cache set indicator, the indicator could be stored once in a register for the cache set (e.g., see cache set registers 1712a and 1712b). This is one of the benefits of the arrangement depicted in
When the execution type is combined with the cache set index to form an extended cache set index, the extended cache set index can be used to select one of the cache sets. Then, the tag from the selected cache set is compared to the tag in the address to determine hit or miss. The two-stage selection can be similar to a conventional two-stage selection using a cache set index or can be used to be combined with the extended tag to support more efficient interchanging of cache sets for different execution types (such as speculative and non-speculative execution types).
In some embodiments, a cache system (such as the cache system 600 or 1000) can include a plurality of cache sets (such as cache sets 610a to 610c, 1010a to 1010c, 1310a to 1310d, 1610a to 1610b, or 1710a to 1710b). The plurality of cache sets can include a first cache set and a second cache set (e.g., see cache sets 1610a to 1610b and sets 1710a to 1710b). The cache system can also include a plurality of registers associated with the plurality of cache sets respectively (such as registers 612a to 612c, 1012a to 1012c, 1312a to 1312d, 1612a to 1612b, or 1712a to 1712b). The plurality of registers can include a first register associated with the first cache set and a second register associated with the second cache set (e.g., see registers 1612a to 1612b and registers 1712a to 1712b).
The cache system can also include a connection (e.g., see connection 604a) to a command bus (e.g., see command bus 605a) coupled between the cache system and a processor (e.g., see processors 601 and 1001). The cache system can also include a connection (e.g., see connection 604b) to an address bus (e.g., see address bus 605b) coupled between the cache system and the processor.
The cache system can also include a logic circuit (e.g., see logic circuits 606 and 1006) coupled to the processor to control the plurality of cache sets according to the plurality of registers. When the connection to the address bus receives a memory address (e.g., see memory addresses 102a to 102e shown in
The logic circuit (e.g., see logic circuits 606 and 1006) can also be configured to implement a command received in the connection (e.g., see connection 604a) to the command bus (e.g., see command bus 605a) via the first cache set (e.g., see cache sets 1610a and 1710a) in response to the generated extended tag (e.g., see extended tags 1650 and 1750) matching with the first extended tag (e.g., see extended tags 1640a and 1740a) and via the second cache set (e.g., see cache sets 1610b and 1710b) in response to the generated extended tag matching with the second extended tag (e.g., see extended tags 1640b and 1740b).
The logic circuit (e.g., see logic circuits 606 and 1006) can also be configured to generate the first extended tag (e.g., see extended tags 1640a and 1740a) from a cache address (e.g., see the blocks labeled Tag′ in extended tags 1640a and 1740a, as well as the tags 1622a, 1622b, 1722a, 1722b, etc.) of the first cache set (e.g., see cache sets 1610a and 1710a) and content (e.g., see the blocks labeled ‘Execution Type’ in extended tags 1640a and 1740a and the block labeled ‘Cache Set Index’ in extended tag 1740a, as well as execution type 1632a and cache set index 1732a) stored in the first register (e.g., see registers 1612a and 1712a). The logic circuit can also be configured to generate the second extended tag (e.g., see extended tags 1640b and 1740b) from a cache address (e.g., see the blocks labeled ‘Tag’ in extended tags 1640b and 1740b, as well as the tags 1626a, 1626b, 1726a, 1726b, etc.) of the second cache set (e.g., see cache sets 1610b and 1710b) and content (e.g., see the blocks labeled ‘Execution Type’ in extended tags 1640b and 1740b and the block labeled ‘Cache Set Index’ in extended tag 1740b, as well as execution type 1632b and cache set index 1732b) stored in the second register (e.g., see registers 1612b and 1712b).
In some embodiments, the cache system (such as the cache system 600 or 1000) can further include a connection (e.g., see connection 604d) to an execution-type signal line (e.g., see execution-type signal line 605d) from the processor (e.g., see processors 601 and 1001) identifying an execution type. In such embodiments, the logic circuit (e.g., see logic circuits 606 and 1006) can be configured to generate the extended tag (e.g., see extended tags 1650 and 1750) from the memory address (e.g., see memory addresses 102e and 102b shown in
In some embodiments, for the determination of whether the generated extended tag (e.g., see extended tags 1650 and 1750) matches with the first extended tag for the first cache set (e.g., see extended tags 1640a and 1740a) or the second extended tag for the second cache set (e.g., see extended tags 1640b and 1740b), the logic circuit (e.g., see logic circuits 606 and 1006) can be configured to compare the first extended tag (e.g., see extended tags 1640a and 1740a) with the generated extended tag (e.g., see extended tags 1650 and 1750) to determine a cache hit or miss for the first cache set (e.g., see cache sets 1610a and 1710a). Specifically, as shown in
Also, for the determination of whether the generated extended tag matches with the first extended tag for the first cache set or the second extended tag for the second cache set, the logic circuit can be configured to compare the second extended tag (e.g., see extended tags 1640b and 1740b) with the generated extended tag (e.g., see extended tags 1650 and 1750) to determine a cache hit or miss for the second cache set (e.g., see cache sets 1610b and 1710b). Specifically, as shown in
In some embodiments, the logic circuit (e.g., see logic circuits 606 and 1006) can be further configured to receive output from the first cache set (e.g., see cache sets 1610a and 1710a) when the logic circuit determines the generated extended tag (e.g., see extended tags 1640a and 1740a) matches with the first extended tag for the first cache set (e.g., see extended tags 1640a and 1740a). The logic circuit can also be further configured to receive output from the second cache set (e.g., see cache sets 1610b and 1710b) when the logic circuit determines the generated extended tag (e.g., see cache sets 1610a and 1710a) matches with the second extended tag for the second cache set (e.g., see extended tags 1640a and 1740a).
In some embodiments, the cache address of the first cache set includes a first tag (e.g., see tags 1622a, 1622b, 1722a, and 1722b) of a cache block (e.g., see cache block 1624a, 1624b, 1724a, and 1724b) in the first cache set (e.g., see cache sets 1610a and 1710a). In such embodiments, the cache address of the second cache set includes a second tag (e.g., see tags 1626a, 1626b, 1726a, and 1726b) of a cache block (e.g., see cache block 1628a, 1628b, 1728a, and 1728b) in the second cache set (e.g., see cache sets 1610b and 1710b). Also, in such embodiments, in general, the block index is used as an address within individual cache sets. For instance, in such embodiments, the logic circuit (e.g. see logic circuits 606 and 1006) can be configured to use a first block index from the memory address (e.g. see block indexes 106e and 106b from memory addresses 102e and 102b shown in
In some embodiments, such as the embodiments illustrated in
Also, in the embodiments shown in
With the embodiments shown in
In some embodiments, such as the embodiments illustrated in
Also, in the embodiments shown in
In some embodiments, as shown in
In some embodiments, such as embodiments as shown in
Somewhat similarly, in some embodiments, such as embodiments as shown in
In some embodiments, a cache system can include a plurality of cache sets, including a first cache set and a second cache set. The cache system can also include a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set. The cache system can further include a connection to a command bus coupled between the cache system and a processor, a connection to an address bus coupled between the cache system and the processor, and a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers. The logic circuit can be configured to generate the first extended tag from a cache address of the first cache set and content stored in the first register, and to generate the second extended tag from a cache address of the second cache set and content stored in the second register. The logic circuit can also be configured to determine whether the first extended tag for the first cache set or the second extended tag for the second cache set matches with a generated extended tag generated from a memory address received from the processor. And, the logic circuit can be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated extended tag matching with the first extended tag and via the second cache set in response to the generated extended tag matching with the second extended tag.
In such embodiments, cache system can also include a connection to an address bus coupled between the cache system and the processor. When the connection to the address bus receives the memory address from the processor, the logic circuit can be configured to generate the extended tag from at least the memory address. Also, the cache system can include a connection to an execution-type signal line from the processor identifying an execution type. In such examples, the logic circuit can be configured to generate the extended tag from the memory address and an execution type identified by the execution-type signal line. Also, the content stored in each of the first register and the second can include an execution type.
Further, for the determination of whether the generated extended tag matches with the first extended tag for the first cache set or the second extended tag for the second cache set, the logic circuit can be configured to: compare the first extended tag with the generated extended tag to determine a cache hit or miss for the first cache set; and compare the second extended tag with the generated extended tag to determine a cache hit or miss for the second cache set. Also, the logic circuit can be configured to: receive output from the first cache set when the logic circuit determines the generated extended tag matches with the first extended tag for the first cache set; and receive output from the second cache set when the logic circuit determines the generated extended tag matches with the second extended tag for the second cache set. In such embodiments and others, the cache address of the first cache set can include a first tag of a cache block in the first cache set, and the cache address of the second cache set can include a second tag of a cache block in the second cache set.
In some embodiments, a cache system can include a plurality of cache sets, including a first cache set and a second cache set. The cache system can also include a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set. And, the cache system can include a connection to a command bus coupled between the cache system and a processor, a connection to an execution-type signal line from a processor identifying an execution type, a connection to an address bus coupled between the cache system and the processor, and a logic circuit coupled to the processor to control the plurality of cache sets according to the plurality of registers. When the connection to the address bus receives a memory address from the processor, the logic circuit can be configured to: generate an extended tag from the memory address and an execution type identified by the execution-type signal line; and determine whether the generated extended tag matches with a first extended tag for the first cache set or a second extended tag for the second cache set. Also, the logic circuit can be configured to implement a command received in the connection to the command bus via the first cache set in response to the generated extended tag matching with the first extended tag and via the second cache set in response to the generated extended tag matching with the second extended tag.
As shown, the cache system can include a plurality of cache sets (e.g., see cache sets 1810a, 1810b, and 1810c). The plurality of cache sets includes a first cache set (e.g., see cache set 1810a) configured to provide a first physical output (e.g., see physical output 1820a) upon a cache hit and a second cache set (e.g., see cache set 1810b) configured to provide a second physical output (e.g., see physical output 1820b) upon a cache hit. The cache system can also include a connection (e.g., see connection 605a depicted in
Shown in
When the connection (e.g., see connection 605b) to the address bus (e.g., see address bus 605b) receives a memory address (e.g., see memory address 102b) from the processor (e.g., see processors 601 and 1001) and when the control register 1832 is in a first state (shown in
Also, when the connection (e.g., see connection 605b) to the address bus (e.g., see address bus 605b) receives a memory address (e.g., see memory address 102b) from the processor (e.g., see processors 601 and 1001) and when the control register 1832 is in a first state (shown in
When the connection (e.g., see connection 605b) to the address bus (e.g., see address bus 605b) receives a memory address (e.g., see memory address 102b) from the processor (e.g., see processors 601 and 1001) and when the control register 1832 is in a second state (not shown in
Also, when the connection (e.g., see connection 605b) to the address bus (e.g., see address bus 605b) receives a memory address (e.g., see memory address 102b) from the processor (e.g., see processors 601 and 1001) and when the control register 1832 is in the second state (not shown in
In some embodiments, the first logical cache is a normal cache for non-speculative execution by the processor, and the second logical cache is a shadow cache for speculative execution by the processor.
The mapping circuit 1830 solves the problem related to the execution type. Mapping circuit 1830 provides a solution to the how the execution type relates to mapping physical to logical cache sets. If the mapping circuit 1830 is used, a memory address (e.g., see address 102b) can be applied in each cache set (e.g., see cache sets 1810a, 1810b, and 1810c) to generate a physical output (e.g., see physical outputs 1820a, 1820b, and 1820c). The physical output (e.g., see physical outputs 1820a, 1820b, and 1820c) includes the tag and the cache block that are looked up using a block index of the memory address (e.g., see block index 106b). The mapping circuit 1830 can reroute the physical output (e.g., see physical outputs 1820a, 1820b, and 1820c) to one of the logical output (e.g., see logical outputs 1840a, 1840b, and 1840c). The cache system can do a tag compare at the physical output or at the logical output. If the tag compare is done at the physical output, the tag hit or miss of the physical output is routed through the mapping circuit 1830 to generate a hit or miss of the logical output. Otherwise, the tag itself is routed through the mapping circuit 1830; and a tag compare is performed at the logical output to generate the corresponding tag hit or miss result.
As illustrated in
In the embodiment shown in
In particular,
In some embodiments, each of the PLSM registers (e.g., see PLSM registers 1906a and 1906b as well as PLSM registers 2110a, 2110b, and 2110c depicted in
For the case of the PLSM registers 2006a, 2006b, and 2006c depicted in
In some embodiments, selections of physical outputs from cache sets or selections of cache hits or misses are by multiplexors that can be arranged in the system to have at least one multiplexor per type of output and per logic unit or per cache set (e.g., see multiplexors 1904a and 1904b shown in
As shown in
In some embodiments, the contents of the PLSM registers can be received from a control register such as control register 1832 shown in
As shown in
In some embodiments, the contents of the PLSM registers can be received from a control register such as control register 1832 shown in
In some embodiments, block selection can be based on a combination of a block index and a main or shadow setting. Such parameters can control the PLSM registers.
In some embodiments, such as the example shown in
Multiplexor 1904a is controlled by the PLSM register 1906a to provide hit or miss output of cache set 1810a and thus the hit or miss status of the cache set for the main or normal execution, when the cache sets are in a first state. Multiplexor 1904b is controlled by the PLSM register 1906b to provide hit or miss output of cache set 1810b and thus the hit or miss status of the cache set for the speculative execution, when the cache sets are in the first state. On the other hand, multiplexor 1904a is controlled by the PLSM register 1906a to provide hit or miss output of cache set 1810b and thus the hit or miss status of the cache set for the main or normal execution, when the cache sets are in a second state. Multiplexor 1904b is controlled by the PLSM register 1906b to provide hit or miss output of cache set 1810a and thus the hit or miss status of the cache set for the speculative execution, when the cache sets are in the second state.
Similar to the selection of hit or miss signals, the data looked up from the interchangeable caches can be selected to produce one result for the processor (such as if there is a hit), for example see physical outputs 1820a, 1820b, and 1820c shown in
For example, in a first state of the cache sets, when cache set 1810a is used as main cache set and cache set 1810b is used as shadow cache set, the multiplexor 2004a is controlled by the PLSM register 2006a to select the physical output 1820a of cache set 1810a for the main or normal logical cache used for non-speculative executions. Also, for example, in a second state of the cache sets, when cache set 1810b is used as main cache set and cache set 1810a is used as shadow cache set, then the multiplexor 2004a is controlled by the PLSM register 2006a to select the physical output 1820b of cache set 1810b for the main or normal logical cache used for non-speculative executions. In such examples, in the first state of the cache sets, when cache set 1810a is used as main cache set and cache set 1810b is used as shadow cache set, then the multiplexor 2004b is controlled by the PLSM register 2006b to select the physical output 1820b of cache set 1810b for the shadow logical cache used for speculative executions. Also, for example, in the second state of the cache sets, when cache set 1810a is used as main cache set and cache set 1810b is used as shadow cache set, then the multiplexor 2004b is controlled by the PLSM register 2006b to select the physical output 1820a of cache set 1810a for the shadow logical cache used for speculative executions.
In some embodiments, the cache system can further include a plurality of registers (e.g., see register 1812a as shown in
In some embodiments, the mapping circuit (e.g., see mapping circuit 1830) can be a part of or connected to the logic circuit and the state of the control register (e.g., see control register 1832) can control a state of a cache set of the plurality of cache sets. In some embodiments, the state of the control register can control the state of a cache set of the plurality of cache sets by changing a valid bit for each block of the cache set (e.g., see
Also, in some examples, the cache system can further include a connection (e.g., see connection 1002) to a speculation-status signal line (e.g., see speculation-status signal line 1004) from the processor identifying a status of a speculative execution of instructions by the processor. The connection to the speculation-status signal line can be configured to receive the status of a speculative execution, and the status of a speculative execution can indicate that a result of a speculative execution is to be accepted or rejected. When the execution type changes from the speculative execution to a non-speculative execution, the logic circuit (e.g., see logic circuits 606 and 1006) can be configured to change, via the control register (e.g., see control register 1832), the state of the first and second cache sets, if the status of speculative execution indicates that a result of speculative execution is to be accepted. And, when the execution type changes from the speculative execution to a non-speculative execution, the logic circuit can be configured to maintain, via the control register, the state of the first and second cache sets without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
In some embodiments, the mapping circuit (e.g., see mapping circuit 1830) is part of or connected to the logic circuit (e.g., see logic circuits 606 and 1006) and the state of the control register (e.g., see control register 1832) can control a state of a cache register of the plurality of cache registers (e.g., see register 1812a as shown in
The parts depicted in
As shown in
As shown in
In some embodiments, as shown in
Also, as shown in
Also, as shown in
Also, as shown in
Also, as shown in
Also, as shown in
In some embodiments, block selection can be based on a combination of a block index and a main or shadow setting.
In some embodiments, only one address (e.g., tag and index) are fed into the interchangeable cache sets (e.g., cache sets 1810a, 1810b and 1810c). In such embodiments, there is a signal controlling which cache set is updated according to memory if that cache set produces a miss. Similar to the selection of hit or miss signals, the data looked up from the interchangeable caches can be selected to produce one result for the processor (such as if there is a hit). For example, in a first state of the cache sets, if cache set 1810a is used as main cache set and cache set 1810b is used as shadow cache set, then the multiplexor 2110a is controlled by the PLSM register 2108a to select the hit or miss output of cache set 1804a and hit or miss status of the main cache set. And, multiplexor 2110b is controlled by the PLSM register 2108b to provide hit or miss output of cache set 1810b and thus the hit or miss status of the shadow cache set.
In such embodiments, when the cache sets are in a second state, when cache set 1810a is used as shadow cache and cache set 1810b is used as main cache, the multiplexor 2110a can be controlled by the PLSM register 2108b to select the hit or miss output of cache set 1810b and hit or miss status of the main cache. And, multiplexor 2110b can be controlled by the PLSM register 2108b to provide hit or miss output of cache set 1810a and thus the hit or miss status of the shadow cache.
Thus, multiplexor 2110a can output whether the main cache has hit or miss in the cache for the address; and the multiplexor 2110b can output whether a shadow cache has hit or miss in the cache for the same address. Then, depending on whether or not the address is speculative, the one of the output can be selected. When there is a cache miss, the address is used in the memory to load data to a corresponding cache. The PLSM registers can similarly enable the update of the corresponding cache set 1810a or set 1810b.
In some embodiments, in the first state of the cache sets, during speculative execution of a first instruction by the speculative thread, effects of the speculative execution are stored within the second cache set (e.g., cache set 1810b). During the speculative execution of the first instruction, the processor can be configured to assert a signal indicative of the speculative execution which is configured to block changes to the first cache set (e.g., cache set 1810a). When the signal is asserted by the processor, the processor can be further configured to block the second cache set (e.g., cache set 1810b) from updating the memory.
When the state of the cache sets changes to the second state, in response to a determination that execution of the first instruction is to be performed with the main thread, the second cache set (instead of the first cache set) is used with the first instruction. In response to a determination that execution of the first instruction is not to be performed with the main thread, the first cache set is used with the first instruction.
In some embodiments, in the first state, during the speculative execution of first instruction, the processor accesses the memory via the second cache set (e.g., cache set 1810b). And, during the speculative execution of one or more instructions, access to content in the second cache is limited to the speculative execution of the first instruction by the processor. During the speculative execution of the first instruction, the processor can be prohibited from changing the first cache set (e.g., cache set 1810a).
In some embodiments, the content of the first cache set (e.g., cache set 1810a) and/or the second cache set (e.g., cache set 1810b) can be accessible via a cache coherency protocol.
Method 2200 includes, at block 2202, executing, by a processor (e.g. processor 1001), a main thread and a speculative thread. The method 2200, at block 2204, includes providing, in a first cache set of a cache system coupled in between a memory system and the processor (e.g., cache set 1810a as shown in
At block 2207, the method 2200 continues with identifying, such as by the processor, whether a speculation of the speculative thread is successful so that the first plurality of blocks becomes accessible for the speculative thread and blocked for the main thread and so that the second plurality of blocks becomes accessible for the main thread and blocked for the speculative thread. As shown in
At block 2208, the method 200 continues with changing, by the processor solely or in combination with a cache controller, each first valid bit from indicating valid to invalid when a speculation of the speculative thread is successful so that the first plurality of blocks becomes accessible for the speculative thread and blocked for the main thread. Also, at block 2210, the method 200 continues with changing, by the processor solely or in combination with the cache controller, each second valid bit from indicating invalid to valid when a speculation of the speculative thread is successful so that the second plurality of blocks becomes accessible for the main thread and blocked for the speculative thread. Thus, the state of the cache sets does change from the first state to the second state.
In some embodiments, during speculative execution of a first instruction by the speculative thread, effects of the speculative execution are stored within the second cache set. In such embodiments, during the speculative execution of the first instruction, the processor can assert a signal indicative of the speculative execution which can block changes to the first cache. Also, when the signal is asserted by the processor, the processor can block the second cache from updating the memory. This occurs while the cache sets are in the first state.
Also, in such embodiments, in response to a determination that execution of the first instruction is to be performed with the main thread, the second cache set (instead of the first cache set) is used with the first instruction. In response to a determination that execution of the first instruction is not to be performed with the main thread, the first cache is used with the first instruction. This occurs while the cache sets are in the second state.
In some embodiments, during the speculative execution of first instruction, the processor accesses the memory via the second cache. And, during the speculative execution of one or more instructions, access to content in the second cache is limited to the speculative execution of the first instruction by the processor. In such embodiments, during the speculative execution of the first instruction, the processor is prohibited from changing the first cache.
In some embodiments, content of the first cache is accessible via a cache coherency protocol.
In
Method 2300, at block 2302, includes receiving, by a first physical-to-logical-mapping-set-mapping (PLSM) register (e.g., PLSM register 2108a shown in
At block 2306, the method 2300 includes determining, by a first logic unit (e.g., logic unit 2104a depicted in
At block 2310, the method 2300 continues with outputting to the processor, by a first multiplexor (e.g., multiplexor 2110a depicted in
And, at block 2312, outputting to the processor, by a second multiplexor (e.g., multiplexor 2110b), the second hit-or-miss result or the first hit-or-miss result according to the second valid bit received by the second PLSM register. In some embodiments, when the second valid bit received by the second PLSM register indicates valid, the second multiplexor outputs the second hit-or-miss result. And, when the second valid bit received by the second PLSM register indicates invalid, the second multiplexor outputs the first hit-or-miss result.
Some embodiments can include a central processing unit having processing circuitry configured to execute a main thread and a speculative thread. The central processing unit can also include or be connected to a first cache set of a cache system configured to couple in between a main memory and the processing circuitry, having a first plurality of blocks for the main thread. Each block of the first plurality of blocks can include cached data, a first valid bit, and a block address including an index and a tag. The processing circuitry, solely or in combination with a cache controller, can be configured to change each first valid bit from indicating valid to invalid when a speculation of the speculative thread is successful, so that the first plurality of blocks becomes accessible for the speculative thread and blocked for the main thread. The central processing unit can also include or be connected to a second cache set of the cache system coupled in between the main memory and the processing circuitry, including a second plurality of blocks for the speculative thread. Each block of the second plurality of blocks can include cached data, a second valid bit, and a block address having an index and a tag. The processing circuitry, solely or in combination with the cache controller, can be configured to change each second valid bit from indicating invalid to valid when a speculation of the speculative thread is successful, so that the second plurality of blocks becomes accessible for the main thread and blocked for the speculative thread. And, a block of the first plurality of blocks corresponds to a respective block of the second plurality blocks by having a same block address as the respective block of the second plurality of blocks.
The techniques disclosed herein can be applied to at least to computer systems where processors are separated from memory and processors communicate with memory and storage devices via communication buses and/or computer networks. Further, the techniques disclosed herein can be applied to computer systems in which processing capabilities are integrated within memory/storage. For example, the processing circuits, including executing units and/or registers of a typical processor, can be implemented within the integrated circuits and/or the integrated circuit packages of memory media to performing processing within a memory device. Thus, a processor (e.g., see processor 201, 401, 601, and 1001) as discussed above and illustrated in the drawings is not necessarily a central processing unit in the von Neumann architecture. The processor can be a unit integrated within memory to overcome the von Neumann bottleneck that limits computing performance as a result of a limit in throughput caused by latency in data moves between a central processing unit and memory configured separately according to the von Neumann architecture.
The description and drawings of the present disclosure are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding. However, in certain instances, well known or conventional details are not described in order to avoid obscuring the description. References to one or an embodiment in the present disclosure are not necessarily references to the same embodiment; and, such references mean at least one.
In the foregoing specification, the disclosure has been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications can be made thereto without departing from the broader spirit and scope as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
Claims
1. A cache system, comprising:
- a first cache;
- a second cache;
- a connection to a command bus coupled between the cache system and a processor;
- a connection to an address bus coupled between the cache system and the processor;
- a connection to a data bus coupled between the cache system and the processor;
- a connection to an execution-type signal line from the processor identifying an execution type; and
- a logic circuit coupled to control the first cache and the second cache according to the execution type;
- wherein the cache system is configured to be coupled between the processor and a memory system; and
- wherein when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache.
2. The cache system of claim 1, wherein the logic circuit is configured to copy the portion of content cached in the first cache to the second cache independent of a current command received in the command bus.
3. The cache system of claim 1, wherein when the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to service subsequent commands from the command bus using the second cache in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor.
4. The cache system of claim 3, wherein the logic circuit is configured to complete synchronization of the portion of the content from the first cache to the second cache before servicing the subsequent commands after the execution type is changed from the first type to the second type.
5. The cache system of claim 3, wherein the logic circuit is configured to continue synchronization of the portion of the content from the first cache to the second cache while servicing the subsequent commands.
6. The cache system of claim 3, further comprising: a configurable data bit, wherein the logic circuit is further coupled to control the first cache and the second cache according to the configurable data bit.
7. The cache system of claim 6,
- wherein when the configurable data bit is in a first state, the logic circuit is configured to: implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is a second type; and
- wherein when the configurable data bit is in a second state, the logic circuit is configured to: implement commands received from the command bus for accessing the memory system via the second cache, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via the first cache, when the execution type is the second type.
8. The cache system of claim 7, further comprising:
- a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor;
- wherein the connection to the speculation-status signal line is configured to receive the status of a speculative execution,
- wherein the status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected, and
- wherein when the execution type changes from the second type to the first type, the logic circuit is configured to: toggle the configurable data bit, if the status of speculative execution indicates that a result of speculative execution is to be accepted; and maintain the configurable data bit without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
9. The cache system of claim 1, wherein the first cache and the second cache together comprise:
- a plurality of cache sets, including a first cache set and a second cache set; and
- a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set; and
- wherein the logic circuit is further coupled to control the plurality of cache sets according to the plurality of registers.
10. A cache system, comprising:
- a plurality of cache sets, including a first cache set and a second cache set;
- a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set;
- a connection to a command bus coupled between the cache system and a processor;
- a connection to an address bus coupled between the cache system and the processor;
- a connection to a data bus coupled between the cache system and the processor;
- a connection to an execution-type signal line from the processor identifying an execution type; and
- a logic circuit coupled to control the plurality of cache sets according to the execution type;
- wherein the cache system is configured to be coupled between the processor and a memory system; and
- wherein when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to copy a portion of content cached in the first cache set to the second cache set.
11. The cache system of claim 10, wherein the logic circuit is configured to copy the portion of content cached in the first cache set to the second cache set independent of a current command received in the command bus.
12. The cache system of claim 10, wherein when the execution type is the first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to service subsequent commands from the command bus using the second cache set in response to the execution type being changed from the first type to a second type indicating speculative execution of instructions by the processor.
13. The cache system of claim 12, wherein the logic circuit is configured to complete synchronization of the portion of the content from the first cache set to the second cache set before servicing the subsequent commands after the execution type is changed from the first type to the second type.
14. The cache system of claim 12, wherein the logic circuit is configured to continue synchronization of the portion of the content from the first cache set to the second cache set while servicing the subsequent commands.
15. The cache system of claim 10,
- wherein the logic circuit is further coupled to control the plurality of cache sets according to the plurality of registers;
- wherein when the connection to the address bus receives a memory address from the processor, the logic circuit is configured to: generate a set index from at least the memory address; and determine whether the generated set index matches with content stored in the first register or with content stored in the second register; and
- wherein the logic circuit is configured to implement a command received in the connection to the command bus via the first cache set in response to the generated set index matching with the content stored in the first register and via the second cache set in response to the generated set index matching with the content stored in the second register.
16. The cache system of claim 15, wherein, in response to a determination that a data set of the memory system associated with the memory address is not currently cached in the cache system, the logic circuit is configured to allocate the first cache set for caching the data set and store the generated set index in the first register.
17. The cache system of claim 16, further comprising:
- a connection to an execution-type signal line from the processor identifying an execution type;
- wherein when the first and second registers are in a first state, the logic circuit is configured to: implement commands received from the command bus for accessing the memory system via the first cache set, when the execution type is a first type; and implement commands received from the command bus for accessing the memory system via the second cache set, when the execution type is a second type; and
- wherein when the first and second registers are in a second state, the logic circuit is configured to: implement commands received from the command bus for accessing the memory system via another cache set of the plurality of cache sets besides the first cache set, when the execution type is the first type; and implement commands received from the command bus for accessing the memory system via another other cache set of the plurality of cache sets besides the second cache set, when the execution type is the second type.
18. The cache system of claim 17, wherein the generated set index is generated further based on a type identified by the execution-type signal line.
19. The cache system of claim 18, further comprising:
- a connection to a speculation-status signal line from the processor identifying a status of a speculative execution of instructions by the processor;
- wherein the connection to the speculation-status signal line is configured to receive the status of a speculative execution;
- wherein the status of a speculative execution indicates that a result of a speculative execution is to be accepted or rejected; and
- wherein when the execution type changes from the second type to the first type, the logic circuit is configured to: change the content stored in the first register and the content stored in the second register, if the status of speculative execution indicates that a result of speculative execution is to be accepted; and maintain the content stored in the first register and the content stored in the second register without changes, if the status of speculative execution indicates that a result of speculative execution is to be rejected.
20. A cache system, comprising:
- a plurality of caches;
- a plurality of cache sets divided amongst the plurality of caches, including a first cache set and a second cache set;
- a plurality of registers associated with the plurality of cache sets respectively, including a first register associated with the first cache set and a second register associated with the second cache set;
- a connection to a command bus coupled between the cache system and a processor;
- a connection to an address bus coupled between the cache system and the processor;
- a connection to a data bus coupled between the cache system and the processor;
- a connection to an execution-type signal line from the processor identifying an execution type; and
- a logic circuit coupled to control the plurality of cache sets according to the execution type;
- wherein the cache system is configured to be coupled between the processor and a memory system; and
- wherein when the execution type is a first type indicating non-speculative execution of instructions by the processor and the first cache set is configured to service commands from the command bus for accessing the memory system, the logic circuit is configured to copy a portion of content cached in the first cache set to the second cache set.
Type: Application
Filed: Jul 31, 2019
Publication Date: Feb 4, 2021
Inventor: Steven Jeffrey Wallach (Dallas, TX)
Application Number: 16/528,479