Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
Type:
Grant
Filed:
June 8, 2022
Date of Patent:
June 18, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Michael N. Michael, Vihar Soneji
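The parallel access pattern in this abstract can be illustrated with a short model. The following C++ sketch is a minimal, hypothetical rendering; the RAM sizes, the placeholder translation, and all identifiers are assumptions, not the patented design.

```cpp
// Minimal sketch of the three parallel sub-pipeline accesses: TLB
// translation, tag RAM read, and data RAM read all start in the same
// cycle using the predicted set index and predicted way number.
#include <array>
#include <cstdint>
#include <cstring>

constexpr unsigned kSets = 64, kWays = 4, kBlockBytes = 32;

// Simulated stand-ins for the tag RAM and data RAM.
uint64_t tagRam[kSets][kWays];
uint8_t  dataRam[kSets][kWays][kBlockBytes];

struct FetchResult {
    uint64_t tag;                            // tag bits of the fetch physical address
    std::array<uint64_t, kWays> setTags;     // tags read from the selected set
    std::array<uint8_t, kBlockBytes> block;  // predicted instruction block
};

// Placeholder translation; a real TLB lookup is assumed, not implemented.
uint64_t tlbTranslate(uint64_t fetchVA) { return fetchVA; }

// The three accesses are mutually independent, so hardware can initiate
// all of them in the same cycle; a later tag compare validates the
// predictions.
FetchResult fetchParallel(uint64_t fetchVA, unsigned predSet, unsigned predWay) {
    FetchResult r{};
    r.tag = tlbTranslate(fetchVA) >> 12;                  // sub-pipeline 1: TLB
    for (unsigned w = 0; w < kWays; ++w)
        r.setTags[w] = tagRam[predSet][w];                // sub-pipeline 2: tag RAM
    std::memcpy(r.block.data(), dataRam[predSet][predWay],
                kBlockBytes);                             // sub-pipeline 3: data RAM
    return r;
}
```

A later pipeline stage would compare r.tag against r.setTags[predWay] to confirm the predicted way; the patent's actual check and recovery logic is not shown.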
Abstract: A dynamically-foldable instruction fetch pipeline receives a first fetch request that includes a fetch virtual address and includes first, second, and third sub-pipelines that respectively include a translation lookaside buffer (TLB) that translates the fetch virtual address into a fetch physical address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a set index that selects a set of tag RAM tags for comparison with a tag portion of the fetch physical address to determine the correct way of the instruction cache, and a data RAM of the instruction cache that receives the set index and a way number that together specify the data RAM entry from which to fetch an instruction block. When a control signal indicates a folded mode, the sub-pipelines operate in parallel. When the control signal indicates an unfolded mode, the sub-pipelines operate sequentially.
Type:
Grant
Filed:
June 8, 2022
Date of Patent:
June 18, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Michael N. Michael, Vihar Soneji
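A hedged sketch of the folded/unfolded control flow this abstract describes; the Mode enum and the stub access functions are illustrative assumptions, not the patented control logic.

```cpp
// In folded mode the three sub-pipelines start together, trusting the
// predicted set index and way; in unfolded mode each sub-pipeline
// consumes the previous one's true result.
#include <cstdint>

enum class Mode { Folded, Unfolded };

uint64_t tlbAccess(uint64_t fetchVA)   { return fetchVA; }       // stub translation
void     tagRamRead(unsigned set)      { (void)set; }            // stub tag read
unsigned tagCompare(uint64_t physAddr) { return physAddr & 3; }  // stub way select
void     dataRamAccess(unsigned set, unsigned way) { (void)set; (void)way; }
unsigned setIndexOf(uint64_t physAddr) { return (physAddr >> 6) & 63; }

void fetch(Mode mode, uint64_t fetchVA, unsigned predSet, unsigned predWay) {
    if (mode == Mode::Folded) {
        // Parallel: all three accesses launch in the same cycle.
        tlbAccess(fetchVA);
        tagRamRead(predSet);
        dataRamAccess(predSet, predWay);
    } else {
        // Sequential: translate, then find the way, then read the data.
        uint64_t physAddr = tlbAccess(fetchVA);      // stage 1: TLB
        unsigned way = tagCompare(physAddr);         // stage 2: tag compare
        dataRamAccess(setIndexOf(physAddr), way);    // stage 3: data RAM
    }
}
```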
Abstract: A microprocessor includes a branch target buffer (BTB). Each BTB entry holds a tag based on at least a portion of a virtual address of a block of instructions previously fetched from a physically-indexed physically-tagged set associative instruction cache using a physical address that is a translation of the virtual address, a translated address bit portion of the set index of the instruction cache entry from which the instruction block was previously fetched, and the way number of the instruction cache entry from which the instruction block was previously fetched. In response to a BTB hit based on a fetch virtual address, the BTB provides a translated address bit portion of a predicted set index, which is the translated address bit portion of the set index from the hit BTB entry, and a predicted way number, which is the way number from the hit BTB entry.
Type:
Grant
Filed:
June 8, 2022
Date of Patent:
June 11, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Michael N. Michael, Vihar Soneji
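The entry layout implied by this abstract might look like the following sketch; the field widths, names, and lookup interface are assumptions.

```cpp
// Illustrative BTB entry: alongside the tag, the entry caches the
// translated bits of the set index and the way of the i-cache entry the
// block was fetched from last time, enabling set/way prediction.
#include <cstdint>

struct BtbEntry {
    uint32_t tag;         // derived from the fetch virtual address
    uint8_t  setIdxXlat;  // translated-address-bit portion of the set index
    uint8_t  wayNum;      // way of the previously used i-cache entry
    bool     valid;
};

// On a hit, the cached bits become the predicted set index portion and
// predicted way, so the data RAM can be read before translation completes.
bool btbLookup(const BtbEntry& e, uint32_t vaTag,
               uint8_t* predSetIdxXlat, uint8_t* predWay) {
    if (!e.valid || e.tag != vaTag) return false;
    *predSetIdxXlat = e.setIdxXlat;
    *predWay = e.wayNum;
    return true;
}
```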
Abstract: A microprocessor includes a decode unit that maps architectural instructions into micro-operations and dispatches them to a scheduler that issues them to execution units, which execute them by reading source operands from a register file and writing execution results to the register file. An architectural instruction instructs the microprocessor to load a constant into an architectural destination register. The decode unit maps the architectural instruction into a load constant micro-operation (LCM) and writes the LCM constant directly to a register of the register file without dispatching the LCM to the scheduler, such that the LCM is never issued to the execution units. In the same clock cycle, the decode unit indicates that the LCM constant is available for consumption, such that the LCM imposes zero execution latency on dependent micro-operations; the decode unit dispatches to the scheduler only micro-operations other than the LCM. The register file may include a write port dedicated to the decode unit.
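A minimal sketch of the decode-stage behavior described above, assuming a simple ready-bit scoreboard; the dedicated write port is modeled as a plain array write, and all identifiers are illustrative.

```cpp
#include <array>
#include <cstdint>

constexpr int kPhysRegs = 128;
std::array<uint64_t, kPhysRegs> regFile;  // physical register file
std::array<bool, kPhysRegs> ready;        // operand-availability scoreboard

// Hypothetical decode-stage handling of a load-constant instruction: the
// constant goes straight to the register file (via an assumed
// decode-dedicated write port) and is marked ready in the same cycle, so
// no micro-op for it is dispatched to the scheduler or execution units.
void decodeLoadConstant(unsigned destPhysReg, uint64_t constant) {
    regFile[destPhysReg] = constant;
    ready[destPhysReg] = true;  // dependents may consume with zero latency
}
```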
Abstract: A microprocessor includes a physically-indexed physically-tagged second-level set-associative cache. A set index and a way uniquely identify each entry. A load/store unit, during store/load instruction execution: detects that first and second portions of the store/load data are to be written/read to/from different first and second lines of memory specified by first and second store physical memory line addresses, and writes to a store/load queue entry first and second store physical address proxies (PAPs) for the first and second store physical memory line addresses (and, in the store execution case, all the store data). The first and second store PAPs comprise respective set indexes and ways that uniquely identify the respective entries of the second-level cache that hold respective copies of the respective first and second lines of memory. The entries of the store queue are absent storage for holding the first and second store physical memory line addresses.
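A sketch of the line-crossing store queue entry this abstract describes; the PAP field widths and entry layout are illustrative assumptions.

```cpp
// Instead of two full physical line addresses, the entry holds two
// compact PAPs, each naming the L2 entry (set index + way) that holds a
// copy of the corresponding line of memory.
#include <cstdint>

struct Pap {            // physical address proxy: uniquely names an L2 entry
    uint16_t setIndex;
    uint8_t  way;
};

struct StoreQueueEntry {
    Pap      firstPap;   // proxy for the first line's physical address
    Pap      secondPap;  // proxy for the second line's physical address
    uint64_t data;       // store data (store-execution case)
    uint8_t  firstBytes; // how many bytes land in the first line
    // Note: no storage for the full physical memory line addresses.
};
```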
Abstract: A method and system for mitigating against side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises conditioning store-to-load forwarding on the memory dependence predictor (MDP) being trained for that load instruction. Training involves identifying situations in which store-to-load forwarding could have been performed but was not, and, conversely, situations in which store-to-load forwarding was performed but resulted in an error.
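The conditioning and training described here might be modeled as follows; the confidence-counter policy is an assumption, not the patented training rule.

```cpp
// Forward speculatively only when the MDP has been trained for this
// load; the two training events mirror the cases the abstract names.
#include <cstdint>
#include <unordered_map>

std::unordered_map<uint64_t, int> mdp;  // load PC -> confidence counter

bool mayForward(uint64_t loadPC) {
    auto it = mdp.find(loadPC);
    return it != mdp.end() && it->second > 0;  // trained: allow forwarding
}

void trainMissedForward(uint64_t loadPC) { ++mdp[loadPC]; }   // could have forwarded, didn't
void trainBadForward(uint64_t loadPC)    { mdp[loadPC] = 0; } // forwarded, but in error
```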
Abstract: A method and system for mitigating against side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises ensuring that the physical load and store addresses match and/or that permissions are present before store-to-load forwarding is performed speculatively. Various improvements maintain a short load-store pipeline, including use of a virtual level-one data cache (DL1), use of an inclusive physical level-two data cache (DL2), storage and lookup of physical data address equivalents in the DL1, and use of a memory dependence predictor (MDP) to speed up or replace store queue CAM lookups of load data addresses against store data addresses.
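A minimal sketch of the forwarding condition described above; the MemOp structure and its fields are illustrative assumptions.

```cpp
// The load may consume the store's data speculatively only if the
// physical line addresses (or their stored equivalents) match and the
// load's permission checks pass.
#include <cstdint>

struct MemOp {
    uint64_t physLineAddr;  // physical equivalent stored/looked up in the DL1
    bool     permOk;        // permission bits already validated
};

bool canForward(const MemOp& store, const MemOp& load) {
    return load.permOk && store.physLineAddr == load.physLineAddr;
}
```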
Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
Type:
Grant
Filed:
June 8, 2022
Date of Patent:
May 7, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Michael N. Michael, Vihar Soneji
Abstract: A method and system for mitigating against side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises conditioning store-to-load forwarding on the memory dependence predictor (MDP) being trained for that load instruction. Training involves identifying situations in which store-to-load forwarding could have been performed but was not, and, conversely, situations in which store-to-load forwarding was performed but resulted in an error.
Abstract: An out-of-order and speculative execution microprocessor that mitigates side channel attacks includes a cache memory and fill request generation logic that generates a request to fill the cache memory with a cache line implicated by a memory address that misses in the cache memory. At least one execution pipeline receives first and second load operations, detects a condition in which the first load generates a need for an architectural exception, the second load misses in the cache memory, and the second load is newer in program order than the first load, and prevents state of the cache memory from being affected by the miss of the second load by inhibiting the fill request generation logic from generating a fill request for the second load or by canceling the fill request for the second load if the fill request generation logic has already generated the fill request for the second load.
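The inhibit/cancel rule might be modeled as in this sketch; the LoadOp fields and sequence-number ordering are assumptions.

```cpp
// A younger load's cache miss must not change cache state when an older
// load raises an architectural exception: the fill request for the
// younger load is inhibited, or canceled if already generated.
#include <cstdint>

struct LoadOp {
    uint64_t seqNum;  // program order
    bool     raisesException;
    bool     missedCache;
    bool     fillRequested;
};

// Returns true if the fill for `younger` must be inhibited or canceled.
bool suppressFill(const LoadOp& older, LoadOp& younger) {
    if (older.raisesException && younger.missedCache &&
        younger.seqNum > older.seqNum) {
        younger.fillRequested = false;  // inhibit, or cancel a pending fill
        return true;
    }
    return false;
}
```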
Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
Type:
Grant
Filed:
June 8, 2022
Date of Patent:
January 23, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Michael N. Michael, Vihar Soneji
Abstract: A microprocessor that mitigates side channel attacks includes a front end that processes instructions in program order and a back end that performs speculative execution of instructions out of program order in a superscalar fashion. Producing execution units produce architectural register results during execution of instructions. Consuming execution units consume the produced architectural register results during execution of instructions. The producing and consuming execution units may be the same or different execution units. Control logic detects that, during execution by a producing execution unit, an architectural register result producing instruction causes a need for an architectural exception and consequent flush of all instructions younger in program order than the producing instruction and prevents all instructions within the back end that are dependent upon the producing instruction from consuming the architectural register result produced by the producing instruction.
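A tiny sketch of the guarding idea, assuming a poison bit per physical register; the patent's actual mechanism for blocking consumers is not detailed in the abstract.

```cpp
// A producer that raises an architectural exception marks its
// destination so in-flight dependents cannot consume the value before
// the flush of younger instructions completes.
#include <cstdint>

struct PhysReg { uint64_t value; bool poisoned; };

void onProducerException(PhysReg& dest) { dest.poisoned = true; }  // block consumers

bool mayConsume(const PhysReg& src) { return !src.poisoned; }      // checked at issue
```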
Abstract: A microprocessor includes a virtually-indexed L1 data cache that has an allocation policy that permits multiple synonyms to be co-resident. Each L2 entry is uniquely identified by a set index and a way number. A store unit, during execution of a store instruction, receives a store physical address proxy (PAP) for a store physical memory line address (PMLA) from an L1 entry hit upon by the store virtual address, and writes the store PAP to a store queue entry. The store PAP comprises the set index and the way number of the L2 entry that holds the line specified by the store PMLA. The store unit, during commit of the store, reads the store PAP from the store queue, looks up the store PAP in the L1 to detect synonyms, writes the store data to one or more of the detected synonyms, and evicts the non-written detected synonyms.
Type:
Grant
Filed:
May 24, 2022
Date of Patent:
January 9, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Srivatsan Srinivasan, Robert Haskell Utley
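Commit-time synonym handling as described might look like this sketch; the write-one-evict-rest policy shown is one reading of "writes the store data to one or more of the detected synonyms," and all structures are illustrative.

```cpp
// Look up the committing store's PAP across the L1, write the first
// detected synonym, and evict the other detected synonyms.
#include <cstdint>
#include <vector>

struct Pap { uint16_t l2Set; uint8_t l2Way; };  // uniquely names an L2 entry

struct L1Entry { bool valid; Pap pap; };        // data payload omitted

void commitStore(std::vector<L1Entry>& l1, Pap storePap) {
    bool written = false;
    for (auto& e : l1) {
        if (!e.valid || e.pap.l2Set != storePap.l2Set ||
            e.pap.l2Way != storePap.l2Way)
            continue;                 // not a synonym of the store's line
        if (!written) written = true; // write the store data to this entry
        else e.valid = false;         // evict a non-written synonym
    }
}
```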
Abstract: Each PIPT L2 cache entry is uniquely identified by a set index and a way and holds a generational identifier (GENID). The L2 detects a miss of a physical memory line address (PMLA). An L2 set index is obtained from the PMLA. The L2 picks a way for replacement, increments the GENID held in the entry in the picked way of the selected set, and forms a physical address proxy (PAP) for the PMLA with the obtained set index and the picked way. The PAP uniquely identifies the picked L2 entry. The L2 forms a generational PAP (GPAP) for the PMLA with the PAP and the incremented GENID. A load/store unit makes available the GPAP as a proxy of the PMLA for comparisons with GPAPs of other PMLAs, rather than making comparisons of the PMLA itself with the other PMLAs, to determine whether the PMLA matches the other PMLAs.
Type:
Grant
Filed:
May 18, 2022
Date of Patent:
January 2, 2024
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Srivatsan Srinivasan, Robert Haskell Utley
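GPAP formation on an L2 miss can be sketched as follows; the cache sizes and the victim-selection stub are assumptions.

```cpp
// The PAP is the picked entry's set index and way; a per-entry
// generational identifier (GENID) is incremented on replacement and
// concatenated with the PAP to form the GPAP.
#include <cstdint>

constexpr unsigned kL2Sets = 1024, kL2Ways = 8;
uint8_t genId[kL2Sets][kL2Ways];  // per-entry generational identifier

struct Gpap { uint16_t set; uint8_t way; uint8_t gen; };

uint8_t pickVictimWay(uint16_t set) { return set % kL2Ways; }  // stub policy

Gpap allocateOnMiss(uint64_t pmla) {
    uint16_t set = static_cast<uint16_t>(pmla % kL2Sets);  // set index from PMLA
    uint8_t  way = pickVictimWay(set);                     // way picked for replacement
    uint8_t  gen = ++genId[set][way];                      // new generation for new line
    return Gpap{set, way, gen};  // {set, way} is the PAP; gen disambiguates reuse
}
```

The load/store unit then compares GPAPs instead of full PMLAs; the generation field keeps a reused L2 entry from matching stale proxies of its previous occupant.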
Abstract: A microprocessor for mitigating side channel attacks includes a memory subsystem that receives a load operation that specifies a load address. The memory subsystem includes a virtually-indexed, virtually-tagged data cache memory (VIVTDCM) comprising entries that hold translation information. The memory subsystem also includes a data translation lookaside buffer (DTLB) comprising entries that hold physical address translations and translation information. The processor performs speculative execution of instructions and executes instructions out of program order. The memory subsystem allows non-inclusion with respect to translation information between the VIVTDCM and the DTLB such that, at some instances in time, translation information associated with the load address is present in the VIVTDCM but absent from the DTLB.
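One way to see how non-inclusion arises is the following sketch: a VIVTDCM hit supplies translation information without any DTLB access, so the DTLB need not concurrently hold a matching entry. Structures are illustrative.

```cpp
#include <cstdint>

struct XlatInfo { bool readable, writable, user; };  // illustrative bits
struct VivtEntry { bool valid; uint64_t vtag; XlatInfo xi; };

// Returns true on a hit; the DTLB is consulted only on a miss.
bool loadLookup(const VivtEntry& e, uint64_t vtag, XlatInfo* out) {
    if (e.valid && e.vtag == vtag) { *out = e.xi; return true; }
    return false;  // miss: fall back to DTLB lookup / page table walk
}
```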
Abstract: A microprocessor prevents same-address load-load ordering violations. Each load queue entry holds a load physical memory line address (PMLA) and an indication of whether the load instruction has completed execution. The microprocessor fills a line specified by a fill PMLA into a cache entry and snoops the load queue with the fill PMLA, either before the fill or atomically with the fill with respect to the ability of the filled entry to be hit upon by any load instruction, to determine whether the fill PMLA matches the load PMLA of any entry whose load instruction has completed execution while other load instructions in the load queue have not completed execution. If that condition holds, the microprocessor flushes at least the other load instructions in the load queue that have not completed execution.
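The fill-time snoop condition might be modeled as follows; the load queue entry layout is an assumption.

```cpp
// Before (or atomically with) the fill, compare the fill line address
// against completed loads in the load queue; a match while other loads
// remain incomplete triggers a flush of the incomplete loads.
#include <cstdint>
#include <vector>

struct LqEntry { uint64_t pmla; bool completed; };

bool mustFlushOnFill(const std::vector<LqEntry>& lq, uint64_t fillPmla) {
    bool matchCompleted = false, anyIncomplete = false;
    for (const auto& e : lq) {
        if (e.completed && e.pmla == fillPmla) matchCompleted = true;
        if (!e.completed) anyIncomplete = true;
    }
    return matchCompleted && anyIncomplete;  // flush the incomplete loads
}
```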
Abstract: A predictor includes a memory having a plurality of entries. Each entry includes a prediction of a hash of a next fetch address produced by a fetch block J of a series of successive fetch blocks in program execution order and a branch direction produced by the fetch block J. An input selects an entry for provision on the output. The output is fed back to the input such that the output provides the prediction of the hash of the next fetch address and the branch direction produced by each fetch block over a series of successive clock cycles. The hash of the next fetch address is insufficient for use by an instruction fetch unit to fetch from an instruction cache a fetch block J+1, whereas the next fetch address itself is sufficient for use by the instruction fetch unit to fetch from the instruction cache the fetch block J+1.
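The self-feeding predictor loop can be sketched as below; the memory size and hash width are assumptions.

```cpp
// The memory is indexed by a value fed back from its own output, so each
// cycle yields the hash of the next fetch address plus the branch
// direction for one fetch block in the series.
#include <cstdint>

constexpr int kEntries = 4096;

struct HashedPred { uint16_t nextFetchHash; bool branchDir; };
HashedPred predMem[kEntries];

uint16_t index = 0;  // feedback register selecting the current entry

HashedPred stepOneCycle() {
    HashedPred out = predMem[index];
    index = out.nextFetchHash % kEntries;  // output fed back as next input
    return out;  // the hash alone cannot address the i-cache; it only chains
}
```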
Abstract: An L2 cache is set associative, has N ways, and is inclusive of a virtual L1 cache such that, when a virtual address misses in the L1: a portion of the virtual address is translated into a physical memory line address (PMLA), the PMLA is allocated into an L2 entry, and a physical address proxy (PAP) for the PMLA is allocated into an L1 entry. The PAP for the PMLA comprises the set index and the way that uniquely identify the L2 entry. The L2 receives a PMLA for allocation, selects a set using the set index portion of the PMLA, and, for each L2 way, forms the PAP corresponding to that way. The L1, for each such PAP, generates a corresponding indicator of whether the PAP is resident in the L1. The L2 selects for replacement a way whose indicator indicates its PAP is not resident in the L1.
Type:
Grant
Filed:
May 24, 2022
Date of Patent:
December 5, 2023
Assignee:
Ventana Micro Systems Inc.
Inventors:
John G. Favor, Srivatsan Srinivasan, Robert Haskell Utley
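The inclusion-aware victim choice might look like this sketch, with the L1's per-PAP residency indicators abstracted as a callback; the fallback policy is an assumption.

```cpp
// For the candidate set, form the PAP each way would have, ask the L1
// which of those PAPs are resident, and prefer a victim way whose PAP
// the L1 does not hold (no back-invalidation needed).
#include <cstdint>

constexpr uint8_t kWays = 8;

int pickVictim(uint16_t l2Set, bool (*l1Resident)(uint16_t, uint8_t)) {
    for (uint8_t w = 0; w < kWays; ++w)
        if (!l1Resident(l2Set, w))  // PAP {l2Set, w} absent from the L1
            return w;               // safe victim: no L1 copy to evict
    return 0;                       // all ways resident: fallback policy assumed
}
```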
Abstract: A microprocessor includes a prediction unit pipeline having a first stage that makes first predictions at a rate of one per clock cycle. Each first prediction comprises a hash of a fetch address of a current fetch block and branch history update information produced by the previous fetch block immediately preceding the current fetch block. One or more second stages, following the first stage with a latency of N clock cycles (N at least one), use the first predictions to make second predictions at a rate of one per clock cycle. Each second prediction includes a fetch address of the next fetch block immediately succeeding the current fetch block and branch history update information produced by the current fetch block. For each of the second predictions, logic uses the second prediction to check whether the first prediction made N−1 clock cycles earlier than the second prediction is a misprediction.
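The cross-check between first and second predictions might be modeled as follows, assuming first predictions queue up while awaiting verification; the queue and hash width are illustrative.

```cpp
// Compare the oldest outstanding first (hashed) prediction against a
// hash of the corresponding second prediction's full fetch address and
// its branch direction; a mismatch means the fast stage mispredicted.
#include <cstdint>
#include <deque>

struct FirstPred { uint16_t hash; bool dir; };

std::deque<FirstPred> inFlight;  // first predictions awaiting verification

bool verifyOldest(uint16_t hashOfSecondFetchAddr, bool secondDir) {
    if (inFlight.empty()) return true;  // nothing to check (sketch only)
    FirstPred fp = inFlight.front();
    inFlight.pop_front();
    return fp.hash == hashOfSecondFetchAddr && fp.dir == secondDir;
}
```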
Abstract: A microprocessor for mitigating side channel attacks includes a memory subsystem that receives a load operation that specifies a load address. The memory subsystem includes a virtually-indexed, virtually-tagged data cache memory (VIVTDCM) comprising entries that hold translation information. The memory subsystem also includes a data translation lookaside buffer (DTLB) comprising entries that hold physical address translations and translation information. The processor performs speculative execution of instructions and executes instructions out of program order. The memory subsystem allows non-inclusion with respect to translation information between the VIVTDCM and the DTLB such that, at some instances in time, translation information associated with the load address is present in the VIVTDCM but absent from the DTLB.