Abstract: A processor has a register file configurable for different execution modes. In one mode the multiple register segments form a single register file where each register segment stores a Multiple Instructions Multiple Data (MIMD) super instruction matrix issuing four simultaneous instruction matrices where each individual instruction within each of the four simultaneous instruction matrices is a scalar or Single Instruction Multiple Data (SIMD). Another execution mode has the multiple register segments forming individual independent register tiles with individual register state to support simultaneous processing of separate threads, where each instruction matrix is associated with a separate thread and a separate register file segment.
Abstract: Methods for read after write forwarding using a virtual address are disclosed. A method includes determining when a virtual address has been remapped from corresponding to a first physical address to a second physical address and determining if all stores occupying a store queue before the remapping have been retired from the store queue. Loads that are younger than the stores that occupied the store queue before the remapping are prevented from being dispatched and executed until the stores that occupied the store queue before the remapping have left the store queue and become globally visible.
Abstract: A method for acquiring cache line data associated with a load from respective hierarchical cache data storage components. As a part of the method, a store queue is accessed for one or more portions of a cache line associated with a load, and, if the one or more portions of the cache line is held in the store queue, the one or more portions of the cache line is stored in a load queue location associated with the load. The load is completed if the one or more portions of the cache line stored in the load queue location includes all portions of the cache line associated with the load.
Abstract: Methods for read request bypassing a last level cache which interfaces with an external fabric are disclosed. A method includes identifying a read request for a read transaction, generating a phantom read transaction identifier for the read transaction and forwarding the read transaction with the phantom read transaction identifier beyond a last level cache before detection of a hit or miss with respect to the read transaction. The phantom read transaction identifier acts as a pointer to a real read transaction identifier.
Abstract: Methods for invasive debug of a processor without processor execution of instructions are disclosed. As a part of a method, a memory mapped I/O of the processor is accessed using a debug bus and an operation is initiated that causes a debug port to gain access to registers of the processor using the memory mapped I/O. The invasive debug of the processor is executed from the debug port via registers of the processor.
Abstract: A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates.
Abstract: A method for executing instructions using register templates to track interdependencies among blocks of instructions. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; and using a register template to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions.
Abstract: A mutltiport memory cell having improved density area is disclosed. The memory cell includes a data storing component, a first memory access component coupled to a first side of the data storing component, a second memory access component coupled to a second side of the data storing component, first and second bit lines coupled to the first memory access component, first and second bit lines coupled to the second memory access component, first and second write lines coupled to the first memory access component and first and second write lines coupled to the second memory access component. The multiport memory cell also includes a read/write assist transistor, coupled to load transistors of the data storing component, that during read operations is activated for the duration of the read operation and during write operations is activated to impress the desired voltage level before or after one or more memory access components activated as a part of the write operation are deactivated.
Abstract: Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L1TLB) miss corresponding to a request for a virtual address to physical address translation, searching a cache that includes virtual addresses and page sizes that correspond to translation table entries (TTEs) that have been evicted from the L1TLB, where a page size is identified, and searching a second level TLB and identifying a physical address that is contained in the second level TLB. Access is provided to the identified physical address.
Type:
Grant
Filed:
March 7, 2012
Date of Patent:
January 6, 2015
Assignee:
Soft Machines, Inc.
Inventors:
Karthikeyan Avudaiyappan, Mohammad Abdallah
Abstract: A method for performing instruction scheduling in an out-of-order microprocessor pipeline is disclosed. The method comprises selecting a first set of instructions to dispatch from a scheduler to an execution module, wherein the execution module comprises two types of execution units. The first type of execution unit executes both a first and a second type of instruction and the second type of execution unit executes only the second type. Next, the method comprises selecting a second set of instructions to dispatch, which is a subset of the first set and comprises only instructions of the second type. Next, the method comprises determining a third set of instructions, which comprises instructions not selected as part of the second set. Finally, the method comprises dispatching the second set for execution using the second type of execution unit and dispatching the third set for execution using the first type of execution unit.
Abstract: A method for fast parallel adder processing. The method includes receiving parallel inputs from a communications path, wherein each input comprises one bit, adding the inputs using a parallel structure, wherein the parallel structure is optimized to accelerate the addition by utilizing a characteristic that the inputs are one bit each, and transmitting the resulting outputs to a subsequent stage.
Abstract: A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.
Abstract: A method for preventing non-temporal entries from entering small critical structures is disclosed. The method comprises transferring a first entry from a higher level memory structure to an intermediate buffer. It further comprises determining a second entry to be evicted from the intermediate buffer and a corresponding value associated with the second entry. Subsequently, responsive to a determination that the second entry is frequently accessed, the method comprises installing the second entry into a lower level memory structure. Finally, the method comprises installing the first entry into a slot previously occupied by the second entry in the intermediate buffer.
Abstract: A microprocessor implemented method for resolving dependencies for a load instruction in a load store queue (LSQ) is disclosed. The method comprises initiating a computation of a virtual address corresponding to the load instruction in a first clock cycle. It also comprises transmitting early calculated lower address bits of the virtual address to a load store queue (LSQ) in the same cycle as the initiating. Finally, it comprises performing a partial match in the LSQ responsive to and using the lower address bits to find a prior aliasing store, wherein the prior aliasing store stores to a same address as the load instruction.
Abstract: A method for predicting a way of a set associative shadow cache is disclosed. As a part of a method, a request to fetch a first far taken branch instruction of a first cache line from an instruction cache is received, and responsive to a hit in the instruction cache, a predicted way is selected from a way array using a way that corresponds to the hit in the instruction cache. A second cache line is selected from a shadow cache using the predicted way and the first cache line and the second cache line are forwarded in the same clock cycle.
Type:
Application
Filed:
March 17, 2014
Publication date:
September 18, 2014
Applicant:
Soft Machines, Inc.
Inventors:
Mohammad ABDALLAH, Ravishankar RAO, Karthikeyan AVUDAIYAPPAN
Abstract: A microprocessor implemented method for maintaining a guest return address stack in an out-of-order microprocessor pipeline is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each function call instruction in the native address space fetched during execution, the method also comprises performing the following: (a) pushing a current entry into a guest return address stack (GRAS) responsive to a function call, wherein the GRAS is maintained at the fetch stage of the pipeline, and wherein the current entry comprises information regarding both a guest target return address and a corresponding native target return address associated with the function call; (b) popping the current entry from the GRAS in response to processing a return instruction; and (c) fetching instructions from the native target return address in the current entry after the popping from the GRAS.
Abstract: A method for sorting elements in hardware structures is disclosed. The method comprises selecting a plurality of elements to order from an unordered input queue (UIQ) within a predetermined range in response to finding a match between at least one most significant bit of the predetermined range and corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of a respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier to a matching port number of the ordered queue from the enabled multiplexer.
Abstract: A microprocessor implemented method for processing a load instruction is disclosed. The method comprises computing a virtual address corresponding to the load instruction. Next, it comprises performing a lookup of a set associative translation lookaside buffer (TLB) and a set associative data cache memory in parallel using early calculated lower address bits of the virtual address. Subsequently, it comprises retrieving a set of entries from the TLB corresponding to a first group of lower address bits transmitted to the TLB, wherein the set of entries comprise a plurality of virtual addresses and corresponding physical addresses. Further, it comprises finding a matching entry for the virtual address in the set of entries using upper bits of the virtual address, wherein the matching entry comprises a physical address corresponding to the virtual address. Finally, it comprises finding a matching entry in the data cache memory using the physical address.