Patents Examined by Daniel Pan
  • Patent number: 10048964
    Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning for the store retirement buffer for a first match; and forwarding data from the first match to the subsequent load.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: August 14, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
  • Patent number: 9965277
    Abstract: An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: May 8, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
  • Patent number: 9928121
    Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions; reordering the instructions in accordance with processor resources for dispatch and execution; ensuring a closest earlier store in machine order for to a corresponding load, by determining if said store has an actual age but said corresponding load does not have an actual age, then said store is earlier than said corresponding load; if said corresponding load has an actual age but said store does not have an actual age, then said corresponding load is earlier than said store; if neither said corresponding load or said store have an actual age, then a virtual identifier table is used to determine which is earlier; and if both said corresponding load and said store have actual ages, then the actual ages are used to determine which is earlier.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: March 27, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 9910669
    Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a core to execute the first instruction, and a retirement unit to retire the first instruction. The core includes logic to execute the first instruction, including logic to repeatedly record a translation lookaside buffer (TLB) until a designated number of records are determined, and flush the TLB after a flush interval.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: March 6, 2018
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Christopher J. Hughes
  • Patent number: 9904552
    Abstract: An out of order processor. The processor includes a distributed load queue and a distributed store queue that maintain single program sequential semantics while allowing an out of order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: February 27, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 9898299
    Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.
    Type: Grant
    Filed: August 6, 2015
    Date of Patent: February 20, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
  • Patent number: 9898301
    Abstract: When a main processor issues a command to co-processor, a timeout value is included in the command. As the co-processor attempts to execute the command, it is determined whether the attempt is taking time beyond what is permitted by the timeout value. If the timeout is exceeded then responsive action is taken, such as the generation of a command timeout type failure message. The receipt of the command with the timeout value, and the consequent determination of a timeout condition for the command, may be determined by: the co-processor that receives the command, or a watchdog timer that is separate from the co-processor. Also, detection of co-processor hang and/or hung co-processor conditions during the time that a co-processor is executing a command for the main processor.
    Type: Grant
    Filed: June 20, 2014
    Date of Patent: February 20, 2018
    Assignee: International Business Machines Corporation
    Inventors: Nitin Gupta, Mehulkumar J. Patel, Deepak C. Shetty
  • Patent number: 9891927
    Abstract: A microprocessor includes a plurality of processing cores and an uncore random access memory (RAM) readable and writable by each of the plurality of processing cores. Each core of the plurality of processing cores comprises microcode run by the core that implements architectural instructions of an instruction set architecture of the microprocessor. The microcode is configured to both read and write the uncore RAM to accomplish inter-core communication between the plurality of processing cores.
    Type: Grant
    Filed: May 19, 2014
    Date of Patent: February 13, 2018
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Stephan Gaskins
  • Patent number: 9891924
    Abstract: A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: February 13, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
  • Patent number: 9891918
    Abstract: A microprocessor includes a predicting unit having storage for holding a prediction history of characteristics of instructions previously executed by the microprocessor. The predicting unit accumulates the prediction history and uses the prediction history to make predictions related to subsequent instruction executions. The storage comprises a plurality of portions separately controllable for accumulating the prediction history. The microprocessor also includes a control unit that detects the microprocessor is running an operating system routine and controls the predicting unit to use only a fraction of the plurality of portions of the storage to accumulate the prediction history while the microprocessor is running the operating system routine.
    Type: Grant
    Filed: January 26, 2015
    Date of Patent: February 13, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Rodney E. Hooker, Terry Parks, John D. Bunda
  • Patent number: 9891928
    Abstract: A microprocessor a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally-visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update the respective architecturally-visible storage resource of the first core with a value specified by the architectural instruction. The first core is further configured to, in response to encountering the architectural instruction, provide the value to each of the other of the plurality of processing cores and update the respective architecturally-visible storage resource of the first core with the value. Each core of the plurality of processing cores other than the first core is configured to update the respective architecturally-visible storage resource of the core with the value provided by the first core without encountering the architectural instruction.
    Type: Grant
    Filed: August 9, 2016
    Date of Patent: February 13, 2018
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Stephan Gaskins
  • Patent number: 9886279
    Abstract: A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating and instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of execution stage in accordance with the readiness of instruction sources of the instruction blocks.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 9886396
    Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
  • Patent number: 9875100
    Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: March 13, 2017
    Date of Patent: January 23, 2018
    Assignee: Google LLC
    Inventors: Dong Hyuk Woo, Andrew Everett Phelps
  • Patent number: 9870339
    Abstract: Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: January 16, 2018
    Assignee: Intel Corporation
    Inventors: Chang Yong Kang, Pierre Laurent, Hari K. Tadepalli, Prasad M. Ghatigar, T. J. O'Dwyer, Serge Zhilyaev
  • Patent number: 9870229
    Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: January 16, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 9858074
    Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: January 2, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum
  • Patent number: 9851970
    Abstract: An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: December 26, 2017
    Assignee: INTEL CORPORATION
    Inventors: David M. Kunzman, Christopher J. Hughes
  • Patent number: 9830151
    Abstract: An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: November 28, 2017
    Assignee: INTEL CORPORATION
    Inventors: Ashish Jha, Robert Valentine, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9817666
    Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 14, 2017
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah