Patents Examined by Daniel Pan

Disambiguation-free out of order load store queue

Patent number: 10048964

Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match; and forwarding data from the first match to the subsequent load.

Type: Grant

Filed: December 12, 2014

Date of Patent: August 14, 2018

Assignee: Intel Corporation

Inventor: Mohammad A. Abdallah
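
The forwarding search in the abstract of patent 10048964 can be illustrated with a minimal Python sketch. The names `StoreEntry` and `forward_from_buffer` are hypothetical, and the scan direction (youngest store first) is an assumption; the abstract only says that the first match is used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """One store held in the (hypothetical) store retirement buffer."""
    address: int
    data: int


def forward_from_buffer(buffer: List[StoreEntry], load_address: int) -> Optional[int]:
    """Return the data to forward to a load, or None if no store matches.

    `buffer` is kept in original program order, oldest store first.
    Assumption: the scan starts at the youngest store, so with several
    address matches the first one found is the most recent store.
    """
    for entry in reversed(buffer):
        if entry.address == load_address:
            return entry.data
    return None


# Two stores hit address 0x10; the scan stops at the younger one.
buffer = [StoreEntry(0x10, 1), StoreEntry(0x20, 2), StoreEntry(0x10, 3)]
forwarded = forward_from_buffer(buffer, 0x10)
```

Because the buffer keeps program order, no separate disambiguation step is needed in this sketch: the position of an entry alone decides which match wins.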
Virtual load store queue having a dynamic dispatch window with a unified structure

Patent number: 9965277

Abstract: An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.

Type: Grant

Filed: December 11, 2014

Date of Patent: May 8, 2018

Assignee: Intel Corporation

Inventor: Mohammad A. Abdallah
Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization

Patent number: 9928121

Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions; reordering the instructions in accordance with processor resources for dispatch and execution; ensuring a closest earlier store in machine order for a corresponding load, by determining if said store has an actual age but said corresponding load does not have an actual age, then said store is earlier than said corresponding load; if said corresponding load has an actual age but said store does not have an actual age, then said corresponding load is earlier than said store; if neither said corresponding load nor said store has an actual age, then a virtual identifier table is used to determine which is earlier; and if both said corresponding load and said store have actual ages, then the actual ages are used to determine which is earlier.

Type: Grant

Filed: December 11, 2014

Date of Patent: March 27, 2018

Assignee: Intel Corporation

Inventor: Mohammad Abdallah
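
The four ordering rules in the abstract of patent 9928121 can be sketched as a single comparison function. This is an illustration only: `store_is_earlier` is a hypothetical name, a missing actual age is modelled as `None`, a smaller age is assumed to mean an earlier instruction, and the virtual identifier table is assumed to map an instruction identifier to a sequence number.

```python
from typing import Dict, Optional


def store_is_earlier(
    store_age: Optional[int],
    load_age: Optional[int],
    store_id: str,
    load_id: str,
    virtual_id_table: Dict[str, int],
) -> bool:
    """Decide whether the store precedes the load in machine order."""
    # Rule 1: only the store has an actual age, so the store is earlier.
    if store_age is not None and load_age is None:
        return True
    # Rule 2: only the load has an actual age, so the load is earlier.
    if load_age is not None and store_age is None:
        return False
    # Rule 3: neither has an actual age, so consult the virtual identifier table.
    if store_age is None and load_age is None:
        return virtual_id_table[store_id] < virtual_id_table[load_id]
    # Rule 4: both have actual ages, so compare them directly
    # (assumption: a smaller age means an earlier instruction).
    return store_age < load_age


table = {"st0": 4, "ld0": 7}
```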
Instruction and logic for characterization of data access

Patent number: 9910669

Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a core to execute the first instruction, and a retirement unit to retire the first instruction. The core includes logic to execute the first instruction, including logic to repeatedly record a translation lookaside buffer (TLB) until a designated number of records are determined, and flush the TLB after a flush interval.

Type: Grant

Filed: June 26, 2015

Date of Patent: March 6, 2018

Assignee: Intel Corporation

Inventors: Kshitij A. Doshi, Christopher J. Hughes
Virtual load store queue having a dynamic dispatch window with a distributed structure

Patent number: 9904552

Abstract: An out of order processor. The processor includes a distributed load queue and a distributed store queue that maintain single program sequential semantics while allowing an out of order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.

Type: Grant

Filed: December 3, 2014

Date of Patent: February 27, 2018

Assignee: Intel Corporation

Inventor: Mohammad Abdallah
Framework to provide time bound execution of co-processor commands

Patent number: 9898301

Abstract: When a main processor issues a command to a co-processor, a timeout value is included in the command. As the co-processor attempts to execute the command, it is determined whether the attempt is taking time beyond what is permitted by the timeout value. If the timeout is exceeded then responsive action is taken, such as the generation of a command timeout type failure message. The receipt of the command with the timeout value, and the consequent determination of a timeout condition for the command, may be determined by: the co-processor that receives the command, or a watchdog timer that is separate from the co-processor. Also, detection of co-processor hang and/or hung co-processor conditions during the time that a co-processor is executing a command for the main processor.

Type: Grant

Filed: June 20, 2014

Date of Patent: February 20, 2018

Assignee: International Business Machines Corporation

Inventors: Nitin Gupta, Mehulkumar J. Patel, Deepak C. Shetty
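
The time-bound execution idea in the abstract of patent 9898301 can be sketched with a simulated clock. Everything here is an assumption made for illustration: the names `Command` and `run_with_timeout`, the tick-based notion of time, and the result strings. The abstract only states that the command carries a timeout value and that a failure message is generated when it is exceeded.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """A command carrying its own timeout, as the abstract describes."""
    name: str
    timeout_ticks: int  # hypothetical unit: simulated clock ticks


def run_with_timeout(command: Command, ticks_needed: int) -> str:
    """Simulate a co-processor that needs `ticks_needed` ticks of work.

    The check before each tick plays the role of the watchdog: once the
    permitted time is used up and work remains, a timeout is reported.
    """
    elapsed = 0
    while elapsed < ticks_needed:
        if elapsed >= command.timeout_ticks:
            return "COMMAND_TIMEOUT"
        elapsed += 1  # one tick of work
    return "OK"


fast = run_with_timeout(Command("compress", timeout_ticks=5), ticks_needed=3)
hung = run_with_timeout(Command("compress", timeout_ticks=5), ticks_needed=10)
```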
Dynamic thread sharing in branch prediction structures

Patent number: 9898299

Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.

Type: Grant

Filed: August 6, 2015

Date of Patent: February 20, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
Inter-core communication via uncore RAM

Patent number: 9891927

Abstract: A microprocessor includes a plurality of processing cores and an uncore random access memory (RAM) readable and writable by each of the plurality of processing cores. Each core of the plurality of processing cores comprises microcode run by the core that implements architectural instructions of an instruction set architecture of the microprocessor. The microcode is configured to both read and write the uncore RAM to accomplish inter-core communication between the plurality of processing cores.

Type: Grant

Filed: May 19, 2014

Date of Patent: February 13, 2018

Assignee: VIA TECHNOLOGIES, INC.

Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Stephan Gaskins
Method for implementing a reduced size register view data structure in a microprocessor

Patent number: 9891924

Abstract: A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.

Type: Grant

Filed: March 14, 2014

Date of Patent: February 13, 2018

Assignee: Intel Corporation

Inventor: Mohammad A. Abdallah
Fractional use of prediction history storage for operating system routines

Patent number: 9891918

Abstract: A microprocessor includes a predicting unit having storage for holding a prediction history of characteristics of instructions previously executed by the microprocessor. The predicting unit accumulates the prediction history and uses the prediction history to make predictions related to subsequent instruction executions. The storage comprises a plurality of portions separately controllable for accumulating the prediction history. The microprocessor also includes a control unit that detects the microprocessor is running an operating system routine and controls the predicting unit to use only a fraction of the plurality of portions of the storage to accumulate the prediction history while the microprocessor is running the operating system routine.

Type: Grant

Filed: January 26, 2015

Date of Patent: February 13, 2018

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: Rodney E. Hooker, Terry Parks, John D. Bunda
Propagation of updates to per-core-instantiated architecturally-visible storage resource

Patent number: 9891928

Abstract: A microprocessor includes a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally-visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update the respective architecturally-visible storage resource of the first core with a value specified by the architectural instruction. The first core is further configured to, in response to encountering the architectural instruction, provide the value to each of the other of the plurality of processing cores and update the respective architecturally-visible storage resource of the first core with the value. Each core of the plurality of processing cores other than the first core is configured to update the respective architecturally-visible storage resource of the core with the value provided by the first core without encountering the architectural instruction.

Type: Grant

Filed: August 9, 2016

Date of Patent: February 13, 2018

Assignee: VIA TECHNOLOGIES, INC.

Inventors: G. Glenn Henry, Stephan Gaskins
Scalable event handling in multi-threaded processor cores

Patent number: 9886396

Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.

Type: Grant

Filed: December 23, 2014

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
Method for populating and instruction view data structure by using register template snapshots

Patent number: 9886279

Abstract: A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating an instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of execution stage in accordance with the readiness of instruction sources of the instruction blocks.

Type: Grant

Filed: March 14, 2014

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventor: Mohammad Abdallah
Accessing data in multi-dimensional tensors

Patent number: 9875100

Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.

Type: Grant

Filed: March 13, 2017

Date of Patent: January 23, 2018

Assignee: Google LLC

Inventors: Dong Hyuk Woo, Andrew Everett Phelps
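
The address determination in the abstract of patent 9875100 pairs each tensor index element with a dimension multiplier element. A minimal sketch follows, assuming the address is a base offset plus the sum of index-times-multiplier products; the abstract does not spell out the formula, and `element_address` and `base` are hypothetical names.

```python
from typing import Sequence


def element_address(
    indices: Sequence[int],
    multipliers: Sequence[int],
    base: int = 0,
) -> int:
    """Compute the address of one element of an N-dimensional tensor.

    Assumption: address = base + sum(index_k * multiplier_k), with one
    multiplier per index, matching the pairing described in the abstract.
    """
    if len(indices) != len(multipliers):
        raise ValueError("each index needs exactly one dimension multiplier")
    return base + sum(i * m for i, m in zip(indices, multipliers))


# A 2 x 3 tensor stored row-major has multipliers (3, 1).
address = element_address(indices=(1, 2), multipliers=(3, 1))
```

With row-major multipliers, element (1, 2) of a 2 x 3 tensor lands at offset 1 * 3 + 2 * 1 = 5.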
Hardware processors and methods for tightly-coupled heterogeneous computing

Patent number: 9870339

Abstract: Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.

Type: Grant

Filed: June 26, 2015

Date of Patent: January 16, 2018

Assignee: Intel Corporation

Inventors: Chang Yong Kang, Pierre Laurent, Hari K. Tadepalli, Prasad M. Ghatigar, T. J. O'Dwyer, Serge Zhilyaev
Independent mapping of threads

Patent number: 9870229

Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.

Type: Grant

Filed: September 29, 2015

Date of Patent: January 16, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
Non-default instruction handling within transaction

Patent number: 9858074

Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.

Type: Grant

Filed: June 26, 2015

Date of Patent: January 2, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum
Method and apparatus for performing reduction operations on a set of vector elements

Patent number: 9851970

Abstract: An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.

Type: Grant

Filed: December 23, 2014

Date of Patent: December 26, 2017

Assignee: INTEL CORPORATION

Inventors: David M. Kunzman, Christopher J. Hughes
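
The index-guided reduction in the abstract of patent 9851970 combines the data elements whose index values match. The sketch below is a scalar illustration, not the SIMD logic itself. It assumes addition as the reduction operation and uses a dictionary in place of the accumulation vector register; `indexed_reduce` is a hypothetical name.

```python
from typing import Dict, Sequence


def indexed_reduce(values: Sequence[int], indices: Sequence[int]) -> Dict[int, int]:
    """Sum the values that share an index value.

    `values` stands in for the value vector register and `indices` for
    the index vector register; lanes with equal indices are associated
    with one another and reduced together.
    """
    if len(values) != len(indices):
        raise ValueError("value and index vectors must have the same length")
    accumulated: Dict[int, int] = {}
    for value, index in zip(values, indices):
        accumulated[index] = accumulated.get(index, 0) + value
    return accumulated


result = indexed_reduce(values=[1, 2, 3, 4], indices=[0, 1, 0, 1])
```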
Method and apparatus for vector index load and store

Patent number: 9830151

Abstract: An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.

Type: Grant

Filed: December 23, 2014

Date of Patent: November 28, 2017

Assignee: INTEL CORPORATION

Inventors: Ashish Jha, Robert Valentine, Elmoustapha Ould-Ahmed-Vall
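
The vector index load in the abstract of patent 9830151 selects one index with an immediate value, combines it with a base address, and loads elements under a mask. The sketch below makes several assumptions that the abstract leaves open: memory is a flat Python list, the index is added to the base, elements are loaded from consecutive addresses, and masked-off lanes keep their previous contents. `vector_index_load` is a hypothetical name.

```python
from typing import List, Sequence


def vector_index_load(
    memory: Sequence[int],
    base: int,
    index_register: Sequence[int],
    immediate: int,
    mask: Sequence[int],
    destination: Sequence[int],
) -> List[int]:
    """Return the new contents of the destination vector register."""
    # The immediate picks which entry of the index register to use.
    index = index_register[immediate]
    # Assumption: base and index are combined by addition.
    address = base + index
    # Load lane by lane; a cleared mask bit leaves the old value in place.
    return [
        memory[address + lane] if bit else old
        for lane, (bit, old) in enumerate(zip(mask, destination))
    ]


memory = list(range(100, 120))  # memory[k] == 100 + k
loaded = vector_index_load(
    memory, base=4, index_register=[0, 8, 2], immediate=1,
    mask=[1, 0, 1, 1], destination=[0, 0, 0, 0],
)
```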
Method for a delayed branch implementation by using a front end track table

Patent number: 9817666

Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch and the one branch.

Type: Grant

Filed: March 17, 2014

Date of Patent: November 14, 2017

Assignee: Intel Corporation

Inventor: Mohammad Abdallah
