Patents Examined by Daniel Pan
-
Patent number: 10048964Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning for the store retirement buffer for a first match; and forwarding data from the first match to the subsequent load.Type: GrantFiled: December 12, 2014Date of Patent: August 14, 2018Assignee: Intel CorporationInventor: Mohammad A. Abdallah
-
Patent number: 9965277Abstract: An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.Type: GrantFiled: December 11, 2014Date of Patent: May 8, 2018Assignee: Intel CorporationInventor: Mohammad A. Abdallah
-
Patent number: 9928121Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions; reordering the instructions in accordance with processor resources for dispatch and execution; ensuring a closest earlier store in machine order for to a corresponding load, by determining if said store has an actual age but said corresponding load does not have an actual age, then said store is earlier than said corresponding load; if said corresponding load has an actual age but said store does not have an actual age, then said corresponding load is earlier than said store; if neither said corresponding load or said store have an actual age, then a virtual identifier table is used to determine which is earlier; and if both said corresponding load and said store have actual ages, then the actual ages are used to determine which is earlier.Type: GrantFiled: December 11, 2014Date of Patent: March 27, 2018Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 9910669Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a core to execute the first instruction, and a retirement unit to retire the first instruction. The core includes logic to execute the first instruction, including logic to repeatedly record a translation lookaside buffer (TLB) until a designated number of records are determined, and flush the TLB after a flush interval.Type: GrantFiled: June 26, 2015Date of Patent: March 6, 2018Assignee: Intel CorporationInventors: Kshitij A. Doshi, Christopher J. Hughes
-
Patent number: 9904552Abstract: An out of order processor. The processor includes a distributed load queue and a distributed store queue that maintain single program sequential semantics while allowing an out of order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.Type: GrantFiled: December 3, 2014Date of Patent: February 27, 2018Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 9898301Abstract: When a main processor issues a command to co-processor, a timeout value is included in the command. As the co-processor attempts to execute the command, it is determined whether the attempt is taking time beyond what is permitted by the timeout value. If the timeout is exceeded then responsive action is taken, such as the generation of a command timeout type failure message. The receipt of the command with the timeout value, and the consequent determination of a timeout condition for the command, may be determined by: the co-processor that receives the command, or a watchdog timer that is separate from the co-processor. Also, detection of co-processor hang and/or hung co-processor conditions during the time that a co-processor is executing a command for the main processor.Type: GrantFiled: June 20, 2014Date of Patent: February 20, 2018Assignee: International Business Machines CorporationInventors: Nitin Gupta, Mehulkumar J. Patel, Deepak C. Shetty
-
Patent number: 9898299Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.Type: GrantFiled: August 6, 2015Date of Patent: February 20, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
-
Patent number: 9891927Abstract: A microprocessor includes a plurality of processing cores and an uncore random access memory (RAM) readable and writable by each of the plurality of processing cores. Each core of the plurality of processing cores comprises microcode run by the core that implements architectural instructions of an instruction set architecture of the microprocessor. The microcode is configured to both read and write the uncore RAM to accomplish inter-core communication between the plurality of processing cores.Type: GrantFiled: May 19, 2014Date of Patent: February 13, 2018Assignee: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Stephan Gaskins
-
Patent number: 9891924Abstract: A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.Type: GrantFiled: March 14, 2014Date of Patent: February 13, 2018Assignee: Intel CorporationInventor: Mohammad A. Abdallah
-
Patent number: 9891918Abstract: A microprocessor includes a predicting unit having storage for holding a prediction history of characteristics of instructions previously executed by the microprocessor. The predicting unit accumulates the prediction history and uses the prediction history to make predictions related to subsequent instruction executions. The storage comprises a plurality of portions separately controllable for accumulating the prediction history. The microprocessor also includes a control unit that detects the microprocessor is running an operating system routine and controls the predicting unit to use only a fraction of the plurality of portions of the storage to accumulate the prediction history while the microprocessor is running the operating system routine.Type: GrantFiled: January 26, 2015Date of Patent: February 13, 2018Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventors: Rodney E. Hooker, Terry Parks, John D. Bunda
-
Patent number: 9891928Abstract: A microprocessor a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally-visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update the respective architecturally-visible storage resource of the first core with a value specified by the architectural instruction. The first core is further configured to, in response to encountering the architectural instruction, provide the value to each of the other of the plurality of processing cores and update the respective architecturally-visible storage resource of the first core with the value. Each core of the plurality of processing cores other than the first core is configured to update the respective architecturally-visible storage resource of the core with the value provided by the first core without encountering the architectural instruction.Type: GrantFiled: August 9, 2016Date of Patent: February 13, 2018Assignee: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Stephan Gaskins
-
Patent number: 9886396Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.Type: GrantFiled: December 23, 2014Date of Patent: February 6, 2018Assignee: Intel CorporationInventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
-
Patent number: 9886279Abstract: A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating and instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of execution stage in accordance with the readiness of instruction sources of the instruction blocks.Type: GrantFiled: March 14, 2014Date of Patent: February 6, 2018Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 9875100Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.Type: GrantFiled: March 13, 2017Date of Patent: January 23, 2018Assignee: Google LLCInventors: Dong Hyuk Woo, Andrew Everett Phelps
-
Patent number: 9870339Abstract: Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.Type: GrantFiled: June 26, 2015Date of Patent: January 16, 2018Assignee: Intel CorporationInventors: Chang Yong Kang, Pierre Laurent, Hari K. Tadepalli, Prasad M. Ghatigar, T. J. O'Dwyer, Serge Zhilyaev
-
Patent number: 9870229Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.Type: GrantFiled: September 29, 2015Date of Patent: January 16, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
-
Patent number: 9858074Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.Type: GrantFiled: June 26, 2015Date of Patent: January 2, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum
-
Patent number: 9851970Abstract: An apparatus and method are described for performing SIMD reduction operations. For example, one embodiment of a processor comprises: a value vector register containing a plurality of data element values to be reduced; an index vector register to store a plurality of index values indicating which values in the value vector register are associated with one another; single instruction multiple data (SIMD) reduction logic to perform reduction operations on the data element values within the value vector register by combining data element values from the value vector register which are associated with one another as indicated by the index values in the index vector register; and an accumulation vector register to store results of the reduction operations generated by the SIMD reduction logic.Type: GrantFiled: December 23, 2014Date of Patent: December 26, 2017Assignee: INTEL CORPORATIONInventors: David M. Kunzman, Christopher J. Hughes
-
Patent number: 9830151Abstract: An apparatus and method for performing vector index loads and stores. For example, one embodiment of a processor comprises: a vector index register to store a plurality of index values; a mask register to store a plurality of mask bits; a vector register to store a plurality of vector data elements loaded from memory; and vector index load logic to identify an index stored in the vector index register to be used for a load operation using an immediate value and to responsively combine the index with a base memory address to determine a memory address for the load operation, the vector index load logic to load vector data elements from the memory address to the vector register in accordance with the plurality of mask bits.Type: GrantFiled: December 23, 2014Date of Patent: November 28, 2017Assignee: INTEL CORPORATIONInventors: Ashish Jha, Robert Valentine, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 9817666Abstract: A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.Type: GrantFiled: March 17, 2014Date of Patent: November 14, 2017Assignee: Intel CorporationInventor: Mohammad Abdallah