Dynamic Instruction Dependency Checking, Monitoring Or Conflict Resolution Patents (Class 712/216)
  • Patent number: 10019263
    Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and implementing speculative execution, wherein results of speculative execution can be saved in the store retirement/reorder buffer as a speculative state. The method further includes, upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching; and, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match and forwarding data from the first match to the subsequent load. Once speculative outcomes are known, the speculative state is retired to memory. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: July 10, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
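    Illustrative sketch: the first-match forwarding search described above can be modeled in a few lines of Python. This is only an illustration of the idea, not the patented hardware; the StoreEntry fields, the program-order numbering, and the youngest-first scan direction are assumptions.
      from dataclasses import dataclass

      @dataclass
      class StoreEntry:
          address: int   # memory address written by the store
          data: int      # value the store will write
          seq: int       # original program order (lower = older)

      class StoreRetirementBuffer:
          def __init__(self):
              self.entries = []              # kept in original program order

          def append(self, entry):
              self.entries.append(entry)

          def forward_to_load(self, load_address, load_seq):
              # Scan from the youngest store that is older than the load back
              # toward the oldest; the first address match supplies the data.
              for entry in reversed(self.entries):
                  if entry.seq < load_seq and entry.address == load_address:
                      return entry.data
              return None                    # no match: the load reads memory

      srb = StoreRetirementBuffer()
      srb.append(StoreEntry(address=0x100, data=1, seq=10))
      srb.append(StoreEntry(address=0x100, data=2, seq=20))
      print(srb.forward_to_load(load_address=0x100, load_seq=30))   # -> 2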
  • Patent number: 9996356
    Abstract: Apparatus and method for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding events experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; and generating a plurality of load operations responsive to an enabled S-bit in the first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: December 26, 2015
    Date of Patent: June 12, 2018
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Oleg Margulis, Jason M. Agron, Ethan Schuchman, Sebastian Winkel, Youfeng Wu, Gisle Dankel
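    Illustrative sketch: a minimal Python model of counting bad store-to-load forwarding events per load and splitting the load once an S-bit is set. The threshold value, the per-PC counters, and the byte-wise split are assumptions made for illustration only.
      from collections import defaultdict

      S_BIT_THRESHOLD = 3                  # assumed tuning parameter

      bad_stlf_counts = defaultdict(int)   # load PC -> observed bad events
      s_bit = defaultdict(bool)            # load PC -> S-bit state

      def record_bad_stlf(load_pc):
          bad_stlf_counts[load_pc] += 1
          if bad_stlf_counts[load_pc] >= S_BIT_THRESHOLD:
              s_bit[load_pc] = True        # future executions split the load

      def expand_load(load_pc, address, size):
          # With the S-bit set, emit several narrower load operations whose
          # combined result is equivalent to the original load.
          if not s_bit[load_pc]:
              return [(address, size)]
          return [(address + off, 1) for off in range(size)]

      record_bad_stlf(0x40); record_bad_stlf(0x40); record_bad_stlf(0x40)
      print(expand_load(0x40, 0x1000, 4))  # four 1-byte load operations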
  • Patent number: 9965320
    Abstract: A processor is described comprising memory access conflict detection circuitry to identify a conflict pertaining to a transaction being executed by a thread that believes it has locked information within a memory. The processor also includes logging circuitry to construct and report out a packet if the memory access conflict detection circuitry identifies a conflict that causes the transaction to be aborted.
    Type: Grant
    Filed: December 27, 2013
    Date of Patent: May 8, 2018
    Assignee: INTEL CORPORATION
    Inventors: Rolf Kassa, Justin E. Gottschlich, Shiliang Hu, Gilles A. Pokam, Robert C. Knauerhase
  • Patent number: 9959509
    Abstract: Changing a business process model involves several aspects: (1) given a set of change operations, dependencies and conflicts are encoded in dependency and conflict matrices; (2) given a change sequence for a process model M, the change sequence is broken up into subsequences such that operations from different subsequences are independent; (3) given a change sequence for a process model V1 and another change sequence for a process model V2, conflicts between operations in the different change sequences are determined; (4) the process structure tree can be used to localize dependency computations, yielding a more efficient approach to determining dependencies; and (5) the process structure tree can be used to localize conflict computations, yielding a more efficient approach to determining conflicts.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: May 1, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jochen M. Kuester, Christian Gerth
  • Patent number: 9952900
    Abstract: A scheduler allocating a task to a socket, where the socket comprises a plurality of processor cores and a micro code engine. The scheduler receives metrics from the micro code engine, where the metrics are calculated by the micro code engine based on data received from an event counter located on each of the plurality of processor cores. The scheduler determines whether a socket level load is below a socket threshold value. Based on determining that the socket level load is below the socket threshold value, the scheduler determines whether a core level load is below a core threshold value. Based on determining that the core level load is below the core threshold value, the scheduler determines whether there is an available thread and, based on determining that there is an available thread, the scheduler assigns the task to the available thread. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: April 24, 2018
    Assignee: International Business Machines Corporation
    Inventors: Pradipta K. Banerjee, Aneesh K. Kizhake Veetil, Dipankar Sarma, Vaidyanathan Srinivasan
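    Illustrative sketch: the threshold-based placement described above, reduced to a small Python function. The metric names, threshold values, and data shapes are assumptions, not details taken from the patent.
      SOCKET_THRESHOLD = 0.75   # assumed socket-level load limit
      CORE_THRESHOLD = 0.80     # assumed core-level load limit

      def assign_task(task, sockets):
          # sockets: list of dicts with 'load' and 'cores'; each core has
          # 'load' and 'free_threads' (a list of idle thread ids).
          for socket in sockets:
              if socket["load"] >= SOCKET_THRESHOLD:
                  continue                      # socket level load too high
              for core in socket["cores"]:
                  if core["load"] >= CORE_THRESHOLD:
                      continue                  # core level load too high
                  if core["free_threads"]:
                      return core["free_threads"].pop(0)   # assign the task here
          return None                           # no eligible thread found

      sockets = [{"load": 0.4,
                  "cores": [{"load": 0.9, "free_threads": []},
                            {"load": 0.3, "free_threads": ["t7"]}]}]
      print(assign_task("taskA", sockets))      # -> 't7'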
  • Patent number: 9946809
    Abstract: The present invention addresses deficiencies of the art with respect to collaborative computer networks consisting of mixed data, control functions, analysis functions, and sensors in complex systems of systems. The method involves a database framework for representing complex heterogeneous characteristics of processes, systems, and systems of systems that feature many to many interrelationships. The homoiconic graph framework takes the form of an executable graph database, which is often faster for associative data sets, and maps more directly to object-oriented computer applications for large-scale operations. The invention provides a method to execute the graph database, in that it comprises nodes that are both data fragments and executable components. It is characterized as a homoiconic or executable graph framework to distinguish this unique feature from the concept of a graph database, which generally is a repository of connected data only.
    Type: Grant
    Filed: April 9, 2015
    Date of Patent: April 17, 2018
    Assignee: Introspective Systems LLC
    Inventors: Caryl Erin Johnson, Kay E Aikin, David S Phelps, Alexander Johnson, Nathaniel Rindlaub
  • Patent number: 9921813
    Abstract: A compiler 134 for compiling a first computer program 110, written in a first computer programming language, into a second computer program written in a machine language. The compiler comprises a code generator to generate the second computer program by generating tables 142 and machine language code 144, the generated tables and the generated machine language code together forming the second computer program. The generated machine language code references the tables and does not contain arithmetic or logic machine instructions; the tables comprise pre-computed results of arithmetic and/or logic machine instructions. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: October 30, 2013
    Date of Patent: March 20, 2018
    Assignee: KONINKLIJKE PHILIPS N.V.
    Inventors: Paulus Mathias Hubertus Mechtildis Antonius Gorissen, Ludovicus Marinus Gerardus Maria Tolhuizen, Mina Deng, Wilhelmus Petrus Adrianus Johannus Michiels, Wicher I. Gispen, Constant Paul Marie Jozef Baggen
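    Illustrative sketch: the core idea, replacing an arithmetic machine instruction with a lookup into a table of pre-computed results, shown for an 8-bit add in Python. The table layout is an assumption; the patent describes the general compilation technique, not this encoding.
      # Table generated "at compile time": every possible 8-bit addition result.
      ADD8 = [[(a + b) & 0xFF for b in range(256)] for a in range(256)]

      def add8_via_table(a, b):
          # At run time the add has become an indexed load from the table,
          # so no arithmetic or logic instruction is conceptually required.
          return ADD8[a][b]

      print(add8_via_table(200, 100))   # -> 44, i.e. (200 + 100) mod 256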
  • Patent number: 9921873
    Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a "cooperative thread array" ("CTA") may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: March 20, 2018
    Assignee: NVIDIA Corporation
    Inventors: Lacky V. Shah, Karim M. Abdalla, Sean J. Treichler, Abraham B. de Waal
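    Illustrative sketch: a small Python model of launching work groups once enough work-queue entries have been written, with a configurable number of launches per batch and a pointer step size. The TMD field names and the scheduling loop are assumptions for illustration.
      class TaskMetadata:
          def __init__(self, entries_per_cta, ctas_per_launch, step_size):
              self.entries_per_cta = entries_per_cta   # entries needed per launch
              self.ctas_per_launch = ctas_per_launch   # CTAs run on the same entries
              self.step_size = step_size               # pointer advance per launch

      def schedule(tmd, queue, read_ptr):
          launches = []
          while len(queue) - read_ptr >= tmd.entries_per_cta:
              work = queue[read_ptr:read_ptr + tmd.entries_per_cta]
              launches.extend(("CTA", i, work) for i in range(tmd.ctas_per_launch))
              read_ptr += tmd.step_size                # update the work-queue pointer
          return launches, read_ptr

      tmd = TaskMetadata(entries_per_cta=2, ctas_per_launch=1, step_size=2)
      launches, ptr = schedule(tmd, ["e0", "e1", "e2", "e3"], read_ptr=0)
      print(len(launches), ptr)                        # -> 2 4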
  • Patent number: 9915998
    Abstract: An apparatus includes a first reservation station and a second reservation station. The first reservation station dispatches a first load micro instruction, and detects and indicates on a hold bus if the first load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the first load micro instruction for execution after a first number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the first load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the first load micro instruction has retrieved the operand.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: March 13, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 9910704
    Abstract: A scheduler allocating a task to a socket, where the socket comprises a plurality of processor cores and a micro code engine. The scheduler receives metrics from the micro code engine, where the metrics are calculated by the micro code engine based on data received from an event counter located on each of the plurality of processor cores. The scheduler determines whether a socket level load is below a socket threshold value. Based on determining that the socket level load is below the socket threshold value, the scheduler determines whether a core level load is below a core threshold value. Based on determining that the core level load is below the core threshold value, the scheduler determines whether there is an available thread and, based on determining that there is an available thread, the scheduler assigns the task to the available thread.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: March 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Pradipta K. Banerjee, Aneesh K. Kizhake Veetil, Dipankar Sarma, Vaidyanathan Srinivasan
  • Patent number: 9886418
    Abstract: Described herein are methods, systems, and apparatuses to utilize a matrix operation by accessing each of the operation's matrix operands via a respective single memory handle. This use of a single memory handle for each matrix operand eliminates significant overhead in memory allocation, data tracking, and subroutine complexity present in prior art solutions. The result of the matrix operation can also be accessible via a single memory handle identifying the matrix elements of the result.
    Type: Grant
    Filed: April 28, 2015
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventors: Andrew Yang, Carey Kloss, Prashant Arora, Tony Werner, Naveen Gandham Rao, Amir Khosrowshahi
  • Patent number: 9886279
    Abstract: A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating an instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of an execution stage in accordance with the readiness of instruction sources of the instruction blocks.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
  • Patent number: 9886377
    Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
    Type: Grant
    Filed: October 5, 2015
    Date of Patent: February 6, 2018
    Assignee: Intel Corporation
    Inventors: Tony Werner, Aravind Kalaiah, Andrew Yang, Carey Kloss, Horace Lau, Naveen Gandham Rao, Amir Khosrowshahi
  • Patent number: 9864641
    Abstract: A method for managing workloads in a multiprocessing computer system is disclosed. Initially, a set of affinity domains is defined for a group of processor cores, wherein each of the affinity domains includes a subset of the processor cores. An affinity measure is defined to indicate that a given workload should be moved to a smaller affinity domain having fewer processor cores. A performance measure is defined to indicate the performance of a given workload. A given workload is determined based on the affinity measure and the performance measure. In response to a determination that a given workload should be moved to a smaller affinity domain based on the affinity measure, the given workload is moved to a smaller affinity domain. In response to a determination that there is a reduction in performance based on the performance measure, the given workload is moved to a larger affinity domain.
    Type: Grant
    Filed: June 23, 2014
    Date of Patent: January 9, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alexander Barraclough Brown, Gisle Mikal Nitter Dankel
  • Patent number: 9858077
    Abstract: Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, one or more execution pipeline preferences are determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the one or more execution pipeline preferences. Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: January 15, 2013
    Date of Patent: January 2, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, James N. Dieffenderfer, Michael W. Morrow, Brian M. Stempel, Michael S. McIlvaine
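    Illustrative sketch: a Python model of a pipeline issuance table that records which pipeline last wrote each register and steers consumers toward it. The table shape, pipeline names, and tie-breaking rule are assumptions.
      pipeline_issuance_table = {}   # register name -> pipeline that last wrote it

      def issue(instr, pipelines=("P0", "P1")):
          # Prefer the pipeline that produced a source operand so that
          # operand forwarding stays local to one pipeline.
          for src in instr.get("sources", []):
              preferred = pipeline_issuance_table.get(src)
              if preferred is not None:
                  chosen = preferred
                  break
          else:
              chosen = pipelines[0]                     # no preference recorded
          for dst in instr.get("targets", []):
              pipeline_issuance_table[dst] = chosen     # record where the result lives
          return chosen

      print(issue({"sources": [], "targets": ["r1"]}))      # -> 'P0'
      print(issue({"sources": ["r1"], "targets": ["r2"]}))  # -> 'P0' (follows r1)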
  • Patent number: 9851971
    Abstract: An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order.
    Type: Grant
    Filed: January 11, 2017
    Date of Patent: December 26, 2017
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 9836215
    Abstract: A storage device may include a memory; and a controller. The controller may be configured to: maintain, for each respective type of command segment of a plurality of types of command segments, a respective timer of a plurality of timers that indicates an amount of time elapsed since the controller last output a command segment of the respective type of command segments to the memory; determine, based on the plurality of timers, whether a command segment of a particular type of command segment of the plurality of types of command segments may be output to the memory at a current time; and responsive to determining that the command segment of the particular type of command segment may be output at the current time, permit output of the command segment of the particular type of command segment to the memory at the current time.
    Type: Grant
    Filed: November 19, 2014
    Date of Patent: December 5, 2017
    Assignee: WESTERN DIGITAL TECHNOLOGIES, INC.
    Inventor: Brian T. Nowak
  • Patent number: 9830185
    Abstract: In a multi-processor transaction execution environment, a transaction executes a hint instruction indicating proximity to completion of the transaction. Pending aborts of the transaction due to memory conflicts are suppressed based on the proximity of the transaction to completion.
    Type: Grant
    Filed: August 21, 2015
    Date of Patent: November 28, 2017
    Assignee: International Business Machines Corporation
    Inventors: Jonathan D. Bradbury, Dan F. Greiner, Michael Karl Gschwind, Maged M. Michael, Chung-Lung K. Shum
  • Patent number: 9798375
    Abstract: In some embodiments, a system includes a plurality of processor cores and a credit distribution circuit. The credit distribution circuit is configured to provide credits to the processor cores. A quantity of the provided credits is based on a total credit budget and requests for additional credits corresponding to the processor cores. The total credit budget is based on an amount of energy available to the processor cores (e.g., made available by a power supply) during a particular window of time. A particular processor core is configured to determine, based on a remaining number of credits for the particular processor core, whether to perform one or more pipeline operations. The particular processor core is further configured to deduct, based on determining to perform the one or more pipeline operations, one or more credits from a remaining quantity of credits allocated to the particular processor core.
    Type: Grant
    Filed: January 5, 2016
    Date of Patent: October 24, 2017
    Assignee: Apple Inc.
    Inventor: Daniel U. Becker
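    Illustrative sketch: a Python model of handing out a per-window credit budget and gating pipeline operations on the credits a core has left. The budget size, per-operation cost, and grant order are assumed values, not details from the patent.
      def distribute_credits(total_budget, requests):
          # Grant requests in order until the window's energy-derived budget is spent.
          grants, remaining = {}, total_budget
          for core, amount in requests.items():
              grants[core] = min(amount, remaining)
              remaining -= grants[core]
          return grants

      def try_pipeline_op(credits, core, cost=1):
          if credits[core] >= cost:
              credits[core] -= cost     # deduct before performing the operation
              return True
          return False                  # stall: not enough credits this window

      credits = distribute_credits(total_budget=10, requests={"c0": 6, "c1": 8})
      print(credits)                    # -> {'c0': 6, 'c1': 4}
      print(try_pipeline_op(credits, "c1"))   # -> True; c1 now holds 3 credits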
  • Patent number: 9779696
    Abstract: Methods and systems may provide for identifying a plurality of subject commands that reference a common screen location and access a read/write resource, and serializing the plurality of subject commands according to a predefined order. Additionally, execution of the plurality of subject commands may be deferred until one or more additional commands referencing the common screen location are executed. In one example, the plurality of subject commands are serialized in response to a serialization command.
    Type: Grant
    Filed: December 11, 2013
    Date of Patent: October 3, 2017
    Assignee: Intel Corporation
    Inventors: Tomasz Janczak, Aaron Lefohn, Marco Salvi, Larry Seiler
  • Patent number: 9740271
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction comprises a load instruction resulting from execution of an x86 special bus cycle.
    Type: Grant
    Filed: December 14, 2014
    Date of Patent: August 22, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 9710306
    Abstract: Systems and methods for auto-throttling encapsulated compute tasks. A device driver may configure a parallel processor to execute compute tasks in a number of discrete throttled modes. The device driver may also allocate memory to a plurality of different processing units in a non-throttled mode. The device driver may also allocate memory to a subset of the plurality of processing units in each of the throttling modes. Data structures defined for each task include a flag that instructs the processing unit whether the task may be executed in the non-throttled mode or in the throttled mode. A work distribution unit monitors each of the tasks scheduled to run on the plurality of processing units and determines whether the processor should be configured to run in the throttled mode or in the non-throttled mode.
    Type: Grant
    Filed: April 9, 2012
    Date of Patent: July 18, 2017
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Jesse David Hall, Philip Alexander Cuadra, Karim M. Abdalla
  • Patent number: 9703359
    Abstract: An apparatus includes a first reservation station and a second reservation station. The first reservation station dispatches a first load micro instruction, and detects and indicates on a hold bus if the first load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory.
    Type: Grant
    Filed: December 14, 2014
    Date of Patent: July 11, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 9697002
    Abstract: An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction specified last-use by an instruction, wherein the architected operands are made active by performing a write operation to an inactive operand, wherein the activation/deactivation may be performed by the instruction having the last-use of the operand or another (prefix) instruction.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: July 4, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K Gschwind, Valentina Salapura
  • Patent number: 9697580
    Abstract: This disclosure describes an apparatus configured to process graphics data. The apparatus may include a fixed hardware pipeline configured to execute one or more functions on a current set of graphics data. The fixed hardware pipeline may include a plurality of stages including a bypassable portion of the plurality of stages. The apparatus may further include a shortcut circuit configured to route the current set of graphics data around the bypassable portion of the plurality of stages, and a controller positioned before the bypassable portion of the plurality of stages, the controller configured to selectively route the current set of graphics data to one of the shortcut circuit or the bypassable portion of the plurality of stages.
    Type: Grant
    Filed: November 10, 2014
    Date of Patent: July 4, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Liang Li, Andrew Evan Gruber, Guofang Jiao, Zhenyu Qi, Gregory Steve Pitarys, Scott William Nolan
  • Patent number: 9690582
    Abstract: A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addresses. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation of whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: June 27, 2017
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Youfeng Wu, Cheng Wang, Sara Baghsorkhi, Albert Hartono
  • Patent number: 9665970
    Abstract: Aspects include, for example, a method for interpreting information in a computer program, or profiling such a program to estimate a group size for instances of that program (program module, or portion thereof). Such a method can be used in a system that supports collecting outputs of executing instances, where those outputs can specify new program instances. Scheduling of new instances (or allocation of resources for executing such instances) can be deferred. A trigger to begin scheduling (or allocation) for a collection of instances uses a target group size for that program. Thus, different programs can have different group sizes, which can be set explicitly, or based on profiling. The profiling can occur during one or more of pre-execution and during execution. The group size estimate can be an input into an algorithm that also accounts for system state during execution.
    Type: Grant
    Filed: February 8, 2012
    Date of Patent: May 30, 2017
    Assignee: Imagination Technologies Limited
    Inventors: Jason Rupert Redgrave, Steven John Clohset, James Alexander McCombe, Luke Tilman Peterson
  • Patent number: 9665375
    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: May 30, 2017
    Assignee: Oracle International Corporation
    Inventors: Yuan C. Chou, Robert T. Golla, Mark A. Luttrell
  • Patent number: 9632785
    Abstract: Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result.
    Type: Grant
    Filed: August 10, 2016
    Date of Patent: April 25, 2017
    Assignee: Apple Inc.
    Inventors: James S. Blomgren, Terence M. Potter
  • Patent number: 9626189
    Abstract: Embodiments relate to reducing operand store compare penalties by detecting potential unit of operation (UOP) dependencies. An aspect includes a computer system for reducing operand store compare penalties. The system includes memory and a processor. The system performs a method including cracking an instruction into a plurality of units of operation, where each UOP includes instruction text and address determination fields. The method includes identifying a load UOP among the plurality of UOPs and comparing values of the address determination fields of the load UOP with values of address determination fields of one or more previously-decoded store UOPs. The method also includes forcing, prior to issuance of the instruction to an execution unit, a dependency between the load UOP and the one or more previously-decoded store UOPs based on the comparing. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: April 18, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fadi Busaba, David Hutton, John G. Rell, Jr., Chung-Lung K. Shum
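    Illustrative sketch: a Python model of comparing the address-determination fields of a load UOP against previously decoded store UOPs and forcing a dependency on a match. The field names base, index, and disp follow the abstract; everything else is an assumption.
      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class Uop:
          uid: int
          kind: str      # 'load' or 'store'
          base: int      # base register number
          index: int     # index register number
          disp: int      # displacement
          depends_on: List[int] = field(default_factory=list)

      def force_osc_dependencies(load, prior_stores):
          for store in prior_stores:
              if (load.base == store.base and load.index == store.index
                      and load.disp == store.disp):
                  load.depends_on.append(store.uid)   # issue only after this store
          return load

      st = Uop(uid=1, kind="store", base=3, index=0, disp=16)
      ld = Uop(uid=2, kind="load",  base=3, index=0, disp=16)
      print(force_osc_dependencies(ld, [st]).depends_on)   # -> [1]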
  • Patent number: 9612836
    Abstract: A system, method, and computer program product are provided for implementing a software-based scoreboarding mechanism. The method includes the steps of receiving a dependency barrier instruction that includes an immediate value and an identifier corresponding to a first register and, based on a comparison of the immediate value to the value stored in the first register, dispatching a subsequent instruction to at least a first processing unit of two or more processing units.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: April 4, 2017
    Assignee: NVIDIA Corporation
    Inventors: Robert Ohannessian, Jr., Michael Alan Fetterman, Olivier Giroux, Jack H. Choquette, Xiaogang Qiu, Shirish Gadre, Meenaradchagan Vishnu
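    Illustrative sketch: a Python model of a dependency-barrier check, dispatching the subsequent instruction only once a scoreboard register has reached the barrier's immediate value. The greater-or-equal comparison and the register names are assumptions.
      registers = {"SB0": 0}           # assumed software scoreboard register

      def dependency_barrier(regs, reg_id, immediate):
          # True when the barrier is satisfied and the subsequent instruction
          # may be dispatched to a processing unit.
          return regs[reg_id] >= immediate

      def producer_done(regs, reg_id):
          regs[reg_id] += 1            # a producer signals completion

      print(dependency_barrier(registers, "SB0", 2))   # -> False, keep waiting
      producer_done(registers, "SB0")
      producer_done(registers, "SB0")
      print(dependency_barrier(registers, "SB0", 2))   # -> True, dispatch now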
  • Patent number: 9606805
    Abstract: Technical solutions are described for dynamically managing an operand-store-compare (OSC) prediction table for load and store operations executed out-of-order. One general aspect includes a method that includes receiving a request to retire a queue entry corresponding to an instruction. The method also includes identifying an OSC prediction for the instruction based on an OSC prediction table entry, where the OSC prediction indicates if the instruction is predicted to hit an OSC hazard. The method also includes determining if the instruction hit the OSC hazard. The method also includes, in response to the OSC prediction indicating that the instruction is predicted to hit the OSC hazard and the instruction not hitting the OSC hazard, invalidating the OSC prediction table entry corresponding to the instruction. The present document further describes examples of other aspects, such as methods and computer program products. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: March 28, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Khary J. Alexander, Jane H. Bartik, Jatin Bhartia, James J. Bonanno, Adam B. Collura, Jang-Soo Lee, James R. Mitchell, Anthony Saporito
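    Illustrative sketch: a Python model of invalidating an OSC prediction table entry at retirement when the predicted hazard did not occur. The table layout and lookup key are assumptions.
      osc_prediction_table = {}   # instruction address -> {'predict_hazard': ..., 'valid': ...}

      def retire(instr_addr, hit_osc_hazard):
          entry = osc_prediction_table.get(instr_addr)
          if entry is None or not entry["valid"]:
              return
          if entry["predict_hazard"] and not hit_osc_hazard:
              entry["valid"] = False   # stale prediction: stop delaying this load

      osc_prediction_table[0x200] = {"predict_hazard": True, "valid": True}
      retire(0x200, hit_osc_hazard=False)
      print(osc_prediction_table[0x200]["valid"])      # -> False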
  • Patent number: 9606800
    Abstract: A microprocessor includes a front end module and a schedule queue module. The front end module is configured to retrieve first instructions, corresponding to a first thread, from an instruction cache, and retrieve second instructions, corresponding to a second thread, from the instruction cache. The front end module is also configured to decode the first instructions into first decoded instructions, and decode the second instructions into second decoded instructions. The schedule queue module is configured to selectively store the first decoded instructions and the second decoded instructions from the front end module and, for each stored decoded instruction, selectively issue the stored decoded instruction to an execution module. The schedule queue is further configured to reject storing an additional one of the first decoded instructions from the front end module in response to a count of the stored first decoded instructions in the schedule queue module exceeding a threshold.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: March 28, 2017
    Assignee: Marvell International Ltd.
    Inventors: Tom Hameenanttila, R. Frank O'Bleness, Sujat Jamil, Joseph Delgross
  • Patent number: 9600286
    Abstract: An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order.
    Type: Grant
    Filed: June 30, 2014
    Date of Patent: March 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 9600287
    Abstract: An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order.
    Type: Grant
    Filed: August 19, 2015
    Date of Patent: March 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
  • Patent number: 9588769
    Abstract: A processor performs out-of-order execution of a first instruction and a second instruction that follows the first instruction in program order. The first instruction includes source and destination indicators: the source indicator specifies a source of data, the destination indicator specifies a destination for the data, and the first instruction instructs the processor to move the data from the source to the destination. The second instruction specifies a source indicator that specifies a source of data. A rename unit updates the second instruction's source indicator with the first instruction's source indicator if there are no intervening instructions that write to the source or to the destination of the first instruction and the second instruction's source indicator matches the first instruction's destination indicator. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: June 25, 2014
    Date of Patent: March 7, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Gerard M. Col, Matthew Daniel Day
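    Illustrative sketch: a Python model of the rename-time move elimination described above, redirecting a consumer's source to the move's source when the consumer still names the move's destination and no intervening instruction wrote either register. The record layout is an assumption.
      def try_move_elimination(move, consumer, intervening_writes):
          # intervening_writes: destination registers written between the two instructions.
          wrote_either = any(w in (move["src"], move["dst"]) for w in intervening_writes)
          if not wrote_either and consumer["src"] == move["dst"]:
              consumer["src"] = move["src"]   # read the data at its origin instead
          return consumer

      move = {"src": "r1", "dst": "r2"}       # first instruction: move r1 -> r2
      consumer = {"src": "r2"}                # second instruction reads r2
      print(try_move_elimination(move, consumer, intervening_writes=[]))   # -> {'src': 'r1'}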
  • Patent number: 9576072
    Abstract: Disclosed herein are technologies related to database calculation that utilize parallel computation of tasks in a directed acyclic graph. In accordance with one aspect, the dependencies among tasks are converted into a directed acyclic graph that topologically orders the tasks into layers. A database calculation may then be performed that computes the tasks within each layer in parallel. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: February 13, 2014
    Date of Patent: February 21, 2017
    Assignee: SAP SE
    Inventors: Jing Gu, Jie Zhao, Xiangling Shi, Chengchang Wang, Yi Ru, Gan Li, Jiale Qu, Xu Li, Zhonglei Zou
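    Illustrative sketch: ordering dependent tasks into topological layers and computing each layer's tasks in parallel, in Python. The task contents and the thread-pool execution are assumptions; the patent concerns the layering and parallel evaluation, not this particular runtime.
      from concurrent.futures import ThreadPoolExecutor

      def topological_layers(deps):
          # deps: task -> set of tasks it depends on; returns a list of layers.
          remaining = {t: set(d) for t, d in deps.items()}
          layers = []
          while remaining:
              ready = [t for t, d in remaining.items() if not d]
              layers.append(ready)
              for t in ready:
                  del remaining[t]
              for d in remaining.values():
                  d.difference_update(ready)
          return layers

      deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
      for layer in topological_layers(deps):
          with ThreadPoolExecutor() as pool:   # tasks within a layer run in parallel
              list(pool.map(lambda task: print("computed", task), layer))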
  • Patent number: 9563428
    Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
    Type: Grant
    Filed: March 26, 2015
    Date of Patent: February 7, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
  • Patent number: 9563476
    Abstract: Methods and systems are provided that reduce the number of instances of a shared resource needed for a processor to perform an operation and/or execute a process without impacting function. In one aspect, a method of processing in a processor is provided. Aspects include determining that an operation to be performed by the processor will require the use of a shared resource. A command can be issued to cause a second operation to not use the shared resource N cycles later. The shared resource can then be used for a first aspect of the operation at cycle X and for a second aspect of the operation at cycle X+N. The second operation may be rescheduled according to embodiments. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: August 27, 2015
    Date of Patent: February 7, 2017
    Assignee: Imagination Technologies, LLC
    Inventor: Debasish Chandra
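    Illustrative sketch: a Python model of reserving one shared resource for two aspects of the same operation N cycles apart, and slipping a colliding second operation to a later cycle. The cycle bookkeeping is an assumption made for illustration.
      def reserve(busy_cycles, wanted_cycle):
          # Take the first free cycle at or after the requested one.
          cycle = wanted_cycle
          while cycle in busy_cycles:
              cycle += 1
          busy_cycles.add(cycle)
          return cycle

      busy_cycles, N = set(), 3
      first = reserve(busy_cycles, wanted_cycle=5)        # first aspect at cycle X
      second = reserve(busy_cycles, wanted_cycle=5 + N)   # second aspect at cycle X+N
      other = reserve(busy_cycles, wanted_cycle=8)        # another operation wants X+N
      print(first, second, other)                         # -> 5 8 9 (the other op slips)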
  • Patent number: 9552196
    Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
    Type: Grant
    Filed: June 19, 2015
    Date of Patent: January 24, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
  • Patent number: 9552169
    Abstract: A method and apparatus are described for efficient memory renaming prediction using virtual registers. For example, one embodiment of an apparatus comprises: a memory execution unit (MEU) to perform store and load operations to store data to memory and load data from memory, respectively; a plurality of memory rename (MRN) registers assigned to store and load operations, each MRN register to store data associated with a store operation so that the data is available for a subsequent load operation; and at least one MRN predictor comprising a data structure to allocate virtual memory rename (VMRN) registers to each of the MRN registers, the MRN predictor to query the data structure in response to a load and/or store operation using a value identifying the MRN register assigned to the load and/or store operation, respectively, to determine a current VMRN register associated with the load and/or store operation.
    Type: Grant
    Filed: May 7, 2015
    Date of Patent: January 24, 2017
    Assignee: Intel Corporation
    Inventors: Lihu Rappoport, Jared W. Stark, Kamil Garifullin, Franck Sala, Pavel I. Kryukov, Stanislav Shwartsman
  • Patent number: 9542192
    Abstract: A method for executing an application program using streams. A device driver receives a first command within an application program and parses the first command to identify a first stream token that is associated with a first stream. The device driver checks a memory location associated with the first stream for a first semaphore, and determines whether the first semaphore has been released. Once the first semaphore has been released, a second command within the application program is executed. Advantageously, embodiments of the invention provide a technique for developers to take advantage of the parallel execution capabilities of a GPU.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: January 10, 2017
    Assignee: NVIDIA Corporation
    Inventors: Nicholas Patrick Wilt, Ian Buck, Philip Cuadra
  • Patent number: 9535695
    Abstract: Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.
    Type: Grant
    Filed: January 25, 2013
    Date of Patent: January 3, 2017
    Assignee: Apple Inc.
    Inventors: John H. Mylius, Rajat Goel, Pradeep Kanapathipillai, Hari S. Kannan
  • Patent number: 9524165
    Abstract: Embodiments relate to register comparison for operand store compare (OSC) prediction. An aspect includes, for each instruction in an instruction group of a processor pipeline: determining a base register value of the instruction; determining an index register value of the instruction; and determining a displacement of the instruction. Another aspect includes comparing the base register value, index register value, and displacement of each instruction in the instruction group to the base register value, index register value, and displacement of all other instructions in the instruction group. Another aspect includes, based on the comparison, determining that a load instruction of the instruction group has a probable OSC conflict with a store instruction of the instruction group. Yet another aspect includes delaying the load instruction based on the determined probable OSC conflict. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: April 19, 2016
    Date of Patent: December 20, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Hutton, Wen Li, Eric Schwarz
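    Illustrative sketch: a Python model of comparing the base register, index register, and displacement of every instruction pair in a decode group and delaying a load that probably conflicts with an older store in the same group. The group representation is an assumption.
      def find_probable_osc(group):
          delayed = []
          for i, instr in enumerate(group):
              if instr["op"] != "load":
                  continue
              for older in group[:i]:
                  if (older["op"] == "store" and
                          older["base"] == instr["base"] and
                          older["index"] == instr["index"] and
                          older["disp"] == instr["disp"]):
                      delayed.append(instr["id"])   # hold this load back
                      break
          return delayed

      group = [{"id": 0, "op": "store", "base": 5, "index": 0, "disp": 8},
               {"id": 1, "op": "load",  "base": 5, "index": 0, "disp": 8},
               {"id": 2, "op": "load",  "base": 6, "index": 0, "disp": 0}]
      print(find_probable_osc(group))               # -> [1]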
  • Patent number: 9519482
    Abstract: A pipelined run-to-completion processor can decode three instructions in three consecutive clock cycles, and can also execute the instructions in three consecutive clock cycles. The first instruction causes the ALU to generate a value which is then loaded due to execution of the first instruction into a register of a register file. The second instruction accesses the register and loads the value into predicate bits in a register file read stage. The predicate bits are loaded in the very next clock cycle following the clock cycle in which the second instruction was decoded. The third instruction is a conditional instruction that uses the values of the predicate bits as a predicate code to determine a predicate function. If a predicate condition (as determined by the predicate function as applied to flags) is true then an instruction operation of the third instruction is carried out, otherwise it is not carried out.
    Type: Grant
    Filed: June 20, 2014
    Date of Patent: December 13, 2016
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 9501286
    Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: November 22, 2016
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Patent number: 9495156
    Abstract: Technical solutions are described for dynamically managing an operand-store-compare (OSC) prediction table for load and store operations executed out-of-order. One general aspect includes a method that includes receiving a request to retire a queue entry corresponding to an instruction. The method also includes identifying an OSC prediction for the instruction based on an OSC prediction table entry, where the OSC prediction indicates if the instruction is predicted to hit an OSC hazard. The method also includes determining if the instruction hit the OSC hazard. The method also includes, in response to the OSC prediction indicating that the instruction is predicted to hit the OSC hazard and the instruction not hitting the OSC hazard, invalidating the OSC prediction table entry corresponding to the instruction. The present document further describes examples of other aspects, such as methods and computer program products.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: November 15, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Khary J. Alexander, Jane H. Bartik, Jatin Bhartia, James J. Bonanno, Adam B. Collura, Jang-Soo Lee, James R. Mitchell, Anthony Saporito
  • Patent number: 9489207
    Abstract: Mechanisms are provided for partial flush handling with multiple branches per instruction group. The instruction fetch unit sorts instructions into groups. A group may include a floating branch instruction and a boundary branch instruction. For each group of instructions, the instruction sequencing unit creates an entry in a global completion table (GCT), which may also be referred to herein as a group completion table. The instruction sequencing unit uses the GCT to manage completion of instructions within each outstanding group. Because each group may include up to two branches, the instruction sequencing unit may dispatch instructions beyond the first branch, i.e. the floating branch. Therefore, if the floating branch results in a misprediction, the processor performs a partial flush of that group, as well as a flush of every group younger than that group.
    Type: Grant
    Filed: April 14, 2009
    Date of Patent: November 8, 2016
    Assignee: International Business Machines Corporation
    Inventors: William E. Burky, Brian R. Mestan, Dung Q. Nguyen, Balaram Sinharoy, Benjamin W. Stolt
  • Patent number: 9477476
    Abstract: Fusing immediate value, write-based instructions in instruction processing circuits, and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction indicating an operation writing an immediate value to a register is detected by an instruction processing circuit. The circuit also detects at least one subsequent instruction indicating an operation that overwrites at least one first portion of the register while maintaining a value of a second portion of the register. The at least one subsequent instruction is converted (or replaced) with a fused instruction(s), which indicates an operation writing the at least one first portion and the second portion of the register.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: October 25, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine, Rodney Wayne Smith, Jeffrey M. Schottmiller, Andrew S. Irwin
  • Patent number: 9471317
    Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to determine, based on a field of a first instruction, a number of additional instructions to execute in conjunction with the first instruction and prior to execution of the first instruction.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: October 18, 2016
    Assignee: TEXAS INSTRUMENTS DEUTSCHLAND GMBH
    Inventors: Horst Diewald, Johann Zipperer