Instruction Issuing Patents (Class 712/214)
  • Patent number: 8832464
    Abstract: A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: September 9, 2014
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla
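    Illustration: a minimal Python sketch of the behavior described in the 8832464 abstract above: a hash instruction pulls its input blocks from a predetermined set of architectural registers and produces a digest for the algorithm the opcode names. The register names, opcode table, and use of hashlib are illustrative assumptions, not details from the patent.
      import hashlib

      # Hypothetical architectural register file: register name -> 64-byte data block.
      REGS = {f"f{i}": bytes([i]) * 64 for i in range(8)}

      # Hypothetical mapping from hash opcodes to standard algorithms.
      OPCODES = {"SHA1": hashlib.sha1, "SHA256": hashlib.sha256, "MD5": hashlib.md5}

      def execute_hash_instruction(opcode, block_regs):
          """Model of a cryptographic unit handling one ISA-defined hash instruction:
          fetch the input blocks from predetermined registers, then hash them with
          the algorithm that corresponds to the opcode."""
          h = OPCODES[opcode]()
          for reg in block_regs:                # predetermined architectural registers
              h.update(REGS[reg])
          return h.hexdigest()

      print(execute_hash_instruction("SHA256", ["f0", "f1", "f2", "f3"]))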
  • Patent number: 8825988
    Abstract: The present invention provides a method and apparatus for implementing a matrix algorithm for scheduling instructions. One embodiment of the method includes selecting a first subset of instructions so that each instruction in the first subset is the earliest in program order of instructions associated with a corresponding one of a plurality of sub-matrices of a matrix that has a plurality of matrix entries. Each matrix entry indicates the program order of one pair of instructions that are eligible for execution. This embodiment also includes selecting, from the first subset of instructions, the instruction that is earliest in program order based on matrix entries associated with the first subset of instructions.
    Type: Grant
    Filed: November 12, 2010
    Date of Patent: September 2, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jeff Rupley, Rajagopalan Desikan
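    Illustration: a small Python sketch of the two-stage pick in the 8825988 abstract above, assuming an age matrix where entry (i, j) is True when instruction i entered program order before instruction j; the modulo sub-matrix partition and data layout are simplifying assumptions.
      def pick_instruction(eligible, older, num_submatrices):
          """Two-stage pick: (1) from each sub-matrix choose its oldest eligible
          instruction, (2) from those winners choose the oldest overall."""
          def oldest(group):
              # The oldest entry has no older entry within the group.
              return min(group, key=lambda i: sum(older[j][i] for j in group))
          winners = []
          for s in range(num_submatrices):
              group = [i for i in eligible if i % num_submatrices == s]  # assumed partition
              if group:
                  winners.append(oldest(group))
          return oldest(winners) if winners else None

      # older[i][j] == True when instruction i is earlier in program order than j.
      n = 6
      older = [[i < j for j in range(n)] for i in range(n)]
      print(pick_instruction([5, 2, 4, 1], older, num_submatrices=2))  # -> 1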
  • Patent number: 8813080
    Abstract: In some embodiments, the invention involves a system and method to enhance an operating system's ability to schedule ready threads, specifically to select a logical processor on which to run the ready thread, based on platform policy. Platform policy may be performance-centric, power-centric, or a balance of the two. Embodiments of the present invention use temporal characteristics of the system utilization, or workload, and/or temporal characteristics of the ready thread in choosing a logical processor. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 28, 2007
    Date of Patent: August 19, 2014
    Assignee: Intel Corporation
    Inventors: Russell J. Fenger, Leena K. Puthiyedath
  • Patent number: 8806180
    Abstract: A scheduler in a process of a computer system detects a task with an associated execution context that has not been previously invoked by the scheduler. The scheduler executes the task on a processing resource without performing a context switch if the processing resource executed a previous task to completion. The scheduler stores the execution context originally associated with the task for later use.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: August 12, 2014
    Assignee: Microsoft Corporation
    Inventors: Paul F. Ringseth, Genevieve Fernandes
  • Patent number: 8806253
    Abstract: A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decoder; an execution unit receiving and sending signals from and to the instruction scheduling unit; and a state machine. The method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining if the number of instructions per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine.
    Type: Grant
    Filed: August 8, 2012
    Date of Patent: August 12, 2014
    Assignee: International Business Machines Corporation
    Inventors: Tim Niggemeier, Harry Barowski, Maarten Boersma, Gunnar Spiess
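    Illustration: a Python sketch of the gating condition in the 8806253 abstract above, assuming each issued instruction is modeled as a (destination, source-set) pair; the threshold value and the independence test are illustrative, not taken from the patent.
      def independent(a, b):
          """Two issued instructions are independent if neither reads the other's
          destination register and they do not write the same register."""
          dst_a, srcs_a = a
          dst_b, srcs_b = b
          return dst_a != dst_b and dst_a not in srcs_b and dst_b not in srcs_a

      def should_power_gate(issued, ipc, threshold=2):
          """Gate idle macros only when the issue rate is below the threshold AND
          at least two of the issued instructions are independent of each other."""
          if ipc >= threshold:
              return False
          return any(independent(issued[i], issued[j])
                     for i in range(len(issued)) for j in range(i + 1, len(issued)))

      insns = [("r1", {"r2", "r3"}), ("r4", {"r5"})]
      print(should_power_gate(insns, ipc=1))   # True: low IPC and an independent pair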
  • Publication number: 20140223144
    Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
    Type: Application
    Filed: March 5, 2013
    Publication date: August 7, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
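    Illustration (applies equally to the identical abstract of 20140223143 below): a small Python simulation of the wakeup protocol described above, assuming a cycle-driven model in which the instruction wakeup bus is represented as a list of (cycle, event) pairs; the signal names are illustrative.
      def simulate_load(expected_latency, actual_latency):
          """Broadcast a wakeup for dependents at issue time, then check at the
          expected-completion cycle; if the load is still outstanding, broadcast a
          negative wakeup so dependents are held again until real completion."""
          bus = [(0, "WAKEUP: load expected to complete at cycle %d" % expected_latency)]
          if actual_latency > expected_latency:
              bus.append((expected_latency, "NEGATIVE WAKEUP: load not complete"))
              bus.append((actual_latency, "WAKEUP: load complete"))
          else:
              bus.append((expected_latency, "load complete as predicted"))
          return bus

      for cycle, event in simulate_load(expected_latency=4, actual_latency=9):
          print("cycle %2d  %s" % (cycle, event))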
  • Publication number: 20140223143
    Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
    Type: Application
    Filed: February 6, 2013
    Publication date: August 7, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
  • Patent number: 8788793
    Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: July 22, 2014
    Assignee: Panasonic Corporation
    Inventor: Hiroyuki Morishita
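    Illustration: a Python sketch of the extraction step in the 8788793 abstract above, assuming the instruction buffer is a flat list of M×Z slots, the order information is a list of slot indices, and "executable state" is a boolean per slot; all names and the layout are illustrative.
      def extract(buffer, order, executable, L):
          """Walk the slots in the order given by the order information and pull
          the first L instructions in executable state, assigning each to a
          different computing unit (returned as unit index -> instruction)."""
          picked = {}
          for slot in order:
              if executable[slot] and buffer[slot] is not None:
                  picked[len(picked)] = buffer[slot]      # next free computing unit
                  if len(picked) == L:
                      break
          return picked

      buf = ["i0", "i1", "i2", "i3", "i4", "i5"]          # M=2 streams, Z=3 each
      order = [0, 3, 1, 4, 2, 5]                          # interleave the two streams
      ready = [True, False, True, True, True, False]
      print(extract(buf, order, ready, L=3))              # {0: 'i0', 1: 'i3', 2: 'i4'}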
  • Publication number: 20140201501
    Abstract: Embodiments of the invention relate to dynamically routing instructions to execution units based on detected errors in the execution units. An aspect of the invention includes a computer system including a processor having an instruction issue unit and a plurality of execution units. The processor is configured to detect an error in a first execution unit among the plurality of execution units and adjust instruction dispatch rules of the instruction issue unit based on detecting the error in the first execution unit to restrict access to the first execution unit while leaving un-restricted access to the remaining execution units of the plurality of execution units.
    Type: Application
    Filed: January 15, 2013
    Publication date: July 17, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
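    Illustration: a Python sketch of the dispatch-rule adjustment in the 20140201501 abstract above, assuming execution units are keyed by name and an error report simply removes a unit from the eligible set while leaving the others unrestricted; the class layout is illustrative.
      class IssueUnit:
          def __init__(self, units):
              self.units = set(units)          # all execution units
              self.restricted = set()          # units with detected errors

          def report_error(self, unit):
              """Adjust dispatch rules: restrict the faulty unit, leave the rest open."""
              self.restricted.add(unit)

          def dispatch(self, instruction):
              eligible = sorted(self.units - self.restricted)
              if not eligible:
                  raise RuntimeError("no healthy execution unit available")
              return eligible[hash(instruction) % len(eligible)]

      iu = IssueUnit(["fx0", "fx1", "fx2"])
      iu.report_error("fx1")
      print(iu.dispatch("add r1, r2, r3"))     # routed to fx0 or fx2, never fx1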
  • Patent number: 8782434
    Abstract: A pipelined processor comprising a cache memory system, fetching instructions for execution from a portion of said cache memory system, an instruction commencing processing before a digital signature of the cache line that contained the instruction is verified against a reference signature of the cache line, the verification being done at the point of decoding, dispatching, or committing execution of the instruction, the reference signature being stored in an encrypted form in the processor's memory, and the key for decrypting the said reference signature being stored in a secure storage location. Instruction processing proceeds when the two signatures exactly match, and further instruction processing is suspended or modified on a mismatch of the two signatures.
    Type: Grant
    Filed: July 15, 2011
    Date of Patent: July 15, 2014
    Assignee: The Research Foundation for the State University of New York
    Inventor: Kanad Ghose
  • Patent number: 8782435
    Abstract: A processor comprising: an instruction processing pipeline, configured to receive a sequence of instructions for execution, said sequence comprising at least one instruction including a flow control instruction which terminates the sequence; a hash generator, configured to generate a hash associated with execution of the sequence of instructions; a memory configured to securely receive a reference signature corresponding to a hash of a verified corresponding sequence of instructions; verification logic configured to determine a correspondence between the hash and the reference signature; and authorization logic configured to selectively produce a signal, in dependence on a degree of correspondence of the hash with the reference signature.
    Type: Grant
    Filed: July 15, 2011
    Date of Patent: July 15, 2014
    Assignee: The Research Foundation for The State University of New York
    Inventor: Kanad Ghose
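    Illustration: a Python sketch of the check described in the 8782435 abstract above, assuming the hash generator is SHA-256 over the encoded instruction sequence and the reference signature is already decrypted; the key handling and the authorization signal are deliberately simplified.
      import hashlib, hmac

      def sequence_hash(instructions):
          """Hash generator: accumulate a digest over the executed sequence,
          which ends with a flow-control instruction."""
          h = hashlib.sha256()
          for insn in instructions:
              h.update(insn.encode())
          return h.digest()

      def authorize(instructions, reference_signature):
          """Verification logic: compare the computed hash with the reference
          signature and produce an allow/deny signal."""
          return hmac.compare_digest(sequence_hash(instructions), reference_signature)

      trusted = ["load r1", "add r1, r1, r2", "branch done"]
      ref = sequence_hash(trusted)                  # provisioned from a verified copy
      print(authorize(trusted, ref))                                       # True
      print(authorize(["load r1", "xor r1, r1, r1", "branch done"], ref))  # False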
  • Patent number: 8775777
    Abstract: Sourcing immediate values from a very long instruction word includes determining if a VLIW sub-instruction expansion condition exists. If the sub-instruction expansion condition exists, operation of a portion of a first arithmetic logic unit component is minimized. In addition, a part of a second arithmetic logic unit component is expanded by utilizing a block of a very long instruction word, which is normally utilized by the first arithmetic logic unit component, for the second arithmetic logic unit component if the sub-instruction expansion condition exists.
    Type: Grant
    Filed: August 15, 2007
    Date of Patent: July 8, 2014
    Assignee: NVIDIA Corporation
    Inventors: Tyson J. Bergland, Craig M. Okruhlica, Michael J. M. Toksvig, Justin M. Mahan, Edward A. Hutchins
  • Publication number: 20140189312
    Abstract: Embodiments of the present invention may include a data processing system comprising a processing execution block to execute instructions stored in an instruction queue, a programmable hardware accelerator, and a controller programmed to monitor the instruction queue to detect a first type of instructions stored in the instruction queue, reprogram the programmable hardware accelerator to execute the first type of instructions, and transmit the first type of instructions to the programmable hardware accelerator to be executed.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventor: Kia Leong TAN
  • Publication number: 20140189313
    Abstract: Various embodiments of microprocessors and methods of operating a microprocessor during runahead operation are disclosed herein. One example method of operating a microprocessor includes identifying a runahead-triggering event associated with a runahead-triggering instruction and, responsive to identification of the runahead-triggering event, entering runahead operation and inserting the runahead-triggering instruction along with one or more additional instructions in a queue. The example method also includes resuming non-runahead operation of the microprocessor in response to resolution of the runahead-triggering event and re-dispatching the runahead-triggering instruction along with the one or more additional instructions from the queue to the execution logic.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman, Aravindh Baktha, David Dunn
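    Illustration: a Python sketch of the runahead queue described in the 20140189313 abstract above, assuming the triggering instruction and the instructions dispatched after it are parked in a simple deque and replayed on resolution; the class and method names are illustrative.
      from collections import deque

      class RunaheadUnit:
          def __init__(self):
              self.replay = deque()
              self.runahead = False

          def dispatch(self, insn, triggers_runahead=False):
              """On a runahead-triggering event, enter runahead and park the
              triggering instruction plus every later instruction in the queue."""
              if triggers_runahead and not self.runahead:
                  self.runahead = True
              if self.runahead:
                  self.replay.append(insn)     # keep for re-dispatch after resolution
              return "runahead-exec " + insn if self.runahead else "exec " + insn

          def resolve(self):
              """The triggering event resolved: resume non-runahead operation and
              re-dispatch the parked instructions to the execution logic."""
              self.runahead = False
              redispatched = list(self.replay)
              self.replay.clear()
              return redispatched

      ru = RunaheadUnit()
      ru.dispatch("load r1, [r2]", triggers_runahead=True)   # e.g. a long-latency miss
      ru.dispatch("add r3, r1, r4")
      print(ru.resolve())   # ['load r1, [r2]', 'add r3, r1, r4'] re-dispatched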
  • Patent number: 8769539
    Abstract: A method and apparatus are provided to control the order of execution of load and store operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. One embodiment of the method includes determining whether a first group, comprising at least one or more instructions, is to be selected from a scheduling queue of a processor for execution using either a first execution mode or a second execution mode. The method also includes, responsive to determining that the first group is to be selected for execution using the second execution mode, preventing selection of the first group until a second group, comprising at least one or more instructions, that entered the scheduling queue prior to the first group is selected for execution.
    Type: Grant
    Filed: November 16, 2010
    Date of Patent: July 1, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Daniel Hopper, Suzanne Plummer, Christopher D. Bryant
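    Illustration: a Python sketch of the ordering rule in the 8769539 abstract above, assuming each group carries an execution mode and an arrival sequence number in the scheduling queue; the mode names and queue layout are illustrative.
      def selectable(group, queue):
          """A group in the second execution mode may not be selected while any
          group that entered the scheduling queue earlier is still waiting;
          first-mode groups have no such constraint."""
          mode, arrival = group
          if mode == "mode1":
              return True
          return not any(a < arrival for _, a in queue)

      queue = [("mode1", 3), ("mode2", 5), ("mode1", 7)]
      print([g for g in queue if selectable(g, queue)])
      # [('mode1', 3), ('mode1', 7)] -- the mode2 group waits for the older group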
  • Patent number: 8762126
    Abstract: Analyzing simulated operation of a computer including loading user-defined dynamically linked analysis libraries that each include specifications of events to be traced for analysis, including: executing, in separate hardware threads, one trace buffer handler for each analysis library, and associating, with each trace buffer handler, one or more analysis functions; translating static binary instructions for the simulated computer into binary instructions for the executing computer, including: inserting, into the translation, implementing code for each specification of an event to be traced and inserting, into the translation for each static instruction, a memory address of a separate static instruction buffer; executing the translation, including executing the implementing code and generating, in a trace buffer, one or more trace records for each specified event; and processing the trace buffer, including calling analysis functions and associating by the analysis functions through the separate static instruct
    Type: Grant
    Filed: January 5, 2011
    Date of Patent: June 24, 2014
    Assignee: International Business Machines Corporation
    Inventors: Patrick J. Bohrer, Ahmed Gheith, James L. Peterson
  • Patent number: 8756404
    Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
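    Illustration: a Python sketch of the cascaded-delay idea in the 8756404 abstract above, assuming an issue group is a list of (name, producer-index-or-None) pairs with producers listed before consumers, and each pipeline has a fixed added delay; the greedy assignment is an illustrative stand-in for the hardware scheduling.
      def schedule_issue_group(group, delays):
          """Assign each instruction of a common issue group to a pipeline so that
          any consumer lands in a pipeline with a strictly larger delay than its
          in-group producer, letting dependent instructions issue together."""
          order = sorted(range(len(delays)), key=lambda p: delays[p])
          assignment = {}
          for i, (name, dep) in enumerate(group):
              for p in order:
                  if p in assignment.values():
                      continue
                  if dep is None or delays[p] > delays[assignment[dep]]:
                      assignment[i] = p
                      break
          return {group[i][0]: p for i, p in assignment.items()}

      # Two dependent instructions in the same issue group: 'add' consumes 'load'.
      print(schedule_issue_group([("load", None), ("add", 0)], delays=[0, 2, 4]))
      # {'load': 0, 'add': 1}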
  • Patent number: 8732439
    Abstract: A method and computer-usable medium including instructions for performing a method for scheduling executable transactions within a multicore processor comprising a plurality of processor elements. The method includes listing, using at least one distribution queue, a portion of the executable transactions in order of eligibility for execution. A plurality of executable transaction schedulers are provided, wherein each executable transaction scheduler includes a scheduling process for determining a most eligible executable transaction for execution from at least one candidate executable transaction ready for execution. The executable transaction schedulers are linked together to provide a multilevel scheduler. The most eligible executable transaction is output from the multilevel scheduler to the at least one distribution queue.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: May 20, 2014
    Assignees: Synopsys, Inc., Fujitsu Semiconductor Limited
    Inventor: Mark D. Lippett
  • Publication number: 20140129805
    Abstract: Systems and methods for reducing power consumption by an execution pipeline are provided. In one example, a method includes stalling an operation from being executed in the execution pipeline based on inputs to the operation being unavailable in a register file and disabling access to read the register file in favor of controlling a bypass network based on the consumer characteristics of the operation and producer characteristics of other operations being executed in the execution pipeline to forward data produced at an execution stage in the execution pipeline to be used as one or more resources of the operation.
    Type: Application
    Filed: November 8, 2012
    Publication date: May 8, 2014
    Applicant: NVIDIA Corporation
    Inventor: Don Husby
  • Patent number: 8719807
    Abstract: A method and apparatus for enabling a Software Transactional Memory (STM) with precompiled binaries is herein described. Upon encountering an access operation in a transaction, an annotation field associated with a memory location referenced by the access is checked. In response to the memory location representing a previous similar access within the transaction, the access is performed without access barriers. However, if the annotation field is in a default state representing no previous access during a pendency of the transaction, then a mode of the processor is determined. If the processor is in implicit mode, an access handler/barrier is asynchronously executed. Conversely, in an explicit mode, a flag is set instead of asynchronously executing the handler. In addition, during compilation, convert explicit and convert implicit instructions are inserted to intelligently convert modes for precompiled and newly compiled binaries.
    Type: Grant
    Filed: December 28, 2006
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn Jacobson
  • Publication number: 20140122844
    Abstract: Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques.
    Type: Application
    Filed: November 1, 2012
    Publication date: May 1, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Randal C. Swanberg
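    Illustration: a Python sketch of the lazy context-management heuristic in the 20140122844 abstract above, assuming a register bank is a list of integers, checksums use zlib.crc32, and the dispatch threshold is an arbitrary constant; all of these are illustrative assumptions.
      import zlib

      class BankTracker:
          THRESHOLD = 4        # assumed number of unused dispatches before disabling

          def __init__(self, bank):
              self.bank = bank
              self.prev = self.checksum()
              self.unused = 0
              self.enabled = True

          def checksum(self):
              return zlib.crc32(bytes(b & 0xFF for b in self.bank))

          def on_dispatch(self):
              """If the checksum is unchanged, infer the bank went unused and bump
              the counter; disable the bank once the counter reaches the threshold."""
              cur = self.checksum()
              if cur == self.prev:
                  self.unused += 1
                  if self.unused >= self.THRESHOLD:
                      self.enabled = False
              else:
                  self.unused = 0
              self.prev = cur

          def access(self):
              """Model the interrupt path: re-enable the bank, reset the counter."""
              self.enabled = True
              self.unused = 0

      t = BankTracker([0] * 32)
      for _ in range(4):
          t.on_dispatch()
      print(t.enabled)     # False: bank unchanged for 4 dispatches
      t.access()
      print(t.enabled)     # True again after the simulated interrupt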
  • Publication number: 20140089589
    Abstract: Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does not match the architectural color.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventors: Stephan G. Meier, Gerard R. Williams, III
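    Illustration: a Python sketch of the coloring scheme in the 20140089589 abstract above, assuming colors are small integers and that a simple tracker object stands in for the front-end and the core interface unit; the method names are illustrative.
      class ColorTracker:
          def __init__(self):
              self.speculative = 0     # assigned in the in-order front end
              self.architectural = 0   # maintained by the core interface unit

          def assign(self, op):
              """Tag an instruction operation with the current speculative color."""
              return (op, self.speculative)

          def front_end_barrier(self):
              """A barrier seen in the front end changes the speculative color."""
              self.speculative += 1

          def retire_barrier(self):
              """A retiring barrier changes the architectural color."""
              self.architectural += 1

          def may_issue(self, tagged_op):
              """Stall operations whose speculative color does not match the
              architectural color (i.e., younger than an unretired barrier)."""
              _, color = tagged_op
              return color == self.architectural

      ct = ColorTracker()
      older = ct.assign("load A")
      ct.front_end_barrier()
      younger = ct.assign("load B")
      print(ct.may_issue(older), ct.may_issue(younger))   # True False
      ct.retire_barrier()
      print(ct.may_issue(younger))                        # True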
  • Publication number: 20140082330
    Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.
    Type: Application
    Filed: September 14, 2012
    Publication date: March 20, 2014
    Applicant: QUALCOMM INNOVATION CENTER, INC.
    Inventor: Sergei Larin
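    Illustration: a simplified Python sketch of the clustering pass in the 20140082330 abstract above, assuming the DAG is a dict mapping each node to its list of dependents and a "chain" is a maximal single-successor path; merging one chain per cluster is a deliberately crude stand-in for the patent's heuristics.
      def build_chains(dag):
          """Greedily grow chains of dependent instructions: start at each root and
          follow edges while a node has exactly one unvisited successor."""
          visited, chains = set(), []
          indeg = {n: 0 for n in dag}
          for succs in dag.values():
              for s in succs:
                  indeg[s] += 1
          for node in dag:
              if node in visited or indeg[node] != 0:
                  continue
              chain, cur = [node], node
              visited.add(node)
              while len(dag[cur]) == 1 and dag[cur][0] not in visited:
                  cur = dag[cur][0]
                  chain.append(cur)
                  visited.add(cur)
              chains.append(chain)
          for node in dag:                      # nodes not reached from a root
              if node not in visited:
                  visited.add(node)
                  chains.append([node])
          return chains

      def mark_clusters(chains):
          """Merge chains into clusters (here: one cluster per chain) and mark each
          node with its cluster id, producing the 'marked instruction DAG'."""
          return {node: cid for cid, chain in enumerate(chains) for node in chain}

      dag = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
      chains = build_chains(dag)
      print(chains)                 # [['a', 'b', 'c'], ['d']]
      print(mark_clusters(chains))  # {'a': 0, 'b': 0, 'c': 0, 'd': 1}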
  • Publication number: 20140068228
    Abstract: Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet.
    Type: Application
    Filed: August 29, 2012
    Publication date: March 6, 2014
    Inventors: David A. Warren, Thomas A. Keaveny
  • Publication number: 20140052965
    Abstract: Dynamic CPU/GPU load balancing based on power is described. In one example, an instruction is received and power values are received for a central processing core (CPU) and a graphics processing core (GPU). The CPU or the GPU is selected based on the received power values and the instruction is sent to the selected core for processing.
    Type: Application
    Filed: February 8, 2012
    Publication date: February 20, 2014
    Inventor: Uzi Sarel
  • Patent number: 8635621
    Abstract: The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: January 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: David Stephen Levitan, Jeffrey Richard Summers
  • Publication number: 20130346729
    Abstract: Systems, methods, and computer program products provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file.
    Type: Application
    Filed: June 26, 2013
    Publication date: December 26, 2013
    Inventors: Harry Barowski, Tim Niggemeier
  • Patent number: 8615644
    Abstract: A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources and, if not, the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.
    Type: Grant
    Filed: February 19, 2010
    Date of Patent: December 24, 2013
    Assignee: International Business Machines Corporation
    Inventors: Becky Bruce, Giles R. Frazier, Bradly G. Frey, Kumar K. Gala, Cathy May, Michael D. Snyder, Gary Whisenhunt, James Xenidis
  • Publication number: 20130339669
    Abstract: A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction.
    Type: Application
    Filed: March 3, 2013
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Patent number: 8612728
    Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: December 17, 2013
    Assignee: Micron Technology, Inc.
    Inventors: Neal Andrew Crook, Alan T. Wootton, James Peterson
  • Patent number: 8607031
    Abstract: A hardware device for concurrently processing a fixed set of predetermined tasks associated with an algorithm which includes a number of processes, some of the processes being dependent on binary decisions, includes a plurality of task units for processing data, making decisions and/or processing data and making decisions, including source task units and destination task units. A task interconnection logic means interconnects the task units for communicating actions from a source task unit to a destination task unit. Each of the task units includes a processor for executing only a particular single task of the fixed set of predetermined tasks associated with the algorithm in response to a received request action, and a status manager for handling the actions from the source task units and building the actions to be sent to the destination task units.
    Type: Grant
    Filed: February 3, 2012
    Date of Patent: December 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alain Benayoun, Jean-Francois Le Pennec, Patrick Michel, Claude Pin
  • Patent number: 8607030
    Abstract: A multi-thread processor in accordance with an exemplary aspect of the present invention includes a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal TSEL designating a hardware thread to be executed in a next execution cycle, a first selector that outputs an instruction generated by a hardware thread selected according to the thread selection signal, and an execution pipeline that executes an instruction output from the first selector, wherein the thread scheduler specifies execution of at least one hardware thread selected in a fixed manner in a predetermined first execution period, and specifies execution of an arbitrary hardware thread in a second execution period.
    Type: Grant
    Filed: September 28, 2009
    Date of Patent: December 10, 2013
    Assignee: Renesas Electronics Corporation
    Inventors: Koji Adachi, Kazunori Miyamoto
  • Publication number: 20130326197
    Abstract: Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance.
    Type: Application
    Filed: January 15, 2013
    Publication date: December 5, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, James N. Dieffenderfer, Michael W. Morrow, Brian M. Stempel, Michael S. McIlvaine
  • Patent number: 8601177
    Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Intel Corporation
    Inventor: Thomas A. Piazza
  • Patent number: 8593465
    Abstract: The present invention provides a system for handling extra contexts for shader constants, and applications thereof. In an embodiment there is provided a computer-based method for executing a series of compute packets in an execution pipeline. The execution pipeline includes a first plurality of registers configured to store state-updates of a first type and a second plurality of registers configured to store state-updates of a second type. A first number of state-updates of the first type and a second number of state-updates of the second type are respectively identified and stored in the first and second plurality of registers. A compute packet is sent to the execution pipeline responsive to the first number and the second number. Then, the compute packet is executed by the execution pipeline.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: November 26, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark M. Leather, Brian D. Emberling
  • Patent number: 8595468
    Abstract: A multi-core processor system supporting simultaneous thread sharing across execution resources of multiple processor cores is provided. The multi-core processor system includes a first processor core with a first instruction queue and dispatch logic in communication with a first execution resource of the first processor core. The multi-core processor system also includes a second processor core with a second instruction queue and dispatch logic in communication with a second execution resource of the second processor core. A high-speed execution resource bus couples the first and second processor cores. The first instruction queue and dispatch logic is configured to issue a first instruction of a thread to the first execution resource and issue a second instruction of the thread over the high-speed execution resource bus to the second execution resource for simultaneous execution of the first and second instruction of the thread on the first and second processor cores.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: November 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Shawn M. Luke, John Sargis, Jr., Daneyand J. Singley
  • Publication number: 20130305022
    Abstract: Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.
    Type: Application
    Filed: May 14, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan E. Eisen, Hung Q. Le, Bryan J. Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt, Shih-Hsiung S. Tung
  • Publication number: 20130305021
    Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis of corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprises a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.
    Type: Application
    Filed: May 9, 2012
    Publication date: November 14, 2013
    Inventors: Vinod GROVER, Yunsup LEE, Xiangyun KONG, Gautam CHAKRABARTI, Ronny M. KRASHINSKY
  • Publication number: 20130283013
    Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for enabling an agent interfacing with a pipelined backbone to locally handle transactions while obeying an ordering rule including, for example, receiving a transaction which requests access to a backbone; decoding routing destination information from the transaction received, in which the decoded routing destination information designates the transaction to be processed either locally or processed via the backbone; storing the decoded routing destination information and the transaction into a First-In-First-Out (FIFO) buffer; retrieving the decoded routing destination information and the transaction from the FIFO buffer; and processing the transaction locally or via the backbone based on the decoded routing destination information retrieved from the FIFO buffer with the transaction.
    Type: Application
    Filed: November 9, 2011
    Publication date: October 24, 2013
    Inventors: Ngek Leong Guok, Kah Meng Yeem, Poh Thiam Teoh, Su Wei Lim
  • Patent number: 8566568
    Abstract: Instruction execution delay is alterable after the system design has been finalized, thus enabling the system to dynamically account for various conditions that impact instruction execution. In some embodiments, the dynamic delay is determined by an application to be executed by the processing system. In other embodiments, the dynamic delay is determined by analyzing the history of previously executed instructions. In yet other embodiments, the dynamic delay is determined by assessing the processing resources available to a given application. Regardless, the delay may be dynamically altered on a per-instruction, multiple instruction, or application basis. Processor instruction execution may be controlled by determining a first delay value for a first set of one or more instructions and a second delay value for a second set of one or more instructions. Execution of the sets of instructions is delayed based on the corresponding delay value.
    Type: Grant
    Filed: August 16, 2006
    Date of Patent: October 22, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Gerald Paul Michalak, Kenneth Alan Dockser
  • Patent number: 8560812
    Abstract: A multithread execution device includes: a program memory in which a plurality of programs are stored; an instruction issue unit that issues an instruction retrieved from the program memory; an instruction execution unit that executes the instruction; a target execution speed information memory that stores target execution speed information of the instruction; an execution speed monitor that monitors an execution speed of the instruction; a feedback control unit that commands the instruction issue unit to issue the instruction such that the execution speed of the instruction approximately corresponds to the target execution speed information.
    Type: Grant
    Filed: May 20, 2010
    Date of Patent: October 15, 2013
    Assignees: Toyota Jidosha Kabushiki Kaisha, Renesas Electronics Corporation
    Inventors: Tetsuaki Wakabayashi, Koji Adachi, Kazuya Okamoto
  • Patent number: 8561079
    Abstract: The information processing device in the simultaneous multi-threading system is operated in an inter-thread performance load arbitration control method, and includes: an instruction input control unit for sharing among threads control of inputting an instruction in an arithmetic unit for acquiring the instruction from memory and performing an operation on the basis of the instruction; a commit stack entry provided for each thread for holding information obtained by decoding the instruction; an instruction completion order control unit for updating the memory and a general purpose register depending on an arithmetic result obtained by the arithmetic unit in an order of the instructions input from the instruction input control unit; and a performance load balance analysis unit for detecting the information registered in the commit stack entry and controlling the instruction input control unit.
    Type: Grant
    Filed: December 11, 2009
    Date of Patent: October 15, 2013
    Assignee: Fujitsu Limited
    Inventors: Takashi Suzuki, Toshio Yoshida
  • Publication number: 20130262823
    Abstract: A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions and an instruction optimization unit configured to optimize instructions and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.
    Type: Application
    Filed: March 28, 2012
    Publication date: October 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 8539204
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: September 17, 2013
    Assignee: Nvidia Corporation
    Inventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
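    Illustration: a Python sketch of the barrier-with-aggregation idea in the 8539204 abstract above, using threading.Barrier so that every thread contributes a value, waits, and then reads a reduction computed once all threads have arrived; the thread count, values, and sum-reduction are illustrative.
      import threading

      N = 4
      contributions = [0] * N
      result = {}

      def reduce_all():
          """Runs exactly once when the last thread reaches the barrier:
          aggregate (sum-reduce) the values supplied by each thread."""
          result["sum"] = sum(contributions)

      barrier = threading.Barrier(N, action=reduce_all)

      def worker(tid):
          contributions[tid] = tid + 1       # value this thread supplies
          barrier.wait()                     # barrier + aggregation in one step
          print("thread %d sees reduction %d" % (tid, result["sum"]))

      threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
      for t in threads: t.start()
      for t in threads: t.join()             # each thread prints 10 (1+2+3+4)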
  • Patent number: 8539203
    Abstract: In an exemplary aspect, the present invention provides a multi-thread processor including a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal in accordance with a first or second schedule, the thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector, wherein when the multi-thread processor is in a first state, the thread scheduler selects the first schedule, and when the multi-thread processor is in a second state, the thread scheduler selects the second schedule.
    Type: Grant
    Filed: September 23, 2009
    Date of Patent: September 17, 2013
    Assignee: Renesas Electronics Corporation
    Inventors: Koji Adachi, Toshiyuki Matsunaga
  • Patent number: 8533721
    Abstract: A method and system to schedule out of order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Stephen J. Robinson, Deepak Limaye
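    Illustration: a Python sketch of splitting the scheduling loop as in the 8533721 abstract above, with compare/ready computed in one cycle and the pick done in the next, defaulting to the oldest operation; the (id, age, source-register-set) representation is an illustrative assumption and no pipeline timing is modeled.
      def compare_and_ready(ops, completed):
          """Cycle 1: compare/ready stages -- compute each operation's readiness."""
          return {op_id: srcs <= completed for op_id, _age, srcs in ops}

      def pick(ops, ready):
          """Cycle 2: pick stage -- default to the oldest operation; only if the
          oldest is not ready, fall back to the oldest ready operation."""
          by_age = sorted(ops, key=lambda op: op[1])
          oldest = by_age[0]
          if ready[oldest[0]]:
              return oldest[0]
          for op_id, _age, _srcs in by_age:
              if ready[op_id]:
                  return op_id
          return None

      ops = [("mul", 3, {"r2"}), ("add", 1, {"r7"}), ("sub", 2, {"r2", "r3"})]
      ready = compare_and_ready(ops, completed={"r2", "r3"})    # cycle N
      print(pick(ops, ready))                                   # cycle N+1 -> 'sub'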
  • Patent number: 8533719
    Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
    Type: Grant
    Filed: April 5, 2010
    Date of Patent: September 10, 2013
    Assignee: Oracle International Corporation
    Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 8527740
    Abstract: A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sameer Kumar, Amith R. Mamidala, Joseph D. Ratterman, Michael Blocksome, Douglas Miller
  • Patent number: 8521991
    Abstract: A technique for selecting instructions for execution from an issue queue at multiple function units while reducing the chances of instruction collisions. In an embodiment, each function unit in a processor may include a selection logic circuit that selects a specific instruction from the issue queue for execution. In order to avoid instruction collision, a function unit may have a selection logic circuit that may select two instructions from an instruction queue: one according to a first selection technique and one according to a second selection technique. Then, by comparing the instruction selected by the first selection technique to the instruction selected by the selection logic circuit of another function unit, the instruction selected by the second technique may be used instead if there will be an instruction collision because the instruction selected by the first selection technique is the same as the instruction selected at a different function unit.
    Type: Grant
    Filed: December 4, 2009
    Date of Patent: August 27, 2013
    Assignee: STMicroelectronics (Beijing) R&D Co., Ltd.
    Inventors: Kai-feng Wang, Hong-Xia Sun, Peng-fei Zhu, Yong-qiang Wu
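    Illustration: a Python sketch of the collision-avoidance idea in the 8521991 abstract above, assuming two function units share an issue queue ordered oldest-first and that "oldest" and "next oldest" stand in for the first and second selection techniques; this is a simplification of the patent's selection logic.
      def issue_two_units(queue):
          """Each unit selects two candidates from the issue queue: a primary pick
          (oldest) and a secondary pick (next oldest). If unit 1's primary pick is
          the same instruction unit 0 already took, unit 1 issues its secondary
          pick instead, avoiding the collision."""
          if not queue:
              return None, None
          unit0 = queue[0]                       # unit 0: primary technique (oldest)
          u1_primary = queue[0]                  # unit 1: same primary technique
          u1_secondary = queue[1] if len(queue) > 1 else None
          unit1 = u1_secondary if u1_primary == unit0 else u1_primary
          return unit0, unit1

      print(issue_two_units(["i7", "i9", "i12"]))   # ('i7', 'i9') -- no duplicate issue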