Instruction Issuing Patents (Class 712/214)
-
Patent number: 8832464
Abstract: A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
Type: Grant
Filed: March 31, 2009
Date of Patent: September 9, 2014
Assignee: Oracle America, Inc.
Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla
-
Patent number: 8825988
Abstract: The present invention provides a method and apparatus for implementing a matrix algorithm for scheduling instructions. One embodiment of the method includes selecting a first subset of instructions so that each instruction in the first subset is the earliest in program order of instructions associated with a corresponding one of a plurality of sub-matrices of a matrix that has a plurality of matrix entries. Each matrix entry indicates the program order of one pair of instructions that are eligible for execution. This embodiment also includes selecting, from the first subset of instructions, the instruction that is earliest in program order based on matrix entries associated with the first subset of instructions.
Type: Grant
Filed: November 12, 2010
Date of Patent: September 2, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Jeff Rupley, Rajagopalan Desikan
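The two-stage selection described in this abstract can be illustrated with a small sketch. It is not the patented circuit: the names `oldest` and `pick`, the boolean matrix `older`, and the choice of contiguous groups of scheduler slots as "sub-matrices" are all assumptions made for illustration.

```python
def oldest(candidates, older):
    """Return the candidate that is older than every other candidate, or None."""
    for i in candidates:
        if all(older[i][j] for j in candidates if j != i):
            return i
    return None


def pick(ready, older, group_size):
    """Two-stage pick: oldest ready entry per group, then oldest of the winners.

    older[i][j] is True when slot i holds an instruction earlier in program
    order than slot j (hypothetical encoding of one matrix entry per pair).
    """
    n = len(older)
    winners = []
    # Stage 1: one winner per sub-matrix (here: a contiguous group of slots).
    for start in range(0, n, group_size):
        group = [i for i in range(start, min(start + group_size, n)) if ready[i]]
        winner = oldest(group, older)
        if winner is not None:
            winners.append(winner)
    # Stage 2: earliest in program order among the per-group winners.
    return oldest(winners, older)


# Example data: program-order age of each slot (lower means earlier).
age = [3, 0, 2, 1, 5, 4]
older = [[age[i] < age[j] for j in range(6)] for i in range(6)]
ready = [True, False, True, True, True, False]
selected = pick(ready, older, group_size=3)
```

Splitting the pick into per-group winners keeps each comparison small, which is the usual motivation for hierarchical pickers; the abstract itself does not state the group size.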
-
Patent number: 8813080
Abstract: In some embodiments, the invention involves a system and method to enhance an operating system's ability to schedule ready threads, specifically to select a logical processor on which to run the ready thread, based on platform policy. Platform policy may be performance-centric, power-centric, or a balance of the two. Embodiments of the present invention use temporal characteristics of the system utilization, or workload, and/or temporal characteristics of the ready thread in choosing a logical processor. Other embodiments are described and claimed.
Type: Grant
Filed: June 28, 2007
Date of Patent: August 19, 2014
Assignee: Intel Corporation
Inventors: Russell J. Fenger, Leena K. Puthiyedath
-
Patent number: 8806180
Abstract: A scheduler in a process of a computer system detects a task with an associated execution context that has not been previously invoked by the scheduler. The scheduler executes the task on a processing resource without performing a context switch if the processing resource executed a previous task to completion. The scheduler stores the execution context originally associated with the task for later use.
Type: Grant
Filed: May 1, 2008
Date of Patent: August 12, 2014
Assignee: Microsoft Corporation
Inventors: Paul F. Ringseth, Genevieve Fernandes
-
Patent number: 8806253
Abstract: A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decoder; an execution unit receiving and sending signals from and to the instruction scheduling unit; and a state machine. The method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining if the number of instructions per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine.
Type: Grant
Filed: August 8, 2012
Date of Patent: August 12, 2014
Assignee: International Business Machines Corporation
Inventors: Tim Niggemeier, Harry Barowski, Maarten Boersma, Gunnar Spiess
-
Publication number: 20140223144
Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
Type: Application
Filed: March 5, 2013
Publication date: August 7, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
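The wakeup and negative-wakeup sequence in this abstract can be modeled in a few lines. This is a toy event model, not the claimed hardware: the function name `speculate_load`, the signal labels, and the assumption that the optimistic wakeup is sent in the issue cycle (cycle 0) are all hypothetical.

```python
def speculate_load(expected_latency, actual_latency):
    """Return the (cycle, signal) events placed on the wakeup bus for one load."""
    # Optimistic wakeup: dependents are told the load will finish on time.
    # Sending it at cycle 0 is an assumption of this sketch.
    bus = [(0, "wakeup")]
    # At the expiration of the expected latency, check whether the load is done.
    if actual_latency > expected_latency:
        # The load is late (for example, a cache miss): cancel the wakeup.
        bus.append((expected_latency, "negative_wakeup"))
    return bus


on_time = speculate_load(expected_latency=3, actual_latency=3)
late = speculate_load(expected_latency=3, actual_latency=10)
```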
-
Publication number: 20140223143
Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
Type: Application
Filed: February 6, 2013
Publication date: August 7, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
-
Patent number: 8788793
Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
Type: Grant
Filed: May 18, 2010
Date of Patent: July 22, 2014
Assignee: Panasonic Corporation
Inventor: Hiroyuki Morishita
-
Publication number: 20140201501
Abstract: Embodiments of the invention relate to dynamically routing instructions to execution units based on detected errors in the execution units. An aspect of the invention includes a computer system including a processor having an instruction issue unit and a plurality of execution units. The processor is configured to detect an error in a first execution unit among the plurality of execution units and adjust instruction dispatch rules of the instruction issue unit based on detecting the error in the first execution unit to restrict access to the first execution unit while leaving un-restricted access to the remaining execution units of the plurality of execution units.
Type: Application
Filed: January 15, 2013
Publication date: July 17, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
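A minimal sketch of the dispatch-rule adjustment this abstract describes follows. The class `IssueUnit`, the unit names, and the round-robin policy over the remaining units are assumptions for illustration; the abstract does not say how instructions are distributed among the unrestricted units.

```python
class IssueUnit:
    """Toy issue unit that stops dispatching to execution units that reported errors."""

    def __init__(self, units):
        self.units = list(units)
        self.restricted = set()   # units excluded by the adjusted dispatch rules
        self._next = 0            # round-robin counter (assumed policy)

    def report_error(self, unit):
        # Adjust the dispatch rules: restrict access to the failing unit only.
        self.restricted.add(unit)

    def dispatch(self):
        allowed = [u for u in self.units if u not in self.restricted]
        if not allowed:
            raise RuntimeError("no execution unit available")
        unit = allowed[self._next % len(allowed)]
        self._next += 1
        return unit


issue_unit = IssueUnit(["fx0", "fx1", "fx2"])
issue_unit.report_error("fx1")
routed = [issue_unit.dispatch() for _ in range(4)]
```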
-
Patent number: 8782434
Abstract: A pipelined processor comprising a cache memory system, fetching instructions for execution from a portion of said cache memory system, an instruction commencing processing before a digital signature of the cache line that contained the instruction is verified against a reference signature of the cache line, the verification being done at the point of decoding, dispatching, or committing execution of the instruction, the reference signature being stored in an encrypted form in the processor's memory, and the key for decrypting the said reference signature being stored in a secure storage location. The instruction processing proceeds when the two signatures exactly match, and further instruction processing is suspended or processing is modified on a mismatch of the two said signatures.
Type: Grant
Filed: July 15, 2011
Date of Patent: July 15, 2014
Assignee: The Research Foundation for the State University of New York
Inventor: Kanad Ghose
-
Patent number: 8782435
Abstract: A processor comprising: an instruction processing pipeline, configured to receive a sequence of instructions for execution, said sequence comprising at least one instruction including a flow control instruction which terminates the sequence; a hash generator, configured to generate a hash associated with execution of the sequence of instructions; a memory configured to securely receive a reference signature corresponding to a hash of a verified corresponding sequence of instructions; verification logic configured to determine a correspondence between the hash and the reference signature; and authorization logic configured to selectively produce a signal, in dependence on a degree of correspondence of the hash with the reference signature.
Type: Grant
Filed: July 15, 2011
Date of Patent: July 15, 2014
Assignee: The Research Foundation for The State University of New York
Inventor: Kanad Ghose
-
Patent number: 8775777
Abstract: Sourcing immediate values from a very long instruction word includes determining if a VLIW sub-instruction expansion condition exists. If the sub-instruction expansion condition exists, operation of a portion of a first arithmetic logic unit component is minimized. In addition, a part of a second arithmetic logic unit component is expanded by utilizing a block of a very long instruction word, which is normally utilized by the first arithmetic logic unit component, for the second arithmetic logic unit component if the sub-instruction expansion condition exists.
Type: Grant
Filed: August 15, 2007
Date of Patent: July 8, 2014
Assignee: NVIDIA Corporation
Inventors: Tyson J. Bergland, Craig M. Okruhlica, Michael J. M. Toksvig, Justin M. Mahan, Edward A. Hutchins
-
Publication number: 20140189312
Abstract: Embodiments of the present invention may include a data processing system comprising a processing execution block to execute instructions stored in an instruction queue, a programmable hardware accelerator, and a controller programmed to monitor the instruction queue to detect a first type of instructions stored in the instruction queue, reprogram the programmable hardware accelerator to execute the first type of instructions, and transmit the first type of instructions to the programmable hardware accelerator to be executed.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Inventor: Kia Leong TAN
-
Publication number: 20140189313
Abstract: Various embodiments of microprocessors and methods of operating a microprocessor during runahead operation are disclosed herein. One example method of operating a microprocessor includes identifying a runahead-triggering event associated with a runahead-triggering instruction and, responsive to identification of the runahead-triggering event, entering runahead operation and inserting the runahead-triggering instruction along with one or more additional instructions in a queue. The example method also includes resuming non-runahead operation of the microprocessor in response to resolution of the runahead-triggering event and re-dispatching the runahead-triggering instruction along with the one or more additional instructions from the queue to the execution logic.
Type: Application
Filed: December 28, 2012
Publication date: July 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Guillermo J. Rozas, Alexander Klaiber, James van Zoeren, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell D. Boggs, Magnus Ekman, Aravindh Baktha, David Dunn
-
Patent number: 8769539
Abstract: A method and apparatus are provided to control the order of execution of load and store operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. One embodiment of the method includes determining whether a first group, comprising at least one or more instructions, is to be selected from a scheduling queue of a processor for execution using either a first execution mode or a second execution mode. The method also includes, responsive to determining that the first group is to be selected for execution using the second execution mode, preventing selection of the first group until a second group, comprising at least one or more instructions, that entered the scheduling queue prior to the first group is selected for execution.
Type: Grant
Filed: November 16, 2010
Date of Patent: July 1, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Daniel Hopper, Suzanne Plummer, Christopher D. Bryant
-
Patent number: 8762126
Abstract: Analyzing simulated operation of a computer including loading user-defined dynamically linked analysis libraries that each include specifications of events to be traced for analysis, including: executing, in separate hardware threads, one trace buffer handler for each analysis library, and associating, with each trace buffer handler, one or more analysis functions; translating static binary instructions for the simulated computer into binary instructions for the executing computer, including: inserting, into the translation, implementing code for each specification of an event to be traced and inserting, into the translation for each static instruction, a memory address of a separate static instruction buffer; executing the translation, including executing the implementing code and generating, in a trace buffer, one or more trace records for each specified event; and processing the trace buffer, including calling analysis functions and associating by the analysis functions through the separate static instruct
Type: Grant
Filed: January 5, 2011
Date of Patent: June 24, 2014
Assignee: International Business Machines Corporation
Inventors: Patrick J. Bohrer, Ahmed Gheith, James L. Peterson
-
Patent number: 8756404
Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times.
Type: Grant
Filed: December 11, 2006
Date of Patent: June 17, 2014
Assignee: International Business Machines Corporation
Inventor: David A. Luick
-
Patent number: 8732439
Abstract: A method and computer-usable medium including instructions for performing a method for scheduling executable transactions within a multicore processor comprising a plurality of processor elements. The method includes listing, using at least one distribution queue, a portion of the executable transactions in order of eligibility for execution. A plurality of executable transaction schedulers are provided, wherein each executable transaction scheduler includes a scheduling process for determining a most eligible executable transaction for execution from at least one candidate executable transaction ready for execution. The executable transaction schedulers are linked together to provide a multilevel scheduler. The most eligible executable transaction is output from the multilevel scheduler to the at least one distribution queue.
Type: Grant
Filed: September 29, 2006
Date of Patent: May 20, 2014
Assignees: Synopsys, Inc., Fujitsu Semiconductor Limited
Inventor: Mark D. Lippett
-
Publication number: 20140129805
Abstract: Systems and methods for reducing power consumption by an execution pipeline are provided. In one example, a method includes stalling an operation from being executed in the execution pipeline based on inputs to the operation being unavailable in a register file and disabling access to read the register file in favor of controlling a bypass network based on the consumer characteristics of the operation and producer characteristics of other operations being executed in the execution pipeline to forward data produced at an execution stage in the execution pipeline to be used as one or more resources of the operation.
Type: Application
Filed: November 8, 2012
Publication date: May 8, 2014
Applicant: NVIDIA Corporation
Inventor: Don Husby
-
Patent number: 8719807
Abstract: A method and apparatus for enabling a Software Transactional Memory (STM) with precompiled binaries is herein described. Upon encountering an access operation in a transaction, an annotation field associated with a memory location referenced by the access is checked. In response to the memory location representing a previous similar access within the transaction, the access is performed without access barriers. However, if the annotation field is in a default state representing no previous access during a pendency of the transaction, then a mode of the processor is determined. If the processor mode is in implicit mode, an access handler/barrier is asynchronously executed. Conversely, in an explicit mode, a flag is set instead of asynchronously executing the handler. In addition, during compilation convert explicit and convert implicit instructions are inserted to intelligently convert modes for precompiled and newly compiled binaries.
Type: Grant
Filed: December 28, 2006
Date of Patent: May 6, 2014
Assignee: Intel Corporation
Inventors: Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn Jacobson
-
Publication number: 20140122844
Abstract: Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques.
Type: Application
Filed: November 1, 2012
Publication date: May 1, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Randal C. Swanberg
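The checksum-and-counter bookkeeping in this abstract can be sketched as follows. Everything named here is hypothetical: the class `BankTracker`, the use of CRC-32 as the checksum, and the choice to reset the counter whenever the checksum changes (the abstract only states the reset on an access to a disabled bank).

```python
import zlib


class BankTracker:
    """Tracks one register bank of one thread across dispatches."""

    def __init__(self, threshold):
        self.threshold = threshold   # the predetermined number of unused dispatches
        self.unused = 0              # dispatches in a row with no observed use
        self.enabled = True
        self.prev_checksum = None

    def on_switch_out(self, bank_bytes):
        """Call when the thread's dispatch ends, with the bank's current contents."""
        checksum = zlib.crc32(bank_bytes)
        if self.prev_checksum is not None and checksum == self.prev_checksum:
            self.unused += 1         # matching checksums: infer the bank went unused
        else:
            self.unused = 0          # assumption: any observed change restarts the count
        self.prev_checksum = checksum
        if self.unused >= self.threshold:
            self.enabled = False     # stop saving and restoring this bank

    def on_access(self):
        """Models the interrupt taken when the thread touches a disabled bank."""
        if not self.enabled:
            self.enabled = True
            self.unused = 0


tracker = BankTracker(threshold=2)
for _ in range(3):
    tracker.on_switch_out(b"\x00" * 8)   # bank contents never change
disabled_after_idle = not tracker.enabled
tracker.on_access()
```

A checksum comparison can miss a use that writes back identical values or only reads the bank, so it is a heuristic for non-usage rather than proof of it.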
-
Publication number: 20140089589
Abstract: Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does not match the architectural color.
Type: Application
Filed: September 27, 2012
Publication date: March 27, 2014
Applicant: APPLE INC.
Inventors: Stephan G. Meier, Gerard R. Williams, III
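The coloring scheme can be illustrated with a short sketch, assuming that an operation is stalled while its speculative color differs from the architectural color. The names `tag_ops` and `CoreInterface`, the use of an incrementing integer as the color, and the omission of the barrier itself from the tagged stream are assumptions of this sketch.

```python
def tag_ops(ops):
    """Front end: assign a speculative color to each operation, in program order."""
    color = 0
    tagged = []
    for op in ops:
        if op == "barrier":
            color += 1          # a barrier changes the speculative color
            continue            # the barrier itself is not tagged in this sketch
        tagged.append((op, color))
    return tagged


class CoreInterface:
    """Holds the architectural color, which advances when a barrier retires."""

    def __init__(self):
        self.arch_color = 0

    def retire_barrier(self):
        self.arch_color += 1

    def may_proceed(self, op_color):
        # Operations colored for a later barrier epoch are stalled.
        return op_color == self.arch_color


tagged = tag_ops(["ld a", "barrier", "ld b"])
core = CoreInterface()
before = core.may_proceed(tagged[1][1])   # "ld b" arrives before the barrier retires
core.retire_barrier()
after = core.may_proceed(tagged[1][1])
```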
-
Publication number: 20140082330
Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.
Type: Application
Filed: September 14, 2012
Publication date: March 20, 2014
Applicant: QUALCOMM INNOVATION CENTER, INC.
Inventor: Sergei Larin
-
Publication number: 20140068228
Abstract: Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet.
Type: Application
Filed: August 29, 2012
Publication date: March 6, 2014
Inventors: David A. Warren, Thomas A. Keaveny
-
Publication number: 20140052965
Abstract: Dynamic CPU GPU load balancing is described based on power. In one example, an instruction is received and power values are received for a central processing core (CPU) and a graphics processing core (GPU). The CPU or the GPU is selected based on the received power values and the instruction is sent to the selected core for processing.
Type: Application
Filed: February 8, 2012
Publication date: February 20, 2014
Inventor: Uzi Sarel
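One way such a power-based choice could look is sketched below. The abstract does not state the selection rule, so the headroom comparison, the per-core budgets, and the function name `select_core` are all assumptions made for illustration.

```python
def select_core(cpu_power_w, gpu_power_w, cpu_budget_w, gpu_budget_w):
    """Pick the core with more power headroom (assumed policy; ties go to the CPU)."""
    cpu_headroom = cpu_budget_w - cpu_power_w
    gpu_headroom = gpu_budget_w - gpu_power_w
    return "gpu" if gpu_headroom > cpu_headroom else "cpu"


# CPU is close to its budget, so the instruction goes to the GPU.
busy_cpu = select_core(cpu_power_w=30, gpu_power_w=10, cpu_budget_w=35, gpu_budget_w=25)
# GPU is close to its budget, so the instruction goes to the CPU.
busy_gpu = select_core(cpu_power_w=10, gpu_power_w=20, cpu_budget_w=35, gpu_budget_w=25)
```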
-
Patent number: 8635621
Abstract: The invention relates to a method and apparatus for execution scheduling of a program thread of an application program and executing the scheduled program thread on a data processing system. The method includes: providing an application program thread priority to a thread execution scheduler; selecting for execution the program thread from a plurality of program threads inserted into the thread execution queue, wherein the program thread is selected for execution using a round-robin selection scheme, and wherein the round-robin selection scheme selects the program thread based on an execution priority associated with the program thread bit; placing the program thread in a data processing execution queue within the data processing system; and removing the program thread from the thread execution queue after a successful execution of the program thread by the data processing system.
Type: Grant
Filed: August 22, 2008
Date of Patent: January 21, 2014
Assignee: International Business Machines Corporation
Inventors: David Stephen Levitan, Jeffrey Richard Summers
-
Publication number: 20130346729
Abstract: Systems, methods and computer program product provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file.
Type: Application
Filed: June 26, 2013
Publication date: December 26, 2013
Inventors: Harry Barowski, Tim Niggemeier
-
Patent number: 8615644
Abstract: A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources and if not, then the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.
Type: Grant
Filed: February 19, 2010
Date of Patent: December 24, 2013
Assignee: International Business Machines Corporation
Inventors: Becky Bruce, Giles R. Frazier, Bradly G. Frey, Kumar K. Gala, Cathy May, Michael D. Snyder, Gary Whisenhunt, James Xenidis
-
Publication number: 20130339669
Abstract: A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction.
Type: Application
Filed: March 3, 2013
Publication date: December 19, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
-
Patent number: 8612728
Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
Type: Grant
Filed: August 8, 2011
Date of Patent: December 17, 2013
Assignee: Micron Technology, Inc.
Inventors: Neal Andrew Crook, Alan T. Wootton, James Peterson
-
Patent number: 8607031
Abstract: A hardware device for concurrently processing a fixed set of predetermined tasks associated with an algorithm which includes a number of processes, some of the processes being dependent on binary decisions, includes a plurality of task units for processing data, making decisions and/or processing data and making decisions, including source task units and destination task units. A task interconnection logic means interconnect the task units for communicating actions from a source task unit to a destination task unit. Each of the task units includes a processor for executing only a particular single task of the fixed set of predetermined tasks associated with the algorithm in response to a received request action, and a status manager for handling the actions from the source task units and building the actions to be sent to the destination task units.
Type: Grant
Filed: February 3, 2012
Date of Patent: December 10, 2013
Assignee: International Business Machines Corporation
Inventors: Alain Benayoun, Jean-Francois Le Pennec, Patrick Michel, Claude Pin
-
Patent number: 8607030
Abstract: A multi-thread processor in accordance with an exemplary aspect of the present invention includes a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal TSEL designating a hardware thread to be executed in a next execution cycle, a first selector that outputs an instruction generated by a hardware thread selected according to the thread selection signal, and an execution pipeline that executes an instruction output from the first selector, wherein the thread scheduler specifies execution of at least one hardware thread selected in a fixed manner in a predetermined first execution period, and specifies execution of an arbitrary hardware thread in a second execution period.
Type: Grant
Filed: September 28, 2009
Date of Patent: December 10, 2013
Assignee: Renesas Electronics Corporation
Inventors: Koji Adachi, Kazunori Miyamoto
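The alternation between a fixed first execution period and a free second execution period can be sketched as a sequence of thread selections. The function `tsel_sequence`, the `choose` callback, and the round-robin chooser used in the example are hypothetical; the abstract leaves the policy for the second period open.

```python
from itertools import cycle


def tsel_sequence(fixed, free_cycles, choose, n):
    """Return n thread selections: a fixed slot pattern, then freely chosen slots."""
    out = []
    while len(out) < n:
        out.extend(fixed)                                  # first period: fixed threads
        out.extend(choose() for _ in range(free_cycles))   # second period: any thread
    return out[:n]


# Threads 0 and 1 get guaranteed slots; threads 2 and 3 share the free slots.
free_threads = cycle([2, 3])
selections = tsel_sequence(fixed=[0, 1], free_cycles=2,
                           choose=lambda: next(free_threads), n=8)
```

Reserving fixed slots bounds how long the guaranteed threads can wait for the pipeline, while the free slots can follow any other policy.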
-
Publication number: 20130326197
Abstract: Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance.
Type: Application
Filed: January 15, 2013
Publication date: December 5, 2013
Applicant: QUALCOMM Incorporated
Inventors: Melinda J. Brown, James N. Dieffenderfer, Michael W. Morrow, Brian M. Stempel, Michael S. Mcllvaine
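The table lookup and update described here can be sketched with a dictionary that maps register names to pipeline names. The function `issue`, the tuple encoding of an instruction, the default pipeline `"P0"`, and the rule of following the first source register that has an entry are assumptions for illustration only.

```python
def issue(instr, table, default="P0"):
    """Issue one instruction and update the pipeline issuance table.

    instr is (source_registers, target_register_or_None); table maps a register
    name to the pipeline that last produced it.
    """
    sources, target = instr
    # Preference: follow the pipeline that produced a source operand, if known.
    preferences = [table[r] for r in sources if r in table]
    pipeline = preferences[0] if preferences else default
    # Record where the target register will be produced.
    if target is not None:
        table[target] = pipeline
    return pipeline


table = {"r1": "P1"}
first = issue((["r1", "r3"], "r4"), table)   # follows r1's producer pipeline
second = issue((["r4"], "r5"), table)        # follows r4, just produced in the same pipeline
third = issue((["r9"], None), table)         # no known producer: default pipeline
```

Keeping a consumer in its producer's pipeline is what lets a result be forwarded locally instead of crossing pipelines.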
-
Patent number: 8601177
Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
Type: Grant
Filed: June 27, 2012
Date of Patent: December 3, 2013
Assignee: Intel Corporation
Inventor: Thomas A. Piazza
-
Patent number: 8593465
Abstract: The present invention provides a system for handling extra contexts for shader constants, and applications thereof. In an embodiment there is provided a computer-based method for executing a series of compute packets in an execution pipeline. The execution pipeline includes a first plurality of registers configured to store state-updates of a first type and a second plurality of registers configured to store state-updates of a second type. A first number of state-updates of the first type and a second number of state-updates of the second type are respectively identified and stored in the first and second plurality of registers. A compute packet is sent to the execution pipeline responsive to the first number and the second number. Then, the compute packet is executed by the execution pipeline.
Type: Grant
Filed: June 13, 2007
Date of Patent: November 26, 2013
Assignee: Advanced Micro Devices, Inc.
Inventors: Mark M. Leather, Brian D. Emberling
-
Patent number: 8595468Abstract: A multi-core processor system supporting simultaneous thread sharing across execution resources of multiple processor cores is provided. The multi-core processor system includes a first processor core with a first instruction queue and dispatch logic in communication with a first execution resource of the first processor core. The multi-core processor system also includes a second processor core with a second instruction queue and dispatch logic in communication with a second execution resource of the second processor core. A high-speed execution resource bus couples the first and second processor cores. The first instruction queue and dispatch logic is configured to issue a first instruction of a thread to the first execution resource and issue a second instruction of the thread over the high-speed execution resource bus to the second execution resource for simultaneous execution of the first and second instruction of the thread on the first and second processor cores.Type: GrantFiled: December 17, 2009Date of Patent: November 26, 2013Assignee: International Business Machines CorporationInventors: Shawn M. Luke, John Sargis, Jr., Daneyand J. Singley
-
Publication number: 20130305022Abstract: Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction.Type: ApplicationFiled: May 14, 2012Publication date: November 14, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Susan E. Eisen, Hung Q. Le, Bryan J. Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt, Shih-Hsiung S. Tung
-
Publication number: 20130305021Abstract: Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.Type: ApplicationFiled: May 9, 2012Publication date: November 14, 2013Inventors: Vinod GROVER, Yunsup LEE, Xiangyun KONG, Gautam CHAKRABARTI, Ronny M. KRASHINSKY
-
Publication number: 20130283013Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for enabling an agent interfacing with a pipelined backbone to locally handle transactions while obeying an ordering rule including, for example, receiving a transaction which requests access to a backbone; decoding routing destination information from the transaction received, in which the decoded routing destination information designates the transaction to be processed either locally or processed via the backbone; storing the decoded routing destination information and the transaction into a First-In-First-Out (FIFO) buffer; retrieving the decoded routing destination information and the transaction from the FIFO buffer; and processing the transaction locally or via the backbone based on the decoded routing destination information retrieved from the FIFO buffer with the transaction.Type: ApplicationFiled: November 9, 2011Publication date: October 24, 2013Inventors: Ngek Leong Guok, Kah Meng Yeem, Poh Thiam Teoh, Su Wei Lim
-
Patent number: 8566568Abstract: Instruction execution delay is alterable after the system design has been finalized, thus enabling the system to dynamically account for various conditions that impact instruction execution. In some embodiments, the dynamic delay is determined by an application to be executed by the processing system. In other embodiments, the dynamic delay is determined by analyzing the history of previously executed instructions. In yet other embodiments, the dynamic delay is determined by assessing the processing resources available to a given application. Regardless, the delay may be dynamically altered on a per-instruction, multiple instruction, or application basis. Processor instruction execution may be controlled by determining a first delay value for a first set of one or more instructions and a second delay value for a second set of one or more instructions. Execution of the sets of instructions is delayed based on the corresponding delay value.Type: GrantFiled: August 16, 2006Date of Patent: October 22, 2013Assignee: QUALCOMM IncorporatedInventors: Gerald Paul Michalak, Kenneth Alan Dockser
-
Patent number: 8560812Abstract: A multithread execution device includes: a program memory in which a plurality of programs are stored; an instruction issue unit that issues an instruction retrieved from the program memory; an instruction execution unit that executes the instruction; a target execution speed information memory that stores target execution speed information of the instruction; an execution speed monitor that monitors an execution speed of the instruction; a feedback control unit that commands the instruction issue unit to issue the instruction such that the execution speed of the instruction approximately corresponds to the target execution speed information.Type: GrantFiled: May 20, 2010Date of Patent: October 15, 2013Assignees: Toyota Jidosha Kabushiki Kaisha, Renesas Electronics CorporationInventors: Tetsuaki Wakabayashi, Koji Adachi, Kazuya Okamoto
-
Patent number: 8561079Abstract: The information processing device in the simultaneous multi-threading system is operated in an inter-thread performance load arbitration control method, and includes: an instruction input control unit for sharing among threads control of inputting an instruction in an arithmetic unit for acquiring the instruction from memory and performing an operation on the basis of the instruction; a commit stack entry provided for each thread for holding information obtained by decoding the instruction; an instruction completion order control unit for updating the memory and a general purpose register depending on an arithmetic result obtained by the arithmetic unit in an order of the instructions input from the instruction input control unit; and a performance load balance analysis unit for detecting the information registered in the commit stack entry and controlling the instruction input control unit.Type: GrantFiled: December 11, 2009Date of Patent: October 15, 2013Assignee: Fujitsu LimitedInventors: Takashi Suzuki, Toshio Yoshida
-
Publication number: 20130262823Abstract: A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions and an instruction optimization unit configured to optimize instructions and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.Type: ApplicationFiled: March 28, 2012Publication date: October 3, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 8539204Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.Type: GrantFiled: September 24, 2010Date of Patent: September 17, 2013Assignee: Nvidia CorporationInventors: Brian Fahs, Ming Y. Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
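Illustrative sketch (not taken from the patent): the reduction half of a barrier aggregation can be approximated in software with Python's `threading.Barrier`, whose `action` callback runs once when the last thread arrives. The `ReductionBarrier` class, the choice of `sum` as the reduction, and the `run_demo` helper are assumptions for this example; the scan variant is not modeled, and the barrier is single-use.

```python
import threading
from typing import List, Optional


class ReductionBarrier:
    """Single-use barrier that also sums one value contributed per thread."""

    def __init__(self, num_threads: int) -> None:
        self._values: List[int] = []
        self._lock = threading.Lock()
        self.result: Optional[int] = None
        # The action runs in exactly one thread, before any waiter is released.
        self._barrier = threading.Barrier(num_threads, action=self._reduce)

    def _reduce(self) -> None:
        self.result = sum(self._values)

    def arrive(self, value: int) -> Optional[int]:
        """Contribute a value, wait for all threads, return the reduction."""
        with self._lock:
            self._values.append(value)
        self._barrier.wait()
        return self.result


def run_demo(values: List[int]) -> List[Optional[int]]:
    """Start one thread per value and collect what each thread receives."""
    barrier = ReductionBarrier(len(values))
    seen: List[Optional[int]] = [None] * len(values)

    def worker(index: int) -> None:
        seen[index] = barrier.arrive(values[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print(run_demo([1, 2, 3, 4]))  # every thread receives the same total, 10
```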
-
Patent number: 8539203Abstract: In an exemplary aspect, the present invention provides a multi-thread processor including a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal in accordance with a first or second schedule, the thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector, wherein when the multi-thread processor is in a first state, the thread scheduler selects the first schedule, and when the multi-thread processor is in a second state, the thread scheduler selects the second schedule.Type: GrantFiled: September 23, 2009Date of Patent: September 17, 2013Assignee: Renesas Electronics CorporationInventors: Koji Adachi, Toshiyuki Matsunaga
-
Patent number: 8533721Abstract: A method and system to schedule out of order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.Type: GrantFiled: March 26, 2010Date of Patent: September 10, 2013Assignee: Intel CorporationInventors: Stephen J. Robinson, Deepak Limaye
-
Patent number: 8533719Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.Type: GrantFiled: April 5, 2010Date of Patent: September 10, 2013Assignee: Oracle International CorporationInventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
-
Patent number: 8533435Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.Type: GrantFiled: September 3, 2010Date of Patent: September 10, 2013Assignee: NVIDIA CorporationInventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
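Illustrative sketch (not taken from the patent): the bank-conflict rule in this abstract can be shown with a small scheduler that, each cycle, takes at most one operand from each port and skips any operand whose bank is already in use that cycle. The function name, the `register % num_banks` bank mapping, and the fixed port priority order are assumptions for this example.

```python
from collections import deque
from typing import Deque, List


def schedule_reads(ports: List[Deque[int]], num_banks: int) -> List[List[int]]:
    """Return, per cycle, the register numbers read without bank conflicts.

    Each port is a queue of register numbers waiting to be read.
    The queues are consumed in place.
    """
    cycles: List[List[int]] = []
    while any(ports):  # an empty deque is falsy
        used_banks = set()
        request: List[int] = []
        for port in ports:
            # Take the head of this port only if its bank is still free.
            if port and (port[0] % num_banks) not in used_banks:
                reg = port.popleft()
                used_banks.add(reg % num_banks)
                request.append(reg)
        cycles.append(request)
    return cycles


ports = [deque([0, 5]), deque([4, 2]), deque([1])]
print(schedule_reads(ports, num_banks=4))  # [[0, 1], [5, 4], [2]]
```

In the first cycle register 4 is held back because it maps to the same bank as register 0, so it is read one cycle later.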
-
Patent number: 8527740Abstract: A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.Type: GrantFiled: January 29, 2010Date of Patent: September 3, 2013Assignee: International Business Machines CorporationInventors: Sameer Kumar, Amith R. Mamidala, Joseph D. Ratterman, Michael Blocksome, Douglas Miller
-
Patent number: 8521991Abstract: A technique for selecting instructions for execution from an issue queue at multiple function units while reducing the chances of instruction collisions. In an embodiment, each function unit in a processor may include a selection logic circuit that selects a specific instruction from the issue queue for execution. In order to avoid instruction collision, a function unit may have a selection logic circuit that may select two instructions from an instruction queue: one according to a first selection technique and one according to a second selection technique. Then, by comparing the instruction selected by the first selection technique to the instruction selected by the selection logic circuit of another function unit, the instruction selected by the second technique may be used instead if there will be an instruction collision because the instruction selected by the first selection technique is the same as the instruction selected at a different function unit.Type: GrantFiled: December 4, 2009Date of Patent: August 27, 2013Assignee: STMicroelectronics (Beijing) R&D Co., Ltd.Inventors: Kai-feng Wang, Hong-Xia Sun, Peng-fei Zhu, Yong-qiang Wu
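Illustrative sketch (not taken from the patent): the collision-avoidance idea in this abstract can be shown with two function units, where the second unit prepares a backup choice and uses it only if its first choice matches the other unit's pick. The instruction encoding, the "oldest eligible" and "second-oldest eligible" selection techniques, and all names are assumptions for this example; the abstract does not say which techniques are used.

```python
from typing import List, Optional, Set, Tuple

Instr = Tuple[int, str]  # (id in age order, kind), hypothetical encoding


def candidates(queue: List[Instr], kinds: Set[str]) -> Tuple[Optional[Instr], Optional[Instr]]:
    """Return (first choice, backup choice) among instructions a unit can run."""
    eligible = [ins for ins in queue if ins[1] in kinds]
    first = eligible[0] if eligible else None            # assumed technique 1: oldest
    second = eligible[1] if len(eligible) > 1 else None  # assumed technique 2: next oldest
    return first, second


def select(queue: List[Instr], unit_a_kinds: Set[str],
           unit_b_kinds: Set[str]) -> Tuple[Optional[Instr], Optional[Instr]]:
    """Pick one instruction per unit; unit B falls back to its backup on a collision."""
    a_first, _ = candidates(queue, unit_a_kinds)
    b_first, b_second = candidates(queue, unit_b_kinds)
    collision = b_first is not None and b_first == a_first
    return a_first, (b_second if collision else b_first)


queue = [(0, "alu"), (1, "mul"), (2, "alu")]
print(select(queue, {"alu"}, {"alu", "mul"}))  # ((0, 'alu'), (1, 'mul'))
```

Both candidates for unit B are computed before the comparison, so resolving a collision needs only a final choice between two already-selected instructions.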