Instruction Issuing Patents (Class 712/214)
  • Publication number: 20110153991
    Abstract: A system and method for issuing a processor instruction to multiple processing sections arranged in an out-of-order processing pipeline architecture. The multiple processing sections include a first execution unit with a pipeline length and a second execution unit operating upon data produced by the first execution unit. An instruction issue unit accepts a complex instruction that is cracked into respective micro-ops for the first execution unit and the second execution unit. The instruction issue unit issues the first micro-op to the first execution unit to produce intermediate data. The instruction issue unit then delays for a time period corresponding to the processing pipeline length of the first execution unit. After the delay, a second micro-op is issued to the second execution unit.
    Type: Application
    Filed: December 23, 2009
    Publication date: June 23, 2011
    Applicant: International Business Machines Corporation
    Inventors: FADI BUSABA, Brian Curran, Lee Eisen, Christian Jacobi, David A. Schroter, Eric Schwarz
  • Patent number: 7966477
    Abstract: A method and apparatus for power optimized replay. In one embodiment, the method includes the issuance of an instruction selected from a queue. Once issued, the instruction may be enqueued within a recirculation queue if completion of the instruction is blocked by a detected blocking condition. Once enqueued, instructions contained within the recirculation queue may be reissued once a blocking condition of an instruction within the recirculation queue is satisfied. Accordingly, a power optimized replay scheme as described herein optimizes power while retaining the advantages provided by selectively replaying of blocked instructions to improve power efficiency.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: June 21, 2011
    Assignee: Marvell International Ltd.
    Inventors: Sujat Jamil, Hang Nguyen, Samantha J. Edirisooriya, David E. Miner, R. Frank O'Bleness, Steven J. Tu
  • Publication number: 20110138152
    Abstract: A processor which executes threads having different characteristics is provided with an instruction control device. In the instruction control device, a first instruction control unit issues an instruction included in a first instruction sequence to an instruction execution unit. In addition, a second instruction control unit issues an instruction included in a second instruction sequence to the instruction execution unit. Here, the delay time of the second instruction control unit is shorter than the delay time of the first instruction control unit.
    Type: Application
    Filed: August 18, 2009
    Publication date: June 9, 2011
    Applicant: PANASONIC CORPORATION
    Inventor: Satoshi Ogura
  • Patent number: 7949855
    Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Patent number: 7945764
    Abstract: A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate to increase overall instruction throughput.
    Type: Grant
    Filed: January 11, 2008
    Date of Patent: May 17, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7941642
    Abstract: In one embodiment, a multithreaded processor includes a multithreaded instruction source that may provide a plurality of instructions each corresponding to a respective one of a plurality of threads. The multithreaded processor also includes a pick unit coupled to the multithreaded instruction source. The pick unit may select in a given cycle, a first divide instruction corresponding to one thread of the plurality of threads and a second divide instruction corresponding to another thread of the plurality of threads based upon a thread selection algorithm. Further, the multithreaded processor includes a storage coupled to a functional unit including a divider configured to execute the first divide instruction and the second divide instruction. The storage may store one of the first and the second divide instructions during execution of the other of the first and the second divide instructions.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: May 10, 2011
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Jeffrey S. Brooks, Christopher H. Olson
  • Patent number: 7941644
    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
    Type: Grant
    Filed: October 16, 2008
    Date of Patent: May 10, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7941643
    Abstract: A system, apparatus and method for an interleaving multi-thread processing device are described herein. The multi-thread processing device includes an execution block to execute instructions and a fetch block to fetch and issue instructions, interleavingly, of a first instruction execution thread and at least one other instruction execution thread. The fetch block includes at least one program counter, which is allocable and/or corresponds to each instruction execution thread.
    Type: Grant
    Filed: July 9, 2007
    Date of Patent: May 10, 2011
    Assignee: Marvell World Trade Ltd.
    Inventors: Jack Kang, Yu-Chi Chuang
  • Patent number: 7937563
    Abstract: A system and method for providing a digital real-time voltage droop detection and subsequent voltage droop reduction. A scheduler within a reservation station may store a weight value for each instruction corresponding to node capacitance switching activity for the instruction derived from pre-silicon power modeling analysis. For instructions picked with available source data, the corresponding weight values are summed together to produce a local current consumption value and this value is summed with any existing global current consumption values from corresponding schedulers of other processor cores yielding an activity event. The activity event is stored. Hashing functions within the scheduler are used to determine both a recent and an old activity average using the calculated activity event and stored older activity events.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: May 3, 2011
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Samuel D. Naffziger, Michael Gerard Butler
  • Patent number: 7937565
    Abstract: The method and system for data speculation of multicore systems are disclosed. In one embodiment, a method includes dynamically determining whether a current speculative load instruction and an associated store instruction have same memory addresses in an application thread in compiled code running on a main core using a dynamic helper thread running on a idle core substantially before encountering the current speculative load instruction. The instruction sequence associated with the current speculative load instruction is then edited by the dynamic helper thread based on the outcome of the determination so that the current speculative load instruction becomes a non-speculative load instruction.
    Type: Grant
    Filed: February 21, 2008
    Date of Patent: May 3, 2011
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sandya Srivilliputtur Mannarswamy, Hariharan Sandanagobalane
  • Patent number: 7937562
    Abstract: A processing apparatus includes an execution stage which executes each of instruction streams, a first resource counter which counts the number of operating resources used when the execution stage executes a first one of the instruction streams, a second resource counter which holds data of the number of unused ones of the operating resources, and a control circuit which reads the count value of the first resource counter from a management table when a subsequent instruction stream is executed, to control a start of execution of the subsequent instruction stream in accordance with a subtraction result obtained by subtracting the count value from the data. The control circuit checks whether a number of operating resources required by the subsequent instruction stream is secured based on the subtraction result before the subsequent instruction stream starts to be executed.
    Type: Grant
    Filed: July 27, 2005
    Date of Patent: May 3, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Itaru Yamazaki, Tatsuo Teruyama
  • Publication number: 20110099354
    Abstract: An information processing apparatus includes an instruction supplying section that supplies a plurality of instructions as a single instruction group, an executing section that repetitively executes a plurality of execution processes corresponding to the plurality of instructions in parallel, an issue timing control section that controls an issue timing of each of the instructions to the executing section so that the plurality of execution processes are executed with a timing delayed in accordance with a predetermined latency, and an operand transforming section that transforms an operand register address of each of the instructions in accordance with a predetermined increment value upon every repetition of execution in the executing section.
    Type: Application
    Filed: August 24, 2010
    Publication date: April 28, 2011
    Applicant: Sony Corporation
    Inventors: Satoshi Takashima, Hirokazu Hanaki
  • Publication number: 20110078414
    Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Christopher H. Olson, Xiang Shan Li, Robert T. Golla
  • Publication number: 20110078416
    Abstract: A computer processor comprises a decode unit and a processing channel. The decode unit decodes a stream of instruction packets from a memory, each instruction packet comprising a plurality of instructions. The processing channel comprises a plurality of functional units and operable to perform control processing operations. The decode unit is operable to receive and decode instruction packets of a bit length of 64 bits and to detect if the instruction packet defines three control instructions each having a length of 21 bits. The decode unit detects that the instruction packet comprises the three control instructions. The control instructions are supplied to the processing channel for execution in the order in which they appear in the instruction packet. The detection uses an identification bit in the instruction packet.
    Type: Application
    Filed: December 9, 2010
    Publication date: March 31, 2011
    Applicant: Icera Inc.
    Inventor: Simon Knowles
  • Patent number: 7917784
    Abstract: Methods and systems for managing power consumption in data processing systems are described. In one embodiment, a data processing system includes a general purpose processing unit, a graphics processing unit (GPU), at least one peripheral interface controller, at least one bus coupled to the general purpose processing unit, and a power controller coupled to at least the general purpose processing unit and the GPU. The power controller is configured to turn power off for the general purpose processing unit in response to a first state of an instruction queue of the general purpose processing unit and is configured to turn power off for the GPU in response to a second state of an instruction queue of the GPU. The first state and the second state represent an instruction queue having either no instructions or instructions for only future events or actions.
    Type: Grant
    Filed: January 7, 2007
    Date of Patent: March 29, 2011
    Assignee: Apple Inc.
    Inventors: Joshua de Cesare, Bernard Semeria, Michael Smith
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Publication number: 20110072244
    Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 24, 2011
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
  • Patent number: 7913070
    Abstract: Tracking the order of issued instructions using a counter is presented. In one embodiment, a saturating, decrementing counter is used. The counter is initialized to a value that corresponds to the processor's commit point. Instructions are issued from a first issue queue to one or more execution units and one or more second issue queues. After being issued by the first issue queue, the counter associated with each instruction is decremented during each instruction cycle until the instruction is executed by one of the execution units. Once the counter reaches zero it will be completed by the execution unit. If a flush condition occurs, instructions with counters equal to zero are maintained (i.e., not flushed or invalidated), while other instructions in the pipeline are invalidated based upon their counter values.
    Type: Grant
    Filed: October 13, 2008
    Date of Patent: March 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Christopher Michael Abernathy, Jonathan James DeMent, Ronald Hall, Robert Alan Philhower, David Shippy
  • Publication number: 20110055523
    Abstract: A method and apparatus for branch determination. The method includes a first command issuing within a computer processor, wherein execution of the first command by the computer processor includes evaluating one or more conditions to set one or more flags. The method further includes a second command issuing, subsequent to the first command issuing, within the computer processor, wherein execution of the second command by the computer processor includes causing the computer processor to wait until the one or more flags are set. Subsequent to the first and second commands issuing, the method includes a third command issuing within the computer processor, wherein execution of the third command by the computer processor includes performing a jump operation based on a value of at least one of the one or more flags set by the first command.
    Type: Application
    Filed: August 31, 2009
    Publication date: March 3, 2011
    Inventors: David A. Kaplan, Daniel B. Hopper, Benjamin C. Serebrin
  • Patent number: 7890734
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: February 15, 2011
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Patent number: 7882335
    Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: February 1, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A Luick
  • Patent number: 7877579
    Abstract: The present invention provides a system and method for prioritizing compare instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one compare instruction is in the issue group, if so scheduling the least one compare instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one compare instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7870368
    Abstract: The present invention provides a system and method for prioritizing branch instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one branch instruction is in the issue group, if so scheduling the least one branch instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one branch instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 11, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Publication number: 20110004743
    Abstract: A data processing apparatus 1 has a plurality of registers 10 of the same type of register and a plurality of processing pipelines 40, 50, each processing pipeline 40, 50 being arranged to process instructions. At least one instruction includes a destination register specifier specifying which of said registers is a destination register for storing a processing result of the at least one instruction. Instruction issuing circuitry 26 is configured to issue the at least one instruction for processing by one of the plurality of processing pipelines. The instruction issuing circuitry 26 selects the one of the plurality of processing pipelines to which the candidate instruction is issued in dependence upon the value of the destination register specifier of the candidate instruction.
    Type: Application
    Filed: July 1, 2009
    Publication date: January 6, 2011
    Applicant: ARM Limited
    Inventor: David Raymond Lutz
  • Patent number: 7865700
    Abstract: The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the least one store instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 7865862
    Abstract: A design structure embodied in a machine readable medium used in a design process includes an apparatus for dynamically selecting compiled instructions for execution, the apparatus including an input for receiving static instructions for execution on a first execution unit and receiving dynamic instructions for execution on a second execution unit; and an instruction selection element adapted to evaluate throughput performance of the static instructions and dynamic instructions based on current states of the execution units and select the static instructions or the dynamic instructions for execution at runtime on the first execution unit or the second execution unit, respectively, based on the throughput performance of the instructions.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventors: Deanna J. Chou, Jesse E. Craig, John Sargis, Jr., Daneyand J. Singley, Sebastian T. Ventrone
  • Publication number: 20100332804
    Abstract: Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Also, each entry stores a picked field, which when asserted indicates the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates a result of a corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields.
    Type: Application
    Filed: June 29, 2009
    Publication date: December 30, 2010
    Inventors: Robert T. Golla, Matthew B. Smittle, Mark A. Luttrell, Xiang Shan Li
  • Patent number: 7861063
    Abstract: In one embodiment, a processor comprises a fetch unit and a pick unit. The fetch unit is configured to fetch instructions for execution by the processor. The pick unit is configured to schedule instructions fetched by the fetch unit for execution in the processor. The pick unit is configured to inhibit scheduling a delayed control transfer instruction (DCTI) until a delay slot instruction of the DCTI is available for scheduling. For example, in some embodiments, the pick unit may inhibit scheduling until the delay slot instruction is written to an instruction buffer, until the delay slot instruction is fetched, etc.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: December 28, 2010
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh
  • Patent number: 7861062
    Abstract: The data processing device has a plurality of functional units and issues instructions in successive instruction cycles. Instructions of a first type are each intended for one functional unit at a time. An instruction of a second type causes a combination of functional units to respond in the same instruction execution cycle, a result from one functional unit being used by another as part of the execution of the same instruction. Preferably, the device supports alternative operation at a number of different instruction cycle rates, dependent on whether an executed program segment contains instructions of the second type. The fastest instruction cycle rate does not allow execution of the instruction of the second type, because operation by different functional units does not fit within the instruction execution cycle.
    Type: Grant
    Filed: June 22, 2004
    Date of Patent: December 28, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Carlos Antonio Alba Pinto, Balakrishnan Srinivasan, Ramanathan Sethuraman
  • Patent number: 7861065
    Abstract: A computer processor that includes a plurality of execution pipelines, each execution pipeline including a configuration of one or more execution units of the processor, each execution pipeline characterized by an execution pipeline type, each execution pipeline type determined according to the types of computer program instructions executed in each execution pipeline; a plurality of hardware threads of execution, each hardware thread including computer program instructions characterized by instruction types, each hardware thread including sequences of instructions of a same instruction type, the sequences interspersed with instructions of other types; and an instruction dispatcher capable of dispatching instructions preferentially during a predefined preference period from a preferred hardware thread to a particular execution pipeline in dependence upon whether the preferred hardware thread presents a sequence of instructions, ready for execution from the preferred hardware thread, of a type for execution in
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: December 28, 2010
    Assignee: International Business Machines Corporation
    Inventors: Timothy H. Heil, Brian L. Koehler, Eric O. Mejdrich
  • Patent number: 7861064
    Abstract: A method for selectively accelerating early instruction processing including receiving an instruction data that is normally processed in an execution stage of a processor pipeline, wherein a configuration of the instruction data allows a processing of the instruction data to be accelerated from the execution stage to an address generation stage that occurs earlier in the processor pipeline than the execution stage, determining whether the instruction data can be dispatched to the address generation stage to be processed without being delayed due to an unavailability of a processing resource needed for the processing of the instruction data in the address generation stage, dispatching the instruction data to be processed in the address generation stage if it can be dispatched without being delayed due to the unavailability of the processing resource, and dispatching the instruction data to be processed in the execution stage if it can not be dispatched without being delayed due to the unavailability of the pro
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 28, 2010
    Assignee: International Business Machines Corporation
    Inventors: Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, David S. Hutton, Chung-Lung Kevin Shum
  • Patent number: 7861066
    Abstract: A mechanism for suppressing instruction replay includes a processor having one or more execution units and a scheduler that issue instruction operations for execution by the one or more execution units. The scheduler may also cause instruction operations that are determined to be incorrectly executed to be replayed, or reissued. In addition, a prediction unit within the processor may predict whether a given instruction operation will replay and to provide an indication that the given instruction operation will replay. The processor also includes a decode unit that may decode instructions and in response to detecting the indication, may flag the given instruction operation. The scheduler may further inhibit issue of the flagged instruction operation until a status associated with the flagged instruction is good.
    Type: Grant
    Filed: July 20, 2007
    Date of Patent: December 28, 2010
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ashutosh S. Dhodapkar, Michael G. Butler, Gene W. Shen
  • Publication number: 20100325394
    Abstract: A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.
    Type: Application
    Filed: June 23, 2009
    Publication date: December 23, 2010
    Inventors: Robert T. Golla, Gregory F. Grohoski
  • Publication number: 20100318769
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 7853777
    Abstract: An apparatus for reducing instruction re-fetching in a multithreading processor configured to concurrently execute a plurality of threads is disclosed. The apparatus includes a buffer for each thread that stores fetched instructions of the thread, having an indicator for indicating which of the fetched instructions in the buffer have already been dispatched for execution. An input for each thread indicates that one or more of the already-dispatched instructions in the buffer has been flushed from execution. Control logic for each thread updates the indicator to indicate the flushed instructions are no longer already-dispatched, in response to the input. This enables the processor to re-dispatch the flushed instructions from the buffer to avoid re-fetching the flushed instructions. In one embodiment, there are fewer buffers than threads, and they are dynamically allocatable by the threads. In one embodiment, a single integrated buffer is shared by all the threads.
    Type: Grant
    Filed: February 4, 2005
    Date of Patent: December 14, 2010
    Assignee: MIPS Technologies, Inc.
    Inventors: Darren M. Jones, Ryan C. Kinter, G. Michael Uhler, Sanjay Vishin
  • Publication number: 20100312992
    Abstract: A multithread execution device includes: a program memory in which a plurality of programs are stored; an instruction issue unit that issues an instruction retrieved from the program memory; an instruction execution unit that executes the instruction; a target execution speed information memory that stores target execution speed information of the instruction; an execution speed monitor that monitors an execution speed of the instruction; a feedback control unit that commands the instruction issue unit to issue the instruction such that the execution speed of the instruction approximately corresponds to the target execution speed information.
    Type: Application
    Filed: May 20, 2010
    Publication date: December 9, 2010
    Applicants: TOYOTA JIDOSHA KABUSHIKI KAISHA, Renesas Electronics Corporation
    Inventors: Tetsuaki Wakabayashi, Koji Adachi, Kazuya Okamoto
  • Publication number: 20100306504
    Abstract: At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry.
    Type: Application
    Filed: May 27, 2009
    Publication date: December 2, 2010
    Applicant: ARM Limited
    Inventors: Robert Gregory McDonald, Paul Gilbert Meyer
  • Publication number: 20100306505
    Abstract: A processor 2 includes an execution cluster 10 having multiple execution units 14, 16, 18, 20. The execution units 14, 16, 18, 20 share result buses 22, 24. Issue circuitry 12 within the execution cluster 10 determines future availability of a result bus 22, 24 for an instruction to be issued (or recently issued) using a known cycle count for that instruction. The availability is tracked for each result bus using a mask register 32 storing a mask value within which each bit position indicates the availability or non-availability of that result bus at a particular processing cycle in the future. The mask value is left shifted each processing cycle.
    Type: Application
    Filed: June 1, 2009
    Publication date: December 2, 2010
    Inventors: David James Williamson, Conrado Blasco Allue
  • Publication number: 20100299504
    Abstract: A microprocessor includes an architectural register and a non-architectural register, each having a plurality of condition code flags. A first instruction of the microarchitectural instruction set of the microprocessor instructs the microprocessor to update the plurality of condition code flags based on a result of the first instruction. The first instruction includes a field for indicating whether to update the plurality of condition code flags of the architectural or non-architectural register. A second instruction of the microarchitectural instruction set instructs the microprocessor to conditionally perform an operation based on one of the plurality of condition code flags. The second instruction includes a field for indicating whether to use the one of the plurality of condition code flags of the architectural or non-architectural register to determine whether to perform the operation.
    Type: Application
    Filed: May 20, 2009
    Publication date: November 25, 2010
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Gerard M. Col
  • Patent number: 7818547
    Abstract: Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: October 19, 2010
    Assignee: Intel Corporation
    Inventors: Tor M. Aamodt, Hong Wang, Per Hammarlund, John P. Shen, Steve Shih-wei Liao, Perry H. Wang
  • Publication number: 20100262808
    Abstract: The illustrative embodiments described herein provide a computer-implemented method, apparatus, and a system for managing instructions. A load/store unit receives a first instruction at a port. The load/store unit rejects the first instruction in response to determining that the first instruction has a first reject condition. Then, the instruction sequencing unit activates a first bit in response to the load/store unit rejection the first instruction. The instruction sequencing unit blocks the first instruction from reissue while the first bit is activated. The processor unit determines a class of rejection of the first instruction. The instruction sequencing unit starts a timer. The length of the timer is based on the class of rejection of the first instruction. The instruction sequencing unit resets the first bit in response to the timer expiring. The instruction sequencing unit allows the first instruction to become eligible for reissue in response to resetting the first bit.
    Type: Application
    Filed: April 8, 2009
    Publication date: October 14, 2010
    Applicant: International Business Machines Corporation
    Inventors: Pradip Bose, Alper Buyuktosunoglu, Michael Stephen Floyd, Dung Quoc Nguyen, Bruce Joseph Ronchetti
  • Publication number: 20100257341
    Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of cells arranged in a plurality of columns and a plurality of rows. Each row represents an instruction in a processor execution queue and each cell represents a dependency relationship between two instructions in the processor execution queue. A first latch couples to the first array and comprises a first bit, the first bit indicating a first status. A second latch couples to the first array and comprises a second bit, the second bit indicating a second status. A first read port couples to the first array, comprising a first read wordline and a first read bitline. The first read wordline couples to the first latch and a first column and asserts a first available signal based on the first bit. The first read bitline couples to a first row and generates a first ready signal based on the first available signal and a first cell.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Mary D. Brown, James W. Bishop, William E. Burky, John B. Griswell, JR., Dung Q. Nguyen, Todd A. Venton
  • Publication number: 20100257339
    Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of cells arranged in a plurality of columns and a plurality of rows. Each row represents an instruction in a processor execution queue and each cell in the first array represents a dependency relationship between two instructions in the processor execution queue. A clear port couples to the first array and clears a column of the first array. A producer status module couples to the clear port and the first array and determines an execution status of a producer instruction, wherein the producer instruction is an instruction in the processor execution queue. An available-status port couples to the first array and the producer status module and sets a read wordline column corresponding to the producer instruction based on the execution status of the producer instruction. The available-status port deasserts the read wordline column in response to a selection of the producer for execution.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Mary D. Brown, William E. Burky, Dung Q. Nguyen, Todd A. Venton
  • Publication number: 20100257336
    Abstract: A processor having a dependency matrix comprises a first array comprising a plurality of first cells. A second array couples to the first array and comprises a plurality of second cells. A first write port couples to the first array and the second array and writes to the first array and the second array. A first read port couples to the first array and the second array and reads from the first array and the second array. A second read port couples to the first array and reads from the first array. A second write port couples to the second read port, reads from the second read port and writes to the second array.
    Type: Application
    Filed: April 3, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Saiful Islam, Mary D. Brown, Bjorn P. Christensen, Sam G. Chu, Robert A. Cordes, Maureen A. Delaney, Jafar Nahidi, Joel A. Silberman
  • Patent number: 7809929
    Abstract: A universal register rename mechanism for instructions with multiple targets using a common destination tag. For each instruction that updates multiple destinations, a single rename entry is allocated to handle all destinations associated with it. A rename entry now consists of a DTAG and a vector to indicate the type of destination(s) that is/are being updated by such a particular instruction. For example, a common DTAG can be assigned to a fixed point unit instruction (FXU) that updates general purpose register (GPR), fixed point exception register (XER), and condition code register (CR) destinations. During flush time, the DTAGs in the recovery link may be used to restore the information indicating that the youngest instruction updates a particular architected register. By using a single, universal rename structure for all types of destinations, a large saving in silicon and power can be realized without the need to sacrifice performance.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: October 5, 2010
    Assignee: International Business Machines Corporation
    Inventors: Hung Q. Le, Dung Q. Nguyen, Balaram Sinharoy
  • Publication number: 20100250902
    Abstract: A mechanism is provided for tracking deallocated load instructions. A processor detects whether a load instruction in a set of instructions in an issue queue has missed. Responsive to a miss of the load instruction, an instruction scheduler allocates the load instruction to a load miss queue and deallocates the load instruction from the issue queue. The instruction scheduler determines whether there is a dependence entry for the load instruction in an issue queue portion of a dependence matrix. Responsive to the existence of the dependence entry for the load instruction in the issue queue portion of the dependence matrix, the instruction scheduler reads data from the dependence entry of the issue queue portion of the dependence matrix that specifies a set of dependent instructions that are dependent on the load instruction and writes the data into a new entry in a load miss queue portion of the dependence matrix.
    Type: Application
    Filed: March 24, 2009
    Publication date: September 30, 2010
    Applicant: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Mary D. Brown, William E. Burky, Todd A. Venton
  • Publication number: 20100251016
    Abstract: A mechanism is provided for issuing instructions. An instruction dispatch unit receives an instruction for dispatch to one of a plurality of execution units. The instruction dispatch unit analyzes a tag register to determine whether a previous tag associated with a previous instruction has been stored in the tag register. Responsive to the previous tag associated with the previous instruction failing to be stored in the tag register, the instruction dispatch unit storing a tag corresponding to the instruction in the tag register. The instruction dispatch unit dispatches the instruction to an issue queue for issue to the one of the plurality of execution units.
    Type: Application
    Filed: March 24, 2009
    Publication date: September 30, 2010
    Applicant: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Mary D. Brown, Dung Q. Nguyen, Todd A. Venton
  • Publication number: 20100250900
    Abstract: An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction to instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match.
    Type: Application
    Filed: March 24, 2009
    Publication date: September 30, 2010
    Applicant: International Business Machines Corporation
    Inventors: Mary Douglass Brown, William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy
  • Publication number: 20100250966
    Abstract: A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction.
    Type: Application
    Filed: March 31, 2009
    Publication date: September 30, 2010
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Robert T. Golla
  • Publication number: 20100246815
    Abstract: A processor including instruction support for implementing the Kasumi block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Kasumi instructions defined within the ISA. In addition, the Kasumi instructions may be executable by the cryptographic unit to implement portions of a Kasumi cipher that is compliant with 3rd Generation Partnership Project (3GPP) Technical Specification TS 35.202 version 8.0.0. In response to receiving a Kasumi FL( )-operation instruction defined within the ISA, the cryptographic unit may perform an FL( ) operation, as defined by the Kasumi cipher, upon a data input operand and a subkey operand in which the data input operand and subkey operand may be specified by the Kasumi FL( )-operation instruction.
    Type: Application
    Filed: March 31, 2009
    Publication date: September 30, 2010
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen