Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow that supports two architectures and the fused MULTIPLY AND ADD and MULTIPLY AND SUBTRACT functions in both RRF and RXF formats. Both binary and hexadecimal floating-point instructions are supported, for a total of 6 formats. The floating-point unit can perform a hexadecimal or binary multiply-add every cycle with a latency of 5 cycles. It supports the two architectures with two internal formats, each with its own exponent bias. This eliminates format-conversion cycles and optimizes the width of the dataflow. The unit is optimized for both the hexadecimal and binary floating-point architectures, supporting one multiply-add/subtract per cycle.
Type: Grant
Filed: May 12, 2003
Date of Patent: August 7, 2007
Assignee: International Business Machines Corporation
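The abstract notes that the unit serves two architectures whose formats have their own exponent biases. As a minimal sketch of that difference, covering only the architected 64-bit formats (the unit's internal formats are not described in the abstract), the Python below decodes an IBM hexadecimal (HFP) long value, whose radix-16 exponent is biased by 64, and an IEEE 754 binary (BFP) long value, whose radix-2 exponent is biased by 1023. The function names are illustrative.

```python
import struct


def decode_hfp_long(word: int) -> float:
    """Decode a 64-bit IBM hexadecimal floating-point (HFP long) bit pattern.

    Layout: 1 sign bit, 7-bit characteristic biased by 64 (a power of 16),
    and a 56-bit fraction read as 0.f with no hidden digit.
    """
    sign = -1.0 if (word >> 63) & 1 else 1.0
    exponent = ((word >> 56) & 0x7F) - 64            # excess-64, radix 16
    fraction = (word & ((1 << 56) - 1)) / (1 << 56)  # 0 <= fraction < 1
    return sign * fraction * 16.0 ** exponent


def decode_bfp_long(word: int) -> float:
    """Decode a 64-bit IEEE 754 binary (BFP long) bit pattern.

    Layout: 1 sign bit, 11-bit exponent biased by 1023 (a power of 2),
    and a 52-bit fraction with an implicit leading 1 for normal numbers.
    """
    return struct.unpack(">d", word.to_bytes(8, "big"))[0]


# The same value, 1.0, has different encodings because radix and bias differ.
print(decode_hfp_long(0x4110_0000_0000_0000))  # 1.0
print(decode_bfp_long(0x3FF0_0000_0000_0000))  # 1.0
```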
Abstract: According to one embodiment of the invention, an apparatus is provided that includes a first register to hold an initial value of a first index associated with a looping instruction to be executed for a number of iterations, a second register to hold an initial value of a second index associated with the same looping instruction, and a third register to hold data indicating a non-linear variation pattern associated with the second index. For each iteration, the actual increments of the first and second indices are set based on a target increment and the data indicating the non-linear variation pattern associated with the second index.
Type: Grant
Filed: November 25, 2002
Date of Patent: July 24, 2007
Assignee: Intel Corporation
Inventors: Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
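The abstract does not say how the non-linear variation pattern is encoded. The sketch below assumes, purely for illustration, that the third register holds one 0/1 flag per iteration (repeated cyclically): the first index always advances by the target increment, while the second advances only when the flag is set. `multi_index_loop` and its parameters are hypothetical names.

```python
from itertools import cycle, islice


def multi_index_loop(first_init: int, second_init: int, target_inc: int,
                     pattern: list[int], iterations: int) -> list[tuple[int, int]]:
    """Return the (first, second) index pair used on each loop iteration.

    Hypothetical encoding: `pattern` holds one 0/1 flag per iteration,
    repeated cyclically. The first index always advances by the target
    increment; the second index advances by it only when the flag is 1.
    """
    first, second = first_init, second_init
    trace = []
    for flag in islice(cycle(pattern), iterations):
        trace.append((first, second))
        first += target_inc           # linear stride
        second += target_inc * flag   # stride gated by the non-linear pattern
    return trace


# The second index advances only on every other iteration.
print(multi_index_loop(0, 100, 4, [1, 0], 4))
# [(0, 100), (4, 104), (8, 104), (12, 108)]
```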
Abstract: Methods and apparatus are disclosed to control power consumption within a processor. An example processor disclosed herein comprises an instruction retirement unit; a first set of functional blocks to process a first set of instructions having a first instruction type; a second set of functional blocks to process a second set of instructions having a second instruction type; and a controller to enable the first set of functional blocks to process an instruction allocated to the instruction retirement unit if the type of the instruction is the first type, and to disable the first set of functional blocks after the instruction is retired by the instruction retirement unit.
Type: Grant
Filed: December 23, 2003
Date of Patent: July 17, 2007
Assignee: Intel Corporation
Inventors: Nicholas G. Samra, Andrew S. Huang, Namratha R. Jaisimha
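A minimal sketch of the controller's enable/disable behavior, assuming a single gated set of functional blocks and a counter of in-flight instructions of the gated type. The counter is an assumption added so that overlapping instructions are handled; the abstract only says the blocks are disabled after the instruction retires. The class and method names are hypothetical.

```python
class FunctionalBlockPowerController:
    """Gate one set of functional blocks by instruction type (sketch).

    Assumption: the blocks stay powered while any instruction of their type
    is in flight and are disabled once the last such instruction retires.
    """

    def __init__(self, gated_type: str):
        self.gated_type = gated_type
        self.in_flight = 0
        self.enabled = False

    def allocate(self, instr_type: str) -> None:
        """Called when an instruction is allocated to the retirement unit."""
        if instr_type == self.gated_type:
            self.in_flight += 1
            self.enabled = True

    def retire(self, instr_type: str) -> None:
        """Called when the retirement unit retires an instruction."""
        if instr_type == self.gated_type:
            self.in_flight -= 1
            if self.in_flight == 0:
                self.enabled = False
```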
Abstract: Systems and methods of processing branch instructions provide for a bimodal predictor and a plurality of global predictors. The bimodal predictor is coupled to a prediction selector, where the bimodal predictor generates a bimodal prediction for branch instructions. The plurality of global predictors is coupled to the prediction selector, where each global predictor generates a corresponding global prediction for a branch instruction using different history or stew lengths. The prediction selector selects branch predictions from the bimodal prediction and the global predictions in order to arbitrate between predictors. The arbitration, update, and allocation schemes are designed to choose the most accurate predictor for each branch. Lower level predictors are used as filters to increase effective predictor capacity. Allocate and update schemes minimize aliasing between predictors.
Type: Grant
Filed: December 24, 2003
Date of Patent: July 10, 2007
Assignee: Intel Corporation
Inventors: Stephan J. Jourdan, Mark C. Davis, Pierre Michaud
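The arbitration and allocation policies below are one plausible reading of the abstract, not the patent's actual scheme. Each global predictor is modeled as an idealized, alias-free table keyed by the branch address and a global-history slice of a different length; the longest-history table that hits provides the prediction, and an entry is allocated one level up only when the current provider mispredicts, so lower levels filter out easy branches. All names are hypothetical.

```python
class HybridBranchPredictor:
    """Bimodal predictor plus tagged global predictors with longer histories (sketch)."""

    def __init__(self, history_lengths=(2, 4), bimodal_bits=10):
        self.bimodal_mask = (1 << bimodal_bits) - 1
        self.bimodal = [2] * (1 << bimodal_bits)     # 2-bit counters, weakly taken
        self.history_lengths = history_lengths       # shortest to longest
        self.tables = [{} for _ in history_lengths]  # (pc, history slice) -> 2-bit counter
        self.history = 0                             # newest outcome in bit 0
        self.history_mask = (1 << max(history_lengths)) - 1

    def _key(self, pc, level):
        return (pc, self.history & ((1 << self.history_lengths[level]) - 1))

    def _provider(self, pc):
        """Longest-history table that hits, or -1 for the bimodal predictor."""
        for level in reversed(range(len(self.tables))):
            if self._key(pc, level) in self.tables[level]:
                return level
        return -1

    def _read(self, pc, level):
        if level < 0:
            return self.bimodal[pc & self.bimodal_mask]
        return self.tables[level][self._key(pc, level)]

    def _write(self, pc, level, value):
        if level < 0:
            self.bimodal[pc & self.bimodal_mask] = value
        else:
            self.tables[level][self._key(pc, level)] = value

    def predict(self, pc):
        return self._read(pc, self._provider(pc)) >= 2

    def update(self, pc, taken):
        level = self._provider(pc)
        counter = self._read(pc, level)
        mispredicted = (counter >= 2) != taken
        self._write(pc, level, min(3, counter + 1) if taken else max(0, counter - 1))
        # Lower levels act as filters: only mispredicted branches move up a level.
        if mispredicted and level + 1 < len(self.tables):
            self.tables[level + 1][self._key(pc, level + 1)] = 2 if taken else 1
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


bp = HybridBranchPredictor()
for taken in [True, False] * 10:  # alternating branch a lone bimodal counter cannot learn
    bp.update(0x40, taken)
```

In the training loop, the first not-taken outcome is mispredicted by the bimodal counter, which allocates an entry in the shortest-history table; from then on that entry predicts the not-taken outcomes and the bimodal counter the taken ones, so the longer table is never needed.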
Abstract: A multi-threaded processor is configured to detect excepted instructions from a first program, and to stop fetching younger instructions from that same program, to thereby conserve system resources that can be used by other programs. Each fetched program instruction has an associated status bit, which is set if the instruction excepts. Each excepting instruction is logged in an exception logging unit, which causes the associated status bit to be set. Each program has an associated in-flight vector table that tracks the instructions that have been fetched for that program. The status bits are compared with the in-flight vector table to identify the program that is associated with an excepted instruction. That program is then disabled, thereby preventing further fetching of instructions for that program until the excepted instruction clears.
Type: Grant
Filed: July 16, 2001
Date of Patent: July 3, 2007
Assignee: Hewlett-Packard Development Company, L.P.
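A sketch of the identification step, assuming for illustration that each fetched instruction occupies a numbered slot, so a program's in-flight vector and the status bits can be compared with a bitwise AND. `InFlightTracker` and its methods are hypothetical.

```python
class InFlightTracker:
    """Identify and stall programs that own an excepting instruction (sketch).

    Assumption: bit i of a program's in-flight vector is set while that program
    owns instruction slot i, and bit i of the status vector is set when the
    instruction in slot i excepts.
    """

    def __init__(self, num_programs: int):
        self.in_flight = [0] * num_programs
        self.fetch_enabled = [True] * num_programs

    def fetch(self, program: int, slot: int) -> None:
        self.in_flight[program] |= 1 << slot

    def check_exceptions(self, status_bits: int) -> list[int]:
        """Compare status bits with each in-flight vector; disable matching programs."""
        stalled = [p for p, vec in enumerate(self.in_flight) if vec & status_bits]
        for p in stalled:
            self.fetch_enabled[p] = False
        return stalled

    def exception_cleared(self, program: int, slot: int) -> None:
        """The excepted instruction has cleared: free its slot and resume fetching."""
        self.in_flight[program] &= ~(1 << slot)
        self.fetch_enabled[program] = True
```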
Abstract: A computer system is provided with precise and non-precise watch modes. The computer system is a pipelined system in which the fate of an instruction is determined at the decode stage. Once instructions have been decoded, it is not possible for them to be “killed” later in the pipeline. According to the precise watch mode, instructions are held at the decode stage until the guard value has been resolved to determine whether or not that instruction is committed. Actions of the decode unit are determined in dependence on whether or not the instruction is committed when the guard has been resolved. According to a non-precise watch mode, instructions continue to be decoded and executed normally until a breakpoint instruction has had its guard resolved. At that point, an on-chip emulator can take over operations of the processor in a divert mode. The computer system can take into account different intrusion levels while implementing the watch modes.
Type: Grant
Filed: December 22, 2000
Date of Patent: July 3, 2007
Assignee: STMicroelectronics S.A.
Inventors: Andrew Cofler, Laurent Wojcieszak, Isabelle Sename
Abstract: A multiprocessor system has a plurality of CPUs with respective local buses, and a memory that stores a plurality of programs to be executed by the CPUs and is connected to a common bus accessible via the local buses. Each local bus is connected to a CPU identification register that stores an identification value for the corresponding CPU. When a program specific to a CPU is to be executed by that CPU, the identification value is read from that CPU's identification register and evaluated, and execution branches to the appropriate program based on the result.
Abstract: In a multi-streaming processor having a memory cache, a system for fetching instructions from individual ones of multiple streams to an instruction pipeline is provided, comprising a fetch algorithm for selecting from which stream to fetch an instruction, and a hit/miss predictor for forecasting whether a load instruction will hit or miss the cache. The prediction by the hit-miss predictor is used by the fetch algorithm in determining from which stream to fetch. A hit prediction results in a next instruction being fetched from the same stream as the instruction tested by the hit/miss predictor, while a miss prediction results in the next instruction being fetched from a different stream, if any. The predictor is also used to determine which instructions to dispatch to functional units.
Abstract: Redefined hardware structured transactions and the associated responses in a data processing device are made user programmable. Three registers, an identifier register, a mask register and a response register, are used to redirect transactions or other operations within an application specific integrated circuit after post-silicon testing has been completed and there is no opportunity to redirect the hardware logic contained therein. When enabled, the registers allow for the insertion of blank table entries that can be programmed at a later time to handle unexpected output responses which occur due to unforeseen problems in the preprogrammed operation of the device. Transaction redirection can be accomplished on selected fields of identified transactions. The method is applicable to any hardware device in which it is desired to redirect actions originally defined in look-up tables when such tables are not capable of adjustment or alteration without redesign or re-manufacture.
Type: Grant
Filed: January 9, 2002
Date of Patent: June 26, 2007
Assignee: International Business Machines Corporation
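A minimal sketch of the override logic, assuming transactions are identified by an integer code and that a register set matches when the masked transaction bits equal the masked identifier bits. The field layout and names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RedirectRegisters:
    """One identifier/mask/response register set (hypothetical field widths)."""
    identifier: int
    mask: int
    response: int
    enabled: bool = False


def respond(transaction: int, lookup_table: dict[int, int],
            redirects: list[RedirectRegisters]) -> int:
    """Return the response for a transaction code.

    An enabled register set overrides the hard-wired look-up table when the
    masked transaction bits equal the masked identifier bits; otherwise the
    original table entry is used.
    """
    for r in redirects:
        if r.enabled and (transaction & r.mask) == (r.identifier & r.mask):
            return r.response
    return lookup_table[transaction]
```

Masking lets one register set redirect a whole group of related transactions that share selected fields.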
Abstract: If a consumer instruction specifies a 64 bit source register comprised of results provided by two 32 bit producer instructions, the number of dependencies that must be tracked per source register can be decreased by transforming one or more of the 32 bit producer instructions so that rather than simply storing its result in a 32 bit destination register, the transformed instruction stores its result into a 64 bit logical register along with another 32 bit value held in another 32 bit register.
Type: Grant
Filed: April 5, 2004
Date of Patent: June 26, 2007
Assignee: Sun Microsystems, Inc.
Inventors: Julian A. Prabhu, Atul Kalambur, Sudarshan Kadambi, Daniel L. Liebholz, Julie M. Staraitis
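As a sketch, assuming the 64-bit logical register is the concatenation of a high and a low 32-bit half, the transformed producer writes its own result together with the current contents of the partner register, so a consumer of the 64-bit register depends on only one producer. The function name is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def transformed_producer(result: int, partner: int, result_is_high: bool) -> int:
    """Write a 32-bit result plus its partner 32-bit register as one 64-bit value.

    After this transformation, a consumer of the 64-bit register tracks a
    single producer instead of two.
    """
    result &= MASK32
    partner &= MASK32
    if result_is_high:
        return (result << 32) | partner
    return (partner << 32) | result
```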
Abstract: The ManArray core indirect VLIW processor consists of an array controller sequence processor (SP) merged with a processing element (PE0), closely coupling the SP with the PE array and providing the capability to share execution units between the SP and PE0. Consequently, in the merged SP/PE0 a single set of execution units is coupled with two independent register files. To make efficient use of the SP and PE resources, the ManArray architecture specifies a bit in the instruction format, the SP/PE-bit, to differentiate SP instructions from PE instructions. Multiple register contexts are obtained in the ManArray processor by controlling how the SP/PE-bit in the ManArray instruction format is used in conjunction with a context switch bit (CSB) to select either the PE register file or the SP register file.
Type: Grant
Filed: January 21, 2004
Date of Patent: June 26, 2007
Assignee: Altera Corporation
Inventors: Edwin Franklin Barry, Gerald George Pechanek, David Strube
Abstract: A method and mechanism for managing shifts in a shifting queue. A reservation station in a processing device includes a queue of shifting entries. On a given cycle, zero, one, or two instructions may be dispatched and stored in the queue. Depending upon the dispatch conditions and the state of the queue, existing entries within the queue may be shifted to make room for the newly dispatched instruction(s) at the top of the queue. Shift vectors are generated which identify entries of the queue which are to be shifted and by how much. A queue management approach is adopted in which three rules are generally followed: (i) Only shift entries that must shift due to dispatch pressure from above; (ii) If an entry must be shifted elsewhere, shift it as far down the array as the particular implementation allows; and (iii) Don't allow the previous conditions to force additional entries to shift that are not required to shift by dispatch pressure.
Abstract: A branch control apparatus in a microprocessor. The apparatus includes a branch target address cache (BTAC) that caches indications of whether a branch instruction wraps across two cache lines. When an instruction cache fetch address of a first cache line containing the first part of the branch instruction hits in the BTAC, the BTAC outputs a target address of the branch instruction and indicates the wrap condition. The target address is stored in a register. The next sequential fetch address selects a second cache line containing the second part of the branch instruction. After the two cache lines containing the branch instruction are fetched, the target address from the register is provided to the instruction cache in order to fetch a third cache line containing a target instruction of the branch. The three cache lines are stored in order in an instruction buffer for decoding.
Type: Grant
Filed: August 19, 2005
Date of Patent: June 19, 2007
Assignee: IP-First, LLC
Inventors: G. Glenn Henry, Brent Bean, Thomas C. McDonald
Abstract: Methods and apparatus are provided for implementing an efficient processor having state information included in each register. A processor has registers configured to hold both data and state information, such as carry and overflow information. State information and data can be read and written in the same operation. Holding state information along with data in the same register can provide a variety of benefits, particularly in the context of multithreaded programmable chips.
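A minimal sketch of a register that holds carry and overflow state alongside 32 bits of data, assuming two's-complement addition. Because the carry travels with the low-half result, a multiword add can read it back from that register in the same operation that reads the data. The names are illustrative.

```python
from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF


@dataclass(frozen=True)
class StateRegister:
    """A register holding 32 bits of data plus carry and overflow state."""
    value: int
    carry: bool = False
    overflow: bool = False


def add32(a: StateRegister, b: StateRegister, carry_in: bool = False) -> StateRegister:
    """Add two registers, writing data and state in the same operation."""
    total = a.value + b.value + int(carry_in)
    result = total & MASK32
    sign_a, sign_b, sign_r = a.value >> 31, b.value >> 31, result >> 31
    return StateRegister(
        value=result,
        carry=total > MASK32,                                # unsigned carry out
        overflow=(sign_a == sign_b) and (sign_r != sign_a),  # signed overflow
    )


# 64-bit addition from 32-bit halves: the low half's carry is read from its register.
lo = add32(StateRegister(0xFFFF_FFFF), StateRegister(1))
hi = add32(StateRegister(1), StateRegister(2), carry_in=lo.carry)
```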
Abstract: Within a multiple instruction pipeline data processing system which supports predication instructions, program instructions are initially decoded upon the assumption that they are predicated. A predication signal is generated within the instruction decoder stages when a predication instruction is detected. The presence or absence of this predication signal can then be used to correct any decoding which has been performed upon the basis of an assumption that the program instructions are predicated. The predication instruction can predicate a variable number of following instructions. The predication instruction can issue in parallel with an instruction which it predicates and yet the proper identification of the predication instruction need not be confirmed until at least some decoding has been performed upon the other program instruction.
Type: Grant
Filed: March 7, 2005
Date of Patent: June 19, 2007
Assignee: ARM Limited
Inventors: Conrado Blasco Allue, Glen Andrew Harris, Stephen John Hill
Abstract: A method to handle data dependencies in a pipelined computer system is disclosed. The method includes allocating a plurality of registers, enabling execution of computer instructions concurrently by using the plurality of registers, and tracking and reducing data dependencies in the computer instructions by correlating a busy condition of a computer instruction to each register.
Type: Grant
Filed: January 2, 2002
Date of Patent: June 5, 2007
Assignee: Intel Corporation
Inventors: Bohuslav Rychlik, Ryan N. Rakvic, Edward Brekelbaum, Bryan Black
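The abstract describes tracking a busy condition per register. A classic scoreboard is one familiar way to do this and is sketched below as an illustration, not as the claimed method: a register is marked busy when an instruction that writes it issues and cleared when that instruction completes, and an instruction whose sources or destination are busy must wait.

```python
class Scoreboard:
    """Classic per-register busy bits (a textbook scoreboard, used here as a sketch)."""

    def __init__(self, num_registers: int):
        self.busy = [False] * num_registers

    def can_issue(self, sources: list[int], dest: int) -> bool:
        # Stall on RAW (busy source) and WAW (busy destination) hazards.
        return not any(self.busy[r] for r in sources) and not self.busy[dest]

    def issue(self, sources: list[int], dest: int) -> bool:
        if not self.can_issue(sources, dest):
            return False
        self.busy[dest] = True
        return True

    def complete(self, dest: int) -> None:
        self.busy[dest] = False
```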
Abstract: A method for operating a processor having an architecture of a larger bitlength with a program comprising instructions compiled to produce instruction results of at least one smaller bitlength having the steps of detecting when in program order a first smaller bitlength instruction is to be dispatched which does not have a target register address as one of its sources, and adding a so_extract_instruction into an instruction stream before the smaller bitlength instruction.
Type: Grant
Filed: December 18, 2001
Date of Patent: June 5, 2007
Assignee: International Business Machines Corporation
Inventors: Petra Leber, Jens Leenstra, Wolfram Sauer, Dieter Wendel
Abstract: A system may include a dispatch unit, a scheduler, and an execution core. The dispatch unit may be configured to modify a load operation to include a register-to-register move operation in response to an indication that a speculative result of the load operation is linked to a data value identified by a first tag. The scheduler may be coupled to the dispatch unit and configured to issue the register-to-register move operation in response to availability of the data value. The execution core may be configured to execute the register-to-register move operation by outputting the data value and a tag indicating that the data value is the result of the load operation.
Type: Grant
Filed: April 30, 2002
Date of Patent: May 22, 2007
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin Michael Lepak, Benjamin Thomas Sander, James K. Pickett
Abstract: A device and method are disclosed for implementing prediction verification control and recovery control in speculative instruction execution when a prediction error occurs, using a simple hardware configuration. The device includes a branch instruction insertion unit that dynamically inserts a branch instruction after a target instruction for prediction, within a group of instructions consisting of the target instruction (whose value is to be predicted) and a subsequent instruction. An instruction issuing unit speculatively issues the subsequent instruction to an execution unit without waiting for the execution result of the target instruction, and the execution unit executes the issued instructions.
Abstract: The present application describes a method and a system for executing instructions while reducing the logic required for execution in a processor. Instructions (e.g., atomic, integer-multiply, integer-divide, move on integer registers, graphics, floating point calculations or the like) are expanded into helper instructions before execution (e.g., in the integer, floating point, graphics and memory units or the like). Such instructions are treated as complex instructions. The functionality of a complex instruction is shared among multiple helpers, so that executing the helpers representing the complex instruction achieves the functionality of the complex instruction. The expansion of complex instructions into helper instructions reduces the amount of hardware and complexity involved in supporting these individual complex instructions in various units in the processor.
Type: Grant
Filed: March 31, 2003
Date of Patent: May 15, 2007
Assignee: Sun Microsystems, Inc.
Inventors: Chandra M. R. Thimmannagari, Sorin Iacobovici, Rabin Sugumar
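A sketch of the expansion idea, using a hypothetical register/memory SWAP as the complex instruction: it is expanded into load, store and move helpers, each of which needs only a simple datapath. The helper names and tuple encoding are assumptions, and the sketch does not model the atomicity a real swap requires.

```python
def expand_swap(reg: str, addr: int) -> list[tuple]:
    """Expand a hypothetical register/memory SWAP into three simple helpers."""
    return [
        ("load", "tmp", addr),   # tmp <- mem[addr]
        ("store", reg, addr),    # mem[addr] <- reg
        ("move", reg, "tmp"),    # reg <- tmp
    ]


def run_helpers(helpers: list[tuple], regs: dict, mem: dict) -> None:
    """Execute helpers using only load, store and move datapaths."""
    for op, reg_name, operand in helpers:
        if op == "load":
            regs[reg_name] = mem[operand]
        elif op == "store":
            mem[operand] = regs[reg_name]
        elif op == "move":
            regs[reg_name] = regs[operand]
        else:
            raise ValueError(f"unknown helper {op!r}")
```

Executing the three helpers produces the same register and memory state as the single SWAP, so no unit needs a dedicated swap datapath.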