Commitment Control Or Register Bypass Patents (Class 712/218)
  • Patent number: 6883086
    Abstract: When fetching a load value for a load instruction results in a cache miss, the load instruction and any load-dependent instructions may be speculatively executed with a predicted load value and retired before the missing cache line is retrieved and the actual load value is determined. By storing the predicted load value in a table, when the actual load value is determined it may be compared with the predicted load value from the table. If the predicted load value was incorrect, the load and load-dependent instructions may be re-executed with the actual load value. A compiler may determine which load instructions are highly predictable and likely to result in cache misses, and designate only those load instructions for speculative execution.
    Type: Grant
    Filed: March 6, 2002
    Date of Patent: April 19, 2005
    Assignee: Intel Corporation
    Inventor: James D. Dundas
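
The compare-and-replay step in the abstract above can be sketched roughly as follows. This is only an illustration, not the patented design: the class name `LoadValueTable`, its methods, and the use of a string tag to identify an in-flight load are all assumptions made for the example.

```python
class LoadValueTable:
    """Hypothetical table holding predicted values for loads that missed the cache."""

    def __init__(self):
        self.predicted = {}  # load tag -> predicted value (assumed layout)

    def speculate(self, tag, predicted_value):
        # Record the prediction so it can be checked once the miss is serviced.
        self.predicted[tag] = predicted_value
        return predicted_value

    def resolve(self, tag, actual_value):
        # Compare the actual value with the stored prediction.
        # Returns True when the load and its dependents must be re-executed.
        return self.predicted.pop(tag) != actual_value
```

A correct prediction needs no further work; a mismatch signals that the load and its load-dependent instructions have to be re-executed with the actual value.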
  • Patent number: 6880067
    Abstract: Techniques are provided for retiring instructions that typically complete early as compared to most instructions. In an embodiment, all instructions are processed normally until the instruction queue is full. At that time, the system is frozen, e.g., all units stop processing instructions. For each instruction in the instruction queue, if the instruction meets the criteria for early retirement, then the instruction is terminated and the system is updated to reflect that the instruction has been terminated. The system is then unfrozen, and all units resume their functions.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: April 12, 2005
    Assignee: Hewlett-Packard Development Company L.P.
    Inventor: Carl D. Burch
  • Patent number: 6877086
    Abstract: Rescheduling multiple micro-operations in a processor using a replay queue. The processor comprises a replay queue to receive a plurality of instructions and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and dispatches each instruction to the execution unit. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.
    Type: Grant
    Filed: November 2, 2000
    Date of Patent: April 5, 2005
    Assignee: Intel Corporation
    Inventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
  • Patent number: 6870789
    Abstract: In the Retirement Payload Array (RPA) of a microprocessor, the pointer advance signal “ADVANCE POINTER” from the Instruction Retirement Logic (IRL) of the Instruction Scheduling Unit (ISU) is utilized to provide conditional read RPA signals. Consequently, according to the invention, a read of the RPA is completed only if it is determined that the read word line being read in the current cycle is not the same read word line that was read in the previous cycle.
    Type: Grant
    Filed: February 8, 2002
    Date of Patent: March 22, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Arjun P. Chandran, Gregg K. Tsujimoto, Anup S. Mehta
  • Patent number: 6865665
    Abstract: There is disclosed a data processor for stalling the instruction execution pipeline after a cache miss and re-loading the correct cache data into any bypass devices before restarting the pipeline.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: March 8, 2005
    Assignee: STMicroelectronics, Inc.
    Inventor: Anthony X. Jarvis
  • Patent number: 6862677
    Abstract: An instruction execution device and method are disclosed for reducing register write traffic within a processor. The instruction execution device includes an instruction pipeline for producing a result for an instruction, a register file that includes at least one write port for storing the result, a bypass circuit for allowing access to the result, a means for indicating whether the result is used by only one other instruction, and a register file control for preventing the result from being stored in the write port when the result has been accessed via the bypass circuit and is used by only one other instruction.
    Type: Grant
    Filed: February 16, 2000
    Date of Patent: March 1, 2005
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Paul Stravers
  • Patent number: 6859872
    Abstract: A computation core includes a computation block, an addressing block and an instruction sequencer, which are coupled to a memory through a memory interface. The computation block includes a register file and dual execution units. The execution units include features for enhanced performance in executing digital signal computations. The computation core is configured for executing digital signal processor instructions and microcontroller instructions, while achieving efficient digital signal processor computation and high code density. A finite impulse response filter algorithm achieves high performance on the dual execution units.
    Type: Grant
    Filed: May 12, 2000
    Date of Patent: February 22, 2005
    Assignee: Analog Devices, Inc.
    Inventors: William C. Anderson, John Edmondson, Jose Fridman, Marc Hoffman
  • Patent number: 6857060
    Abstract: According to one embodiment, a method features operations for executing instructions in an instruction window. The first and second instructions are examined to determine their sources and destinations. The written on bit of the first instruction is set to a “written on” state if the destinations of the first and second instructions are the same while a used bit of the first instruction is set to a “used” state if the source of the second instruction is the destination of the first instruction. Thereafter, a priority of the first instruction can be determined from the written on and used bits.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: February 15, 2005
    Assignee: Intel Corporation
    Inventors: George Elias, Adi Yoaz, Ronny Ronen
  • Patent number: 6851044
    Abstract: An instruction execution device and method are disclosed for reducing register write traffic within a processor with exception routines. The instruction execution device includes an instruction pipeline for producing a result for an instruction, wherein the exception routines may interrupt the instruction pipeline at random intervals, a register file that includes at least one write port for storing the result, a bypass circuit for allowing access to the result, a means for indicating whether the result is used by only one other instruction, a register file control for preventing the result from being stored in the write port when the result has been accessed via the bypass circuit and is used by only one other instruction, a First in First out (FIFO) buffer for storing the result and a FIFO control for writing the contents of the FIFO buffer to the register file when an exception occurs.
    Type: Grant
    Filed: February 16, 2000
    Date of Patent: February 1, 2005
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Paul Stravers
  • Patent number: 6842851
    Abstract: A system and method for reading register contents from a computational pipeline having a plurality of computational units. The system includes a readback bus and a read control unit. The readback bus has a plurality of logic units coupled in a series. Each logic unit couples to a corresponding one of the computational units. The read control unit couples to each of the computational units through a corresponding load line, and is configured to assert a load signal on one of the load lines in response to a register read request. Each of the computational units is configured to transmit a data value from a selected register onto the readback bus in response to detecting an assertion of the load signal on its corresponding load line.
    Type: Grant
    Filed: February 28, 2002
    Date of Patent: January 11, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Wayne Eric Burk, Ewa M. Kubalska, Brian D. Emberling
  • Patent number: 6839831
    Abstract: A data processing apparatus includes first (78) and second (80) functional unit groups, each including a plurality of functional units and a register file (76) comprising a plurality of registers. A comparator (181) receives the operand register number of a current instruction for a functional unit in the first functional unit group, and the destination register number of an immediately preceding instruction for the second functional unit group. A register file bypass multiplexer (174) selects the data from the register corresponding to the operand number of the current instruction on no match and selects the output of the second functional unit group (hotpath 172) if the comparator indicates a match. The first functional unit utilizes the output of the second functional unit group without waiting for the result to be stored in the register file.
    Type: Grant
    Filed: December 8, 2000
    Date of Patent: January 4, 2005
    Assignee: Texas Instruments Incorporated
    Inventors: Keith Balmer, Richard D. Simpson, Iain Robertson, John Keay
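
The comparator-and-multiplexer selection in the abstract above can be sketched as a single function. This is a minimal sketch under stated assumptions: the function name, the parameter names, and the use of `None` for "the preceding instruction writes no register" are all hypothetical, and a Python list stands in for the register file.

```python
def select_operand(register_file, operand_reg, prev_dest_reg, hotpath_value):
    """Return the operand value for the current instruction.

    operand_reg:   operand register number of the current instruction.
    prev_dest_reg: destination register number of the immediately preceding
                   instruction in the other functional unit group, or None.
    hotpath_value: that instruction's result, not yet written to the register file.
    """
    if prev_dest_reg is not None and operand_reg == prev_dest_reg:
        return hotpath_value            # comparator match: take the hotpath
    return register_file[operand_reg]   # no match: read the register file
```

For example, with `register_file = [0, 10, 20, 30]`, a read of register 2 returns the hotpath value when the preceding instruction also targets register 2, and returns 20 otherwise.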
  • Publication number: 20040260912
    Abstract: A processor (PR2) has a functional unit (FU21) connected to series coupled temporary registers (TR21-TR23) and to a register file (RF2), which has an output connected to an input (IP1) of the functional unit via multiplexors (MUX1-MUX4). Read addresses (B, E, A) and write addresses (A, D, G) are sent to the register file and to a control means. The latter includes registers (REG1-REG4) and comparators (C1-C4) which control the multiplexors (MUX1-MUX4). On a read address (B) a value (V(B)) is sent to the functional unit (FU21) after the register file access time has lapsed. The functional unit performs an operation and the result (V(A)) is clocked through the temporary registers (TR1-TR3) and is sent to the register file (RF2). A later read address (A) coincides in the comparator (C2) with a write address (A) from the register (REG2), the multiplexer (MUX2) is switched and the result (V(A)) is fetched from the temporary register (TR1).
    Type: Application
    Filed: April 20, 2004
    Publication date: December 23, 2004
    Inventor: Nils Ola Linnermark
  • Publication number: 20040255099
    Abstract: A data processor (200) has a pipelined execution unit (120). Whether a first instruction is one of a class of instructions wherein as a result of execution of the first instruction the contents of an operand register will be stored in a destination register is determined. A second instruction that references the destination register is received before a completion of execution of the first instruction. The second instruction is executed using the contents of the operand register without stalling the second instruction in the pipelined execution unit (120).
    Type: Application
    Filed: June 12, 2003
    Publication date: December 16, 2004
    Inventor: Stephen Charles Kromer
  • Publication number: 20040250050
    Abstract: A method and apparatus for controlling program instruction completion timing for processor verification provides, alternatively or in combination, an improved simulation technique and/or processor in which resource allocation as well as other performance-specific scenarios can be stressed over typical operating conditions by controlling the completion timing of one or more program instructions. A high-level program controlling simulation of a VHDL model of a processor can simulate extension of the completion time of a predetermined instruction in order to hold the instruction in the execution and completion queues, placing an effective hold on the resources allocated for the instruction. Alternatively, the VHDL model may include logic for controlling completion timing of the program instruction by using a processor clock cycle counter. Verification testing of actual processor hardware may be facilitated by including the counter and associated control logic within production or prototype processors.
    Type: Application
    Filed: June 9, 2003
    Publication date: December 9, 2004
    Applicant: International Business Machines Corporation
    Inventors: John Martin Ludden, Darin Marcus Greene, David A. Schroter, Wallace Keith Sharp
  • Publication number: 20040243791
    Abstract: A method for processing registers in an out-of-order processor. A predicate in an instruction is predicted. An architecturally correct value is then computed using a read-modify-write operation. The predicted value is compared to the architecturally correct value. The instruction with an incorrectly-predicted predicate is flushed from the pipeline if the predicted value and the architecturally correct value are different.
    Type: Application
    Filed: July 8, 2004
    Publication date: December 2, 2004
    Inventors: Edward T. Grochowski, Jared W. Stark
  • Patent number: 6826677
    Abstract: A processor, such as a VLIW processor capable of software-pipeline execution, includes an instruction issuing unit 10 for issuing, in a predetermined sequence, instructions to be executed. The sequence of instructions includes preselected value-producing instructions which, when executed, produce respective values. Instruction executing units 14, 16, 18 execute the issued instructions. A register file 20 has a set of registers, for storing values produced by the executed instructions. In operation the processor assigns the values produced by the value-producing instructions respective sequence numbers according to the order of issuance of their respective value-producing instructions. Each produced value is allocated one of the registers, for storing that produced value, in dependence upon the sequence number assigned to that value. The registers may be renamed each time a value-producing instruction is issued.
    Type: Grant
    Filed: February 6, 2001
    Date of Patent: November 30, 2004
    Assignee: PTS Corporation
    Inventor: Nigel Peter Topham
  • Patent number: 6826678
    Abstract: A method, processor architecture, computer program product, and data processing system for determining when an instruction in a pipelined processor should be completed is provided. As each instruction is issued to an execution unit, an entry for that instruction is placed within a “finish pipe,” which consists of a series of consecutively numbered stages. Each clock cycle, the entries in the finish pipe advance one stage. When an entry has reached the stage corresponding to the latency of its associated execution unit, it becomes mature. Each clock cycle, the finish pipe is scanned to find the entry having the highest-numbered stage of any entry in the finish pipe. If that entry is mature, it is removed from the finish pipe and the instruction associated with that entry is allowed to complete. If not, the entry simply advances along with the other entries and the pipe is rescanned in the next cycle.
    Type: Grant
    Filed: April 11, 2002
    Date of Patent: November 30, 2004
    Assignee: International Business Machines Corporation
    Inventors: Hung Qui Le, Dung Quoc Nguyen
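
The finish pipe described in the abstract above can be modelled with a small simulation. This is a sketch, not the patented logic: the class and method names are hypothetical, entries are assumed to enter at stage 0, and each simulated cycle is assumed to advance the entries first and scan second.

```python
class FinishPipe:
    """Toy model of a finish pipe; each entry is [name, stage, latency]."""

    def __init__(self):
        self.entries = []

    def issue(self, name, latency):
        # A newly issued instruction enters at stage 0 (assumption).
        self.entries.append([name, 0, latency])

    def tick(self):
        # Every entry advances one stage per clock cycle.
        for entry in self.entries:
            entry[1] += 1
        if not self.entries:
            return None
        # Scan for the entry in the highest-numbered stage.
        oldest = max(self.entries, key=lambda entry: entry[1])
        # It completes only once it has reached its unit's latency (mature).
        if oldest[1] >= oldest[2]:
            self.entries.remove(oldest)
            return oldest[0]
        return None
```

In this model a short-latency instruction issued behind a long-latency one cannot complete first, because only the entry in the highest-numbered stage is ever considered for completion.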
  • Publication number: 20040225838
    Abstract: The present invention relates to a data processing apparatus and method for accessing items of architectural state. The data processing apparatus comprises a plurality of registers operable to store items of architectural state, and a plurality of functional units, each functional unit being operable to perform a processing operation with reference to one or more of those items of architectural state. At least one of the functional units has a register cache associated therewith having one or more cache entries, each cache entry being operable to store a copy of one of the items of architectural state, and a register identifier identifying the register containing that item of architectural state. Control logic is operable to determine a subset of the items of architectural state to be copied in the register cache in dependence on the processing operation of the functional unit with which the register cache is associated. This assists in alleviating demands on access ports associated with the registers.
    Type: Application
    Filed: May 9, 2003
    Publication date: November 11, 2004
    Inventor: Stuart David Biles
  • Publication number: 20040221138
    Abstract: A distributed system is provided for apportioning an instruction stream into multiple segments for processing in multiple parallel processing units, and for merging the processed segments into a single processed instruction stream having the same sequential relative order as the original instruction stream. Tags may be attached to each segment after apportioning to indicate the order in which the various segments are to be merged. In one embodiment, the end of each segment includes a tag indicating the unit to which the next instruction in the original instruction sequence is directed.
    Type: Application
    Filed: November 13, 2001
    Publication date: November 4, 2004
    Inventors: Roni Rosner, Micha G. Moffie, Abraham Mendelson
  • Publication number: 20040221139
    Abstract: A microprocessor may include one or more functional units configured to execute operations, a scheduler configured to issue operations to the functional units for execution, and at least one replay detection unit. The scheduler may be configured to maintain state information for each operation. Such state information may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler should be replayed. If an instance of that operation is currently being executed by one of the functional units when the operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems may include such a microprocessor.
    Type: Application
    Filed: May 2, 2003
    Publication date: November 4, 2004
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael A. Filippo, James K. Pickett, Benjamin T. Sander
  • Publication number: 20040215937
    Abstract: A method and multithreaded processor for dynamically sharing an interrupt handling logic unit among multiple threads. A first and second state unit may be configured to determine whether an interrupt was generated from a first thread and a second thread, respectively. An arbiter may be coupled to the first and second state units. A shared interrupt handling logic unit may be coupled to the arbiter where the shared interrupt handling logic unit may be configured to handle interrupts generated from the first and second threads. Upon a state unit, e.g., first state unit, second state unit, determining an interrupt was generated from a particular thread, the state unit may request control of the interrupt handling logic unit from the arbiter. The arbiter may grant the request from the state unit if the interrupt handling logic unit is available to handle the interrupt detected.
    Type: Application
    Filed: April 23, 2003
    Publication date: October 28, 2004
    Applicant: International Business Machines Corporation
    Inventors: William E. Burky, Susan E. Eisen, Hung Q. Le, David A. Schroter
  • Publication number: 20040215938
    Abstract: A methodology to process flushes in an SMT processor with a dynamically shared group completion table (GCT) and a Flush table comprises identification of incoming flush sources by thread. This uses the forward link array by flush source to determine the next instruction group following the group indicated by the flush source (for example, for mispredicts and load/store flush-next type flushes). Presentation of flush completion table entry numbers or instruction group identifiers (Gtags) to the flush table for computation of oldest flushed group tag corresponding to each thread. The flush selection cycle wherein the flush table outputs are compared against saved versions of all the flush Gtags presented to determine which flush source matches the oldest group output from the flush table. The flush source information is used with the selected oldest Gtag to determine the appropriate additional flushing action to take during the flush cycle.
    Type: Application
    Filed: April 23, 2003
    Publication date: October 28, 2004
    Applicant: International Business Machines Corporation
    Inventors: William E. Burky, Hung Q. Le, Dung Q. Nguyen, David A. Schroter
  • Patent number: 6810474
    Abstract: In a conventional information processor that performs speculative execution of a following instruction having a data dependency, since an arithmetic and logical unit is used in performing the speculative execution and the same ALU is used again when the prediction is wrong, the frequency of use of the ALU increases. To prevent this, a history ALU for outputting a past execution result of an instruction, as it is, as an execution result of the instruction and an instruction issue circuit for issuing an instruction whose operand is the same as a past value to the history ALU are provided with an intention of omitting the actual speculative execution. A Guard cache provided in the history cache stores addresses of instructions that give low prediction accuracy, whereby any instruction whose address has been registered in the Guard cache is prevented from being registered again in the history cache.
    Type: Grant
    Filed: August 29, 2000
    Date of Patent: October 26, 2004
    Assignee: Hitachi, Ltd.
    Inventor: Yoshio Miki
  • Publication number: 20040210743
    Abstract: An SMT system has a dynamically shared GCT. Performance for the SMT is improved by configuring the GCT to allow an instruction group from each thread to complete simultaneously. The GCT has a read port for each thread corresponding to the completion table instruction/address array for simultaneous updating on completion. The forward link array also has a read port for each thread to find the next instruction group for each thread upon completion. The backward link array has a backward link write port for each thread in order to update the backward links for each thread simultaneously. The GCT has independent pointer management for each thread. Each of the threads has simultaneous commit of their renamed result registers and simultaneous updating of outstanding load and store tag usage.
    Type: Application
    Filed: April 21, 2003
    Publication date: October 21, 2004
    Applicant: International Business Machines Corporation
    Inventors: William E. Burky, Peter J. Klim, Hung Q. Le
  • Publication number: 20040210742
    Abstract: An SMT system has a single thread mode and an SMT mode. Instructions are alternately selected from two threads every clock cycle and loaded into the IFAR in a three cycle pipeline of the IFU. If a branch predicted taken instruction is detected in the branch prediction circuit in stage three of the pipeline, then in the single thread mode a calculated address from the branch prediction circuit is loaded into the IFAR on the next clock cycle. If the instruction in the branch prediction circuit detects a branch predicted taken in the SMT mode, then the selected instruction address is loaded into the IFAR on the first clock cycle following branch predicted taken detection. The calculated target address is fed back and loaded into the IFAR in the second clock cycle following branch predicted taken detection. Feedback delay effectively switches the pipeline from three stages to four stages.
    Type: Application
    Filed: April 21, 2003
    Publication date: October 21, 2004
    Applicant: International Business Machines Corporation
    Inventors: David Stephen Levitan, Balaram Sinharoy
  • Publication number: 20040210744
    Abstract: In an embodiment, a pipelined digital signal processor (DSP) may generate a valid bit in an alignment stage. The valid bit may be qualified in a decode stage in response to receiving a stall signal and/or a kill signal. The valid bit output from the decode stage may be stored in a latch in an address calculation (AC) stage. The valid bit may be held in the latch by a latch enable circuit in response to receiving a stall signal. The valid bit output from the latch may be qualified in the AC stage. The circuit in the AC stage including the latch, the latch enable circuit, and a valid bit qualifier may be repeated in downstream pipeline stages, for example, the execution stages.
    Type: Application
    Filed: May 17, 2004
    Publication date: October 21, 2004
    Applicants: Intel Corporation, a Delaware corporation, Analog Devices, Inc., a Delaware corporation
    Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp, Thomas Tomazin
  • Patent number: 6804815
    Abstract: A sequence control mechanism enables out-of-order processing of contexts by processors of a symmetric multiprocessor system having a plurality of processors arrayed as a processing engine. The processors of the engine are preferably arrayed as a plurality of rows or clusters embedded between input and output buffers, wherein each cluster of processors is configured to process contexts in a first in, first out (FIFO) synchronization order. However, the sequence control mechanism allows out-of-order context processing among the clusters of processors, while selectively enforcing FIFO synchronization ordering among those clusters on an as needed basis, i.e., for certain contexts. As a result, the control mechanism reduces undesired processing delays among those processors.
    Type: Grant
    Filed: September 18, 2000
    Date of Patent: October 12, 2004
    Assignee: Cisco Technology, Inc.
    Inventors: Darren Kerr, Jeffery B. Scott, John William Marshall, Kenneth H. Potter, Scott Nellenbach
  • Publication number: 20040199749
    Abstract: A method for limiting a number of register file read ports used to process a store instruction includes decoding the store instruction, where the decoding generates a decoded store instruction, identifying a store data register and source operand registers included in the decoded store instruction, and appending a set of attribute fields to the decoded store instruction. Further, dependent on a value of at least one of the attribute fields, source values corresponding to the source operand registers are read using the register file read ports at a time that the store instruction is issued, and a store data value corresponding to the store data register is read using one of the register file read ports at a time that the store instruction is committed.
    Type: Application
    Filed: April 3, 2003
    Publication date: October 7, 2004
    Inventors: Robert Golla, Chandra M. R. Thimmannagari, Sorin Iacobovici, Rabin A. Sugumar, Robert Nuckolls
  • Publication number: 20040193846
    Abstract: A method and apparatus for a microprocessor with multiple memory read opportunity ports in a pipeline is disclosed. In one embodiment, a register file may have only one read port. When a statistically rare instruction requires two operands to be read from the register file, a spacer may be introduced into the pipeline, permitting the use of a second opportunity port to read its second operand from the register file at a later time. The spacer may be a nop, or it may be another instruction that receives its operands from a bypass path. In other embodiments, a register alias table may have only one read port, and a second opportunity port may be used to read a second physical register address.
    Type: Application
    Filed: March 28, 2003
    Publication date: September 30, 2004
    Inventor: Eric A. Sprangle
  • Publication number: 20040186983
    Abstract: A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results.
    Type: Application
    Filed: April 2, 2004
    Publication date: September 23, 2004
    Applicant: Seiko Epson Corporation
    Inventors: Johannes Wang, Sanjiv Garg, Trevor Deosaran
  • Patent number: 6792446
    Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a plurality of replay queues or replay queue sections coupled to the checker for temporarily storing one or more instructions for replay. In one embodiment, thread-specific replay queue sections may each be used to store a long latency instruction for each thread until the long latency instruction is ready to be executed (e.g., data for a load instruction has been retrieved from external memory). By storing the long latency instruction and its dependents in a replay queue section for one thread which has stalled, execution resources are made available for improving the speed of execution of other threads which have not stalled.
    Type: Grant
    Filed: February 1, 2002
    Date of Patent: September 14, 2004
    Assignee: Intel Corporation
    Inventors: Amit A. Merchant, Darrell D. Buggs, David J. Sager
  • Patent number: 6775761
    Abstract: A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results.
    Type: Grant
    Filed: May 22, 2002
    Date of Patent: August 10, 2004
    Assignee: Seiko Epson Corporation
    Inventors: Johannes Wang, Sanjiv Garg, Trevor Deosaran
  • Publication number: 20040153631
    Abstract: A method for handling instructions that use non-windowed registers in an out-of-order microprocessor with windowed registers is provided. When an instruction with a non-windowed destination register is detected, the computed result of the instruction is stored in a temporary storage register instead of the non-windowed register designated as the instruction's destination. When the instruction is ready for retirement, the result is transferred from the temporary storage register into the non-windowed register designated as the instruction's destination. When another instruction's source register is a non-windowed register, the microprocessor determines whether the instruction should use data from the designated non-windowed register or from a temporary storage register, to prevent the other instruction from using incorrect data.
    Type: Application
    Filed: January 30, 2003
    Publication date: August 5, 2004
    Inventors: Chandra M. R. Thimmannagari, Sorin Iacobovici, Rabin A. Sugumar
  • Patent number: 6772318
    Abstract: There is disclosed a bypass control method in which data can be set on a source register of an instruction to be executed on an instruction bus in a short time. A bypass control apparatus of the present invention includes a plurality of comparators for comparing the outputs of flip-flops for transferring a register number of a destination register on the instruction bus with each other. By utilizing a comparison result of a comparator for comparing the comparison results of these comparators with the register number of the source register on the instruction bus, a bypass path of data inputted to the source register of the instruction to be executed can be set in a short time. When a plurality of agreements are detected, the bypass path is set on the basis of the output of the flip-flop on a first stage side, so that it is possible to avoid the disadvantage of inputting old data to the source register by mistake.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: August 3, 2004
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Tatsuo Teruyama
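The priority rule in the abstract above (when several stages match, use the one on the first-stage side) can be illustrated with a small software model. This is only a sketch under stated assumptions: the name `select_bypass_stage` is hypothetical, index 0 is assumed to be the first-stage flip-flop holding the most recently issued destination register, and `None` marks a stage with no destination register.

```python
from typing import Optional, Sequence


def select_bypass_stage(
    source_reg: int, stage_dest_regs: Sequence[Optional[int]]
) -> Optional[int]:
    """Pick the pipeline stage whose result should be bypassed to source_reg.

    stage_dest_regs[0] is assumed to be the first-stage side (newest data).
    Returns the index of the first matching stage, or None if no stage
    matches and the value must come from the register file instead.
    """
    for stage, dest in enumerate(stage_dest_regs):
        if dest is not None and dest == source_reg:
            # Earliest match wins, so older in-flight values of the
            # same register are never selected by mistake.
            return stage
    return None
```

Hardware would evaluate all comparisons in parallel and resolve them with a priority encoder; the loop here only models the resulting choice.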
  • Publication number: 20040148494
    Abstract: One embodiment of the present invention provides a system that facilitates eliminating register usage for temporary operands involved in pipeline bypassing operations. During operation, the system receives a series of instructions at a processor, wherein the processor recognizes that the series of instructions can make use of a pipeline bypassing mechanism. During the pipeline bypassing operation, the processor examines an indicator associated with the series of instructions. If the indicator is set, the processor does not store the temporary operand used by the series of instructions into the register file of the processor, because the temporary operand will not be used by subsequent instructions.
    Type: Application
    Filed: January 29, 2003
    Publication date: July 29, 2004
    Inventor: Jan Civlin
  • Patent number: 6766440
    Abstract: A digital system is provided that includes a central processing unit (CPU) that has an instruction execution pipeline with a plurality of functional units for executing instructions in a sequence of CPU cycles. The execution units are clustered into two or more groups. Cross-path circuitry is provided such that results from any execution unit in one execution unit cluster can be supplied to execution units in another cluster. A cross-path stall is conditionally inserted to stall all of the functional groups when one execution unit cluster requires an operand from another cluster on a given CPU cycle and the execution unit that is producing that operand completes the computation of that operand on an immediately preceding CPU cycle.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: July 20, 2004
    Assignee: Texas Instruments Incorporated
    Inventors: Donald E. Steiss, David Hoyle
  • Publication number: 20040139301
    Abstract: An apparatus for killing an instruction after it has already been loaded into an instruction queue of a microprocessor is disclosed. The apparatus includes control logic that detects a condition in which the instruction must not be executed, such as a branch instruction misprediction; however, the control logic determines the condition too late to prevent the instruction from being loaded into the instruction queue. The control logic generates a kill signal indicating the instruction must not be executed. A kill queue receives the kill signal and stores its value. The kill queue maintains its entries in parallel with the instruction queue entries so that when the instruction queue subsequently outputs the instruction, the kill queue also outputs the value of the kill signal associated with the instruction. If the kill signal value output from the kill queue is true, then the microprocessor invalidates the instruction and does not execute it.
    Type: Application
    Filed: July 31, 2003
    Publication date: July 15, 2004
    Applicant: IP-First, LLC.
    Inventor: Thomas McDonald
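The parallel kill queue described in the abstract above can be modeled in a few lines. This is a hedged sketch, not the disclosed circuit: the class name `InstructionQueueWithKill` and its methods are hypothetical, and the late kill is assumed to target the most recently loaded entry.

```python
from collections import deque
from typing import Deque, Optional


class InstructionQueueWithKill:
    """Instruction queue with a kill queue kept in lockstep beside it."""

    def __init__(self) -> None:
        self._instructions: Deque[str] = deque()
        self._kill_bits: Deque[bool] = deque()

    def load(self, instruction: str) -> None:
        # Every load adds one entry to both queues so they stay aligned.
        self._instructions.append(instruction)
        self._kill_bits.append(False)

    def kill_newest(self) -> None:
        # The kill condition arrived too late to stop the load, so the
        # kill signal is recorded next to the already-queued entry.
        if self._kill_bits:
            self._kill_bits[-1] = True

    def issue(self) -> Optional[str]:
        # Both queues output together; a true kill bit invalidates the
        # instruction. Raises IndexError if the queue is empty.
        instruction = self._instructions.popleft()
        killed = self._kill_bits.popleft()
        return None if killed else instruction
```

Returning `None` from `issue` stands in for the microprocessor invalidating the instruction rather than executing it.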
  • Publication number: 20040139299
    Abstract: A method and mechanism for improving Instruction Level Parallelism (ILP) of a program and eventually improving Instructions per cycle (IPC) allows dependent instructions to be grouped and dispatched simultaneously by forwarding the General Register (GR) data of the oldest instruction, or source instruction, to the other dependent instructions. One source instruction is of a load type that loads a GR value into a GR. The dependent instructions will then select the forwarded data to perform their computation. The dependent instructions use the same GR read address as the source instruction. Another source instruction of a load type loads memory data into a GR. The loaded memory data is forwarded or replicated on the memory read bus of the other dependent instructions. The mechanism allows Address Generator Output to be forwarded to the other dependent instructions when the source instruction is a load type loading a memory address into a GR.
    Type: Application
    Filed: January 14, 2003
    Publication date: July 15, 2004
    Applicant: International Business Machines Corporation
    Inventors: Fadi Busaba, Klaus J. Getzlaff, Bruce C. Giamei, Christopher A. Krygowski, Timothy J. Slegel
  • Publication number: 20040139300
    Abstract: A method and mechanism for improving Instruction Level Parallelism (ILP) of a program and eventually improving Instructions per cycle (IPC) allows dependent instructions to be grouped and dispatched simultaneously by forwarding the result of the oldest instruction, or source instruction, to the other dependent instructions' result buses or registers, thus bypassing the dependent instruction execution stage. The source instruction performs an arithmetic, logical, or rotate/shift type operation on operands and updates a GR with the computed result. A load type dependent or target instruction loading a GR value into a GR will then select the forwarded result of the source instruction to its write bus for the GR update. Another target instruction of a store type stores GR data to memory. The result of the source instruction is also used by the dependent instruction to update storage. The mechanism also allows the dependent instruction to be a load type that loads GR data into a Control Register (CR).
    Type: Application
    Filed: January 14, 2003
    Publication date: July 15, 2004
    Applicant: International Business Machines Corporation
    Inventors: Fadi Busaba, Klaus J. Getzlaff, Bruce C. Giamei, Christopher A. Krygowski, Timothy J. Slegel
  • Publication number: 20040128484
    Abstract: Embodiments of the present invention provide a method and system for data exchange between execution units and arrays in a processing unit. Embodiments of the invention may include an execution unit, an out-of-order unit and a delayed write-back buffer coupled between the out-of-order unit and the execution unit. The delayed write-back buffer may dispatch data required by the execution unit for processing.
    Type: Application
    Filed: December 30, 2002
    Publication date: July 1, 2004
    Inventors: Zeev Sperber, Eitan Rosen
  • Publication number: 20040128485
    Abstract: According to one embodiment, a microprocessor is described. The microprocessor includes a scalar processor and a vector processor. The vector processor fuses multiple instructions that are to be processed. The fused instructions enable a single source register to simultaneously transmit its data contents to multiple math units.
    Type: Application
    Filed: December 27, 2002
    Publication date: July 1, 2004
    Inventor: Scott R. Nelson
  • Publication number: 20040128482
    Abstract: A method and apparatus for eliminating register reads and writes in a scheduled instruction cache. More particularly, the present invention pertains to a method of increasing overall processor performance by implementing a novel pre-cache scheduling operation to eliminate superfluous register reads and writes via a bypass network.
    Type: Application
    Filed: December 26, 2002
    Publication date: July 1, 2004
    Inventor: Gad S. Sheaffer
  • Patent number: 6754807
    Abstract: An apparatus for managing vertical dependencies between instructions in first and second instruction pipelines includes: 1) identifier (ID) reclaim circuitry for determining a sequential set of retired identifiers associated with retired instructions and for determining a next retire ID sequentially following the set; 2) first ID generation circuitry for sequentially assigning identifiers to destination registers associated with instructions entering the pipelines; 3) second ID generation circuitry associated with the first pipeline for identifying a first dependent source register associated with a first dependent source operand of a first instruction entering the first pipeline and assigning an ID of the first register to the first operand; and 4) instruction scheduling circuitry for comparing the first operand ID of the first instruction with the next retire ID and scheduling the first instruction for execution if the first operand ID is less than or equal to the next retire ID.
    Type: Grant
    Filed: August 31, 2000
    Date of Patent: June 22, 2004
    Assignee: STMicroelectronics, Inc.
    Inventors: Sivagnanam Parthasarathy, Alexander Driker
  • Patent number: 6754808
    Abstract: In an embodiment, a pipelined digital signal processor (DSP) may generate a valid bit in an alignment stage. The valid bit may be qualified in a decode stage in response to receiving a stall signal and/or a kill signal. The valid bit output from the decode stage may be stored in a latch in an address calculation (AC) stage. The valid bit may be held in the latch by a latch enable circuit in response to receiving a stall signal. The valid bit output from the latch may be qualified in the AC stage. The circuit in the AC stage including the latch, the latch enable circuit, and a valid bit qualifier may be repeated in downstream pipeline stages, for example, the execution stages.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: June 22, 2004
    Assignees: Intel Corporation, Analog Devices, Inc.
    Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp, Thomas Tomazin
  • Publication number: 20040117597
    Abstract: Effective remote register file access time can be reduced in a clustered VLIW processor using partitioned register files and some additional hardware for pre-fetching remote registers. An instruction pre-fetcher and an instruction pre-decoder are used for pre-fetching and partially decoding instructions in order to pre-fetch the remote registers required for executing VLIWs at run-time, thus substantially reducing the number of inter-cluster copy instructions. The instructions (VLIWs) are scheduled taking into account the various hardware constraints such as limited inter-cluster communication bandwidth, inter-cluster communication delay, etc.
    Type: Application
    Filed: December 16, 2002
    Publication date: June 17, 2004
    Applicant: International Business Machines Corporation
    Inventor: Krishnan K. Kailas
  • Patent number: 6742111
    Abstract: A data processing system having a distributed reservation station is provided which stores basic blocks of code in the form of microprocessor instructions. The present invention is capable of distributing basic blocks of code to the various distributed reservation stations. Due to the smaller number of entries in the distributed reservation stations, the look up time required to find a particular instruction is much less than in a centralized reservation station. Additional instruction level parallelism is achieved by maintaining single basic blocks of code in the distributed reservation stations. With a distributed reservation station, an independent scheduler can be used for each one of the distributed reservation stations. When an instruction is ready for execution, the scheduler will remove that instruction from the distributed reservation station and queue it for immediate execution at the particular execution unit.
    Type: Grant
    Filed: August 31, 1998
    Date of Patent: May 25, 2004
    Assignee: STMicroelectronics, Inc.
    Inventor: Naresh H. Soni
  • Patent number: 6742108
    Abstract: A load is executed speculatively as a dismissible load instruction, which does not take exceptions, and a check instruction, which is in the same format as the dismissible load, that when executed determines whether an exception should be taken on the dismissible load. In this manner, a load may be executed speculatively while ensuring that an exception occurs at the same time it would have occurred had the load been executed non-speculatively.
    Type: Grant
    Filed: September 14, 1998
    Date of Patent: May 25, 2004
    Assignee: Intel Corporation
    Inventor: Kent G. Fielden
  • Publication number: 20040093485
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Application
    Filed: November 5, 2003
    Publication date: May 13, 2004
    Applicant: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6735647
    Abstract: An apparatus and method for reordering data at a data destination is provided. The apparatus and method provide dynamic, adaptive management of receive buffers in a host channel adapter while recovering on the fly the order of data sent over a medium that does not preserve order. In an exemplary embodiment, the method and apparatus reorder data of a data transmission received from a source device. The method and apparatus receive, in a data transfer buffer, a data packet transmitted over a connection associated with the source device and determine if the connection requires reordering of data packets. If the connection requires reordering of data packets, the data packet is transferred from the data transfer buffer to a reorder buffer and a reorder state cache is updated to reflect the transfer of the data packet to the reorder buffer.
    Type: Grant
    Filed: September 5, 2002
    Date of Patent: May 11, 2004
    Assignee: International Business Machines Corporation
    Inventors: William Todd Boyd, Douglas J. Joseph, Renato John Recio
  • Patent number: 6735688
    Abstract: According to one aspect of the invention, a microprocessor is provided that includes an execution core, a first replay mechanism and a second replay mechanism. The execution core performs data speculation in executing a first instruction. The first replay mechanism is used to replay the first instruction via a first replay path if an error of a first type is detected which indicates that the data speculation is erroneous. The second replay mechanism is used to replay the first instruction via a second replay path if an error of a second type is detected which indicates that the data speculation is erroneous.
    Type: Grant
    Filed: February 14, 2000
    Date of Patent: May 11, 2004
    Assignee: Intel Corporation
    Inventors: Michael D. Upton, David J. Sager, Darrell Boggs, Glenn J. Hinton