Commitment Control Or Register Bypass Patents (Class 712/218)

Repair of mis-predicted load values

Patent number: 6883086

Abstract: When fetching a load value for a load instruction results in a cache miss, the load instruction and any load-dependent instructions may be speculatively executed with a predicted load value and retired before the missing cache line is retrieved and the actual load value is determined. By storing the predicted load value in a table, when the actual load value is determined it may be compared with the predicted load value from the table. If the predicted load value was incorrect, the load and load-dependent instructions may be re-executed with the actual load value. A compiler may determine which load instructions are highly predictable and likely to result in cache misses, and designate only those load instructions for speculative execution.

Type: Grant

Filed: March 6, 2002

Date of Patent: April 19, 2005

Assignee: Intel Corporation

Inventor: James D. Dundas
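Below is a minimal Python sketch of the table-based check described in the Dundas abstract above. It assumes each speculated load is identified by a hypothetical `load_id` and models load-dependent instructions as plain functions of the load value. It illustrates the idea and is not Intel's implementation.

```python
from typing import Callable, Dict, List

Dependent = Callable[[int], int]  # a load-dependent instruction, modeled as a function


class PredictedLoadTable:
    """Hypothetical table of the values handed to speculated loads."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}

    def record(self, load_id: str, predicted: int) -> None:
        self.entries[load_id] = predicted

    def mispredicted(self, load_id: str, actual: int) -> bool:
        # Remove the entry and report whether the prediction was wrong.
        return self.entries.pop(load_id) != actual


def speculate(table: PredictedLoadTable, load_id: str, predicted: int,
              dependents: List[Dependent]) -> List[int]:
    # On a cache miss, run the dependents with the predicted value and
    # remember the prediction so it can be checked later.
    table.record(load_id, predicted)
    return [op(predicted) for op in dependents]


def repair(table: PredictedLoadTable, load_id: str, actual: int,
           dependents: List[Dependent], speculative: List[int]) -> List[int]:
    # When the missing line arrives, compare with the stored prediction and
    # re-execute the dependents only if the prediction was wrong.
    if table.mispredicted(load_id, actual):
        return [op(actual) for op in dependents]
    return speculative
```

For example, with dependents `[lambda v: v + 1, lambda v: v * 2]`, speculating 10 yields `[11, 20]`. If the actual value turns out to be 4, `repair` returns `[5, 8]`.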
Retiring instructions that meet the early-retirement criteria to improve computer operation throughput

Patent number: 6880067

Abstract: Techniques are provided for retiring instructions that typically complete early as compared to most instructions. In an embodiment, all instructions are processed normally until the instruction queue is full. At that time, the system is frozen, e.g., all units stop processing instructions. For each instruction in the instruction queue, if the instruction meets the criteria for early retirement, then the instruction is terminated and the system is updated to reflect that the instruction has been terminated. The system is then unfrozen, and all units resume their functions.

Type: Grant

Filed: March 30, 2001

Date of Patent: April 12, 2005

Assignee: Hewlett-Packard Development Company L.P.

Inventor: Carl D. Burch
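The freeze, scan, and unfreeze sequence in the Burch abstract can be sketched as follows. The abstract does not specify the early-retirement criterion, so the sketch takes it as a predicate. The `is_nop` criterion and the `log` list are hypothetical aids for illustration only.

```python
from typing import Callable, List


def early_retire_pass(queue: List[str], capacity: int,
                      meets_criteria: Callable[[str], bool],
                      log: List[str]) -> List[str]:
    """Return the instruction queue after one early-retirement pass."""
    if len(queue) < capacity:
        return queue  # queue not full: processing continues normally
    log.append("freeze")  # all units stop processing instructions
    kept = []
    for instr in queue:
        if meets_criteria(instr):
            log.append(f"terminate {instr}")  # terminate and update system state
        else:
            kept.append(instr)
    log.append("unfreeze")  # units resume their functions
    return kept


def is_nop(instr: str) -> bool:
    # Hypothetical criterion: treat no-ops as instructions that complete early.
    return instr == "nop"
```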
Method and apparatus for rescheduling multiple micro-operations in a processor using a replay queue and a counter

Patent number: 6877086

Abstract: Rescheduling multiple micro-operations in a processor using a replay queue. The processor comprises a replay queue to receive a plurality of instructions and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and dispatches each instruction to the execution unit. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.

Type: Grant

Filed: November 2, 2000

Date of Patent: April 5, 2005

Assignee: Intel Corporation

Inventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
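This is a toy model of the scheduler, checker, and replay-queue loop. It assumes one dispatch per cycle and a hypothetical `ready_cycle` map that says when each instruction's inputs become valid. The `replays` counter only tallies replays for illustration and is not claimed to match the counter named in the title.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_with_replay(program: List[str], ready_cycle: Dict[str, int],
                    max_cycles: int = 100) -> Tuple[List[str], int]:
    replay_queue = deque(program)  # instructions waiting to be (re)scheduled
    completed: List[str] = []
    replays = 0
    for cycle in range(max_cycles):
        if not replay_queue:
            break
        instr = replay_queue.popleft()  # scheduler dispatches speculatively
        # Checker: execution succeeded only if the inputs were valid in time.
        if cycle >= ready_cycle.get(instr, 0):
            completed.append(instr)
        else:
            replay_queue.append(instr)  # report failure back to the replay queue
            replays += 1
    return completed, replays
```

`run_with_replay(["a", "b", "c"], {"b": 3})` completes `a` and `c` first, replays `b` once, and returns `(["a", "c", "b"], 1)`.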
Modified retirement payload array

Patent number: 6870789

Abstract: In the Retirement Payload Array (RPA) of a microprocessor, the pointer advance signal “ADVANCE POINTER” from the Instruction Retirement Logic (IRL) of the Instruction Scheduling Unit (ISU) is utilized to provide conditional read RPA signals. Consequently, according to the invention, a read of the RPA is completed only if it is determined that the read word line being read in the current cycle is not the same read word line that was read in the previous cycle.

Type: Grant

Filed: February 8, 2002

Date of Patent: March 22, 2005

Assignee: Sun Microsystems, Inc.

Inventors: Arjun P. Chandran, Gregg K. Tsujimoto, Anup S. Mehta
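The conditional RPA read can be sketched behaviorally as shown below. The sketch assumes that the retirement pointer wraps around the array and advances by one whenever the hypothetical `advance_pointer` input is asserted. That input stands in for the ADVANCE POINTER signal.

```python
from typing import List, Optional


class RetirementPayloadArray:
    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.pointer = 0                        # current read word line
        self.last_line: Optional[int] = None    # word line read last cycle
        self.output: Optional[str] = None
        self.reads = 0                          # array reads actually performed

    def cycle(self, advance_pointer: bool) -> Optional[str]:
        if advance_pointer:
            self.pointer = (self.pointer + 1) % len(self.words)
        # Only fire the read when the word line differs from last cycle's;
        # otherwise keep driving the previously read word.
        if self.pointer != self.last_line:
            self.output = self.words[self.pointer]
            self.last_line = self.pointer
            self.reads += 1
        return self.output
```

In this model, five cycles in which the pointer advances twice cost three array reads instead of five.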
Processor pipeline cache miss apparatus and method for operation

Patent number: 6865665

Abstract: There is disclosed a data processor for stalling the instruction execution pipeline after a cache miss and re-loading the correct cache data into any bypass devices before restarting the pipeline.

Type: Grant

Filed: December 29, 2000

Date of Patent: March 8, 2005

Assignee: STMicroelectronics, Inc.

Inventor: Anthony X. Jarvis
System and method for eliminating write back to register using dead field indicator

Patent number: 6862677

Abstract: An instruction execution device and method are disclosed for reducing register write traffic within a processor. The instruction execution device includes an instruction pipeline for producing a result for an instruction, a register file that includes at least one write port for storing the result, a bypass circuit for allowing access to the result, a means for indicating whether the result is used by only one other instruction, and a register file control for preventing the result from being stored in the write port when the result has been accessed via the bypass circuit and is used by only one other instruction.

Type: Grant

Filed: February 16, 2000

Date of Patent: March 1, 2005

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Paul Stravers
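The write-suppression rule reduces to a single condition. The sketch below assumes each result carries two fields. The first is a hypothetical `uses` count, which stands in for the dead-field indicator. The second is a `bypassed` flag, set when the consumer read the result from the bypass circuit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:
    dest: str
    value: int
    uses: int        # number of instructions that read this result
    bypassed: bool   # True if the consumer took the value from the bypass


def needs_write_back(r: Result) -> bool:
    # Skip the register-file write when the single consumer already got
    # the value through the bypass: the value is dead after that use.
    return not (r.bypassed and r.uses == 1)


def write_back(results: List[Result], regfile: Dict[str, int]) -> int:
    """Write surviving results and return the number of register writes."""
    writes = 0
    for r in results:
        if needs_write_back(r):
            regfile[r.dest] = r.value
            writes += 1
    return writes
```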
Digital signal processor computation core with pipeline having memory access stages and multiply accumulate stages positioned for efficient operation

Patent number: 6859872

Abstract: A computation core includes a computation block, an addressing block and an instruction sequencer, which are coupled to a memory through a memory interface. The computation block includes a register file and dual execution units. The execution units include features for enhanced performance in executing digital signal computations. The computation core is configured for executing digital signal processor instructions and microcontroller instructions, while achieving efficient digital signal processor computation and high code density. A finite impulse response filter algorithm achieves high performance on the dual execution units.

Type: Grant

Filed: May 12, 2000

Date of Patent: February 22, 2005

Assignee: Analog Devices, Inc.

Inventors: William C. Anderson, John Edmondson, Jose Fridman, Marc Hoffman
System, apparatus and method for prioritizing instructions and eliminating useless instructions

Patent number: 6857060

Abstract: According to one embodiment, a method features operations for executing instructions in an instruction window. First and second instructions are examined to determine their sources and destinations. A written-on bit of the first instruction is set to a “written on” state if the first and second instructions have the same destination, while a used bit of the first instruction is set to a “used” state if a source of the second instruction is the destination of the first instruction. Thereafter, a priority of the first instruction can be determined from the written-on and used bits.

Type: Grant

Filed: March 30, 2001

Date of Patent: February 15, 2005

Assignee: Intel Corporation

Inventors: George Elias, Adi Yoaz, Ronny Ronen
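The sketch below shows one way to derive both bits for every instruction in a window. It assumes instructions are simple `(dest, srcs)` records in program order. The mapping from bits to the labels `high`, `useless`, and `normal` is a hypothetical illustration of how a priority might be derived. It is not the rule from the filing.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str
    srcs: Tuple[str, ...]


def classify(window: List[Instr]) -> List[str]:
    labels = []
    for i, first in enumerate(window):
        written_on = used = False
        for second in window[i + 1:]:
            # Check sources before the destination so that an instruction
            # like r1 = r1 + 1 counts as a use of the old r1 first.
            if first.dest in second.srcs:
                used = True
            if second.dest == first.dest:
                written_on = True  # later uses read the new value; stop here
                break
        if used:
            labels.append("high")      # result is consumed
        elif written_on:
            labels.append("useless")   # overwritten before any use
        else:
            labels.append("normal")    # fate unknown within this window
    return labels
```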
System and method for eliminating write backs with buffer for exception processing

Patent number: 6851044

Abstract: An instruction execution device and method are disclosed for reducing register write traffic within a processor with exception routines. The instruction execution device includes an instruction pipeline for producing a result for an instruction, wherein the exception routines may interrupt the instruction pipeline at random intervals; a register file that includes at least one write port for storing the result; a bypass circuit for allowing access to the result; a means for indicating whether the result is used by only one other instruction; a register file control for preventing the result from being stored in the write port when the result has been accessed via the bypass circuit and is used by only one other instruction; a First In, First Out (FIFO) buffer for storing the result; and a FIFO control for writing the contents of the FIFO buffer to the register file when an exception occurs.

Type: Grant

Filed: February 16, 2000

Date of Patent: February 1, 2005

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Paul Stravers
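Adding an exception buffer to the dead-write scheme might look like the sketch below. It assumes an unbounded FIFO and a hypothetical `on_exception` hook. A real design would bound the buffer depth, which this model does not attempt.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteFilterWithFifo:
    def __init__(self) -> None:
        self.regfile: Dict[str, int] = {}
        self.fifo: Deque[Tuple[str, int]] = deque()  # suppressed writes, oldest first

    def produce(self, dest: str, value: int, bypassed: bool, uses: int) -> None:
        if bypassed and uses == 1:
            # Dead value: skip the register-file write but keep a copy.
            self.fifo.append((dest, value))
        else:
            self.regfile[dest] = value
            # Any older suppressed copy of this register is now stale.
            self.fifo = deque(e for e in self.fifo if e[0] != dest)

    def on_exception(self) -> None:
        # Drain the FIFO into the register file so the exception routine
        # sees a complete register state.
        while self.fifo:
            dest, value = self.fifo.popleft()
            self.regfile[dest] = value
```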
Reading a selected register in a series of computational units forming a processing pipeline upon expiration of a time delay

Patent number: 6842851

Abstract: A system and method for reading register contents from a computational pipeline having a plurality of computational units. The system includes a readback bus and a read control unit. The readback bus has a plurality of logic units coupled in a series. Each logic unit couples to a corresponding one of the computational units. The read control unit couples to each of the computational units through a corresponding load line, and is configured to assert a load signal on one of the load lines in response to a register read request. Each of the computational units is configured to transmit a data value from a selected register onto the readback bus in response to detecting an assertion of the load signal on its corresponding load line.

Type: Grant

Filed: February 28, 2002

Date of Patent: January 11, 2005

Assignee: Sun Microsystems, Inc.

Inventors: Wayne Eric Burk, Ewa M. Kubalska, Brian D. Emberling
Data processing apparatus with register file bypass

Patent number: 6839831

Abstract: A data processing apparatus includes first (78) and second (80) functional unit groups, each includes a plurality of functional units and a register file (76) comprising a plurality of registers. A comparator (181) receives the operand register number of a current instruction for a functional unit in the first functional unit group, and the destination register number of an immediately preceding instruction for the second functional unit group. A register file bypass multiplexer (174) selects the data from the register corresponding to the operand number of the current instruction on no match and selects the output of the second functional unit group (hotpath 172) if the comparator indicates a match. The first functional unit utilizes the output of the second functional unit group without waiting for the result to be stored in the register file.

Type: Grant

Filed: December 8, 2000

Date of Patent: January 4, 2005

Assignee: Texas Instruments Incorporated

Inventors: Keith Balmer, Richard D. Simpson, Iain Robertson, John Keay
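The comparator-plus-multiplexer selection can be modeled as a single function. The register names such as `A1` and `B2` and the `hotpath_value` argument are hypothetical.

```python
from typing import Dict, Optional


def read_operand(regfile: Dict[str, int], operand_reg: str,
                 prev_dest: Optional[str], hotpath_value: int) -> int:
    # Comparator: does this operand name the register the other functional
    # unit group wrote in the immediately preceding instruction?
    if prev_dest is not None and operand_reg == prev_dest:
        return hotpath_value  # mux selects the hotpath; no wait for write-back
    return regfile[operand_reg]  # no match: mux selects the register file
```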
Arrangement and a method in processor technology

Publication number: 20040260912

Abstract: A processor (PR2) has a functional unit (FU21) connected to series-coupled temporary registers (TR21-TR23) and to a register file (RF2), which has an output connected to an input (IP1) of the functional unit via multiplexors (MUX1-MUX4). Read addresses (B, E, A) and write addresses (A, D, G) are sent to the register file and to a control means. The latter includes registers (REG1-REG4) and comparators (C1-C4), which control the multiplexors (MUX1-MUX4). On a read address (B), a value (V(B)) is sent to the functional unit (FU21) after the register file access time has elapsed. The functional unit performs an operation, and the result (V(A)) is clocked through the temporary registers (TR1-TR3) and is sent to the register file (RF2). When a later read address (A) coincides in the comparator (C2) with a write address (A) from the register (REG2), the multiplexor (MUX2) is switched and the result (V(A)) is fetched from the temporary register (TR1).

Type: Application

Filed: April 20, 2004

Publication date: December 23, 2004

Inventor: Nils Ola Linnermark
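A cycle-level sketch of the temporary-register chain follows. It rests on four assumptions:

- The chain has a hypothetical depth of three stages, matching the three temporary registers.
- One result or bubble enters the chain per clock.
- Comparators check the newest entry first.
- Registers read as 0 before they are first written.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class BypassedRegisterFile:
    def __init__(self, depth: int = 3) -> None:
        self.rf: Dict[str, int] = {}
        # Index 0 is the newest temporary register; entries are
        # (write address, value), with (None, None) as a bubble.
        self.temps: Deque[Tuple[Optional[str], Optional[int]]] = deque()
        self.depth = depth

    def clock(self, write_addr: Optional[str] = None,
              value: Optional[int] = None) -> None:
        self.temps.appendleft((write_addr, value))  # result (or bubble) enters stage 1
        if len(self.temps) > self.depth:
            addr, val = self.temps.pop()  # oldest entry leaves the chain...
            if addr is not None:
                self.rf[addr] = val       # ...and is written to the register file

    def read(self, addr: str) -> int:
        for waddr, val in self.temps:     # comparators, newest first
            if waddr == addr:
                return val                # mux selects the temporary register
        return self.rf.get(addr, 0)       # no match: register file output
```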
Method and data processor with reduced stalling due to operand dependencies

Publication number: 20040255099

Abstract: A data processor (200) has a pipelined execution unit (120). Whether a first instruction is one of a class of instructions wherein as a result of execution of the first instruction the contents of an operand register will be stored in a destination register is determined. A second instruction that references the destination register is received before a completion of execution of the first instruction. The second instruction is executed using the contents of the operand register without stalling the second instruction in the pipelined execution unit (120).

Type: Application

Filed: June 12, 2003

Publication date: December 16, 2004

Inventor: Stephen Charles Kromer
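The dependency check can be sketched as below. The sketch represents in-flight instructions as `(opcode, dest, operand)` tuples in program order and treats `mov` as the only member of the copy-like class. Both choices are illustrative assumptions.

```python
from typing import List, Set, Tuple

InFlight = Tuple[str, str, str]  # (opcode, dest, operand), oldest first
MOVE_LIKE = {"mov"}              # hypothetical class: dest receives the operand unchanged


def resolve_source(src: str, in_flight: List[InFlight]) -> Tuple[str, bool]:
    """Return (register to read, must_stall) for a consumer of `src`."""
    younger_writes: Set[str] = set()
    for idx in range(len(in_flight) - 1, -1, -1):  # youngest producer first
        opcode, dest, operand = in_flight[idx]
        if dest == src:
            if opcode in MOVE_LIKE and operand not in younger_writes:
                # Read the move's operand register instead of waiting; the
                # operand may itself still be pending, so resolve it too.
                return resolve_source(operand, in_flight[:idx])
            return dest, True  # genuine producer still executing: stall
        younger_writes.add(dest)
    return src, False  # no in-flight producer: read the register file
```

Tracking `younger_writes` keeps the shortcut safe when the move's operand register is redefined between the move and its consumer.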
Method and apparatus for controlling program instruction completion timing for processor verification

Publication number: 20040250050

Abstract: A method and apparatus for controlling program instruction completion timing for processor verification provides, alternatively or in combination, an improved simulation technique and/or processor in which resource allocation as well as other performance-specific scenarios can be stressed over typical operating conditions by controlling the completion timing of one or more program instructions. A high-level program controlling simulation of a VHDL model of a processor can simulate extension of the completion time of a predetermined instruction in order to hold the instruction in the execution and completion queues, placing an effective hold on the resources allocated for the instruction. Alternatively, the VHDL model may include logic for controlling completion timing of the program instruction by using a processor clock cycle counter. Verification testing of actual processor hardware may be facilitated by including the counter and associated control logic within production or prototype processors.

Type: Application

Filed: June 9, 2003

Publication date: December 9, 2004

Applicant: International Business Machines Corporation

Inventors: John Martin Ludden, Darin Marcus Greene, David A. Schroter, Wallace Keith Sharp
Processing partial register writes in an out-of-order processor

Publication number: 20040243791

Abstract: A method for processing registers in an out-of-order processor. A predicate in an instruction is predicted. An architecturally correct value is then computed using a read-modify-write operation. The predicted value is compared to the architecturally correct value. The instruction with an incorrectly-predicted predicate is flushed from the pipeline if the predicted value and the architecturally correct value are different.

Type: Application

Filed: July 8, 2004

Publication date: December 2, 2004

Inventors: Edward T. Grochowski, Jared W. Stark
Renaming registers to values produced by instructions according to assigned produce sequence number

Patent number: 6826677

Abstract: A processor, such as a VLIW processor capable of software-pipeline execution, includes an instruction issuing unit 10 for issuing, in a predetermined sequence, instructions to be executed. The sequence of instructions includes preselected value-producing instructions which, when executed, produce respective values. Instruction executing units 14, 16, 18 execute the issued instructions. A register file 20 has a set of registers, for storing values produced by the executed instructions. In operation the processor assigns the values produced by the value-producing instructions respective sequence numbers according to the order of issuance of their respective value-producing instructions. Each produced value is allocated one of the registers, for storing that produced value, in dependence upon the sequence number assigned to that value. The registers may be renamed each time a value-producing instruction is issued.

Type: Grant

Filed: February 6, 2001

Date of Patent: November 30, 2004

Assignee: PTS Corporation

Inventor: Nigel Peter Topham
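The following sketches sequence-number-based allocation. It assumes, as in rotating register files, that a value with sequence number `s` lives in register `s mod N`. The abstract says only that allocation depends on the sequence number, so the modulo mapping is an assumption.

```python
from typing import Tuple


class SequenceRenamer:
    def __init__(self, num_regs: int) -> None:
        self.num_regs = num_regs
        self.next_seq = 0  # sequence number for the next value-producing instruction

    def issue_producer(self) -> Tuple[int, int]:
        # Assign the next sequence number in issue order and derive the
        # register that will hold the produced value.
        seq = self.next_seq
        self.next_seq += 1
        return seq, seq % self.num_regs

    def register_for(self, distance: int) -> int:
        # Register holding the value produced `distance` producers ago
        # (0 = most recent); assumes at least distance + 1 producers issued.
        return (self.next_seq - 1 - distance) % self.num_regs
```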
Completion monitoring in a processor having multiple execution units with various latencies

Patent number: 6826678

Abstract: A method, processor architecture, computer program product, and data processing system for determining when an instruction in a pipelined processor should be completed are provided. As each instruction is issued to an execution unit, an entry for that instruction is placed within a “finish pipe,” which consists of a series of consecutively numbered stages. Each clock cycle, the entries in the finish pipe advance one stage. When an entry has reached the stage corresponding to the latency of its associated execution unit, it becomes mature. Each clock cycle, the finish pipe is scanned to find the entry having the highest-numbered stage of any entry in the finish pipe. If that entry is mature, it is removed from the finish pipe and the instruction associated with that entry is allowed to complete. If not, the entry simply advances along with the other entries, and the pipe is rescanned in the next cycle.

Type: Grant

Filed: April 11, 2002

Date of Patent: November 30, 2004

Assignee: International Business Machines Corporation

Inventors: Hung Qui Le, Dung Quoc Nguyen
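The finish-pipe rule translates almost directly into code. This sketch makes two assumptions: at most one instruction completes per cycle, and an entry becomes mature once its stage number reaches its unit's latency. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FinishEntry:
    tag: str
    latency: int   # latency of the execution unit the instruction went to
    stage: int = 0


class FinishPipe:
    def __init__(self) -> None:
        self.entries: List[FinishEntry] = []

    def issue(self, tag: str, latency: int) -> None:
        self.entries.append(FinishEntry(tag, latency))

    def clock(self) -> Optional[str]:
        """Run one cycle; return the tag allowed to complete, if any."""
        completed = None
        if self.entries:
            # Highest-numbered stage; ties go to the earlier-issued entry.
            head = max(self.entries, key=lambda e: e.stage)
            if head.stage >= head.latency:  # mature
                self.entries.remove(head)
                completed = head.tag
        for e in self.entries:
            e.stage += 1  # every remaining entry advances one stage
        return completed
```

Every entry advances one stage per cycle, so the highest-numbered stage always holds the oldest issued instruction, and completion stays in issue order. For example, a 3-cycle `mul` issued one cycle before a 1-cycle `add` completes first, and the `add` completes on the following cycle.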
Accessing items of architectural state in a data processing apparatus

Publication number: 20040225838

Abstract: The present invention relates to a data processing apparatus and method for accessing items of architectural state. The data processing apparatus comprises a plurality of registers operable to store items of architectural state, and a plurality of functional units, each functional unit being operable to perform a processing operation with reference to one or more of those items of architectural state. At least one of the functional units has a register cache associated therewith having one or more cache entries, each cache entry being operable to store a copy of one of the items of architectural state, and a register identifier identifying the register containing that item of architectural state. Control logic is operable to determine a subset of the items of architectural state to be copied in the register cache in dependence on the processing operation of the functional unit with which the register cache is associated. This assists in alleviating demands on access ports associated with the registers.

Type: Application

Filed: May 9, 2003

Publication date: November 11, 2004

Inventor: Stuart David Biles
Reordering in a system with parallel processing flows

Publication number: 20040221138

Abstract: A distributed system is provided for apportioning an instruction stream into multiple segments for processing in multiple parallel processing units, and for merging the processed segments into a single processed instruction stream having the same sequential relative order as the original instruction stream. Tags may be attached to each segment after apportioning to indicate the order in which the various segments are to be merged. In one embodiment, the end of each segment includes a tag indicating the unit to which the next instruction in the original instruction sequence is directed.

Type: Application

Filed: November 13, 2001

Publication date: November 4, 2004

Inventors: Roni Rosner, Micha G. Moffie, Abraham Mendelson
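This sketch covers both tagging and merging. It assumes fixed-length segments and a caller-supplied `assign` function that chooses the processing unit for each segment, because the abstract does not say how segments are assigned. Each segment carries a tag naming the unit that holds the next segment, which is enough to rebuild the original order.

```python
from typing import Callable, Dict, List, Optional, Tuple

Segment = Tuple[List[str], Optional[int]]  # (instructions, unit holding the next segment)


def apportion(stream: List[str], seg_len: int, assign: Callable[[int], int]
              ) -> Tuple[Dict[int, List[Segment]], Optional[int]]:
    chunks = [stream[i:i + seg_len] for i in range(0, len(stream), seg_len)]
    units = [assign(k) for k in range(len(chunks))]
    queues: Dict[int, List[Segment]] = {}
    for k, chunk in enumerate(chunks):
        # Tag the end of each segment with the unit of the next segment.
        next_unit = units[k + 1] if k + 1 < len(chunks) else None
        queues.setdefault(units[k], []).append((chunk, next_unit))
    return queues, (units[0] if units else None)


def merge(queues: Dict[int, List[Segment]], first_unit: Optional[int]) -> List[str]:
    merged: List[str] = []
    unit = first_unit
    while unit is not None:
        # Each unit's segments come out in the order it received them.
        chunk, next_unit = queues[unit].pop(0)
        merged.extend(chunk)
        unit = next_unit
    return merged
```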
System and method to prevent in-flight instances of operations from disrupting operation replay within a data-speculative microprocessor

Publication number: 20040221139

Abstract: A microprocessor may include one or more functional units configured to execute operations, a scheduler configured to issue operations to the functional units for execution, and at least one replay detection unit. The scheduler may be configured to maintain state information for each operation. Such state information may, among other things, indicate whether an associated operation has completed execution. The replay detection unit may be configured to detect that one of the operations in the scheduler should be replayed. If an instance of that operation is currently being executed by one of the functional units when the operation is detected as needing to be replayed, the replay detection unit is configured to inhibit an update to the state information for that operation in response to execution of the in-flight instance of the operation. Various embodiments of computer systems may include such a microprocessor.

Type: Application

Filed: May 2, 2003

Publication date: November 4, 2004

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael A. Filippo, James K. Pickett, Benjamin T. Sander
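A simplified model of inhibiting the stale update follows. It assumes at most one in-flight instance per operation, and its method names are hypothetical rather than taken from the filing.

```python
from typing import Dict, Set


class Scheduler:
    def __init__(self) -> None:
        self.completed: Dict[str, bool] = {}  # per-operation state information
        self.in_flight: Set[str] = set()      # operations executing in a functional unit
        self.stale: Set[str] = set()          # in-flight instances whose update is inhibited

    def issue(self, op: str) -> None:
        self.completed[op] = False
        self.in_flight.add(op)

    def replay_detected(self, op: str) -> None:
        self.completed[op] = False
        if op in self.in_flight:
            # The instance already executing must not mark the op complete.
            self.stale.add(op)

    def finish(self, op: str) -> None:
        self.in_flight.discard(op)
        if op in self.stale:
            self.stale.discard(op)  # inhibit the state update for this instance
            return
        self.completed[op] = True
```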
Dynamically share interrupt handling logic among multiple threads

Publication number: 20040215937

Abstract: A method and multithreaded processor for dynamically sharing an interrupt handling logic unit among multiple threads. First and second state units may be configured to determine whether an interrupt was generated from a first thread and a second thread, respectively. An arbiter may be coupled to the first and second state units. A shared interrupt handling logic unit may be coupled to the arbiter, where the shared interrupt handling logic unit may be configured to handle interrupts generated from the first and second threads. When a state unit (e.g., the first or second state unit) determines that an interrupt was generated from a particular thread, the state unit may request control of the interrupt handling logic unit from the arbiter. The arbiter may grant the request from the state unit if the interrupt handling logic unit is available to handle the detected interrupt.

Type: Application

Filed: April 23, 2003

Publication date: October 28, 2004

Applicant: International Business Machines Corporation

Inventors: William E. Burky, Susan E. Eisen, Hung Q. Le, David A. Schroter
SMT flush arbitration

Publication number: 20040215938

Abstract: A methodology to process flushes in an SMT processor with a dynamically shared group completion table (GCT) and a flush table comprises identifying incoming flush sources by thread. The forward link array is used, per flush source, to determine the next instruction group following the group indicated by the flush source (for example, for mispredicts and load/store flush-next type flushes). Flush completion table entry numbers or instruction group identifiers (Gtags) are presented to the flush table, which computes the oldest flushed group tag for each thread. In the flush selection cycle, the flush table outputs are compared against saved versions of all the flush Gtags presented to determine which flush source matches the oldest group output from the flush table. The flush source information is used with the selected oldest Gtag to determine the appropriate additional flushing action to take during the flush cycle.

Type: Application

Filed: April 23, 2003

Publication date: October 28, 2004

Applicant: International Business Machines Corporation

Inventors: William E. Burky, Hung Q. Le, Dung Q. Nguyen, David A. Schroter
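The per-thread selection of the oldest flush can be sketched as below. The sketch relies on three simplified stand-ins for the GCT and flush-table arrays:

- Gtags are integers.
- `forward_link` maps a group to the next group of the same thread, or to `None`.
- `age` gives each group's position in its thread's program order.

```python
from typing import Dict, List, Optional, Tuple

Request = Tuple[str, int, int, bool]  # (source, thread, gtag, flush_next)


def arbitrate_flushes(requests: List[Request],
                      forward_link: Dict[int, Optional[int]],
                      age: Dict[int, int]) -> Dict[int, Tuple[str, int]]:
    """Per thread, pick the flush source whose first flushed group is oldest."""
    winners: Dict[int, Tuple[str, int]] = {}
    for source, thread, gtag, flush_next in requests:
        # Flush-next sources (e.g. mispredicts) keep their own group and flush
        # from the following group, found through the forward link array.
        first = forward_link.get(gtag) if flush_next else gtag
        if first is None:
            continue  # no younger group exists, so nothing to flush
        best = winners.get(thread)
        if best is None or age[first] < age[best[1]]:
            winners[thread] = (source, first)
    return winners
```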
Information processor

Patent number: 6810474

Abstract: In a conventional information processor that speculatively executes a following instruction having a data dependency, the arithmetic and logical unit (ALU) is used for the speculative execution and used again when the prediction is wrong, so the frequency of ALU use increases. To prevent this, the processor provides a history ALU, which outputs a past execution result of an instruction as-is as that instruction's execution result, and an instruction issue circuit, which issues to the history ALU any instruction whose operand is the same as a past value, so that the actual speculative execution can be omitted. A Guard cache provided in the history cache stores the addresses of instructions that give low prediction accuracy, and any instruction whose address has been registered in the Guard cache is prevented from being registered again in the history cache.

Type: Grant

Filed: August 29, 2000

Date of Patent: October 26, 2004

Assignee: Hitachi, Ltd.

Inventor: Yoshio Miki
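Below is a simplified model of result reuse with a guard set. It assumes instructions are keyed by address (`pc`) and that an address enters the guard after a hypothetical `max_misses` operand mismatches. The abstract does not state the guard's admission rule. This sketch also reuses results directly rather than modeling the full speculative-execution path.

```python
from typing import Callable, Dict, Set, Tuple


class HistoryUnit:
    def __init__(self, max_misses: int = 2) -> None:
        self.history: Dict[int, Tuple[tuple, int]] = {}  # pc -> (operands, result)
        self.misses: Dict[int, int] = {}                 # pc -> operand mismatches
        self.guard: Set[int] = set()                     # pcs barred from the history
        self.max_misses = max_misses

    def execute(self, pc: int, operands: tuple,
                alu: Callable[..., int]) -> Tuple[int, str]:
        entry = self.history.get(pc)
        if entry is not None and entry[0] == operands:
            return entry[1], "history"  # same operands: reuse, ALU stays free
        if entry is not None:
            # Operands changed: count the miss and guard poorly predicted pcs.
            self.misses[pc] = self.misses.get(pc, 0) + 1
            if self.misses[pc] >= self.max_misses:
                self.guard.add(pc)
                del self.history[pc]
        result = alu(*operands)
        if pc not in self.guard:
            self.history[pc] = (operands, result)  # register for future reuse
        return result, "alu"
```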
Dynamically shared group completion table between multiple threads

Publication number: 20040210743

Abstract: An SMT system has a dynamically shared GCT. Performance for the SMT is improved by configuring the GCT to allow an instruction group from each thread to complete simultaneously. The GCT has a read port for each thread corresponding to the completion table instruction/address array for simultaneous updating on completion. The forward link array also has a read port for each thread to find the next instruction group for each thread upon completion. The backward link array has a backward link write port for each thread in order to update the backward links for each thread simultaneously. The GCT has independent pointer management for each thread. Each of the threads has simultaneous commit of their renamed result registers and simultaneous updating of outstanding load and store tag usage.

Type: Application

Filed: April 21, 2003

Publication date: October 21, 2004

Applicant: International Business Machines Corporation

Inventors: William E. Burky, Peter J. Klim, Hung Q. Le
Method and circuit for modifying pipeline length in a simultaneous multithread processor

Publication number: 20040210742

Abstract: An SMT system has a single thread mode and an SMT mode. Instructions are alternately selected from two threads every clock cycle and loaded into the instruction fetch address register (IFAR) in a three-cycle pipeline of the instruction fetch unit (IFU). If the branch prediction circuit detects a predicted-taken branch in stage three of the pipeline in single thread mode, a calculated address from the branch prediction circuit is loaded into the IFAR on the next clock cycle. If the branch prediction circuit detects a predicted-taken branch in SMT mode, the selected instruction address is loaded into the IFAR on the first clock cycle following detection, and the calculated target address is fed back and loaded into the IFAR on the second clock cycle following detection. This feedback delay effectively switches the pipeline from three stages to four stages.

Type: Application

Filed: April 21, 2003

Publication date: October 21, 2004

Applicant: International Business Machines Corporation

Inventors: David Stephen Levitan, Balaram Sinharoy
Valid bit generation and tracking in a pipelined processor

Publication number: 20040210744

Abstract: In an embodiment, a pipelined digital signal processor (DSP) may generate a valid bit in an alignment stage. The valid bit may be qualified in a decode stage in response to receiving a stall signal and/or a kill signal. The valid bit output from the decode stage may be stored in a latch in an address calculation (AC) stage. The valid bit may be held in the latch by a latch enable circuit in response to receiving a stall signal. The valid bit output from the latch may be qualified in the AC stage. The circuit in the AC stage including the latch, the latch enable circuit, and a valid bit qualifier may be repeated in downstream pipeline stages, for example, the execution stages.

Type: Application

Filed: May 17, 2004

Publication date: October 21, 2004

Applicants: Intel Corporation, a Delaware corporation, Analog Devices, Inc., a Delaware corporation

Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp, Thomas Tomazin
Sequence control mechanism for enabling out of order context processing

Patent number: 6804815

Abstract: A sequence control mechanism enables out-of-order processing of contexts by processors of a symmetric multiprocessor system having a plurality of processors arrayed as a processing engine. The processors of the engine are preferably arrayed as a plurality of rows or clusters embedded between input and output buffers, wherein each cluster of processors is configured to process contexts in a first in, first out (FIFO) synchronization order. However, the sequence control mechanism allows out-of-order context processing among the clusters of processors, while selectively enforcing FIFO synchronization ordering among those clusters on an as needed basis, i.e., for certain contexts. As a result, the control mechanism reduces undesired processing delays among those processors.

Type: Grant

Filed: September 18, 2000

Date of Patent: October 12, 2004

Assignee: Cisco Technology, Inc.

Inventors: Darren Kerr, Jeffery B. Scott, John William Marshall, Kenneth H. Potter, Scott Nellenbach
Method and apparatus to limit register file read ports in an out-of-order, multi-stranded processor

Publication number: 20040199749

Abstract: A method for limiting a number of register file read ports used to process a store instruction includes decoding the store instruction, where the decoding generates a decoded store instruction, identifying a store data register and source operand registers included in the decoded store instruction, and appending a set of attribute fields to the decoded store instruction. Further, dependent on a value of at least one of the attribute fields, source values corresponding to the source operand registers are read using the register file read ports at a time that the store instruction is issued, and a store data value corresponding to the store data register is read using one of the register file read ports at a time that the store instruction is committed.

Type: Application

Filed: April 3, 2003

Publication date: October 7, 2004

Inventors: Robert Golla, Chandra M. R. Thimmannagari, Sorin Iacobovici, Rabin A. Sugumar, Robert Nuckolls
Method and apparatus for utilizing multiple opportunity ports in a processor pipeline

Publication number: 20040193846

Abstract: A method and apparatus for a microprocessor with multiple memory read opportunity ports in a pipeline is disclosed. In one embodiment, a register file may have only one read port. When a statistically rare instruction requires two operands to be read from the register file, a spacer may be introduced into the pipeline, permitting the use of a second opportunity port to read its second operand from the register file at a later time. The spacer may be a nop, or it may be another instruction that receives its operands from a bypass path. In other embodiments, a register alias table may have only one read port, and a second opportunity port may be used to read a second physical register address.

Type: Application

Filed: March 28, 2003

Publication date: September 30, 2004

Inventor: Eric A. Sprangle
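The following sketches spacer insertion for a register file with one read opportunity per issue slot. It assumes each instruction is a `(name, register_file_reads)` pair with at most two reads. It also assumes the following instruction can serve as the spacer only when it reads nothing from the register file, meaning all its operands arrive through the bypass.

```python
from typing import List, Tuple


def insert_spacers(program: List[Tuple[str, int]]) -> List[str]:
    """Return issue slots; each slot owns one register-file read opportunity."""
    slots: List[str] = []
    i = 0
    while i < len(program):
        name, reads = program[i]
        slots.append(name)
        if reads == 2:
            # The second operand is read in the next slot's opportunity port,
            # so that slot must not need the register file itself.
            nxt = program[i + 1] if i + 1 < len(program) else None
            if nxt is not None and nxt[1] == 0:
                slots.append(nxt[0])  # bypass-fed instruction acts as the spacer
                i += 1
            else:
                slots.append("nop")   # otherwise insert a nop spacer
        i += 1
    return slots
```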
System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor

Publication number: 20040186983

Abstract: A system and method are provided for retiring instructions in a superscalar microprocessor that executes a program comprising a set of instructions having a predetermined program order; the retirement system simultaneously retires groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results.

Type: Application

Filed: April 2, 2004

Publication date: September 23, 2004

Applicant: Seiko Epson Corporation

Inventors: Johannes Wang, Sanjiv Garg, Trevor Deosaran
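The following is a minimal sketch of retiring one group per cycle. It assumes a hypothetical retire width of four, window entries stored as dictionaries in program order, and a temporary buffer keyed by instruction tag. Only the in-order prefix of executed instructions is treated as retirable.

```python
from typing import Dict, List


def retire_group(window: List[dict], temp_buffer: Dict[int, int],
                 reg_array: Dict[str, int], width: int = 4) -> int:
    """Retire up to `width` executed instructions in program order."""
    retired = 0
    # Scan from the oldest instruction and stop at the first one that has
    # not executed yet (loosely modeling the done and retirement-control blocks).
    while window and retired < width and window[0]["done"]:
        entry = window.pop(0)
        # Move the result from the temporary buffer into the register array.
        reg_array[entry["dest"]] = temp_buffer.pop(entry["tag"])
        retired += 1
    return retired
```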
Storing of instructions relating to a stalled thread

Patent number: 6792446

Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a plurality of replay queues or replay queue sections coupled to the checker for temporarily storing one or more instructions for replay. In one embodiment, thread-specific replay queue sections may each be used to store a long latency instruction for each thread until the long latency instruction is ready to be executed (e.g., data for a load instruction has been retrieved from external memory). By storing the long latency instruction and its dependents in a replay queue section for one thread which has stalled, execution resources are made available for improving the speed of execution of other threads which have not stalled.

Type: Grant

Filed: February 1, 2002

Date of Patent: September 14, 2004

Assignee: Intel Corporation

Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor

Patent number: 6775761

Abstract: A system and method for retiring instructions in a superscalar microprocessor that executes a program comprising a set of instructions having a predetermined program order. The retirement system simultaneously retires groups of instructions executed in or out of order by the microprocessor. It comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions has been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results.

Type: Grant

Filed: May 22, 2002

Date of Patent: August 10, 2004

Assignee: Seiko Epson Corporation

Inventors: Johannes Wang, Sanjiv Garg, Trevor Deosaran
Method to handle instructions that use non-windowed registers in a windowed microprocessor capable of out-of-order execution

Publication number: 20040153631

Abstract: A method for handling instructions that use non-windowed registers in an out-of-order microprocessor with windowed registers is provided. When an instruction with a non-windowed destination register is detected, the computed result of the instruction is stored in a temporary storage register instead of the non-windowed register designated as the instruction's destination. When the instruction is ready for retirement, the result is transferred from the temporary storage register into the non-windowed register designated as the instruction's destination. When another instruction's source register is a non-windowed register, the microprocessor determines whether the instruction should use data from the designated non-windowed register or from a temporary storage register, to prevent the other instruction from using incorrect data.

Type: Application

Filed: January 30, 2003

Publication date: August 5, 2004

Inventors: Chandra M. R. Thimmannagari, Sorin Iacobovici, Rabin A. Sugumar
Bypass control circuit

Patent number: 6772318

Abstract: A bypass control method is disclosed in which data can be set on a source register of an instruction to be executed on an instruction bus in a short time. The bypass control apparatus includes a plurality of comparators that compare with each other the outputs of flip-flops transferring the register number of a destination register on the instruction bus. By using the result of a comparator that compares these comparison results with the register number of the source register on the instruction bus, the bypass path for data input to the source register of the instruction to be executed can be set in a short time. When a plurality of matches are detected, the bypass path is set on the basis of the output of the flip-flop on the first-stage side, which avoids mistakenly inputting old data to the source register.

Type: Grant

Filed: September 22, 2000

Date of Patent: August 3, 2004

Assignee: Kabushiki Kaisha Toshiba

Inventor: Tatsuo Teruyama
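The priority rule in the last sentence, where the stage nearest the front of the pipeline wins when several in-flight destinations match, can be sketched as a plain loop. This Python model is hypothetical: `select_bypass` and the list-of-stages representation are illustrative assumptions, and it does not reproduce the patent's technique of pre-comparing the flip-flop outputs with each other to shorten the timing path.

```python
from typing import List, Optional


def select_bypass(src_reg: int, stage_dests: List[Optional[int]]) -> Optional[int]:
    """Choose where a source operand should come from.

    stage_dests[i] is the destination register number latched in pipeline
    stage i; index 0 is the first-stage side, i.e. the most recently issued
    producer. None marks a stage with no register-writing instruction.
    Returns the stage to bypass from, or None to read the register file.
    """
    for stage, dest in enumerate(stage_dests):
        if dest == src_reg:
            return stage  # first match holds the newest value; older matches are stale
    return None


print(select_bypass(5, [3, 5, 5]))  # 1: stage 2 also writes r5 but holds older data
print(select_bypass(7, [3, 5, 5]))  # None: no producer in flight, use the register file
```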
Method and apparatus for reducing register usage within a pipelined processor

Publication number: 20040148494

Abstract: One embodiment of the present invention provides a system that facilitates eliminating register usage for temporary operands involved in pipeline bypassing operations. During operation, the system receives a series of instructions at a processor, wherein the processor recognizes that the series of instructions can make use of a pipeline bypassing mechanism. During the pipeline bypassing operation, the processor examines an indicator associated with the series of instructions. If the indicator is set, the processor does not store the temporary operand used by the series of instructions into the register file of the processor, because the temporary operand will not be used by subsequent instructions.

Type: Application

Filed: January 29, 2003

Publication date: July 29, 2004

Inventor: Jan Civlin
Microprocessor with conditional cross path stall to minimize CPU cycle time length

Patent number: 6766440

Abstract: A digital system is provided that includes a central processing unit (CPU) that has an instruction execution pipeline with a plurality of functional units for executing instructions in a sequence of CPU cycles. The execution units are clustered into two or more groups. Cross-path circuitry is provided such that results from any execution unit in one execution unit cluster can be supplied to execution units in another cluster. A cross-path stall is conditionally inserted to stall all of the functional groups when one execution unit cluster requires an operand from another cluster on a given CPU cycle and the execution unit that is producing that operand completes the computation of that operand on an immediately preceding CPU cycle.

Type: Grant

Filed: October 31, 2000

Date of Patent: July 20, 2004

Assignee: Texas Instruments Incorporated

Inventors: Donald E. Steiss, David Hoyle
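The stall condition reduces to a two-part test per operand. The following Python sketch is hypothetical (`Operand`, `cross_path_stall` and integer cycle numbering are assumptions for illustration): a stall is inserted only when an operand crosses clusters and its producer completed it on the immediately preceding cycle. The title suggests the point is to keep this one case from lengthening the CPU cycle.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Operand:
    consumer_cluster: int   # cluster of the unit that needs the operand
    producer_cluster: int   # cluster of the unit that computes it
    ready_cycle: int        # cycle on which the producer completed the value


def cross_path_stall(operands: Iterable[Operand], cycle: int) -> bool:
    """True if all functional units must stall on `cycle`."""
    return any(
        op.consumer_cluster != op.producer_cluster   # operand crosses clusters
        and op.ready_cycle == cycle - 1              # and was produced last cycle
        for op in operands
    )


ops = [Operand(consumer_cluster=0, producer_cluster=1, ready_cycle=9)]
print(cross_path_stall(ops, 10))  # True: a fresh result must cross the cluster boundary
print(cross_path_stall(ops, 11))  # False: the value has been available for a full cycle
```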
Apparatus and method for killing an instruction after loading the instruction into an instruction queue in a pipelined microprocessor

Publication number: 20040139301

Abstract: An apparatus for killing an instruction after it has already been loaded into an instruction queue of a microprocessor is disclosed. The apparatus includes control logic that detects a condition in which the instruction must not be executed, such as a branch instruction misprediction; however, the control logic detects the condition too late to prevent the instruction from being loaded into the instruction queue. The control logic generates a kill signal indicating that the instruction must not be executed. A kill queue receives the kill signal and stores its value. The kill queue maintains its entries in parallel with the instruction queue entries, so that when the instruction queue subsequently outputs the instruction, the kill queue also outputs the value of the kill signal associated with the instruction. If the kill signal value output from the kill queue is true, the microprocessor invalidates the instruction and does not execute it.

Type: Application

Filed: July 31, 2003

Publication date: July 15, 2004

Applicant: IP-First, LLC.

Inventor: Thomas McDonald
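The kill queue behaves like a one-bit-wide FIFO that shifts in lockstep with the instruction queue. In this Python sketch, `InstructionQueue`, `kill` and `issue_all` are hypothetical names, and the late kill signal is modeled as a call that marks an entry already sitting in the queue.

```python
from collections import deque


class InstructionQueue:
    def __init__(self):
        self.insns = deque()
        self.kills = deque()  # kill queue: one flag per instruction-queue entry

    def load(self, insn, kill=False):
        self.insns.append(insn)
        self.kills.append(kill)  # both queues always grow together

    def kill(self, index):
        """Late-detected condition (e.g. a branch misprediction) for a queued entry."""
        self.kills[index] = True

    def pop(self):
        # Both queues shift together, so each kill flag stays with its instruction.
        return self.insns.popleft(), self.kills.popleft()

    def __len__(self):
        return len(self.insns)


def issue_all(queue):
    executed = []
    while queue:
        insn, killed = queue.pop()
        if not killed:            # killed entries are invalidated, not executed
            executed.append(insn)
    return executed


q = InstructionQueue()
q.load("add")
q.load("beq")
q.load("sub")        # on the mispredicted path, but already loaded
q.kill(2)
print(issue_all(q))  # ['add', 'beq']
```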
Operand forwarding in a superscalar processor

Publication number: 20040139299

Abstract: A method and mechanism for improving the Instruction Level Parallelism (ILP) of a program, and ultimately its Instructions per Cycle (IPC), allows dependent instructions to be grouped and dispatched simultaneously by forwarding the General Register (GR) data of the oldest instruction, or source instruction, to the other dependent instructions. In one case, the source instruction is a load-type instruction that loads a GR value into a GR; the dependent instructions then select the forwarded data to perform their computation, using the same GR read address as the source instruction. In another case, the source instruction is a load-type instruction that loads memory data into a GR; the loaded memory data is forwarded, or replicated, on the memory read bus of the other dependent instructions. The mechanism also allows the Address Generator output to be forwarded to the other dependent instructions when the source instruction is a load type that loads a memory address into a GR.

Type: Application

Filed: January 14, 2003

Publication date: July 15, 2004

Applicant: International Business Machines Corporation

Inventors: Fadi Busaba, Klaus J. Getzlaff, Bruce C. Giamei, Christopher A. Krygowski, Timothy J. Slegel
Result forwarding in a superscalar processor

Publication number: 20040139300

Abstract: A method and mechanism for improving the Instruction Level Parallelism (ILP) of a program, and ultimately its Instructions per Cycle (IPC), allows dependent instructions to be grouped and dispatched simultaneously by forwarding the result of the oldest instruction, or source instruction, to the result buses or registers of the other dependent instructions, thus bypassing the dependent instructions' execution stage. The source instruction performs an arithmetic, logical, or rotate/shift operation on its operands and updates a GR with the computed result. A load-type dependent, or target, instruction that loads a GR value into a GR then selects the forwarded result of the source instruction onto its write bus for the GR update. Another target instruction, of store type, stores GR data to memory; it likewise uses the source instruction's result to update storage. The mechanism also allows the dependent instruction to be a load type that loads GR data into a Control Register (CR).

Type: Application

Filed: January 14, 2003

Publication date: July 15, 2004

Applicant: International Business Machines Corporation

Inventors: Fadi Busaba, Klaus J. Getzlaff, Bruce C. Giamei, Christopher A. Krygowski, Timothy J. Slegel
Method and apparatus for transparent delayed write-back

Publication number: 20040128484

Abstract: Embodiments of the present invention provide a method and system for data exchange between execution units and arrays in a processing unit. Embodiments of the invention may include an execution unit, an out-of-order unit and a delayed write-back buffer coupled between the out-of-order unit and the execution unit. The delayed write-back buffer may dispatch data required by the execution unit for processing.

Type: Application

Filed: December 30, 2002

Publication date: July 1, 2004

Inventors: Zeev Sperber, Eitan Rosen
Method for fusing instructions in a vector processor

Publication number: 20040128485

Abstract: According to one embodiment, a microprocessor is described. The microprocessor includes a scalar processor and a vector processor. The vector processor fuses multiple instructions that are to be processed. The fused instructions enable a single source register to simultaneously transmit its data contents to multiple math units.

Type: Application

Filed: December 27, 2002

Publication date: July 1, 2004

Inventor: Scott R. Nelson
Eliminating register reads and writes in a scheduled instruction cache

Publication number: 20040128482

Abstract: A method and apparatus for eliminating register reads and writes in a scheduled instruction cache. More particularly, the present invention pertains to a method of increasing overall processor performance by implementing a novel pre-cache scheduling operation to eliminate superfluous register reads and writes via a bypass network.

Type: Application

Filed: December 26, 2002

Publication date: July 1, 2004

Inventor: Gad S. Sheaffer
System and method for managing vertical dependencies in a digital signal processor

Patent number: 6754807

Abstract: An apparatus for managing vertical dependencies between instructions in first and second instruction pipelines includes: 1) identifier (ID) reclaim circuitry for determining a sequential set of retired identifiers associated with retired instructions and for determining a next retire ID sequentially following the set; 2) first ID generation circuitry for sequentially assigning identifiers to destination registers associated with instructions entering the pipelines; 3) second ID generation circuitry associated with the first pipeline for identifying a first dependent source register associated with a first dependent source operand of a first instruction entering the first pipeline and assigning an ID of the first register to the first operand; and 4) instruction scheduling circuitry for comparing the first operand ID of the first instruction with the next retire ID and scheduling the first instruction for execution if the first operand ID is less than or equal to the next retire ID.

Type: Grant

Filed: August 31, 2000

Date of Patent: June 22, 2004

Assignee: STMicroelectronics, Inc.

Inventors: Sivagnanam Parthasarathy, Alexander Driker
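The four circuits in this abstract can be modeled together in Python. This is a hypothetical sketch, not the patented design: `DependencyTracker` and its methods are illustrative names, IDs are unbounded integers (real hardware would use a wrapping counter), and the `<=` scheduling test is copied literally from the abstract.

```python
class DependencyTracker:
    """Hypothetical model of ID generation, ID reclaim and the scheduling check."""

    def __init__(self):
        self.next_id = 0          # first ID generation circuitry
        self.producer = {}        # register -> ID of its latest producer
        self.retired = set()      # retired IDs not yet contiguous with the set
        self.next_retire_id = 0   # first ID after the contiguous retired set

    def enter(self, dest, srcs):
        """Assign IDs when an instruction enters a pipeline."""
        operand_ids = [self.producer.get(r) for r in srcs]  # None: no known producer
        my_id = self.next_id
        self.next_id += 1
        self.producer[dest] = my_id
        return my_id, operand_ids

    def retire(self, insn_id):
        """ID reclaim: advance next_retire_id over contiguous retired IDs."""
        self.retired.add(insn_id)
        while self.next_retire_id in self.retired:
            self.retired.discard(self.next_retire_id)
            self.next_retire_id += 1

    def can_schedule(self, operand_ids):
        # Comparison taken literally from the abstract (operand ID <= next retire ID).
        return all(i is None or i <= self.next_retire_id for i in operand_ids)


t = DependencyTracker()
a, _ = t.enter("r1", [])          # ID 0
b, ops_b = t.enter("r2", ["r1"])  # ID 1, depends on ID 0
c, ops_c = t.enter("r3", ["r2"])  # ID 2, depends on ID 1
print(ops_b, t.can_schedule(ops_b))  # [0] True
print(ops_c, t.can_schedule(ops_c))  # [1] False
t.retire(a)
print(t.can_schedule(ops_c))         # True
```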
Valid bit generation and tracking in a pipelined processor

Patent number: 6754808

Abstract: In an embodiment, a pipelined digital signal processor (DSP) may generate a valid bit in an alignment stage. The valid bit may be qualified in a decode stage in response to receiving a stall signal and/or a kill signal. The valid bit output from the decode stage may be stored in a latch in an address calculation (AC) stage. The valid bit may be held in the latch by a latch enable circuit in response to receiving a stall signal. The valid bit output from the latch may be qualified in the AC stage. The circuit in the AC stage including the latch, the latch enable circuit, and a valid bit qualifier may be repeated in downstream pipeline stages, for example, the execution stages.

Type: Grant

Filed: September 29, 2000

Date of Patent: June 22, 2004

Assignees: Intel Corporation, Analog Devices, Inc.

Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp, Thomas Tomazin
Method and apparatus for providing fast remote register access in a clustered VLIW processor using partitioned register files

Publication number: 20040117597

Abstract: Effective remote register file access time can be reduced in a clustered VLIW processor using partitioned register files and some additional hardware for pre-fetching remote registers. An instruction pre-fetcher and an instruction pre-decoder are used for pre-fetching and partially decoding instructions in order to pre-fetch the remote registers required for executing VLIWs at run time, thus substantially reducing the number of inter-cluster copy instructions. The instructions (VLIWs) are scheduled taking into account various hardware constraints, such as limited inter-cluster communication bandwidth and inter-cluster communication delay.

Type: Application

Filed: December 16, 2002

Publication date: June 17, 2004

Applicant: International Business Machines Corporation

Inventor: Krishnan K. Kailas
Reservation stations to increase instruction level parallelism

Patent number: 6742111

Abstract: A data processing system is provided having a distributed reservation station that stores basic blocks of code in the form of microprocessor instructions. The present invention is capable of distributing basic blocks of code to the various distributed reservation stations. Because the distributed reservation stations have fewer entries, the lookup time required to find a particular instruction is much less than in a centralized reservation station. Additional instruction level parallelism is achieved by maintaining single basic blocks of code in the distributed reservation stations. With a distributed reservation station, an independent scheduler can be used for each of the distributed reservation stations. When an instruction is ready for execution, the scheduler removes it from the distributed reservation station and queues it for immediate execution at the particular execution unit.

Type: Grant

Filed: August 31, 1998

Date of Patent: May 25, 2004

Assignee: STMicroelectronics, Inc.

Inventor: Naresh H. Soni
Method and apparatus for executing load instructions speculatively

Patent number: 6742108

Abstract: A load is executed speculatively as two instructions: a dismissible load instruction, which does not take exceptions, and a check instruction, in the same format as the dismissible load, which when executed determines whether an exception should be taken on the dismissible load. In this manner, a load may be executed speculatively while ensuring that an exception occurs at the same time it would have occurred had the load been executed non-speculatively.

Type: Grant

Filed: September 14, 1998

Date of Patent: May 25, 2004

Assignee: Intel Corporation

Inventor: Kent G. Fielden
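The split between a non-faulting load and a later check can be modeled in a few lines of Python. Everything here is an illustrative assumption (the `DEFERRED` marker, `LoadFault`, and a dict standing in for memory); it shows only the ordering guarantee the abstract describes, not Intel's encoding, and it models the check as re-executing the load, which is one plausible reading of "the same format as the dismissible load".

```python
DEFERRED = object()  # marker a dismissible load leaves instead of raising


class LoadFault(Exception):
    """Stands in for a memory exception such as a page fault."""


def load(memory, addr):
    """Ordinary, non-speculative load: raises on an invalid address."""
    if addr not in memory:
        raise LoadFault(f"invalid address {addr:#x}")
    return memory[addr]


def dismissible_load(memory, addr):
    """Speculative load hoisted above a branch: never takes the exception."""
    try:
        return load(memory, addr)
    except LoadFault:
        return DEFERRED


def check(memory, addr, value):
    """Same operands as the load; placed where the load originally was.

    If the speculative load was dismissed, redo it non-speculatively so the
    exception, if still warranted, is raised at the original program point.
    """
    if value is DEFERRED:
        return load(memory, addr)
    return value


memory = {0x100: 42}
v = dismissible_load(memory, 0x100)
print(check(memory, 0x100, v))       # 42
w = dismissible_load(memory, 0x200)  # no exception yet
# If the guarding branch is not taken, w is discarded and no fault is ever
# reported; if it is taken, check() raises LoadFault at that point.
```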
High-performance, superscalar-based computer system with out-of-order instruction execution

Publication number: 20040093485

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.

Type: Application

Filed: November 5, 2003

Publication date: May 13, 2004

Applicant: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Data reordering mechanism for high performance networks

Patent number: 6735647

Abstract: An apparatus and method for reordering data at a data destination is provided. The apparatus and method provide dynamic, adaptive management of receive buffers in a host channel adapter while recovering, on the fly, the order of data sent over a medium that does not preserve order. In an exemplary embodiment, data of a data transmission received from a source device is reordered as follows. A data packet transmitted over a connection associated with the source device is received in a data transfer buffer, and it is determined whether the connection requires reordering of data packets. If it does, the data packet is transferred from the data transfer buffer to a reorder buffer, and a reorder state cache is updated to reflect the transfer of the data packet to the reorder buffer.

Type: Grant

Filed: September 5, 2002

Date of Patent: May 11, 2004

Assignee: International Business Machines Corporation

Inventors: William Todd Boyd, Douglas J. Joseph, Renato John Recio
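The per-connection decision and the reorder step can be sketched in Python. This is a hypothetical model: it assumes each packet carries a sequence number, represents the reorder state cache as a dict keyed by connection ID, and leaves out the adaptive receive-buffer management entirely. `ReorderBuffer` and `on_packet` are illustrative names.

```python
class ReorderBuffer:
    """Per-connection reorder state (hypothetical model)."""

    def __init__(self, first_seq=0):
        self.expected = first_seq  # next sequence number to deliver
        self.pending = {}          # seq -> payload, parked until gaps fill

    def receive(self, seq, payload):
        """Accept one packet; return the payloads now deliverable in order."""
        if seq < self.expected or seq in self.pending:
            return []              # duplicate of a packet already seen
        self.pending[seq] = payload
        delivered = []
        while self.expected in self.pending:
            delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        return delivered


def on_packet(conn_id, needs_reorder, seq, payload, reorder_state):
    """Route a packet out of the data transfer buffer."""
    if not needs_reorder:
        return [payload]           # order-preserving connection: pass straight through
    rb = reorder_state.setdefault(conn_id, ReorderBuffer())
    return rb.receive(seq, payload)


state = {}
print(on_packet(7, True, 1, "b", state))   # []
print(on_packet(7, True, 0, "a", state))   # ['a', 'b']
print(on_packet(9, False, 5, "x", state))  # ['x']
```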
Processor having replay architecture with fast and slow replay paths

Patent number: 6735688

Abstract: According to one aspect of the invention, a microprocessor is provided that includes an execution core, a first replay mechanism and a second replay mechanism. The execution core performs data speculation in executing a first instruction. The first replay mechanism is used to replay the first instruction via a first replay path if an error of a first type is detected which indicates that the data speculation is erroneous. The second replay mechanism is used to replay the first instruction via a second replay path if an error of a second type is detected which indicates that the data speculation is erroneous.

Type: Grant

Filed: February 14, 2000

Date of Patent: May 11, 2004

Assignee: Intel Corporation

Inventors: Michael D. Upton, David J. Sager, Darrell Boggs, Glenn J. Hinton
