Superscalar Patents (Class 712/23)

System and method for high-speed register renaming by counting

Patent number: 6212619

Abstract: A superscalar computer architecture for executing instructions out-of-order, comprising a multiplicity of execution units, a plurality of registers, and a register renaming circuit which generates a list of tags corresponding to specific registers that are not in use during loading of a given instruction. A table is constructed having one bit for each register per instruction in flight. The entries in the table may be combined in a logical OR fashion to create a vector that identifies which registers are in use by instructions that are in flight. Validity bits can also be generated to indicate validity of the generated tags, wherein a generated tag is invalid only if an insufficient number of registers are available during loading of the given instruction. The execution units are preferably pipelined.

Type: Grant

Filed: May 11, 1998

Date of Patent: April 3, 2001

Assignee: International Business Machines Corporation

Inventors: Sang Hoo Dhong, Harm Peter Hofstee, Kevin John Nowka, Joel Abraham Silberman
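
The abstract above describes OR-ing a per-instruction bit table into an in-use vector and then generating free-register tags with validity bits. A minimal Python sketch of that idea follows; the function name, the integer bit-mask encoding, and the lowest-numbered-first selection order are illustrative assumptions, not details taken from the patent.

```python
def free_register_tags(in_flight_rows, num_registers, tags_needed):
    """Return (tags, valid) for the next instruction to be loaded.

    in_flight_rows: one integer bit mask per in-flight instruction;
                    bit r set means register r is in use (assumed encoding).
    """
    # Combine all table rows with a logical OR to get the in-use vector.
    in_use = 0
    for row in in_flight_rows:
        in_use |= row

    tags, valid = [], []
    reg = 0
    for _ in range(tags_needed):
        # Skip registers that are marked in use.
        while reg < num_registers and (in_use >> reg) & 1:
            reg += 1
        if reg < num_registers:
            tags.append(reg)
            valid.append(True)
            reg += 1
        else:
            # Not enough free registers: the tag is marked invalid.
            tags.append(None)
            valid.append(False)
    return tags, valid


if __name__ == "__main__":
    print(free_register_tags([0b0011, 0b0100], num_registers=8, tags_needed=2))
```

A tag comes back invalid only when the scan runs out of free registers, which mirrors the validity condition stated in the abstract.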
Computer processor having a checker

Patent number: 6212626

Abstract: A computer processor that has a checker for receiving an instruction. The checker includes a scoreboard, an input for receiving an external replay signal, and decision logic coupled to the scoreboard and the input. The decision logic determines whether the instruction executed correctly based on both the scoreboard and the external replay signal.

Type: Grant

Filed: December 30, 1998

Date of Patent: April 3, 2001

Assignee: Intel Corporation

Inventors: Amit A. Merchant, David J. Sager
Distributed pipeline memory architecture for a computer system with even and odd pids

Patent number: 6209020

Abstract: A computer system architecture in which each processor has its own memory, strategically distributed along the stages of an execution pipeline of the processor, to provide fast access to often used information, such as the contents of the address and data registers, the program counter, etc. Memory storage is strategically located in close physical proximity to a stage in an execution pipeline at which memory is commonly or repeatedly accessed. Coupled to the pipeline at various stages are small memory cells for storing information that is consistently and repeatedly requested at that stage in the execution pipeline. The speed of the execution pipeline in a processor is critical to overall performance of the processor and the computer architecture of the present invention as a whole. To that end, the clock cycle time at which the pipeline is operated is increased as much as the operating characteristics of the logic and associated circuitry will allow.

Type: Grant

Filed: September 20, 1996

Date of Patent: March 27, 2001

Assignee: Nortel Networks Limited

Inventors: Richard L. Angle, Edward S. Harriman, Jr., Geoffrey B. Ladwig
Dependency table for reducing dependency checking hardware

Patent number: 6209084

Abstract: A dependency table stores a reorder buffer tag for each register. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. The dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested.

Type: Grant

Filed: May 5, 2000

Date of Patent: March 27, 2001

Assignee: Advanced Micro Devices, Inc.

Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
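
The operand selection described in the abstract above can be sketched in Python as a priority chain. All names here are hypothetical, the dictionaries stand in for hardware structures, and the behavior on a width mismatch (returning a "stall" marker) is an assumption, since the abstract only says that the widths are compared.

```python
def resolve_source(src_reg, src_width, earlier_in_group,
                   dep_table, rob_results, reg_file):
    """Pick what is forwarded as a source operand.

    earlier_in_group: (dest_reg, rob_tag) pairs for concurrently decoded
                      instructions that precede this one, in program order.
    dep_table:        reg -> (rob_tag, width) for registers with a pending
                      writer (assumed: absent key means no pending writer).
    rob_results:      rob_tag -> value for results already produced.
    reg_file:         reg -> committed value.
    """
    # 1. Dependency on a concurrently decoded instruction: the nearest
    #    earlier writer in the group wins.
    for dest, tag in reversed(earlier_in_group):
        if dest == src_reg:
            return ("tag", tag)

    # 2. Dependency recorded in the dependency table.
    entry = dep_table.get(src_reg)
    if entry is not None:
        tag, width = entry
        # Compare the stored width with the requested width before
        # forwarding. Stalling on a narrower pending write is an assumption.
        if width < src_width:
            return ("stall", None)
        if tag in rob_results:
            return ("value", rob_results[tag])
        return ("tag", tag)

    # 3. No pending writer: read the register itself.
    return ("value", reg_file[src_reg])


if __name__ == "__main__":
    print(resolve_source(3, 32, [], {3: (5, 32)}, {5: 99}, {3: 0}))
```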
Method and system for nonsequential instruction dispatch and execution in a superscalar processor system

Patent number: 6209081

Abstract: A method and system for permitting nonsequential instruction dispatch in a superscalar processor system which dispatches sequentially ordered multiple instructions simultaneously to a group of execution units on an opportunistic basis for execution and placement of results thereof within specified general purpose registers. Each instruction generally includes at least one source operand and one destination operand. A plurality of intermediate storage buffers are provided and each time an instruction is dispatched to an available execution unit, a particular one of the intermediate storage buffers is assigned to any destination operand within the dispatched instruction, permitting the results of the execution of each instruction to be stored within an intermediate storage buffer.

Type: Grant

Filed: June 7, 1994

Date of Patent: March 27, 2001

Assignee: International Business Machines Corporation

Inventors: James Allan Kahle, Donald Emil Waldecker
Processor pipeline including replay

Patent number: 6205542

Abstract: The invention provides a method for executing instructions. The method includes dispatching and executing a first and second plurality of instructions in a portion of a pipeline without first determining whether stages of the portion of the pipeline are ready. The method further includes determining if an execution problem is encountered and replaying the first plurality of instructions in response to determining that the first plurality of instructions encountered an execution problem. The invention also provides a processor pipeline. The processor pipeline includes a front end to fetch a plurality of instructions for execution and a back end to execute the plurality of instructions fetched by the front end. The back end includes a retirement stage to determine if an instruction had an execution problem. The back end is non-stallable.

Type: Grant

Filed: January 14, 1999

Date of Patent: March 20, 2001

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Derrick C. Lin
Dynamically typed register architecture

Patent number: 6195746

Abstract: Dynamically typed registers in a processor are provided by associating a type specifier with a register specifier for each register in the processor, storing the register specifiers and associated type specifiers in a register type table. The type specifier associated with an operand register of an instruction is employed to dispatch the instruction to an appropriate execution unit within the processor. The results of the instruction are stored in a register having an associated type specifier matching the execution unit type. Register specifiers are dynamically allocated to particular execution units within the processor by altering the type specifier associated with the register specifiers. Register values may be either discarded or converted when the register specifier type is altered. A general instruction allows conversion of the value from one type to another without storing the converted value in memory.

Type: Grant

Filed: January 31, 1997

Date of Patent: February 27, 2001

Assignee: International Business Machines Corporation

Inventor: Ravindra Kumar Nair
Unified multi-function operation scheduler for out-of-order execution in a superscaler processor

Patent number: 6195744

Abstract: A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit.

Type: Grant

Filed: February 18, 1999

Date of Patent: February 27, 2001

Assignee: Advanced Micro Devices, Inc.

Inventors: John G. Favor, Amos Ben-Meir, Warren G. Stapleton
Method and apparatus for facilitating multiple storage instruction completions in a superscalar processor during a single clock cycle

Patent number: 6192461

Abstract: One aspect of the invention relates to an apparatus for processing a store instruction on a superscalar processor that employs in-order completion of instructions, the processor having an instruction dispatch unit, an architected register file, a rename register file, a load store unit, a completion unit and cache memory. In one embodiment of the invention, the apparatus includes a pointer queue having an entry corresponding to the store instruction, the entry containing a pointer to the entries in the architected and rename register files that contain data required by the store instruction; and a multiplexer coupled to read ports on the architected and rename register files so that data can be passed from one of the register files into an entry in a data queue, the data queue being coupled to the cache memory.

Type: Grant

Filed: January 30, 1998

Date of Patent: February 20, 2001

Assignee: International Business Machines Corporation

Inventors: Barry D. Williamson, Jim E. Phillips, Dq Nguyen
Superscalar microprocessor including a load/store unit, decode units and a reorder buffer to detect dependencies between access to a stack cache and a data cache

Patent number: 6192462

Abstract: A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (addresses for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes.

Type: Grant

Filed: September 28, 1998

Date of Patent: February 20, 2001

Assignee: Advanced Micro Devices, Inc.

Inventors: Thang M. Tran, David B. Witt, William M. Johnson
Forwarding stored dara fetched for out-of-order load/read operation to over-taken operation read-accessing same memory location

Patent number: 6189088

Abstract: The present invention is directed to method and apparatus for reordering load operations in a computer processing system.

Type: Grant

Filed: February 3, 1999

Date of Patent: February 13, 2001

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
Apparatus and method for retiring instructions in excess of the number of accessible write ports

Patent number: 6189089

Abstract: A superscalar microprocessor includes a reorder buffer to correctly handle dependency checking and multiple updates to the same destination. The reorder buffer stores instructions in program order, and retires instructions that have executed and the results obtained. When an instruction is retired, the results of the instruction are stored and the memory space in the reorder buffer is deallocated. The results of the retired instructions are stored to a register file via a retire bus. If the results of two or more retired instructions output to the same register in the register file, then only the newest instruction, the later instruction in the original program sequence, is stored to the program register. The register file has a plurality of write ports for the transfer of data via the retire bus. If two retired instructions output to the same register, then a write port is not utilized. The retire window is the number of instructions monitored for retirement.

Type: Grant

Filed: January 6, 1999

Date of Patent: February 13, 2001

Assignee: Advanced Micro Devices, Inc.

Inventors: Wade A. Walker, D. T. Matheny
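
The write-port saving described in the abstract above comes from keeping only the newest result per destination register within the retire window. A minimal Python sketch, with a hypothetical function name and a list of (register, value) pairs standing in for the retire window:

```python
def select_register_writes(retire_window):
    """Collapse a retire window into one write per destination register.

    retire_window: (dest_reg, value) pairs in program order, oldest first.
    Returns a dict of dest_reg -> value to write to the register file.
    """
    writes = {}
    for dest_reg, value in retire_window:
        # A later instruction in program order overwrites an earlier one
        # that targets the same register, so only the newest is written.
        writes[dest_reg] = value
    return writes


if __name__ == "__main__":
    window = [(1, 10), (2, 20), (1, 30)]
    writes = select_register_writes(window)
    # Three instructions retire, but only len(writes) write ports are used.
    print(len(window), "retired,", len(writes), "ports used:", writes)
```

With three retiring instructions and two distinct destinations, only two write ports are needed, which is how more instructions can retire than there are ports.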
Superscalar microprocessor employing a data cache capable of performing store accesses in a single clock cycle

Patent number: 6189068

Abstract: A superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided. The superscalar microprocessor speculatively stores data within a predicted way of the data cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete, utilizing a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The superscalar microprocessor may therefore be capable of high frequency operation.

Type: Grant

Filed: June 28, 1999

Date of Patent: February 13, 2001

Assignee: Advanced Micro Devices, Inc.

Inventors: David B. Witt, Rajiv M. Hattangadi
Checking data type of operands specified by an instruction using attributes in a tagged array architecture

Patent number: 6185671

Abstract: The present invention discloses a method and apparatus for matching data types of operands in an instruction. A type code of an operand used by the instruction is determined. An attribute value of a storage element which corresponds to the operand is read from a speculative array. This attribute value is then compared with the type code.

Type: Grant

Filed: March 31, 1998

Date of Patent: February 6, 2001

Assignee: Intel Corporation

Inventors: Vladimir Pentovski, Gerald Bennett, Stephen A. Fischer, Eric Heit, Glenn J. Hinton, Patrice L. Roussel
Method and apparatus for speculative execution of instructions

Patent number: 6185668

Abstract: An apparatus and method are described for implementing handling of exceptions caused by speculated instructions in a CPU having speculative execution capabilities. A CPU implementing speculative execution contains a speculative bit register file. Each speculative bit in the speculative bit register file is logically associated with a particular general purpose register, while remaining physically separate. This is accomplished through the use of a physically separate register file (the speculative bit register file) and register selection circuitry allowing simultaneous access to the two register files. The present invention provides instruction execution hardware supporting speculative execution with minimal impact on computational and structural complexity.

Type: Grant

Filed: December 21, 1995

Date of Patent: February 6, 2001

Assignee: Intergraph Corporation

Inventor: Siamak Arya
Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss

Patent number: 6185660

Abstract: An apparatus in a computer, called a pending access queue, for providing data for register load instructions after a cache miss. After a cache miss, when data is available for a register load instruction, the data is first directed to the pending access queue and is provided to an execution pipeline directly from the pending access queue, without requiring the data to be entered in the cache. Entries in the pending access queue include destination register identification, enabling injection of the data into the pipeline during intermediate pipeline phases. The pending access queue provides results to the requesting unit in any order needed, supporting out-of-order cache returns, and provides for arbitration when multiple sources have data ready to be processed. Each separate request to a single line is provided a separate entry, and each entry is provided with its appropriate part of the line as soon as the line is available, thereby rapidly providing data for multiple misses to a single line.

Type: Grant

Filed: September 23, 1997

Date of Patent: February 6, 2001

Assignee: Hewlett-Packard Company

Inventors: Dean A. Mulla, Sorin Iacobovici
System for converting instructions, and method therefore

Patent number: 6178496

Abstract: A converter (130) comprises a multiplex-buffer (410) at a bus (120), a decoder (420), an output buffer (430) and a comparator (440). The multiplex-buffer (410) forwards VMAX bits (260) of Huffman coded code portions (222) from the bus (120) to the decoder (420). The VMAX bits (260) can comprise further bits set to logical “1” or “0” at random. On a control output (414), the multiplex-buffer (410) provides a signal K identifying which of the VMAX bits are valid or invalid. The decoder (420) maps the VMAX bits (260) into a preliminary bit cluster (426) and indicates a code length V regardless of whether some or all bits are valid or not. The comparator (440) checks V and K and allows the output buffer (430) to copy the preliminary bit cluster (426) into instruction portions (212) only when the code length fits the identification of valid bits.

Type: Grant

Filed: February 17, 1999

Date of Patent: January 23, 2001

Assignee: Motorola, Inc.

Inventors: Rami Natan, Alex Miretsky, Vitaly Sukonik
Data processor capable of executing two instructions having operand interference at high speed in parallel

Patent number: 6178492

Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.

Type: Grant

Filed: November 9, 1995

Date of Patent: January 23, 2001

Assignee: Mitsubishi Denki Kabushiki Kaisha

Inventor: Masahito Matsuo
Superscalar processor employing a high performance write back buffer controlled by a state machine to reduce write cycles

Patent number: 6170040

Abstract: A microprocessor of a superscalar structure having a datapath, a data cache, a bus unit and first and second pipelines includes a write buffer equipped in the bus unit and a write back buffer in the data cache to reduce write cycles. The write buffer receives data of a burst write cycle from the write back buffer and data of a single write cycle from the datapath. The write buffer in the microprocessor allows data to be written in the write buffer and then to be written in the external memory when the microprocessor is available for performing an external cycle. The processor includes a state machine to control the write buffer and also includes one write buffer for each of the first and second pipelines in order to diminish the write cycles. The write buffers also include a bit block which indicates whether information in the write buffer is written by a cache miss or a hit in a line having a shared state.

Type: Grant

Filed: November 6, 1997

Date of Patent: January 2, 2001

Assignee: Hyundai Electronics Industries Co., Ltd.

Inventors: Suk Joong Lee, Han Heung Kim
Method and apparatus for implementing predicated sequences in a processor with renaming

Patent number: 6170052

Abstract: Systems, apparatus, and methods are disclosed for generating pairs of conditional instructions corresponding to special predicate sequences from single instructions having a predicate. These pairs of conditional instructions update a destination register regardless of the truth or falsity of the predicate. The destination register is renamed to a new physical location. In this manner, register renaming can be used with predicate sequences to gain performance efficiencies and to overcome limitations of the prior attempted approaches.

Type: Grant

Filed: December 31, 1997

Date of Patent: January 2, 2001

Assignee: Intel Corporation

Inventor: Michael J. Morrison
Register and instruction controller for superscalar processor

Patent number: 6167503

Abstract: In a superscalar computer system, a plurality of instructions are executed concurrently. The instructions being executed access data stored at addresses of the superscalar computer system. An instruction generator, such as a compiler, partitions the instructions into a plurality of sets. The plurality of sets are disjoint according to the addresses of the data to be accessed by the instructions while executing in the superscalar computer system. The system includes a plurality of clusters for executing the instructions. There is one cluster for each one of the plurality of sets of instructions. Each set of instructions is distributed to the plurality of clusters so that the addresses of the data accessed by the instructions are substantially disjoint among the clusters while immediately executing the instructions. This partitioning and distributing minimizes the number of interconnects between the clusters of the superscalar computer.

Type: Grant

Filed: October 6, 1995

Date of Patent: December 26, 2000

Assignee: Compaq Computer Corporation

Inventor: Norman P. Jouppi
Computer processor with a replay system

Patent number: 6163838

Abstract: A computer processor includes a multiplexer having a first input, a second input, and an output, and a scheduler coupled to the multiplexer first input. The processor further includes an execution unit coupled to the multiplexer output. The execution unit is adapted to receive a plurality of instructions from the multiplexer. The processor further includes a replay system coupled to the second multiplexer input and the scheduler. The replay system replays an instruction that has not correctly executed by sending a stop scheduler signal to the scheduler and sending the instruction to the multiplexer.

Type: Grant

Filed: June 30, 1998

Date of Patent: December 19, 2000

Assignee: Intel Corporation

Inventors: Amit A. Merchant, David J. Sager, Darrell D. Boggs
Multimedia computer with integrated circuit memory

Patent number: 6161159

Abstract: An alternate route to improved multimedia performance without replacing the central processor unit (CPU) is presented, through the utilization of general-purpose components available in a computer. The method relies on the use of integrated circuit memory boards having a data port for directly inputting encoded image signals from an I/O device into the memory. An on-board decoder provided on the IC memory is used to decode the variable-length encoded input signals. This approach reduces the computational load on the CPU, so that the usual bottleneck, which is the slow process of data exchange between the CPU and the memory boards, is eliminated. The CPU directly accesses the processed image data in the memory and displays the final image on the monitor. This route to increasing the image processing speed of a computer has considerable merits because it is low cost and is readily applicable to mass-produced IC memories with only a few additional fabrication steps.

Type: Grant

Filed: September 25, 1997

Date of Patent: December 12, 2000

Assignee: NEC Corporation

Inventor: Kazumasa Suzuki
Microprocessor employing and method of using a control bit vector storage for instruction execution

Patent number: 6157994

Abstract: A control bit vector storage is provided. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage stores control bits common to both control vectors.

Type: Grant

Filed: July 8, 1998

Date of Patent: December 5, 2000

Assignee: Advanced Micro Devices, Inc.

Inventor: Marty L. Pflum
Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers

Patent number: 6157998

Abstract: A branch prediction unit apparatus and method uses an instruction buffer (20), a completion unit (24), and a branch prediction unit (BPU) (28). The instruction buffer (20) and/or the completion unit (24) contain a plurality of instruction entries that contain valid bits and stream identifier (SID) bits. The branch prediction unit contains a plurality of branch prediction buffers (28a-28c). The SID bits are used to associate the pending and executing instructions in the units (20 and 24) into instruction streams related to predicted branches located in the buffers (28a-28c). The SID bits as well as age bits associated with the buffers (28a-28c) are used to perform efficient branch prediction, branch resolution/retirement, and branch misprediction recovery.

Type: Grant

Filed: April 3, 1998

Date of Patent: December 5, 2000

Assignee: Motorola Inc.

Inventors: Jeffrey Pidge Rupley, II, Marvin A. Denman, Bradley G. Burgess, David C. Holloway
Method and apparatus for employing a cycle bit parallel executing instructions

Patent number: 6154828

Abstract: A method and apparatus including means for storing an executable file which includes a group of bits which define functional operations and cycle bits associated with each functional operation and means for completing a variable number of the functional operations in parallel during a single execution cycle in accordance with a state of the associated cycle bit. The method and apparatus eliminates the need for complex data dependency checking hardware and allows a minimum amount of control logic to complete execution of executable files. The method and apparatus further minimizes the necessity of adding null operations (NOPs) to executable files which reduces the amount of storage space necessary to store the executable files and allows executable files to be used on multiple hardware implementations and for register values to be used for multiple purposes during single execution cycles.

Type: Grant

Filed: June 3, 1993

Date of Patent: November 28, 2000

Assignee: Compaq Computer Corporation

Inventors: Joseph Dominic Macri, Francis X. McKeen, Joel S. Emer, William Robert Grundmann, Robert P. Nix, David Arthur James Webb, Jr.
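
The cycle-bit scheme in the abstract above lets the executable itself say how many operations complete together in one execution cycle. The Python sketch below assumes one particular convention, that a set cycle bit marks the last operation of a parallel group; the abstract does not specify the encoding, and the names are hypothetical.

```python
def group_by_cycle_bit(ops):
    """Split a stream of operations into per-cycle parallel groups.

    ops: (operation_name, cycle_bit) pairs in file order.
    Assumed convention: cycle_bit == 1 ends the current execution cycle.
    """
    groups, current = [], []
    for name, cycle_bit in ops:
        current.append(name)
        if cycle_bit:
            # The cycle bit closes the group; the next op starts a new cycle.
            groups.append(current)
            current = []
    if current:
        # Trailing operations without a closing bit form a final group.
        groups.append(current)
    return groups


if __name__ == "__main__":
    stream = [("add", 0), ("mul", 1), ("load", 1), ("store", 0)]
    for cycle, group in enumerate(group_by_cycle_bit(stream)):
        print("cycle", cycle, group)
```

Because the grouping is carried in the file, no dependency-checking hardware or NOP padding is needed to decide which operations issue together, which is the benefit the abstract claims.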
System for simultaneously accessing one or more stack elements by multiple functional units using real stack addresses

Patent number: 6148391

Abstract: Embodiments of the present invention provide a stack renaming method and apparatus for stack based processors. Using principles of the present invention, a stack can be accessed simultaneously by one or more functional units in a stack processor. The stack apparatus includes a stack renaming unit capable of renaming a logical stack address to a real stack address. Each logical stack address corresponds to a storage element in the stack renaming unit which stores a real stack address. A circular counter is used in the stack renaming unit to sequentially cycle through each of the logical stack addresses. The real stack addresses corresponding to each of the logical stack addresses can be stored out of order in the stack renaming unit. A stack control unit is coupled to the stack renaming unit and provides one or more control signals to the stack renaming unit and coordinates the operation of the stack renaming unit within the stack apparatus.

Type: Grant

Filed: March 26, 1998

Date of Patent: November 14, 2000

Assignee: Sun Microsystems, Inc.

Inventor: Bruce Petrick
Shared floating-point unit in a single chip multiprocessor

Patent number: 6148395

Abstract: A single-chip multiprocessor (2, 102) is disclosed. The multiprocessor (2, 102) includes multiple central processing units, or CPUs, (10, 110) that share a floating-point unit (5, 105). The floating-point unit (5, 105) may receive floating-point instruction codes from either or both of the multiple CPUs (10, 110) in the multiprocessor (2, 102), and includes circuitry (52) for decoding the floating-point instructions for execution by its execution circuitry (65). A dispatch unit (56) in the floating-point unit (5, 105) performs arbitration between floating-point instructions if more than one of the CPUs (10, 110) is forwarding instructions to the floating-point unit (5, 105) at the same time. Dedicated register banks, preferably in the form of stacks (60), are provided in the floating-point unit (5, 105).

Type: Grant

Filed: May 15, 1997

Date of Patent: November 14, 2000

Assignee: Texas Instruments Incorporated

Inventors: Tuan Q. Dao, Donald E. Steiss
Instruction completion logic distributed among execution units for improving completion efficiency

Patent number: 6134645

Abstract: Each execution unit within a superscalar processor has an associated completion table that contains a copy of the status of all instructions dispatched but not completed. A central completion table maintains the status of every dispatched instruction as reported by the dispatch unit and the individual execution units. Execution units send finish signals to the completion table responsible for retiring a particular type of instruction. The central completion table retires instructions that may cause an interrupt and instructions whose results may target the same register. The execution units' associated completion tables retire the balance of the instructions and the execution units send instruction status to the central completion table and to each execution unit. This reduces the number of instructions that are retired by the central completion table, increasing the number of instructions retired per clock cycle.

Type: Grant

Filed: June 1, 1998

Date of Patent: October 17, 2000

Assignee: International Business Machines Corporation

Inventor: Dung Quoc Nguyen
System and method for executing and completing store instructions

Patent number: 6134646

Abstract: In a processor, store instructions are divided or cracked into store data and store address generation portions for separate and parallel execution within two execution units. The address generation portion of the store instruction is executed within the load store unit, while the store data portion of the instruction is executed in an execution unit other than the load store unit. If the store instruction is a fixed point store instruction, then the store data portion is executed within the fixed point unit. If the store instruction is a floating point store instruction, then the store data portion of the store instruction is executed within the floating point unit. The store instruction is completed when all older instructions have completed and when all instructions in the instruction group have finished.

Type: Grant

Filed: July 29, 1999

Date of Patent: October 17, 2000

Assignee: International Business Machines Corp.

Inventors: Kurt Alan Feiste, Tai Dinh Ngo, Amy May Tuvell
Computing system having multiple nodes coupled in series in a closed loop

Patent number: 6134647

Abstract: A data processing system includes a plurality of nodes, a serial data bus interconnecting the nodes in series in a closed loop for passing address and data information, and at least one processing node. In one construction, this processing node has a processor, a printed circuit board, a memory partitioned into first and second sections and a local bus connecting the processor, a block sharable memory section of the memory, and the printed circuit board. The local bus is used for transferring data in parallel from the processor to a directly sharable memory section of the memory on the printed circuit board and for transferring data from the block sharable memory to the printed circuit board.

Type: Grant

Filed: April 14, 1999

Date of Patent: October 17, 2000

Assignee: Sun Microsystems, Inc.

Inventors: John D. Acton, Michael D. Derbish, Gavin G. Gibson, Jack M. Hardy, Jr., Hugh M. Humphreys, Steven P. Kent, Steven E. Schelong, Ricardo Yong, William B. DeRolf
Reorder buffer employed in a microprocessor to store instruction results having a plurality of entries predetermined to correspond to a plurality of functional units

Patent number: 6134651

Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.

Type: Grant

Filed: December 10, 1999

Date of Patent: October 17, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: David B. Witt, Thang M. Tran
Data processing system having an apparatus for exception tracking during out-of-order operation and method therefor

Patent number: 6128722

Abstract: An apparatus for integer exception register (XER) renaming and methods of using the same are implemented. In a central processing unit (CPU) having a pipelined architecture, integer instructions that use or update the XER may be executed out-of-order using the XER renaming mechanism. Any instruction that updates the XER has an associated instruction identifier (IID) stored in a register. Subsequent instructions that use data in the XER use the stored IID to determine when the XER data has been updated by the execution of the instruction corresponding to the stored IID. As each instruction that updates XER data is executed, the data is stored in an XER rename buffer. Instructions using XER data then obtain the updated, valid, XER data from the rename buffer. In this way, these instructions can obtain valid XER data prior to completion of the preceding instructions. The XER data is retrieved from the XER rename buffer by indexing into the buffer by using an index derived from the stored IID.

Type: Grant

Filed: February 13, 1998

Date of Patent: October 3, 2000

Assignee: International Business Machines Corporation

Inventors: Richard Edmund Fry, Dung Quoc Nguyen, Albert Thomas Williams
Temporary pipeline register file for a superpipelined superscalar processor

Patent number: 6128721

Abstract: A processor method and apparatus. The processor has an execution pipeline, a register file and a controller. The execution pipeline is for executing an instruction and has a first stage for generating a first result and a last stage for generating a final result. The register file is for storing the first result and the final result. The controller makes the first result stored in the register file available in the event that the first result is needed for the execution of a subsequent instruction. By storing the result of the first stage in the register file, the length of the execution pipeline is reduced from that of the prior art. Furthermore, logic required for providing inputs to the execution pipeline is greatly simplified over that required by the prior art.

Type: Grant

Filed: November 17, 1993

Date of Patent: October 3, 2000

Assignee: Sun Microsystems, Inc.

Inventors: Robert Yung, William N. Joy, Marc Tremblay
High-performance, superscalar-based computer system with out-of-order instruction execution

Patent number: 6128723

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.

Type: Grant

Filed: May 11, 1999

Date of Patent: October 3, 2000

Assignee: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Symmetrical instructions queue for high clock frequency scheduling

Patent number: 6122727

Abstract: An instruction queue is physically divided into two (or more) instruction queues. Each instruction queue is configured to store a dependency vector for each instruction operation stored in that instruction queue. The dependency vector is evaluated to determine if the corresponding instruction operation may be scheduled for execution. Instruction scheduling logic in each physical queue may schedule instruction operations based on the instruction operations stored in that physical queue independent of the scheduling logic in other queues. The instruction queues evaluate the dependency vector in portions, during different phases of the clock. During a first phase, a first instruction queue evaluates a first portion of the dependency vectors and generates a set of intermediate scheduling request signals. During a second phase, the first instruction queue evaluates a second portion of the dependency vector and the intermediate scheduling request signal to generate a scheduling request signal.

Type: Grant

Filed: August 24, 1998

Date of Patent: September 19, 2000

Assignee: Advanced Micro Devices, Inc.

Inventor: David B. Witt
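
Illustrative sketch for patent 6122727 above (not taken from the patent text): the two-phase evaluation of a dependency vector can be modeled with bit masks. The names `phase_one`, `phase_two`, `schedule_request`, the `satisfied` bit vector, and the `split` position are assumptions made for illustration; the abstract does not say how the vector is partitioned.

```python
def phase_one(dep_vector: int, satisfied: int, split: int) -> bool:
    """First clock phase: check only the low `split` bits of the vector.

    Returns the intermediate scheduling request signal, which is True
    when no dependency in the first portion is still outstanding.
    """
    low_mask = (1 << split) - 1
    pending = dep_vector & ~satisfied & low_mask
    return pending == 0


def phase_two(dep_vector: int, satisfied: int, split: int,
              intermediate: bool) -> bool:
    """Second clock phase: check the remaining bits and combine the
    outcome with the intermediate signal from phase one."""
    high_mask = ~((1 << split) - 1)
    pending = dep_vector & ~satisfied & high_mask
    return intermediate and pending == 0


def schedule_request(dep_vector: int, satisfied: int, split: int) -> bool:
    """Final scheduling request for one instruction operation."""
    intermediate = phase_one(dep_vector, satisfied, split)
    return phase_two(dep_vector, satisfied, split, intermediate)
```

In this model, bit i of `dep_vector` set means the operation depends on operation i, and bit i of `satisfied` set means operation i has produced its result. Splitting the check halves the amount of logic evaluated in each phase, which is the property the abstract attributes to the scheme.
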
Executing partial-width packed data instructions

Patent number: 6122725

Abstract: A method and apparatus are provided for executing scalar packed data instructions. According to one aspect of the invention, a processor includes a plurality of registers, a register renaming unit coupled to the plurality of registers, and a decoder coupled to the register renaming unit. The register renaming unit provides an architectural register file to store packed data operands, each of which includes a plurality of data elements. The decoder is configured to decode a first and second set of instructions (e.g., a set of full-width packed data instructions and a set of partial-width packed data instructions) that each specify one or more registers in the architectural register file. Each of the instructions in the first set of instructions specifies operations to be performed on all of the data elements stored in the one or more specified registers.

Type: Grant

Filed: March 31, 1998

Date of Patent: September 19, 2000

Assignee: Intel Corporation

Inventors: Patrice Roussel, Ticky Thakkar, Mohammad A. Abdallah, Vladimir Pentkovski, James Coke
Reservation station for a floating point processing unit

Patent number: 6122721

Abstract: A reservation station with format conversion logic enables the implementation of a superscalar computer processing system which incorporates both a floating point functional unit and non-floating point functional units. By converting operand data in a floating point reservation station from external formats to an internal floating point format, a system incorporating such a floating point reservation station enables the representation of operand data in uniform external formats outside floating point arithmetic units (e.g., in a reorder buffer, on operand and result busses, and within non-floating point functional units) while also enabling the use of a specialized internal representation (internal floating point format) within floating point arithmetic units.

Type: Grant

Filed: March 1, 1999

Date of Patent: September 19, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael D. Goddard, Kelvin D. Goveas, Norman Bujanos
Method and apparatus for efficient vertical SIMD computations

Patent number: 6115812

Abstract: An apparatus and method for performing vertical parallel operations on packed data is described. A first set of data operands and a second set of data operands are accessed. Each of these sets of data represents graphical data stored in a first format. The first set of data operands is converted into a converted set and the second set of data operands is replicated to generate a replicated set. A vertical matrix multiplication is performed on the converted set and the replicated set to generate transformed graphical data.

Type: Grant

Filed: April 1, 1998

Date of Patent: September 5, 2000

Assignee: Intel Corporation

Inventors: Mohammad Abdallah, Thomas Huff, Gregory C. Parrish, Shreekant S. Thakkar
Data processor

Patent number: 6112289

Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.

Type: Grant

Filed: August 28, 1998

Date of Patent: August 29, 2000

Assignee: Mitsubishi Denki Kabushiki Kaisha

Inventor: Masahito Matsuo
Method and apparatus for position independent reconfiguration in a network of multiple context processing elements

Patent number: 6108760

Abstract: A method and an apparatus for position independent reconfiguration in a network of multiple context processing elements are provided. Each multiple context processing element in a networked array of multiple context processing elements has an assigned physical identification. Virtual identifications may also be assigned to a number of the multiple context processing elements. Data is transmitted to at least one of the multiple context processing elements of the array, the data comprising control data, configuration data, an address mask, and a destination identification. The transmitted address mask is applied to either the physical or virtual identification and to a destination identification. The masked physical or virtual identification is compared to the masked destination identification.

Type: Grant

Filed: October 31, 1997

Date of Patent: August 22, 2000

Assignee: Silicon Spice

Inventors: Ethan Mirsky, Robert French, Ian Eslick
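
Illustrative sketch for patent 6108760 above (not taken from the patent text): the masked comparison that decides whether a processing element accepts transmitted data can be written in a few lines. The function names `accepts` and `select_targets` and the integer encoding of identifications are assumptions made for illustration.

```python
def accepts(element_id: int, destination_id: int, address_mask: int) -> bool:
    """Apply the same mask to the element's (physical or virtual) ID and
    to the destination ID, then compare the masked values."""
    return (element_id & address_mask) == (destination_id & address_mask)


def select_targets(element_ids, destination_id: int, address_mask: int):
    """Return the IDs of all elements that would accept the data."""
    return [e for e in element_ids
            if accepts(e, destination_id, address_mask)]
```

Bits cleared in the mask act as "don't care" positions, so a single transmission can address one element (all mask bits set) or a group of elements that share the unmasked bits.
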
Apparatus and method for tracing microprocessor instructions

Patent number: 6106573

Abstract: A microprocessor implements an instruction tracing mechanism that saves the state of the microprocessor without special hardware. Prior to the execution of a traced instruction, a trace microcode routine is implemented that saves the state of the microprocessor to external memory. The state information saved by the trace microcode routine varies depending upon the amount of data needed by the end user. After the state of the processor has been saved, the trace instruction is executed. State information that changed during the execution of the trace instruction is saved to memory prior to a subsequent instruction. The trace instruction mechanism advantageously requires minimal special hardware and expedites the saving of the processor state information.

Type: Grant

Filed: May 14, 1999

Date of Patent: August 22, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Rupaka Mahalingaiah, James K. Pickett
Dependency table for reducing dependency checking hardware

Patent number: 6108769

Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. Otherwise, the dependency table indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded.

Type: Grant

Filed: May 17, 1996

Date of Patent: August 22, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
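
Illustrative sketch for patent 6108769 above (not taken from the patent text): the operand selection described in the abstract can be modeled as a lookup with four possible outcomes. The function `fetch_operand`, its parameters, and the use of `None` to mark a register whose value is valid are assumptions made for illustration.

```python
def fetch_operand(src, older_in_group, dep_table, rob_results, regfile):
    """Select what is forwarded as the source operand `src`.

    older_in_group: (dest_reg, rob_tag) pairs, in program order, for the
        earlier instructions of the same concurrently decoded set.
    dep_table: register -> reorder buffer tag of the last in-flight
        instruction that updates it, or None if the register is valid.
    rob_results: reorder buffer tag -> result, for finished instructions.
    regfile: register -> committed value.
    """
    # 1. Dependency inside the decode group: the nearest older writer wins.
    for dest, tag in reversed(older_in_group):
        if dest == src:
            return ("tag", tag)
    # 2. No in-flight writer recorded: the register value itself is valid.
    tag = dep_table.get(src)
    if tag is None:
        return ("value", regfile[src])
    # 3. Writer has already produced a result: forward that result.
    if tag in rob_results:
        return ("value", rob_results[tag])
    # 4. Otherwise forward the tag so the consumer can wait for the result.
    return ("tag", tag)
```

Because the table keeps only the last writer per register, the lookup needs comparisons against the instructions in the current decode group only, rather than against every reorder buffer entry.
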
Multithreaded processor for processing multiple instruction streams independently of each other by flexibly controlling throughput in each instruction stream

Patent number: 6105127

Abstract: A multithreaded processor for executing multiple instruction streams is provided. This multithreaded processor includes: a plurality of functional units for executing instructions; a plurality of instruction decode units, corresponding to the multiple instruction streams on a one-to-one basis, for respectively decoding an instruction, and producing an instruction issue request for designating to which functional unit the decoded instruction should be issued and requesting the issuance of the decoded instruction to the designated functional unit; a holding unit for holding the priority level of each instruction stream; and a control unit for deciding which decoded instruction should be issued to a functional unit designated by two or more instruction issue requests at the same time, in accordance with the priority levels held by the holding unit.

Type: Grant

Filed: August 27, 1997

Date of Patent: August 15, 2000

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Kozo Kimura, Tokuzo Kiyohara, Kousuke Yoshioka
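
Illustrative sketch for patent 6105127 above (not taken from the patent text): the control unit's decision among competing issue requests can be modeled as a per-unit priority comparison. The function `arbitrate`, the convention that a larger number means higher priority, and the tie-break in favor of the first request seen are all assumptions; the abstract does not specify them.

```python
def arbitrate(requests, priorities):
    """Decide which instruction stream may issue to each functional unit.

    requests: stream_id -> functional unit requested this cycle.
    priorities: stream_id -> priority level held for that stream
        (assumed: larger value means higher priority).
    Returns: functional unit -> stream_id that is granted the unit.
    """
    granted = {}
    for stream, unit in requests.items():
        holder = granted.get(unit)
        # Grant the unit if it is free or the new request outranks the holder.
        if holder is None or priorities[stream] > priorities[holder]:
            granted[unit] = stream
    return granted
```

Changing the values in `priorities` between cycles shifts throughput from one stream to another without touching the decode units, which matches the flexible throughput control named in the title.
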
High-performance, superscalar-based computer system with out-of-order instruction execution

Patent number: 6101594

Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.

Type: Grant

Filed: May 11, 1999

Date of Patent: August 8, 2000

Assignee: Seiko Epson Corporation

Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
Fetching instructions from an instruction cache using sequential way prediction

Patent number: 6101595

Abstract: An instruction fetch unit that employs sequential way prediction. The instruction fetch unit comprises a control unit configured to convey a first index and a first way to an instruction cache in a first clock cycle. The first index and first way select a first group of contiguous instruction bytes within the instruction cache, as well as a corresponding branch prediction block. The branch prediction block is stored in a branch prediction storage, and includes a predicted sequential way value. The control unit is further configured to convey a second index and a second way to the instruction cache in a second clock cycle succeeding the first clock cycle. This second index and second way select a second group of contiguous instruction bytes from the instruction cache.

Type: Grant

Filed: February 8, 1999

Date of Patent: August 8, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: James K. Pickett, Thang M. Tran
System for completing instruction out-of-order which performs target address comparisons prior to dispatch

Patent number: 6098168

Abstract: A mechanism structured to check for instruction collisions at the Dispatch Unit rather than the Completion Unit. In processors which issue multiple commands simultaneously, a flag bit is sent to the Completion Unit and attached to the instruction in the queue that follows the other in program order if they both have the same targeted address. When the instructions from position 1 and position 2 of the instruction queue are ready to issue, the Completion Unit checks position 2 for a flag bit. If there is a bit, then the instruction in position 1 is discarded and the instruction in position 2 is written to the target address. If there is no flag bit with the instruction in position 2, the instruction in position 1 is written to the target register. This method eliminates the need to compare all the targeted addresses that are associated with the rename registers. It requires two comparisons instead of a minimum of 15 comparisons.

Type: Grant

Filed: March 24, 1998

Date of Patent: August 1, 2000

Assignee: International Business Machines Corporation

Inventors: Lee Evan Eisen, Michael Putrino
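
Illustrative sketch for patent 6098168 above (not taken from the patent text): the target comparison moves to dispatch, and completion only inspects a flag bit. The `Instr` record and the functions `dispatch_pair` and `complete_pair` are assumptions made for illustration, as is the choice to also write the position 2 result when no flag is set.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    target: str         # target register name
    result: int         # value produced by execution
    flag: bool = False  # set at dispatch if an older instruction shares the target


def dispatch_pair(first: Instr, second: Instr):
    """Dispatch-time check: flag the later instruction in program order
    when both instructions target the same register."""
    second.flag = first.target == second.target
    return first, second


def complete_pair(first: Instr, second: Instr, regfile: dict) -> None:
    """Completion-time check: look only at the flag on position 2."""
    if second.flag:
        # Position 1 would be overwritten at once, so its result is discarded.
        regfile[second.target] = second.result
    else:
        regfile[first.target] = first.result
        regfile[second.target] = second.result  # assumption, see lead-in
```

The completion step performs no address comparison at all, which is where the saving over comparing every rename register target comes from.
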
Reducing data dependent conflicts by converting single precision instructions into microinstructions using renamed phantom registers in a processor having double precision registers

Patent number: 6094719

Abstract: In an out-of-order processor having single-precision floating-point registers aliased into double-precision floating-point registers, a single-precision floating-point arithmetic operation having four possible register dependencies is converted into two microinstructions which are processed normally within the processor. The first microinstruction is coded to perform the arithmetic operation specified by the single-precision instruction using the first and second source registers specified and storing the result in a phantom register. The second microinstruction is coded for merging the contents of the phantom register and the destination register specified. Each microinstruction has at most two possible register dependencies, thereby reducing the total number of register dependencies which the processor is required to track.

Type: Grant

Filed: June 25, 1997

Date of Patent: July 25, 2000

Assignee: Sun Microsystems, Inc.

Inventor: Ramesh Panwar
Register renaming in which moves are accomplished by swapping rename tags

Patent number: 6094716

Abstract: An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction.

Type: Grant

Filed: July 14, 1998

Date of Patent: July 25, 2000

Assignee: Advanced Micro Devices, Inc.

Inventor: David B. Witt
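
Illustrative sketch for patent 6094716 above (not taken from the patent text): a register-to-register move can be completed in the rename stage by pointing the destination at the source's rename register. The `RenameMap` class and its methods are assumptions made for illustration; the sketch follows the reassignment described in the abstract and does not model freeing or swapping of the displaced tag.

```python
class RenameMap:
    """Maps architectural register names to rename register tags."""

    def __init__(self, arch_regs):
        # Give each architectural register an initial, distinct tag.
        self.map = {reg: tag for tag, reg in enumerate(arch_regs)}
        self.next_free = len(arch_regs)

    def allocate(self, dest):
        """Ordinary instruction: the destination gets a fresh rename register."""
        tag = self.next_free
        self.next_free += 1
        self.map[dest] = tag
        return tag

    def move(self, dest, src):
        """Move instruction: reuse the source's rename register for the
        destination, so no data is copied and no execution slot is used."""
        self.map[dest] = self.map[src]
        return self.map[dest]
```

For example, after `rm = RenameMap(["ebp", "esp"])` and `rm.allocate("ebp")`, calling `rm.move("esp", "ebp")` leaves both names pointing at the same rename register, so later readers of `esp` depend directly on the producer of `ebp`.
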
Shared register storage mechanisms for multithreaded computer systems with out-of-order execution

Patent number: 6092175

Abstract: A method and organization for implementing the registers required in a computer system supporting multithreading and dynamic out-of-order execution. Multithreaded computer systems are those in which the processor supports multiple contexts (threads), and either rapid context switching from thread to thread or scheduling of instructions from different threads within a single cycle. An important component of processors for such systems is the register file; the processor needs a large register file or resource to provide the registers used for the threads. One form of the invention maintains a set of private architecturally specified registers, and a set of private renaming registers for each different thread. In the other three embodiments, sharing of renaming registers between different threads is permitted, to enable a reduction in the total number of registers required.

Type: Grant

Filed: April 2, 1998

Date of Patent: July 18, 2000

Assignee: University of Washington

Inventors: Henry M. Levy, Susan J. Eggers, Jack Lo, Dean M. Tullsen
