Superscalar Patents (Class 712/23)
  • Patent number: 6212619
    Abstract: A superscalar computer architecture for executing instructions out-of-order, comprising a multiplicity of execution units, a plurality of registers, and a register renaming circuit which generates a list of tags corresponding to specific registers that are not in use during loading of a given instruction. A table is constructed having one bit for each register per instruction in flight. The entries in the table may be combined in a logical OR fashion to create a vector that identifies which registers are in use by instructions that are in flight. Validity bits can also be generated to indicate validity of the generated tags, wherein a generated tag is invalid only if an insufficient number of registers are available during loading of the given instruction. The execution units are preferably pipelined.
    Type: Grant
    Filed: May 11, 1998
    Date of Patent: April 3, 2001
    Assignee: International Business Machines Corporation
    Inventors: Sang Hoo Dhong, Harm Peter Hofstee, Kevin John Nowka, Joel Abraham Silberman
  • Patent number: 6212626
    Abstract: A computer processor that has a checker for receiving an instruction. The checker includes a scoreboard, an input for receiving an external replay signal, and decision logic coupled to the scoreboard and the input. The decision logic determines whether the instruction executed correctly based on both the scoreboard and the external replay signal.
    Type: Grant
    Filed: December 30, 1998
    Date of Patent: April 3, 2001
    Assignee: Intel Corporation
    Inventors: Amit A. Merchant, David J. Sager
  • Patent number: 6209020
    Abstract: A computer system architecture in which each processor has its own memory, strategically distributed along the stages of an execution pipeline of the processor, to provide fast access to often used information, such as the contents of the address and data registers, the program counter, etc. Memory storage is strategically located in close physical proximity to a stage in an execution pipeline at which memory is commonly or repeatedly accessed. Coupled to the pipeline at various stages are small memory cells for storing information that is consistently and repeatedly requested at that stage in the execution pipeline. The speed of the execution pipeline in a processor is critical to overall performance of the processor and the computer architecture of the present invention as a whole. To that end, the clock cycle time at which the pipeline is operated is increased as much as the operating characteristics of the logic and associated circuitry will allow.
    Type: Grant
    Filed: September 20, 1996
    Date of Patent: March 27, 2001
    Assignee: Nortel Networks Limited
    Inventors: Richard L. Angle, Edward S. Harriman, Jr., Geoffrey B. Ladwig
  • Patent number: 6209084
    Abstract: A dependency table stores a reorder buffer tag for each register. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. The dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: March 27, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
  • Patent number: 6209081
    Abstract: A method and system for permitting nonsequential instruction dispatch in a superscalar processor system which dispatches sequentially ordered multiple instructions simultaneously to a group of execution units on an opportunistic basis for execution and placement of results thereof within specified general purpose registers. Each instruction generally includes at least one source operand and one destination operand. A plurality of intermediate storage buffers are provided and each time an instruction is dispatched to an available execution unit, a particular one of the intermediate storage buffers is assigned to any destination operand within the dispatched instruction, permitting the results of the execution of each instruction to be stored within an intermediate storage buffer.
    Type: Grant
    Filed: June 7, 1994
    Date of Patent: March 27, 2001
    Assignee: International Business Machines Corporation
    Inventors: James Allan Kahle, Donald Emil Waldecker
  • Patent number: 6205542
    Abstract: The invention provides a method for executing instructions. The method includes dispatching and executing a first and second plurality of instructions in a portion of a pipeline without first determining whether stages of the portion of the pipeline are ready. The method further includes determining if an execution problem is encountered and replaying the first plurality of instructions in response to determining that the first plurality of instructions encountered an execution problem. The invention also provides a processor pipeline. The processor pipeline includes a front end to fetch a plurality of instructions for execution and a back end to execute the plurality of instructions fetched by the front end. The back end includes a retirement stage to determine if an instruction had an execution problem. The back end is non-stallable.
    Type: Grant
    Filed: January 14, 1999
    Date of Patent: March 20, 2001
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Derrick C. Lin
  • Patent number: 6195746
    Abstract: Dynamically typed registers in a processor are provided by associating a type specifier with a register specifier for each register in the processor, storing the register specifiers and associated type specifiers in a register type table. The type specifier associated with an operand register of an instruction is employed to dispatch the instruction to an appropriate execution unit within the processor. The results of the instruction are stored in a register having an associated type specifier matching the execution unit type. Register specifiers are dynamically allocated to particular execution units within the processor by altering the type specifier associated with the register specifiers. Register values may be either discarded or converted when the register specifier type is altered. A general instruction allows conversion of the value from one type to another without storing the converted value in memory.
    Type: Grant
    Filed: January 31, 1997
    Date of Patent: February 27, 2001
    Assignee: International Business Machines Corporation
    Inventor: Ravindra Kumar Nair
  • Patent number: 6195744
    Abstract: A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit.
    Type: Grant
    Filed: February 18, 1999
    Date of Patent: February 27, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John G. Favor, Amos Ben-Meir, Warren G. Stapleton
  • Patent number: 6192461
    Abstract: One aspect of the invention relates to an apparatus for processing a store instruction on a superscalar processor that employs in-order completion of instructions, the processor having an instruction dispatch unit, an architected register file, a rename register file, a load store unit, a completion unit and cache memory. In one embodiment of the invention, the apparatus includes a pointer queue having an entry corresponding to the store instruction, the entry containing a pointer to the entries in the architected and rename register files that contain data required by the store instruction; and a multiplexer coupled to read ports on the architected and rename register files so that data can be passed from one of the register files into an entry in a data queue, the data queue being coupled to the cache memory.
    Type: Grant
    Filed: January 30, 1998
    Date of Patent: February 20, 2001
    Assignee: International Business Machines Corporation
    Inventors: Barry D. Williamson, Jim E. Phillips, Dq Nguyen
  • Patent number: 6192462
    Abstract: A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (address for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes.
    Type: Grant
    Filed: September 28, 1998
    Date of Patent: February 20, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Thang M. Tran, David B. Witt, William M. Johnson
  • Patent number: 6189088
    Abstract: The present invention is directed to method and apparatus for reordering load operations in a computer processing system.
    Type: Grant
    Filed: February 3, 1999
    Date of Patent: February 13, 2001
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 6189089
    Abstract: A superscalar microprocessor includes a reorder buffer to correctly handle dependency checking and multiple updates to the same destination. The reorder buffer stores instructions in program order, and retires instructions that have executed and the results obtained. When a instruction is retired, the results of the instruction are stored and the memory space in the reorder buffer is deallocated. The results of the retired instructions are stored to a register file via a retire bus. If the results of two or more retired instructions output to the same register in the register file, then only the newest instruction, the later instruction in the original program sequence, is stored to the program register. The register file has a plurality of write ports for the transfer of data via the retire bus. If two retired instructions output to the same register, then a write port is not utilized. The retire window is the number of instructions monitored for retirement.
    Type: Grant
    Filed: January 6, 1999
    Date of Patent: February 13, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Wade A. Walker, D. T. Matheny
  • Patent number: 6189068
    Abstract: A superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided. The superscalar microprocessor speculatively stores data within a predicted way of the data cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete, utilizing a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The superscalar microprocessor may therefore be capable of high frequency operation.
    Type: Grant
    Filed: June 28, 1999
    Date of Patent: February 13, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David B. Witt, Rajiv M. Hattangadi
  • Patent number: 6185671
    Abstract: The present invention discloses a method and apparatus for matching data types of operands in an instruction. A type code of an operand used by the instruction is determined. An attribute value of a storage element which corresponds to the operand is read from a speculative array. This attribute value is then compared with the type code.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: February 6, 2001
    Assignee: Intel Corporation
    Inventors: Vladimir Pentovski, Gerald Bennett, Stephen A. Fischer, Eric Heit, Glenn J. Hinton, Patrice L. Roussel
  • Patent number: 6185668
    Abstract: An apparatus and method are described for implementing handling of exceptions caused by speculated instructions in a CPU having speculative execution capabilities. A CPU implementing speculative execution contains a speculative bit register file. Each speculative bit in the speculative bit register file is logically associated with a particular general purpose register, while remaining physically separate. This is accomplished through the use of a physically separate register file (the speculative bit register file) and register selection circuitry allowing simultaneous access to the two register files. The present invention provides instruction execution hardware supporting speculative execution with minimal impact on computational and structural complexity.
    Type: Grant
    Filed: December 21, 1995
    Date of Patent: February 6, 2001
    Assignee: Intergraph Corporation
    Inventor: Siamak Arya
  • Patent number: 6185660
    Abstract: An apparatus in a computer, called a pending access queue, for providing data for register load instructions after a cache miss. After a cache miss, when data is available for a register load instruction, the data is first directed to the pending access queue and is provided to an execution pipeline directly from the pending access queue, without requiring the data to be entered in the cache. Entries in the pending access queue include destination register identification, enabling injection of the data into the pipeline during intermediate pipeline phases. The pending access queue provides results to the requesting unit in any order needed, supporting out-of-order cache returns, and provides for arbitration when multiple sources have data ready to be processed. Each separate request to a single line is provided a separate entry, and each entry is provided with its appropriate part of the line as soon as the line is available, thereby rapidly providing data for multiple misses to a single line.
    Type: Grant
    Filed: September 23, 1997
    Date of Patent: February 6, 2001
    Assignee: Hewlett-Packard Company
    Inventors: Dean A. Mulla, Sorin Iacobovici
  • Patent number: 6178496
    Abstract: A converter (130) comprises a multiplex-buffer (410) at a bus (120), a decoder (420), an output buffer (430) and a comparator (440). The multiplex-buffer (410) forwards VMAX bits (260) of Huffman coded code portions (222) from the bus (120) to the decoder (410). The VMAX bits (260) can comprise further bits set to logical “1” or “0” at random. On a control output (414), the multiplex-buffer (410) provides a signal K identifying which of the VMAX bits are valid or invalid. The decoder (420) maps the VMAX bits (260) into a preliminary bit cluster (426) and indicates a code length V regardless whether some or all bits are valid or not. The comparator (440) checks V and K and allows the output buffer (430) to copy the preliminary bit cluster (426) into instruction portions (212) only when the code length fits the identification of valid bits.
    Type: Grant
    Filed: February 17, 1999
    Date of Patent: January 23, 2001
    Assignee: Motorola, Inc.
    Inventors: Rami Natan, Alex Miretsky, Vitaly Sukonik
  • Patent number: 6178492
    Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.
    Type: Grant
    Filed: November 9, 1995
    Date of Patent: January 23, 2001
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Masahito Matsuo
  • Patent number: 6170040
    Abstract: A microprocessor of a superscalar structure having a datapath, a data cache, a bus unit and first and second pipelines includes a write buffer equipped in the bus unit and a write back buffer in the data cache to reduce write cycles. The write buffer receives data of a burst write cycle from the write back buffer and data of a single write cycle from the datapath. The write buffer in the microprocessor allows data to be written in the write buffer and then to be written in the external memory when the microprocessor is available for performing an external cycle. The processor includes a state machine to control the write buffer and also includes one write buffer for each of the first and second pipelines in order to diminish the write cycles. The write buffers also include a bit block which indicates whether information in the write buffer is written by a cache miss or a hit in a line having a shared state.
    Type: Grant
    Filed: November 6, 1997
    Date of Patent: January 2, 2001
    Assignee: Hyundai Electronics Industries Co., Ltd.
    Inventors: Suk Joong Lee, Han Heung Kim
  • Patent number: 6170052
    Abstract: Systems, apparatus, and methods are disclosed for generating pairs of conditional instructions corresponding to special predicate sequences from single instructions having a predicate. These pairs of conditional instructions update a destination register regardless of the truth or falsity of the predicate. The destination register is renamed to a new physical location. In this manner, register renaming can be used with predicate sequences to gain performance efficiencies and to overcome limitations of the prior attempted approaches.
    Type: Grant
    Filed: December 31, 1997
    Date of Patent: January 2, 2001
    Assignee: Intel Corporation
    Inventor: Michael J. Morrison
  • Patent number: 6167503
    Abstract: In a superscalar computer system, a plurality of instructions are executed concurrently. The instructions being executed access data stored at addresses of the superscalar computer system. An instruction generator, such as a compiler, partitions the instructions into a plurality of sets. The plurality of sets are disjoint according to the addresses of the data to be accessed by the instructions while executing in the superscalar computer system. The system includes a plurality of clusters for executing the instructions. There is one cluster for each one of the plurality of sets of instructions. Each set of instructions is distributed to the plurality of clusters so that the addresses of the data accessed by the instructions are substantially disjoint among the clusters while immediately executing the instructions. This partitioning and distributing minimizes the number of interconnects between the clusters of the superscalar computer.
    Type: Grant
    Filed: October 6, 1995
    Date of Patent: December 26, 2000
    Assignee: Compaq Computer Corporation
    Inventor: Norman P. Jouppi
  • Patent number: 6163838
    Abstract: A computer processor includes a multiplexer having a first input, a second input, and an output, and a scheduler coupled to the multiplexer first input. The processor further includes an execution unit coupled to the multiplexer output. The execution unit is adapted to receive a plurality of instructions from the multiplexer. The processor further includes a replay system coupled to the second multiplexer input and the scheduler. The replay system replays an instruction that has not correctly executed by sending a stop scheduler signal to the scheduler and sending the instruction to the multiplexer.
    Type: Grant
    Filed: June 30, 1998
    Date of Patent: December 19, 2000
    Assignee: Intel Corporation
    Inventors: Amit A. Merchant, David J. Sager, Darrell D. Boggs
  • Patent number: 6161159
    Abstract: An alternate route to improved multimedia performance without replacing the central processor unit (CPU) is presented, through the utilization of general-purpose components available in a computer. The method relies on the use of integrated circuit memory boards having a data port for directly inputting encoded image signals from an I/O device into the memory. An on-board decoder provided on the IC memory is used to decode the variable-length encoded input signals. This approach enalbes to reduce the computational load on the CPU so that the usual bottleneck which is the slow process of data exchange between the CPU and the memory boards is eliminated. The CPU directly accesses the processed image data in the memory and displays the final image on the monitor. This route to increasing the image processing speed of a computer has considerable merits because it is low cost and is readily applicable to mass-produced IC memories with only a few additional fabrication steps.
    Type: Grant
    Filed: September 25, 1997
    Date of Patent: December 12, 2000
    Assignee: NEC Corporation
    Inventor: Kazumasa Suzuki
  • Patent number: 6157994
    Abstract: A control bit vector storage is provided. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage stores control bits common to both control vectors.
    Type: Grant
    Filed: July 8, 1998
    Date of Patent: December 5, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Marty L. Pflum
  • Patent number: 6157998
    Abstract: A branch prediction unit apparatus and method uses an instruction buffer (20), a completion unit (24), and a branch prediction unit (BPU) (28). The instruction buffer (20) and/or the completion unit (24) contain a plurality of instruction entries that contain valid bits and stream identifier (SID) bits. The branch prediction unit contains a plurality of branch prediction buffers (28a-28c). The SID bits are used to associate the pending and executing instructions in the units (20 and 24) into instruction streams related to predicted branches located in the buffers (28a-28c). The SID bits as well as age bits associated with the buffers (28a-28c) are used to perform efficient branch prediction, branch resolution/retirement, and branch misprediction recovery.
    Type: Grant
    Filed: April 3, 1998
    Date of Patent: December 5, 2000
    Assignee: Motorola Inc.
    Inventors: Jeffrey Pidge Rupley, II, Marvin A. Denman, Bradley G. Burgess, David C. Holloway
  • Patent number: 6154828
    Abstract: A method and apparatus including means for storing an executable file which includes a group of bits which define functional operations and cycle bits associated with each functional operation and means for completing a variable number of the functional operations in parallel during a single execution cycle in accordance with a state of the associated cycle bit. The method and apparatus eliminates the need for complex data dependency checking hardware and allows a minimum amount of control logic to complete execution of executable files. The method and apparatus further minimizes the necessity of adding null operations (NOPs) to executable files which reduces the amount of storage space necessary to store the executable files and allows executable files to be used on multiple hardware implementations and for register values to be used for multiple purposes during single execution cycles.
    Type: Grant
    Filed: June 3, 1993
    Date of Patent: November 28, 2000
    Assignee: Compaq Computer Corporation
    Inventors: Joseph Dominic Macri, Francis X. McKeen, Joel S. Emer, William Robert Grundmann, Robert P. Nix, David Arthur James Webb, Jr.
  • Patent number: 6148391
    Abstract: Embodiments of the present invention provide a stack renaming method and apparatus for stack based processors. Using principles of the present invention, a stack can be accessed simultaneously by one or more functional units in a stack processor. The stack apparatus includes a stack renaming unit capable of renaming a logical stack address to a real stack address. Each logical stack address corresponds to a storage element in the stack renaming unit which stores a real stack address. A circular counter is used in the stack renaming unit to sequentially cycle through each of the logical stack addresses. The real stack addresses corresponding to each of the logical stack addresses can be stored out of order in the stack renaming unit. A stack control unit is coupled to the stack renaming unit and provides one or more control signals to the stack renaming unit and coordinates the operation of the stack renaming unit within the stack apparatus.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: November 14, 2000
    Assignee: Sun Microsystems, Inc.
    Inventor: Bruce Petrick
  • Patent number: 6148395
    Abstract: A single-chip multiprocessor (2, 102) is disclosed. The multiprocessor (2, 102) includes multiple central processing units, or CPUs, (10, 110) that share a floating-point unit (5, 105). The floating-point unit (5, 105) may receive floating-point instruction codes from either or both of the multiple CPUs (10, 110) in the multiprocessor (2, 102), and includes circuitry (52) for decoding the floating-point instructions for execution by its execution circuitry (65). A dispatch unit (56) in the floating-point unit (5, 105) performs arbitration between floating-point instructions if more than one of the CPUs (10, 110) is forwarding instructions to the floating-point unit (5, 105) at the same time. Dedicated register banks, preferably in the form of stacks (60), are provided in the floating-point unit (5, 105).
    Type: Grant
    Filed: May 15, 1997
    Date of Patent: November 14, 2000
    Assignee: Texas Instruments Incorporated
    Inventors: Tuan Q. Dao, Donald E. Steiss
  • Patent number: 6134645
    Abstract: Each execution unit within a superscalar processor has an associated completion table that contains a copy of the status of all instructions dispatched but not completed. A central completion table maintains the status of every dispatched instruction as reported by the dispatch unit and the individual execution units. Execution units send finish signals to the completion table responsible for retiring a particular type of instruction. The central completion table retires instructions that may cause an interrupt and instructions whose results may target the same register. The execution units' associated completion tables retire the balance of the instructions and the execution units send instruction status to the central completion table and to each execution unit. This reduces the number of instructions that are retired by the central completion table, increasing the number of instructions retired per clock cycle.
    Type: Grant
    Filed: June 1, 1998
    Date of Patent: October 17, 2000
    Assignee: International Business Machines Corporation
    Inventor: Dung Quoc Nguyen
  • Patent number: 6134646
    Abstract: In a processor, store instructions are divided or cracked into store data and store address generation portions for separate and parallel execution within two execution units. The address generation portion of the store instruction is executed within the load store unit, while the store data portion of the instruction is executed in an execution unit other than the load store unit. If the store instruction is a fixed point execution unit, then the store data portion is executed within the fixed point unit. If the store instruction is a floating point store instruction, then the store data portion of the store instruction is executed within the floating point unit. The store instruction is completed when all older instructions have completed and when all instructions in the instruction group have finished.
    Type: Grant
    Filed: July 29, 1999
    Date of Patent: October 17, 2000
    Assignee: International Business Machines Corp.
    Inventors: Kurt Alan Feiste, Tai Dinh Ngo, Amy May Tuvell
  • Patent number: 6134647
    Abstract: A data processing system includes a plurality of nodes, a serial data bus interconnecting the nodes in series in a closed loop for passing address and data information, and at least one processing node. In one construction, this processing node has a processor, a printed circuit board, a memory partitioned into first and second sections and a local bus connecting the processor, a block sharable memory section of the memory, and the printed circuit board. The local bus is used for transferring data in parallel from the processor to a directly sharable memory section of the memory on the printed circuit board and for transferring data from the block sharable memory to the printed circuit board.
    Type: Grant
    Filed: April 14, 1999
    Date of Patent: October 17, 2000
    Assignee: Sun Microsystems, Inc.
    Inventors: John D. Acton, Michael D. Derbish, Gavin G. Gibson, Jack M. Hardy, Jr., Hugh M. Humphreys, Steven P. Kent, Steven E. Schelong, Ricardo Yong, William B. DeRolf
  • Patent number: 6134651
    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.
    Type: Grant
    Filed: December 10, 1999
    Date of Patent: October 17, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David B. Witt, Thang M. Tran
  • Patent number: 6128722
    Abstract: An apparatus for integer exception register (XER) renaming and methods of using the same are implemented. In a central processing unit (CPU) having a pipelined architecture, integer instructions that use or update the XER may be executed out-of-order using the XER renaming mechanism. Any instruction that updates the XER has an associated instruction identifier (IID) stored in a register. Subsequent instructions that use data in the XER use the stored IID to determine when the XER data has been updated by the execution of the instruction corresponding to the stored IID. As each instruction that updates XER data is executed, the data is stored in an XER rename buffer. Instructions using XER data then obtain the updated, valid, XER data from the rename buffer. In this way, these instructions can obtain valid XER data prior to completion of the preceding instructions. The XER data is retrieved from the XER rename buffer by indexing into the buffer by using an index derived from the stored IID.
    Type: Grant
    Filed: February 13, 1998
    Date of Patent: October 3, 2000
    Assignee: International Business Machines Corporation
    Inventors: Richard Edmund Fry, Dung Quoc Nguyen, Albert Thomas Williams
  • Patent number: 6128721
    Abstract: A processor method and apparatus. The processor has an execution pipeline, a register file and a controller. The execution pipeline is for executing an instruction and has a first stage for generating a first result and a last stage for generating a final result. The register file is for storing the first result and the final result. The controller makes the first result stored in the register file available in the event that the first result is needed for the execution of a subsequent instruction. By storing the result of the first stage in the register file, the length of the execution pipeline is reduced from that of the prior art. Furthermore, logic required for providing inputs to the execution pipeline is greatly simplified over that required by the prior art.
    Type: Grant
    Filed: November 17, 1993
    Date of Patent: October 3, 2000
    Assignee: Sun Microsystems, Inc.
    Inventors: Robert Yung, William N. Joy, Marc Tremblay
  • Patent number: 6128723
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.
    Type: Grant
    Filed: May 11, 1999
    Date of Patent: October 3, 2000
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6122727
    Abstract: An instruction queue is physically divided into two (or more) instruction queues. Each instruction queue is configured to store a dependency vector for each instruction operation stored in that instruction queue. The dependency vector is evaluated to determine if the corresponding instruction operation may be scheduled for execution. Instruction scheduling logic in each physical queue may schedule instruction operations based on the instruction operations stored in that physical queue independent of the scheduling logic in other queues. The instruction queues evaluate the dependency vector in portions, during different phases of the clock. During a first phase, a first instruction queue evaluates a first portion of the dependency vectors and generates a set of intermediate scheduling request signals. During a second phase, the first instruction queue evaluates a second portion of the dependency vector and the intermediate scheduling request signal to generate a scheduling request signal.
    Type: Grant
    Filed: August 24, 1998
    Date of Patent: September 19, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
  • Patent number: 6122725
    Abstract: A method and apparatus are provided for executing scalar packed data instructions. According to one aspect of the invention, a processor includes a plurality of registers, a register renaming unit coupled to the plurality of registers, and a decoder coupled to the register renaming unit. The register renaming unit provides an architectural register file to store packed data operands each of which include a plurality of data elements. The decoder is configured to decode a first and second set of instructions (e.g., a set of full-width packed data instructions and a set of partial-width packed data instructions) that each specify one or more registers in the architectural register file. Each of the instructions in the first set of instructions specify operations to be performed on all of the data elements stored in the one or more specified registers.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: September 19, 2000
    Assignee: Intel Corporation
    Inventors: Patrice Roussel, Ticky Thakkar, Mohammad A. Abdallah, Vladimir Pentkovski, James Coke
  • Patent number: 6122721
    Abstract: A reservation station with format conversion logic enables the implementation of a superscalar computer processing system which incorporates both a floating point functional unit and non-floating point functional units. By converting operand data in a floating point reservation station from external formats to an internal floating point format, a system incorporating such a floating point reservation station enables the representation of operand data in uniform external formats outside floating point arithmetic units (e.g., in a reorder buffer, on operand and result busses, and within non-floating functional units) while also enabling the use of a specialized internal representation (internal floating point format) within floating point arithmetic units.
    Type: Grant
    Filed: March 1, 1999
    Date of Patent: September 19, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael D. Goddard, Kelvin D. Goveas, Norman Bujanos
  • Patent number: 6115812
    Abstract: An apparatus and method for performing vertical parallel operations on packed data is described. A first set of data operands and a second set of data operands are accessed. Each of these sets of data represents graphical data stored in a first format. The first set of data operands is convereted into a converted set and the second set of data operands is replicated to generate a replicated set. A vertical matrix multiplication is performed on the converted set and the replicated set to generate transformed graphical data.
    Type: Grant
    Filed: April 1, 1998
    Date of Patent: September 5, 2000
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Thomas Huff, Gregory C. Parrish, Shreekant S. Thakkar
  • Patent number: 6112289
    Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.
    Type: Grant
    Filed: August 28, 1998
    Date of Patent: August 29, 2000
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Masahito Matsuo
  • Patent number: 6108760
    Abstract: A method and an apparatus for position independent reconfiguration in a network of multiple context processing elements are provided. Wach multiple context processing element in a networked array of multiple context processing elements has an assigned physical identification. Virtual identifications may also be assigned to a number of the multiple context processing elements. Data is transmitted to at least one of the multiple context processing elements of the array, the data comprising control data, configuration data, an address mask, and a destination identification. The transmitted address mask is applied to either the physical or virtual identification and to a destination identification. The masked physical or virtual identification is compared to the masked destination identification.
    Type: Grant
    Filed: October 31, 1997
    Date of Patent: August 22, 2000
    Assignee: Silicon Spice
    Inventors: Ethan Mirsky, Robert French, Ian Eslick
  • Patent number: 6106573
    Abstract: A microprocessor implements an instruction tracing mechanism that saves the state of the microprocessor without special hardware. Prior to the execution of a traced instruction, a trace microcode routine is implemented that saves the state of the microprocessor to external memory. The state information saved by the trace microcode routine varies depending upon the amount of data needed by the end user. After the state of the processor has been saved, the trace instruction is executed. State information that changed during the execution of the trace instruction is saved to memory prior to a subsequent instruction. The trace instruction mechanism advantageously requires minimal special hardware and expedites the saving of the processor state information.
    Type: Grant
    Filed: May 14, 1999
    Date of Patent: August 22, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Rupaka Mahalingaiah, James K. Pickett
  • Patent number: 6108769
    Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. Otherwise, the dependency table indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded.
    Type: Grant
    Filed: May 17, 1996
    Date of Patent: August 22, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
  • Patent number: 6105127
    Abstract: A multithreaded processor for executing multiple instruction streams is provided. This multithreaded processor includes: a plurality of functional units for executing instructions; a plurality of instruction decode units, corresponding to the multiple instruction streams on a one-to-one basis, for respectively decoding an instruction, and producing an instruction issue request for designating to which functional unit the decoded instruction should be issued and requesting for the issuance of the decoded instruction to the designated functional unit; a holding unit for holding the priority level of each instruction stream; and a control unit for deciding which decoded instruction should be issued to a functional unit designated by two or more instruction issue requests at the same time, in accordance with the priority levels held by the holding unit.
    Type: Grant
    Filed: August 27, 1997
    Date of Patent: August 15, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kozo Kimura, Tokuzo Kiyohara, Kousuke Yoshioka
  • Patent number: 6101594
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.
    Type: Grant
    Filed: May 11, 1999
    Date of Patent: August 8, 2000
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 6101595
    Abstract: An instruction fetch unit that employs sequential way prediction. The instruction fetch unit comprises a control unit configured to convey a first index and a first way to an instruction cache in a first clock cycle. The first index and first way select a first group of contiguous instruction bytes within the instruction cache, as well as a corresponding branch prediction block. The branch prediction block is stored in a branch prediction storage, and includes a predicted sequential way value. The control unit is further configured to convey a second index and a second way to the instruction cache in a second clock cycle succeeding the first clock cycle. This second index and second way select a second group of contiguous instruction bytes from the instruction cache.
    Type: Grant
    Filed: February 8, 1999
    Date of Patent: August 8, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James K. Pickett, Thang M. Tran
  • Patent number: 6098168
    Abstract: A mechanism structured to check for instruction collisions at the Dispatch Unit rather than the Completion Unit. In processors which issue multiple commands simultaneously, a flag bit is sent to the Completion Unit and attached to the instruction in the queue that follows the other in program order if they both have the same targeted address. When the instructions from position 1 and position 2 of the instruction queue are ready to issue, the Completion Unit checks position 2 for a flag bit. If there is a bit, then the instruction in position 1 is discarded and the instruction in position 2 is written to the target address. If there is no flag bit with the instruction in position 2, the instruction in position 1 is written to the target register. This method eliminates the need to compare all the targeted addresses that are associated with the rename registers. It requires two comparisons instead of a minimum of 15 comparisons.
    Type: Grant
    Filed: March 24, 1998
    Date of Patent: August 1, 2000
    Assignee: International Business Machines Corporation
    Inventors: Lee Evan Eisen, Michael Putrino
  • Patent number: 6094719
    Abstract: In an out-of-order processor having single-precision floating-point registers aliased into double-precision floating-point registers, a single-precision floating-point arithmetic operation having four possible register dependencies is converted into two microinstructions which are processed normally within the processor. The first microinstruction is coded to perform the arithmetic operation specified by the single-precision instruction using the first and second source registers specified and storing the result in a phantom register. The second microinstruction is coded for merging the contents of the phantom register and the destination register specified. Each microinstruction has at most two possible register dependencies, thereby reducing the total number of register dependencies which the processor is required to track.
    Type: Grant
    Filed: June 25, 1997
    Date of Patent: July 25, 2000
    Assignee: Sun Microsystems, Inc.
    Inventor: Ramesh Panwar
  • Patent number: 6094716
    Abstract: An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction.
    Type: Grant
    Filed: July 14, 1998
    Date of Patent: July 25, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
  • Patent number: 6092175
    Abstract: A method and organization for implementing the registers required in a computer system supporting multithreading and dynamic out-of-order execution. Multithreaded computer systems are those in which the processor supports multiple contexts (threads), and either rapid context switching from thread to thread or scheduling of instructions from different threads within a single cycle. An important component of processors for such systems is the register file; the processor needs a large register file or resource to provide the registers used for the threads. One form of the invention maintains a set of private architecturally specified registers, and a set of private renaming register for each different thread. In the other three embodiments, sharing of renaming registers between different threads is permitted, to enable a reduction in the total number of registers required.
    Type: Grant
    Filed: April 2, 1998
    Date of Patent: July 18, 2000
    Assignee: University of Washington
    Inventors: Henry M. Levy, Susan J. Eggers, Jack Lo, Dean M. Tullsen