Superscalar Patents (Class 712/23)
-
Patent number: 6212619Abstract: A superscalar computer architecture for executing instructions out-of-order, comprising a multiplicity of execution units, a plurality of registers, and a register renaming circuit which generates a list of tags corresponding to specific registers that are not in use during loading of a given instruction. A table is constructed having one bit for each register per instruction in flight. The entries in the table may be combined in a logical OR fashion to create a vector that identifies which registers are in use by instructions that are in flight. Validity bits can also be generated to indicate validity of the generated tags, wherein a generated tag is invalid only if an insufficient number of registers are available during loading of the given instruction. The execution units are preferably pipelined.Type: GrantFiled: May 11, 1998Date of Patent: April 3, 2001Assignee: International Business Machines CorporationInventors: Sang Hoo Dhong, Harm Peter Hofstee, Kevin John Nowka, Joel Abraham Silberman
-
Patent number: 6212626Abstract: A computer processor that has a checker for receiving an instruction. The checker includes a scoreboard, an input for receiving an external replay signal, and decision logic coupled to the scoreboard and the input. The decision logic determines whether the instruction executed correctly based on both the scoreboard and the external replay signal.Type: GrantFiled: December 30, 1998Date of Patent: April 3, 2001Assignee: Intel CorporationInventors: Amit A. Merchant, David J. Sager
-
Patent number: 6209020Abstract: A computer system architecture in which each processor has its own memory, strategically distributed along the stages of an execution pipeline of the processor, to provide fast access to often used information, such as the contents of the address and data registers, the program counter, etc. Memory storage is strategically located in close physical proximity to a stage in an execution pipeline at which memory is commonly or repeatedly accessed. Coupled to the pipeline at various stages are small memory cells for storing information that is consistently and repeatedly requested at that stage in the execution pipeline. The speed of the execution pipeline in a processor is critical to overall performance of the processor and the computer architecture of the present invention as a whole. To that end, the clock cycle time at which the pipeline is operated is increased as much as the operating characteristics of the logic and associated circuitry will allow.Type: GrantFiled: September 20, 1996Date of Patent: March 27, 2001Assignee: Nortel Networks LimitedInventors: Richard L. Angle, Edward S. Harriman, Jr., Geoffrey B. Ladwig
-
Patent number: 6209084Abstract: A dependency table stores a reorder buffer tag for each register. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. The dependency table stores the width of the register being updated. Prior to forwarding the reorder buffer tag stored within the dependency table, the width stored therein is compared to the width of the source operand being requested.Type: GrantFiled: May 5, 2000Date of Patent: March 27, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
-
Patent number: 6209081Abstract: A method and system for permitting nonsequential instruction dispatch in a superscalar processor system which dispatches sequentially ordered multiple instructions simultaneously to a group of execution units on an opportunistic basis for execution and placement of results thereof within specified general purpose registers. Each instruction generally includes at least one source operand and one destination operand. A plurality of intermediate storage buffers are provided and each time an instruction is dispatched to an available execution unit, a particular one of the intermediate storage buffers is assigned to any destination operand within the dispatched instruction, permitting the results of the execution of each instruction to be stored within an intermediate storage buffer.Type: GrantFiled: June 7, 1994Date of Patent: March 27, 2001Assignee: International Business Machines CorporationInventors: James Allan Kahle, Donald Emil Waldecker
-
Patent number: 6205542Abstract: The invention provides a method for executing instructions. The method includes dispatching and executing a first and second plurality of instructions in a portion of a pipeline without first determining whether stages of the portion of the pipeline are ready. The method further includes determining if an execution problem is encountered and replaying the first plurality of instructions in response to determining that the first plurality of instructions encountered an execution problem. The invention also provides a processor pipeline. The processor pipeline includes a front end to fetch a plurality of instructions for execution and a back end to execute the plurality of instructions fetched by the front end. The back end includes a retirement stage to determine if an instruction had an execution problem. The back end is non-stallable.Type: GrantFiled: January 14, 1999Date of Patent: March 20, 2001Assignee: Intel CorporationInventors: Edward T. Grochowski, Derrick C. Lin
-
Patent number: 6195746Abstract: Dynamically typed registers in a processor are provided by associating a type specifier with a register specifier for each register in the processor, storing the register specifiers and associated type specifiers in a register type table. The type specifier associated with an operand register of an instruction is employed to dispatch the instruction to an appropriate execution unit within the processor. The results of the instruction are stored in a register having an associated type specifier matching the execution unit type. Register specifiers are dynamically allocated to particular execution units within the processor by altering the type specifier associated with the register specifiers. Register values may be either discarded or converted when the register specifier type is altered. A general instruction allows conversion of the value from one type to another without storing the converted value in memory.Type: GrantFiled: January 31, 1997Date of Patent: February 27, 2001Assignee: International Business Machines CorporationInventor: Ravindra Kumar Nair
-
Patent number: 6195744Abstract: A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit.Type: GrantFiled: February 18, 1999Date of Patent: February 27, 2001Assignee: Advanced Micro Devices, Inc.Inventors: John G. Favor, Amos Ben-Meir, Warren G. Stapleton
-
Patent number: 6192461Abstract: One aspect of the invention relates to an apparatus for processing a store instruction on a superscalar processor that employs in-order completion of instructions, the processor having an instruction dispatch unit, an architected register file, a rename register file, a load store unit, a completion unit and cache memory. In one embodiment of the invention, the apparatus includes a pointer queue having an entry corresponding to the store instruction, the entry containing a pointer to the entries in the architected and rename register files that contain data required by the store instruction; and a multiplexer coupled to read ports on the architected and rename register files so that data can be passed from one of the register files into an entry in a data queue, the data queue being coupled to the cache memory.Type: GrantFiled: January 30, 1998Date of Patent: February 20, 2001Assignee: International Business Machines CorporationInventors: Barry D. Williamson, Jim E. Phillips, Dq Nguyen
-
Patent number: 6192462Abstract: A superscalar microprocessor is provided which maintains coherency between a pair of caches accessed from different stages of an instruction processing pipeline. A dependency checking structure is provided within the microprocessor. The dependency checking structure compares memory accesses performed from the execution stage of the instruction processing pipeline to memory accesses performed from the decode stage. The decode stage performs memory accesses to a stack cache, while the execution stage performs its accesses (address for which are formed via indirect addressing) to the stack cache and to a data cache. If a read memory access performed by the execution stage is dependent upon a write memory access performed by the decode stage, the read memory access is stalled until the write memory access completes.Type: GrantFiled: September 28, 1998Date of Patent: February 20, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Thang M. Tran, David B. Witt, William M. Johnson
-
Patent number: 6189088Abstract: The present invention is directed to method and apparatus for reordering load operations in a computer processing system.Type: GrantFiled: February 3, 1999Date of Patent: February 13, 2001Assignee: International Business Machines CorporationInventor: Michael K. Gschwind
-
Patent number: 6189089Abstract: A superscalar microprocessor includes a reorder buffer to correctly handle dependency checking and multiple updates to the same destination. The reorder buffer stores instructions in program order, and retires instructions that have executed and the results obtained. When a instruction is retired, the results of the instruction are stored and the memory space in the reorder buffer is deallocated. The results of the retired instructions are stored to a register file via a retire bus. If the results of two or more retired instructions output to the same register in the register file, then only the newest instruction, the later instruction in the original program sequence, is stored to the program register. The register file has a plurality of write ports for the transfer of data via the retire bus. If two retired instructions output to the same register, then a write port is not utilized. The retire window is the number of instructions monitored for retirement.Type: GrantFiled: January 6, 1999Date of Patent: February 13, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Wade A. Walker, D. T. Matheny
-
Patent number: 6189068Abstract: A superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided. The superscalar microprocessor speculatively stores data within a predicted way of the data cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete, utilizing a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The superscalar microprocessor may therefore be capable of high frequency operation.Type: GrantFiled: June 28, 1999Date of Patent: February 13, 2001Assignee: Advanced Micro Devices, Inc.Inventors: David B. Witt, Rajiv M. Hattangadi
-
Patent number: 6185671Abstract: The present invention discloses a method and apparatus for matching data types of operands in an instruction. A type code of an operand used by the instruction is determined. An attribute value of a storage element which corresponds to the operand is read from a speculative array. This attribute value is then compared with the type code.Type: GrantFiled: March 31, 1998Date of Patent: February 6, 2001Assignee: Intel CorporationInventors: Vladimir Pentovski, Gerald Bennett, Stephen A. Fischer, Eric Heit, Glenn J. Hinton, Patrice L. Roussel
-
Patent number: 6185668Abstract: An apparatus and method are described for implementing handling of exceptions caused by speculated instructions in a CPU having speculative execution capabilities. A CPU implementing speculative execution contains a speculative bit register file. Each speculative bit in the speculative bit register file is logically associated with a particular general purpose register, while remaining physically separate. This is accomplished through the use of a physically separate register file (the speculative bit register file) and register selection circuitry allowing simultaneous access to the two register files. The present invention provides instruction execution hardware supporting speculative execution with minimal impact on computational and structural complexity.Type: GrantFiled: December 21, 1995Date of Patent: February 6, 2001Assignee: Intergraph CorporationInventor: Siamak Arya
-
Patent number: 6185660Abstract: An apparatus in a computer, called a pending access queue, for providing data for register load instructions after a cache miss. After a cache miss, when data is available for a register load instruction, the data is first directed to the pending access queue and is provided to an execution pipeline directly from the pending access queue, without requiring the data to be entered in the cache. Entries in the pending access queue include destination register identification, enabling injection of the data into the pipeline during intermediate pipeline phases. The pending access queue provides results to the requesting unit in any order needed, supporting out-of-order cache returns, and provides for arbitration when multiple sources have data ready to be processed. Each separate request to a single line is provided a separate entry, and each entry is provided with its appropriate part of the line as soon as the line is available, thereby rapidly providing data for multiple misses to a single line.Type: GrantFiled: September 23, 1997Date of Patent: February 6, 2001Assignee: Hewlett-Packard CompanyInventors: Dean A. Mulla, Sorin Iacobovici
-
Patent number: 6178496Abstract: A converter (130) comprises a multiplex-buffer (410) at a bus (120), a decoder (420), an output buffer (430) and a comparator (440). The multiplex-buffer (410) forwards VMAX bits (260) of Huffman coded code portions (222) from the bus (120) to the decoder (410). The VMAX bits (260) can comprise further bits set to logical “1” or “0” at random. On a control output (414), the multiplex-buffer (410) provides a signal K identifying which of the VMAX bits are valid or invalid. The decoder (420) maps the VMAX bits (260) into a preliminary bit cluster (426) and indicates a code length V regardless whether some or all bits are valid or not. The comparator (440) checks V and K and allows the output buffer (430) to copy the preliminary bit cluster (426) into instruction portions (212) only when the code length fits the identification of valid bits.Type: GrantFiled: February 17, 1999Date of Patent: January 23, 2001Assignee: Motorola, Inc.Inventors: Rami Natan, Alex Miretsky, Vitaly Sukonik
-
Patent number: 6178492Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.Type: GrantFiled: November 9, 1995Date of Patent: January 23, 2001Assignee: Mitsubishi Denki Kabushiki KaishaInventor: Masahito Matsuo
-
Patent number: 6170040Abstract: A microprocessor of a superscalar structure having a datapath, a data cache, a bus unit and first and second pipelines includes a write buffer equipped in the bus unit and a write back buffer in the data cache to reduce write cycles. The write buffer receives data of a burst write cycle from the write back buffer and data of a single write cycle from the datapath. The write buffer in the microprocessor allows data to be written in the write buffer and then to be written in the external memory when the microprocessor is available for performing an external cycle. The processor includes a state machine to control the write buffer and also includes one write buffer for each of the first and second pipelines in order to diminish the write cycles. The write buffers also include a bit block which indicates whether information in the write buffer is written by a cache miss or a hit in a line having a shared state.Type: GrantFiled: November 6, 1997Date of Patent: January 2, 2001Assignee: Hyundai Electronics Industries Co., Ltd.Inventors: Suk Joong Lee, Han Heung Kim
-
Patent number: 6170052Abstract: Systems, apparatus, and methods are disclosed for generating pairs of conditional instructions corresponding to special predicate sequences from single instructions having a predicate. These pairs of conditional instructions update a destination register regardless of the truth or falsity of the predicate. The destination register is renamed to a new physical location. In this manner, register renaming can be used with predicate sequences to gain performance efficiencies and to overcome limitations of the prior attempted approaches.Type: GrantFiled: December 31, 1997Date of Patent: January 2, 2001Assignee: Intel CorporationInventor: Michael J. Morrison
-
Patent number: 6167503Abstract: In a superscalar computer system, a plurality of instructions are executed concurrently. The instructions being executed access data stored at addresses of the superscalar computer system. An instruction generator, such as a compiler, partitions the instructions into a plurality of sets. The plurality of sets are disjoint according to the addresses of the data to be accessed by the instructions while executing in the superscalar computer system. The system includes a plurality of clusters for executing the instructions. There is one cluster for each one of the plurality of sets of instructions. Each set of instructions is distributed to the plurality of clusters so that the addresses of the data accessed by the instructions are substantially disjoint among the clusters while immediately executing the instructions. This partitioning and distributing minimizes the number of interconnects between the clusters of the superscalar computer.Type: GrantFiled: October 6, 1995Date of Patent: December 26, 2000Assignee: Compaq Computer CorporationInventor: Norman P. Jouppi
-
Patent number: 6163838Abstract: A computer processor includes a multiplexer having a first input, a second input, and an output, and a scheduler coupled to the multiplexer first input. The processor further includes an execution unit coupled to the multiplexer output. The execution unit is adapted to receive a plurality of instructions from the multiplexer. The processor further includes a replay system coupled to the second multiplexer input and the scheduler. The replay system replays an instruction that has not correctly executed by sending a stop scheduler signal to the scheduler and sending the instruction to the multiplexer.Type: GrantFiled: June 30, 1998Date of Patent: December 19, 2000Assignee: Intel CorporationInventors: Amit A. Merchant, David J. Sager, Darrell D. Boggs
-
Patent number: 6161159Abstract: An alternate route to improved multimedia performance without replacing the central processor unit (CPU) is presented, through the utilization of general-purpose components available in a computer. The method relies on the use of integrated circuit memory boards having a data port for directly inputting encoded image signals from an I/O device into the memory. An on-board decoder provided on the IC memory is used to decode the variable-length encoded input signals. This approach enalbes to reduce the computational load on the CPU so that the usual bottleneck which is the slow process of data exchange between the CPU and the memory boards is eliminated. The CPU directly accesses the processed image data in the memory and displays the final image on the monitor. This route to increasing the image processing speed of a computer has considerable merits because it is low cost and is readily applicable to mass-produced IC memories with only a few additional fabrication steps.Type: GrantFiled: September 25, 1997Date of Patent: December 12, 2000Assignee: NEC CorporationInventor: Kazumasa Suzuki
-
Patent number: 6157994Abstract: A control bit vector storage is provided. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage stores control bits common to both control vectors.Type: GrantFiled: July 8, 1998Date of Patent: December 5, 2000Assignee: Advanced Micro Devices, Inc.Inventor: Marty L. Pflum
-
Patent number: 6157998Abstract: A branch prediction unit apparatus and method uses an instruction buffer (20), a completion unit (24), and a branch prediction unit (BPU) (28). The instruction buffer (20) and/or the completion unit (24) contain a plurality of instruction entries that contain valid bits and stream identifier (SID) bits. The branch prediction unit contains a plurality of branch prediction buffers (28a-28c). The SID bits are used to associate the pending and executing instructions in the units (20 and 24) into instruction streams related to predicted branches located in the buffers (28a-28c). The SID bits as well as age bits associated with the buffers (28a-28c) are used to perform efficient branch prediction, branch resolution/retirement, and branch misprediction recovery.Type: GrantFiled: April 3, 1998Date of Patent: December 5, 2000Assignee: Motorola Inc.Inventors: Jeffrey Pidge Rupley, II, Marvin A. Denman, Bradley G. Burgess, David C. Holloway
-
Patent number: 6154828Abstract: A method and apparatus including means for storing an executable file which includes a group of bits which define functional operations and cycle bits associated with each functional operation and means for completing a variable number of the functional operations in parallel during a single execution cycle in accordance with a state of the associated cycle bit. The method and apparatus eliminates the need for complex data dependency checking hardware and allows a minimum amount of control logic to complete execution of executable files. The method and apparatus further minimizes the necessity of adding null operations (NOPs) to executable files which reduces the amount of storage space necessary to store the executable files and allows executable files to be used on multiple hardware implementations and for register values to be used for multiple purposes during single execution cycles.Type: GrantFiled: June 3, 1993Date of Patent: November 28, 2000Assignee: Compaq Computer CorporationInventors: Joseph Dominic Macri, Francis X. McKeen, Joel S. Emer, William Robert Grundmann, Robert P. Nix, David Arthur James Webb, Jr.
-
Patent number: 6148391Abstract: Embodiments of the present invention provide a stack renaming method and apparatus for stack based processors. Using principles of the present invention, a stack can be accessed simultaneously by one or more functional units in a stack processor. The stack apparatus includes a stack renaming unit capable of renaming a logical stack address to a real stack address. Each logical stack address corresponds to a storage element in the stack renaming unit which stores a real stack address. A circular counter is used in the stack renaming unit to sequentially cycle through each of the logical stack addresses. The real stack addresses corresponding to each of the logical stack addresses can be stored out of order in the stack renaming unit. A stack control unit is coupled to the stack renaming unit and provides one or more control signals to the stack renaming unit and coordinates the operation of the stack renaming unit within the stack apparatus.Type: GrantFiled: March 26, 1998Date of Patent: November 14, 2000Assignee: Sun Microsystems, Inc.Inventor: Bruce Petrick
-
Patent number: 6148395Abstract: A single-chip multiprocessor (2, 102) is disclosed. The multiprocessor (2, 102) includes multiple central processing units, or CPUs, (10, 110) that share a floating-point unit (5, 105). The floating-point unit (5, 105) may receive floating-point instruction codes from either or both of the multiple CPUs (10, 110) in the multiprocessor (2, 102), and includes circuitry (52) for decoding the floating-point instructions for execution by its execution circuitry (65). A dispatch unit (56) in the floating-point unit (5, 105) performs arbitration between floating-point instructions if more than one of the CPUs (10, 110) is forwarding instructions to the floating-point unit (5, 105) at the same time. Dedicated register banks, preferably in the form of stacks (60), are provided in the floating-point unit (5, 105).Type: GrantFiled: May 15, 1997Date of Patent: November 14, 2000Assignee: Texas Instruments IncorporatedInventors: Tuan Q. Dao, Donald E. Steiss
-
Patent number: 6134645Abstract: Each execution unit within a superscalar processor has an associated completion table that contains a copy of the status of all instructions dispatched but not completed. A central completion table maintains the status of every dispatched instruction as reported by the dispatch unit and the individual execution units. Execution units send finish signals to the completion table responsible for retiring a particular type of instruction. The central completion table retires instructions that may cause an interrupt and instructions whose results may target the same register. The execution units' associated completion tables retire the balance of the instructions and the execution units send instruction status to the central completion table and to each execution unit. This reduces the number of instructions that are retired by the central completion table, increasing the number of instructions retired per clock cycle.Type: GrantFiled: June 1, 1998Date of Patent: October 17, 2000Assignee: International Business Machines CorporationInventor: Dung Quoc Nguyen
-
Patent number: 6134646Abstract: In a processor, store instructions are divided or cracked into store data and store address generation portions for separate and parallel execution within two execution units. The address generation portion of the store instruction is executed within the load store unit, while the store data portion of the instruction is executed in an execution unit other than the load store unit. If the store instruction is a fixed point execution unit, then the store data portion is executed within the fixed point unit. If the store instruction is a floating point store instruction, then the store data portion of the store instruction is executed within the floating point unit. The store instruction is completed when all older instructions have completed and when all instructions in the instruction group have finished.Type: GrantFiled: July 29, 1999Date of Patent: October 17, 2000Assignee: International Business Machines Corp.Inventors: Kurt Alan Feiste, Tai Dinh Ngo, Amy May Tuvell
-
Patent number: 6134647Abstract: A data processing system includes a plurality of nodes, a serial data bus interconnecting the nodes in series in a closed loop for passing address and data information, and at least one processing node. In one construction, this processing node has a processor, a printed circuit board, a memory partitioned into first and second sections and a local bus connecting the processor, a block sharable memory section of the memory, and the printed circuit board. The local bus is used for transferring data in parallel from the processor to a directly sharable memory section of the memory on the printed circuit board and for transferring data from the block sharable memory to the printed circuit board.Type: GrantFiled: April 14, 1999Date of Patent: October 17, 2000Assignee: Sun Microsystems, Inc.Inventors: John D. Acton, Michael D. Derbish, Gavin G. Gibson, Jack M. Hardy, Jr., Hugh M. Humphreys, Steven P. Kent, Steven E. Schelong, Ricardo Yong, William B. DeRolf
-
Patent number: 6134651Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.Type: GrantFiled: December 10, 1999Date of Patent: October 17, 2000Assignee: Advanced Micro Devices, Inc.Inventors: David B. Witt, Thang M. Tran
-
Patent number: 6128722Abstract: An apparatus for integer exception register (XER) renaming and methods of using the same are implemented. In a central processing unit (CPU) having a pipelined architecture, integer instructions that use or update the XER may be executed out-of-order using the XER renaming mechanism. Any instruction that updates the XER has an associated instruction identifier (IID) stored in a register. Subsequent instructions that use data in the XER use the stored IID to determine when the XER data has been updated by the execution of the instruction corresponding to the stored IID. As each instruction that updates XER data is executed, the data is stored in an XER rename buffer. Instructions using XER data then obtain the updated, valid, XER data from the rename buffer. In this way, these instructions can obtain valid XER data prior to completion of the preceding instructions. The XER data is retrieved from the XER rename buffer by indexing into the buffer by using an index derived from the stored IID.Type: GrantFiled: February 13, 1998Date of Patent: October 3, 2000Assignee: International Business Machines CorporationInventors: Richard Edmund Fry, Dung Quoc Nguyen, Albert Thomas Williams
-
Patent number: 6128721Abstract: A processor method and apparatus. The processor has an execution pipeline, a register file and a controller. The execution pipeline is for executing an instruction and has a first stage for generating a first result and a last stage for generating a final result. The register file is for storing the first result and the final result. The controller makes the first result stored in the register file available in the event that the first result is needed for the execution of a subsequent instruction. By storing the result of the first stage in the register file, the length of the execution pipeline is reduced from that of the prior art. Furthermore, logic required for providing inputs to the execution pipeline is greatly simplified over that required by the prior art.Type: GrantFiled: November 17, 1993Date of Patent: October 3, 2000Assignee: Sun Microsystems, Inc.Inventors: Robert Yung, William N. Joy, Marc Tremblay
-
Patent number: 6128723Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.Type: GrantFiled: May 11, 1999Date of Patent: October 3, 2000Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6122727Abstract: An instruction queue is physically divided into two (or more) instruction queues. Each instruction queue is configured to store a dependency vector for each instruction operation stored in that instruction queue. The dependency vector is evaluated to determine if the corresponding instruction operation may be scheduled for execution. Instruction scheduling logic in each physical queue may schedule instruction operations based on the instruction operations stored in that physical queue independent of the scheduling logic in other queues. The instruction queues evaluate the dependency vector in portions, during different phases of the clock. During a first phase, a first instruction queue evaluates a first portion of the dependency vectors and generates a set of intermediate scheduling request signals. During a second phase, the first instruction queue evaluates a second portion of the dependency vector and the intermediate scheduling request signal to generate a scheduling request signal.Type: GrantFiled: August 24, 1998Date of Patent: September 19, 2000Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6122725Abstract: A method and apparatus are provided for executing scalar packed data instructions. According to one aspect of the invention, a processor includes a plurality of registers, a register renaming unit coupled to the plurality of registers, and a decoder coupled to the register renaming unit. The register renaming unit provides an architectural register file to store packed data operands each of which include a plurality of data elements. The decoder is configured to decode a first and second set of instructions (e.g., a set of full-width packed data instructions and a set of partial-width packed data instructions) that each specify one or more registers in the architectural register file. Each of the instructions in the first set of instructions specify operations to be performed on all of the data elements stored in the one or more specified registers.Type: GrantFiled: March 31, 1998Date of Patent: September 19, 2000Assignee: Intel CorporationInventors: Patrice Roussel, Ticky Thakkar, Mohammad A. Abdallah, Vladimir Pentkovski, James Coke
-
Patent number: 6122721Abstract: A reservation station with format conversion logic enables the implementation of a superscalar computer processing system which incorporates both a floating point functional unit and non-floating point functional units. By converting operand data in a floating point reservation station from external formats to an internal floating point format, a system incorporating such a floating point reservation station enables the representation of operand data in uniform external formats outside floating point arithmetic units (e.g., in a reorder buffer, on operand and result busses, and within non-floating functional units) while also enabling the use of a specialized internal representation (internal floating point format) within floating point arithmetic units.Type: GrantFiled: March 1, 1999Date of Patent: September 19, 2000Assignee: Advanced Micro Devices, Inc.Inventors: Michael D. Goddard, Kelvin D. Goveas, Norman Bujanos
-
Patent number: 6115812Abstract: An apparatus and method for performing vertical parallel operations on packed data is described. A first set of data operands and a second set of data operands are accessed. Each of these sets of data represents graphical data stored in a first format. The first set of data operands is convereted into a converted set and the second set of data operands is replicated to generate a replicated set. A vertical matrix multiplication is performed on the converted set and the replicated set to generate transformed graphical data.Type: GrantFiled: April 1, 1998Date of Patent: September 5, 2000Assignee: Intel CorporationInventors: Mohammad Abdallah, Thomas Huff, Gregory C. Parrish, Shreekant S. Thakkar
-
Patent number: 6112289Abstract: A data processor comprises an instruction decoding unit having two decoders decoding respective instructions of an instruction group consisting of a plurality of instructions including a first instruction and a second instruction succeeding the first instruction, and a judging unit judging whether or not a combination of the first instruction and the second instruction can be executed in parallel and a bus for transferring two data in parallel between an operand access unit and an integer operation unit. The data processor uses a superscalar technique. Two instructions having an operand interference can be executed in parallel at high speed and two instructions accessing a memory can be executed in parallel without considerable hardware.Type: GrantFiled: August 28, 1998Date of Patent: August 29, 2000Assignee: Mitsubishi Denki Kabushiki KaishaInventor: Masahito Matsuo
-
Patent number: 6108760Abstract: A method and an apparatus for position independent reconfiguration in a network of multiple context processing elements are provided. Wach multiple context processing element in a networked array of multiple context processing elements has an assigned physical identification. Virtual identifications may also be assigned to a number of the multiple context processing elements. Data is transmitted to at least one of the multiple context processing elements of the array, the data comprising control data, configuration data, an address mask, and a destination identification. The transmitted address mask is applied to either the physical or virtual identification and to a destination identification. The masked physical or virtual identification is compared to the masked destination identification.Type: GrantFiled: October 31, 1997Date of Patent: August 22, 2000Assignee: Silicon SpiceInventors: Ethan Mirsky, Robert French, Ian Eslick
-
Patent number: 6106573Abstract: A microprocessor implements an instruction tracing mechanism that saves the state of the microprocessor without special hardware. Prior to the execution of a traced instruction, a trace microcode routine is implemented that saves the state of the microprocessor to external memory. The state information saved by the trace microcode routine varies depending upon the amount of data needed by the end user. After the state of the processor has been saved, the trace instruction is executed. State information that changed during the execution of the trace instruction is saved to memory prior to a subsequent instruction. The trace instruction mechanism advantageously requires minimal special hardware and expedites the saving of the processor state information.Type: GrantFiled: May 14, 1999Date of Patent: August 22, 2000Assignee: Advanced Micro Devices, Inc.Inventors: Rupaka Mahalingaiah, James K. Pickett
-
Patent number: 6108769Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. Otherwise, the dependency table indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded.Type: GrantFiled: May 17, 1996Date of Patent: August 22, 2000Assignee: Advanced Micro Devices, Inc.Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
-
Patent number: 6105127Abstract: A multithreaded processor for executing multiple instruction streams is provided. This multithreaded processor includes: a plurality of functional units for executing instructions; a plurality of instruction decode units, corresponding to the multiple instruction streams on a one-to-one basis, for respectively decoding an instruction, and producing an instruction issue request for designating to which functional unit the decoded instruction should be issued and requesting for the issuance of the decoded instruction to the designated functional unit; a holding unit for holding the priority level of each instruction stream; and a control unit for deciding which decoded instruction should be issued to a functional unit designated by two or more instruction issue requests at the same time, in accordance with the priority levels held by the holding unit.Type: GrantFiled: August 27, 1997Date of Patent: August 15, 2000Assignee: Matsushita Electric Industrial Co., Ltd.Inventors: Kozo Kimura, Tokuzo Kiyohara, Kousuke Yoshioka
-
Patent number: 6101594Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.Type: GrantFiled: May 11, 1999Date of Patent: August 8, 2000Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6101595Abstract: An instruction fetch unit that employs sequential way prediction. The instruction fetch unit comprises a control unit configured to convey a first index and a first way to an instruction cache in a first clock cycle. The first index and first way select a first group of contiguous instruction bytes within the instruction cache, as well as a corresponding branch prediction block. The branch prediction block is stored in a branch prediction storage, and includes a predicted sequential way value. The control unit is further configured to convey a second index and a second way to the instruction cache in a second clock cycle succeeding the first clock cycle. This second index and second way select a second group of contiguous instruction bytes from the instruction cache.Type: GrantFiled: February 8, 1999Date of Patent: August 8, 2000Assignee: Advanced Micro Devices, Inc.Inventors: James K. Pickett, Thang M. Tran
-
Patent number: 6098168Abstract: A mechanism structured to check for instruction collisions at the Dispatch Unit rather than the Completion Unit. In processors which issue multiple commands simultaneously, a flag bit is sent to the Completion Unit and attached to the instruction in the queue that follows the other in program order if they both have the same targeted address. When the instructions from position 1 and position 2 of the instruction queue are ready to issue, the Completion Unit checks position 2 for a flag bit. If there is a bit, then the instruction in position 1 is discarded and the instruction in position 2 is written to the target address. If there is no flag bit with the instruction in position 2, the instruction in position 1 is written to the target register. This method eliminates the need to compare all the targeted addresses that are associated with the rename registers. It requires two comparisons instead of a minimum of 15 comparisons.Type: GrantFiled: March 24, 1998Date of Patent: August 1, 2000Assignee: International Business Machines CorporationInventors: Lee Evan Eisen, Michael Putrino
-
Patent number: 6094719Abstract: In an out-of-order processor having single-precision floating-point registers aliased into double-precision floating-point registers, a single-precision floating-point arithmetic operation having four possible register dependencies is converted into two microinstructions which are processed normally within the processor. The first microinstruction is coded to perform the arithmetic operation specified by the single-precision instruction using the first and second source registers specified and storing the result in a phantom register. The second microinstruction is coded for merging the contents of the phantom register and the destination register specified. Each microinstruction has at most two possible register dependencies, thereby reducing the total number of register dependencies which the processor is required to track.Type: GrantFiled: June 25, 1997Date of Patent: July 25, 2000Assignee: Sun Microsystems, Inc.Inventor: Ramesh Panwar
-
Patent number: 6094716Abstract: An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction.Type: GrantFiled: July 14, 1998Date of Patent: July 25, 2000Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6092175Abstract: A method and organization for implementing the registers required in a computer system supporting multithreading and dynamic out-of-order execution. Multithreaded computer systems are those in which the processor supports multiple contexts (threads), and either rapid context switching from thread to thread or scheduling of instructions from different threads within a single cycle. An important component of processors for such systems is the register file; the processor needs a large register file or resource to provide the registers used for the threads. One form of the invention maintains a set of private architecturally specified registers, and a set of private renaming register for each different thread. In the other three embodiments, sharing of renaming registers between different threads is permitted, to enable a reduction in the total number of registers required.Type: GrantFiled: April 2, 1998Date of Patent: July 18, 2000Assignee: University of WashingtonInventors: Henry M. Levy, Susan J. Eggers, Jack Lo, Dean M. Tullsen