Superscalar Patents (Class 712/23)
-
Publication number: 20010034824Abstract: A simultaneous and redundantly threaded, pipelined processor executes the same set of instructions simultaneously as two separate threads to provide fault tolerance. One thread is processed ahead of the other thread so that the instructions in one thread are processed through the processor's pipeline ahead of the corresponding instructions from the other thread. The thread, whose instructions are processed earlier, places its committed stores in a store queue. Subsequently, the second thread places its committed stores in the store queue. A compare circuit periodically scans the store queue for matching store instructions. If otherwise matching store instructions differ in any way (address or data), then a fault has occurred in the processing and the compare circuits initiates fault recovery. If comparison of the two instructions reveals they are identical, the compare circuit allows only a single store instruction to pass to the data cache or the system main memory.Type: ApplicationFiled: April 19, 2001Publication date: October 25, 2001Inventors: Shubhendu S. Mukherjee, Steven K. Reinhardt
-
Patent number: 6308259Abstract: An instruction queue is physically divided into two (or more) instruction queues. Each instruction queue is configured to store a dependency vector for each instruction operation stored in that instruction queue. The dependency vector is evaluated to determine if the corresponding instruction operation may be scheduled for execution. Instruction scheduling logic in each physical queue may schedule instruction operations based on the instruction operations stored in that physical queue independent of the scheduling logic in other queues. The instruction queues evaluate the dependency vector in portions, during different phases of the clock. During a first phase, a first instruction queue evaluates a first portion of the dependency vectors and generates a set of intermediate scheduling request signals. During a second phase, the first instruction queue evaluates a second portion of the dependency vector and the intermediate scheduling request signal to generate a scheduling request signal.Type: GrantFiled: July 25, 2000Date of Patent: October 23, 2001Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6304962Abstract: A method and apparatus for prefetching superblocks in a computer processing system having a fetch mechanism for fetching instructions for execution includes the step of controlling the fetch mechanism to begin fetching at a starting address of a current superblock. A superblock includes a set of instructions in consecutive address locations terminated by a branch instruction known to have been taken. A Superblock Target Buffer (STB) is supplied with the starting address of the current superblock. The STB has a plurality of entries each indexed by a starting address of a superblock and including a run length of the superblock and a target address of the terminating branch of the superblock. The run length corresponds to the sum of a length of the terminating branch and the difference between a starting address of the terminating branch of the superblock and the starting address of the superblock.Type: GrantFiled: June 2, 1999Date of Patent: October 16, 2001Assignee: International Business Machines CorporationInventor: Ravindra K. Nair
-
Patent number: 6304953Abstract: One embodiment of the present invention is a computer processor that includes a first scheduler adapted to dispatch a first type of computer instructions, and a second scheduler coupled to the first scheduler and adapted to dispatch a second type of computer instructions. The first type of instructions all have a first latency and the second type of instructions all have a second latency. The first scheduler is skewed relative to the second scheduler so that when the first scheduler dispatches one of the first type of computer instructions having a first latency, the second scheduler will dispatch one of the second type of computer instructions that is dependent on the first type of computer instruction at a time equal to the first latency.Type: GrantFiled: July 31, 1998Date of Patent: October 16, 2001Assignee: Intel CorporationInventors: Alexander Paul Henstrom, David J. Sager
-
Patent number: 6304954Abstract: Three parallel instruction processing pipelines of a microprocessor share two data memory ports for obtaining operands and writing back results. Since a significant proportion of the instructions of a typical computer program do not require reading operands from the memory, the probability is high that at least one of any three program instructions to be executed at the same time need not fetch an operand from memory. The two memory ports are thus connected at any given time with the two of the three pipelines which are processing instructions that require memory access, the pipeline without access to the memory processing an instruction that does not need it. To do so, the added third pipeline need not have all the same resources as the other two pipelines, so its stages are made to have a reduced capability in order to save space and reduce power consumption.Type: GrantFiled: September 11, 1998Date of Patent: October 16, 2001Assignee: Rise Technology CompanyInventor: Kenneth K. Munson
-
Patent number: 6298436Abstract: A method and system for atomic memory accesses in a processor system, wherein the processor system is able to issue and execute multiple instructions out of order with respect to a particular program order. A first reservation instruction is speculatively issued to an execution unit of the processor system. Upon issuance, instructions queued for the execution unit which occur after the first reservation instruction in the program order are flushed from the execution unit, in response to detecting any previously executed reservation instructions in the execution unit which occur after the first reservation instruction in the program order.Type: GrantFiled: June 8, 1999Date of Patent: October 2, 2001Assignee: International Business Machines CorporationInventors: James Allan Kahle, Hung Qui Le, Larry Edward Thatcher, David James Shippy
-
Patent number: 6295598Abstract: A split directory-based cache coherency technique utilizes a secondary directory in memory to implement a bit mask used to indicate when more than one processor cache in a multi-processor computer system contains the same line of memory which thereby reduces the searches required to perform the coherency operations and the overall size of the memory needed to support the coherency system. The technique includes the attachment of a “coherency tag” to a line of memory so that its status can be tracked without having to read each processor's cache to see if the line of memory is contained within that cache. In this manner, only relatively short cache coherency commands need be transmitted across the communication network (which may comprise a Sebring ring) instead of across the main data path bus thus freeing the main bus from being slowed down by cache coherency data transmissions while removing the bandwidth limitations inherent in other cache coherency techniques.Type: GrantFiled: June 30, 1998Date of Patent: September 25, 2001Assignee: SRC Computers, Inc.Inventors: Jonathan L. Bertoni, Lee A. Burton
-
Patent number: 6292884Abstract: A reorder buffer is provided which stores a last in buffer (LIB) indication corresponding to each instruction. The last in buffer indication indicates whether or not the corresponding instruction is last, in program order, of the instructions within the buffer to update the storage location defined as the destination of that instruction. The LIB indication is included in the dependency checking comparisons. A dependency is indicated for a given source operand and a destination operand within the reorder buffer if the operand specifiers match and the corresponding LIB indication indicates that the instruction corresponding to the destination operand is last to update the corresponding storage location. At most one of the dependency comparisons for a given source operand can indicate dependency. According to one embodiment, the reorder buffer employs a line-oriented configuration.Type: GrantFiled: December 30, 1999Date of Patent: September 18, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Thang M. Tran, David B. Witt
-
Patent number: 6292882Abstract: In one aspect, the invention includes an apparatus for filtering instructions within a digital system that eliminates the need to physically switch the valid instructions onto consecutive data lines of a buffer. The apparatus includes a filter for filtering instructions within a digital system. The filter includes an address generator capable of generating at least two addresses in response to receiving at least two micro-operations. The filter also includes a logic circuit coupled to the address generator. The logic circuit filters addresses corresponding to valid micro-operations in response to assessing the state of a portion of each of the micro-operations. In a second aspect, the invention includes a method for filtering instructions within a digital system that eliminates the need to physically switch the valid instructions onto consecutive data lines of a buffer. The method includes, generating at least two addresses in response to receiving at least two micro-operations.Type: GrantFiled: December 10, 1998Date of Patent: September 18, 2001Assignee: Intel CorporationInventors: Nazar A. Zaidi, Umair A. Khan
-
Patent number: 6289433Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependence check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one of more tags to specify the location of operands, based on the data dependencies determined by the data dependence check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.Type: GrantFiled: June 10, 1999Date of Patent: September 11, 2001Assignee: Transmeta CorporationInventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang
-
Patent number: 6286095Abstract: A computer apparatus incorporating special instructions to force load and store operations to execute in program order. The present invention provides a new and novel store instruction that is suspended until all prior store instructions have been completed by an associated CPU. Also, a new load instruction is provided which blocks any subsequent load instructions from executing until this load instruction has been completed by an associated CPU. These instructions allow for high efficiency computer systems to be implemented which optimize instruction throughput by executing subsequent instructions while waiting for a prior instruction to complete.Type: GrantFiled: September 26, 1995Date of Patent: September 4, 2001Assignee: Hewlett-Packard CompanyInventors: Dale C. Morris, Barry J. Flahive, Michael L. Ziegler, Jerome C. Huck, Stephen G. Burger, Ruby B. L. Lee, Bernard L. Stumpf, Jeff Kurtze
-
Patent number: 6282629Abstract: A pipelined processor includes an instruction box including a register mapper, to map register operand fields of a set of instructions and an instruction scheduler, fed by said set of instructions, to reorder the issuance of said set of instructions from said instruction processor. The mapped register operand fields are associated with the corresponding instructions of said reordered set of instructions prior to issuance of the instructions. The processor further includes a branch prediction table which maps a stored pattern of past histories associated with a branch instruction to a more likely prediction direction of the branch instruction. The processor further includes a memory reference tagging store associated with the instruction scheduler so that the scheduler can reorder memory reference instructions without knowing the actual memory location addressed by the memory reference instruction.Type: GrantFiled: March 30, 1999Date of Patent: August 28, 2001Assignee: Compaq Computer CorporationInventor: David J. Sager
-
Patent number: 6282630Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.Type: GrantFiled: September 10, 1999Date of Patent: August 28, 2001Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6279099Abstract: An optimized, superscalar microprocessor architecture for supporting graphics operations in addition to the standard microprocessor integer floating point operations is provided. Independent execution paths are provided for different graphics instructions to allow parallel execution of instructions which commonly occur together. The invention also optimizes the use of register file accesses to avoid, as much as possible, interference between graphics instructions needing to access a register file and other instruction accesses which would occur in combination with graphics instructions, thereby avoiding pipeline stalls and allowing parallel execution.Type: GrantFiled: August 25, 2000Date of Patent: August 21, 2001Assignee: Sun Microsystems, Inc.Inventors: Timothy J. Van Hook, Leslie D. Kohn, Robert Yung
-
Patent number: 6272617Abstract: A system and method for performing register renaming of source registers in a processor having a variable advance instruction window for storing a group of instructions to be executed by the processor, wherein a new instruction is added to the variable advance instruction window when a location becomes available. A tag is assigned to each instruction in the variable advance instruction window. The tag of each instruction to leave the window is assigned to the next new instruction to be added to it. The results of instructions executed by the processor are stored in a temp buffer according to their corresponding tags to avoid output and anti-dependencies. The temp buffer therefore permits the processor to execute instructions out of order and in parallel. Data dependency checks for input dependencies are performed only for each new instruction added to the variable advance instruction window and register renaming is performed to avoid input dependencies.Type: GrantFiled: September 17, 1999Date of Patent: August 7, 2001Assignee: Seiko Epson CorporationInventors: Trevor A. Deosaran, Sanjiv Garg, Kevin R. Iadonato
-
Patent number: 6272520Abstract: A method for detecting thread switch conditions provides first and second scoreboard bits for each register in a register file. The first scoreboard bit associated with a register is set when a load is generated to return data to the register. The second scoreboard bit is set if the load misses in a selected processor cache. Register read instructions are monitored, and a thread switch condition is indicated when a register read instruction to the register is detected while its first and second scoreboard bits are set.Type: GrantFiled: December 31, 1997Date of Patent: August 7, 2001Assignees: Intel Corporation, Hewlette PackardInventors: Harshvardhan Sharangpani, Rajiv Gupta, Judge K. Arora
-
Patent number: 6272619Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.Type: GrantFiled: November 10, 1999Date of Patent: August 7, 2001Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6269436Abstract: A microprocessor is provided which is configured to predict return addresses for return instructions according to a return stack storage included therein. The return stack storage is a stack structure configured to store return addresses associated with previously detected call instructions. Return addresses may be predicted for return instructions early in the instruction processing pipeline of the microprocessor. In one embodiment, the return stack storage additionally stores a call tag and a return tag with each return address. The call tag and return tag respectively identify call and return instructions associated with the return address. These tags may be compared to a branch tag conveyed to the return prediction unit upon detection of a branch misprediction. The results of the comparisons may be used to adjust the contents of the return stack storage with respect to the misprediction. The microprocessor may continue to predict return addresses correctly following a mispredicted branch instruction.Type: GrantFiled: September 8, 1999Date of Patent: July 31, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Thang M. Tran, Rupaka Mahalingaiah
-
Patent number: 6266761Abstract: A method and system in an information processing system are disclosed for efficiently maintaining copies of values stored within a plurality of registers. The information processing system includes first circuitry, second circuitry, and a plurality of buffers. The first circuitry processes an execution state of a first type of instruction which always specifies a destination of at least one of a first type of register or a second type of register, and which outputs first information in response thereto. The first circuitry also processes an execution stage of a second type of instruction which always specifies a destination of only a third type of register, and outputs second information in response thereto.Type: GrantFiled: June 12, 1998Date of Patent: July 24, 2001Assignees: International Business Machines Corporation, Motorola, Inc.Inventors: Michael David Carlson, Thomas Alan Hoy, Terence Matthew Potter, David Domenic Putti
-
Patent number: 6266744Abstract: A processor employing a dependency link file. Upon detection of a load which hits a store for which store data is not available, the processor allocates an entry within the dependency link file for the load. The entry stores a load identifier identifying the load and a store data identifier identifying a source of the store data. The dependency link file monitors results generated by execution units within the processor to detect the store data being provided. The dependency link file then causes the store data to be forwarded as the load data in response to detecting that the store data is provided. The latency from store data being provided to the load data being forwarded may thereby be minimized. Particularly, the load data may be forwarded without requiring that the load memory operation be scheduled.Type: GrantFiled: May 18, 1999Date of Patent: July 24, 2001Assignee: Advanced Micro Devices, Inc.Inventors: William Alexander Hughes, Derrick R. Meyer
-
Patent number: 6266765Abstract: A system for issuing a family of instructions during a single clock includes a decoder for decoding the family of instructions and logic, responsive to the decode result, for determining whether resource conflicts would occur if the family were issued during one clock. If no resource conflicts occur, an execution unit executes the family regardless of whether dependencies among the instructions in the family exist.Type: GrantFiled: July 7, 2000Date of Patent: July 24, 2001Inventor: Robert W. Horst
-
Patent number: 6263416Abstract: In a superscalar processor, multiple instructions are executed in parallel to obtain multiple execution results, and the multiple execution results are stored in a working register file. Each execution result in the working register file has at least one status bit associated therewith which identifies the execution result as valid data. The multiple execution results contained in the working register data then retired by changing the status bits associated with each execution result to identify the execution result as final data. In this manner, the speculative data is retired as the final data without data movement of the speculative data, thus reducing a number of ports needed in the superscalar processor.Type: GrantFiled: June 27, 1997Date of Patent: July 17, 2001Assignee: Sun Microsystems, Inc.Inventor: Rajasekhar Cherabuddi
-
Patent number: 6256722Abstract: A data processing system comprises a plurality of nodes and a serial data bus interconnecting the nodes in series in a closed loop, for passing address and data information. At least one processing node includes a processor, a printed circuit board and a memory which is partitioned into a plurality of sections, including a first section for directly sharable memory located on the printed circuit board, and a second section for block sharable memory. A local bus connects the processor, block sharable memory and printed circuit board, for transferring data in parallel from the processor to the directly sharable memory on the printed circuit board, and for transferring data from the block sharable memory to the printed circuit board.Type: GrantFiled: December 13, 1999Date of Patent: July 3, 2001Assignee: Sun Microsystems, Inc.Inventors: John D. Acton, Michael D. Derbish, Gavin G. Gibson, Jack M. Hardy, Jr., Hugh M. Humphreys, Steven P. Kent, Steven E. Schelong, Ricardo Yong, William B. DeRolf
-
Patent number: 6256720Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.Type: GrantFiled: November 9, 1999Date of Patent: July 3, 2001Assignee: Seiko Epson CorporationInventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
-
Patent number: 6256721Abstract: An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction.Type: GrantFiled: June 16, 2000Date of Patent: July 3, 2001Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6253312Abstract: An apparatus and method are provided for concurrently loading single-precision operands into registers in a microprocessor floating point register file. The apparatus includes translation logic, data logic, and write back logic. The translation logic receives a load macro instruction prescribing an address, and decodes the load macro instruction into a double load micro instruction. The double load micro instruction directs the microprocessor to retrieve the two single-precision operands from the address and to load the two single-precision operands into the two floating point registers. The data logic, coupled to the translation logic, executes the double load micro instruction and retrieves the two single-precision operands from the address. The write back logic, coupled to the data logic, loads the two single-precision operands into the two floating point registers during a single write cycle.Type: GrantFiled: August 7, 1998Date of Patent: June 26, 2001Assignee: IP First, L.L.C.Inventors: Timothy A. Elliott, G. Glenn Henry, Terry Parks
-
Patent number: 6253309Abstract: A microprocessor configured to rapidly decode variable-length instructions is disclosed. The microprocessor is configured with a predecoder and an instruction cache. The predecoder is configured to expand variable-length instructions to create fixed-length instructions by padding instruction fields within each variable-length instruction with constants until each field reaches a predetermined maximum width. The fixed-width instructions are then stored within the instruction cache and output for execution when a corresponding requested address is received. The instruction cache may store both variable- and fixed-width instructions, or just fixed-width instructions. An array of pointers may be used to access particular fixed-length instructions. The fixed-length instructions may be configured to all have the same fields and the same lengths, or they may be divided into groups, wherein instructions within each group have the same fields and the same lengths.Type: GrantFiled: September 21, 1998Date of Patent: June 26, 2001Assignee: Advanced Micro Devices, Inc.Inventor: Rupaka Mahalingaiah
-
Patent number: 6253306Abstract: Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, a prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided though use of the vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction.Type: GrantFiled: July 29, 1998Date of Patent: June 26, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Amos Ben-Meir, John G. Favor
-
Patent number: 6253310Abstract: A microprocessor capable of delaying the deallocation of an arithmetic flags register is described. A system processes instructions of a first instruction set architecture which has an arithmetic flags register. The system also processes instructions of a second instruction set architecture which is not compatible with the first instruction set architecture. In order to process a first instruction of the first instruction set architecture that implicitly updates the arithmetic flags register, the arithmetic flags register shares a physical destination register with a general register containing a result for the first instruction. An instruction that does not update the arithmetic flags but would deallocate the register containing the arithmetic flags triggers the delayed deallocation mechanism of the present invention.Type: GrantFiled: December 31, 1998Date of Patent: June 26, 2001Assignee: Intel CorporationInventors: Ricardo Ramirez, Mike Morrison
-
Patent number: 6249857Abstract: In accordance with a first embodiment, a processing apparatus is provided. The processing apparatus (10) includes a register (12) including a first and a second programming instruction, a first processing unit (16) responsive to the first programming instruction, and a second processing unit (22) responsive to the second programming instruction. The second processing unit (22) includes a logarithm based processor having at least one digital logarithm converter (80), a digital logic device (82), and a digital inverse logarithm converter (84). In other embodiments, the processing apparatus (10) is incorporated into a communication device (100) and a video system (300).Type: GrantFiled: October 20, 1997Date of Patent: June 19, 2001Assignee: Motorola, Inc.Inventors: Matthew H. Klapman, Jeffrey G. Toler
-
Patent number: 6249855Abstract: An arbiter system for the instruction issue logic of a CPU has at least two encoder circuits that select instructions in an instruction queue for issue to first and second execution units, respectively, based upon the positions of the instructions within the queue and requests by the instructions for the first and/or second execution units. As a result, since the instruction can request different execution units, this system is compatible with architectures where the execution units may have different capabilities to execute different instructions, i.e., each integer execution unit may not be able to execute all of the instructions in the CPU's integer instruction set. According to the present invention, one of the encoder circuits is subordinate to the other circuit. The subordinate encoder circuit selects instructions from the instruction queue based not only on the positions of the instructions and their requests, but the instruction selection of the dominant encoder circuit.Type: GrantFiled: June 2, 1998Date of Patent: June 19, 2001Assignee: Compaq Computer CorporationInventors: James A. Farrell, Bruce A. Gieseke
-
Patent number: 6249862Abstract: A dependency table stores a reorder buffer tag for each register. The stored reorder buffer tag corresponds to the last of the instructions within the reorder buffer (in program order) to update the register. Otherwise, the dependency table indicates that the value stored in the register is valid. When operand fetch is performed for a set of concurrently decoded instructions, dependency checking is performed including checking for dependencies between the set of concurrently decoded instructions as well as accessing the dependency table to select the reorder buffer tag stored therein. Either the reorder buffer tag of one of the concurrently decoded instructions, the reorder buffer tag stored in the dependency table, the instruction result corresponding to the stored reorder buffer tag, or the value from the register itself is forwarded as the source operand for the instruction. Information from the comparators and the information stored in the dependency table is sufficient to select which value is forwarded.Type: GrantFiled: November 15, 2000Date of Patent: June 19, 2001Assignee: Advanced Micro Devices, Inc.Inventors: Muralidharan S. Chinnakonda, Thang M. Tran, Wade A. Walker
-
Patent number: 6249856Abstract: A register system for a data processor which operates in a plurality of modes. The register system provides multiple, identical banks of register sets, the data processor controlling access such that instructions and processes need not specify any given bank. An integer register set includes first (RA[23:0]) and second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While the data processor is in a first mode, instructions access the first and second subsets. While the data processor is in a second mode, instructions may access the first subset, but any attempts to access the second subset are re-routed to the shadow subset instead, transparently to the instructions, allowing system routines to seemingly use the second subset without having to save and restore data which user routines have written to the second subset.Type: GrantFiled: January 10, 2000Date of Patent: June 19, 2001Assignee: Seiko Epson CorporationInventors: Sanjiy Garg, Derek J. Lentz, Le Trong Nguyen, Sho Long Chen
-
Patent number: 6247114Abstract: A microprocessor having an instruction queue capable of out-of-order instruction dispatch and rapidly selecting one or more oldest eligible entries is disclosed. The microprocessor may comprise a plurality of instruction execution pipelines, an instruction cache, and an instruction queue coupled to the instruction cache and execution pipelines. The instruction queue may comprise a plurality of instruction storage locations and may be configured to output up to a predetermined number of non-sequential out of order instructions per clock cycle. The microprocessor may be further configured with high speed control logic coupled to the instruction queue. The control logic may comprise a number of pluralities of multiplexers, wherein the first plurality of multiplexers are configured to select a first subset of the instructions stored in the queue. The second plurality of multiplexers then select a second subset of instructions from the first subset.Type: GrantFiled: February 19, 1999Date of Patent: June 12, 2001Assignee: Advanced Micro Devices, Inc.Inventor: Jeffrey E. Trull
-
Patent number: 6247124Abstract: A computing system contains an apparatus having an instruction memory to store a plurality of lines of a plurality of instructions, and a branch memory to store a plurality of branch prediction entries, each branch prediction entry containing information for predicting whether a branch designated by a branch instruction stored in the instruction memory will be taken when the branch instruction is executed. Each branch prediction entry includes a branch target field for indicating a target address of a line containing a target instruction to be executed if the branch is taken, a destination field indicating where the target instruction is located within the line indicated by the branch target address, and a source field indicating where the branch instruction is located within the line corresponding to the target address.Type: GrantFiled: July 30, 1999Date of Patent: June 12, 2001Assignee: MIPS Technologies, Inc.Inventors: Chandra Joshi, Paul Rodman, Peter Hsu, Monica R. Nofal
-
Method and apparatus for performing branch prediction combining static and dynamic branch predictors
Patent number: 6247122Abstract: An apparatus and method for improving microprocessor performance by improving the prediction accuracy of conditional branch instructions is provided. A static branch predictor makes a prediction of the outcome of a conditional branch instruction based on the branch test type and the branch target address displacement sign. A branch history table stores a bit indicating whether the prediction of the static predictor agreed with the outcome of the last execution of the branch instruction. If the history table bit agrees, then the static prediction is used. Otherwise, the opposite of the static prediction is used.Type: GrantFiled: December 2, 1998Date of Patent: June 12, 2001Assignee: IP-First, L.L.C.Inventors: G. Glenn Henry, Terry Parks -
Patent number: 6247106Abstract: A processor employing a map unit including register renaming hardware is shown. The map unit may assign virtual register numbers to source registers by scanning instruction operations to detect intraline dependencies. Subsequently, physical register numbers are mapped to the source register numbers responsive to the virtual register numbers. The map unit may stores (e.g. in a map silo) a current lookahead state corresponding to each line of instruction operations which are processed by the map unit Additionally, the map unit stores an indication of which instruction operations within the line update logical registers, which logical registers are updated, and the physical register numbers assigned to the instruction operations. Upon detection of an exception condition for an instruction operation with a line, the current lookahead state corresponding to the line is restored from the map silo.Type: GrantFiled: July 27, 2000Date of Patent: June 12, 2001Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6240503Abstract: A processor is configured to generate lookahead values using a cumulative constant. The processor classifies operations to a particular register (e.g. the stack pointer register, or ESP in an embodiment employing the x86 instruction set architecture) as either accelerated or non-accelerated. For example, instructions which are defined to increment/decrement the particular register by an explicit or implicit constant value may be accelerated operations. Upon the occurrence of a non-accelerated operation, the processor may begin accumulating the cumulative effect of accelerated operations to the result of the non-accelerated operation as a cumulative offset. The result of the non-accelerated operation (upon execution thereof) may then be added to the cumulative offset values corresponding to each accelerated operation to generate the particular register value corresponding to that accelerated operation. Accordingly, dependencies upon the register due to the accelerated operations may be alleviated.Type: GrantFiled: November 12, 1998Date of Patent: May 29, 2001Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
-
Patent number: 6237082Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.Type: GrantFiled: August 22, 2000Date of Patent: May 22, 2001Assignee: Advanced Micro Devices, Inc.Inventors: David B. Witt, Thang M. Tran
-
Patent number: 6237076Abstract: A method and system for renaming registers of said system is proposed in which mixed instruction sets, e.g. 32 bit and 64 bit instructions are carried out concurrently in one program. In case of an instruction sequence of a preceding 64 bit instruction and one or more 32 bit instructions to be executed in-order after the 64 bit instruction and where the 32 bit instructions having a data dependence to the preceding 64 bit instruction, said rest of the register range changed by the preceding 64 bit instruction is copied to the corresponding location in a target register of the succeeding 32 bit instruction, at least if the same logical register is specified by the 32 bit instruction as it was specified by the preceding 64 bit instruction. The copy source is addressed by the register number and hold in a list (28).Type: GrantFiled: August 28, 1998Date of Patent: May 22, 2001Assignee: International Business Machines CorporationInventors: Ute Gaertner, Klaus Jörg Getzlaff, Oliver Laub, Erwin Pfeffer
-
Patent number: 6230254Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load/store unit is provided whose main purpose is to make load requests out-of-order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out-of-order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.Type: GrantFiled: November 12, 1999Date of Patent: May 8, 2001Assignee: Seiko Epson CorporationInventors: Cheryl D. Senter, Johannes Wang
-
Patent number: 6223277Abstract: A packed data structure processor (25) is disclosed. The packed data structure processor (25) includes a register file (24) of multiple registers (REG0 through REG31), each of which is connected to an input of each of a plurality of operand multiplexers (26). Each operand multiplexer (26) is associated with a shift/mask circuit (28), which permits the selection of a particular portion (e.g., BYIE, WORD, DWORD) of the contents of a selected register file, for use as an operand. An arithmetic logic unit (ALU) (30) performs data processing operations upon the operands, and presents results on writeback bus (WBBUS), to external memory (18) over a memory interface (37), or to a register file (42) associated with other circuitry (44) over a coprocessor interface (41). A destination selector (40) is capable of writing to only a selected portion of a selected register, thus permitting a packed data structure to be present within the register file (24).Type: GrantFiled: December 22, 1997Date of Patent: April 24, 2001Assignee: Texas Instruments IncorporatedInventor: Brian J. Karguth
-
Patent number: 6223278Abstract: A method for performing floating point (FP) instruction handling is provided. A floating point store status word (FSTSW) instruction is inserted within a plurality of micro-ops corresponding to a plurality of FP instructions and the plurality of micro-ops are ordered for execution. In another aspect, a processor is provided for executing a plurality of floating point (FP) instructions. The processor includes a fetcher/decoder unit to retrieve a plurality of FP instructions from a memory structure and generate a plurality of micro-ops from the FP instructions. The processor further generates a floating point store status word (FSTSW) instruction and includes a scheduler unit to re-order the micro-ops for execution.Type: GrantFiled: November 5, 1998Date of Patent: April 24, 2001Assignee: Intel CorporationInventor: Michael J. Morrison
-
Patent number: 6219778Abstract: A processor including at least one execution unit generating out-of-order results and out-of-order condition codes. Precise architectural state of the processor is maintained by providing a results buffer having a number of slots and providing a condition code buffer having the same number of slots as the results buffer, each slot in the condition code buffer in one-to-one correspondence with a slot in the results buffer. Each live instruction in the processor is assigned a slot in the results buffer and the condition code buffer. Each speculative result produced by the execution units is stored in the assigned slot in the results buffer. When an instruction is retired, the results for that instruction are transferred to an architectural result register and any condition codes generated by that instruction are transferred to an architectural condition code register.Type: GrantFiled: December 20, 1999Date of Patent: April 17, 2001Assignee: Sun Microsystems, Inc.Inventors: Ramesh Panwar, Arjun Prabhu
-
Patent number: 6219775Abstract: A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor.Type: GrantFiled: March 18, 1998Date of Patent: April 17, 2001Assignee: Thinking Machines CorporationInventors: Jon P. Wade, Daniel R. Cassiday, Robert D. Lordi, Guy Lewis Steele, Jr., Margaret A. St. Pierre, Monica C. Wong-Chan, Zahi S. Abuhamdeh, David C. Douglas, Mahesh N. Ganmukhi, Jeffrey V. Hill, W. Daniel Hillis, Scott J. Smith, Shaw-Wen Yang, Robert C. Zak, Jr.
-
Patent number: 6219777Abstract: Disclosed is a register file used in a multiprocessor composition composed of a plurality of processor elements, the register file having a plurality of words and being provided for each of the plurality of processor elements, wherein: the plurality of words are divided into a word part that can be simultaneously accessed by some of the plurality of processor elements to use in common with other processor element, and a word part that can be accessed only by its own processor element.Type: GrantFiled: July 10, 1998Date of Patent: April 17, 2001Assignee: NEC CorporationInventor: Toshiaki Inoue
-
Patent number: 6219723Abstract: A system and method for thermal overload detection and protection for a processor which allows the processor to run at near maximum potential for the vast majority of its execution life. This is effectuated by the provision of circuitry to detect when the processor has exceeded its thermal thresholds and which then causes the processor to automatically reduce the clock rate to a fraction of the nominal clock while execution continues. When the thermal condition has stabilized, the clock may be raised in a stepwise fashion back to the nominal clock rate. Throughout the period of cycling the clock frequency from nominal to minimum and back, the program continues to be executed. Also provided is a queue activity rise time detector and method to control the rate of acceleration of a functional unit from idle to full throttle by a localized stall mechanism at the boundary of each stage in the pipe.Type: GrantFiled: March 23, 1999Date of Patent: April 17, 2001Assignee: Sun Microsystems, Inc.Inventors: Ricky C. Hetherington, Ramesh Panwar
-
Patent number: 6219833Abstract: The compilation of source code to a primary and a secondary processor. The method relates to reconfigurable secondary processors, and is especially relevant to secondary processors which can be reconfigured to some degree during execution of code. Selective extraction of dataflows from the source code is followed by transformation of the extracted dataflows into trees. The trees are then matched against each other to determine minimum edit cost relationships for transformation of one tree into another, where these minimum edit cost relationships are determined by the architecture of the secondary processor. A group or a plurality of groups of dataflows is determined on the basis of said minimum edit cost relationships and for each group a generic dataflow capable of supporting each dataflow in that group is created.Type: GrantFiled: December 11, 1998Date of Patent: April 17, 2001Assignee: Hewlett-Packard CompanyInventors: Charles Reed Solomon, Andrea Olgiati
-
Patent number: 6216220Abstract: The data processing system, a combination of multithreaded architecture and a VLIW (Very Long Instruction Word) processor is adapted to process plural threads. The system uses multiple program counters for context-switching only a subinstruction which causes a long latency. A method is provided for processing instructions in a data processing system having an active thread block, a ready thread block and a waiting thread block, and a instruction execution block, for processing a plurality of threads. The method includes combining instructions issued from the respective active threads into one new instruction, each active thread having a plurality of instructions, and the issued instructions being used as subinstructions in the combined one instruction. The combined instruction as processed by the instruction execution block, while tracing contexts relating to the threads which provide the respective subinstructions by using multiple program counters.Type: GrantFiled: October 8, 1998Date of Patent: April 10, 2001Assignee: Hyundai Electronics Industries Co., Ltd.Inventor: Myeong Eun Hwang
-
Patent number: 6216215Abstract: The present invention discloses a method and apparatus for implementing a senior load instruction type. An instruction requesting a memory reference is decoded. The decoded instruction is then dispatched to a memory ordering unit. The instruction is retired from a load buffer and is executed after retiring.Type: GrantFiled: April 2, 1998Date of Patent: April 10, 2001Assignee: Intel CorporationInventors: Salvador Palanca, Shekoufeh Qawami, Niranjan L. Cooray, Angad Narang, Subramaniam Maiyuran