Commitment Control Or Register Bypass Patents (Class 712/218)
  • Publication number: 20020144095
    Abstract: The present invention, in various embodiments, provides techniques for retiring instructions that typically complete early as compared to most instructions. In a first embodiment, at each stage of the various processing stages, each instruction capable of early retirement is processed in accordance with that stage. At a particular stage, if the instruction meets the criteria for early retirement, then the instruction is terminated, e.g., “retired,” and the system is updated to reflect that the instruction has been terminated. However, if, at that particular stage, the instruction does not meet the criteria for early retirement, then the instruction is processed to the next stage, and it is determined again whether the instruction meets the criteria for early retirement. If the instruction meets the criteria, then the instruction is terminated, or if the instruction does not meet the criteria, then the instruction is processed to the next stage, and so on, until the instruction is retired.
    Type: Application
    Filed: March 30, 2001
    Publication date: October 3, 2002
    Inventor: Carl D. Burch
  • Patent number: 6449710
    Abstract: The invention provides a method and system for performing instructions in a microprocessor having a set of registers, in which instructions which operate on portions of a register are recognized, and “stitching” instructions are inserted into the instruction stream to couple the instructions operating on the portions of the register. The “stitching” parcels are serialized along with other instruction parcels, so that instructions which read from or write to portions of a register can proceed independently and out of their original order, while maintaining the results of that out-of-order operation to be the same as if all instructions were performed in the original order. In a preferred embodiment, the choice of stitching parcels is optimized to the Intel x86 architecture and instruction set.
    Type: Grant
    Filed: October 29, 1999
    Date of Patent: September 10, 2002
    Assignee: STMicroelectronics, Inc.
    Inventor: David L. Isaman
  • Publication number: 20020124155
    Abstract: An architecture for a pipeline processor circuit, preferably of the VLIW type, comprises a plurality of stages and a network of forwarding paths which connect pairs of said stages, as well as a register file for operand write-back. An optimization-of-power-consumption function is provided via inhibition of writing and subsequent readings in said register file of operands retrievable from said forwarding network on account of their reduced liveness length.
    Type: Application
    Filed: October 11, 2001
    Publication date: September 5, 2002
    Applicant: STMicroelectronics S.r.l.
    Inventors: Mariagiovanna Sami, Donatella Sciuto, Cristina Silvano, Vittorio Zaccaria, Danilo Pau, Roberto Zafalon
  • Patent number: 6442678
    Abstract: In one method, a processor comprises both a speculative register file to store speculative register values and an architectural register file to store architectural register values. An output of the architectural register file is coupled to an input of the speculative register file to update the speculative register file when a misspeculation is detected.
    Type: Grant
    Filed: December 31, 1998
    Date of Patent: August 27, 2002
    Assignee: Intel Corporation
    Inventor: Judge K. Arora
  • Patent number: 6442679
    Abstract: Guard prediction apparatus for predicting guard outcomes for predicated instructions, each of which specifies a guard operator to be applied to a guard source to generate the guard outcome. The guard prediction apparatus includes a cache, availability logic, a selection circuit, a deduction circuit and write back circuitry. The cache stores previous predictions of guard outcomes for a set of guard sources and guard operators. The availability logic determines whether the cache includes a previous prediction that is relevant to a first guard source and a first guard operator and, if so, couples that previous prediction to the selection circuit. The selection circuit generates the final guard outcome prediction by selecting between the previous prediction, if available, and an initial prediction, if a previous prediction is not available. The deduction circuit deduces from the initial prediction of the guard outcome other consistent guard outcomes for a set of guard operators when applied to the guard source.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: August 27, 2002
    Assignee: Compaq Computer Technologies Group, L.P.
    Inventors: Arthur Klauser, Keith Istvan Farkas
  • Publication number: 20020116599
    Abstract: To eliminate pipeline stall due to data hazard in a superscalar system and to increase the processing speed. An instruction decoder is provided with a circuit which detects two neighboring 2-operand instructions which are equivalent to one 3-operand instruction, and a circuit which, if it is equivalent, integrates the two instructions into the 3-operand instruction and sends it to a succeeding execution stage. Or, provision is made of a circuit which sends the source data of a preceding instruction to an arithmetic unit for a succeeding instruction when the two neighboring instructions have a relationship of data flow but cannot be integrated into one 3-operand instruction. It is allowed to execute the processing of two instructions in one clock, which so far required two clocks due to data flow between the neighboring instructions. Therefore, the number of execution clocks as a whole can be decreased.
    Type: Application
    Filed: March 13, 1997
    Publication date: August 22, 2002
    Inventors: Masahiro Kainaga, Yasuhiko Saitoo
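The instruction integration described in publication 20020116599 can be illustrated with a minimal sketch. The abstract does not say which instruction pairs qualify, so the pattern used here (a `mov` followed by an ALU operation on the same destination) is an assumption, as are the names `Insn2`, `Insn3`, and `fuse`.

```python
from typing import NamedTuple, Optional


class Insn2(NamedTuple):
    """2-operand instruction: `op dst, src` means dst = dst op src (mov: dst = src)."""
    op: str
    dst: str
    src: str


class Insn3(NamedTuple):
    """3-operand instruction: `op dst, src1, src2` means dst = src1 op src2."""
    op: str
    dst: str
    src1: str
    src2: str


def fuse(first: Insn2, second: Insn2) -> Optional[Insn3]:
    """Hypothetical decoder check: fuse `mov d, a ; op d, b` into `op d, a, b`.

    Returns None when the pair does not match the assumed pattern.
    """
    if first.op == "mov" and second.op != "mov" and second.dst == first.dst:
        # If the second source is the freshly moved register, it holds `a` too.
        src2 = first.src if second.src == first.dst else second.src
        return Insn3(second.op, second.dst, first.src, src2)
    return None


fused = fuse(Insn2("mov", "r3", "r1"), Insn2("add", "r3", "r2"))
not_fused = fuse(Insn2("mov", "r3", "r1"), Insn2("add", "r4", "r2"))
```

When `fuse` returns None, the abstract's alternative path applies: the source data of the preceding instruction is forwarded to the arithmetic unit of the succeeding one instead.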
  • Publication number: 20020116600
    Abstract: In a multithreaded processor, events are categorized according to which of a “soft” state clearing (“nuke”) process and a “hard” nuke process should be performed in response to each event. When an event is detected for a thread, either the soft nuke or hard nuke process is executed, according to the type of event, prior to invoking an event handler. The soft nuke process performs less than all of the actions performed by the hard nuke process and requires much less time to execute. If multiple threads are being processed, the hard nuke process requires synchronization between the threads and clears state for each thread, whereas the soft nuke process does not require cross-thread synchronization and clears state only for the thread in which the event was detected. In one embodiment, the soft nuke process is implemented in microcode, while the hard nuke process is hardware-implemented.
    Type: Application
    Filed: November 30, 2001
    Publication date: August 22, 2002
    Inventors: Lawrence O. Smith, S. Dion Rodgers
  • Patent number: 6438681
    Abstract: A computer system utilizing a processing system capable of efficiently comparing register identifiers to detect data hazards between instructions of a computer program is used to execute the computer program. The processing system utilizes at least one pipeline, a first decoder, a second decoder, and comparison logic. The pipeline receives and simultaneously processes instructions of a computer program. The first and second decoders are coupled to the pipeline and decode register identifiers associated with instructions being processed by the pipeline. The comparison logic is interfaced with the first and second decoders and respectively compares the decoded register identifiers produced by the first and second decoders to other decoded register identifiers.
    Type: Grant
    Filed: January 24, 2000
    Date of Patent: August 20, 2002
    Assignee: Hewlett-Packard Company
    Inventors: Ronny Lee Arnold, Donald Charles Soltis Jr.
  • Publication number: 20020108026
    Abstract: A data processing apparatus for increasing the speed of data transfer from one processor instruction to another processor instruction. First (78) and second (80) functional unit groups, each including a plurality of functional units, are connected to a register file (76) comprising a plurality of registers having corresponding register numbers. A comparator (181) receives an indication of the operand register number of a current instruction for a functional unit in the first functional unit group, and an indication of the destination register number of an immediately preceding instruction for the second functional unit group, and indicates whether the register numbers match.
    Type: Application
    Filed: December 8, 2000
    Publication date: August 8, 2002
    Inventors: Keith Balmer, Richard D. Simpson, Iain Robertson, John Keay
  • Patent number: 6430679
    Abstract: A pre-arbitrated bypassing system in a speculative execution microprocessor is provided. The bypassing system provides execution units enhanced to include a comparator and an enabled driver. The comparator compares a bypass address that is broadcast upon instruction decode with the destination address within each execution unit. If there is a match, then the result data is driven onto the bypass bus. Additionally, a suppress signal and validation scheme/apparatus are included to ensure that valid data is being driven onto the bypass bus. A bypass bus and associated apparatus may be included for every potential source operand.
    Type: Grant
    Filed: October 2, 1998
    Date of Patent: August 6, 2002
    Assignee: Intel Corporation
    Inventor: Jay S. Heeb
  • Patent number: 6427207
    Abstract: An apparatus is presented for expediting the execution of dependent micro instructions in a pipeline microprocessor having design characteristics (complexity, power, and timing) that are not significantly impacted by the number of stages in the microprocessor's pipeline. In contrast to conventional result distribution schemes where an intermediate result is distributed to multiple pipeline stages, the present invention provides a cache for storage of multiple intermediate results. The cache is accessed by a dependent micro instruction to retrieve required operands. The apparatus includes a result forwarding cache, result update logic, and operand configuration logic. The result forwarding cache stores the intermediate results. The result update logic receives the intermediate results as they are generated and enters the intermediate results into the result forwarding cache.
    Type: Grant
    Filed: July 20, 2001
    Date of Patent: July 30, 2002
    Assignee: I.P. First L.L.C.
    Inventors: Gerard M. Col, G. Glenn Henry
  • Patent number: 6425069
    Abstract: A method and system for optimizing execution of an instruction stream which includes a very long instruction word (VLIW) dispatch group in which ordering is not maintained is disclosed. The method and system comprises examining an access which initiated a flush operation; capturing an indice related to the flush operation; and causing all storage access instructions related to this indice to be dispatched as single-IOP groups until the indice is updated. Storage access to address space which is safe such as Guarded (G=1) or Direct Store (E=DS) must be handled in a non-speculative manner such that operations which could potentially go to volatile I/O devices or control locations do not get processed out of order. Since the address is not known in the front end of the processor, this can only be determined by the load store unit or functional block which performs translation.
    Type: Grant
    Filed: March 5, 1999
    Date of Patent: July 23, 2002
    Assignee: International Business Machines Corporation
    Inventors: Larry Edward Thatcher, John Edward Derrick
  • Patent number: 6425072
    Abstract: An apparatus and method for implementing a register free list scheme is provided. An instruction received in an execution unit can be assigned an absolute register number as its destination register. A new physical register tag from a free list can be assigned to the absolute register number and a tag future file can be updated with the new physical register tag. The old physical register tag can be read from the tag future file and stored in a retire queue entry corresponding to the instruction along with the new physical register tag and an architectural register identifier corresponding to the absolute register number. A valid bit corresponding to the entry can be set in response to the entry being written. In response to an abort signal, a swap bit corresponding to the entry can be set, the valid bit can be reset, and the new physical register tag can be conveyed to a rename unit in response to receiving a free register request.
    Type: Grant
    Filed: August 31, 1999
    Date of Patent: July 23, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Stephan Meier, Chetana N. Keltcher
  • Publication number: 20020091913
    Abstract: By using an entry number (WRB number) of a re-order buffer 6, each of function units such as an operation unit 3, a store unit 4, a load unit 5, etc. notifies to the re-order buffer 6 the processing end for an instruction stored in the entry concerned in the unit thereof. The load unit 5 manages the latest speculation state of a load instruction issued on the basis of a branch prediction success/failure signal output from the branch unit 2, and makes no notification to the re-order buffer 6 on the basis of WRB number for subsequent load instructions of a branch-prediction failed branch instruction even when the processing of the instruction is finished. Accordingly, the re-order buffer 6 can re-use entries in which the subsequent instructions of the branch prediction failed branch instruction are stored.
    Type: Application
    Filed: January 9, 2002
    Publication date: July 11, 2002
    Applicant: NEC CORPORATION
    Inventor: Masao Fukagawa
  • Publication number: 20020091914
    Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a plurality of replay queues or replay queue sections coupled to the checker for temporarily storing one or more instructions for replay. In one embodiment, thread-specific replay queue sections may each be used to store a long latency instruction for each thread until the long latency instruction is ready to be executed (e.g., data for a load instruction has been retrieved from external memory). By storing the long latency instruction and its dependents in a replay queue section for one thread which has stalled, execution resources are made available for improving the speed of execution of other threads which have not stalled.
    Type: Application
    Filed: February 1, 2002
    Publication date: July 11, 2002
    Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
  • Publication number: 20020087839
    Abstract: There is disclosed a data processor that executes variable latency load operations using bypass circuitry that allows load word operations to avoid stalls caused by shifting circuitry.
    Type: Application
    Filed: December 29, 2000
    Publication date: July 4, 2002
    Inventors: Anthony X. Jarvis, Paolo Faraboschi
  • Publication number: 20020087836
    Abstract: A processor having a register renaming structure and method is disclosed to recover a free list. The processor includes a physical register file including physical registers. The processor also includes a decoder to decode an instruction to indicate a destination logical register. The processor also includes a register allocation table to map the destination logical register to an allocated physical register. The processor also includes an active list that includes an old field and a new field. The old field includes at least one evicted physical register from the register alias table. The new field includes the allocated physical register. The processor also includes the free list of unallocated physical registers reclaimed from the active list.
    Type: Application
    Filed: December 29, 2000
    Publication date: July 4, 2002
    Inventors: Stephan J. Jourdan, Michael Bekerman, Ronny Ronen
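The active-list structure in publication 20020087836 can be sketched as follows. This is a simplified model under stated assumptions, not the claimed design: the class `Renamer`, its method names, and the policy of freeing the old register at retirement and the new register on a squash are illustrative choices that the abstract does not spell out.

```python
from collections import deque


class Renamer:
    """Toy register renamer with an active list of (logical, old, new) entries."""

    def __init__(self, num_logical: int, num_physical: int):
        # Register allocation table: logical register -> physical register.
        self.rat = list(range(num_logical))
        # Free list of unallocated physical registers.
        self.free = deque(range(num_logical, num_physical))
        # Active list, oldest entry on the left.
        self.active = deque()

    def rename(self, logical: int) -> int:
        """Allocate a physical register for a destination logical register."""
        new = self.free.popleft()
        old = self.rat[logical]          # mapping evicted from the table
        self.rat[logical] = new
        self.active.append((logical, old, new))
        return new

    def retire_oldest(self) -> None:
        """Assumed policy: at retirement the evicted (old) register is reclaimed."""
        _, old, _ = self.active.popleft()
        self.free.append(old)

    def squash_youngest(self) -> None:
        """Assumed policy: on a squash, restore the old mapping and reclaim the new register."""
        logical, old, new = self.active.pop()
        self.rat[logical] = old
        self.free.append(new)


r = Renamer(num_logical=2, num_physical=4)
p = r.rename(0)        # logical 0 now maps to physical 2
r.retire_oldest()      # physical 0 returns to the free list
q = r.rename(1)        # logical 1 now maps to physical 3
r.squash_youngest()    # mapping for logical 1 is restored, physical 3 is freed
```

Keeping both fields in each entry means either outcome (retire or squash) can recover a register without scanning the whole table.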
  • Publication number: 20020087838
    Abstract: There is disclosed a data processor for stalling the instruction execution pipeline after a cache miss and re-loading the correct cache data into any bypass devices before restarting the pipeline.
    Type: Application
    Filed: December 29, 2000
    Publication date: July 4, 2002
    Inventor: Anthony X. Jarvis
  • Publication number: 20020083304
    Abstract: An improved method and system for operating an out of order processor at a high frequency enabled by an increased pipeline length. It is proposed to shorten the pipeline by a considerable number of stages by accepting that a write after read conflict may occur, when directly after renaming, during the “read ROB” pipeline stage, all the information (tag, validity and data) is read from a Reorder Buffer (ROB) entry, and is next written, in a following pipeline stage “write RS”, into a reservation station (RS) entry. In order to assure the correctness of processing, in particular in cases of dependencies, e.g., write after read conflicts, a separate inventional add-in logic covers these cases. The logic detects the write after read conflict case of an Instructional Execution Unit (IEU) writing into the particular entry that is selected by the renaming logic during “read ROB”.
    Type: Application
    Filed: December 20, 2001
    Publication date: June 27, 2002
    Applicant: IBM
    Inventors: Jens Leenstra, Dieter Wendel
  • Patent number: 6412061
    Abstract: A method of dynamically adjusting a multiple stage pipeline to execute one of a set of instructions, wherein each stage has a latency and performs a selected data operation. An instruction to be executed is received and a number of stages of the pipeline is selected to execute the instruction as needed to perform a corresponding data operation. Unnecessary stages are bypassed to reduce latency and the instruction is executed with the selected stages.
    Type: Grant
    Filed: January 14, 1998
    Date of Patent: June 25, 2002
    Assignee: Cirrus Logic, Inc.
    Inventor: Thomas Anthony Dye
  • Patent number: 6412064
    Abstract: A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results.
    Type: Grant
    Filed: August 2, 2000
    Date of Patent: June 25, 2002
    Assignee: Seiko Epson Corporation
    Inventors: Johannes Wang, Sanjiv Garg, Trevor Deosaran
  • Publication number: 20020078326
    Abstract: In one embodiment, a programmable processor is adapted to include a speculative count register. The speculative count register may be loaded with data associated with an instruction before the instruction commits. However, if the instruction is terminated before it commits, the speculative count register may be adjusted. A set of counters may monitor the difference between the speculative count register and its architectural counterpart.
    Type: Application
    Filed: December 20, 2000
    Publication date: June 20, 2002
    Applicant: Intel Corporation and Analog Devices, Inc.
    Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp
  • Patent number: 6408379
    Abstract: An apparatus and method for executing floating-point store instructions in a microprocessor is provided. If store data of a floating-point store instruction corresponds to a tiny number and an underflow exception is masked, then a trap routine can be executed to generate corrected store data and complete the store operation. In response to detecting that store data corresponds to a tiny number and the underflow exception is masked, the store data, store address information, and opcode information can be stored prior to initiating the trap routine. The trap routine can be configured to access the store data, store address information, and opcode information. The trap routine can be configured to generate corrected store data and complete the store operation using the store data, store address information, and opcode information.
    Type: Grant
    Filed: June 10, 1999
    Date of Patent: June 18, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Norbert Juffa, Stephan Meier, Stuart Oberman, Scott White
  • Patent number: 6408372
    Abstract: A RAM (12) used by the CPU comprises a work buffer (14) and a work register (151) for pipelined processing. The work buffer (14) consists of the first to fourth work buffers (141 to 144) each of which stores information on predetermined data, e.g., a current processing on the data. When the CPU accesses the first to fourth work buffers (141 to 144), an address decoder performs an address conversion on the basis of a value (R151) of the work register (151). For example, when the value (R151) of the work register (151) is “1”, addresses (P1, P2, P3 and P4) in an address space are converted (mapped) to addresses (AD141, AD142, AD143 and AD144) of work buffers (141, 142, 143 and 144). With this constitution, in performing a plurality of data processings in parallel, the CPU can improve its operation efficiency while controlling a currently performed processing on each data.
    Type: Grant
    Filed: March 2, 2000
    Date of Patent: June 18, 2002
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Shigenori Miyauchi
  • Patent number: 6408378
    Abstract: An apparatus and method of multi-bit scoreboarding to handle write-after-write hazards and eliminate bypass comparators. The method comprises providing a set of data storage units that include a set of scoreboard return path bits associated with the data storage units. The scoreboard return path bits include a set of return path indicators. A first execution unit is provided to provide an output based on an input. A first switching unit is provided to select the input to the first execution unit and to receive the output from the first execution unit. A first bypass control unit is provided to cause the first switching unit to couple the output from the first execution unit to the first execution unit based on the scoreboard return path bits, such that the input to the first execution unit is selected from among the data storage units and the output from the first execution unit.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: June 18, 2002
    Assignee: Intel Corporation
    Inventor: Dennis M. O'Connor
  • Patent number: 6405304
    Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
    Type: Grant
    Filed: August 24, 1998
    Date of Patent: June 11, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
  • Patent number: 6401195
    Abstract: In one method, a hazard on a register is detected based on the register ID from a latch of a first stage of a processor pipeline. The pipeline is stalled after a stale value of the register is stored in a latch of a later stage of the pipeline. The stale value in the latch is then replaced with a fresh value while the pipeline is stalled.
    Type: Grant
    Filed: December 30, 1998
    Date of Patent: June 4, 2002
    Assignee: Intel Corporation
    Inventors: Judge K. Arora, Harshvardhan P. Sharangpani, Ghassan W. Khadder
  • Publication number: 20020066005
    Abstract: The present invention provides a detector for detecting at least one kind of dependence in address between instructions executed by at least a processor, the detector being adopted to detect a possibility of presence of the at least one kind of dependence, wherein if the at least one kind of dependence is present in fact, then the detector detects a possibility of presence of the at least one kind of dependence, and if the at least one kind of dependence is not present in fact, then the detector is allowed to detect the at least one kind of dependence.
    Type: Application
    Filed: November 28, 2001
    Publication date: May 30, 2002
    Applicant: NEC CORPORATION
    Inventors: Atsufumi Shibayama, Satoshi Matsushita, Sunao Torii, Naoki Nishi
  • Patent number: 6385715
    Abstract: A processor is provided that includes an execution unit for executing instructions and a replay system for replaying instructions which have not executed properly. The replay system is coupled to the execution unit and includes a checker for determining whether each instruction has executed properly and a plurality of replay queues or replay queue sections coupled to the checker for temporarily storing one or more instructions for replay. In one embodiment, thread-specific replay queue sections may each be used to store a long latency instruction for each thread until the long latency instruction is ready to be executed (e.g., data for a load instruction has been retrieved from external memory). By storing the long latency instruction and its dependents in a replay queue section for one thread which has stalled, execution resources are made available for improving the speed of execution of other threads which have not stalled.
    Type: Grant
    Filed: May 4, 2001
    Date of Patent: May 7, 2002
    Assignee: Intel Corporation
    Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
  • Patent number: 6381689
    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: April 30, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David B. Witt, Thang M. Tran
  • Patent number: 6377998
    Abstract: An improved frame processing apparatus for a network that supports high speed frame processing is disclosed. The frame processing apparatus uses a combination of fixed hardware and programmable hardware to implement network processing, including frame processing and media access control (MAC) processing. Although generally applicable to frame processing for networks, the improved frame processing apparatus is particularly suited for token-ring networks and Ethernet networks. The invention can be implemented in numerous ways, including as an apparatus, an integrated circuit and network equipment.
    Type: Grant
    Filed: August 22, 1997
    Date of Patent: April 23, 2002
    Assignee: Nortel Networks Limited
    Inventors: Michael Noll, Michael Clarke, Mark Smallwood
  • Publication number: 20020038417
    Abstract: A pipelined microprocessor for processing instructions includes at least one pipeline. The pipeline includes an instruction fetching functional stage, an instruction decoding functional stage, an execution functional stage comprising a number of execution units and a commit functional stage. The commit functional stage includes or is associated with a reorder buffer. Detecting means are provided for detecting instruction irregularities. When an instruction irregularity is detected, an irregularity indication and a flush instruction are generated. The irregularity indication is used to initiate a flush mode whereas the flush instruction, when received in a stage or unit set in flush mode, resets the flush mode in said stage/unit.
    Type: Application
    Filed: September 27, 2001
    Publication date: March 28, 2002
    Inventors: Joachim Strombergsson, Magnus Carlesson, Jonas Vasell
  • Patent number: 6360314
    Abstract: A bypass mechanism is disclosed for a computer system that executes load and store instructions out of order. The bypass mechanism compares the address of each issuing load instruction with a set of recent store instructions that have not yet updated memory. A match of the recent stores provides the load data instead of having to retrieve the data from memory. A store queue holds the recently issued stores. Each store queue entry and the issuing load includes a data size indicator. Subsequent to a data bypass, the data size indicator of the issuing load is compared against the data size indicator of the matching store queue entry. A trap is signaled when the data size indicator of the issuing load differs from the data size indicator of the matching store queue entry. The trap signal indicates that the data provided by the bypass mechanism was insufficient to satisfy the requirements of the load instruction.
    Type: Grant
    Filed: July 14, 1998
    Date of Patent: March 19, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: David Arthur James Webb, Jr., James B. Keller, Derrick R. Meyer
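The store-to-load bypass with a size check described in patent 6360314 can be sketched as below. The data layout (a list of `(address, size, data)` tuples ordered oldest to youngest), the youngest-match-wins search order, and the names `load_with_bypass` and `SizeMismatchTrap` are assumptions made for illustration; the abstract only states that a match supplies the load data and that differing size indicators raise a trap.

```python
class SizeMismatchTrap(Exception):
    """Signals that the bypassed store data cannot satisfy the load."""


def load_with_bypass(store_queue, memory, addr, size):
    """Return load data, preferring a matching store that has not yet updated memory.

    store_queue: list of (address, size, data), oldest first (assumed layout).
    memory: dict mapping address -> data.
    """
    # Search youngest store first so the most recent write wins (assumption).
    for st_addr, st_size, st_data in reversed(store_queue):
        if st_addr == addr:
            # The size indicators are compared after the address match.
            if st_size != size:
                raise SizeMismatchTrap(f"store size {st_size} != load size {size}")
            return st_data
    # No pending store matches: fall back to memory.
    return memory[addr]


sq = [(0x10, 4, 111), (0x10, 4, 222)]
mem = {0x10: 5, 0x20: 7}
bypassed = load_with_bypass(sq, mem, 0x10, 4)   # served from the store queue
from_mem = load_with_bypass(sq, mem, 0x20, 4)   # served from memory
```

A real pipeline would replay or re-execute the load after the trap; the exception here only marks the point at which the mismatch is detected.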
  • Patent number: 6356918
    Abstract: A method and a system in a data processing system for managing registers in a register array wherein the data processing system has M architected registers and the register array has greater than M registers. A first physical register address is selected from a group of available physical register addresses in a rename table in response to dispatching a register-modifying instruction that specifies an architected target register address. The architected target register address is then associated with the first physical register address, and a result of executing the register-modifying instruction is stored in a physical register pointed to by the first physical register address. In response to completing the register-modifying instruction, the first physical address in the rename table is exchanged with a second physical address in a completion rename table, wherein the second physical address is located in the completion rename table at a location pointed to by the architected target register address.
    Type: Grant
    Filed: July 26, 1995
    Date of Patent: March 12, 2002
    Assignee: International Business Machines Corporation
    Inventors: Chiao-Mei Chuang, Hung Qui Le
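    Illustration: The exchange between the two tables can be sketched as follows. This is a rough model under stated assumptions, not the patented implementation. The class name `Renamer`, the slot-based layout of the rename table, the `in_flight` bookkeeping, and in-order completion are all hypothetical.

```python
class Renamer:
    """Toy model: M architected registers backed by more than M physical ones."""

    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        # Completion rename table: committed mapping, indexed by architected register.
        self.completion = list(range(num_arch))
        # Rename table: each slot holds one physical register address, all free at start.
        self.rename = list(range(num_arch, num_phys))
        # Hypothetical bookkeeping: slot index -> architected target of the in-flight write.
        self.in_flight = {}

    def dispatch(self, arch_target):
        """Pick an available physical register for a register-modifying instruction."""
        for slot in range(len(self.rename)):
            if slot not in self.in_flight:
                self.in_flight[slot] = arch_target
                return slot, self.rename[slot]
        raise RuntimeError("no free physical register; dispatch must stall")

    def complete(self, slot):
        """Exchange the slot's physical address with the committed one."""
        arch = self.in_flight.pop(slot)
        self.rename[slot], self.completion[arch] = (
            self.completion[arch],
            self.rename[slot],
        )
```

    The exchange frees the previously committed physical register in the same step that commits the new one, so no separate free list is needed in this sketch.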
  • Patent number: 6351805
    Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing, it wraps around and is sent down the instruction pipeline again.
    Type: Grant
    Filed: February 23, 2001
    Date of Patent: February 26, 2002
    Assignee: Intel Corporation
    Inventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
  • Publication number: 20020016903
    Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.
    Type: Application
    Filed: May 8, 2001
    Publication date: February 7, 2002
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20020007450
    Abstract: A reorder buffer is configured into multiple lines of storage, wherein a line of storage includes sufficient storage for instruction results regarding a predefined maximum number of concurrently dispatchable instructions. A line of storage is allocated whenever one or more instructions are dispatched. A microprocessor employing the reorder buffer is also configured with fixed, symmetrical issue positions. The symmetrical nature of the issue positions may increase the average number of instructions to be concurrently dispatched and executed by the microprocessor. The average number of unused locations within the line decreases as the average number of concurrently dispatched instructions increases. One particular implementation of the reorder buffer includes a future file. The future file comprises a storage location corresponding to each register within the microprocessor.
    Type: Application
    Filed: March 13, 2001
    Publication date: January 17, 2002
    Inventors: David B. Witt, Thang M. Tran
  • Patent number: 6311266
    Abstract: A method and system for executing instructions in a computer. Each instruction has a look-ahead code indicating the number of subsequent instructions that may be executed before its own execution is completed. The look-ahead code increments a counter associated with the instruction one past the look-ahead location. The instruction then begins execution. The next instructions will also be cleared to begin execution if they are less than the look-ahead code away from the current instruction. A large number of instructions can thus begin execution and be executing at the same time, thus increasing the speed of the computer operation.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: October 30, 2001
    Assignee: Cray Inc.
    Inventors: Burton J. Smith, Robert L. Alverson
  • Publication number: 20010034825
    Abstract: What is disclosed is an apparatus including a set of data storage units having a set of scoreboard bits associated with the set of data storage units; a first execution unit having an output coupled to the data storage unit and a first input; a first switching unit having an output coupled to the first input of the first execution unit and a first input coupled to the output of the first execution unit; and, a first bypass control unit coupled to the first switching unit; wherein the first bypass control unit is configured to cause the first switching unit to couple the output of the first switching unit to the first input of the first switching unit based upon the set of scoreboard bits. What is also disclosed is a method including the steps of receiving a first instruction; and, storing a first address location and a first access path specifier for a first operand associated with the first instruction; wherein the first access path specifier indicates a source of the first operand.
    Type: Application
    Filed: May 31, 2001
    Publication date: October 25, 2001
    Inventor: Dennis M. O'Connor
  • Publication number: 20010034824
    Abstract: A simultaneous and redundantly threaded, pipelined processor executes the same set of instructions simultaneously as two separate threads to provide fault tolerance. One thread is processed ahead of the other thread so that the instructions in one thread are processed through the processor's pipeline ahead of the corresponding instructions from the other thread. The thread whose instructions are processed earlier places its committed stores in a store queue. Subsequently, the second thread places its committed stores in the store queue. A compare circuit periodically scans the store queue for matching store instructions. If otherwise matching store instructions differ in any way (address or data), then a fault has occurred in the processing and the compare circuit initiates fault recovery. If comparison of the two instructions reveals they are identical, the compare circuit allows only a single store instruction to pass to the data cache or the system main memory.
    Type: Application
    Filed: April 19, 2001
    Publication date: October 25, 2001
    Inventors: Shubhendu S. Mukherjee, Steven K. Reinhardt
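    Illustration: The store comparison step can be sketched as follows. This is a simplified model, not the patented circuit. It assumes each thread's committed stores sit in a separate FIFO and are paired in commit order, whereas the abstract describes a single store queue that is scanned for matches. The names `drain_store_queue` and `FaultDetected` and the `(address, data)` tuple format are hypothetical.

```python
from collections import deque


class FaultDetected(Exception):
    """Raised when the two redundant threads disagree on a store."""


def drain_store_queue(leading, trailing):
    """Compare committed stores pairwise and forward one copy of each matching pair.

    `leading` and `trailing` are deques of (address, data) tuples in commit order.
    """
    forwarded = []
    # A store can only be checked once both threads have committed it.
    while leading and trailing:
        first = leading.popleft()
        second = trailing.popleft()
        if first != second:
            # Any difference in address or data indicates a fault.
            raise FaultDetected(f"{first} != {second}")
        # Identical pair: only a single store goes on to the cache or memory.
        forwarded.append(first)
    return forwarded
```

    Stores committed only by the leading thread stay queued until the trailing thread catches up, which is why the loop stops as soon as either queue is empty.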
  • Patent number: 6301654
    Abstract: In a load/store unit within a microprocessor, load and store instructions are executed out of order. The load and store instructions are assigned tags in a predetermined manner, and then assigned to load and store reorder queues for keeping track of the program order of the load and store instructions. Then when new load or store instructions are issued, the new load or store instructions are compared to entries within the load and store reorder queues to detect out of order problems.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: October 9, 2001
    Assignee: International Business Machines Corporation
    Inventors: Bruce Joseph Ronchetti, Dave Shippy, Larry Edward Thatcher
  • Publication number: 20010023479
    Abstract: In the control section, an operation instruction not prescribing a functional specification, and a unit for processing the specific application-purpose operation instruction is provided within the processor core. The structure of this unit can be changed based on a flexible pipeline structure, and is separately designed for each application field. A register that prescribes a latency from when an instruction of the above unit is issued till when a result can be utilized is also provided in the processor core so as to prevent contention of an output port. Another register that prescribes a latency relating to a constraint of an interval of issuing an instruction of the above unit is also provided in the processor core so as to prevent contention of a resource with the preceding instructions.
    Type: Application
    Filed: December 22, 2000
    Publication date: September 20, 2001
    Inventors: Michihide Kimura, Atsuhiro Suga, Hideo Miyake, Satoshi Imai, Yasuki Nakamura
  • Patent number: 6292884
    Abstract: A reorder buffer is provided which stores a last in buffer (LIB) indication corresponding to each instruction. The last in buffer indication indicates whether or not the corresponding instruction is last, in program order, of the instructions within the buffer to update the storage location defined as the destination of that instruction. The LIB indication is included in the dependency checking comparisons. A dependency is indicated for a given source operand and a destination operand within the reorder buffer if the operand specifiers match and the corresponding LIB indication indicates that the instruction corresponding to the destination operand is last to update the corresponding storage location. At most one of the dependency comparisons for a given source operand can indicate dependency. According to one embodiment, the reorder buffer employs a line-oriented configuration.
    Type: Grant
    Filed: December 30, 1999
    Date of Patent: September 18, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Thang M. Tran, David B. Witt
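    Illustration: The effect of the last in buffer (LIB) indication on dependency checking can be sketched as follows. This is a minimal sketch, not the patented design. The dictionary layout of a buffer entry and the function names `allocate` and `find_dependency` are assumptions, and the line-oriented configuration is not modeled.

```python
def allocate(rob, dest):
    """Add an instruction writing `dest`; it becomes the last updater of `dest`."""
    # Clear the LIB indication on any earlier instruction with the same destination.
    for entry in rob:
        if entry["dest"] == dest:
            entry["lib"] = False
    rob.append({"dest": dest, "lib": True})
    return len(rob) - 1


def find_dependency(rob, source):
    """Return the index of the entry the source operand depends on, or None."""
    # A dependency needs both a matching operand specifier and a set LIB indication.
    matches = [
        index
        for index, entry in enumerate(rob)
        if entry["dest"] == source and entry["lib"]
    ]
    # At most one comparison can indicate a dependency for a given source operand.
    assert len(matches) <= 1
    return matches[0] if matches else None
```

    Because only one entry per destination carries a set LIB indication, no priority logic is needed to pick the youngest of several matching entries.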
  • Patent number: 6289438
    Abstract: A method and apparatus for bypassing defective cache memory locations on-board a microprocessor integrated circuit chip, which includes a processor, a cache memory, a store buffer, a tag RAM, and a comparator. The cache memory has a plurality of valid cache memory locations and at least one defective cache memory location. The store buffer has buffer entries and redundancy entries for storing data sent by the processor for storage in the cache memory. The tag RAM has buffer tag entries for storing addresses of data stored in the buffer entries and redundancy tag entries that store addresses of defective cache memory locations in the cache memory. The comparator compares addresses of data sent by the processor for cache memory storage with addresses stored in the redundancy tag entries.
    Type: Grant
    Filed: July 29, 1998
    Date of Patent: September 11, 2001
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Toshinari Takayanagi
  • Patent number: 6289433
    Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependence check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependence check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.
    Type: Grant
    Filed: June 10, 1999
    Date of Patent: September 11, 2001
    Assignee: Transmeta Corporation
    Inventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang
  • Patent number: 6282636
    Abstract: A decentralized exception processing system includes a plurality of local exception units. Each local exception unit is coupled to process local exception signals from one or more processing resources that are proximate to it. Each local exception unit generates local commit signals, using order information for the instruction in an issue group and any local exception signals it receives. The local commit signals are combined to generate a global commit signal for each instruction in the issue group. Local exception signals are collected at a selected one of the local exception units and processed to generate a global exception unit. The selected local exception unit resteers control of the processing resources to an exception handler associated with the global exception unit.
    Type: Grant
    Filed: December 23, 1998
    Date of Patent: August 28, 2001
    Assignee: Intel Corporation
    Inventors: Tse-Yu Yeh, Gregory Mathews, Steven Tu
  • Patent number: 6282630
    Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.
    Type: Grant
    Filed: September 10, 1999
    Date of Patent: August 28, 2001
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20010014940
    Abstract: Three parallel instruction processing pipelines of a microprocessor share two data memory ports for obtaining operands and writing back results. Since a significant proportion of the instructions of a typical computer program do not require reading operands from the memory, the probability is high that at least one of any three program instructions to be executed at the same time need not fetch an operand from memory. The two memory ports are thus connected at any given time with the two of the three pipelines which are processing instructions that require memory access, the pipeline without access to the memory processing an instruction that does not need it. To do so, the added third pipeline need not have all the same resources as the other two pipelines, so its stages are made to have a reduced capability in order to save space and reduce power consumption.
    Type: Application
    Filed: April 26, 2001
    Publication date: August 16, 2001
    Applicant: RISE TECHNOLOGY COMPANY
    Inventor: Kenneth K. Munson
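    Illustration: Sharing two memory ports among three pipelines can be sketched as follows. This is a rough sketch under stated assumptions. The function name `assign_ports` is hypothetical, and returning `None` when all three instructions need memory is an assumption, since the abstract does not say how that case is handled.

```python
def assign_ports(needs_memory):
    """Map pipelines to the two data memory ports.

    `needs_memory` holds three booleans, one per pipeline. Returns a dict
    {pipeline index: port index}, or None if more than two pipelines need memory.
    """
    users = [pipe for pipe, need in enumerate(needs_memory) if need]
    if len(users) > 2:
        # Assumption: the group cannot issue together with only two ports.
        return None
    # Connect the ports to whichever pipelines require memory access.
    return {pipe: port for port, pipe in enumerate(users)}
```

    For example, `assign_ports([True, False, True])` connects pipeline 0 to port 0 and pipeline 2 to port 1, leaving pipeline 1 to process an instruction that needs no memory operand.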
  • Publication number: 20010014939
    Abstract: Three parallel instruction processing pipelines of a microprocessor share two data memory ports for obtaining operands and writing back results. Since a significant proportion of the instructions of a typical computer program do not require reading operands from the memory, the probability is high that at least one of any three program instructions to be executed at the same time need not fetch an operand from memory. The two memory ports are thus connected at any given time with the two of the three pipelines which are processing instructions that require memory access, the pipeline without access to the memory processing an instruction that does not need it. To do so, the added third pipeline need not have all the same resources as the other two pipelines, so its stages are made to have a reduced capability in order to save space and reduce power consumption.
    Type: Application
    Filed: April 26, 2001
    Publication date: August 16, 2001
    Applicant: RISE TECHNOLOGY COMPANY
    Inventor: Kenneth K. Munson
  • Publication number: 20010011343
    Abstract: A system and method for performing register renaming of source registers in a processor having a variable advance instruction window for storing a group of instructions to be executed by the processor, wherein a new instruction is added to the variable advance instruction window when a location becomes available. A tag is assigned to each instruction in the variable advance instruction window. The tag of each instruction to leave the window is assigned to the next new instruction to be added to it. The results of instructions executed by the processor are stored in a temp buffer according to their corresponding tags to avoid output and anti-dependencies. The temp buffer therefore permits the processor to execute instructions out of order and in parallel. Data dependency checks for input dependencies are performed only for each new instruction added to the variable advance instruction window and register renaming is performed to avoid input dependencies.
    Type: Application
    Filed: April 5, 2001
    Publication date: August 2, 2001
    Inventors: Trevor A. Deosaran, Sanjiv Garg, Kevin R. Iadonato