Commitment Control Or Register Bypass Patents (Class 712/218)
  • Patent number: 6732251
    Abstract: A processor or processor core has register file circuitry having a plurality of physical registers and a plurality of tag storing portions corresponding respectively to the physical registers. Each tag storing portion stores a tag representing a logical register ID allocated to the corresponding physical register. A register selection unit receives a logical register ID and selects one of the logical registers whose tag matches the received logical register ID. A tag changing unit changes the stored tags so as to change a mapping between at least one logical register ID and one of the physical registers. Such register circuitry permits a mapping between logical register IDs and physical registers to be changed quickly efficiently and can permit a desired physical register to be selected quickly.
    Type: Grant
    Filed: November 1, 2001
    Date of Patent: May 4, 2004
    Assignee: PTS Corporation
    Inventors: Jonathan Michael Harris, Adrian Philip Wise, Nigel Peter Topham
  • Publication number: 20040083351
    Abstract: A processor includes a memory execution unit for executing load and store instructions and a replay system for replaying instructions which have not executed properly. The memory execution unit including an invalid store flag that is set for a store instruction if the replay system detects that the store instruction has not executed properly and is cleared if the store instruction has executed properly. If an invalid store flag is set for a store instruction, the replay system replays load instructions which are programmatically younger than the invalid store instruction until the store instruction executes properly.
    Type: Application
    Filed: October 23, 2003
    Publication date: April 29, 2004
    Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
  • Patent number: 6728869
    Abstract: A method and apparatus for avoiding latency in a processing system that includes a memory for storing intermediate results is presented. The processing system stores results produced by an operation unit in memory, where the results may be used by subsequent dependent operations. In order to avoid the latency of the memory, the output for the operation unit may be routed directly back into the operation unit as a subsequent operand. Furthermore, one or more memory bypass registers are included such that the results produced by the operation unit during recent operations that have not yet satisfied the latency requirements of the memory are also available. A first memory bypass register may thus provide the result of an operation that completed one cycle earlier, a second memory bypass register may provide the result of an operation that completed two cycles earlier, etc.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: April 27, 2004
    Assignee: ATI International Srl
    Inventors: Michael Andrew Mang, Michael Mantor, Robert Scott Hartog
  • Patent number: 6728865
    Abstract: Instructions asserted in a microprocessors instruction pipeline (3) are accompanied by control information, comprising a group of bits, asserted within a control information pipeline (5) that is synchronized to the instruction pipeline. At the execution stage, the control information is interpreted and appropriate action taken. The control information may indicate that the instruction has been reasserted (asserted again following an initial assertion) and may also indicate the number of times that the instruction has been consecutively asserted in the instruction pipeline. Applied to unaligned memory operations, in which a memory atom is asserted twice, the control information indicates which part of the unaligned data is to be fetched each time the atom is executed.
    Type: Grant
    Filed: October 20, 1999
    Date of Patent: April 27, 2004
    Assignee: Transmeta Corporation
    Inventors: Brett Coon, Godfrey D'Souza, Paul Serris
  • Patent number: 6728873
    Abstract: Disclosed is a method of operation within a processor, that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions are discarded when the speculative execution path is determined to be incorrect.
    Type: Grant
    Filed: June 6, 2000
    Date of Patent: April 27, 2004
    Assignee: International Business Machines Corporation
    Inventors: Guy Lynn Guthrie, Ravi Kumar Arimilli, John Steven Dodson, Derek Edward Williams
  • Patent number: 6728868
    Abstract: The present invention generally relates to a processing system and method for coalescing instruction data to efficiently detect data hazards between instructions of a computer program. In architecture, the system of the present invention utilizes a plurality of pipelines, coalescing circuitry, and hazard detection circuitry. The plurality of pipelines is configured to process instructions of a computer program, and the coalescing circuitry is configured to receive, from the pipelines, a plurality of register identifiers identifying a plurality of registers. The coalescing circuitry is configured to coalesce said register identifiers thereby generating a coalesced register identifier identifying each of said plurality of registers. The hazard detection circuitry is configured to receive the coalesced register identifier and to perform a comparison of the coalesced register identifier with other information received from the pipelines.
    Type: Grant
    Filed: October 28, 2002
    Date of Patent: April 27, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Ronny Lee Arnold, Donald Charles Soltis, Jr.
  • Publication number: 20040078728
    Abstract: One embodiment of the present invention provides a system that corrects bit errors in temporary results within a central processing unit (CPU). During operation, the system receives a temporary result during execution of an in-flight instruction. Next, the system generates a parity bit for the temporary result, and stores the temporary result and the parity bit in a temporary register within the CPU. Before the temporary result is committed to the architectural state of the CPU, the system checks the temporary result and the parity bit to detect a bit error. If a bit error is detected, the system performs a micro-trap operation to re-execute the instruction that generated the temporary result, thereby regenerating the temporary result. Otherwise, if a bit error is not detected, the system commits the temporary result to the architectural state of the CPU.
    Type: Application
    Filed: May 14, 2002
    Publication date: April 22, 2004
    Inventors: Marc Tremblay, Shailender Chaudhry, Quinn A. Jacobson
  • Patent number: 6721874
    Abstract: A method and system for utilizing a completion table in a superscalar processor is disclosed. The method and system comprises providing a plurality of threads to the processor and associating a link list with each of the threads, wherein each entry associated with a thread is linked to a next entry. A method and system in accordance with the present invention implements the completion table as link lists. Each entry in the completion table in a thread is linked to the next entry via a pointer that is stored in a link list. In a second aspect a method of determining the relative order between instructions is provided. A method and system in accordance with the present invention implements a flush mask array which is accessed to determine the relative order of entries in the said completion table. A method and system in accordance with the present invention implements a restore head pointer table to save and restore the state of the pointer of said completion table.
    Type: Grant
    Filed: October 12, 2000
    Date of Patent: April 13, 2004
    Assignee: International Business Machines Corporation
    Inventors: Hung Qui Le, Peichun Liu, Balaram Sinharoy
  • Publication number: 20040064677
    Abstract: A data processor includes program registers with individual byte-location write enables. Bypass networks allow a precision pipeline to respond to read requests by accessing a program register or pipeline stage on a byte-by-byte basis. The data processor can thus write to individual byte locations without overwriting other byte locations within the same register. The data processor has an instruction set with instructions that combine two operands and yield a one-byte result that is stored in a specified byte location of a specified result register. Eight instances of this instruction can pack eight results into a single 64-bit result register without additional packing instructions and without using a read port to read the result register before writing to it. As plural functional units can write concurrently to different subwords of the same result register, a system with four functional units can pack eight results into a result register in two instruction cycles.
    Type: Application
    Filed: September 30, 2002
    Publication date: April 1, 2004
    Inventor: Dale Morris
  • Publication number: 20040064680
    Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
    Type: Application
    Filed: September 26, 2002
    Publication date: April 1, 2004
    Inventors: Sudarshan Kadambi, Adam R. Talcott, Wayne I. Yamamoto
  • Publication number: 20040054875
    Abstract: A method for executing an instruction with a semi-fast operation in a staggered ALU. The method of one embodiment comprises generating a first operation and a second operation from a micro-instruction. The first and second operations are scheduled for execution in a staggered arithmetic logic unit (ALU). The first and second operations are separated by N clock cycles. Data from the first operation is communicated to the second operation for use with execution of the second operation.
    Type: Application
    Filed: September 13, 2002
    Publication date: March 18, 2004
    Inventor: Ross A. Segelken
  • Publication number: 20040054876
    Abstract: The present invention provides an apparatus and method for synchronizing a first pipeline and a second pipeline of a processor arranged to execute a sequence of instructions. The processor is arranged to route an instruction in the sequence through either the first or the second pipeline dependent on predetermined criteria, each pipeline having a plurality of pipeline stages including a retirement stage. Counter logic is provided for maintaining a first counter relating to the first pipeline and a second counter relating to the second pipeline. For each instruction in the first pipeline a determination is made as to when that instruction reaches a point within the first pipeline where an exception status of that instruction is resolved, and the counter logic is arranged to increment the first counter responsive to such determination.
    Type: Application
    Filed: September 13, 2002
    Publication date: March 18, 2004
    Inventors: Richard Roy Grisenthwaite, Ian Victor Devereux
  • Patent number: 6708269
    Abstract: In a multi-threaded system, such as in a multi-processor system, different types of fences are provided to force completion of programmatically earlier instructions in a program. The types of fences can be thread-specific, and different types of fences are used based on different kinds of conditions, instructions, operations, or memory types. When a fence is executed, senior stores, request buffers, bus queues, or any combination of these stages in an execution pipeline can be drained. Fetches at a front end of the pipeline can also be killed to ensure that the bus queue can be drained.
    Type: Grant
    Filed: December 30, 1999
    Date of Patent: March 16, 2004
    Assignee: Intel Corporation
    Inventors: Keshavan K. Tiruvallur, Douglas M. Carmean, Robert J. Greiner, Muntaquim Chowdhury, Madhavan Parthasarathy
  • Patent number: 6704856
    Abstract: A method of compacting an instruction queue in an out of order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing, each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages.
    Type: Grant
    Filed: December 17, 1999
    Date of Patent: March 9, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: James A. Farrell, Timothy C. Fischer, Daniel L. Leibholz, Bruce A. Gieseke
  • Publication number: 20040044881
    Abstract: In an embodiment, the present invention describes a method and apparatus for detecting RAW condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation of all ‘load’ instructions against ‘store’ instructions within a group of fetched instructions and ‘store’ instructions previously stored in the SBB. If a match of instruction fields is found, the IDU ‘speculates’ that the load instruction has dependency on the ‘store’ instruction. A data cache unit (DCU) validates the dependency of the load instruction ‘speculated’ by the IDU. If a false dependency is ‘speculated’ by the IDU, the DCU forces a re-fetch of the load instruction.
    Type: Application
    Filed: August 28, 2002
    Publication date: March 4, 2004
    Applicant: Sun Microsystems, Inc.
    Inventors: Robert M. Maier, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls, Ali Vahidsafa, Chandra M. R. Thimmannagari
  • Publication number: 20040044882
    Abstract: A multi-port register file may be selectively bypassed such that any element in a result vector is bypassed to the same index of an input vector of a succeeding operation when the element is requested in the succeeding operation in the same index as it was generated. Alternatively, the results to be placed in a register file may be bypassed to a succeeding operation when the N elements that dynamically compose a vector are requested as inputs to the next operation exactly in the same order as they were generated. That is, for the purposes of bypassing, the N vector elements are treated as a single entity. Similar rules apply for the write-through path.
    Type: Application
    Filed: August 29, 2002
    Publication date: March 4, 2004
    Applicant: International Business Machines Corporation
    Inventors: Sameh Asaad, Jaime H. Moreno, Victor Zyuban
  • Patent number: 6701425
    Abstract: A computer system with parallel execution pipelines and a memory access controller has store address queues holding addresses for store operations, store data queues holding a plurality of data for storing in the memory and load address storage holding addresses for load operations, said access controller including comparator circuitry to compare load addresses received by the controller with addresses in the store address queue and locate any addresses which are the same, each of said addresses including a first set of bits representing a word address together with a second set of byte enable bits and said comparator having circuitry to compare the byte enable bits of two addresses as well as said first set of bits.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: March 2, 2004
    Assignee: STMicroelectronics S.A.
    Inventors: Ahmed Dabbagh, Nicolas Grossier, Bruno Bernard, Pierre-Yves Taloud
  • Patent number: 6701427
    Abstract: A data processing apparatus for processing floating point instructions is responsive to a floating point instruction to apply a floating point operation to a number of operands to produce a final result, result data being generated during a predetermined pipelined stage with further processing then being performed on the result data in one or more subsequent pipelined stages to generate the final result. Exception determination logic determines whether an exception may occur during application of the floating point operation to the operands, and to prevent the execution unit applying the floating point operation to those operands if it is determined that an exception may occur. The exception determination logic is arranged to use at least some of the predetermined control data to compensate for differences between the forwarded result data and the final result relevant when determining whether an exception may occur when processing the second floating point instruction.
    Type: Grant
    Filed: December 22, 1999
    Date of Patent: March 2, 2004
    Assignee: ARM Limited
    Inventors: Christopher Neal Hinds, Arun Kumar Varadarajan Rajagopal
  • Publication number: 20040039898
    Abstract: A processor (50) operable in response to an instruction set comprising a plurality of instructions. The processor comprises a functional unit (52) comprising an integer number S of sub-units (541, 542, 543), wherein S is greater than one. Each of the sub-units is operable to execute, during an execution cycle, at least one of the instructions in the instruction set in response to at least two data arguments (A, B). The processor further comprises circuitry (58A1, 58A2, 58A3, 58B1, 58B2) for providing an updated value of the at least two data arguments to less than all S of the sub-units for a single execution cycle.
    Type: Application
    Filed: August 20, 2002
    Publication date: February 26, 2004
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Patrick W. Bosshart
  • Publication number: 20040034762
    Abstract: A mispredicted path side memory is configured to be coupled to a stage in an instruction pipeline. As instructions advance through the pipeline, a result from the stage is stored into the mispredicted path side memory. The result is restored from the mispredicted path side memory into a pipeline stage when a branch is mispredicted.
    Type: Application
    Filed: August 19, 2003
    Publication date: February 19, 2004
    Inventor: Nicolas I. Kacevas
  • Patent number: 6691222
    Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing it wraps around and is sent down the instruction pipeline again.
    Type: Grant
    Filed: March 18, 2003
    Date of Patent: February 10, 2004
    Assignee: Intel Corporation
    Inventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
  • Publication number: 20040024993
    Abstract: An apparatus and method for maintaining a floating point data segment selector are described. In one embodiment, the method includes the detection of a micro-operation of a memory referencing macro-instruction from one or more micro-operations to be retired during a system clock cycle. When the detected micro-operation triggers an event, a micro-code event handler is triggered to initiate an update of a floating point data segment selector information associated with the detected micro-operation. Otherwise, FDS update device is triggered to update the floating point data segment selector information associated with the detected micro-operation.
    Type: Application
    Filed: August 5, 2002
    Publication date: February 5, 2004
    Inventor: Rajesh S. Parthasarathy
  • Patent number: 6687809
    Abstract: An apparatus in a first processor includes a first data structure to store addresses of store instruction dispatched during a last predetermined number of cycles. The apparatus further includes logic to determine whether a load address of a load instruction being executed matches one of the store addresses in the first data structure. The apparatus still further includes logic to replay to the respective load instruction if the load address of the respective load instruction matches of the store addresses in the first data structure.
    Type: Grant
    Filed: October 24, 2002
    Date of Patent: February 3, 2004
    Assignee: Intel Corporation
    Inventors: Muntaquim F. Chowdhury, Douglas M. Carmean
  • Publication number: 20040015679
    Abstract: A mechanism is provided for execution of an instruction having one or more parameters that need to be resolved at runtime. Instructions being executed may be stored in non-rewritable storage. The present invention allows costly parameter resolution to be circumvented during subsequent executions of the same instruction. An interpreter invokes an optimization module when it encounters an instruction with one or more associated parameters that need to be resolved at runtime. If the optimization module determines that resolved values associated with the instruction are available in a cache, then optimization module obtains resolved values associated with the instruction from the cache. Resolving parameters into their corresponding object references is time-consuming and utilizes valuable computer resources. By obtaining resolved values stored during a previous execution of an instruction, the optimization module avoids repeatedly resolving parameters associated with an instruction.
    Type: Application
    Filed: July 17, 2002
    Publication date: January 22, 2004
    Inventor: Ioi K. Lam
  • Patent number: 6681320
    Abstract: Causality-based memory ordering in a multiprocessing environment. A disclosed embodiment includes a plurality of processors and arbitration logic coupled to the plurality of processors. The processors and arbitration logic maintain processor consistency yet allow stores generated in a first order by any two or more of the processors to be observed consistent with a different order of stores by at least one of the other processors. Causality monitoring logic coupled to the arbitration logic monitors any causal relationships with respect to observed stores.
    Type: Grant
    Filed: December 29, 1999
    Date of Patent: January 20, 2004
    Assignee: Intel Corporation
    Inventor: Deborah T. Marr
  • Patent number: 6681322
    Abstract: Methods for emulating an instruction set extension, comprising providing data to be operated upon, executing a first instruction with respect to a first portion of the data without committing the results of the first executed instruction, if no unmasked exceptions occur with respect to the first portion of the data, executing a second instruction with respect to a second portion of the data, and if no unmasked exceptions occur with respect to the second portion of the data, committing the results of the second executed instruction and again executing the first instruction with respect to the first portion of the data. If the first instruction is executed again, its results are committed. A handler is invoked if an unmasked exception occurs.
    Type: Grant
    Filed: November 26, 1999
    Date of Patent: January 20, 2004
    Assignee: Hewlett-Packard Development Company L.P.
    Inventors: Kevin David Safford, Patrick Knebel
  • Publication number: 20040006684
    Abstract: An instruction execution apparatus comprising a register 43 for storing a copy of contents of the maximum number of entries that are executable simultaneously in one cycle with the entry storing the oldest unreleased instruction at the head among all entries in an instruction storage device 42 after execution of the instructions, a completion condition determination section 44 for determining whether the instructions stored in the entries of the register are completed in the cycle for determining completion conditions of the entries in the instruction storage device, and an entry release section 45 for releasing only the entries that are determined to be completed by the completion condition determination section among all entries in the instruction storage device, which allows the entries in the CSE to be released smoothly even though the number of entries in the CSE, or clock frequency, is increased.
    Type: Application
    Filed: December 31, 2002
    Publication date: January 8, 2004
    Applicant: FUJITSU LIMITED
    Inventors: Yasunobu Akizuki, Aiichiro Inoue
  • Publication number: 20040006686
    Abstract: A latest register update buffer which stores latest register update data is allocated and prepared every general register for storing source data. A latest register update processing unit stores a value in the general register as latest register update data into the latest register update buffer when a register update instruction is not speculatively executed, and overwrites a result of the speculative execution when the instruction is speculatively executed. Upon instruction decoding, a matching processing unit reads out the latest register update data from the latest register update allocation buffer and stores it into a data area in a reservation station.
    Type: Application
    Filed: January 21, 2003
    Publication date: January 8, 2004
    Applicant: Fujitsu Limited
    Inventor: Toshio Yoshida
  • Publication number: 20040006685
    Abstract: When a predetermined instruction is fetched and decoded, an instruction issuing unit develops the instruction operation into a multiflow of a previous flow and a following flow and issues the instruction by in-order. It is held into a reservation station. An instruction executing unit executes the instruction held in the reservation station by out-of-order. Further, an execution result of the instruction is committed by in-order. A multiflow guarantee processing unit guarantees an execution result of the previous flow stored in an allocation register on a register update buffer until the following flow is committed. Even if the previous flow is committed and the allocation register is released, the guaranteeing process is realized by stalling another instruction serving as a next register allocation destination in a decoding cycle until the following flow is committed.
    Type: Application
    Filed: January 21, 2003
    Publication date: January 8, 2004
    Applicant: Fujitsu Limited
    Inventor: Toshio Yoshida
  • Patent number: 6675288
    Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 6, 2004
    Assignee: Hewlett-Packard Development Company L.P.
    Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
  • Publication number: 20040003207
    Abstract: A program counter control method controls instructions by an out-of-order method using a branch prediction mechanism and controls an architecture having delay instructions for branching. The method includes the steps of simultaneously committing a plurality of instructions including a branch instruction, when a branch prediction is successful and the branch instruction branches, and simultaneously updating a program counter and a next program counter depending on a number of committed instructions.
    Type: Application
    Filed: January 28, 2003
    Publication date: January 1, 2004
    Applicant: FUJITSU LIMITED
    Inventors: Ryuichi Sunayama, Kuniki Morita, Aiichiro Inoue
  • Publication number: 20040003206
    Abstract: A re-configurable, streaming vector processor (100) is provided which includes a number of function units (102), each having one or more inputs for receiving data values and an output for providing a data value, a re-configurable interconnection switch (104) and a micro-sequencer (118). The re-configurable interconnection switch (104) includes one or more links, each link operable to couple an output of a function unit (102) to an input of a function unit (102) as directed by the micro-sequencer (118). The vector processor may also include one or more input-stream units (122) for retrieving data from memory. Each input-stream unit is directed by a host processor and has a defined interface (116) to the host processor. The vector processor also includes one or more output-stream units (124) for writing data to memory or to the host processor. The defined interface of the input-stream and output-stream units forms a first part of the programming model.
    Type: Application
    Filed: June 28, 2002
    Publication date: January 1, 2004
    Inventors: Philip E. May, Kent Donald Moat, Raymond B. Essick, Silviu Chiricescu, Brian Geoffrey Lucas, James M. Norris, Michael Allen Schuette, Ali Saidi
  • Publication number: 20030229772
    Abstract: A register window fill technique for a retirement window having an entry size less than a number of fill instructions used in a fill condition is provided. The technique uses modified fill instructions that allow the retirement window to retire a portion of the fill instructions without having to determine whether a remaining portion of the fill instructions will execute without exceptions.
    Type: Application
    Filed: June 7, 2002
    Publication date: December 11, 2003
    Inventors: Chandra Thimmanagari, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls
  • Publication number: 20030229771
    Abstract: A register window spill technique for an retirement window having an entry size less than a number of spill instructions used in a spill condition is provided. The technique uses modified spill instructions that allow the retirement window to retire a portion of the spill instructions without having to determine whether a remaining portion of the spill instructions will execute without exceptions.
    Type: Application
    Filed: June 7, 2002
    Publication date: December 11, 2003
    Inventors: Chandra Thimmanagari, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls
  • Publication number: 20030226000
    Abstract: A collapsible pipeline structure, suitable for use in a microprocessor. The contains a first pipeline stage, under control by a clock to export a sequence of instruction stage results with respect to a clock cycle of the clock. A bypassing storage unit receives the sequence of instruction stage results and, when operating in collapsed mode, forwards that sequence onto the subsequent pipeline stage, bypassing the storage unit through a mutiplexer. A second pipeline stage receives the output from the bypassing storage unit, and exports its instruction stage results under control of the clock. Wherein if the collapsing function of the bypassing storage unit is disabled, then the instruction pipeline functions in the conventional manner.
    Type: Application
    Filed: May 30, 2002
    Publication date: December 4, 2003
    Inventor: Mike Rhoades
  • Patent number: 6654876
    Abstract: A method, processor, and data processing system implementing a delayed reject mechanism are disclosed. The processor includes an issue unit suitable for issuing an instruction in a first cycle and a load store unit (LSU). The LSU includes an extend reject calculator circuit configured to receive a set of completion information signals and generate a delay value based thereon. The LSU is adapted to determine whether to reject the instruction in a determination cycle. The number of cycles between the first cycle and the determination cycle is a function of the delay value such that reject timing is variable with respect to the first cycle. In one embodiment, the processor is further configured to reissue the instruction after the determination cycle if the instruction was rejected in the determination cycle. The delay value is conveyed via a 2-bit bus in one embodiment. The 2 bit bus permits delaying the determination cycle from 0 to 3 cycles after a finish cycle.
    Type: Grant
    Filed: November 4, 1999
    Date of Patent: November 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: Hung Qui Le, David James Shippy
  • Patent number: 6654869
    Abstract: A microprocessor includes a fetch unit, an instruction cracking unit, and dispatch and completion control logic. The fetch unit retrieves a set of instructions from an instruction cache. The instruction cracking unit receives the set of fetched instructions and organizes the set of instructions into an instruction group. The dispatch and completion logic assigns a group tag to the instruction group and records the group tag in an entry of the completion table for tracking the completion status of the instructions comprising the instruction group. The dispatch and control logic may record a single instruction address in the completion table entry corresponding to the each instruction group. Preferably, the single instruction address is the instruction address of the first instruction in the instruction group. The processor may flush the instruction group in response to detecting an exception generated by an instruction in the instruction group.
    Type: Grant
    Filed: October 28, 1999
    Date of Patent: November 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: James Allan Kahle, Hung Qui Le, Charles Roberts Moore
  • Patent number: 6643767
    Abstract: The present invention is related to a processor capable of speculatively executing an instruction having a data dependence upon a preceding instruction in order to improve the efficiency of dynamically scheduling instructions. The reissue of instructions is possible without lowering the efficiency of the instruction scheduling process by dividing the function of scheduling instructions and the function of the reissue of instructions.
    Type: Grant
    Filed: January 27, 2000
    Date of Patent: November 4, 2003
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Toshinori Sato
  • Publication number: 20030200422
    Abstract: A parallel processor has a plurality of operation units that execute operation instructions, and a multi-bank register file in which a plurality of banks each having a plurality of registers are formed. Each of simultaneously input machine instructions is split into a plurality of nano-instructions each of which includes at least one of an access instruction and operation instruction. The output clock cycles of operation instructions with respect to the operation units are arbitrated. Furthermore, the output clock cycles of access instructions to the multi-bank register file are arbitrated so as to prevent access instructions from contending in an identical bank in the multi-bank register file.
    Type: Application
    Filed: February 18, 2003
    Publication date: October 23, 2003
    Applicant: Semiconductor Technology Academic Research Center
    Inventors: Tetsuo Hironaka, Mattausch Hans Juergen, Takeshi Hiramatsu
  • Publication number: 20030196075
    Abstract: A memory disambiguation apparatus includes a store queue, a store forwarding buffer, and a version count buffer. The store queue includes an entry for each store instruction in the instruction window of a processor. Some store queue entries include resolved store addresses, and some do not. The store forwarding buffer is a set-associative buffer that has entries allocated for store instructions as store addresses are resolved. Each entry in the store forwarding buffer is allocated into a set determined in part by a subset of the store address. When the set in the store forwarding buffer is full, an older entry in the set is discarded in favor of the newly allocated entry. A version count buffer including an array of overflow indicators is maintained to track overflow occurrences. As load addresses are resolved for load instructions in the instruction window, the set-associative store forwarding buffer can be searched to provide memory disambiguation.
    Type: Application
    Filed: May 15, 2003
    Publication date: October 16, 2003
    Applicant: Intel Corporation
    Inventors: Haitham Akkary, Sehastien Hily
  • Publication number: 20030196074
    Abstract: A method, processor architecture, computer program product, and data processing system for determining when an instruction in a pipelined processor should be completed is provided. As each instruction is issued to an execution unit, an entry for that instruction is placed within a “finish pipe,” which consists of a series of consecutively numbered stages. Each clock cycle, the entries in the finish pipe advance one stage. When an entry has reached the stage corresponding to the latency of its associated execution unit, it becomes mature.
    Type: Application
    Filed: April 11, 2002
    Publication date: October 16, 2003
    Applicant: International Business Machines Corporation
    Inventors: Hung Qui Le, Dung Quoc Nguyen
  • Publication number: 20030196073
    Abstract: An apparatus is presented for expediting the execution of address-dependent micro instructions in a pipeline microprocessor. The apparatus computes a speculative result associated with an arithmetic operation, where the arithmetic operation is prescribed by a preceding micro instruction that is yet to generate a result. The apparatus utilizes the speculative result to configure a speculative address operand that is provided to an address-dependent micro instruction The apparatus includes speculative operand calculation logic and an update forwarding cache. The speculative operand calculation logic performs the arithmetic operation to generate the speculative result prior to when execute logic executes the preceding micro instruction to generate the result.
    Type: Application
    Filed: May 5, 2003
    Publication date: October 16, 2003
    Applicant: IP-First LLC
    Inventor: Gerard M. Col
  • Patent number: 6633971
    Abstract: A method for forwarding data within a pipeline of a pipelined data processor having a plurality of execution pipeline stages where each stage accepts a plurality of operand inputs and generates a result. The result generated by each execution pipeline stage is selectively coupled to an operand input of one of the execution pipeline stages.
    Type: Grant
    Filed: October 1, 1999
    Date of Patent: October 14, 2003
    Assignee: Hitachi, Ltd.
    Inventors: Chih-Jui Peng, Lew Chua-Eoan
  • Patent number: 6633970
    Abstract: A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each physical register can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to the secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions part of a predicted path.
    Type: Grant
    Filed: December 28, 1999
    Date of Patent: October 14, 2003
    Assignee: Intel Corporation
    Inventors: David W. Clift, Darrell D. Boggs, David J. Sager
  • Publication number: 20030188133
    Abstract: A microprocessor apparatus and method are provided, for selectively controlling write back of condition codes. The microprocessor apparatus has translation logic and extended execution logic. The translation logic translates an extended instruction into corresponding micro instructions. The extended instruction includes an extended prefix and an extended prefix tag. The extended prefix disables write back of the condition codes, where the condition codes correspond to a result of a prescribed operation. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within an instruction set for a microprocessor. The extended execution logic is coupled to the translation logic. The extended execution logic receives the corresponding micro instructions, and generates the result, and disables write back of the condition codes.
    Type: Application
    Filed: May 9, 2002
    Publication date: October 2, 2003
    Applicant: IP-First LLC
    Inventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks
  • Patent number: 6629170
    Abstract: A multi-stage byte lane selectable bus. In a preferred embodiment, the bus in performance monitor mode includes a plurality of byte lanes and a selection mechanism. The selection mechanism acquires, from a plurality of signals, a subset of those signals, which are desired to be monitored, and places this subset of signals on the byte lanes that are input to the PMU. The number of the plurality of signals that potentially may be monitored is greater than the number of byte lanes and is also greater than the number of PMU counters.
    Type: Grant
    Filed: November 8, 1999
    Date of Patent: September 30, 2003
    Assignee: International Business Machines Corporation
    Inventors: Joel Roger Davidson, Michael Stephen Floyd, Paul Joseph Jordan, Judith E. K. Laurens, Alexander Erik Mericas, Kevin F. Reick
  • Patent number: 6629167
    Abstract: An apparatus for and a method of decoupling at least two multi-stage pipelines are described. At least two paths of data through which data from the first pipeline is send to the second pipeline are provided. During a pipelined execution of a task in the at least two pipelines, the second pipeline may not require every data produced in the first pipeline to process at least some subset of the task. The first pipeline may not be able to produce all data required by each of the stages of the second pipeline. One of the two data paths provides an early data path for a type of data that becomes available in a stage of the first pipeline and that may be processed in a stage of the second pipeline early in time. The other of the two data paths provides a late data path for a type of data that becomes available in a stage of the first pipeline and that may be processed in a stage of the second pipeline later in time. Each data path may comprise a buffer, e.g., a FIFO.
    Type: Grant
    Filed: February 18, 2000
    Date of Patent: September 30, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Stephen Undy, James E. McCormick, Jr.
  • Patent number: 6629233
    Abstract: A method, processor, and data processing system for enabling maximum instruction issue despite the presence of complex instructions that require multiple rename registers is disclosed. The method includes allocating a first rename register from a first reorder buffer for storing the contents of a first register affected by the complex instruction. A second rename register from a second reorder buffer is then allocated for storing the contents of a second register affected by the complex instruction. In an embodiment in which the first reorder buffer supports a maximum number of allocations per cycle, the allocation of the second register using the second reorder buffer prevents the complex instruction from requiring multiple allocation slots in the first reorder buffer. The method may further include issuing a second instruction that contains a dependency on a register that is allocated in the secondary reorder buffer.
    Type: Grant
    Filed: February 17, 2000
    Date of Patent: September 30, 2003
    Assignee: International Business Machines Corporation
    Inventor: James Allan Kahle
  • Publication number: 20030182540
    Abstract: A method of handling instructions in a load/store unit of a processor by dispatching instructions to the load/store unit, filling a portion of physical entries of a reorder queue with tags corresponding to the instructions while limiting usage of the physical entries of the reorder queue to less than a total number of physical entries, and further dispatching one or more additional instructions to the load/store unit while the filled physical entries in the reorder queue are still full, i.e., still contain tags for uncompleted instructions. The limiting of usage of the physical entries may be selectively applied. Multiple logical instruction tags are assigned in a count greater than the number of physical entries in the reorder queue. Of the multiple logical instruction tags assigned to a single one of the physical entries in the reorder queue, only the tag for the oldest instruction is allowed to execute.
    Type: Application
    Filed: January 30, 2003
    Publication date: September 25, 2003
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy, Albert Thomas Williams
  • Publication number: 20030177337
    Abstract: A computer system comprising a data file having entries each of which is designed to hold data, an advanced and a completed mapping file each having entries each of which is designed to hold a data-file-entry address, an operation window that is a buffer to hold substances of operations waiting execution, and a state-modification queue that is designed to be able to hold a substance of a modification on the advanced mapping file for each clock cycle; wherein making a modification on the advanced mapping file, entering the substance of this modification into the state-modification queue, and entering substances of operations into the operation window are each to be done in one clock cycle, and operations held in the operation window are to be executed out of order. The system can attain high performance easily and utilize programs described in any machine language for traditional register-based/stack-based processors.
    Type: Application
    Filed: February 25, 2003
    Publication date: September 18, 2003
    Inventor: Hajime Seki