Commitment Control Or Register Bypass Patents (Class 712/218)
-
Patent number: 6732251
Abstract: A processor or processor core has register file circuitry having a plurality of physical registers and a plurality of tag storing portions corresponding respectively to the physical registers. Each tag storing portion stores a tag representing a logical register ID allocated to the corresponding physical register. A register selection unit receives a logical register ID and selects the one of the physical registers whose tag matches the received logical register ID. A tag changing unit changes the stored tags so as to change a mapping between at least one logical register ID and one of the physical registers. Such register circuitry permits a mapping between logical register IDs and physical registers to be changed quickly and efficiently and can permit a desired physical register to be selected quickly.
Type: Grant
Filed: November 1, 2001
Date of Patent: May 4, 2004
Assignee: PTS Corporation
Inventors: Jonathan Michael Harris, Adrian Philip Wise, Nigel Peter Topham
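
The tag-matching scheme in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name `TaggedRegisterFile`, the method names, and the choice of a tag swap as the remapping operation are all hypothetical.

```python
class TaggedRegisterFile:
    """Physical registers, each paired with a tag holding a logical register ID."""

    def __init__(self, num_physical):
        self.values = [0] * num_physical
        # tags[i] is the logical register ID currently allocated to physical register i.
        self.tags = list(range(num_physical))

    def select(self, logical_id):
        """Return the index of the physical register whose tag matches logical_id."""
        return self.tags.index(logical_id)

    def swap_mapping(self, logical_a, logical_b):
        """Remap by rewriting tags only (assumed operation); register data does not move."""
        pa, pb = self.select(logical_a), self.select(logical_b)
        self.tags[pa], self.tags[pb] = self.tags[pb], self.tags[pa]


rf = TaggedRegisterFile(4)
rf.values[rf.select(0)] = 11
rf.values[rf.select(1)] = 22
rf.swap_mapping(0, 1)  # logical 0 now names the register that holds 22
```

Only the tags change in `swap_mapping`, which is the point of the design: the mapping is altered without copying register contents.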
-
Publication number: 20040083351
Abstract: A processor includes a memory execution unit for executing load and store instructions and a replay system for replaying instructions which have not executed properly. The memory execution unit includes an invalid store flag that is set for a store instruction if the replay system detects that the store instruction has not executed properly and is cleared if the store instruction has executed properly. If an invalid store flag is set for a store instruction, the replay system replays load instructions which are programmatically younger than the invalid store instruction until the store instruction executes properly.
Type: Application
Filed: October 23, 2003
Publication date: April 29, 2004
Inventors: Amit A. Merchant, Darrell D. Boggs, David J. Sager
-
Patent number: 6728869
Abstract: A method and apparatus for avoiding latency in a processing system that includes a memory for storing intermediate results is presented. The processing system stores results produced by an operation unit in memory, where the results may be used by subsequent dependent operations. In order to avoid the latency of the memory, the output of the operation unit may be routed directly back into the operation unit as a subsequent operand. Furthermore, one or more memory bypass registers are included such that the results produced by the operation unit during recent operations that have not yet satisfied the latency requirements of the memory are also available. A first memory bypass register may thus provide the result of an operation that completed one cycle earlier, a second memory bypass register may provide the result of an operation that completed two cycles earlier, etc.
Type: Grant
Filed: April 21, 2000
Date of Patent: April 27, 2004
Assignee: ATI International Srl
Inventors: Michael Andrew Mang, Michael Mantor, Robert Scott Hartog
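
The chain of memory bypass registers described here behaves like a short shift register in front of the memory. The following is a minimal sketch, assuming a fixed memory latency of `depth` cycles and one completed result per cycle; `BypassChain` and its methods are hypothetical names, not taken from the patent.

```python
from collections import deque


class BypassChain:
    """Holds recent results until the (assumed) memory latency has elapsed."""

    def __init__(self, depth):
        self.depth = depth      # number of bypass registers
        self.recent = deque()   # newest first: index 0 completed one cycle ago
        self.memory = {}        # intermediate-result memory

    def complete(self, dest, value):
        """Record a result; the oldest bypass entry drains into memory."""
        self.recent.appendleft((dest, value))
        if len(self.recent) > self.depth:
            old_dest, old_value = self.recent.pop()
            self.memory[old_dest] = old_value

    def read(self, reg):
        """Prefer the newest matching bypass register, then fall back to memory."""
        for dest, value in self.recent:
            if dest == reg:
                return value
        return self.memory[reg]


chain = BypassChain(depth=2)
chain.complete("r1", 5)
chain.complete("r2", 7)
chain.complete("r1", 9)   # the first r1 result has now drained to memory
```

Searching the bypass registers newest-first ensures a dependent operation sees the latest value of a register even while an older value is still in flight to memory.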
-
Patent number: 6728865
Abstract: Instructions asserted in a microprocessor's instruction pipeline (3) are accompanied by control information, comprising a group of bits, asserted within a control information pipeline (5) that is synchronized to the instruction pipeline. At the execution stage, the control information is interpreted and appropriate action taken. The control information may indicate that the instruction has been reasserted (asserted again following an initial assertion) and may also indicate the number of times that the instruction has been consecutively asserted in the instruction pipeline. Applied to unaligned memory operations, in which a memory atom is asserted twice, the control information indicates which part of the unaligned data is to be fetched each time the atom is executed.
Type: Grant
Filed: October 20, 1999
Date of Patent: April 27, 2004
Assignee: Transmeta Corporation
Inventors: Brett Coon, Godfrey D'Souza, Paul Serris
-
Patent number: 6728873
Abstract: Disclosed is a method of operation within a processor that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions, are discarded when the speculative execution path is determined to be incorrect.
Type: Grant
Filed: June 6, 2000
Date of Patent: April 27, 2004
Assignee: International Business Machines Corporation
Inventors: Guy Lynn Guthrie, Ravi Kumar Arimilli, John Steven Dodson, Derek Edward Williams
-
Patent number: 6728868
Abstract: The present invention generally relates to a processing system and method for coalescing instruction data to efficiently detect data hazards between instructions of a computer program. In architecture, the system of the present invention utilizes a plurality of pipelines, coalescing circuitry, and hazard detection circuitry. The plurality of pipelines is configured to process instructions of a computer program, and the coalescing circuitry is configured to receive, from the pipelines, a plurality of register identifiers identifying a plurality of registers. The coalescing circuitry is configured to coalesce said register identifiers thereby generating a coalesced register identifier identifying each of said plurality of registers. The hazard detection circuitry is configured to receive the coalesced register identifier and to perform a comparison of the coalesced register identifier with other information received from the pipelines.
Type: Grant
Filed: October 28, 2002
Date of Patent: April 27, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Ronny Lee Arnold, Donald Charles Soltis, Jr.
-
Publication number: 20040078728
Abstract: One embodiment of the present invention provides a system that corrects bit errors in temporary results within a central processing unit (CPU). During operation, the system receives a temporary result during execution of an in-flight instruction. Next, the system generates a parity bit for the temporary result, and stores the temporary result and the parity bit in a temporary register within the CPU. Before the temporary result is committed to the architectural state of the CPU, the system checks the temporary result and the parity bit to detect a bit error. If a bit error is detected, the system performs a micro-trap operation to re-execute the instruction that generated the temporary result, thereby regenerating the temporary result. Otherwise, if a bit error is not detected, the system commits the temporary result to the architectural state of the CPU.
Type: Application
Filed: May 14, 2002
Publication date: April 22, 2004
Inventors: Marc Tremblay, Shailender Chaudhry, Quinn A. Jacobson
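
The check-before-commit flow in this abstract can be modeled in a few lines. This is a hedged sketch only: the function names, the `corrupt` fault-injection hook, the retry limit, and the use of even parity are assumptions made for illustration, and the re-execution loop merely stands in for the micro-trap the abstract mentions.

```python
def parity(value):
    """Parity bit of a non-negative integer: 1 if the count of set bits is odd."""
    return bin(value).count("1") & 1


def commit_with_retry(execute, corrupt=None, max_retries=3):
    """Run `execute`, store result plus parity, verify before commit, re-execute on error.

    `corrupt(value, attempt)` is a hypothetical hook that simulates a bit flip
    in the temporary register. Returns (committed_value, retries_used).
    """
    for attempt in range(max_retries + 1):
        result = execute()
        stored_value, stored_parity = result, parity(result)
        if corrupt is not None:
            stored_value = corrupt(stored_value, attempt)
        if parity(stored_value) == stored_parity:
            return stored_value, attempt   # no error detected: commit
        # Error detected: loop again, which models re-executing the instruction.
    raise RuntimeError("bit error persisted after retries")


def flip_on_first(value, attempt):
    """Flip the low bit only on the first attempt."""
    return value ^ 1 if attempt == 0 else value


committed, retries = commit_with_retry(lambda: 6 + 7, corrupt=flip_on_first)
```

A single parity bit detects any odd number of flipped bits but cannot locate them, which is why the scheme regenerates the result instead of repairing it.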
-
Patent number: 6721874
Abstract: A method and system for utilizing a completion table in a superscalar processor is disclosed. The method and system comprises providing a plurality of threads to the processor and associating a link list with each of the threads, wherein each entry associated with a thread is linked to a next entry. A method and system in accordance with the present invention implements the completion table as link lists. Each entry in the completion table in a thread is linked to the next entry via a pointer that is stored in a link list. In a second aspect, a method of determining the relative order between instructions is provided. A method and system in accordance with the present invention implements a flush mask array which is accessed to determine the relative order of entries in the completion table. A method and system in accordance with the present invention implements a restore head pointer table to save and restore the state of the pointer of the completion table.
Type: Grant
Filed: October 12, 2000
Date of Patent: April 13, 2004
Assignee: International Business Machines Corporation
Inventors: Hung Qui Le, Peichun Liu, Balaram Sinharoy
-
Publication number: 20040064677
Abstract: A data processor includes program registers with individual byte-location write enables. Bypass networks allow a precision pipeline to respond to read requests by accessing a program register or pipeline stage on a byte-by-byte basis. The data processor can thus write to individual byte locations without overwriting other byte locations within the same register. The data processor has an instruction set with instructions that combine two operands and yield a one-byte result that is stored in a specified byte location of a specified result register. Eight instances of this instruction can pack eight results into a single 64-bit result register without additional packing instructions and without using a read port to read the result register before writing to it. As plural functional units can write concurrently to different subwords of the same result register, a system with four functional units can pack eight results into a result register in two instruction cycles.
Type: Application
Filed: September 30, 2002
Publication date: April 1, 2004
Inventor: Dale Morris
-
Publication number: 20040064680
Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
Type: Application
Filed: September 26, 2002
Publication date: April 1, 2004
Inventors: Sudarshan Kadambi, Adam R. Talcott, Wayne I. Yamamoto
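
The hit/miss/reissue behavior of the register pane resembles a small cache in front of the register file. The sketch below assumes a first-in, first-out replacement policy and a pane at least as large as the number of operands in one instruction; neither detail comes from the abstract, and the names `RegisterPane` and `issue` are hypothetical.

```python
class RegisterPane:
    """Small, fast copy of a subset of the register file (illustrative model)."""

    def __init__(self, register_file, capacity):
        self.register_file = register_file
        self.capacity = capacity
        self.pane = {}  # dicts keep insertion order, so the first key is the oldest

    def issue(self, operands):
        """Return (operand_values, number_of_times_the_instruction_issued)."""
        issues = 1
        missing = [r for r in operands if r not in self.pane]
        if missing:
            for reg in missing:
                if len(self.pane) >= self.capacity:
                    # Evict the oldest entry that this instruction does not need.
                    victim = next(r for r in self.pane if r not in operands)
                    del self.pane[victim]
                self.pane[reg] = self.register_file[reg]
            issues += 1  # reissue so that the operands come from the pane
        return [self.pane[r] for r in operands], issues


regs = {"r1": 10, "r2": 20, "r3": 30}
pane = RegisterPane(regs, capacity=2)
miss = pane.issue(["r1", "r2"])    # both operands miss, so the instruction reissues
hit = pane.issue(["r1", "r2"])     # both operands hit
evict = pane.issue(["r2", "r3"])   # r3 misses and displaces r1
```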
-
Publication number: 20040054875
Abstract: A method for executing an instruction with a semi-fast operation in a staggered ALU. The method of one embodiment comprises generating a first operation and a second operation from a micro-instruction. The first and second operations are scheduled for execution in a staggered arithmetic logic unit (ALU). The first and second operations are separated by N clock cycles. Data from the first operation is communicated to the second operation for use with execution of the second operation.
Type: Application
Filed: September 13, 2002
Publication date: March 18, 2004
Inventor: Ross A. Segelken
-
Publication number: 20040054876
Abstract: The present invention provides an apparatus and method for synchronizing a first pipeline and a second pipeline of a processor arranged to execute a sequence of instructions. The processor is arranged to route an instruction in the sequence through either the first or the second pipeline dependent on predetermined criteria, each pipeline having a plurality of pipeline stages including a retirement stage. Counter logic is provided for maintaining a first counter relating to the first pipeline and a second counter relating to the second pipeline. For each instruction in the first pipeline a determination is made as to when that instruction reaches a point within the first pipeline where an exception status of that instruction is resolved, and the counter logic is arranged to increment the first counter responsive to such determination.
Type: Application
Filed: September 13, 2002
Publication date: March 18, 2004
Inventors: Richard Roy Grisenthwaite, Ian Victor Devereux
-
Patent number: 6708269
Abstract: In a multi-threaded system, such as in a multi-processor system, different types of fences are provided to force completion of programmatically earlier instructions in a program. The types of fences can be thread-specific, and different types of fences are used based on different kinds of conditions, instructions, operations, or memory types. When a fence is executed, senior stores, request buffers, bus queues, or any combination of these stages in an execution pipeline can be drained. Fetches at a front end of the pipeline can also be killed to ensure that the bus queue can be drained.
Type: Grant
Filed: December 30, 1999
Date of Patent: March 16, 2004
Assignee: Intel Corporation
Inventors: Keshavan K. Tiruvallur, Douglas M. Carmean, Robert J. Greiner, Muntaquim Chowdhury, Madhavan Parthasarathy
-
Patent number: 6704856
Abstract: A method of compacting an instruction queue in an out-of-order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages.
Type: Grant
Filed: December 17, 1999
Date of Patent: March 9, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: James A. Farrell, Timothy C. Fischer, Daniel L. Leibholz, Bruce A. Gieseke
-
Publication number: 20040044881
Abstract: In an embodiment, the present invention describes a method and apparatus for detecting a read-after-write (RAW) condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation of all ‘load’ instructions against ‘store’ instructions within a group of fetched instructions and ‘store’ instructions previously stored in the SBB. If a match of instruction fields is found, the IDU ‘speculates’ that the load instruction has a dependency on the ‘store’ instruction. A data cache unit (DCU) validates the dependency of the load instruction ‘speculated’ by the IDU. If a false dependency is ‘speculated’ by the IDU, the DCU forces a re-fetch of the load instruction.
Type: Application
Filed: August 28, 2002
Publication date: March 4, 2004
Applicant: Sun Microsystems, Inc.
Inventors: Robert M. Maier, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls, Ali Vahidsafa, Chandra M. R. Thimmannagari
-
Publication number: 20040044882
Abstract: A multi-port register file may be selectively bypassed such that any element in a result vector is bypassed to the same index of an input vector of a succeeding operation when the element is requested in the succeeding operation in the same index as it was generated. Alternatively, the results to be placed in a register file may be bypassed to a succeeding operation when the N elements that dynamically compose a vector are requested as inputs to the next operation exactly in the same order as they were generated. That is, for the purposes of bypassing, the N vector elements are treated as a single entity. Similar rules apply for the write-through path.
Type: Application
Filed: August 29, 2002
Publication date: March 4, 2004
Applicant: International Business Machines Corporation
Inventors: Sameh Asaad, Jaime H. Moreno, Victor Zyuban
-
Patent number: 6701425
Abstract: A computer system with parallel execution pipelines and a memory access controller has store address queues holding addresses for store operations, store data queues holding a plurality of data for storing in the memory and load address storage holding addresses for load operations, said access controller including comparator circuitry to compare load addresses received by the controller with addresses in the store address queue and locate any addresses which are the same, each of said addresses including a first set of bits representing a word address together with a second set of byte enable bits and said comparator having circuitry to compare the byte enable bits of two addresses as well as said first set of bits.
Type: Grant
Filed: May 2, 2000
Date of Patent: March 2, 2004
Assignee: STMicroelectronics S.A.
Inventors: Ahmed Dabbagh, Nicolas Grossier, Bruno Bernard, Pierre-Yves Taloud
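
One plausible reading of the two-part comparison is that a load conflicts with a queued store only when the word addresses are equal and the byte-enable masks overlap. The abstract says only that both sets of bits are compared, so the overlap rule below is an assumption, as are the function names and the `(word_address, byte_enables)` tuple layout.

```python
def addresses_conflict(load, store):
    """Each address is a (word_address, byte_enable_mask) pair.

    Assumed rule: same word AND at least one enabled byte in common.
    """
    load_word, load_enables = load
    store_word, store_enables = store
    return load_word == store_word and (load_enables & store_enables) != 0


def find_conflicts(load, store_address_queue):
    """Return the queue positions of stores that the load depends on."""
    return [i for i, store in enumerate(store_address_queue)
            if addresses_conflict(load, store)]


store_queue = [
    (0x100, 0b0011),   # same word as the load below, but different bytes
    (0x100, 0b1100),   # same word, overlapping byte 2
    (0x104, 0b1111),   # different word
]
conflicts = find_conflicts((0x100, 0b0100), store_queue)
```

Comparing byte enables as well as the word address avoids treating a load and a store to different bytes of the same word as dependent.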
-
Patent number: 6701427
Abstract: A data processing apparatus for processing floating point instructions is responsive to a floating point instruction to apply a floating point operation to a number of operands to produce a final result, result data being generated during a predetermined pipelined stage with further processing then being performed on the result data in one or more subsequent pipelined stages to generate the final result. Exception determination logic determines whether an exception may occur during application of the floating point operation to the operands, and prevents the execution unit from applying the floating point operation to those operands if it is determined that an exception may occur. The exception determination logic is arranged to use at least some of the predetermined control data to compensate for differences between the forwarded result data and the final result relevant when determining whether an exception may occur when processing the second floating point instruction.
Type: Grant
Filed: December 22, 1999
Date of Patent: March 2, 2004
Assignee: ARM Limited
Inventors: Christopher Neal Hinds, Arun Kumar Varadarajan Rajagopal
-
Publication number: 20040039898
Abstract: A processor (50) operable in response to an instruction set comprising a plurality of instructions. The processor comprises a functional unit (52) comprising an integer number S of sub-units (541, 542, 543), wherein S is greater than one. Each of the sub-units is operable to execute, during an execution cycle, at least one of the instructions in the instruction set in response to at least two data arguments (A, B). The processor further comprises circuitry (58A1, 58A2, 58A3, 58B1, 58B2) for providing an updated value of the at least two data arguments to less than all S of the sub-units for a single execution cycle.
Type: Application
Filed: August 20, 2002
Publication date: February 26, 2004
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Patrick W. Bosshart
-
Publication number: 20040034762
Abstract: A mispredicted path side memory is configured to be coupled to a stage in an instruction pipeline. As instructions advance through the pipeline, a result from the stage is stored into the mispredicted path side memory. The result is restored from the mispredicted path side memory into a pipeline stage when a branch is mispredicted.
Type: Application
Filed: August 19, 2003
Publication date: February 19, 2004
Inventor: Nicolas I. Kacevas
-
Patent number: 6691222
Abstract: A system and method of executing instructions within a counterflow pipeline processor. The counterflow pipeline processor includes an instruction pipeline, a data pipeline, a reorder buffer and a plurality of execution units. An instruction and one or more operands issue into the instruction pipeline and a determination is made at one of the execution units whether the instruction is ready for execution. If so, the operands are loaded into the execution unit and the instruction executes. The execution unit is monitored for a result and, when the result arrives, it is stored into the result pipeline. If the instruction reaches the end of the pipeline without executing, it wraps around and is sent down the instruction pipeline again.
Type: Grant
Filed: March 18, 2003
Date of Patent: February 10, 2004
Assignee: Intel Corporation
Inventors: Kenneth J. Janik, Shih-Lien L. Lu, Michael F. Miller
-
Publication number: 20040024993
Abstract: An apparatus and method for maintaining a floating point data segment selector are described. In one embodiment, the method includes the detection of a micro-operation of a memory referencing macro-instruction from one or more micro-operations to be retired during a system clock cycle. When the detected micro-operation triggers an event, a micro-code event handler is triggered to initiate an update of floating point data segment selector information associated with the detected micro-operation. Otherwise, an FDS update device is triggered to update the floating point data segment selector information associated with the detected micro-operation.
Type: Application
Filed: August 5, 2002
Publication date: February 5, 2004
Inventor: Rajesh S. Parthasarathy
-
Patent number: 6687809
Abstract: An apparatus in a first processor includes a first data structure to store addresses of store instructions dispatched during a last predetermined number of cycles. The apparatus further includes logic to determine whether a load address of a load instruction being executed matches one of the store addresses in the first data structure. The apparatus still further includes logic to replay the respective load instruction if the load address of the respective load instruction matches one of the store addresses in the first data structure.
Type: Grant
Filed: October 24, 2002
Date of Patent: February 3, 2004
Assignee: Intel Corporation
Inventors: Muntaquim F. Chowdhury, Douglas M. Carmean
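
The "last predetermined number of cycles" window can be pictured as a fixed-length queue with one slot per cycle. This sketch is illustrative only; `RecentStoreTracker`, `tick`, and `must_replay` are hypothetical names, and exact address equality is assumed as the match condition.

```python
from collections import deque


class RecentStoreTracker:
    """Remembers store addresses dispatched in the last `window_cycles` cycles."""

    def __init__(self, window_cycles):
        # A bounded deque silently discards the oldest cycle when a new one arrives.
        self.window = deque(maxlen=window_cycles)

    def tick(self, store_addresses):
        """Advance one cycle, recording the stores dispatched during it."""
        self.window.append(set(store_addresses))

    def must_replay(self, load_address):
        """True if the load address matches any store still inside the window."""
        return any(load_address in cycle for cycle in self.window)


tracker = RecentStoreTracker(window_cycles=2)
tracker.tick([0x40])
tracker.tick([])
replay_inside_window = tracker.must_replay(0x40)   # store is one cycle old
tracker.tick([0x80])                               # the 0x40 store ages out
replay_after_aging = tracker.must_replay(0x40)
```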
-
Publication number: 20040015679
Abstract: A mechanism is provided for execution of an instruction having one or more parameters that need to be resolved at runtime. Instructions being executed may be stored in non-rewritable storage. The present invention allows costly parameter resolution to be circumvented during subsequent executions of the same instruction. An interpreter invokes an optimization module when it encounters an instruction with one or more associated parameters that need to be resolved at runtime. If the optimization module determines that resolved values associated with the instruction are available in a cache, then the optimization module obtains resolved values associated with the instruction from the cache. Resolving parameters into their corresponding object references is time-consuming and utilizes valuable computer resources. By obtaining resolved values stored during a previous execution of an instruction, the optimization module avoids repeatedly resolving parameters associated with an instruction.
Type: Application
Filed: July 17, 2002
Publication date: January 22, 2004
Inventor: Ioi K. Lam
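
Because the instructions may sit in non-rewritable storage, the resolved values have to live in a side cache rather than being patched into the code. A minimal sketch of that idea follows, assuming the cache is keyed by instruction address; the class name, the `resolve` callback, and the sample parameter strings are hypothetical.

```python
class CachingInterpreter:
    """Caches resolved parameters beside, not inside, the instruction store."""

    def __init__(self, resolve):
        self.resolve = resolve    # hypothetical costly symbolic-to-reference resolver
        self.cache = {}           # instruction address -> resolved value
        self.resolve_calls = 0    # counts how often the costly path runs

    def operand_for(self, address, symbolic_param):
        if address not in self.cache:
            self.resolve_calls += 1
            self.cache[address] = self.resolve(symbolic_param)
        return self.cache[address]


# A tuple stands in for non-rewritable storage: the code itself is never modified.
code = ((0, "Account.balance"), (1, "Account.owner"))
interp = CachingInterpreter(lambda name: f"<ref {name}>")
first_pass = [interp.operand_for(addr, param) for addr, param in code]
second_pass = [interp.operand_for(addr, param) for addr, param in code]
```

The second pass over the same instructions is served entirely from the cache, so the resolver runs once per instruction rather than once per execution.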
-
Patent number: 6681320
Abstract: Causality-based memory ordering in a multiprocessing environment. A disclosed embodiment includes a plurality of processors and arbitration logic coupled to the plurality of processors. The processors and arbitration logic maintain processor consistency yet allow stores generated in a first order by any two or more of the processors to be observed consistent with a different order of stores by at least one of the other processors. Causality monitoring logic coupled to the arbitration logic monitors any causal relationships with respect to observed stores.
Type: Grant
Filed: December 29, 1999
Date of Patent: January 20, 2004
Assignee: Intel Corporation
Inventor: Deborah T. Marr
-
Patent number: 6681322
Abstract: Methods for emulating an instruction set extension, comprising providing data to be operated upon, executing a first instruction with respect to a first portion of the data without committing the results of the first executed instruction, if no unmasked exceptions occur with respect to the first portion of the data, executing a second instruction with respect to a second portion of the data, and if no unmasked exceptions occur with respect to the second portion of the data, committing the results of the second executed instruction and again executing the first instruction with respect to the first portion of the data. If the first instruction is executed again, its results are committed. A handler is invoked if an unmasked exception occurs.
Type: Grant
Filed: November 26, 1999
Date of Patent: January 20, 2004
Assignee: Hewlett-Packard Development Company L.P.
Inventors: Kevin David Safford, Patrick Knebel
-
Publication number: 20040006684
Abstract: An instruction execution apparatus comprising a register 43 for storing a copy of contents of the maximum number of entries that are executable simultaneously in one cycle with the entry storing the oldest unreleased instruction at the head among all entries in an instruction storage device 42 after execution of the instructions, a completion condition determination section 44 for determining whether the instructions stored in the entries of the register are completed in the cycle for determining completion conditions of the entries in the instruction storage device, and an entry release section 45 for releasing only the entries that are determined to be completed by the completion condition determination section among all entries in the instruction storage device, which allows the entries in the CSE to be released smoothly even though the number of entries in the CSE, or clock frequency, is increased.
Type: Application
Filed: December 31, 2002
Publication date: January 8, 2004
Applicant: FUJITSU LIMITED
Inventors: Yasunobu Akizuki, Aiichiro Inoue
-
Publication number: 20040006686
Abstract: A latest register update buffer which stores latest register update data is allocated and prepared for every general register for storing source data. A latest register update processing unit stores a value in the general register as latest register update data into the latest register update buffer when a register update instruction is not speculatively executed, and overwrites a result of the speculative execution when the instruction is speculatively executed. Upon instruction decoding, a matching processing unit reads out the latest register update data from the latest register update allocation buffer and stores it into a data area in a reservation station.
Type: Application
Filed: January 21, 2003
Publication date: January 8, 2004
Applicant: Fujitsu Limited
Inventor: Toshio Yoshida
-
Publication number: 20040006685
Abstract: When a predetermined instruction is fetched and decoded, an instruction issuing unit develops the instruction operation into a multiflow of a previous flow and a following flow and issues the instruction in order. The instruction is held in a reservation station. An instruction executing unit executes the instruction held in the reservation station out of order. Further, an execution result of the instruction is committed in order. A multiflow guarantee processing unit guarantees an execution result of the previous flow stored in an allocation register on a register update buffer until the following flow is committed. Even if the previous flow is committed and the allocation register is released, the guaranteeing process is realized by stalling another instruction serving as a next register allocation destination in a decoding cycle until the following flow is committed.
Type: Application
Filed: January 21, 2003
Publication date: January 8, 2004
Applicant: Fujitsu Limited
Inventor: Toshio Yoshida
-
Patent number: 6675288
Abstract: A technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. The technique further involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
Type: Grant
Filed: May 9, 2002
Date of Patent: January 6, 2004
Assignee: Hewlett-Packard Development Company L.P.
Inventors: James Arthur Farrell, Sharon Marie Britton, Harry Ray Fair, III, Bruce Gieseke, Daniel Lawrence Leibholz, Derrick R. Meyer
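
The pairing of an assignment list with a valid vector, plus a per-instruction copy in a silo, can be sketched as follows. This model makes assumptions the abstract does not state: the copy is taken before the instruction's own mapping is applied, recovery simply restores that copy, and a physical register is not reallocated while a checkpoint that still refers to it is live. All names are hypothetical.

```python
class RenameState:
    """Register assignment list plus valid vector, with silo checkpoints."""

    def __init__(self, num_physical):
        self.assignments = [None] * num_physical   # logical register per physical register
        self.valid = [False] * num_physical        # one valid bit per physical register
        self.silo = []                             # saved copies of the valid vector

    def map_instruction(self, logical, physical):
        """Checkpoint the valid vector, then assign `logical` to `physical`."""
        self.silo.append(list(self.valid))
        for p, assigned in enumerate(self.assignments):
            if assigned == logical:
                self.valid[p] = False              # the older mapping is superseded
        self.assignments[physical] = logical
        self.valid[physical] = True
        return len(self.silo) - 1                  # checkpoint id for this instruction

    def recover(self, checkpoint):
        """Undo this instruction and all younger ones by restoring its saved vector."""
        self.valid = self.silo[checkpoint]
        del self.silo[checkpoint:]

    def lookup(self, logical):
        """Return the physical register holding the valid assignment, or None."""
        for p, assigned in enumerate(self.assignments):
            if self.valid[p] and assigned == logical:
                return p
        return None


state = RenameState(4)
state.map_instruction("r1", 0)
speculative = state.map_instruction("r1", 1)   # mapped down a predicted branch
mapped_while_speculating = state.lookup("r1")
state.recover(speculative)                     # the branch turned out to be wrong
mapped_after_recovery = state.lookup("r1")
```

Restoring only the valid bits is enough in this model because the stale entry in the assignment list is ignored once its valid bit is clear.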
-
Publication number: 20040003207
Abstract: A program counter control method controls instructions by an out-of-order method using a branch prediction mechanism and controls an architecture having delay instructions for branching. The method includes the steps of simultaneously committing a plurality of instructions including a branch instruction, when a branch prediction is successful and the branch instruction branches, and simultaneously updating a program counter and a next program counter depending on a number of committed instructions.
Type: Application
Filed: January 28, 2003
Publication date: January 1, 2004
Applicant: FUJITSU LIMITED
Inventors: Ryuichi Sunayama, Kuniki Morita, Aiichiro Inoue
-
Publication number: 20040003206
Abstract: A re-configurable, streaming vector processor (100) is provided which includes a number of function units (102), each having one or more inputs for receiving data values and an output for providing a data value, a re-configurable interconnection switch (104) and a micro-sequencer (118). The re-configurable interconnection switch (104) includes one or more links, each link operable to couple an output of a function unit (102) to an input of a function unit (102) as directed by the micro-sequencer (118). The vector processor may also include one or more input-stream units (122) for retrieving data from memory. Each input-stream unit is directed by a host processor and has a defined interface (116) to the host processor. The vector processor also includes one or more output-stream units (124) for writing data to memory or to the host processor. The defined interface of the input-stream and output-stream units forms a first part of the programming model.
Type: Application
Filed: June 28, 2002
Publication date: January 1, 2004
Inventors: Philip E. May, Kent Donald Moat, Raymond B. Essick, Silviu Chiricescu, Brian Geoffrey Lucas, James M. Norris, Michael Allen Schuette, Ali Saidi
-
Publication number: 20030229772
Abstract: A register window fill technique for a retirement window having an entry size less than a number of fill instructions used in a fill condition is provided. The technique uses modified fill instructions that allow the retirement window to retire a portion of the fill instructions without having to determine whether a remaining portion of the fill instructions will execute without exceptions.
Type: Application
Filed: June 7, 2002
Publication date: December 11, 2003
Inventors: Chandra Thimmanagari, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls
-
Publication number: 20030229771
Abstract: A register window spill technique for a retirement window having an entry size less than a number of spill instructions used in a spill condition is provided. The technique uses modified spill instructions that allow the retirement window to retire a portion of the spill instructions without having to determine whether a remaining portion of the spill instructions will execute without exceptions.
Type: Application
Filed: June 7, 2002
Publication date: December 11, 2003
Inventors: Chandra Thimmanagari, Sorin Iacobovici, Rabin Sugumar, Robert Nuckolls
-
Publication number: 20030226000Abstract: A collapsible pipeline structure, suitable for use in a microprocessor. The structure contains a first pipeline stage, under control of a clock, to export a sequence of instruction stage results with respect to a clock cycle of the clock. A bypassing storage unit receives the sequence of instruction stage results and, when operating in collapsed mode, forwards that sequence onto the subsequent pipeline stage, bypassing the storage unit through a multiplexer. A second pipeline stage receives the output from the bypassing storage unit, and exports its instruction stage results under control of the clock. If the collapsing function of the bypassing storage unit is disabled, the instruction pipeline functions in the conventional manner.Type: ApplicationFiled: May 30, 2002Publication date: December 4, 2003Inventor: Mike Rhoades
-
Patent number: 6654876Abstract: A method, processor, and data processing system implementing a delayed reject mechanism are disclosed. The processor includes an issue unit suitable for issuing an instruction in a first cycle and a load store unit (LSU). The LSU includes an extend reject calculator circuit configured to receive a set of completion information signals and generate a delay value based thereon. The LSU is adapted to determine whether to reject the instruction in a determination cycle. The number of cycles between the first cycle and the determination cycle is a function of the delay value such that reject timing is variable with respect to the first cycle. In one embodiment, the processor is further configured to reissue the instruction after the determination cycle if the instruction was rejected in the determination cycle. The delay value is conveyed via a 2-bit bus in one embodiment. The 2-bit bus permits delaying the determination cycle from 0 to 3 cycles after a finish cycle.Type: GrantFiled: November 4, 1999Date of Patent: November 25, 2003Assignee: International Business Machines CorporationInventors: Hung Qui Le, David James Shippy
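The relationship between the 2-bit delay value and the determination cycle can be illustrated with a minimal Python sketch. The function name and the plain additive relationship are assumptions made for illustration; the abstract only states that the bus allows a delay of 0 to 3 cycles after a finish cycle.

```python
def determination_cycle(finish_cycle: int, delay_bus: int) -> int:
    """Return the cycle in which the reject decision is made.

    delay_bus is the value carried on the 2-bit bus (hypothetical name).
    Adding it directly to the finish cycle is an assumption for illustration.
    """
    if not 0 <= delay_bus <= 0b11:
        raise ValueError("a 2-bit bus can only carry values 0 through 3")
    return finish_cycle + delay_bus
```

For example, with a finish cycle of 10, the four possible bus values place the determination cycle anywhere from cycle 10 through cycle 13.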
-
Patent number: 6654869Abstract: A microprocessor includes a fetch unit, an instruction cracking unit, and dispatch and completion control logic. The fetch unit retrieves a set of instructions from an instruction cache. The instruction cracking unit receives the set of fetched instructions and organizes the set of instructions into an instruction group. The dispatch and completion logic assigns a group tag to the instruction group and records the group tag in an entry of the completion table for tracking the completion status of the instructions comprising the instruction group. The dispatch and control logic may record a single instruction address in the completion table entry corresponding to each instruction group. Preferably, the single instruction address is the instruction address of the first instruction in the instruction group. The processor may flush the instruction group in response to detecting an exception generated by an instruction in the instruction group.Type: GrantFiled: October 28, 1999Date of Patent: November 25, 2003Assignee: International Business Machines CorporationInventors: James Allan Kahle, Hung Qui Le, Charles Roberts Moore
-
Patent number: 6643767Abstract: The present invention is related to a processor capable of speculatively executing an instruction having a data dependence upon a preceding instruction in order to improve the efficiency of dynamically scheduling instructions. The reissue of instructions is possible without lowering the efficiency of the instruction scheduling process by dividing the function of scheduling instructions and the function of the reissue of instructions.Type: GrantFiled: January 27, 2000Date of Patent: November 4, 2003Assignee: Kabushiki Kaisha ToshibaInventor: Toshinori Sato
-
Publication number: 20030200422Abstract: A parallel processor has a plurality of operation units that execute operation instructions, and a multi-bank register file in which a plurality of banks each having a plurality of registers are formed. Each of simultaneously input machine instructions is split into a plurality of nano-instructions each of which includes at least one of an access instruction and operation instruction. The output clock cycles of operation instructions with respect to the operation units are arbitrated. Furthermore, the output clock cycles of access instructions to the multi-bank register file are arbitrated so as to prevent access instructions from contending in an identical bank in the multi-bank register file.Type: ApplicationFiled: February 18, 2003Publication date: October 23, 2003Applicant: Semiconductor Technology Academic Research CenterInventors: Tetsuo Hironaka, Mattausch Hans Juergen, Takeshi Hiramatsu
-
Publication number: 20030196075Abstract: A memory disambiguation apparatus includes a store queue, a store forwarding buffer, and a version count buffer. The store queue includes an entry for each store instruction in the instruction window of a processor. Some store queue entries include resolved store addresses, and some do not. The store forwarding buffer is a set-associative buffer that has entries allocated for store instructions as store addresses are resolved. Each entry in the store forwarding buffer is allocated into a set determined in part by a subset of the store address. When the set in the store forwarding buffer is full, an older entry in the set is discarded in favor of the newly allocated entry. A version count buffer including an array of overflow indicators is maintained to track overflow occurrences. As load addresses are resolved for load instructions in the instruction window, the set-associative store forwarding buffer can be searched to provide memory disambiguation.Type: ApplicationFiled: May 15, 2003Publication date: October 16, 2003Applicant: Intel CorporationInventors: Haitham Akkary, Sebastien Hily
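A rough Python sketch of a set-associative store forwarding buffer with per-set overflow tracking follows. It is not the patented design: the class and method names, the use of low-order address bits as the set index, the oldest-first eviction policy, and the per-set counter standing in for the version count buffer are all assumptions chosen to keep the example small.

```python
from collections import deque


class StoreForwardingBuffer:
    """Toy set-associative buffer; all names and policies are hypothetical."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [deque() for _ in range(num_sets)]
        # One overflow counter per set, standing in for the
        # version count buffer's array of overflow indicators.
        self.overflow = [0] * num_sets

    def _index(self, addr):
        # Assumption: a subset of the address (here, addr mod num_sets)
        # selects the set.
        return addr % self.num_sets

    def store(self, addr, value):
        idx = self._index(addr)
        entries = self.sets[idx]
        if len(entries) == self.ways:
            entries.popleft()        # set is full: discard the oldest entry
            self.overflow[idx] += 1  # record that an overflow occurred
        entries.append((addr, value))

    def load(self, addr):
        """Return (value, overflowed); value is None when no store matches."""
        idx = self._index(addr)
        overflowed = self.overflow[idx] > 0
        for stored_addr, value in reversed(self.sets[idx]):  # youngest first
            if stored_addr == addr:
                return value, overflowed
        return None, overflowed
```

In this sketch a load that misses in a set whose overflow counter is nonzero cannot rule out an older matching store, which is the situation the overflow indicators are there to flag.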
-
Publication number: 20030196074Abstract: A method, processor architecture, computer program product, and data processing system for determining when an instruction in a pipelined processor should be completed is provided. As each instruction is issued to an execution unit, an entry for that instruction is placed within a “finish pipe,” which consists of a series of consecutively numbered stages. Each clock cycle, the entries in the finish pipe advance one stage. When an entry has reached the stage corresponding to the latency of its associated execution unit, it becomes mature.Type: ApplicationFiled: April 11, 2002Publication date: October 16, 2003Applicant: International Business Machines CorporationInventors: Hung Qui Le, Dung Quoc Nguyen
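The finish pipe behaviour described in this abstract can be sketched in a few lines of Python. The class names, the `tick` method, and the choice to remove an entry from the pipe once it matures are assumptions for illustration, not details taken from the application.

```python
from dataclasses import dataclass


@dataclass
class FinishEntry:
    tag: str        # hypothetical instruction identifier
    latency: int    # latency, in cycles, of the execution unit it was issued to
    stage: int = 0  # current stage number within the finish pipe


class FinishPipe:
    def __init__(self):
        self.entries = []

    def issue(self, tag, latency):
        # An entry is placed in the pipe when the instruction is issued.
        self.entries.append(FinishEntry(tag, latency))

    def tick(self):
        """Advance every entry one stage; return the tags that became mature."""
        mature, remaining = [], []
        for entry in self.entries:
            entry.stage += 1
            if entry.stage >= entry.latency:
                mature.append(entry.tag)
            else:
                remaining.append(entry)
        self.entries = remaining  # assumption: mature entries leave the pipe
        return mature
```

Issuing a 1-cycle instruction and a 3-cycle instruction together and calling `tick()` three times reports the first as mature on the first tick and the second on the third.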
-
Publication number: 20030196073Abstract: An apparatus is presented for expediting the execution of address-dependent micro instructions in a pipeline microprocessor. The apparatus computes a speculative result associated with an arithmetic operation, where the arithmetic operation is prescribed by a preceding micro instruction that is yet to generate a result. The apparatus utilizes the speculative result to configure a speculative address operand that is provided to an address-dependent micro instruction. The apparatus includes speculative operand calculation logic and an update forwarding cache. The speculative operand calculation logic performs the arithmetic operation to generate the speculative result prior to when execute logic executes the preceding micro instruction to generate the result.Type: ApplicationFiled: May 5, 2003Publication date: October 16, 2003Applicant: IP-First LLCInventor: Gerard M. Col
-
Mechanism for forward data in a processor pipeline using a single pipefile connected to the pipeline
Patent number: 6633971Abstract: A method for forwarding data within a pipeline of a pipelined data processor having a plurality of execution pipeline stages where each stage accepts a plurality of operand inputs and generates a result. The result generated by each execution pipeline stage is selectively coupled to an operand input of one of the execution pipeline stages.Type: GrantFiled: October 1, 1999Date of Patent: October 14, 2003Assignee: Hitachi, Ltd.Inventors: Chih-Jui Peng, Lew Chua-Eoan
-
Patent number: 6633970Abstract: A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each physical register can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to the secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions part of a predicted path.Type: GrantFiled: December 28, 1999Date of Patent: October 14, 2003Assignee: Intel CorporationInventors: David W. Clift, Darrell D. Boggs, David J. Sager
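Recovery of a logical-to-physical register mapping from a history buffer can be sketched generically as follows. This is a textbook-style illustration, not the mechanism claimed in the patent: the class, its methods, and the free-list handling are hypothetical, and the buffer that distinguishes architectural from speculative physical registers is not modelled.

```python
class RenameMap:
    """Toy speculative rename map with a history log (hypothetical names)."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i starts out mapped to physical register i.
        self.map = list(range(num_logical))
        self.free = list(range(num_logical, num_physical))
        # Each history entry is (logical, previous physical, new physical).
        self.history = []

    def rename(self, logical):
        """Allocate a new physical register for a logical destination."""
        new = self.free.pop(0)
        self.history.append((logical, self.map[logical], new))
        self.map[logical] = new
        return new

    def recover(self, keep):
        """Undo every rename made after the first `keep` history entries."""
        while len(self.history) > keep:
            logical, old, new = self.history.pop()
            self.map[logical] = old   # restore the earlier mapping
            self.free.insert(0, new)  # release the speculative register
```

A caller would record `len(rm.history)` at a predicted branch and pass that value to `recover` if the prediction turns out to be wrong.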
-
Publication number: 20030188133Abstract: A microprocessor apparatus and method are provided, for selectively controlling write back of condition codes. The microprocessor apparatus has translation logic and extended execution logic. The translation logic translates an extended instruction into corresponding micro instructions. The extended instruction includes an extended prefix and an extended prefix tag. The extended prefix disables write back of the condition codes, where the condition codes correspond to a result of a prescribed operation. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within an instruction set for a microprocessor. The extended execution logic is coupled to the translation logic. The extended execution logic receives the corresponding micro instructions, and generates the result, and disables write back of the condition codes.Type: ApplicationFiled: May 9, 2002Publication date: October 2, 2003Applicant: IP-First LLCInventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks
-
Patent number: 6629170Abstract: A multi-stage byte lane selectable bus. In a preferred embodiment, the bus in performance monitor mode includes a plurality of byte lanes and a selection mechanism. The selection mechanism acquires, from a plurality of signals, a subset of those signals, which are desired to be monitored, and places this subset of signals on the byte lanes that are input to the PMU. The number of the plurality of signals that potentially may be monitored is greater than the number of byte lanes and is also greater than the number of PMU counters.Type: GrantFiled: November 8, 1999Date of Patent: September 30, 2003Assignee: International Business Machines CorporationInventors: Joel Roger Davidson, Michael Stephen Floyd, Paul Joseph Jordan, Judith E. K. Laurens, Alexander Erik Mericas, Kevin F. Reick
-
Patent number: 6629167Abstract: An apparatus for and a method of decoupling at least two multi-stage pipelines are described. At least two paths of data through which data from the first pipeline is sent to the second pipeline are provided. During a pipelined execution of a task in the at least two pipelines, the second pipeline may not require every data produced in the first pipeline to process at least some subset of the task. The first pipeline may not be able to produce all data required by each of the stages of the second pipeline. One of the two data paths provides an early data path for a type of data that becomes available in a stage of the first pipeline and that may be processed in a stage of the second pipeline early in time. The other of the two data paths provides a late data path for a type of data that becomes available in a stage of the first pipeline and that may be processed in a stage of the second pipeline later in time. Each data path may comprise a buffer, e.g., a FIFO.Type: GrantFiled: February 18, 2000Date of Patent: September 30, 2003Assignee: Hewlett-Packard Development Company, L.P.Inventors: Stephen Undy, James E. McCormick, Jr.
-
Patent number: 6629233Abstract: A method, processor, and data processing system for enabling maximum instruction issue despite the presence of complex instructions that require multiple rename registers is disclosed. The method includes allocating a first rename register from a first reorder buffer for storing the contents of a first register affected by the complex instruction. A second rename register from a second reorder buffer is then allocated for storing the contents of a second register affected by the complex instruction. In an embodiment in which the first reorder buffer supports a maximum number of allocations per cycle, the allocation of the second register using the second reorder buffer prevents the complex instruction from requiring multiple allocation slots in the first reorder buffer. The method may further include issuing a second instruction that contains a dependency on a register that is allocated in the secondary reorder buffer.Type: GrantFiled: February 17, 2000Date of Patent: September 30, 2003Assignee: International Business Machines CorporationInventor: James Allan Kahle
-
Publication number: 20030182540Abstract: A method of handling instructions in a load/store unit of a processor by dispatching instructions to the load/store unit, filling a portion of physical entries of a reorder queue with tags corresponding to the instructions while limiting usage of the physical entries of the reorder queue to less than a total number of physical entries, and further dispatching one or more additional instructions to the load/store unit while the filled physical entries in the reorder queue are still full, i.e., still contain tags for uncompleted instructions. The limiting of usage of the physical entries may be selectively applied. Multiple logical instruction tags are assigned in a count greater than the number of physical entries in the reorder queue. Of the multiple logical instruction tags assigned to a single one of the physical entries in the reorder queue, only the tag for the oldest instruction is allowed to execute.Type: ApplicationFiled: January 30, 2003Publication date: September 25, 2003Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: William Elton Burky, Dung Quoc Nguyen, Balaram Sinharoy, Albert Thomas Williams
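The idea of assigning more logical tags than there are physical reorder queue entries, while letting only the oldest tag per entry execute, can be sketched as below. The modulo mapping from logical tag to physical entry and all names are assumptions for illustration; the abstract does not say how tags are assigned to entries.

```python
from collections import deque


class ReorderQueue:
    """Toy model: several logical tags may share one physical entry."""

    def __init__(self, num_physical):
        self.num_physical = num_physical
        # For each physical entry, the logical tags waiting on it, oldest first.
        self.waiters = [deque() for _ in range(num_physical)]

    def _entry(self, logical_tag):
        # Assumption: the physical entry is the logical tag modulo the entry count.
        return self.waiters[logical_tag % self.num_physical]

    def dispatch(self, logical_tag):
        self._entry(logical_tag).append(logical_tag)

    def may_execute(self, logical_tag):
        # Only the oldest tag assigned to a physical entry is allowed to execute.
        queue = self._entry(logical_tag)
        return bool(queue) and queue[0] == logical_tag

    def complete(self, logical_tag):
        queue = self._entry(logical_tag)
        if not queue or queue[0] != logical_tag:
            raise ValueError("only the oldest tag of an entry can complete")
        queue.popleft()
```

With two physical entries, tags 0 and 2 share an entry, so tag 2 becomes eligible to execute only after tag 0 completes.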
-
Publication number: 20030177337Abstract: A computer system comprising a data file having entries each of which is designed to hold data, an advanced and a completed mapping file each having entries each of which is designed to hold a data-file-entry address, an operation window that is a buffer to hold substances of operations waiting execution, and a state-modification queue that is designed to be able to hold a substance of a modification on the advanced mapping file for each clock cycle; wherein making a modification on the advanced mapping file, entering the substance of this modification into the state-modification queue, and entering substances of operations into the operation window are each to be done in one clock cycle, and operations held in the operation window are to be executed out of order. The system can attain high performance easily and utilize programs described in any machine language for traditional register-based/stack-based processors.Type: ApplicationFiled: February 25, 2003Publication date: September 18, 2003Inventor: Hajime Seki