Simultaneous Parallel Fetching Or Executing Of Both Branch And Fall-through Path Patents (Class 712/235)
  • Patent number: 6714961
    Abstract: The invention is directed toward a multiprocessing system having multiple processing units. For at least one of the processing units in the multiprocessing system, a first job signal is assigned to the processing unit for speculative execution of a corresponding first job, and a further job signal is assigned to the processing unit for speculative execution of a corresponding further job. The speculative execution of said further job is initiated when the processing unit has completed execution of the first job. If desirable, even more job signals may be assigned to the processing unit for speculative execution. In this way, multiple job signals are assigned to the processing units of the processing system, and the processing units are allowed to execute a plurality of jobs speculatively while waiting for commit priority.
    Type: Grant
    Filed: November 12, 1999
    Date of Patent: March 30, 2004
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Per Anders Holmberg, Terje Egeland, Nils Ola Linnermark, Karl Oscar Joachim Strömbergson, Magnus Carlsson
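A minimal C sketch of the scheme in patent 6714961 above, under stated assumptions: each processing unit holds a small queue of assigned job signals, executes the corresponding jobs speculatively one after another, and buffers the results until it is granted commit priority. The types, queue depth, and the trivial stand-in "work" are illustrative and not taken from the patent.

```c
#include <stdio.h>
#include <stdbool.h>

#define JOBS_PER_PE 3   /* assumed per-unit speculation depth */

typedef struct { int id; int result; bool done; } job_t;

typedef struct {
    job_t queue[JOBS_PER_PE];   /* job signals assigned to this unit */
    int   count;
    bool  has_commit_priority;
} pe_t;

/* Speculatively execute every assigned job; results stay buffered. */
static void execute_speculatively(pe_t *pe) {
    for (int i = 0; i < pe->count; i++) {
        pe->queue[i].result = pe->queue[i].id * 2;  /* stand-in for real work */
        pe->queue[i].done = true;                   /* next job starts as soon as this one ends */
    }
}

/* Commit buffered results only once the unit holds commit priority. */
static void try_commit(pe_t *pe) {
    if (!pe->has_commit_priority) return;
    for (int i = 0; i < pe->count; i++)
        if (pe->queue[i].done)
            printf("commit job %d -> %d\n", pe->queue[i].id, pe->queue[i].result);
    pe->count = 0;
}

int main(void) {
    pe_t pe = { .queue = { {1, 0, false}, {2, 0, false}, {3, 0, false} },
                .count = 3, .has_commit_priority = false };
    execute_speculatively(&pe);   /* work overlaps the wait for priority */
    pe.has_commit_priority = true;
    try_commit(&pe);
    return 0;
}
```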
  • Publication number: 20040059898
    Abstract: Various methods, apparatuses, and systems in which a processor includes an issue engine and an in-order execution pipeline. The issue engine categorizes operations as either speculative operations, which perform computations, or architectural operations, which have the potential to fault or cause an exception. Each architectural operation issues with an associated architectural micro-operation. A first micro-operation checks whether a first speculative operation is dependent upon an intervening first architectural operation. The in-order execution pipeline executes the speculative operation, the architectural operation, and the associated architectural micro-operations.
    Type: Application
    Filed: September 19, 2002
    Publication date: March 25, 2004
    Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
  • Publication number: 20040059896
    Abstract: A method and apparatus are provided for implementing two-tiered thread-state multithreading support at a high clock rate. A first-tier thread state storage stores a limited number of runnable thread register states, where the limited number is less than a threshold value. Next-thread selection logic, coupled between the first-tier thread state storage and the currently executing processor state, picks the next thread to run on a processor from the limited number of runnable thread register states. A second-tier thread storage facility stores a second number of thread states that is greater than the limited number of runnable thread register states. Runnable-thread selection logic, coupled between the first-tier thread state storage and the second-tier thread storage facility, selectively exchanges thread states between the first-tier runnable thread register states and the second-tier thread storage facility.
    Type: Application
    Filed: September 19, 2002
    Publication date: March 25, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Harold F. Kossman, Timothy John Mullins
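A rough data-structure sketch of the two-tiered arrangement in publication 20040059896 above: a small first tier of runnable register states, a larger second tier, a next-thread selector over the first tier, and an exchange routine between the tiers. The tier sizes, field names, and priority-based selection rule are assumptions for illustration only.

```c
#include <stdio.h>

#define TIER1_SLOTS  4    /* small, fast first tier (below the threshold) */
#define TIER2_SLOTS 64    /* larger second-tier storage facility */

typedef struct { int tid; unsigned long regs[32]; int priority; } thread_state_t;

typedef struct {
    thread_state_t tier1[TIER1_SLOTS];  /* runnable thread register states */
    thread_state_t tier2[TIER2_SLOTS];  /* backing store of thread states */
} thread_store_t;

/* Next-thread selection: pick a runnable state from tier 1 (here, by priority). */
static int pick_next_thread(const thread_store_t *ts) {
    int best = 0;
    for (int i = 1; i < TIER1_SLOTS; i++)
        if (ts->tier1[i].priority > ts->tier1[best].priority)
            best = i;
    return best;
}

/* Runnable-thread selection: exchange a tier-1 slot with a tier-2 entry. */
static void exchange(thread_store_t *ts, int slot1, int slot2) {
    thread_state_t tmp = ts->tier1[slot1];
    ts->tier1[slot1] = ts->tier2[slot2];
    ts->tier2[slot2] = tmp;
}

int main(void) {
    static thread_store_t ts;        /* zero-initialized stores */
    ts.tier1[2].priority = 5;        /* mark one state as most runnable */
    int next = pick_next_thread(&ts);
    printf("next thread slot: %d\n", next);
    exchange(&ts, next, 0);          /* swap it with a second-tier state */
    return 0;
}
```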
  • Patent number: 6691220
    Abstract: A method of operation within a processor that permits load instructions following barrier instructions in an instruction sequence to be issued speculatively. The barrier instruction is executed and while the barrier operation is pending, a load request associated with the load instruction is speculatively issued. A speculation flag is set to indicate the load instruction was speculatively issued. The flag is reset when an acknowledgment of the barrier operation is received. Data that is returned before the acknowledgment is received is temporarily held, and the data is forwarded to the register and/or execution unit of the processor only after the acknowledgment is received. If a snoop invalidate is detected for the speculatively issued load request before the barrier operation completes, the data is discarded and the load request is re-issued.
    Type: Grant
    Filed: June 6, 2000
    Date of Patent: February 10, 2004
    Assignee: International Business Machines Corporation
    Inventors: Guy Lynn Guthrie, Ravi Kumar Arimilli, John Steven Dodson, Derek Edward Williams
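A small C model of the bookkeeping described in patent 6691220 above: a load issued speculatively past a barrier carries a speculation flag, its returned data is held rather than forwarded, and the barrier acknowledgment either forwards the held data or, if a snoop invalidate was seen, discards it and reissues the load. The structure and function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

/* State for one speculatively issued load (names are illustrative). */
typedef struct {
    bool speculative;       /* set when issued before the barrier ACK */
    bool data_returned;     /* data held in a buffer, not yet forwarded */
    bool snoop_invalidate;  /* another processor invalidated the line */
    int  held_data;
} spec_load_t;

/* Barrier acknowledgment arrives: either forward the held data or reissue. */
static void on_barrier_ack(spec_load_t *ld) {
    ld->speculative = false;              /* reset the speculation flag */
    if (ld->snoop_invalidate) {
        ld->data_returned = false;        /* discard the held data ... */
        printf("reissue load\n");         /* ... and issue the load again */
    } else if (ld->data_returned) {
        printf("forward %d to register/execution unit\n", ld->held_data);
    }
}

int main(void) {
    spec_load_t ld = { .speculative = true, .data_returned = true,
                       .snoop_invalidate = false, .held_data = 42 };
    on_barrier_ack(&ld);   /* safe: no invalidation seen while waiting */
    return 0;
}
```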
  • Publication number: 20040024996
    Abstract: A circuit and method for maintaining a correct value in a performance monitor counter within a speculative computer microprocessor is disclosed. In response to determining the beginning of speculative execution within the microprocessor, the value of the performance monitor counter is stored in a rewind register. The performance monitor counter is incremented in response to predetermined events. If the microprocessor determines the speculative execution was incorrect, the value of the rewind register is loaded into the counter, restoring the correct value of the counter.
    Type: Application
    Filed: July 31, 2002
    Publication date: February 5, 2004
    Applicants: International Business Machines Corporation, Hitachi, Ltd.
    Inventors: Hung Qui Le, Alexander Erik Mericas, Robert Dominick Mirabella, Toshihiko Kurihara, Michitaka Okuno, Masahiro Tokoro
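A minimal sketch of the rewind-register idea in publication 20040024996 above: snapshot the counter when speculation begins, count events normally, and restore the snapshot if the speculation turns out to be wrong. The type and function names are illustrative.

```c
#include <stdio.h>

/* Performance-monitor counter with a rewind register (illustrative model). */
typedef struct { unsigned long counter; unsigned long rewind; } pmc_t;

static void begin_speculation(pmc_t *p) { p->rewind = p->counter; }  /* snapshot */
static void on_event(pmc_t *p)          { p->counter++; }            /* count events */
static void speculation_wrong(pmc_t *p) { p->counter = p->rewind; }  /* restore */

int main(void) {
    pmc_t p = { 100, 0 };
    begin_speculation(&p);
    on_event(&p); on_event(&p);      /* events counted on the wrong path */
    speculation_wrong(&p);           /* roll the counter back to 100 */
    printf("counter = %lu\n", p.counter);
    return 0;
}
```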
  • Patent number: 6687812
    Abstract: Disclosed is a parallel processing apparatus capable of reducing power consumption by efficiently executing a fork instruction for activating a plurality of processors. The parallel processing apparatus has a processor element (10) for generating (forking) a thread, consisting of a plurality of instructions, to an external unit. The processor element comprises a fork-instruction predicting section (14), which includes a predicting section for predicting whether or not the fork condition of a fork-conditioned fork instruction is satisfied, after the instruction is fetched but before it is executed.
    Type: Grant
    Filed: April 20, 2000
    Date of Patent: February 3, 2004
    Assignee: NEC Corporation
    Inventor: Sachiko Shimada
  • Publication number: 20040019772
    Abstract: A microprocessor includes a register (5), rewritable by software, that outputs a signal A for determining which one of a successor instruction to be executed when the condition of a conditional branch is satisfied and another successor instruction to be executed when the condition is unsatisfied is to be introduced into a delay slot. When the microprocessor executes a conditional branch, a decode circuit (6) delivers to a code interface circuit (2) a signal B indicating which one of the successor instruction and the other successor instruction is to be selected as the next instruction to be supplied to the CPU (1).
    Type: Application
    Filed: January 22, 2003
    Publication date: January 29, 2004
    Inventors: Hiroshi Ueki, Masahiro Yokoyama
  • Patent number: 6675374
    Abstract: A technique is provided for inserting memory prefetch instructions only at appropriate locations in program code. The instructions are inserted into the program code such that, when the code is executed, the speed and efficiency of execution of the code may be improved, cache conflicts arising from execution of the prefetch instruction may be substantially eliminated, and the number of simultaneously-executing memory prefetch operations may be limited to prevent stalling and/or overtaxing of the processor executing the code.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: January 6, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: John Samuel Pieper, Steven Orodon Hobbs, Stephen Corridon Root
  • Patent number: 6675285
    Abstract: A method and apparatus for eliminating memory contention in a computation module is presented. For a current operation being performed by a computation engine of the computation module, the method begins by identifying which one of a plurality of threads the current operation is being performed for. The plurality of threads constitutes an application (e.g., geometric primitive applications, video graphic applications, drawing applications, etc.). The processing continues by identifying an operation code, from the set of operation codes corresponding to the identified thread, for the current operation. The processing then continues by determining a particular location within a particular one of a plurality of data flow memory devices, based on the particular thread and the particular operation code, for storing the result of the current operation.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: January 6, 2004
    Assignee: ATI International, Srl
    Inventors: Michael Andrew Mang, Michael Mantor
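A compact C sketch of the contention-avoidance layout suggested by patent 6675285 above: the result location is selected purely from the identified thread and its operation code, so operations from different threads never write to the same slot. The array sizes and the modulo indexing are illustrative assumptions.

```c
#include <stdio.h>

#define NUM_THREADS  4
#define NUM_OPCODES  8

/* One data-flow memory slot per (thread, opcode) pair, so concurrent
 * threads never contend for the same result location (illustrative). */
static int dataflow_mem[NUM_THREADS][NUM_OPCODES];

/* Store the result of the current operation at the location selected by
 * the identified thread and its operation code. */
static void store_result(int thread, int opcode, int result) {
    dataflow_mem[thread % NUM_THREADS][opcode % NUM_OPCODES] = result;
}

int main(void) {
    store_result(1, 3, 99);   /* thread 1, opcode 3 */
    store_result(2, 3, 77);   /* same opcode, different thread: no contention */
    printf("%d %d\n", dataflow_mem[1][3], dataflow_mem[2][3]);
    return 0;
}
```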
  • Publication number: 20040003215
    Abstract: A method and apparatus for executing low-power validations for high-confidence predictions. More particularly, the present invention pertains to using confidence levels of speculative executions to decrease the power consumption of a processor without affecting its performance. Non-critical instructions, i.e., those instructions whose prediction, rather than verification, lies on the critical path, can thus be optimized to consume less power.
    Type: Application
    Filed: June 28, 2002
    Publication date: January 1, 2004
    Inventors: Evgeni Krimer, Bishara Shomar, Ronny Ronen, Doron Orenstein
  • Patent number: 6662295
    Abstract: The present invention is related to branch instructions in the pipeline processing of a microprocessor system. The microprocessor system performs branch prediction if a conditional branch instruction's code calls for branch prediction; otherwise, it suspends execution of successive instructions until the branch evaluation of the conditional branch instruction settles.
    Type: Grant
    Filed: September 10, 1998
    Date of Patent: December 9, 2003
    Assignee: Ricoh Company, Ltd.
    Inventor: Shinichi Yamaura
  • Patent number: 6651247
    Abstract: In a computer having rotating registers, a scheduler-assigner for allocating the rotating registers. The scheduler-assigner includes a software-pipelined instruction scheduler that generates a first software-pipelined instruction schedule based on an intermediate representation that has data flow information in SSA form. The scheduler-assigner also includes a rotating register allocator that designates live ranges of loop-variant variables in the first software-pipelined instruction schedule as being allocated to rotating registers, when available. The first software-pipelined instruction schedule may be a modulo schedule. When a rotating register is not available, the software-pipelined instruction scheduler may generate a second software-pipelined instruction schedule having an initiation interval greater than that of the first software-pipelined instruction schedule.
    Type: Grant
    Filed: May 9, 2000
    Date of Patent: November 18, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Uma Srinivasan
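A short sketch of the retry loop implied by patent 6651247 above: try to allocate rotating registers for the live ranges of a modulo schedule, and if allocation fails, rebuild the schedule with a larger initiation interval (II). The allocation test here is a crude stand-in (larger II means less overlap, hence less simultaneous register pressure), chosen only to make the loop runnable; it is not the patent's allocation algorithm.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in allocation test: fewer live ranges overlap when II is larger. */
static bool allocate_rotating_regs(int ii, int live_ranges, int rotating_regs) {
    return live_ranges / ii <= rotating_regs;
}

/* Generate a schedule at the minimum II; if rotating-register allocation
 * fails, generate another schedule with a larger II and try again. */
static int schedule_loop(int min_ii, int live_ranges, int rotating_regs) {
    for (int ii = min_ii; ; ii++) {
        if (allocate_rotating_regs(ii, live_ranges, rotating_regs))
            return ii;                 /* schedule accepted */
        /* otherwise: a second software-pipelined schedule with II + 1 */
    }
}

int main(void) {
    int ii = schedule_loop(2, 24, 8);
    printf("final initiation interval: %d\n", ii);
    return 0;
}
```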
  • Patent number: 6643770
    Abstract: A mispredicted path side memory is configured to be coupled to a stage in an instruction pipeline. As instructions advance through the pipeline, a result from the stage is stored into the mispredicted path side memory. The result is restored from the mispredicted path side memory into a pipeline stage when a branch is mispredicted.
    Type: Grant
    Filed: September 16, 1999
    Date of Patent: November 4, 2003
    Assignee: Intel Corporation
    Inventor: Nicolas I. Kacevas
  • Patent number: 6636960
    Abstract: A method and an apparatus for resteering failing speculation check instructions in the pipeline of a processor. A branch offset immediate value and an instruction pointer correspond to each failing instruction; these values are used to determine the correct target recovery address. A relative adder adds the immediate value and the instruction pointer value to arrive at the target recovery address. Upon the occurrence of a failing speculation check instruction, the pipeline is flushed, and the flush is extended to allow the instruction stream to be resteered. The immediate value and the instruction pointer are then routed through the existing data paths of the pipeline into the relative adder, which calculates the correct address. A sequencer tracks the progression of these values through the pipeline and causes a branch at the desired time.
    Type: Grant
    Filed: February 16, 2000
    Date of Patent: October 21, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: James Douglas Gibson, Rohit Bhatia
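The core arithmetic of patent 6636960 above is just an addition: the target recovery address is the failing instruction's pointer plus the branch-offset immediate. A tiny illustrative C version, with the flush and resteer steps reduced to comments:

```c
#include <stdint.h>
#include <stdio.h>

/* "Relative adder": compute the target recovery address for a failing
 * speculation-check instruction as IP + branch-offset immediate. */
static uint64_t relative_adder(uint64_t ip, int64_t imm) {
    return ip + (uint64_t)imm;
}

int main(void) {
    uint64_t ip  = 0x1000;   /* instruction pointer of the failing check */
    int64_t  imm = 0x240;    /* branch offset immediate from the instruction */
    /* flush the pipeline, then resteer fetch to the computed address */
    printf("resteer to %#llx\n", (unsigned long long)relative_adder(ip, imm));
    return 0;
}
```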
  • Publication number: 20030188141
    Abstract: One embodiment of the present invention provides a system that facilitates interleaved execution of a head thread and a speculative thread within a single processor pipeline. The system operates by executing program instructions using the head thread, and by speculatively executing program instructions in advance of the head thread using the speculative thread, wherein the head thread and the speculative thread execute concurrently through time-multiplexed interleaving in the single processor pipeline.
    Type: Application
    Filed: February 12, 2003
    Publication date: October 2, 2003
    Inventors: Shailender Chaudhry, Marc Tremblay
  • Publication number: 20030182539
    Abstract: It has been determined that, in a superscalar computer processor, executing load instructions issued along an incorrectly predicted path of a conditional branch instruction eventually reduces the number of cache misses observed on the correct branch path. Executing these wrong-path loads provides an indirect prefetching effect. If the processor has a small L1 data cache, however, this prefetching pollutes the cache causing an overall slowdown in performance. By storing the execution results of mispredicted paths in memory, such as in a wrong path cache, the pollution is eliminated. A wrong path cache can improve processor performance up to 17% in simulations using a 32 KB data cache. A fully-associative eight-entry wrong path cache in parallel with a 4 KB direct-mapped data cache allows the execution of wrong path loads to produce an average processor speedup of 46%. The wrong path cache also results in 16% better speedup compared to the baseline processor equipped with a victim cache of the same size.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Applicant: International Business Machines Corporation
    Inventors: Steven R. Kunkel, David J. Lilja, Resit Sendag
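A simplified C model of the wrong-path cache in publication 20030182539 above: loads executed on a mispredicted path fill a small fully associative side cache instead of the L1, and later lookups probe both structures, so the indirect prefetching effect is kept without polluting the small L1. Sizes, the FIFO replacement, and tag handling are illustrative assumptions, not the patent's parameters.

```c
#include <stdbool.h>
#include <stdio.h>

#define L1_SETS   64   /* small direct-mapped L1 (illustrative size) */
#define WPC_SIZE   8   /* fully associative wrong-path cache */

static unsigned long l1_tag[L1_SETS];
static bool          l1_valid[L1_SETS];
static unsigned long wpc_tag[WPC_SIZE];
static bool          wpc_valid[WPC_SIZE];
static int           wpc_next;

/* Wrong-path loads fill the side cache; correct-path loads fill L1. */
static void fill(unsigned long addr, bool wrong_path) {
    if (wrong_path) {
        wpc_tag[wpc_next] = addr;
        wpc_valid[wpc_next] = true;
        wpc_next = (wpc_next + 1) % WPC_SIZE;   /* simple FIFO replacement */
    } else {
        l1_tag[addr % L1_SETS] = addr;
        l1_valid[addr % L1_SETS] = true;
    }
}

/* Lookups probe the L1 and the wrong-path cache in parallel. */
static bool lookup(unsigned long addr) {
    if (l1_valid[addr % L1_SETS] && l1_tag[addr % L1_SETS] == addr) return true;
    for (int i = 0; i < WPC_SIZE; i++)
        if (wpc_valid[i] && wpc_tag[i] == addr) return true;
    return false;
}

int main(void) {
    fill(0xabc0, true);                      /* load on a mispredicted path */
    printf("hit=%d\n", lookup(0xabc0));      /* later correct-path load hits */
    return 0;
}
```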
  • Publication number: 20030182542
    Abstract: In accordance with one embodiment, the invention provides a method comprising monitoring a power consumption of a processor in executing a program while running in a speculative execution mode wherein instructions are speculatively executed; and turning off said speculative execution mode if said power consumption is above a predetermined threshold. According to another embodiment the invention provides a processor comprising a speculative mode wherein instructions are speculatively executed; a non-speculative execution mode wherein instructions are executed non-speculatively; and a speculation control mechanism to selectively cause said processor to operate in said non-speculative mode based on a power consumption criterion.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Inventors: Robert L. Davies, Aaron M. Tsirkel
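The control policy in publication 20030182542 above is a threshold check. A minimal sketch, assuming an arbitrary threshold value and a measured power input; this version also re-enables speculation once power drops back under the threshold, which is an assumption beyond what the abstract states.

```c
#include <stdbool.h>
#include <stdio.h>

#define POWER_LIMIT_MW 15000.0   /* assumed threshold, not from the patent */

static bool speculation_enabled = true;

/* Turn speculative execution off when measured power exceeds the threshold
 * (and, as an added assumption, back on when it falls below it). */
static void speculation_control(double power_mw) {
    speculation_enabled = (power_mw <= POWER_LIMIT_MW);
}

int main(void) {
    speculation_control(17250.0);
    printf("speculative mode: %s\n", speculation_enabled ? "on" : "off");
    return 0;
}
```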
  • Publication number: 20030177343
    Abstract: A multi-processing computer architecture and a method of operating the same are provided. The multi-processing architecture provides a main processor and multiple sub-processors cascaded together to efficiently execute loop operations. The main processor executes operations outside of a loop and controls the loop. The multiple sub-processors are operably interconnected, and are each assigned by the main processor to a given loop iteration. Each sub-processor is operable to receive one or more sub-instructions sequentially, operate on each sub-instruction and propagate the sub-instruction to a subsequent sub-processor.
    Type: Application
    Filed: July 24, 2002
    Publication date: September 18, 2003
    Applicant: Sony Computer Entertainment America Inc.
    Inventor: Hidetaka Magoshi
  • Patent number: 6604191
    Abstract: An instruction fetching system (and/or architecture), which may be utilized by a high-frequency short-pipeline microprocessor, for efficient fetching of both in-line and target instructions. The system contains an instruction fetching unit (IFU) having control logic and associated components for controlling a specially designed instruction cache (I-cache). The I-cache is a sum-addressed cache, i.e., it receives two address inputs, which are combined by a decoder to provide the address of the line of instructions to be fetched. The I-cache is designed with an array of cache lines that can contain 32 instructions, and three buffers that each have a capacity of 32 instructions.
    Type: Grant
    Filed: February 4, 2000
    Date of Patent: August 5, 2003
    Assignee: International Business Machines Corporation
    Inventors: Brian King Flacks, David Meltzer, Joel Abraham Silberman
  • Publication number: 20030135722
    Abstract: A system, method and apparatus is provided that splits a microprocessor load instruction into two (2) parts, a speculative load instruction and a check speculative load instruction. The speculative load instruction can be moved ahead in the instruction stream by the compiler as soon as the address and result registers are available. This is true even when the data to be loaded is not actually required. This speculative load instruction will not cause a fault in the memory if the access is invalid, i.e. the load misses and a token bit is set. The check speculative load instruction will cause the speculative load instruction to be retried in the event the token bit was set equal to one. In this manner, the latency associated with branching to an interrupt routine will be eliminated a significant amount of the time. It is very possible that the reasons for invalidating the speculative load operation are no longer present (e.g. page in memory is not present) and the load will be allowed to complete.
    Type: Application
    Filed: January 10, 2002
    Publication date: July 17, 2003
    Applicant: International Business Machines Corporation
    Inventor: Andrew Johnson
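A software analogue of the split-load idea in publication 20030135722 above: a speculative load that never faults but sets a token bit when the access is invalid, followed by a check that retries the load only if the token was set. The function and type names (ld_spec, chk_spec, spec_result_t) are illustrative, and a null pointer stands in for an invalid access.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { int value; bool token; } spec_result_t;

/* Speculative load: never faults; sets the token bit on an invalid access. */
static spec_result_t ld_spec(const int *addr) {
    spec_result_t r = { 0, false };
    if (addr == NULL) { r.token = true; return r; }   /* invalid: defer, don't fault */
    r.value = *addr;
    return r;
}

/* Check speculative load: retry the load normally if the token was set. */
static int chk_spec(spec_result_t r, const int *addr) {
    if (!r.token) return r.value;      /* speculation held: use the hoisted value */
    return addr ? *addr : -1;          /* retry (may now succeed) */
}

int main(void) {
    int x = 7;
    spec_result_t r = ld_spec(&x);     /* hoisted well above its use by the compiler */
    /* ... other work runs here while the load latency is hidden ... */
    printf("%d\n", chk_spec(r, &x));
    return 0;
}
```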
  • Publication number: 20030126416
    Abstract: Techniques for suspending execution of a thread in a multi-threaded processor. In one embodiment, a processor includes resources that can be partitioned between multiple threads. Processor logic receives an instruction in a first thread of execution and, in response to that instruction, relinquishes portions of the partitioned resources for use by other threads.
    Type: Application
    Filed: December 31, 2001
    Publication date: July 3, 2003
    Inventors: Deborah T. Marr, Dion Rodgers, David L. Hill, Shiv Kaushik, James B. Crossland, David A. Koufaty
  • Publication number: 20030126417
    Abstract: A method and apparatus to execute data-speculative instructions in a processor comprising at least one source register, each source register comprising a bit to indicate the validity of the data it holds. A data validity circuit is coupled to the one or more source registers to determine the validity of the data in the source registers, and to indicate the validity of the data in a destination register based upon the validity bit in the at least one source register. The processor optionally comprises a checker unit to retire those instructions from the execution unit which write valid data to the destination register, and to re-schedule for execution those instructions which write invalid data to the destination register.
    Type: Application
    Filed: January 2, 2002
    Publication date: July 3, 2003
    Inventors: Eric Sprangle, Michael J. Haertel, David J. Sager
  • Patent number: 6564315
    Abstract: A scheduler issues instruction operations for execution, but also retains the instruction operations. If a particular instruction operation is subsequently found to be required to execute non-speculatively, the particular instruction operation is still stored in the scheduler. Subsequent to determining that the particular operation has become non-speculative (through the issuance and execution of instruction operations prior to the particular instruction operation), the particular instruction operation may be reissued from the scheduler. The penalty for incorrect scheduling of instruction operations which are to execute non-speculatively may be reduced as compared to purging the particular instruction operation and younger instruction operations from the pipeline and refetching the particular instruction operation. Additionally, the scheduler may maintain the dependency indications for each instruction operation which has been issued.
    Type: Grant
    Filed: January 3, 2000
    Date of Patent: May 13, 2003
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James B. Keller, Ramsey W. Haddad, Stephan G. Meier
  • Publication number: 20030088760
    Abstract: According to one aspect of the invention, a method is provided in which store addresses of store instructions dispatched during a last predetermined number of cycles are maintained in a first data structure of a first processor. It is determined whether a load address of a first load instruction matches one of the store addresses in the first data structure. The first load instruction is replayed if the load address of the first load instruction matches one of the store addresses in the first data structure.
    Type: Application
    Filed: October 24, 2002
    Publication date: May 8, 2003
    Inventors: Muntaquim F. Chowdhury, Douglas M. Carmean
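A small C sketch of the replay check in publication 20030088760 above: store addresses dispatched during a recent window are recorded in a structure, and a load whose address matches any of them is flagged for replay. The window size and the exact-match comparison are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

#define WINDOW 16   /* stores dispatched during the last N cycles (assumed) */

static unsigned long recent_store_addr[WINDOW];
static bool          recent_store_valid[WINDOW];
static int           store_head;

/* Record the address of a store dispatched this cycle. */
static void note_store(unsigned long addr) {
    recent_store_addr[store_head] = addr;
    recent_store_valid[store_head] = true;
    store_head = (store_head + 1) % WINDOW;
}

/* A load whose address matches a recently dispatched store is replayed. */
static bool must_replay_load(unsigned long addr) {
    for (int i = 0; i < WINDOW; i++)
        if (recent_store_valid[i] && recent_store_addr[i] == addr)
            return true;
    return false;
}

int main(void) {
    note_store(0x2000);
    printf("replay=%d\n", must_replay_load(0x2000));  /* 1: conflicting store seen */
    return 0;
}
```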
  • Publication number: 20030084274
    Abstract: In processing an instruction request, the invention determines whether the request is speculative or not based upon a bit field within the instruction. If the request is speculative, bus congestion and/or target memory is assessed for conditions and a decision is made, based on the conditions, as to whether or not to process the request. To facilitate the invention, certain bit fields within the instruction are encoded to identify the request as speculative or not. Additional bit fields may define a priority of a speculative request to influence the decision to process as based on the conditions. CPU architectures incorporating prefetch logic may be modified to recognize instructions encoded with speculation and priority identification fields to implement the invention in existing systems. Other logic, e.g., bus controllers and switches, may similarly process speculative requests to enhance system performance.
    Type: Application
    Filed: October 26, 2001
    Publication date: May 1, 2003
    Inventors: Blaine D. Gaither, Robert J. Brooks
  • Publication number: 20030079116
    Abstract: One embodiment of the present invention provides a system that predicts a result produced by a section of code in order to support speculative program execution. The system begins by executing the section of code using a head thread in order to produce a result. Before the head thread produces the result, the system generates a predicted result to be used in place of the result. Next, the system allows a speculative thread to use the predicted result in speculatively executing subsequent code that follows the section of code. After the head thread finishes executing the section of code, the system determines if a difference between the predicted result and the result generated by the head thread has affected execution of the speculative thread. If so, the system executes the subsequent code again using the result generated by the head thread. If not, the system performs a join operation to merge state associated with the speculative thread with state associated with the head thread.
    Type: Application
    Filed: January 16, 2001
    Publication date: April 24, 2003
    Inventors: Shailender Chaudhry, Marc Tremblay
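An illustrative control flow for the result-prediction scheme in publication 20030079116 above: the speculative thread runs ahead on a predicted result, and when the head thread finishes, the prediction is checked and the work is either joined or redone. The functions are placeholders, and this sketch conservatively redoes the speculative work on any mismatch, whereas the abstract only requires re-execution when the difference actually affected the speculative thread.

```c
#include <stdio.h>

static int predict_result(void)          { return 0; }       /* generated prediction */
static int head_thread_section(void)     { return 0; }       /* the real computation */
static int speculative_subsequent(int r) { return r + 100; } /* code after the section */

int main(void) {
    int predicted = predict_result();
    int spec_out  = speculative_subsequent(predicted);   /* runs ahead of the head thread */
    int actual    = head_thread_section();

    if (actual != predicted)
        spec_out = speculative_subsequent(actual);   /* mismatch: execute the code again */
    /* else: join, keeping the speculative thread's state */

    printf("result: %d\n", spec_out);
    return 0;
}
```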
  • Publication number: 20030074544
    Abstract: A method for conditionally performing a SIMD operation causing a predetermined number of result objects to be held in a combination of different ones of a plurality of destination stores, the method comprising receiving and decoding instruction fields to determine at least one source store, a plurality of destination stores and at least one control store, said source and destination stores being capable of holding one or a plurality of objects, each object defining a SIMD lane. Conditional execution of the operation on a per SIMD lane basis is controlled using a plurality of pre-set indicators of the at least one control store designated in the instruction, wherein each said pre-set indicator i controls a predetermined number of result lanes p, where p takes a value greater than or equal to two. A predetermined number of result objects are sent to the destination stores such that the predetermined number of result objects are held by a combination of different ones of the plurality of destination stores.
    Type: Application
    Filed: April 11, 2002
    Publication date: April 17, 2003
    Inventor: Sophie Wilson
  • Publication number: 20030046517
    Abstract: One embodiment of the present invention provides a system to facilitate multithreading a computer processor pipeline. The system includes a pipeline that is configured to accept instructions from multiple independent threads of operation, wherein each thread of operation is unrelated to the other threads of operation. This system also includes a control mechanism that is configured to control the pipeline. This control mechanism is statically scheduled to execute multiple threads in round-robin succession. This static scheduling eliminates the need for communication between stages of the pipeline.
    Type: Application
    Filed: September 4, 2001
    Publication date: March 6, 2003
    Inventor: Gary R. Lauterbach
  • Publication number: 20030037226
    Abstract: A processor architecture includes a program counter which executes M independent program streams in time division in units of one instruction, a pipeline which is shared by each of the program streams and has N pipeline stages operable at a frequency F, and a mechanism which executes only s program streams depending on a required operation performance, where M and N are integers greater than or equal to one and having no mutual dependency, s is an integer greater than or equal to zero and satisfying s≦M. An apparent number of pipeline stages viewed from each of the program streams is set to N/M so that M parallel processors having an apparent operating frequency F/M are formed.
    Type: Application
    Filed: April 29, 2002
    Publication date: February 20, 2003
    Applicant: FUJITSU LIMITED
    Inventors: Toru Tsuruta, Norichika Kumamoto, Hideki Yoshizawa
  • Patent number: 6523110
    Abstract: There is provided a decoupled fetch-execute engine with static branch prediction support. A method for prefetching targets of branch instructions in a computer processing system having instruction fetch decoupled from an execution pipeline includes the step of generating a prepare-to-branch (PBR) operation. The PBR operation includes address bits corresponding to a branch paired thereto and address bits corresponding to an expected target of the branch. The execution of the PBR operation is scheduled prior to execution of the paired branch to enforce a desired latency therebetween. Upon execution of the PBR operation, it is determined whether the paired branch is available using the address bits of the PBR operation corresponding to the paired branch. When the paired branch is available, the expected branch target is fetched using the address bits of the PBR operation corresponding to the expected branch target.
    Type: Grant
    Filed: July 23, 1999
    Date of Patent: February 18, 2003
    Assignee: International Business Machines Corporation
    Inventors: Arthur A. Bright, Jason E. Fritts
  • Publication number: 20030033511
    Abstract: In one embodiment of the invention, a processor includes an execution pipeline to concurrently execute at least portions of threads, wherein at least one of the threads is dependent on at least another one of the threads. The processor also includes detection circuitry to detect speculation errors in the execution of the threads. In another embodiment, the processor includes thread management logic to control dynamic creation of threads from a program.
    Type: Application
    Filed: October 8, 2002
    Publication date: February 13, 2003
    Inventors: Haitham Akkary, Kingsum Chow
  • Publication number: 20030033510
    Abstract: Mechanisms and techniques operate in a computerized device to enable or disable speculative execution of instructions, such as reordering of load and store instructions, in a multiprocessing computerized device. The mechanisms and techniques provide a speculative execution controller that can detect a multiaccess memory condition between the first and second processors, such as concurrent access to shared data pages via page table entries. This can be done by monitoring page table entry accesses by other processors. The speculative execution controller sets a value of a speculation indicator in the memory system based on the multiaccess memory condition. If the value of the speculation indicator indicates that speculative execution of instructions is allowed in the computerized device, the speculative execution controller allows speculative execution of instructions in at least one of the first and second processors in the computerized device.
    Type: Application
    Filed: January 3, 2002
    Publication date: February 13, 2003
    Inventor: David Dice
  • Publication number: 20030028755
    Abstract: In a parallel processor system for executing, in parallel with each other, a plurality of threads obtained by dividing a single program across a plurality of processors, when a processor executing a master thread forks a slave thread onto another processor, the fork source processor transmits an updated register value to the fork destination processor through a communication bus at every write to a general register in the master thread after forking. The fork destination processor executes the slave thread speculatively and, upon detecting a violation of a Read After Write (RAW) dependence on the general register, cancels the thread being executed and re-executes the thread.
    Type: Application
    Filed: June 7, 2002
    Publication date: February 6, 2003
    Applicant: NEC CORPORATION
    Inventors: Taku Ohsawa, Satoshi Matsushita
  • Patent number: 6516462
    Abstract: Compiler optimization methods and systems for preventing delays associated with a speculative load operation on data when the data is not in the data cache of a processor. A compiler optimizer analyzes various criteria to determine whether a cache-miss savings transformation is useful. Depending on the results of the analysis, the load operation and/or the successor operations to the load operation are transferred into a predicated mode of operation to enhance overall system efficiency and execution speed.
    Type: Grant
    Filed: February 17, 2000
    Date of Patent: February 4, 2003
    Assignee: Elbrus International
    Inventors: Sergev K. Okunev, Vladimir Y. Volkonsky
  • Publication number: 20030023838
    Abstract: In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked as regular instructions, the branch instruction is marked as such, and any instructions following the branch are marked as sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or the target instructions are then dropped depending on the branch resolution, incurring a fixed one-cycle branch penalty.
    Type: Application
    Filed: July 27, 2001
    Publication date: January 30, 2003
    Inventors: Faraydon O. Karim, Ramesh Chandra
  • Publication number: 20030014612
    Abstract: A processor improves throughput efficiency and exploits increased parallelism by introducing multithreading to an existing and mature processor core. The multithreading is implemented in two steps including vertical multithreading and horizontal multithreading. The processor core is retrofitted to support multiple machine states. System embodiments that exploit retrofitting of an existing processor core advantageously leverage hundreds of man-years of hardware and software development by extending the lifetime of a proven processor pipeline generation. A processor implements N-bit flip-flop global substitution. To implement multiple machine states, the processor converts 1-bit flip-flops in storage cells of the stalling vertical thread to an N-bit global flip-flop where N is the number of vertical threads.
    Type: Application
    Filed: May 11, 1999
    Publication date: January 16, 2003
    Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
  • Publication number: 20030005262
    Abstract: The present invention provides a mechanism for supporting high-bandwidth instruction fetching in a multi-threaded processor. A multi-threaded processor includes an instruction cache (I-cache) and a temporary instruction cache (TIC). In response to an instruction pointer (IP) of a first thread hitting in the I-cache, a first block of instructions for the thread is provided to an instruction buffer and a second block of instructions for the thread is provided to the TIC. On a subsequent clock interval, the second block of instructions is provided to the instruction buffer, and first and second blocks of instructions from a second thread are loaded into a second instruction buffer and the TIC, respectively.
    Type: Application
    Filed: June 28, 2001
    Publication date: January 2, 2003
    Inventors: Sailesh Kottapalli, James S. Burns, Kenneth D. Shoemaker
  • Patent number: 6493820
    Abstract: In one embodiment of the invention, a processor includes an execution pipeline to concurrently execute at least portions of threads, wherein at least one of the threads is dependent on at least another one of the threads. The processor also includes detection circuitry to detect speculation errors in the execution of the threads. In another embodiment, the processor includes thread management logic to control dynamic creation of threads from a program.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: December 10, 2002
    Assignee: Intel Corporation
    Inventors: Haitham Akkary, Kingsum Chow
  • Publication number: 20020184478
    Abstract: The problem of mismatch between a program counter (14) of a CPU (10) and a byte code counter (18) of an instruction path coprocessor (IPC) (16) is addressed by causing the IPC (16) to translate IPC branch instructions into CPU branch instructions, in which the CPU branch instructions implicitly indicate whether a corresponding IPC branch instruction should be taken, and in which the CPU branch instruction causes the CPU (10) to set its own program counter (14) to a safe location in the IPC range to avoid overflow.
    Type: Application
    Filed: April 8, 2002
    Publication date: December 5, 2002
    Inventors: Adrianus Josephus Bink, Alexander Augusteijn, Paul Ferenc Hoogendijk, Hendrikus Wilhelmus Johannes Van De Wiel
  • Publication number: 20020178349
    Abstract: When a processor executes a memory operation instruction with data-dependence speculative execution, it refers to a speculative execution result history table, which stores history information on the success/failure results of past speculative executions of memory operation instructions, and thereby predicts whether the speculative execution will succeed or fail. For the prediction, the target address of the memory operation instruction is converted by a hash function circuit into an entry number of the speculative execution result history table (allowing the existence of aliases), and the entry of the table designated by that entry number is referred to. If the prediction is “success”, the memory operation instruction is executed speculatively, out of order with regard to data dependence relationships between instructions.
    Type: Application
    Filed: May 22, 2002
    Publication date: November 28, 2002
    Applicant: NEC CORPORATION
    Inventors: Atsufumi Shibayama, Satoshi Matsushita, Sunao Torii, Naoki Nishi
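A compact C sketch of the predictor in publication 20020178349 above: the memory operation's target address is hashed into an entry of a history table (aliases tolerated), and the entry decides whether to speculate. The table size, the hash function, and the use of a 2-bit saturating counter per entry are illustrative assumptions; the abstract only says that success/failure history is stored.

```c
#include <stdbool.h>
#include <stdio.h>

#define TABLE_SIZE 1024   /* history-table entries (aliases are tolerated) */

/* Per-entry saturating history of past speculation outcomes (0..3). */
static unsigned char history[TABLE_SIZE];

/* Hash the memory operation's target address into an entry number. */
static unsigned hash_addr(unsigned long addr) {
    return (unsigned)((addr ^ (addr >> 10)) % TABLE_SIZE);
}

/* Predict "success" only if the history for this entry leans that way. */
static bool predict_success(unsigned long addr) {
    return history[hash_addr(addr)] >= 2;
}

/* Record the actual outcome of a data-dependence speculation. */
static void record_outcome(unsigned long addr, bool success) {
    unsigned i = hash_addr(addr);
    if (success && history[i] < 3)       history[i]++;
    else if (!success && history[i] > 0) history[i]--;
}

int main(void) {
    record_outcome(0x4000, true);
    record_outcome(0x4000, true);
    printf("speculate? %d\n", predict_success(0x4000));   /* 1 after two successes */
    return 0;
}
```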
  • Publication number: 20020174328
    Abstract: A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. The second tag may equal the first tag if the branch delay slot is unconditional for that branch, and may equal a different tag if the branch delay slot is conditional for the branch. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages. If the tag in a pipeline stage matches the first tag, the instruction is not cancelled. If the tag mismatches, the instruction is cancelled.
    Type: Application
    Filed: May 17, 2001
    Publication date: November 21, 2002
    Inventor: David A. Kruckemyer
  • Publication number: 20020144087
    Abstract: An instruction-fetch architecture for a microprocessor is provided that pre-reads and pre-decodes the next instruction. If the pre-decoded instruction is found to be a conditional branch instruction, an instruction read-amount register is set so that the two instructions following the current instruction are read from the program memory; if the next instruction is found to be anything other than a conditional branch, only one instruction is read, so as to avoid unnecessary reads of the program memory and thereby reduce power consumption.
    Type: Application
    Filed: December 18, 2001
    Publication date: October 3, 2002
    Inventors: Pao-Lung Chen, Chen-Yi Lee
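The fetch policy in publication 20020144087 above reduces to a single decision: pre-decode the next instruction and read two instructions only when it is a conditional branch. A tiny illustrative C version, with an assumed opcode encoding that is not taken from the patent:

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed opcode field for this sketch: top nibble 0xB means conditional branch. */
static bool is_conditional_branch(unsigned insn) {
    return (insn >> 28) == 0xB;
}

/* Set the read-amount register: 2 for a conditional branch, otherwise 1,
 * so program memory is not read unnecessarily. */
static int read_amount_for(unsigned next_insn) {
    return is_conditional_branch(next_insn) ? 2 : 1;
}

int main(void) {
    unsigned next = 0xB0001234;   /* pre-decoded next instruction */
    printf("instructions to read: %d\n", read_amount_for(next));
    return 0;
}
```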
  • Publication number: 20020138717
    Abstract: A processor includes logic for tagging a thread identifier (TID) for usage with processor blocks that are not stalled. Pertinent non-stalling blocks include caches, translation look-aside buffers (TLB), a load buffer asynchronous interface, an external memory management unit (MMU) interface, and others. A processor includes a cache that is segregated into a plurality of N cache parts. Cache segregation avoids interference, “pollution”, or “cross-talk” between threads. One technique for cache segregation utilizes logic for storing and communicating thread identification (TID) bits. The cache utilizes cache indexing logic. For example, the TID bits can be inserted at the most significant bits of the cache index.
    Type: Application
    Filed: May 23, 2002
    Publication date: September 26, 2002
    Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
  • Patent number: 6457117
    Abstract: The processor is configured to predecode instruction bytes prior to their storage within an instruction cache. During the predecoding, relative branch instructions are detected. The displacement included within the relative branch instruction is added to the address corresponding to the relative branch instruction, thereby generating the target address. The processor replaces the displacement field of the relative branch instruction with an encoding of the target address, and stores the modified relative branch instruction in the instruction cache. The branch prediction mechanism may select the target address from the displacement field of the relative branch instruction instead of performing an addition to generate the target address. In one embodiment, relative branch instructions having eight-bit and 32-bit displacement fields are included in the instruction set executed by the processor.
    Type: Grant
    Filed: November 7, 2000
    Date of Patent: September 24, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
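A minimal illustration of the predecode transformation in patent 6457117 above: the displacement is added to the branch's own address and the resulting target is stored back in place of the displacement field, so prediction can read the target directly instead of adding. The structure layout and the 32-bit encoding of the target are assumptions of this sketch only.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified view of a relative branch: its address and a field that holds
 * the displacement before predecode and a target encoding afterwards. */
typedef struct { uint64_t addr; int32_t disp_or_target; } branch_insn_t;

/* Predecode step, done before the instruction is stored in the I-cache. */
static void predecode(branch_insn_t *b) {
    uint64_t target = b->addr + (int64_t)b->disp_or_target;  /* addr + displacement */
    b->disp_or_target = (int32_t)target;   /* store an encoding of the target (sketch) */
}

int main(void) {
    branch_insn_t b = { 0x1000, 0x80 };    /* relative branch with +0x80 displacement */
    predecode(&b);
    printf("predicted target: %#x\n", (unsigned)b.disp_or_target);  /* 0x1080 */
    return 0;
}
```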
  • Publication number: 20020129227
    Abstract: A time multiplex changing function for priorities among threads is added to a multi-thread processor, and capability for large-scale out-of-order execution is achieved by confining the flows of data among threads, prescribing the execution order in the flow sequence, and executing a plurality of threads having data dependency either simultaneously or in time multiplex.
    Type: Application
    Filed: December 20, 2001
    Publication date: September 12, 2002
    Inventor: Fumio Arakawa
  • Patent number: 6430682
    Abstract: Reliable branch predictions for real-time applications reduce both conditional branch execution time and uncertainties associated with their prediction in a computer implemented application. One method ensures that certain conditional branches are always correctly predicted, effectively converting them to jump instructions during program execution. Another method exploits the fact that some conditional branches always branch in the same direction within a task invocation, although that direction may vary across invocations. These methods improve computer processor utilization and performance.
    Type: Grant
    Filed: September 11, 1998
    Date of Patent: August 6, 2002
    Assignee: Agere Systems Guardian Corp.
    Inventor: Harry Dwyer, III
  • Publication number: 20020091913
    Abstract: Using an entry number (WRB number) of a re-order buffer 6, each of the function units, such as an operation unit 3, a store unit 4, and a load unit 5, notifies the re-order buffer 6 of the end of processing for the instruction stored in the corresponding entry. The load unit 5 manages the latest speculation state of each issued load instruction on the basis of a branch prediction success/failure signal output from the branch unit 2, and makes no WRB-number notification to the re-order buffer 6 for load instructions subsequent to a branch instruction whose prediction failed, even when the processing of the instruction is finished. Accordingly, the re-order buffer 6 can re-use the entries in which the instructions subsequent to the mispredicted branch instruction are stored.
    Type: Application
    Filed: January 9, 2002
    Publication date: July 11, 2002
    Applicant: NEC CORPORATION
    Inventor: Masao Fukagawa
  • Publication number: 20020087849
    Abstract: Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in an instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued by the LRQ prior to the return of the sync acknowledgment. Load data returned by the speculative load request is immediately forwarded to the processor's execution units for speculative execution with subsequent instructions. The returned data and results of subsequent operations are held temporarily in the rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative.
    Type: Application
    Filed: December 28, 2000
    Publication date: July 4, 2002
    Applicant: International Business Machines Corporation
    Inventors: Ravi Kumar Arimilli, John Steven Dodson, Guy Lynn Guthrie, Derek Edward Williams
  • Publication number: 20020078326
    Abstract: In one embodiment, a programmable processor is adapted to include a speculative count register. The speculative count register may be loaded with data associated with an instruction before the instruction commits. However, if the instruction is terminated before it commits, the speculative count register may be adjusted. A set of counters may monitor the difference between the speculative count register and its architectural counterpart.
    Type: Application
    Filed: December 20, 2000
    Publication date: June 20, 2002
    Applicant: Intel Corporation and Analog Devices, Inc.
    Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp
  • Publication number: 20020073301
    Abstract: A method of executing microprocessor instructions and an associated microprocessor are disclosed. Initially, a conditional branch instruction is fetched from a storage unit such as an instruction cache. Branch prediction information embedded in the branch instruction is detected by a fetch unit of the microprocessor. Depending upon the state of the branch prediction information, instructions from the branch-taken path and the branch-not-taken path of the branch instruction are fetched. The branch-not-taken path instructions and the branch-taken path instructions may be speculatively executed. Upon executing the conditional branch instruction, the speculative results from the branch-taken path are discarded if the branch is not taken, and the speculative results from the branch-not-taken path are discarded if the branch is taken. The branch prediction information may include compiler-generated information indicative of the context in which the conditional branch instruction is used.
    Type: Application
    Filed: December 7, 2000
    Publication date: June 13, 2002
    Applicant: International Business Machines Corporation
    Inventors: James Allan Kahle, Charles Roberts Moore
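An illustrative control flow for the dual-path scheme in publication 20020073301 above, which is the pattern this class (712/235) is named for: when the embedded prediction information calls for it, both the taken and the not-taken paths are executed speculatively and the losing path's results are discarded once the branch resolves. The "path" functions are placeholders standing in for real instruction streams.

```c
#include <stdbool.h>
#include <stdio.h>

static int taken_path(void)     { return 1; }   /* stand-in for branch-taken work */
static int not_taken_path(void) { return 2; }   /* stand-in for fall-through work */

/* Execute both paths when requested, then keep only the resolved one. */
static int resolve_branch(bool both_paths, bool taken) {
    if (both_paths) {
        int r_taken     = taken_path();        /* speculative results ... */
        int r_not_taken = not_taken_path();    /* ... for both directions */
        return taken ? r_taken : r_not_taken;  /* discard the other result */
    }
    return taken ? taken_path() : not_taken_path();  /* single-path fallback */
}

int main(void) {
    printf("committed result: %d\n", resolve_branch(true, false));  /* prints 2 */
    return 0;
}
```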