Simultaneous Parallel Fetching Or Executing Of Both Branch And Fall-through Path Patents (Class 712/235)
-
Patent number: 6714961
Abstract: The invention is directed toward a multiprocessing system having multiple processing units. For at least one of the processing units in the multiprocessing system, a first job signal is assigned to the processing unit for speculative execution of a corresponding first job, and a further job signal is assigned to the processing unit for speculative execution of a corresponding further job. The speculative execution of said further job is initiated when the processing unit has completed execution of the first job. If desirable, even more job signals may be assigned to the processing unit for speculative execution. In this way, multiple job signals are assigned to the processing units of the processing system, and the processing units are allowed to execute a plurality of jobs speculatively while waiting for commit priority.
Type: Grant
Filed: November 12, 1999
Date of Patent: March 30, 2004
Assignee: Telefonaktiebolaget LM Ericsson (publ)
Inventors: Per Anders Holmberg, Terje Egeland, Nils Ola Linnermark, Karl Oscar Joachim Strömbergson, Magnus Carlsson
-
Publication number: 20040059898
Abstract: Various methods, apparatuses, and systems in which a processor includes an issue engine and an in-order execution pipeline. The issue engine categorizes operations as at least one of either a speculative operation, which performs computations, or an architectural operation, which has the potential to fault or cause an exception. Each architectural operation issues with an associated architectural micro-operation. A first micro-operation checks whether a first speculative operation is dependent upon an intervening first architectural operation. The in-order execution pipeline executes the speculative operation, the architectural operation, and the associated architectural micro-operations.
Type: Application
Filed: September 19, 2002
Publication date: March 25, 2004
Inventors: Jeffery J. Baxter, Gary N. Hammond, Nazar A. Zaidi
-
Publication number: 20040059896
Abstract: A method and apparatus are provided for implementing two-tiered thread state multithreading support with a high clock rate. A first tier thread state storage stores a limited number of runnable thread register states. The limited number is less than a threshold value. Next thread selection logic, coupled between the first tier thread state storage and a currently executing processor state, picks a next thread to run on a processor from the limited number of runnable thread register states. A second tier thread storage facility stores a second number of thread states that is greater than the limited number of runnable thread register states. A runnable thread selection logic, coupled between the first tier thread state storage and the second tier thread storage facility, selectively exchanges thread states between the first tier limited number of runnable thread register states and the second tier thread storage facility.
Type: Application
Filed: September 19, 2002
Publication date: March 25, 2004
Applicant: International Business Machines Corporation
Inventors: Harold F. Kossman, Timothy John Mullins
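The two-tier arrangement above can be illustrated with a short behavioral sketch in Python. This is not the patent's implementation; the class, the backfill policy, and the tier size of 4 are all illustrative assumptions.

```python
from collections import deque

FIRST_TIER_LIMIT = 4  # illustrative size; the patent only requires it be below a threshold


class TwoTierThreadStore:
    """Sketch of two-tiered thread state storage: next-thread selection
    only ever looks at a small first tier of runnable register states,
    while a larger second tier holds the remaining thread states."""

    def __init__(self):
        self.first_tier = deque()   # at most FIRST_TIER_LIMIT runnable states
        self.second_tier = deque()  # overflow thread states

    def add_thread(self, state):
        if len(self.first_tier) < FIRST_TIER_LIMIT:
            self.first_tier.append(state)
        else:
            self.second_tier.append(state)

    def pick_next(self):
        # Selection logic considers only the small first tier, keeping
        # the selection path short enough for a high clock rate.
        state = self.first_tier.popleft()
        # Selectively exchange: backfill the first tier from the second.
        if self.second_tier:
            self.first_tier.append(self.second_tier.popleft())
        return state


store = TwoTierThreadStore()
for tid in range(6):
    store.add_thread(tid)
print(store.pick_next())  # 0; thread 4 is promoted from the second tier
```

The point of the split is that the timing-critical selection mux scales with the small first tier, not with the total number of supported threads.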
-
Patent number: 6691220
Abstract: A method of operation within a processor that permits load instructions following barrier instructions in an instruction sequence to be issued speculatively. The barrier instruction is executed and, while the barrier operation is pending, a load request associated with the load instruction is speculatively issued. A speculation flag is set to indicate the load instruction was speculatively issued. The flag is reset when an acknowledgment of the barrier operation is received. Data that is returned before the acknowledgment is received is temporarily held, and the data is forwarded to the register and/or execution unit of the processor only after the acknowledgment is received. If a snoop invalidate is detected for the speculatively issued load request before the barrier operation completes, the data is discarded and the load request is re-issued.
Type: Grant
Filed: June 6, 2000
Date of Patent: February 10, 2004
Assignee: International Business Machines Corporation
Inventors: Guy Lynn Guthrie, Ravi Kumar Arimilli, John Steven Dodson, Derek Edward Williams
-
Publication number: 20040024996
Abstract: A circuit and method for maintaining a correct value in a performance monitor counter within a speculative computer microprocessor is disclosed. In response to determining the beginning of speculative execution within the microprocessor, the value of the performance monitor counter is stored in a rewind register. The performance monitor counter is incremented in response to predetermined events. If the microprocessor determines the speculative execution was incorrect, the value of the rewind register is loaded into the counter, restoring the correct value for the counter.
Type: Application
Filed: July 31, 2002
Publication date: February 5, 2004
Applicants: International Business Machines Corporation, Hitachi, Ltd.
Inventors: Hung Qui Le, Alexander Erik Mericas, Robert Dominick Mirabella, Toshihiko Kurihara, Michitaka Okuno, Masahiro Tokoro
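The rewind mechanism described in this abstract is simple enough to sketch directly. The following Python class is a behavioral model only; the class and method names are illustrative, not from the patent.

```python
class PerformanceMonitorCounter:
    """Behavioral sketch of a performance monitor counter backed by a
    rewind register: the counter value is snapshotted when speculation
    begins and restored if the speculation turns out to be incorrect."""

    def __init__(self):
        self.counter = 0
        self.rewind = 0

    def begin_speculation(self):
        # Store the current counter value in the rewind register.
        self.rewind = self.counter

    def event(self):
        # Increment on each predetermined monitored event.
        self.counter += 1

    def resolve(self, speculation_correct):
        # On a misprediction, reload the counter from the rewind register.
        if not speculation_correct:
            self.counter = self.rewind


pmc = PerformanceMonitorCounter()
pmc.event()                 # committed event: counter = 1
pmc.begin_speculation()     # rewind register now holds 1
pmc.event()                 # speculative event: counter = 2
pmc.resolve(speculation_correct=False)
print(pmc.counter)          # 1: the speculatively counted event is undone
```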
-
Patent number: 6687812
Abstract: Disclosed is a parallel processing apparatus capable of reducing power consumption by efficiently executing a fork instruction for activating a plurality of processors. The parallel processing apparatus has a processor element (10) for generating (forking) a thread consisting of a plurality of instructions of an external unit. The processor element comprises a fork-instruction predicting section (14) which includes a predicting section for predicting whether or not the fork condition of a fork-conditioned fork instruction is satisfied after fetching but before executing the instruction.
Type: Grant
Filed: April 20, 2000
Date of Patent: February 3, 2004
Assignee: NEC Corporation
Inventor: Sachiko Shimada
-
Publication number: 20040019772
Abstract: A microprocessor includes a register (5), rewritable by software, that outputs a signal A for determining which of a successor instruction to be executed when a condition for a conditional branch is satisfied and another successor instruction to be executed when the condition is unsatisfied is to be introduced into a delay slot. When the microprocessor executes a conditional branch, a decode circuit (6) delivers a signal B, indicating which of the successor instruction and the other successor instruction is to be selected as the next instruction supplied to a CPU (1), to a code interface circuit (2).
Type: Application
Filed: January 22, 2003
Publication date: January 29, 2004
Inventors: Hiroshi Ueki, Masahiro Yokoyama
-
Patent number: 6675374
Abstract: A technique is provided for inserting memory prefetch instructions only at appropriate locations in program code. The instructions are inserted into the program code such that, when the code is executed, the speed and efficiency of execution of the code may be improved, cache conflicts arising from execution of the prefetch instruction may be substantially eliminated, and the number of simultaneously-executing memory prefetch operations may be limited to prevent stalling and/or overtaxing of the processor executing the code.
Type: Grant
Filed: October 12, 1999
Date of Patent: January 6, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: John Samuel Pieper, Steven Orodon Hobbs, Stephen Corridon Root
-
Patent number: 6675285
Abstract: A method and apparatus for eliminating memory contention in a computation module is presented. The method includes, for a current operation being performed by a computation engine of the computation module, processing that begins by identifying one of a plurality of threads for which the current operation is being performed. The plurality of threads constitutes an application (e.g., geometric primitive applications, video graphic applications, drawing applications, etc.). The processing continues by identifying an operation code from a set of operation codes corresponding to the one of the plurality of threads. As such, for the thread that has been identified for the current operation, one of its operation codes is identified for the current operation. The processing then continues by determining a particular location of a particular one of a plurality of data flow memory devices, based on the particular thread and the particular operation code, for storing the result of the current operation.
Type: Grant
Filed: April 21, 2000
Date of Patent: January 6, 2004
Assignee: ATI International, Srl
Inventors: Michael Andrew Mang, Michael Mantor
-
Publication number: 20040003215
Abstract: A method and apparatus for executing low power validations for high confidence predictions. More particularly, the present invention pertains to using confidence levels of speculative executions to decrease power consumption of a processor without affecting its performance. Non-critical instructions, or those instructions whose prediction, rather than verification, lies on the critical path, can thus be optimized to consume less power.
Type: Application
Filed: June 28, 2002
Publication date: January 1, 2004
Inventors: Evgeni Krimer, Bishara Shomar, Ronny Ronen, Doron Orenstein
-
Method and system dynamically presenting the branch target address in conditional branch instruction
Patent number: 6662295
Abstract: The present invention is related to branch instructions in a pipeline process of a microprocessor system. The microprocessor system executes branch prediction if a conditional branch instruction code calls for branch prediction, and, on the other hand, suspends successive instruction execution until a branch evaluation of the conditional branch instruction settles if the conditional branch instruction code does not call for branch prediction.
Type: Grant
Filed: September 10, 1998
Date of Patent: December 9, 2003
Assignee: Ricoh Company, Ltd.
Inventor: Shinichi Yamaura
-
Patent number: 6651247
Abstract: In a computer having rotating registers, a scheduler-assigner for allocating the rotating registers. The scheduler-assigner includes a software-pipelined instruction scheduler that generates a first software-pipelined instruction schedule based on an intermediate representation that has data flow information in SSA form. The scheduler-assigner also includes a rotating register allocator that designates live ranges of loop-variant variables in the first software-pipelined instruction schedule as being allocated to rotating registers, when available. The first software-pipelined instruction schedule may be a modulo schedule. When a rotating register is not available, the software-pipelined instruction scheduler may generate a second software-pipelined instruction schedule having an initiation interval greater than the initiation interval of the first software-pipelined instruction schedule.
Type: Grant
Filed: May 9, 2000
Date of Patent: November 18, 2003
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: Uma Srinivasan
-
Patent number: 6643770
Abstract: A mispredicted path side memory is configured to be coupled to a stage in an instruction pipeline. As instructions advance through the pipeline, a result from the stage is stored into the mispredicted path side memory. The result is restored from the mispredicted path side memory into a pipeline stage when a branch is mispredicted.
Type: Grant
Filed: September 16, 1999
Date of Patent: November 4, 2003
Assignee: Intel Corporation
Inventor: Nicolas I. Kacevas
-
Patent number: 6636960
Abstract: Disclosed are a method and an apparatus for resteering failing speculation check instructions in the pipeline of a processor. A branch offset immediate value and an instruction pointer correspond to each failing instruction. These values are used to determine the correct target recovery address. A relative adder adds the immediate value and the instruction pointer value to arrive at the target recovery address. This is done by flushing the pipeline upon the occurrence of a failing speculation check instruction. The pipeline flush is extended to allow the instruction stream to be resteered. The immediate value and the instruction pointer are then routed through the existing data paths of the pipeline, into the relative adder, which calculates the correct address. A sequencer tracks the progression of these values through the pipeline and causes a branch at the desired time.
Type: Grant
Filed: February 16, 2000
Date of Patent: October 21, 2003
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: James Douglas Gibson, Rohit Bhatia
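The relative adder at the heart of this scheme is plain IP-relative arithmetic. A minimal sketch, with an illustrative function name and example addresses that are not from the patent:

```python
def recovery_target(instruction_pointer: int, branch_offset_immediate: int) -> int:
    """Relative adder sketch: the target recovery address for a failing
    speculation check instruction is its instruction pointer plus its
    branch offset immediate value."""
    return instruction_pointer + branch_offset_immediate


# A check instruction at 0x4000 with an immediate of 0x120 resteers
# the front end to the recovery code at 0x4120.
print(hex(recovery_target(0x4000, 0x120)))  # 0x4120
```

The notable design choice in the abstract is reusing existing pipeline data paths to carry the two operands to the adder, rather than adding a dedicated recovery-address bus.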
-
Publication number: 20030188141
Abstract: One embodiment of the present invention provides a system that facilitates interleaved execution of a head thread and a speculative thread within a single processor pipeline. The system operates by executing program instructions using the head thread, and by speculatively executing program instructions in advance of the head thread using the speculative thread, wherein the head thread and the speculative thread execute concurrently through time-multiplexed interleaving in the single processor pipeline.
Type: Application
Filed: February 12, 2003
Publication date: October 2, 2003
Inventors: Shailender Chaudhry, Marc Tremblay
-
Publication number: 20030182539
Abstract: It has been determined that, in a superscalar computer processor, executing load instructions issued along an incorrectly predicted path of a conditional branch instruction eventually reduces the number of cache misses observed on the correct branch path. Executing these wrong-path loads provides an indirect prefetching effect. If the processor has a small L1 data cache, however, this prefetching pollutes the cache, causing an overall slowdown in performance. By storing the execution results of mispredicted paths in memory, such as in a wrong path cache, the pollution is eliminated. A wrong path cache can improve processor performance up to 17% in simulations using a 32 KB data cache. A fully-associative eight-entry wrong path cache in parallel with a 4 KB direct-mapped data cache allows the execution of wrong path loads to produce an average processor speedup of 46%. The wrong path cache also results in 16% better speedup compared to the baseline processor equipped with a victim cache of the same size.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Applicant: International Business Machines Corporation
Inventors: Steven R. Kunkel, David J. Lilja, Resit Sendag
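The key structure here, a small fully-associative side cache filled only by wrong-path loads, can be sketched behaviorally. The LRU replacement policy and the class interface below are illustrative assumptions, not details from the publication.

```python
from collections import OrderedDict


class WrongPathCache:
    """Sketch of a small fully-associative wrong path cache: loads
    executed down a mispredicted path fill this side structure instead
    of polluting the main L1 data cache, and later correct-path loads
    can hit in it (the indirect prefetching effect)."""

    def __init__(self, entries=8):
        self.entries = entries
        self.lines = OrderedDict()  # address -> data, in LRU order

    def fill(self, address, data):
        # Called for loads resolved on a mispredicted path.
        self.lines[address] = data
        self.lines.move_to_end(address)
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict the least recently used line

    def lookup(self, address):
        # Probed in parallel with the main data cache on correct-path loads.
        return self.lines.get(address)


wpc = WrongPathCache(entries=8)
wpc.fill(0x100, "wrong-path data")   # filled while executing a mispredicted path
print(wpc.lookup(0x100))             # a later correct-path load hits here
```

Keeping wrong-path fills out of the main cache is what separates this from simply letting wrong-path loads allocate normally: the prefetch benefit is kept while the pollution is isolated in eight entries.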
-
Publication number: 20030182542
Abstract: In accordance with one embodiment, the invention provides a method comprising monitoring a power consumption of a processor in executing a program while running in a speculative execution mode wherein instructions are speculatively executed, and turning off said speculative execution mode if said power consumption is above a predetermined threshold. According to another embodiment, the invention provides a processor comprising a speculative mode wherein instructions are speculatively executed; a non-speculative execution mode wherein instructions are executed non-speculatively; and a speculation control mechanism to selectively cause said processor to operate in said non-speculative mode based on a power consumption criterion.
Type: Application
Filed: March 20, 2002
Publication date: September 25, 2003
Inventors: Robert L. Davies, Aaron M. Tsirkel
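The control loop described above reduces to a threshold comparison on a monitored power value. A minimal behavioral sketch, where the threshold value and the one-way fallback policy are illustrative assumptions:

```python
POWER_THRESHOLD_WATTS = 30.0  # illustrative predetermined threshold, not from the publication


class SpeculationControl:
    """Sketch of the speculation control mechanism: sample power while
    running speculatively and switch to non-speculative execution when
    consumption exceeds the threshold."""

    def __init__(self):
        self.speculative_mode = True

    def sample_power(self, watts: float):
        if self.speculative_mode and watts > POWER_THRESHOLD_WATTS:
            # Turn off speculative execution to cap power consumption.
            self.speculative_mode = False


ctrl = SpeculationControl()
ctrl.sample_power(25.0)
print(ctrl.speculative_mode)  # True: under the threshold, keep speculating
ctrl.sample_power(32.5)
print(ctrl.speculative_mode)  # False: threshold exceeded, speculation disabled
```

A real implementation would also need a policy for re-enabling speculation once power drops; the abstract leaves that open, so the sketch does too.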
-
Publication number: 20030177343
Abstract: A multi-processing computer architecture and a method of operating the same are provided. The multi-processing architecture provides a main processor and multiple sub-processors cascaded together to efficiently execute loop operations. The main processor executes operations outside of a loop and controls the loop. The multiple sub-processors are operably interconnected, and are each assigned by the main processor to a given loop iteration. Each sub-processor is operable to receive one or more sub-instructions sequentially, operate on each sub-instruction and propagate the sub-instruction to a subsequent sub-processor.
Type: Application
Filed: July 24, 2002
Publication date: September 18, 2003
Applicant: Sony Computer Entertainment America Inc.
Inventor: Hidetaka Magoshi
-
Patent number: 6604191
Abstract: An instruction fetching system (and/or architecture) which may be utilized by a high-frequency short-pipeline microprocessor for efficient fetching of both in-line and target instructions. The system contains an instruction fetching unit (IFU), having control logic and associated components for controlling a specially designed instruction cache (I-cache). The I-cache is a sum-address cache, i.e., it receives two address inputs, which are combined by a decoder to provide the address of the line of instructions to be fetched. The I-cache is designed with an array of cache lines that can contain 32 instructions, and three buffers that each have a capacity of 32 instructions.
Type: Grant
Filed: February 4, 2000
Date of Patent: August 5, 2003
Assignee: International Business Machines Corporation
Inventors: Brian King Flacks, David Meltzer, Joel Abraham Silberman
-
Publication number: 20030135722
Abstract: A system, method and apparatus are provided that split a microprocessor load instruction into two parts: a speculative load instruction and a check speculative load instruction. The speculative load instruction can be moved ahead in the instruction stream by the compiler as soon as the address and result registers are available. This is true even when the data to be loaded is not actually required. This speculative load instruction will not cause a fault in the memory if the access is invalid, i.e., the load misses and a token bit is set. The check speculative load instruction will cause the speculative load instruction to be retried in the event the token bit was set equal to one. In this manner, the latency associated with branching to an interrupt routine will be eliminated a significant amount of the time. It is very possible that the reasons for invalidating the speculative load operation (e.g., the page not being present in memory) are no longer present, and the load will be allowed to complete.
Type: Application
Filed: January 10, 2002
Publication date: July 17, 2003
Applicant: International Business Machines Corporation
Inventor: Andrew Johnson
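The load/check split can be modeled in a few lines. The sketch below is a behavioral simulation only: the dict-backed "memory", the class name, and the retry semantics are illustrative assumptions, not the patent's hardware.

```python
class SpeculativeLoadUnit:
    """Sketch of the split load: the speculative load never faults but
    sets a token bit on an invalid access; the check instruction retries
    the load only when that token bit was set."""

    def __init__(self, memory):
        self.memory = memory  # address -> value, for accessible pages only

    def speculative_load(self, address):
        # Hoisted ahead by the compiler; must not fault on invalid access.
        if address in self.memory:
            return self.memory[address], 0   # value, token bit clear
        return None, 1                       # invalid: set token bit, defer the fault

    def check_speculative_load(self, address, token):
        if token == 0:
            return self.memory[address]      # speculation succeeded; nothing to do
        # Token was set: retry the load non-speculatively. By now the
        # original reason for invalidation may no longer be present.
        return self.memory.get(address)


mem = {0x10: 42}
unit = SpeculativeLoadUnit(mem)
value, token = unit.speculative_load(0x10)
print(unit.check_speculative_load(0x10, token))  # 42
```

The payoff described in the abstract is that the expensive recovery path (the retry) only runs when the token bit is actually set, so the common case pays nothing for the hoisting.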
-
Publication number: 20030126416
Abstract: Techniques for suspending execution of a thread in a multi-threaded processor. In one embodiment, a processor includes resources that can be partitioned between multiple threads. Processor logic receives an instruction in a first thread of execution and, in response to that instruction, relinquishes portions of the partitioned resources for use by other threads.
Type: Application
Filed: December 31, 2001
Publication date: July 3, 2003
Inventors: Deborah T. Marr, Dion Rodgers, David L. Hill, Shiv Kaushik, James B. Crossland, David A. Koufaty
-
Publication number: 20030126417
Abstract: A method and apparatus to execute data speculative instructions in a processor comprising at least one source register, each source register comprising a bit to indicate the validity of the data in the at least one source register. A data validity circuit is coupled to the one or more source registers to determine the validity of the data in the source registers, and to indicate the validity of the data in a destination register based upon the validity bit in the at least one source register. The processor optionally comprises a checker unit to retire those instructions from the execution unit which write valid data to the destination register, and to re-schedule those instructions for execution which write invalid data to the destination register.
Type: Application
Filed: January 2, 2002
Publication date: July 3, 2003
Inventors: Eric Sprangle, Michael J. Haertel, David J. Sager
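The validity propagation rule implied by this abstract is that a destination register is valid only when every source register it was computed from is valid. A behavioral sketch, with illustrative register names and a dict-based register file that is an assumption of the model:

```python
def execute_speculative(dest, sources, valid_bits, values, op):
    """Sketch of the data validity circuit: the destination's validity
    bit is the AND of the source registers' validity bits; a result is
    only written when all sources carried valid data."""
    valid_bits[dest] = all(valid_bits[s] for s in sources)
    if valid_bits[dest]:
        values[dest] = op(*(values[s] for s in sources))


valid = {"r1": True, "r2": False, "r3": True}
vals = {"r1": 5, "r2": 0, "r3": 7}

execute_speculative("r4", ["r1", "r3"], valid, vals, lambda a, b: a + b)
execute_speculative("r5", ["r1", "r2"], valid, vals, lambda a, b: a + b)

print(valid["r4"], vals["r4"])  # True 12: all sources valid, result retires
print(valid["r5"])              # False: r2 was invalid, so r5 must be re-scheduled
```

A checker unit in this model would retire instructions whose destination validity bit is set and replay the others, which is exactly the optional behavior the abstract describes.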
-
Patent number: 6564315
Abstract: A scheduler issues instruction operations for execution, but also retains the instruction operations. If a particular instruction operation is subsequently found to be required to execute non-speculatively, the particular instruction operation is still stored in the scheduler. Subsequent to determining that the particular operation has become non-speculative (through the issuance and execution of instruction operations prior to the particular instruction operation), the particular instruction operation may be reissued from the scheduler. The penalty for incorrect scheduling of instruction operations which are to execute non-speculatively may be reduced, as compared to purging the particular instruction operation and younger instruction operations from the pipeline and refetching the particular instruction operation. Additionally, the scheduler may maintain the dependency indications for each instruction operation which has been issued.
Type: Grant
Filed: January 3, 2000
Date of Patent: May 13, 2003
Assignee: Advanced Micro Devices, Inc.
Inventors: James B. Keller, Ramsey W. Haddad, Stephan G. Meier
-
Publication number: 20030088760
Abstract: According to one aspect of the invention, a method is provided in which store addresses of store instructions dispatched during a last predetermined number of cycles are maintained in a first data structure of a first processor. It is determined whether a load address of a first load instruction matches one of the store addresses in the first data structure. The first load instruction is replayed if the load address of the first load instruction matches one of the store addresses in the first data structure.
Type: Application
Filed: October 24, 2002
Publication date: May 8, 2003
Inventors: Muntaquim F. Chowdhury, Douglas M. Carmean
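The "first data structure" behaves like a sliding window of recent store addresses. A minimal sketch; the window size of 8 and the class interface are illustrative assumptions:

```python
from collections import deque

WINDOW_CYCLES = 8  # illustrative; the publication only says "a last predetermined number of cycles"


class StoreAddressTracker:
    """Sketch of the first data structure: a sliding window of store
    addresses dispatched in the last WINDOW_CYCLES cycles. A load whose
    address matches any entry is replayed rather than allowed to
    complete with possibly stale data."""

    def __init__(self):
        self.recent_stores = deque(maxlen=WINDOW_CYCLES)

    def dispatch_store(self, address):
        # Oldest entries age out automatically once the window is full.
        self.recent_stores.append(address)

    def must_replay_load(self, load_address):
        return load_address in self.recent_stores


tracker = StoreAddressTracker()
tracker.dispatch_store(0x200)
print(tracker.must_replay_load(0x200))  # True: potential store-to-load conflict
print(tracker.must_replay_load(0x300))  # False: no recent store to this address
```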
-
Publication number: 20030084274
Abstract: In processing an instruction request, the invention determines whether the request is speculative or not based upon a bit field within the instruction. If the request is speculative, bus congestion and/or target memory is assessed for conditions and a decision is made, based on the conditions, as to whether or not to process the request. To facilitate the invention, certain bit fields within the instruction are encoded to identify the request as speculative or not. Additional bit fields may define a priority of a speculative request to influence the decision to process as based on the conditions. CPU architectures incorporating prefetch logic may be modified to recognize instructions encoded with speculation and priority identification fields to implement the invention in existing systems. Other logic, e.g., bus controllers and switches, may similarly process speculative requests to enhance system performance.
Type: Application
Filed: October 26, 2001
Publication date: May 1, 2003
Inventors: Blaine D. Gaither, Robert J. Brooks
-
Publication number: 20030079116
Abstract: One embodiment of the present invention provides a system that predicts a result produced by a section of code in order to support speculative program execution. The system begins by executing the section of code using a head thread in order to produce a result. Before the head thread produces the result, the system generates a predicted result to be used in place of the result. Next, the system allows a speculative thread to use the predicted result in speculatively executing subsequent code that follows the section of code. After the head thread finishes executing the section of code, the system determines if a difference between the predicted result and the result generated by the head thread has affected execution of the speculative thread. If so, the system executes the subsequent code again using the result generated by the head thread. If not, the system performs a join operation to merge state associated with the speculative thread with state associated with the head thread.
Type: Application
Filed: January 16, 2001
Publication date: April 24, 2003
Inventors: Shailender Chaudhry, Marc Tremblay
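The predict-speculate-verify-join flow can be shown sequentially in a few lines. This is a behavioral sketch only: real head and speculative threads run concurrently, whereas the function below serializes them, and all names are illustrative.

```python
def run_with_value_prediction(section, subsequent, predict):
    """Sketch of value-predicted speculation: a speculative thread runs
    the subsequent code on a predicted result while the head thread
    computes the real one; on a mismatch the subsequent code is
    re-executed with the real result, otherwise the speculative work
    is joined (kept)."""
    predicted = predict()
    speculative_output = subsequent(predicted)   # speculative thread's work
    actual = section()                           # head thread finishes the section
    if predicted == actual:
        return speculative_output                # join: speculation was useful
    return subsequent(actual)                    # mispredicted: redo subsequent code


result = run_with_value_prediction(
    section=lambda: 10,          # head thread's real result
    subsequent=lambda x: x * 2,  # code that follows the section
    predict=lambda: 10,          # correct prediction: speculative work is kept
)
print(result)  # 20
```

Note the correctness condition the abstract states precisely: re-execution is needed only when the prediction error actually *affected* the speculative thread; the equality check here is a conservative stand-in for that finer-grained test.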
-
Publication number: 20030074544
Abstract: A method for conditionally performing a SIMD operation causing a predetermined number of result objects to be held in a combination of different ones of a plurality of destination stores, the method comprising receiving and decoding instruction fields to determine at least one source store, a plurality of destination stores and at least one control store, said source and destination stores being capable of holding one or a plurality of objects, each object defining a SIMD lane. Conditional execution of the operation on a per-SIMD-lane basis is controlled using a plurality of pre-set indicators of the at least one control store designated in the instruction, wherein each said pre-set indicator i controls a predetermined number of result lanes p, where p takes a value greater than or equal to two. A predetermined number of result objects are sent to the destination stores such that the predetermined number of result objects are held by a combination of different ones of the plurality of destination stores.
Type: Application
Filed: April 11, 2002
Publication date: April 17, 2003
Inventor: Sophie Wilson
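The per-lane predication rule, one indicator bit gating p >= 2 result lanes, can be sketched with plain lists standing in for SIMD registers. The merge semantics for disabled lanes (keeping the first source's value) is an assumption of this sketch, not stated in the abstract.

```python
def conditional_simd(op, src_a, src_b, control_bits, lanes_per_bit=2):
    """Sketch of conditional SIMD execution: each pre-set indicator bit
    in the control store gates a group of lanes_per_bit result lanes
    (p >= 2 in the abstract). Disabled lanes pass src_a through."""
    result = []
    for lane, (a, b) in enumerate(zip(src_a, src_b)):
        enabled = control_bits[lane // lanes_per_bit]
        result.append(op(a, b) if enabled else a)
    return result


out = conditional_simd(
    lambda a, b: a + b,
    [1, 2, 3, 4],       # source lanes
    [10, 10, 10, 10],
    [1, 0],             # indicator 0 enables lanes 0-1, indicator 1 disables lanes 2-3
)
print(out)  # [11, 12, 3, 4]
```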
-
Publication number: 20030046517
Abstract: One embodiment of the present invention provides a system to facilitate multithreading a computer processor pipeline. The system includes a pipeline that is configured to accept instructions from multiple independent threads of operation, wherein each thread of operation is unrelated to the other threads of operation. This system also includes a control mechanism that is configured to control the pipeline. This control mechanism is statically scheduled to execute multiple threads in round-robin succession. This static scheduling eliminates the need for communication between stages of the pipeline.
Type: Application
Filed: September 4, 2001
Publication date: March 6, 2003
Inventor: Gary R. Lauterbach
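Static round-robin issue is easy to show as a schedule: each cycle takes the next instruction from the next thread in fixed succession, with no per-cycle arbitration. A sketch with illustrative instruction labels:

```python
def interleave_round_robin(threads):
    """Sketch of statically scheduled round-robin issue: thread slots
    rotate in a fixed order, so picking the next thread needs no
    communication between pipeline stages. Exhausted threads simply
    leave their slot empty (a bubble, omitted here)."""
    schedule = []
    streams = [list(t) for t in threads]
    i = 0
    while any(streams):
        stream = streams[i % len(streams)]
        if stream:
            schedule.append(stream.pop(0))
        i += 1
    return schedule


order = interleave_round_robin([["A1", "A2"], ["B1", "B2"], ["C1"]])
print(order)  # ['A1', 'B1', 'C1', 'A2', 'B2']
```

Because the thread order is fixed at design time, each pipeline stage can know statically which thread's state it is operating on in any given cycle, which is the property the abstract highlights.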
-
Publication number: 20030037226
Abstract: A processor architecture includes a program counter which executes M independent program streams in time division in units of one instruction, a pipeline which is shared by each of the program streams and has N pipeline stages operable at a frequency F, and a mechanism which executes only s program streams depending on a required operation performance, where M and N are integers greater than or equal to one and having no mutual dependency, and s is an integer greater than or equal to zero and satisfying s ≦ M. An apparent number of pipeline stages viewed from each of the program streams is set to N/M, so that M parallel processors having an apparent operating frequency F/M are formed.
Type: Application
Filed: April 29, 2002
Publication date: February 20, 2003
Applicant: FUJITSU LIMITED
Inventors: Toru Tsuruta, Norichika Kumamoto, Hideki Yoshizawa
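The arithmetic in this abstract is worth making concrete: interleaving M streams through an N-stage pipeline at frequency F makes each stream behave like a processor with N/M stages at F/M. A small sketch with illustrative parameter values:

```python
def apparent_parameters(n_stages: int, m_streams: int, frequency_hz: float):
    """The abstract's arithmetic: M interleaved program streams sharing
    an N-stage pipeline at frequency F each see an apparent pipeline
    depth of N/M and an apparent operating frequency of F/M."""
    return n_stages / m_streams, frequency_hz / m_streams


# E.g. an 8-stage, 400 MHz pipeline shared by 4 streams looks to each
# stream like a 2-stage pipeline clocked at 100 MHz.
stages, freq = apparent_parameters(n_stages=8, m_streams=4, frequency_hz=400e6)
print(stages, freq)
```

The shallow apparent pipeline is the point: each stream sees fewer stages between dependent instructions, shrinking hazard and branch penalties without reducing total throughput.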
-
Patent number: 6523110
Abstract: There is provided a decoupled fetch-execute engine with static branch prediction support. A method for prefetching targets of branch instructions in a computer processing system having instruction fetch decoupled from an execution pipeline includes the step of generating a prepare-to-branch (PBR) operation. The PBR operation includes address bits corresponding to a branch paired thereto and address bits corresponding to an expected target of the branch. The execution of the PBR operation is scheduled prior to execution of the paired branch to enforce a desired latency therebetween. Upon execution of the PBR operation, it is determined whether the paired branch is available using the address bits of the PBR operation corresponding to the paired branch. When the paired branch is available, the expected branch target is fetched using the address bits of the PBR operation corresponding to the expected branch target.
Type: Grant
Filed: July 23, 1999
Date of Patent: February 18, 2003
Assignee: International Business Machines Corporation
Inventors: Arthur A. Bright, Jason E. Fritts
-
Publication number: 20030033511
Abstract: In one embodiment of the invention, a processor includes an execution pipeline to concurrently execute at least portions of threads, wherein at least one of the threads is dependent on at least another one of the threads. The processor also includes detection circuitry to detect speculation errors in the execution of the threads. In another embodiment, the processor includes thread management logic to control dynamic creation of threads from a program.
Type: Application
Filed: October 8, 2002
Publication date: February 13, 2003
Inventors: Haitham Akkary, Kingsum Chow
-
Publication number: 20030033510
Abstract: Mechanisms and techniques operate in a computerized device to enable or disable speculative execution of instructions, such as reordering of load and store instructions, in a multiprocessing computerized device. The mechanisms and techniques provide a speculative execution controller that can detect a multiaccess memory condition between the first and second processors, such as concurrent access to shared data pages via page table entries. This can be done by monitoring page table entry accesses by other processors. The speculative execution controller sets a value of a speculation indicator in the memory system based on the multiaccess memory condition. If the value of the speculation indicator indicates that speculative execution of instructions is allowed in the computerized device, the speculative execution controller allows speculative execution of instructions in at least one of the first and second processors in the computerized device.
Type: Application
Filed: January 3, 2002
Publication date: February 13, 2003
Inventor: David Dice
-
Publication number: 20030028755
Abstract: In a parallel processor system for executing, in parallel with each other, a plurality of threads which are obtained by dividing a single program among a plurality of processors, when a processor executing a master thread conducts forking of a slave thread in another processor, at every write to a general register in the master thread after forking, the fork source processor transmits the updated register value to the fork destination processor through a communication bus. The fork destination processor executes the slave thread speculatively and, upon detecting a violation of a Read After Write (RAW) dependence related to the general register, cancels the thread being executed to conduct re-execution of the thread.
Type: Application
Filed: June 7, 2002
Publication date: February 6, 2003
Applicant: NEC Corporation
Inventors: Taku Ohsawa, Satoshi Matsushita
-
Patent number: 6516462
Abstract: Compiler optimization methods and systems for preventing delays associated with a speculative load operation on data when the data is not in the data cache of a processor. A compiler optimizer analyzes various criteria to determine whether a cache miss savings transformation is useful. Depending on the results of the analysis, the load operation and/or the successor operations to the load operation are transferred into a predicated mode of operation to enhance overall system efficiency and execution speed.
Type: Grant
Filed: February 17, 2000
Date of Patent: February 4, 2003
Assignee: Elbrus International
Inventors: Sergev K. Okunev, Vladimir Y. Volkonsky
-
Publication number: 20030023838
Abstract: In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked regular instructions, the branch instruction is marked as such, and any instructions following the branch are marked sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or target instructions are then dropped depending on the branch resolution, incurring a fixed, 1-cycle branch penalty.
Type: Application
Filed: July 27, 2001
Publication date: January 30, 2003
Inventors: Faraydon O. Karim, Ramesh Chandra
-
Publication number: 20030014612Abstract: A processor improves throughput efficiency and exploits increased parallelism by introducing multithreading to an existing and mature processor core. The multithreading is implemented in two steps including vertical multithreading and horizontal multithreading. The processor core is retrofitted to support multiple machine states. System embodiments that exploit retrofitting of an existing processor core advantageously leverage hundreds of man-years of hardware and software development by extending the lifetime of a proven processor pipeline generation. A processor implements N-bit flip-flop global substitution. To implement multiple machine states, the processor converts 1-bit flip-flops in storage cells of the stalling vertical thread to an N-bit global flip-flop where N is the number of vertical threads.Type: ApplicationFiled: May 11, 1999Publication date: January 16, 2003Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
-
Publication number: 20030005262Abstract: The present invention provides a mechanism for supporting high bandwidth instruction fetching in a multi-threaded processor. A multi-threaded processor includes an instruction cache (I-cache) and a temporary instruction cache (TIC). In response to an instruction pointer (IP) of a first thread hitting in the I-cache, a first block of instructions for the thread is provided to an instruction buffer and a second block of instructions for the thread is provided to the TIC. On a subsequent clock interval, the second block of instructions is provided to the instruction buffer, and first and second blocks of instructions from a second thread are loaded into a second instruction buffer and the TIC, respectively.Type: ApplicationFiled: June 28, 2001Publication date: January 2, 2003Inventors: Sailesh Kottapalli, James S. Burns, Kenneth D. Shoemaker
-
Patent number: 6493820Abstract: In one embodiment of the invention, a processor includes an execution pipeline to concurrently execute at least portions of threads, wherein at least one of the threads is dependent on at least another one of the threads. The processor also includes detection circuitry to detect speculation errors in the execution of the threads. In another embodiment, the processor includes thread management logic to control dynamic creation of threads from a program.Type: GrantFiled: December 29, 2000Date of Patent: December 10, 2002Assignee: Intel CorporationInventors: Haitham Akkary, Kingsum Chow
-
Publication number: 20020184478Abstract: The problem of mis-match between a program counter (14) of a CPU (10) and a byte code counter (18) of an instruction path coprocessor (IPC) (16) is addressed by causing the IPC (16) to translate IPC branch instructions to CPU branch instructions, in which the CPU branch instructions implicitly indicate whether a corresponding IPC branch instruction should be taken and in which the CPU branch instruction causes the CPU (10) to set its own program counter (14) to a safe location in the IPC range to avoid overflow.Type: ApplicationFiled: April 8, 2002Publication date: December 5, 2002Inventors: Adrianus Josephus Bink, Alexander Augusteijn, Paul Ferenc Hoogendijk, Hendrikus Wilhelmus Johannes Van De Wiel
-
Publication number: 20020178349Abstract: When a processor executes a memory operation instruction by means of data-dependence speculative execution, a speculative execution result history table, which stores history information on the success/failure results of past speculative executions of memory operation instructions, is consulted to predict whether the speculative execution will succeed or fail. For the prediction, the target address of the memory operation instruction is converted by a hash function circuit into an entry number of the speculative execution result history table (allowing the existence of aliases), and the entry of the table designated by that entry number is referred to. If the prediction is “success”, the memory operation instruction is executed speculatively out of order (with regard to the data dependence relationship between the instructions).Type: ApplicationFiled: May 22, 2002Publication date: November 28, 2002Applicant: NEC CORPORATIONInventors: Atsufumi Shibayama, Satoshi Matsushita, Sunao Torii, Naoki Nishi
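The hashed history table can be sketched in a few lines, under stated assumptions: the table size, the hash "circuit", and the optimistic initial state below are all illustrative choices, not details from the patent; only the idea of hashing the target address into an entry number (aliases permitted) and consulting that entry is taken from the abstract.

```python
TABLE_SIZE = 256  # assumed size; the patent does not specify one


def hash_index(addr):
    # Stand-in for the hash function circuit: fold address bits
    # into an entry number, deliberately allowing aliases.
    return (addr ^ (addr >> 8)) % TABLE_SIZE


class SpecHistoryTable:
    """Speculative execution result history table (illustrative)."""

    def __init__(self):
        self.table = [True] * TABLE_SIZE  # optimistic: predict success

    def predict(self, addr):
        return self.table[hash_index(addr)]

    def update(self, addr, succeeded):
        self.table[hash_index(addr)] = succeeded


sht = SpecHistoryTable()
assert sht.predict(0x1000)      # initially predicts success -> speculate
sht.update(0x1000, False)       # record a misspeculation
assert not sht.predict(0x1000)  # now predicted to fail -> execute in order
```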
-
Publication number: 20020174328Abstract: A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. The second tag may equal the first tag if the branch delay slot is unconditional for that branch, and may equal a different tag if the branch delay slot is conditional for the branch. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages. If the tag in a pipeline stage matches the first tag, the instruction is not cancelled. If the tag mismatches, the instruction is cancelled.Type: ApplicationFiled: May 17, 2001Publication date: November 21, 2002Inventor: David A. Kruckemyer
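A hedged sketch of the recovery step described above: on a misprediction, the branch's tag is broadcast to the pipeline stages, and, per the abstract's convention, instructions whose tag matches the broadcast tag are kept while mismatching ones are cancelled. The data representation below is an assumption for illustration.

```python
def surviving_instructions(pipeline, broadcast_tag):
    """pipeline: list of (instruction, tag) pairs in pipeline stages.
    Per the abstract, a matching tag means the instruction is NOT
    cancelled; mismatching tags identify instructions to cancel."""
    return [ins for ins, tag in pipeline if tag == broadcast_tag]


pipe = [("add", 1), ("ld", 2), ("sub", 1)]
# Broadcasting tag 1 cancels the tag-2 instruction only.
assert surviving_instructions(pipe, 1) == ["add", "sub"]
```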
-
Publication number: 20020144087Abstract: An instruction-fetch architecture for a microprocessor is provided that pre-reads and pre-decodes the next instruction. If the pre-decoded instruction is found to be a conditional branch instruction, an instruction reading-amount register is set to read the two instructions following the current instruction from the program memory; otherwise only one instruction is read, so as to avoid unnecessary program-memory reads and thereby reduce power consumption.Type: ApplicationFiled: December 18, 2001Publication date: October 3, 2002Inventors: Pao-Lung Chen, Chen-Yi Lee
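The decision rule reduces to a one-line sketch: set the reading amount to 2 only when pre-decode identifies a conditional branch, so both possible next instructions are available, and to 1 otherwise. The opcode names below are hypothetical stand-ins, not the patent's encoding.

```python
def reading_amount(next_instr_opcode, conditional_branch_opcodes):
    """Value for the instruction reading-amount register: read two
    instructions after a conditional branch, one otherwise."""
    return 2 if next_instr_opcode in conditional_branch_opcodes else 1


BRANCH_OPS = {"beq", "bne", "blt"}  # hypothetical opcode names
assert reading_amount("beq", BRANCH_OPS) == 2  # both paths covered
assert reading_amount("add", BRANCH_OPS) == 1  # skip the extra read
```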
-
Publication number: 20020138717Abstract: A processor includes logic for tagging a thread identifier (TID) for usage with processor blocks that are not stalled. Pertinent non-stalling blocks include caches, translation look-aside buffers (TLB), a load buffer asynchronous interface, an external memory management unit (MMU) interface, and others. A processor includes a cache that is segregated into a plurality of N cache parts. Cache segregation avoids interference, “pollution”, or “cross-talk” between threads. One technique for cache segregation utilizes logic for storing and communicating thread identification (TID) bits. The cache utilizes cache indexing logic. For example, the TID bits can be inserted at the most significant bits of the cache index.Type: ApplicationFiled: May 23, 2002Publication date: September 26, 2002Inventors: William N. Joy, Marc Tremblay, Gary Lauterbach, Joseph I. Chamdani
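The cache-indexing technique mentioned at the end of the abstract can be sketched as follows, with assumed sizes: placing the TID bits at the most significant bits of the cache index gives each of the N threads a disjoint slice of the cache sets, so threads cannot evict each other's lines.

```python
INDEX_BITS = 8  # 256 sets total (assumed)
TID_BITS = 2    # 4 threads -> 64 sets per thread (assumed)


def cache_index(addr, tid):
    """Build a segregated cache index: TID bits occupy the most
    significant bits, so each thread indexes a private set slice."""
    low_bits = INDEX_BITS - TID_BITS
    set_within_slice = addr & ((1 << low_bits) - 1)
    return (tid << low_bits) | set_within_slice


# The same address maps to disjoint sets for different threads,
# avoiding inter-thread interference ("cross-talk").
assert cache_index(0x2A, 0) != cache_index(0x2A, 1)
assert cache_index(0x2A, 3) == (3 << 6) | 0x2A
```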
-
Patent number: 6457117Abstract: The processor is configured to predecode instruction bytes prior to their storage within an instruction cache. During the predecoding, relative branch instructions are detected. The displacement included within the relative branch instruction is added to the address corresponding to the relative branch instruction, thereby generating the target address. The processor replaces the displacement field of the relative branch instruction with an encoding of the target address, and stores the modified relative branch instruction in the instruction cache. The branch prediction mechanism may select the target address from the displacement field of the relative branch instruction instead of performing an addition to generate the target address. In one embodiment, relative branch instructions having eight-bit and 32-bit displacement fields are included in the instruction set executed by the processor.Type: GrantFiled: November 7, 2000Date of Patent: September 24, 2002Assignee: Advanced Micro Devices, Inc.Inventor: David B. Witt
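The core computation is a one-time add at predecode, sketched below under stated assumptions: the relative-to-next-instruction convention and the instruction length are illustrative choices, not details from the patent. Once the sum replaces the displacement field, the predictor can read the target directly.

```python
def predecode_relative_branch(branch_addr, displacement, instr_len=2):
    """Compute the target address stored in place of the displacement.
    Assumes the displacement is relative to the next instruction
    (an x86-style convention, chosen here for illustration)."""
    return branch_addr + instr_len + displacement


assert predecode_relative_branch(0x1000, 0x10) == 0x1012   # forward branch
assert predecode_relative_branch(0x1000, -0x20) == 0xFE2   # backward branch
```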
-
Publication number: 20020129227Abstract: A time multiplex changing function for priorities among threads is added to a multi-thread processor, and capability for large-scale out-of-order execution is achieved by confining the flows of data among threads, prescribing the execution order in the flow sequence, and executing a plurality of threads having data dependency either simultaneously or in time multiplex.Type: ApplicationFiled: December 20, 2001Publication date: September 12, 2002Inventor: Fumio Arakawa
-
Patent number: 6430682Abstract: Reliable branch predictions for real-time applications reduce both conditional branch execution time and uncertainties associated with their prediction in a computer implemented application. One method ensures that certain conditional branches are always correctly predicted, effectively converting them to jump instructions during program execution. Another method exploits the fact that some conditional branches always branch in the same direction within a task invocation, although that direction may vary across invocations. These methods improve computer processor utilization and performance.Type: GrantFiled: September 11, 1998Date of Patent: August 6, 2002Assignee: Agere Systems Guardian Corp.Inventor: Harry Dwyer, III
-
Publication number: 20020091913Abstract: Using an entry number (WRB number) of a re-order buffer 6, each of the function units such as an operation unit 3, a store unit 4, a load unit 5, etc. notifies the re-order buffer 6 of the completion of processing for the instruction stored in the corresponding entry. The load unit 5 manages the latest speculation state of each issued load instruction on the basis of a branch prediction success/failure signal output from the branch unit 2, and withholds the WRB-number notification to the re-order buffer 6 for load instructions subsequent to a mispredicted branch instruction, even when processing of the instruction has finished. Accordingly, the re-order buffer 6 can re-use the entries in which instructions subsequent to the mispredicted branch are stored.Type: ApplicationFiled: January 9, 2002Publication date: July 11, 2002Applicant: NEC CORPORATIONInventor: Masao Fukagawa
-
Publication number: 20020087849Abstract: Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in an instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued by the LRQ prior to the return of the sync acknowledgment. Load data returned by the speculative load request is immediately forwarded to the processor's execution units for speculative execution with subsequent instructions. The returned data and results of subsequent operations are held temporarily in the rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative.Type: ApplicationFiled: December 28, 2000Publication date: July 4, 2002Applicant: International Business Machines CorporationInventors: Ravi Kumar Arimilli, John Steven Dodson, Guy Lynn Guthrie, Derek Edward Williams
-
Publication number: 20020078326Abstract: In one embodiment, a programmable processor is adapted to include a speculative count register. The speculative count register may be loaded with data associated with an instruction before the instruction commits. However, if the instruction is terminated before it commits, the speculative count register may be adjusted. A set of counters may monitor the difference between the speculative count register and its architectural counterpart.Type: ApplicationFiled: December 20, 2000Publication date: June 20, 2002Applicant: Intel Corporation and Analog Devices, Inc.Inventors: Charles P. Roth, Ravi P. Singh, Gregory A. Overkamp
-
Publication number: 20020073301Abstract: A method of executing microprocessor instructions and an associated microprocessor are disclosed. Initially, a conditional branch instruction is fetched from a storage unit such as an instruction cache. Branch prediction information embedded in the branch instruction is detected by a fetch unit of the microprocessor. Depending upon the state of the branch prediction information, instructions from the branch-taken path and the branch-not-taken path of the branch instruction are fetched. The branch-not-taken path instructions and the branch-taken path instructions may be speculatively executed. Upon executing the conditional branch instruction, the speculative results from the branch-taken path are discarded if the branch is not taken, and the speculative results from the branch-not-taken path are discarded if the branch is taken. The branch prediction information may include compiler generated information indicative of the context in which the conditional branch instruction is used.Type: ApplicationFiled: December 7, 2000Publication date: June 13, 2002Applicant: International Business Machines CorporationInventors: James Allan Kahle, Charles Roberts Moore
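The commit/discard step at branch resolution reduces to a simple selection, sketched here as an illustration; the function name and result representation are hypothetical.

```python
def commit_speculative_results(taken_results, not_taken_results, branch_taken):
    """At branch resolution, keep the speculative results from the path
    the branch actually took and discard the losing path's results."""
    return taken_results if branch_taken else not_taken_results


# Both paths executed speculatively; resolution keeps exactly one.
assert commit_speculative_results({"r1": 5}, {"r1": 7}, True) == {"r1": 5}
assert commit_speculative_results({"r1": 5}, {"r1": 7}, False) == {"r1": 7}
```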