Systems and methods of dynamic branch prediction in a microprocessor
A hybrid branch prediction scheme for a multi-stage pipelined microprocessor that combines features of static and dynamic branch prediction to reduce complexity and enhance performance over conventional branch prediction techniques. Prior to microprocessor deployment, a branch prediction table is populated using static branch prediction techniques by executing instructions analogous to those to be executed during microprocessor deployment. The branch prediction table is stored and then loaded into the branch prediction unit (BPU) during deployment, for example, when the microprocessor is powered on. Dynamic branch prediction is then performed using the preloaded data, thereby enabling dynamic branch prediction without the usual “warm-up” period. After each branch is resolved in the select stage of the microprocessor instruction pipeline, the BPU is updated with the address of the next instruction that resulted from that branch to enhance performance.
This application claims priority to provisional application No. 60/572,238 filed May 19, 2004, entitled “Microprocessor Architecture” hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
This invention relates generally to microprocessor architecture and more specifically to improved systems and methods for performing branch prediction in a multi-stage pipelined microprocessor.
BACKGROUND OF THE INVENTION
Multistage pipeline microprocessor architecture is known in the art. A typical microprocessor pipeline consists of several stages of instruction-handling hardware, wherein each rising edge of a clock signal advances instructions one stage further in the pipeline. Although the clock speed dictates the number of clock cycles, and therefore pipeline advances, per second, the effective operational speed of the processor depends partly on the rate at which instructions and operands are transferred between memory and the processor.
One method of increasing processor performance is branch prediction. Branch prediction uses instruction history to predict whether a branch or non-sequential instruction will be taken. Branch or non-sequential instructions are processor instructions that require a jump to a non-sequential memory address if a condition is satisfied. When an instruction is fetched, if it is a conditional branch, the result of the conditional branch, that is, the address of the next instruction to be executed following the conditional branch, is speculatively predicted based on past branch history. This predicted, or speculative, result is injected into the pipeline by referencing a branch history table. Whether or not the prediction is correct will not be known until a later stage of the pipeline. However, if the prediction is correct, several clock cycles are saved because the pipeline does not have to go back and fetch the instruction at the non-sequential address.
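The table lookup described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes fixed 4-byte instructions and a Python dictionary standing in for the branch history table, and the names `BranchEntry` and `BranchPredictionTable` are hypothetical. Each entry records the address that followed a branch and whether the branch was taken.

```python
# Minimal sketch of a branch history table consulted at fetch time.
# Assumptions (hypothetical): fixed 4-byte instructions; a dict keyed by
# branch address stands in for the hardware table.
from dataclasses import dataclass

INSTR_SIZE = 4

@dataclass
class BranchEntry:
    next_pc: int   # address that followed this branch when it was last taken
    taken: bool    # outcome of the branch's most recent resolution

class BranchPredictionTable:
    def __init__(self):
        self.entries = {}  # branch address -> BranchEntry

    def predict(self, pc):
        """Fetch-stage lookup: speculative address of the next instruction."""
        entry = self.entries.get(pc)
        if entry is not None and entry.taken:
            return entry.next_pc
        return pc + INSTR_SIZE  # no history, or last seen not taken: fall through

    def record(self, pc, actual_next_pc):
        """Later stage: store the resolved outcome of the branch at pc."""
        taken = actual_next_pc != pc + INSTR_SIZE
        prev = self.entries.get(pc)
        # Keep the last known target if the branch fell through this time.
        target = actual_next_pc if taken else (prev.next_pc if prev else actual_next_pc)
        self.entries[pc] = BranchEntry(target, taken)

bpu = BranchPredictionTable()
bpu.record(0x200, 0x180)          # branch at 0x200 jumped back to 0x180
print(hex(bpu.predict(0x200)))    # 0x180
print(hex(bpu.predict(0x300)))    # 0x304 (no entry, so sequential)
```

A hardware table is typically a small structure indexed by some of the address bits; the dictionary here only makes the address-to-prediction mapping explicit.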
If the prediction is incorrect, the pipeline behind the stage at which the misprediction is detected must be flushed and the correct instruction fetched back into the first stage. This may seem like a severe penalty; however, an incorrect prediction costs only the same number of clock cycles as if no branch prediction were used. Moreover, in applications where small loops are repeated many times, such as those typically implemented with embedded processors, branch prediction has a high enough success rate that the benefits of correct predictions outweigh the cost of occasional incorrect predictions, i.e., pipeline flushes. In these types of embedded applications, branch prediction can be accurate more than ninety percent of the time. Thus, the risk of a pipeline flush caused by an incorrect prediction is outweighed by the benefit of the clock cycles saved.
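The trade-off can be made concrete with a simple cost model. It takes the text's statement that a misprediction costs the same as having no prediction at all, and additionally assumes (hypothetically) that without prediction every branch stalls for a fixed flush penalty F. The expected extra cost per branch with prediction is then (1 - accuracy) * F. The 90% accuracy and 3-cycle penalty below are illustrative figures, not measurements.

```python
# Simple cost model for branch prediction (illustrative assumptions only):
# without prediction every branch is assumed to stall for `flush_penalty`
# cycles; with prediction a correct guess costs nothing extra and a wrong
# guess costs the same `flush_penalty` (the pipeline flush).

def expected_penalty(accuracy, flush_penalty):
    """Average extra cycles per branch when prediction is enabled."""
    return (1.0 - accuracy) * flush_penalty

def cycles_saved_per_branch(accuracy, flush_penalty):
    """Extra cycles avoided per branch compared with no prediction."""
    return flush_penalty - expected_penalty(accuracy, flush_penalty)

# Hypothetical figures: 90% accuracy and a 3-cycle flush penalty.
print(round(expected_penalty(0.9, 3), 2))         # 0.3
print(round(cycles_saved_per_branch(0.9, 3), 2))  # 2.7
```

Under this model prediction is never worse than no prediction, and the savings grow linearly with accuracy, which is why loop-heavy embedded code benefits most.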
There are essentially two techniques for implementing branch prediction. The first, dynamic branch prediction, records run-time program flow behavior to establish a history that can be used at the front of the pipeline to predict future non-sequential program flow. When a branch instruction is fetched, the lookup table is referenced for the address of the next instruction, which is then speculatively injected into the pipeline. Once the lookup table holds a sufficient amount of data, dynamic branch prediction significantly increases performance. However, this technique is ineffective at first, and can even reduce system performance, until enough instructions have been processed to fill the branch history tables. Because of this required “warm-up” period, the run-time behavior of critical code can become unpredictable, making the technique unacceptable for certain embedded applications. Moreover, as noted above, mispredicted branches cause a flush of the pipeline, wasting clock cycles and degrading performance.
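To illustrate the warm-up period, the sketch below replays the same loop twice through an initially empty table. Each entry pairs a branch target with a 2-bit saturating counter, a common textbook dynamic scheme chosen here only for illustration; the text does not specify the predictor's internal state, and the addresses and 4-byte instruction size are assumptions.

```python
# Sketch: a dynamic predictor that learns branch behavior only at run time.
# Assumptions (not from the text): 4-byte instructions; each table entry is
# (target, 2-bit saturating counter), predicting taken when counter >= 2.
INSTR_SIZE = 4

def replay(trace, table):
    """Replay (branch_pc, actual_next_pc) pairs, updating `table` in place.

    Returns the number of mispredictions (each would flush the pipeline).
    """
    mispredicts = 0
    for pc, actual in trace:
        target, counter = table.get(pc, (None, 0))
        # Predict taken only when there is history and the counter agrees.
        if target is not None and counter >= 2:
            predicted = target
        else:
            predicted = pc + INSTR_SIZE
        if predicted != actual:
            mispredicts += 1
        if actual != pc + INSTR_SIZE:
            # Taken: new entries start "weakly taken"; existing ones count up to 3.
            counter = 2 if target is None else min(counter + 1, 3)
            table[pc] = (actual, counter)
        elif target is not None:
            # Not taken: count down toward 0, keeping the known target.
            table[pc] = (target, max(counter - 1, 0))
    return mispredicts

# A loop-closing branch at 0x40: taken back to 0x20 nine times, then exits.
loop = [(0x40, 0x20)] * 9 + [(0x40, 0x44)]
table = {}
print(replay(loop, table))  # 2: cold first iteration plus the loop exit
print(replay(loop, table))  # 1: warm table, only the loop exit
```

The extra misprediction on the first pass is the warm-up cost; with an empty table, a program pays it at least once for each taken branch it encounters.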
The other primary branch prediction technique is static branch prediction. Static branch prediction uses profiling techniques to guide the compiler to generate special branch instructions. These special branch instructions typically include hints that guide the processor to perform speculative branch prediction earlier in the pipeline, before all the information required for branch resolution is available. However, a disadvantage of static branch prediction techniques is that they typically complicate the processor pipeline design, because both speculative and actual branch resolution must be performed in several pipeline stages. A more complicated design translates to a larger silicon footprint and higher cost. Static branch prediction techniques can yield accurate results, but they cannot cope with variations in run-time conditions. Therefore, static branch prediction also suffers from limitations that reduce its appeal for critical embedded applications.
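A static scheme can be sketched as a majority vote over a profiling run. The one-bit likely-taken hint, the function names, and the addresses are assumptions for illustration; actual hint encodings are instruction-set specific. The second call shows the limitation noted above: once compiled, the hint cannot follow a change in run-time behavior.

```python
# Sketch of profile-guided static prediction (illustrative only).
# Assumption: each branch gets a single likely-taken bit, chosen by majority
# vote over a profiling run; real toolchains and encodings differ.
from collections import Counter

def derive_hints(profile):
    """profile: iterable of (branch_pc, taken) observed while profiling."""
    taken, total = Counter(), Counter()
    for pc, was_taken in profile:
        total[pc] += 1
        taken[pc] += int(was_taken)
    # Hint "taken" when the branch went that way at least half the time.
    return {pc: taken[pc] * 2 >= total[pc] for pc in total}

def static_mispredicts(hints, run):
    """Hints are fixed at compile time; unknown branches default to not taken."""
    return sum(1 for pc, was_taken in run if hints.get(pc, False) != was_taken)

profile = [(0x10, True)] * 8 + [(0x10, False)] * 2
hints = derive_hints(profile)
print(hints)                                            # {16: True}
# Run-time behavior differs from the profile; the hint cannot adapt.
print(static_mispredicts(hints, [(0x10, False)] * 5))  # 5
```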
Thus, it would be desirable to have a branch prediction technique that ameliorates and ideally eliminates one or more of the above-noted deficiencies of conventional branch prediction techniques. However, it should be appreciated that the description herein of various advantages and disadvantages associated with known apparatus, methods, and materials is not intended to limit the scope of the invention to their exclusion. Indeed, various embodiments of the invention may include one or more of the known apparatus, methods, and materials without suffering from their disadvantages.
As background to the techniques discussed herein, the following references are incorporated herein by reference: U.S. Pat. No. 6,862,563 issued Mar. 1, 2005 entitled “Method And Apparatus For Managing The Configuration And Functionality Of A Semiconductor Design” (Hakewill et al.); U.S. Ser. No. 10/423,745 filed Apr. 25, 2003, entitled “Apparatus and Method for Managing Integrated Circuit Designs”; and U.S. Ser. No. 10/651,560 filed Aug. 29, 2003, entitled “Improved Computerized Extension Apparatus and Methods”, all assigned to the assignee of the present invention.
SUMMARY OF THE INVENTION
Various embodiments of the invention may ameliorate or overcome one or more of the shortcomings of conventional branch prediction techniques through a hybrid branch prediction technique that takes advantage of features of both static and dynamic branch prediction.
At least one exemplary embodiment of the invention may provide a method of performing branch prediction in a microprocessor having a multi-stage instruction pipeline. The method of performing branch prediction according to this embodiment comprises building a branch prediction history table of branch prediction data through static branch prediction prior to microprocessor deployment, storing the branch prediction data in a memory in the microprocessor, loading the branch prediction data into a branch prediction unit (BPU) of the microprocessor upon powering on, and performing dynamic branch prediction with the BPU based on the preloaded branch prediction data.
At least one additional exemplary embodiment of the invention may provide a method of enhancing branch prediction performance of a multi-stage pipelined microprocessor employing dynamic branch prediction. The method of enhancing branch prediction performance according to this embodiment comprises performing static branch prediction to build a branch prediction history table of branch prediction data prior to microprocessor deployment, storing the branch prediction history table in a memory in the microprocessor, loading the branch prediction history table into a branch prediction unit (BPU) of the microprocessor, and performing dynamic branch prediction with the BPU based on the preloaded branch prediction data.
Yet an additional exemplary embodiment of the invention may provide an embedded microprocessor architecture. The embedded microprocessor architecture according to this embodiment comprises a multi-stage instruction pipeline, and a BPU adapted to perform dynamic branch prediction, wherein the BPU is preloaded with a branch history table created through static branch prediction, and is subsequently updated to contain the actual address of the next instruction that resulted from each branch during dynamic branch prediction.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The following description is intended to convey a thorough understanding of the invention by providing specific embodiments and details involving various aspects of a new and useful microprocessor architecture. It is understood, however, that the invention is not limited to these specific embodiments and details, which are exemplary only. It further is understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
Another typical component of the fetch stage 110 of a multi-stage pipelined microprocessor is the branch prediction unit (BPU) 114. The branch prediction unit 114 increases processing speed by predicting, based upon past instruction processing history, whether a branch to a non-sequential instruction will be taken. The BPU 114 contains a branch look-up or prediction table that stores the addresses of branch instructions and an indication of whether each branch was taken. Thus, when a branch instruction is fetched, the look-up table is referenced to predict the address of the next instruction. As discussed above, whether or not the prediction is correct will not be known until a later stage of the pipeline. In the example shown in
With continued reference to
Referring now to
To alleviate the limitations of both dynamic and static branch prediction techniques, the present invention discloses a hybrid branch prediction technique that combines the benefits of both dynamic and static branch prediction. With continued reference to
After a table of branch prediction data has been developed through static branch prediction, operation of the method continues to step 210, where the branch prediction table is stored in memory. In various exemplary embodiments, this step involves storing the branch prediction table in a non-volatile memory that will be available for future use by the processor. Then, in step 215, when the processor is deployed in the desired embedded application, the static branch prediction data is preloaded into the branch history table in the BPU. In various exemplary embodiments, the branch prediction data is preloaded at power-up of the microprocessor, for example, at power-up of the particular product containing the processor.
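Steps 210 and 215 amount to serializing the table into non-volatile memory and reading it back at power-up. The record layout below (two 32-bit little-endian addresses plus a one-byte taken flag) and the function names are purely hypothetical; the text does not specify a storage format.

```python
# Sketch of steps 210 and 215: pack the statically built table into a byte
# image for non-volatile memory, then unpack it into the BPU at power-up.
# Assumption (hypothetical): 9-byte little-endian records of
# (branch address, next address, taken flag).
import struct

RECORD = struct.Struct("<IIB")

def store_table(table):
    """Step 210: serialize {branch_pc: (next_pc, taken)} to a bytes image."""
    return b"".join(RECORD.pack(pc, nxt, int(taken))
                    for pc, (nxt, taken) in sorted(table.items()))

def preload_table(image):
    """Step 215: rebuild the BPU table from the stored image at power-up."""
    return {pc: (nxt, bool(taken)) for pc, nxt, taken in RECORD.iter_unpack(image)}

static_table = {0x40: (0x20, True), 0x88: (0x8C, False)}
image = store_table(static_table)
print(len(image))                            # 18 (two 9-byte records)
print(preload_table(image) == static_table)  # True
```

A fixed-width record keeps the power-up loader trivial: it walks the image in equal steps without parsing variable-length fields.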
Operation of the method then advances to step 220 where, during ordinary operation, dynamic branch prediction is performed based on the preloaded branch prediction data, without requiring a warm-up period and without the unstable results that such a period can produce. Then, in step 225, after each branch is resolved in the select stage of the multistage processor pipeline, the branch prediction table in the BPU is updated with the result, as necessary, to improve the accuracy of the prediction information. Operation of the method terminates in step 230. It should be appreciated that, in various exemplary embodiments, each time the processor is powered down, the “current” branch prediction table may be stored in non-volatile memory so that each time the processor is powered up, the most recent branch prediction data is loaded into the BPU.
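Steps 220 and 225, together with the optional save at power-down, can be sketched by comparing a cold start against a start from a preloaded table. This is an illustrative model under assumed parameters (4-byte instructions, a last-outcome entry holding the next address and a taken flag, hypothetical addresses), not the circuit described in the patent; `run_session` is a hypothetical name.

```python
# Sketch of steps 220-225 plus the optional power-down save (illustrative).
# Assumptions: 4-byte instructions; plain dicts stand in for the BPU table
# and its non-volatile copy; entries map branch_pc -> (next_pc, taken).
INSTR_SIZE = 4

def run_session(trace, preloaded):
    """Predict, resolve, and update; return (mispredicts, table to save)."""
    table = dict(preloaded)                  # step 215: preload at power-up
    mispredicts = 0
    for pc, actual in trace:
        nxt, taken = table.get(pc, (None, False))
        predicted = nxt if taken else pc + INSTR_SIZE   # step 220
        if predicted != actual:
            mispredicts += 1                 # would flush the pipeline
        # Step 225: once the branch resolves in the select stage, record the
        # address of the instruction that actually followed it.
        was_taken = actual != pc + INSTR_SIZE
        table[pc] = (actual if was_taken else nxt, was_taken)
    return mispredicts, table                # table saved at power-down

loop = [(0x40, 0x20)] * 9 + [(0x40, 0x44)]
cold, _ = run_session(loop, {})
warm, saved = run_session(loop, {0x40: (0x20, True)})  # hypothetical static data
print(cold, warm)   # 2 1
```

The single misprediction left in the preloaded run is the loop exit, which a last-outcome entry cannot anticipate; the warm-up miss on the first iteration is gone.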
Referring now to
While the foregoing description includes many details and specificities, it is to be understood that these have been included for purposes of explanation only. The embodiments of the present invention are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to branch prediction in embedded RISC-type microprocessors, the principles herein are equally applicable to branch prediction in microprocessors in general. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although the embodiments of the present inventions have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
Claims
1. A method of performing branch prediction in a microprocessor having a multistage instruction pipeline, the method comprising:
- building a branch prediction history table of branch prediction data through static branch prediction prior to microprocessor deployment;
- storing the branch prediction data in a memory;
- loading the branch prediction data into a branch prediction unit (BPU) of the microprocessor upon power on; and
- performing dynamic branch prediction with the BPU based on the preloaded branch prediction data.
2. The method according to claim 1, further comprising updating the branch prediction data in the BPU if, during instruction processing, prediction data changes.
3. The method according to claim 2 wherein updating comprises after resolving a branch in a select stage of the instruction pipeline, updating the BPU with the address of a next instruction that resulted from that branch.
4. The method according to claim 1, wherein building a branch prediction history table comprises simulating instructions that will be executed by the processor during deployment and populating a table of branch history with information indicating whether conditional branches were taken or not.
5. The method according to claim 4, wherein building comprises using at least one of a simulator and a compiler to generate branch history.
6. The method according to claim 1, wherein performing dynamic branch prediction with the branch prediction unit based on the preloaded branch prediction data comprises parsing a branch history table in the BPU that indexes non-sequential instructions by their addresses in association with the next instruction taken.
7. The method according to claim 1, wherein the microprocessor is an embedded microprocessor.
8. The method according to claim 1, further comprising after performing dynamic branch prediction, storing branch history data in the branch prediction unit in a non-volatile memory for preload upon subsequent microprocessor use.
9. In a multistage pipeline microprocessor employing dynamic branch prediction, the method of enhancing branch prediction performance comprising:
- performing static branch prediction to build a branch prediction history table of branch prediction data prior to microprocessor deployment;
- storing the branch prediction history table in a memory;
- loading the branch prediction history table into a branch prediction unit (BPU) of the microprocessor; and
- performing dynamic branch prediction with the BPU based on the preloaded branch prediction data.
10. The method according to claim 9, wherein static branch prediction is performed prior to microprocessor deployment.
11. The method according to claim 9, wherein loading the branch prediction table is performed subsequent to microprocessor power on.
12. The method according to claim 9, further comprising updating the branch prediction data in the BPU if, during instruction processing, prediction data changes.
13. The method according to claim 12, wherein the microprocessor includes an instruction pipeline having a select stage, and updating comprises after resolving a branch in the select stage, updating the BPU with the address of the next instruction resulting from that branch.
14. The method according to claim 9, wherein building a branch prediction history table comprises simulating instructions that will be executed by the processor during deployment and populating a table of branch history with information indicating whether conditional branches were taken or not.
15. The method according to claim 14, wherein building comprises using at least one of a simulator and a compiler to generate branch history.
16. The method according to claim 9, wherein performing dynamic branch prediction with the branch prediction unit based on the preloaded branch prediction data comprises parsing a branch history table in the BPU that indexes non-sequential instructions by their addresses in association with the next instruction taken.
17. The method according to claim 9, wherein the microprocessor is an embedded microprocessor.
18. The method according to claim 9, further comprising after performing dynamic branch prediction, storing branch history data in the branch prediction unit in a non-volatile memory for preload upon subsequent microprocessor use.
19. An embedded microprocessor comprising:
- a multistage instruction pipeline; and
- a BPU adapted to perform dynamic branch prediction, wherein the BPU is preloaded with a branch history table created through static branch prediction, and subsequently updated to contain the actual address of the next instruction that resulted from that branch during dynamic branch prediction.
20. The microprocessor according to claim 19, wherein the branch history table contains data generated prior to microprocessor deployment and the BPU is preloaded at power on of the microprocessor.
21. The microprocessor according to claim 19, wherein after resolving a branch in a select stage of the instruction pipeline, the BPU is updated to contain the address of the next instruction that resulted from that branch.
22. The microprocessor according to claim 19, wherein the BPU is preloaded with a branch history table created through static branch prediction during a simulation process that simulates instructions that will be executed by the microprocessor during deployment and wherein the BPU comprises a branch history table that indexes non-sequential instructions by their addresses in association with the next instruction taken.
Type: Application
Filed: May 19, 2005
Publication Date: Dec 15, 2005
Inventors: Aris Aristodemou (London), Rich Fuhler (Santa Cruz, CA), Kar-Lik Wong (Wokinham)
Application Number: 11/132,423