Method of evaluating system performance
The system performance evaluation method of the present invention checks in each cycle whether a memory access penalty has occurred (S101) and executes a CPU model only when a memory access penalty has not occurred (S202).
1. Field of the Invention
The present invention relates to a technology for evaluating the performance of a system, such as a system LSI, provided with a CPU and a memory.
2. Description of the Related Art
Conventionally, in system development in the field of device integration, it has been very important to verify, by using a system simulator that operates on a computer, whether the desired performance is satisfied before the actual system is integrated.
In recent years, the systems simulated by means of a system simulator have often comprised a CPU (data processing device) and a memory (storage area). Normally, in order to resolve the trade-off between speed and cost, a hierarchical structure comprising a cache system and so forth for caching data has been adopted for the memory, and this hierarchical structure has a huge influence on the performance of the system.
Therefore, as shown in
The conventional typical system simulator 2201 shown in
The system simulator 2201 comprises a CPU model 2204 that simulates the CPU mounted in the device integration system and a memory model 2205 that simulates the memory hierarchy constructed on the device integration system. Further, the system simulator 2201 comprises a scheduling portion 2202 that controls the execution order and so forth of the CPU model 2204 and memory model 2205. In addition, the scheduling portion 2202 contains an execution cycle number counting portion 2203 that counts the cycles in which simulation is performed.
The scheduling portion 2202 executes a model execution processing step S200 by means of the flow shown in
As shown in
Thereafter, the simulation state is evaluated in the termination condition judgment step S204 and, if the simulation state does not conform to the termination condition, the simulation is continued. If the simulation state conforms to the termination condition, the model execution processing step S200 is terminated.
In the case of the conventional method with the configuration above, a system simulation is performed to measure the performance of the system while reflecting the simulation result of the memory hierarchy in the CPU simulation.
However, simply measuring the system performance does not make it possible to specify whether the bottleneck in the system performance lies in the CPU performance or in the memory performance. Therefore, the CPU performance is computed by totaling the memory access penalty (overhead time due to memory access) that occurs during system simulation and subtracting this total from the execution time of the whole system.
However, there has been the possibility of overvaluing the CPU performance found in this manner. This will be described hereinafter by using
Furthermore, the CPU 4100 comprises a variety of pipeline stages delimited by a pipeline register and a register file 4121 that saves the context. The various pipeline stages of the CPU 4100 comprise an IF stage 4101 that fetches instructions, a DC stage 4102 that decodes the fetched instructions, an EX stage 4103 that executes the decoded instructions, a MEM stage 4104 that executes memory access by means of the executed instructions, a WB stage 4105 that changes the register file 4121 by means of the executed instructions, and a DIV1 stage 4111, DIV2 stage 4112, and DIV3 stage 4113 that perform division.
The respective pipeline stages operate in parallel, are capable of processing instructions one by one in each stage, and simultaneously process a plurality of instructions overall.
The execution states of the pipeline stages at a certain point in time are shown in
The four instructions in
DIV R0, R1: the value of register 0 is divided by the value of register 1 and the quotient is stored in register 0;
LD R2, (R3): the memory data of the address stored in the register 3 is read and stored in the register 2;
ADD R4, R0: the sum of the value of register 4 and the value of register 0 is stored in register 4;
MOV R5, R6: the value of register 6 is stored in register 5.
Each of these four instructions is fetched from the instruction memory in the order mentioned above. That is, at time T100, the DIV instruction fetch is started. The execution of the DIV instruction is performed in the DIV1 stage 4111 to DIV3 stage 4113. The ADD instruction that succeeds the DIV instruction executes computation processing by using the computation result of the DIV instruction.
The relationship between the DIV instruction and ADD instruction is known as data dependency. In the illustrated case where data dependence exists, the execution of the subsequent instruction (the ADD instruction in
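The data dependency among the four instructions listed above can be found mechanically by comparing the registers each instruction reads with the destination registers of earlier instructions. The following Python sketch is illustrative only: the tuple encoding of the instructions and the function name `raw_dependencies` are assumptions made for this example and are not part of the described simulator.

```python
# Each instruction: (mnemonic, destination register, source registers read).
# This encoding is a hypothetical one chosen for the sketch.
PROGRAM = [
    ("DIV", "R0", ("R0", "R1")),  # R0 <- R0 / R1
    ("LD",  "R2", ("R3",)),       # R2 <- memory[R3]
    ("ADD", "R4", ("R4", "R0")),  # R4 <- R4 + R0
    ("MOV", "R5", ("R6",)),       # R5 <- R6
]


def raw_dependencies(program):
    """Return (producer index, consumer index, register) for each read-after-write pair."""
    deps = []
    last_writer = {}  # register -> index of the most recent instruction that wrote it
    for index, (_, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], index, reg))
        last_writer[dest] = index
    return deps


deps = raw_dependencies(PROGRAM)
```

For this program the only pair reported is the DIV instruction (index 0) producing R0 and the ADD instruction (index 2) consuming it, which is the dependency that forces the ADD instruction to wait for the division result.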
As detailed above, when execution performance drops because of a pipeline stall caused by data dependence between instructions, a simulation for evaluating CPU performance must reproduce that stall correctly. If the pipeline stall is not correctly reproduced, an overly high performance is measured, namely the nominal CPU performance without consideration of the drop in execution performance. For example, in a state where a memory access penalty as shown in
As per
In the case of the LD instruction in
In the state of
As mentioned earlier, when the instruction execution time of the CPU with the effect of the memory access penalty removed is computed by using conventional methods, the performance of the CPU is overvalued. As a conventional countermeasure to this overvaluation, a system simulation environment from which the overhead pertaining to the memory hierarchy has been removed is additionally prepared, and the genuine CPU performance is measured by using that environment in order to determine the effect of the memory hierarchy on performance.
However, with such a method, the simulation of the system is performed twice and there is the problem that the simulation time is extended. Moreover, there is also the problem of production costs for preparing two different simulation environments and errors resulting from the difference in conditions.
On the other hand, for a few conventional system performance evaluation methods, a method that isolates the CPU simulation and the simulation of the memory hierarchy in order to efficiently simulate the memory hierarchy as detailed in Japanese Koho Application Laid Open No. 2000-276381 (P 16, FIG. 1), for example, has been considered.
With this method, the system performance evaluation is performed efficiently by outputting the memory access log from the simulation results of the CPU and executing the cache simulation by using the outputted memory access log.
However, software in the device integration field has grown complicated in recent years and the number of instructions executed by the system has increased rapidly. Consequently, when the CPU performance of the system is to be evaluated accurately by using the above method, the memory access log of the CPU becomes huge, and there has been the problem that a huge disk capacity is therefore required.
SUMMARY OF THE INVENTION
The present invention solves the conventional problems above and provides a method of evaluating system performance that makes it possible to accurately evaluate CPU performance in a case where the memory access penalty is 0, even with a small disk capacity, while performing a simulation of the memory hierarchy, and to correctly evaluate the performance even for a larger-scale system.
In order to solve this problem, a first invention is a system performance evaluation method for evaluating the performance of a system comprising at least one CPU and memory hierarchy. The method comprises a CPU simulation step of executing a simulation of the CPU, a memory simulation step of executing a simulation of the memory hierarchy, a system simulation step of executing the CPU simulation step and the memory simulation step in parallel, a CPU performance measurement step of measuring the performance of the CPU from which the effect of the memory hierarchy is removed, and a system performance measurement step of measuring a performance deterioration of the system caused by the effect of the memory hierarchy.
As detailed above, because the performance of the CPU unit and the deterioration in the performance due to the memory hierarchy are calculated separately from the calculation of the system performance while performing a simulation of the whole system, there is no need to save a massive memory access log and the performance of the CPU unit and the performance deterioration due to the memory hierarchy can be grasped by means of one system simulation.
Further, the second invention comprises the steps of the first invention, and further comprises a penalty occurrence judgment step of judging whether or not a memory access penalty has occurred in the memory simulation step, and a CPU simulation skip step of skipping the CPU simulation step when a memory access penalty has occurred as a result of the judgment in the penalty occurrence judgment step, wherein in the CPU performance measurement step, the CPU performance is measured based on the number of cycles in which the CPU simulation is finally executed as a result of skipping the CPU simulation step in the CPU simulation skip step.
As detailed above, by performing a system simulation without the effect of the memory access penalty being reflected in the CPU simulation, a state where the memory access penalty is 0 in the CPU simulation can be preserved and the performance of the CPU unit with the effect of the memory access penalty removed can be accurately measured.
Further, the third invention is a method that comprises the steps of the second invention, and further comprises a memory access simulation step that executes a simulation of only the memory access in the CPU simulation step, and a simulation selection step of executing the memory access simulation step when the memory access penalty has occurred as a result of the penalty occurrence judgment step.
As detailed above, by performing a system simulation to satisfy the memory access protocol between the CPU and memory hierarchy, the communication protocol between the CPU of an existing simulator and the memory can be satisfied and the simulation can be applied by means of small changes to the existing simulator.
Further, the fourth invention is a method that comprises the steps of the third invention, and further comprises a simulation mode selection step of specifying whether or not the effect of the memory access penalty is reflected when executing the CPU simulation step.
As detailed above, because the decision whether or not to perform a simulation to substitute the input/output processing that the CPU is to perform on the memory hierarchy can be changed, two applications can be satisfied by means of one simulator when it is verified what kind of effect the memory access penalty has on the operation of the CPU and when the performance of the CPU is to be estimated at the same time as the system performance.
Further, the fifth invention is a system performance evaluation method that comprises, as a system simulation for evaluating the performance of a system comprising at least one CPU and memory hierarchy, a step of executing a calculation of the number of instruction execution cycles on the CPU when the effect of the memory hierarchy is removed. The method further comprises an instruction cache hit rate judgment step of judging, in accordance with a hit rate value of an instruction cache memory, simulation errors in the results of calculation of the number of instruction execution cycles on the CPU when the effect of the memory hierarchy is reflected in the calculation results, with respect to the calculation results when the effect of the memory hierarchy is removed, and an error display step of displaying the simulation error based on the results of the instruction cache hit rate judgment step.
Furthermore, the sixth invention is a system performance evaluation method that comprises, as a system simulation for evaluating the performance of a system that comprises at least one CPU and memory hierarchy, a step of executing a calculation of the number of instruction execution cycles on the CPU when the effect of the memory hierarchy is removed. The method further comprises a memory access penalty judgment step of judging, in accordance with the value of the memory access penalty, simulation errors in the results of the calculation when the effect of the memory hierarchy is reflected in the calculation results, with respect to the calculation results when the effect of the memory hierarchy is removed, and an error display step of displaying the simulation error based on the results of the memory access penalty judgment step.
As detailed above, by calculating an index that indicates whether a permissible error occurs in cases where the effect of the memory access penalty is not reflected in the CPU simulation and cases where the effects are reflected, the person performing the system performance evaluation is able to identify to what extent the simulation can be relied upon and is able to avoid performing an erroneous performance evaluation.
According to the present invention as detailed above, it is possible to execute a simulation that measures the overhead caused by the memory hierarchy at the same time while accurately measuring the performance of the CPU without changing the system performance evaluation environment.
Further, resources for saving the memory access trace log are unnecessary, both the CPU performance and system performance can be easily measured at the same time, and the number of instruction execution cycles when the overhead caused by memory access is 0 can be accurately calculated.
As detailed above, efficient development of the device integration system is rendered possible. Moreover, even with a small disk capacity, the CPU performance when the memory access penalty is 0 can be accurately evaluated while performing a simulation of the memory hierarchy, and the performance can be correctly evaluated even for a larger-scale system.
BRIEF DESCRIPTION OF THE DRAWINGS
The system performance evaluation method according to embodiments of the present invention will be described specifically hereinbelow with reference to the drawings.
First Embodiment
The system performance evaluation method of the first embodiment of the present invention will now be described.
First, the appearance of the system performance evaluation system relating to the system performance evaluation method of the first embodiment will be described.
In the system performance evaluation system 1100 above, when the system is simulated in order to evaluate the performance of the system, a system simulation program that the system simulator 2101 shown in
Thereupon, the user of the system performance evaluation system 1100 supplies an instruction to the computer 1101 by using the input device 1103 and, by simulating the system subject to the evaluation in accordance with the supplied instruction, the computer 1101 displays the performance of the target system on the display device 1102 as a result of the simulation.
The input/output of the system performance evaluation system 1100 relating to the system performance evaluation method of the first embodiment will be described next.
The content displayed on
‘>’ in
As shown in
The row ‘System:’ in
The row ‘Memory:’ in
The row ‘CPU:’ in
The row ‘Instruction cache hit ratio:’ shows the hit rate of the instruction cache.
The row ‘System Performance’ in
In
The program execution cycle number in a case where the memory access penalty is 0 (displayed in row ‘CPU:’) is obtained by subtracting the number of cycles totaled as the memory access penalty (displayed in row ‘Memory:’) from the program execution cycle number of the simulated system (displayed in the row ‘System:’).
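The relationship between the three displayed rows is simple subtraction, which can be written as a small Python function. This is a sketch for illustration: the function name and the sample cycle counts are hypothetical and are not taken from the display described in the text.

```python
def cpu_cycles_without_penalty(system_cycles, memory_penalty_cycles):
    """Cycle count for row 'CPU:' = row 'System:' minus row 'Memory:'."""
    if memory_penalty_cycles > system_cycles:
        raise ValueError("penalty cycles cannot exceed total system cycles")
    return system_cycles - memory_penalty_cycles


# Hypothetical sample values, chosen only to show the arithmetic.
cpu_row = cpu_cycles_without_penalty(system_cycles=1000, memory_penalty_cycles=250)
```

With these made-up inputs, 1000 system cycles of which 250 are memory access penalty leave 750 cycles attributable to the CPU.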
The configuration of the system simulator 2101 in the system performance evaluation method of the first embodiment will be described next.
The system simulator 2101 of the first embodiment comprises a memory access alternate processing portion 2120 that performs the memory access simulation instead of the CPU model 2204.
The scheduling portion 2102 controls the execution order and so forth of the CPU model 2204 and memory model 2205 and comprises the CPU cycle number counting portion 2110 that counts the CPU cycle number from which the effect of the memory access penalty is removed, the memory access penalty detection portion 2111 that detects the memory access penalty that occurs in the memory model 2205, and an instruction cache hit rate judgment portion 2131 that measures the rate at which the instruction cache is hit in the memory access to the memory model 2205.
The model execution processing step S100 in the system performance evaluation method of the first embodiment will be described next.
The scheduling portion 2102 executes the model execution processing step S100 in accordance with the flow shown in
The model execution processing step S100 comprises an instruction cache hit rate judgment processing step S800 that measures the rate at which the instruction cache is hit, a penalty existence judgment step S101 that judges the existence of a memory access penalty, a CPU cycle number increment step S102 that increments the CPU cycle number saved in the CPU cycle number counting portion 2110 shown in
As shown in
The first embodiment is characterized in that the CPU model execution step S202 and memory access alternate processing step S300 are selectively executed in accordance with the result of the penalty existence judgment step S101 and the CPU cycle number increment step S102 is executed only when the CPU model execution step S202 is executed.
The penalty existence judgment step S101 is executed by means of the memory access penalty detection portion 2111 shown in
When it is judged that there is no penalty in the penalty existence judgment step S101, the CPU model execution step S202 is executed, whereupon the CPU cycle number increment step S102 is executed. Meanwhile, when it is judged that there is a penalty in the penalty existence judgment step S101, the memory access alternate processing step S300 is executed without executing the CPU model execution step S202.
According to the control mentioned above, the CPU model execution step S202 performed by the CPU model 2204 shown in
Accordingly, the number of times the CPU model 2204 has been executed, which is counted in the CPU cycle number increment step S102 and held by the CPU cycle number counting portion 2110, can be taken as the instruction execution time when the memory access penalty is 0. As a result, the CPU performance when the effect of the memory access penalty is removed can be accurately measured.
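The per-cycle control of the model execution processing step S100 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the simulator itself: the function name, the representation of the memory model as a set of cycle indices in which a penalty is reported, and the termination condition (the CPU model has run a given number of times) are hypothetical simplifications.

```python
def run_model_execution(cpu_work_cycles, penalty_cycle_indices):
    """Toy version of the S100 loop.

    cpu_work_cycles: how many times the CPU model must run to finish (assumed).
    penalty_cycle_indices: set of system-cycle indices in which the memory
        model is assumed to report a memory access penalty.
    Returns (system_cycles, penalty_cycles, cpu_cycles).
    """
    system_cycles = 0   # all simulated cycles
    penalty_cycles = 0  # cycles in which a penalty was detected
    cpu_cycles = 0      # value held by the CPU cycle number counting portion

    while cpu_cycles < cpu_work_cycles:  # stand-in for the termination judgment
        penalty = system_cycles in penalty_cycle_indices  # penalty judgment (S101)
        if penalty:
            # The CPU model is skipped; the alternate processing (S300)
            # would run here in place of the CPU model.
            penalty_cycles += 1
        else:
            # CPU model execution (S202) followed by the increment (S102).
            cpu_cycles += 1
        system_cycles += 1

    return system_cycles, penalty_cycles, cpu_cycles


result = run_model_execution(cpu_work_cycles=5, penalty_cycle_indices={1, 2})
```

In this toy run the CPU model needs five executions and two cycles carry a penalty, so seven system cycles elapse while the CPU counter reaches exactly five, the figure that corresponds to a memory access penalty of 0.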
Furthermore, another characteristic of the first embodiment of the present invention is that the instruction cache hit rate judgment processing step S400, which displays a message indicating whether the margin with respect to the execution cycle number of the system obtained as the simulation result is small, is executed before the model execution termination processing.
By executing the instruction cache hit rate judgment processing step S400 (described subsequently) and communicating whether the margin of the obtained simulation result is small to the user of the system performance evaluation system 1100, the user is able to judge to what extent the value indicating the performance of the whole system can be relied upon. However, the instruction cache hit rate judgment processing step S400 is skipped when option -s has not been designated as the option of the sim command mentioned earlier.
The problem solved by the memory access alternate processing portion 2120 in the system performance evaluation method of the first embodiment will be described next with reference to
- CLK: system clock
- ST_REQ: store request from CPU model 2204
- ST_DATA: store data from CPU model 2204
- ACK: request acceptance communication from memory model 2205
In the case of the simulation system of the first embodiment, the CPU model 2204 shown in
The CPU model 2204 renders the store request ST_REQ active at time T0 and outputs the store data ST_DATA at time T1, which is the next cycle. On the other hand, the memory model 2205 renders active the request acceptance communication ACK that indicates that the request from the CPU model 2204 was accepted at T1. The above rule is satisfied by performing the above processing.
The CPU model 2204 renders the store request ST_REQ active at time T0 and outputs the store data ST_DATA at time T1, which is the next cycle. On the other hand, the memory model 2205 renders the request acceptance communication ACK, which indicates that the request from the CPU model 2204 has been accepted, active at time T3.
By performing the above processing, the rule is satisfied. Here, the signal waveform in a case where the CPU model 2204 is operated simply only when there is no memory access penalty under the same conditions as
The memory access alternate processing portion 2120 of the first embodiment solves the problem shown above.
According to the method of the first embodiment of the present invention, although the CPU model 2204 is not executed between time T1 and time T3 in
As a result of the memory access alternate processing step S300, store data ST_DATA is outputted to the memory model 2205 at time T1 and the rule can be satisfied.
The memory access alternate processing step S300 in the system performance evaluation method of the first embodiment will be described next.
The memory access alternate processing step S300 comprises a store data judgment step S301 that judges whether there is store data for the memory model 2205 in the CPU model 2204, a store data output step S302 that outputs store data in the CPU model 2204, and a store data erasure step S303 that erases store data in the CPU model 2204.
The memory access alternate processing step S300 judges whether there is store data in the CPU model 2204 in the store data judgment step S301 and, when there is store data, executes the store data output step S302 and store data erasure step S303. When there is no store data, no steps are executed.
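The three steps of the memory access alternate processing step S300 can be sketched in Python as follows. The two stub classes, their attribute names, and the Boolean return value are assumptions introduced for this illustration; the actual interfaces of the CPU model 2204 and memory model 2205 are not specified in the text.

```python
class CpuModelStub:
    """Hypothetical stand-in for the CPU model: holds pending store data only."""

    def __init__(self):
        self.pending_store = None  # store data not yet sent to the memory model


class MemoryModelStub:
    """Hypothetical stand-in for the memory model: records received store data."""

    def __init__(self):
        self.received = []

    def accept_store_data(self, data):
        self.received.append(data)


def memory_access_alternate_processing(cpu, memory):
    """Output and erase pending store data on behalf of the skipped CPU model."""
    if cpu.pending_store is None:                 # store data judgment (S301)
        return False                              # nothing to do
    memory.accept_store_data(cpu.pending_store)   # store data output (S302)
    cpu.pending_store = None                      # store data erasure (S303)
    return True


cpu, memory = CpuModelStub(), MemoryModelStub()
cpu.pending_store = 0xAB
first = memory_access_alternate_processing(cpu, memory)
second = memory_access_alternate_processing(cpu, memory)
```

Erasing the data after output means that a second call in a later penalty cycle does nothing, so the store data reaches the memory model exactly once.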
By adopting such a configuration, a memory access penalty is detected when the CPU model 2204 outputs a store request and, because the CPU model 2204 is no longer executed while a penalty occurs and the memory access alternate processing portion 2120 outputs store data in the CPU model 2204, the signal waveform shown in
The problems resolved by the instruction cache hit rate judgment processing step S400 in the system performance evaluation method of the first embodiment will be described next by using
The instruction access shown in
Access to the memory addresses 0x20 to 0x40 is instruction access that is performed in the background where the CPU executes instructions and constitutes a pre-fetch. Access to the memory address 0x40 in
The state shown in
With the instruction access shown in
As described hereinabove, a difference arises in the system execution time depending on whether or not the CPU model 2204 is executed when a memory access penalty occurs. As a result of this difference, there is the problem that the user of the system performance evaluation system 1100 undervalues the performance of the system. In order to resolve the problem, according to the first aspect of this embodiment, the instruction cache hit rate judgment processing step S400 is executed.
The instruction cache hit rate judgment processing step S400 in the system performance evaluation method of the first embodiment will be described next.
The scheduling portion 2102 shown in
The instruction cache hit rate judgment processing step S400 comprises a cache hit rate comparison step S401 of judging whether or not the hit rate of the instruction cache is equal to or more than a threshold value set beforehand, a minor error message display step S402 of displaying a message to the effect that the margin with respect to the program execution cycle number of the simulated system and the execution time is small, and a major error message display step S403 that displays a message to the effect that the margin with respect to the program execution cycle number of the simulated system and the execution time is large.
The instruction cache hit rate judgment processing step S400 executes the minor error message display step S402 when the hit rate of the instruction cache is equal to or more than a threshold value set beforehand as a result of the cache hit rate comparison step S401 and executes the major error message display step S403 when the hit rate of the instruction cache is less than the threshold value set beforehand.
As a result, a message indicating whether or not the margin with respect to the program execution cycle number of the simulated system and the execution time is small can be displayed.
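The comparison performed in the instruction cache hit rate judgment processing step S400 can be sketched in Python as follows. The function name, the returned labels, and the sample threshold of 0.95 are assumptions for illustration; the text states only that the threshold is set beforehand and does not give the wording of the displayed messages.

```python
def judge_instruction_cache_hit_rate(hit_rate, threshold):
    """Select which message step runs, based on the hit rate comparison (S401)."""
    if hit_rate >= threshold:
        return "minor"  # minor error message display step (S402): margin is small
    return "major"      # major error message display step (S403): margin is large


# Hypothetical threshold and hit rates, used only to exercise both branches.
THRESHOLD = 0.95
high = judge_instruction_cache_hit_rate(0.99, THRESHOLD)
low = judge_instruction_cache_hit_rate(0.80, THRESHOLD)
```

A hit rate exactly equal to the threshold falls on the "minor" side, matching the "equal to or more than" wording of the comparison.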
The reason why the error, that is, the margin, with respect to the program execution cycle number of the simulated system and the execution time is small when the hit rate of the instruction cache is high is explained here.
When the hit rate of the instruction cache is high, almost all instruction access is completed in a state where the memory access penalty is 0. When the memory access penalty of instruction access is 0, the memory access penalty of the instruction pre-fetch that is concealed by the time for executing instructions saved in the CPU instruction buffer is 0 and, therefore, there is no difference in the result depending on whether or not the effect of the memory access penalty is substantially reflected in the CPU simulation.
Furthermore, the memory access penalty when the instruction cache is missed is very large in comparison with the time for executing the instruction saved in the instruction buffer of the CPU. Consequently, when the instruction cache is missed, the memory access penalty of the instruction pre-fetch concealed by the time for executing the instruction that is saved in the instruction buffer of the CPU is relatively small. It may therefore be said that the difference in the result depending on whether the effect of the memory access penalty is substantially reflected in the CPU simulation is relatively small.
The instruction cache hit rate judgment processing step S800 in the system performance evaluation method of the first embodiment will be described next.
The instruction cache hit rate judgment processing step S800 comprises a response judgment step S401 of judging whether or not there is a response to the instruction memory request from the memory model 2205, a hit judgment step S402 of judging whether or not the instruction cache has been hit with respect to the instruction memory request, a hit number increment step S403 of incrementing the instruction cache hit number, and an instruction access number increment step S404 of incrementing the instruction access number.
The instruction cache hit rate judgment processing step S800 executes the instruction access number increment step S404 when there is a response to the instruction memory request in the response judgment step S401 and when the instruction cache is not hit in the hit judgment step S402. Further, the instruction cache hit rate judgment processing step S800 carries out a hit number increment step S403 when there is a response to the instruction memory request in the response judgment step S401 and the instruction cache is hit in the hit judgment step S402.
According to such a method, the instruction access number is incremented whenever there is a response to an instruction memory request and the instruction access number is counted. Further, the instruction cache hit number is incremented whenever the instruction cache is hit and the instruction cache hit number is counted. Accordingly, the quotient rendered by dividing the instruction cache hit number by the instruction access number can be calculated as the instruction cache hit rate.
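The counting described above can be sketched as a small Python class. The class name, the method names, and the choice to return `None` when no access has yet been observed are assumptions made for this example.

```python
class InstructionCacheHitCounter:
    """Counts instruction accesses and hits, as described for step S800."""

    def __init__(self):
        self.hit_number = 0     # instruction cache hit number
        self.access_number = 0  # instruction access number

    def observe(self, response, hit):
        """Process one cycle of memory model output."""
        if not response:        # no response to an instruction memory request
            return
        if hit:                 # instruction cache was hit
            self.hit_number += 1
        self.access_number += 1  # counted for every response, hit or miss

    def hit_rate(self):
        """Hit number divided by access number, or None if nothing was counted."""
        if self.access_number == 0:
            return None
        return self.hit_number / self.access_number


counter = InstructionCacheHitCounter()
for response, hit in [(True, True), (True, True), (True, True),
                      (True, False), (False, False)]:
    counter.observe(response, hit)
rate = counter.hit_rate()
```

In this run three of the four responses are hits and the cycle without a response is ignored, so the computed hit rate is 0.75.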
As detailed hereinabove, with this embodiment, it is possible to specify whether or not the memory access penalty is reflected in the CPU simulation by using the option -s of the sim command during execution of the system simulator 2101.
Further, supposing that the memory access penalty is not reflected in the CPU simulation, because the configuration is such that the CPU model 2204 is not executed when the memory access penalty occurs by means of the model execution processing step S100 that is executed by the scheduling portion 2102, the execution cycle number when the effect of the memory access penalty on the CPU simulation is removed can be accurately measured.
In addition, because the instruction cache hit rate judgment processing step S400 is included, a message with regard to the size of the margin of the system performance in a state where instruction access and instruction execution are executed exclusively can be displayed.
Second Embodiment
The system performance evaluation method of the second embodiment of the present invention will now be described. Whereas in the first embodiment the index for estimating the size of the margin of the system performance of the simulation that is executed at the time of the system performance evaluation is the instruction cache hit rate, the second embodiment describes a case where the index is the occurrence rate of the instruction memory access penalty. The majority of the configuration is the same as that of the first embodiment and, in order to simplify the description, the focus of the description will be on the parts that are different between the first and second embodiments.
First, the appearance of the system performance evaluation system relating to the system performance evaluation method of the second embodiment will be described. Further, the appearance of the system performance evaluation system of the second embodiment is the same as that of the system performance evaluation system 1100 of the first embodiment and, therefore, a description thereof is omitted here.
The inputs and outputs to and from the system performance evaluation system relating to the system performance evaluation method of the second embodiment will be described next by using
The content that is illustrated in
The row ‘Instruction Memory Access Penalty ratio:’ in
Further, the judgment of whether the margin with respect to the program execution cycle number of the simulated system and the execution time is small is made in a penalty occurrence rate judgment processing step S500 (described subsequently).
Further, the display content when there is no option -s in the sim command shown in
The configuration of the system simulator 2301 in the system performance evaluation method of this embodiment will be described next.
In the system simulator 2301 of the second embodiment shown in
A model execution processing step S1000 in the system performance evaluation method of the second embodiment will be described next.
The scheduling portion 2302 executes the model execution processing step S1000 in accordance with the flow shown in
The penalty occurrence rate judgment processing step S500 in the system performance evaluation method of the second embodiment will be described next.
The flow of the penalty occurrence rate judgment processing step S500 is shown in
In the penalty occurrence rate judgment processing step S500 above, when the penalty occurrence rate comparison step S501 finds that the occurrence rate of the instruction memory access penalty is less than the threshold value set beforehand, the minor error message display step S402 is executed, and, when the occurrence rate of the instruction memory access penalty is equal to or more than the threshold value set beforehand, the major error message display step S403 is executed.
As a result, a message that indicates whether or not the margin with respect to the program execution cycle number of the simulated system and the execution time is small can be displayed.
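The judgment flow of step S500 described above can be illustrated by the following minimal sketch. It is not the implementation of the embodiment; the function name `judge_penalty_occurrence_rate`, the returned labels "minor" and "major", and the range check on the input are assumptions introduced only for illustration.

```python
def judge_penalty_occurrence_rate(penalty_rate: float, threshold: float) -> str:
    """Illustrative sketch of step S500: select which error message to display.

    Returns "minor" (corresponding to step S402) or "major" (step S403).
    The labels are hypothetical; the source does not give the message text.
    """
    if not 0.0 <= penalty_rate <= 1.0:
        # Assumption: the rate is a ratio of penalties to accesses.
        raise ValueError("penalty_rate must be between 0 and 1")

    # S501: compare the measured rate with the threshold set beforehand.
    if penalty_rate < threshold:
        # S402: a low penalty occurrence rate means the margin is small.
        return "minor"
    # S403: the rate is equal to or more than the threshold.
    return "major"
```

For example, with a hypothetical threshold value of 0.05, a measured rate of 0.01 selects the minor error message, while a rate equal to the threshold selects the major error message.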
The reason why the margin with respect to the program execution cycle number of the simulated system and the execution time is small when the occurrence rate of the instruction memory access penalty is low is explained here.
When the occurrence rate of the instruction memory access penalty is low, most instruction accesses complete with a memory access penalty of 0. That is, this state is the same as the case where the instruction cache hit rate is high, which was described in the first embodiment.
Accordingly, as in the first embodiment, the difference between the result obtained when the effect of the memory access penalty is reflected in the CPU simulation and the result obtained when it is not, that is, the margin, is relatively small.
The penalty occurrence rate measurement processing step S900 in the system performance evaluation method of the second embodiment will be described next.
The penalty occurrence rate measurement processing step S900 as mentioned above executes the instruction access number increment step S404 when there is a response to the instruction memory request in the response judgment step S401 and when the instruction memory access penalty does not occur in the penalty judgment step S902.
Further, the penalty occurrence rate measurement processing step S900 executes the penalty number increment step S903 and instruction access number increment step S404 when there is a response to the instruction memory request in the response judgment step S401 and when the instruction memory access penalty occurs in the penalty judgment step S902.
As mentioned above, the instruction access number is incremented whenever there is a response to the instruction memory request, so that the instruction access number can be counted. Further, the penalty occurrence number is incremented whenever the instruction memory access penalty occurs, so that the occurrence number of the instruction memory access penalty can be counted. Accordingly, the quotient obtained by dividing the occurrence number of the instruction memory access penalty by the instruction access number can be accurately calculated as the penalty occurrence rate.
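The counting performed in step S900 and the resulting quotient can be illustrated by the following minimal sketch. The class name `PenaltyRateCounter`, its method names, and the choice to return 0.0 when no instruction access has been counted are assumptions introduced only for illustration; the source does not specify the behavior in that case.

```python
from dataclasses import dataclass


@dataclass
class PenaltyRateCounter:
    """Illustrative sketch of the counters updated in step S900."""

    instruction_accesses: int = 0  # incremented in step S404
    penalties: int = 0             # incremented in step S903

    def on_cycle(self, has_response: bool, penalty_occurred: bool) -> None:
        # S401: do nothing unless there is a response to the
        # instruction memory request.
        if not has_response:
            return
        # S902 / S903: count the penalty when one occurred.
        if penalty_occurred:
            self.penalties += 1
        # S404: every response counts as one instruction access.
        self.instruction_accesses += 1

    def penalty_rate(self) -> float:
        # Penalty occurrence rate = penalty count / instruction access count.
        if self.instruction_accesses == 0:
            return 0.0  # assumption: no accesses yet, report a rate of 0
        return self.penalties / self.instruction_accesses
```

In this sketch, four responses of which one incurs a penalty give a penalty occurrence rate of 1/4, and cycles without a response leave both counters unchanged.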
As mentioned above, according to the embodiment, the index for estimating the size of the margin of the system performance of the first embodiment can be changed from the instruction cache hit rate to the occurrence rate of the instruction memory access penalty, whereby the same effects as those of the first embodiment can be obtained even when the instruction cache hit rate is not measured.
Claims
1. A system performance evaluation method for evaluating a performance of a system comprising at least one CPU and memory hierarchy, comprising:
- a CPU simulation step of executing a simulation of the CPU;
- a memory simulation step of executing a simulation of the memory hierarchy;
- a system simulation step of executing the CPU simulation step and the memory simulation step in parallel;
- a CPU performance measurement step of measuring the performance of the CPU from which an effect of the memory hierarchy is removed; and
- a system performance measurement step of measuring a performance deterioration of the system caused by the effect of the memory hierarchy.
2. The system performance evaluation method according to claim 1, further comprising:
- a penalty occurrence judgment step of judging whether or not a memory access penalty has occurred in the memory simulation step; and
- a CPU simulation skip step of skipping the CPU simulation step when it is known that a memory access penalty has occurred as a result of the judgment in the penalty occurrence judgment step, wherein
- in the CPU performance measurement step, the CPU performance is measured based on the number of cycles in which the CPU simulation is finally executed as a result of skipping the CPU simulation step in the CPU simulation skip step.
3. The system performance evaluation method according to claim 2, further comprising:
- a memory access simulation step of executing a simulation of only a memory access in the CPU simulation step; and
- a simulation selection step of executing the memory access simulation step when it is known that a memory access penalty has not occurred as a result of the penalty occurrence judgment step.
4. The system performance evaluation method according to claim 3, further comprising:
- a simulation mode selection step of specifying whether or not an effect of the memory access penalty is reflected when executing the CPU simulation step.
5. A system performance evaluation method comprising, as a system simulation for evaluating a performance of a system comprising at least one CPU and memory hierarchy, a step of executing a calculation in number of instruction execution cycles on the CPU when an effect of the memory hierarchy is removed, said method comprising:
- an instruction cache hit rate judgment step of judging, in accordance with a hit rate value of an instruction cache memory, simulation errors in results of calculation in number of instruction execution cycles on the CPU when an effect of the memory hierarchy is reflected in the calculation results, with respect to the calculation results when the effect of the memory hierarchy is removed; and
- an error display step of displaying the simulation error based on the results of the instruction cache hit rate judgment step.
6. A system performance evaluation method comprising, as a system simulation for evaluating a performance of a system comprising at least one CPU and memory hierarchy, a step of executing a calculation in number of instruction execution cycles on the CPU when an effect of the memory hierarchy is removed, said method comprising:
- a memory access penalty judgment step of judging, in accordance with a memory access penalty value, simulation errors in results of calculation in number of instruction execution cycles on the CPU when an effect of the memory hierarchy is reflected in the calculation results, with respect to the calculation results when the effect of the memory hierarchy is removed; and
- an error display step of displaying the simulation error based on the results of the memory access penalty judgment step.
Type: Application
Filed: Dec 15, 2005
Publication Date: Jun 22, 2006
Applicant: Matsushita Electric Industrial Co., Ltd. (Kadoma-shi)
Inventor: Kohsaku Shibata (Takatsuki-shi)
Application Number: 11/300,325
International Classification: G06F 9/45 (20060101);