Soft-Core Processor Design - CiteSeer
5.4.1. Performance
The metric used in the performance comparison is the wall clock time required to execute a benchmark program. The wall clock time can be expressed as [34]:
T = IC × CPI × C
where IC is the instruction count, CPI is the average number of clock cycles needed to execute an instruction, and C is the cycle time (i.e. the duration of a clock cycle). The instruction count is the same for both processors, because the same binary programs are run on both systems. Therefore, performance is determined by the number of cycles needed to execute the program (the cycle count, IC × CPI) and by the cycle time. The best cycle times and the corresponding maximum clock frequencies (Fmax) of the four systems used in the performance comparison are given in Table 5.3. UT Nios has an almost 50% longer cycle time than the Altera Nios in the SRAM system, and a 40% longer cycle time in the ONCHIP system.
Processor     System   Cycle Time (ns)   Fmax (MHz)
Altera Nios   SRAM     8.47              118
Altera Nios   ONCHIP   8.59              116
UT Nios       SRAM     12.66             79
UT Nios       ONCHIP   12.03             83
Table 5.3 Cycle time and Fmax of the systems used in performance comparison<br />
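As a quick consistency check of Table 5.3, the short Python sketch below recomputes Fmax from each cycle time (Fmax = 1/C) and the relative cycle-time penalty of UT Nios for each memory configuration. The dictionary and function names are illustrative helpers, not code from the thesis.

```python
# Best cycle times (ns) from Table 5.3, keyed by (processor, system).
CYCLE_TIME_NS = {
    ("Altera Nios", "SRAM"): 8.47,
    ("Altera Nios", "ONCHIP"): 8.59,
    ("UT Nios", "SRAM"): 12.66,
    ("UT Nios", "ONCHIP"): 12.03,
}


def fmax_mhz(cycle_time_ns: float) -> float:
    """Maximum clock frequency implied by a cycle time: Fmax = 1 / C."""
    return 1000.0 / cycle_time_ns  # 1 / (x ns) = 1000 / x MHz


def cycle_time_penalty(system: str) -> float:
    """Fractional increase of the UT Nios cycle time over the Altera Nios."""
    ut = CYCLE_TIME_NS[("UT Nios", system)]
    altera = CYCLE_TIME_NS[("Altera Nios", system)]
    return ut / altera - 1.0


if __name__ == "__main__":
    for (cpu, system), c in CYCLE_TIME_NS.items():
        print(f"{cpu:12s} {system:7s} C = {c:5.2f} ns  Fmax = {fmax_mhz(c):6.1f} MHz")
    for system in ("SRAM", "ONCHIP"):
        print(f"{system}: UT Nios cycle time is {cycle_time_penalty(system):.1%} longer")
```

Rounded to whole megahertz, the computed Fmax values match the table, and the penalties come out at roughly 49.5% (SRAM) and 40.0% (ONCHIP), in line with the figures quoted in the text.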
All system configurations were run at a clock speed of 50 MHz. Because every system runs at the same clock, the measured run times can be used directly to compare the cycle counts of the two architectures. To obtain the wall clock run times, each measured time is prorated by the ratio of the 50 MHz test clock to the Fmax of the corresponding system (equivalently, by the ratio of its cycle time to 20 ns). The comparison of the SRAM systems based on the Altera and UT Nios is presented in Figures 5.10 and 5.11.
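The prorating step can be sketched as follows. The cycle count does not change with the clock frequency, so a run that takes t seconds at 50 MHz would take t × 50 / Fmax seconds at the system's Fmax. This is a minimal sketch assuming the Fmax values of Table 5.3 are used as the target clocks; the function names and the 2.0 s measured time are hypothetical, not results from the thesis.

```python
MEASUREMENT_CLOCK_MHZ = 50.0  # all systems were measured at 50 MHz


def cycle_count(measured_time_s: float, clock_mhz: float = MEASUREMENT_CLOCK_MHZ) -> float:
    """Number of clock cycles implied by a run time measured at clock_mhz."""
    return measured_time_s * clock_mhz * 1e6


def prorate_to_fmax(measured_time_s: float, fmax_mhz: float,
                    clock_mhz: float = MEASUREMENT_CLOCK_MHZ) -> float:
    """Wall clock time the same run would take at the system's Fmax.

    The cycle count is independent of the clock frequency, so the run time
    simply scales by clock_mhz / fmax_mhz.
    """
    return measured_time_s * clock_mhz / fmax_mhz


if __name__ == "__main__":
    t_measured = 2.0  # hypothetical run time (s) at 50 MHz
    fmax = 118.0      # Altera Nios SRAM system, Table 5.3
    print(f"cycles: {cycle_count(t_measured):.3e}")
    print(f"wall clock at {fmax:.0f} MHz: {prorate_to_fmax(t_measured, fmax):.3f} s")
```

Scaling the measured time by 50 / Fmax is the same as multiplying the cycle count by the system's cycle time, which is exactly the IC × CPI × C product of the performance equation.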
The graphs in these figures show the improvement of UT Nios over the Altera Nios, given both as a wall clock performance ratio and as a cycle count ratio. The cycle count results for the Loops benchmark show that UT Nios implements branch logic more efficiently: Loops executes almost 56% faster on UT Nios in terms of cycle count. However, because of the longer cycle time, the wall clock advantage of UT Nios over the Altera Nios on the Loops benchmark is only 4%. The lower cycle count comes from the branch logic implementation in UT Nios, where control-flow instructions are committed early in the pipeline (in the operand stage), as described in section 4.1.5. The Altera Nios executes control-flow instructions less efficiently, which is expected given its deeper pipeline.
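Because both processors run the same binaries, the equation T = IC × CPI × C reduces the wall clock ratio to the cycle count ratio multiplied by the inverse cycle time ratio. The sketch below is a hypothetical helper, not code from the thesis. It reproduces the Loops result for the SRAM systems, assuming that "almost 56% faster" means the Altera Nios needs about 1.56 times as many cycles as UT Nios.

```python
def wall_clock_speedup(cycle_count_speedup: float,
                       cycle_time_baseline_ns: float,
                       cycle_time_new_ns: float) -> float:
    """Wall clock speedup of the new processor over the baseline.

    With equal instruction counts, T = cycles * C, so
    T_base / T_new = (cycles_base / cycles_new) * (C_base / C_new).
    """
    return cycle_count_speedup * cycle_time_baseline_ns / cycle_time_new_ns


if __name__ == "__main__":
    # Loops benchmark, SRAM systems. The 1.56 cycle count ratio is an assumed
    # reading of "almost 56% faster"; the cycle times come from Table 5.3.
    s = wall_clock_speedup(1.56, cycle_time_baseline_ns=8.47, cycle_time_new_ns=12.66)
    print(f"wall clock speedup: {s:.3f} ({s - 1:.1%} faster)")
```

This gives a speedup of about 1.044, consistent with the 4% advantage reported for Loops. It also shows that, on the SRAM system, UT Nios needs a cycle count advantage greater than 12.66 / 8.47 ≈ 1.49 just to break even in wall clock time.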