
Soft-Core Processor Design - CiteSeer


5.4.1. Performance

The metric used in the performance comparison is the wall clock time required to execute a benchmark program. The wall clock time can be expressed as [34]:

T = IC × CPI × C

where IC is the instruction count, CPI is the average number of clock cycles needed to execute an instruction, and C is the cycle time (i.e. the duration of a clock cycle). The instruction count is the same for both processors, because the same binary programs are run on both systems. Therefore, the performance is determined by the number of cycles needed to execute the program (the cycle count, IC × CPI) and the cycle time. The best cycle times and the corresponding best Fmax for the four systems used in the performance comparison are given in Table 5.3. The UT Nios has an almost 50% longer cycle time than the Altera Nios for the SRAM system, and a 40% longer cycle time for the ONCHIP system.

Processor     System   Cycle Time (ns)   Fmax (MHz)
Altera Nios   SRAM     8.47              118
Altera Nios   ONCHIP   8.59              116
UT Nios       SRAM     12.66             79
UT Nios       ONCHIP   12.03             83

Table 5.3 Cycle time and Fmax of the systems used in performance comparison
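The relationship between the two columns of Table 5.3, and the percentages quoted in the text, can be checked with a short calculation. The following is a minimal sketch, not part of the original work: the dictionary layout and the function names `fmax_mhz` and `cycle_time_increase_pct` are illustrative assumptions, and only the cycle times themselves are taken from the table.

```python
# Cycle times in ns, copied from Table 5.3.
# The dictionary layout and function names are illustrative assumptions.
CYCLE_TIME_NS = {
    ("Altera Nios", "SRAM"): 8.47,
    ("Altera Nios", "ONCHIP"): 8.59,
    ("UT Nios", "SRAM"): 12.66,
    ("UT Nios", "ONCHIP"): 12.03,
}


def fmax_mhz(cycle_time_ns):
    """Maximum clock frequency in MHz for a given cycle time in ns."""
    return 1000.0 / cycle_time_ns


def cycle_time_increase_pct(system):
    """How much longer (in percent) the UT Nios cycle time is than the
    Altera Nios cycle time for the given system (SRAM or ONCHIP)."""
    altera = CYCLE_TIME_NS[("Altera Nios", system)]
    ut = CYCLE_TIME_NS[("UT Nios", system)]
    return (ut / altera - 1.0) * 100.0


for (processor, system), cycle_ns in CYCLE_TIME_NS.items():
    print(f"{processor:12s} {system:7s} Fmax = {fmax_mhz(cycle_ns):.1f} MHz")

for system in ("SRAM", "ONCHIP"):
    print(f"{system}: UT Nios cycle time is "
          f"{cycle_time_increase_pct(system):.1f}% longer")
```

Rounding the computed frequencies to whole MHz reproduces the Fmax column, and the two percentages correspond to the "almost 50%" and "40%" figures given for the SRAM and ONCHIP systems.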

All system configurations were run at a clock speed of 50 MHz. Thus, the run times obtained can be used directly to compare the cycle counts of the two architectures. To obtain the wall clock run times, the measured times are prorated by a factor that accounts for each system's best cycle time (Table 5.3). The comparison of the SRAM systems based on the Altera and UT Nios is presented in Figures 5.10 and 5.11. The graphs in the figures show the improvement of the UT Nios over the Altera Nios. Both the wall clock performance ratio and the cycle count ratio are given. Analysis of the cycle count advantage of the UT Nios for the Loops benchmark shows that the UT Nios implements branch logic better, since the Loops benchmark executes almost 56% faster in terms of the cycle count. However, because of the longer cycle time, the advantage of the UT Nios over the Altera Nios in terms of the wall clock time for the Loops benchmark is only 4%. The lower cycle count comes from the branch logic implementation in the UT Nios, where the control-flow instructions are committed early in the pipeline (in the operand stage), as described in section 4.1.5. The Altera Nios executes the control-flow instructions less efficiently, which is expected because of its deeper pipeline.
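The step from a 56% cycle count advantage to a 4% wall clock advantage follows from T = IC × CPI × C. The sketch below works through it under two assumptions that the text does not state explicitly: that "56% faster" means an Altera-to-UT cycle count ratio of 1.56, and that prorating means scaling a time measured at 50 MHz (20 ns cycle) by the ratio of the system's best cycle time to 20 ns. The function names `prorate` and `wall_clock_ratio` are hypothetical.

```python
# Best cycle times in ns for the SRAM systems, from Table 5.3.
ALTERA_SRAM_CYCLE_NS = 8.47
UT_SRAM_CYCLE_NS = 12.66


def prorate(measured_time, best_cycle_ns, measured_clock_mhz=50.0):
    """Scale a run time measured at measured_clock_mhz to the run time
    expected at the system's best cycle time (assumed interpretation of
    'prorated with the appropriate factor')."""
    measured_cycle_ns = 1000.0 / measured_clock_mhz  # 20 ns at 50 MHz
    return measured_time * best_cycle_ns / measured_cycle_ns


def wall_clock_ratio(cycle_count_ratio, altera_cycle_ns, ut_cycle_ns):
    """Wall clock performance ratio (Altera time / UT time) given the
    cycle count ratio (Altera cycles / UT cycles) and both cycle times."""
    return cycle_count_ratio * altera_cycle_ns / ut_cycle_ns


# Assumed reading of "almost 56% faster in terms of the cycle count".
LOOPS_CYCLE_COUNT_RATIO = 1.56

loops_ratio = wall_clock_ratio(
    LOOPS_CYCLE_COUNT_RATIO, ALTERA_SRAM_CYCLE_NS, UT_SRAM_CYCLE_NS
)
print(f"Loops wall clock ratio: {loops_ratio:.3f}")
```

Under these assumptions the wall clock ratio comes out slightly above 1.04, in line with the 4% advantage reported for the Loops benchmark: the UT Nios needs fewer cycles, but each cycle is about 1.5 times longer.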
