17.11.2012 Views

Soft-Core Processor Design - CiteSeer

Soft-Core Processor Design - CiteSeer

Soft-Core Processor Design - CiteSeer

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

There are many paths in each group, because the Quartus II Timing Analyzer does not group<br />

the bits corresponding to a bus or a register, but reports each path individually. It is interesting<br />

that the groups of the analyzed paths are scattered throughout the datapath. This suggests that the<br />

delays in the design are well balanced, and that major performance improvements cannot be<br />

achieved by changing a single part of the design. In fact, even if the 200 most critical paths were<br />

removed, the Fmax would only increase to 81.5 MHz (from 79 MHz) for the SRAM system.<br />

Furthermore, if the 10,000 most critical paths were removed, the Fmax would increase to 87.6<br />

MHz. Although the number of actual paths in the design corresponding to the 10,000 paths<br />

reported by the Timing Analyzer is much lower (because the results are expressed in bit<br />

granularity), it is obvious that major design changes have to be made to obtain a significant<br />

performance improvement.<br />

Comparing the UT Nios design to the Altera Nios reveals an interesting trade-off between the<br />

Fmax and the cycle count, the two parameters that define the processor performance. The UT Nios<br />

design process was guided by the assumption that pipeline stalls should be minimized. The initial<br />

design with only two pipeline stages was perfect in this regard, because it had the best CPI<br />

theoretically possible. However, since the performance was not satisfactory, the condition to<br />

minimize the pipeline stalls was relaxed until better performance was achieved. On the other<br />

hand, the Altera Nios implementation is optimized for high Fmax, while sacrificing the cycle<br />

count. Having in mind that both implementations achieve similar performance, it is not obvious<br />

which approach is better. Higher Fmax is better from the marketing perspective. The UT Nios has<br />

approximately 30% lower Fmax than the Altera Nios (Table 5.3), which may result in lower power<br />

consumption. However, since the UT Nios based system uses approximately 20% more logic<br />

(Table 5.4), its power consumption would be only 16% lower than that of the Altera Nios,<br />

assuming that the power consumption grows linearly with the Fmax and the area used.<br />

Another interesting result of the timing analysis of the UT Nios implementation is that none of<br />

the most critical paths includes a shifter ALU unit, even though the shift operations are performed<br />

within a single cycle, while the Altera Nios takes two cycles to perform the shift operation. This<br />

could be because either the Altera Nios implements the shifter unit poorly, or the shift operations<br />

become more critical when the number of pipeline stages increases.<br />

As suggested in the previous chapter, having a smaller general-purpose register file may open<br />

new opportunities for the organization of the pipeline. For example, in traditional pipeline<br />

organizations [34], the operand reading is performed in parallel with instruction decoding. This is<br />

not possible with the current UT Nios implementation, because of the variety of addressing<br />

modes supported by the Nios architecture. Therefore, the instruction has to be decoded first to<br />

81

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!