Soft-Core Processor Design - CiteSeer
Soft-Core Processor Design - CiteSeer
Soft-Core Processor Design - CiteSeer
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
There are many paths in each group, because the Quartus II Timing Analyzer does not group<br />
the bits corresponding to a bus or a register, but reports each path individually. It is interesting<br />
that the groups of the analyzed paths are scattered throughout the datapath. This suggests that the<br />
delays in the design are well balanced, and that major performance improvements cannot be<br />
achieved by changing a single part of the design. In fact, even if the 200 most critical paths were<br />
removed, the Fmax would only increase to 81.5 MHz (from 79 MHz) for the SRAM system.<br />
Furthermore, if the 10,000 most critical paths were removed, the Fmax would increase to 87.6<br />
MHz. Although the number of actual paths in the design corresponding to the 10,000 paths<br />
reported by the Timing Analyzer is much lower (because the results are expressed in bit<br />
granularity), it is obvious that major design changes have to be made to obtain a significant<br />
performance improvement.<br />
Comparing the UT Nios design to the Altera Nios reveals an interesting trade-off between the<br />
Fmax and the cycle count, the two parameters that define the processor performance. The UT Nios<br />
design process was guided by the assumption that pipeline stalls should be minimized. The initial<br />
design with only two pipeline stages was perfect in this regard, because it had the best CPI<br />
theoretically possible. However, since the performance was not satisfactory, the condition to<br />
minimize the pipeline stalls was relaxed until better performance was achieved. On the other<br />
hand, the Altera Nios implementation is optimized for high Fmax, while sacrificing the cycle<br />
count. Having in mind that both implementations achieve similar performance, it is not obvious<br />
which approach is better. Higher Fmax is better from the marketing perspective. The UT Nios has<br />
approximately 30% lower Fmax than the Altera Nios (Table 5.3), which may result in lower power<br />
consumption. However, since the UT Nios based system uses approximately 20% more logic<br />
(Table 5.4), its power consumption would be only 16% lower than that of the Altera Nios,<br />
assuming that the power consumption grows linearly with the Fmax and the area used.<br />
Another interesting result of the timing analysis of the UT Nios implementation is that none of<br />
the most critical paths includes a shifter ALU unit, even though the shift operations are performed<br />
within a single cycle, while the Altera Nios takes two cycles to perform the shift operation. This<br />
could be because either the Altera Nios implements the shifter unit poorly, or the shift operations<br />
become more critical when the number of pipeline stages increases.<br />
As suggested in the previous chapter, having a smaller general-purpose register file may open<br />
new opportunities for the organization of the pipeline. For example, in traditional pipeline<br />
organizations [34], the operand reading is performed in parallel with instruction decoding. This is<br />
not possible with the current UT Nios implementation, because of the variety of addressing<br />
modes supported by the Nios architecture. Therefore, the instruction has to be decoded first to<br />
81