Soft-Core Processor Design - CiteSeer
Soft-Core Processor Design - CiteSeer
Soft-Core Processor Design - CiteSeer
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Speedup Over Buffer Size 1<br />
1.6<br />
1.4<br />
1.2<br />
1<br />
0.8<br />
0.6<br />
0.4<br />
0.2<br />
0<br />
1 bit/loop<br />
because this benchmark does not experience any stalls if the prefetch unit always has the next<br />
instruction ready. However, with the unit buffer size, the prefetch unit can only issue two reads in<br />
three cycles, because the FIFO buffer has only one register to store the instruction coming from<br />
the memory, while the other one is forwarded to the decode stage immediately. Two instruction<br />
fetches per three clock cycles correspond to the performance improvement of 3/2 when<br />
instructions are fetched continuously, which is the case when the FIFO buffer of size two is used.<br />
This value explains the speedup of the Pipeline benchmark in Figure 5.2. Further increases in the<br />
FIFO buffer size do not influence the performance because the benchmark does not include<br />
memory operations.<br />
Size = 2 Size = 3 Size = 4 Size = 15<br />
Recursive by<br />
nibbles<br />
Non-recursive by<br />
nibbles<br />
Non-recursive by<br />
bytes<br />
Shift and count<br />
CRC32<br />
Figure 5.3 Performance of the application benchmarks vs. the FIFO buffer size<br />
on the SRAM system<br />
The Stringsearch benchmark suffers significant performance penalty when a buffer size of 15<br />
is used over the unit buffer size. The performance of Stringsearch on a system with the FIFO<br />
buffer size 15 was analyzed using ModelSim. We determined that the Stringsearch benchmark<br />
contains store instructions that take as long as 16 cycles to commit. These instructions are targets<br />
of the branches in the program. When a branch commits, the prefetch FIFO buffer is flushed, and<br />
instructions from the target address are fetched. When the store instruction reaches the execute<br />
pipeline stage and issues a write request, the request is not handled before the prefetch unit lowers<br />
55<br />
Dijkstra Large<br />
Game of Life<br />
Patricia<br />
Qsort Large<br />
SHA<br />
Stringsearch