17.11.2012 Views

Soft-Core Processor Design - CiteSeer


[Plot: y-axis "Speedup Over Buffer Size 1", scale 0 to 1.6 in steps of 0.2; first x-axis label "1 bit/loop"]

because this benchmark does not experience any stalls if the prefetch unit always has the next instruction ready. With the unit buffer size, however, the prefetch unit can issue only two reads every three cycles: the FIFO buffer has a single register to hold an instruction arriving from memory, while the other instruction is forwarded to the decode stage immediately. With a FIFO buffer of size two, instructions are fetched continuously, one per cycle, so relative to two fetches per three clock cycles the performance improves by a factor of 3/2. This value explains the speedup of the Pipeline benchmark in Figure 5.2. Further increases in the FIFO buffer size do not affect performance because the benchmark does not include memory operations.
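The 3/2 figure can be checked with a few lines of arithmetic. The sketch below is only an illustration of the ratio described above, not a model of the actual prefetch hardware; the function name `fetch_rate` and the variable names are hypothetical, and the one-fetch-per-cycle rate for larger buffers is an assumption taken from the phrase "fetched continuously".

```python
from fractions import Fraction


def fetch_rate(reads: int, cycles: int) -> Fraction:
    """Return the number of instruction fetches issued per clock cycle."""
    return Fraction(reads, cycles)


# Buffer size 1: two reads every three cycles (from the text).
unit_buffer = fetch_rate(2, 3)

# Buffer size 2 or more: assumed one read every cycle ("fetched continuously").
continuous = fetch_rate(1, 1)

# Speedup of continuous fetching over the unit buffer size.
speedup = continuous / unit_buffer
print(speedup)  # 3/2
```

Using `Fraction` keeps the result exact, so the ratio prints as 3/2 rather than a rounded decimal.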

[Plot legend: Size = 2, Size = 3, Size = 4, Size = 15. X-axis labels: Recursive by nibbles, Non-recursive by nibbles, Non-recursive by bytes, Shift and count, CRC32]

Figure 5.3 Performance of the application benchmarks vs. the FIFO buffer size on the SRAM system

The Stringsearch benchmark suffers a significant performance penalty when a buffer size of 15 is used instead of the unit buffer size. We analyzed the performance of Stringsearch on a system with a FIFO buffer size of 15 using ModelSim and determined that the benchmark contains store instructions that take as long as 16 cycles to commit. These instructions are branch targets in the program. When a branch commits, the prefetch FIFO buffer is flushed, and instructions from the target address are fetched. When the store instruction reaches the execute pipeline stage and issues a write request, the request is not handled before the prefetch unit lowers


[Plot x-axis labels, continued: Dijkstra Large, Game of Life, Patricia, Qsort Large, SHA, Stringsearch]
