

Fig. 5. Percentage of cycles that were required to fetch instructions (y-axis: % of cycles spent on fetching, 0-45; fetch policies: none, AHEAD, ENOUGH, both; benchmarks: a2time, canrdr, aifirf, rspeed, crc, fft1, mm, Average)

5.2 Instruction Fetch Optimisation

To decrease the fetch conflicts and hence increase the performance of lower priority threads, we introduced the AHEAD and the ENOUGH fetch policies. They reduce the number of fetches of the HPT without affecting its real-time behaviour. The AHEAD logic occupies 18 ALUTs per thread slot, ENOUGH 44, and both together require 60 ALUTs per slot, an acceptable size compared to the 6000 ALUTs of a whole thread slot.
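Both policies act as gating conditions that decide, cycle by cycle, whether the HPT actually needs the fetch slot; if it does not, the slot can be granted to a lower-priority thread. The following Python sketch illustrates only this general arbitration structure. The class names, the FetchState fields, and the suppression conditions are assumptions made for illustration and do not reproduce the CarCore's actual AHEAD and ENOUGH logic.

# Illustrative sketch only: a fetch arbiter that asks gating policies whether the
# hard real-time thread (HPT) needs the next fetch slot. All names, fields, and
# thresholds are hypothetical, not the CarCore's AHEAD/ENOUGH implementation.
from dataclasses import dataclass

@dataclass
class FetchState:
    buffered_insts: int          # instructions already waiting in the HPT's window
    issue_width: int             # instructions the HPT can issue per cycle
    prefetched_line_valid: bool  # an earlier, ahead-of-time fetch is still unused

class EnoughLikePolicy:
    """Suppress a fetch while the buffer already holds enough instructions."""
    def hpt_needs_fetch(self, s: FetchState) -> bool:
        return s.buffered_insts < s.issue_width

class AheadLikePolicy:
    """Suppress a fetch while a previously prefetched line has not been consumed."""
    def hpt_needs_fetch(self, s: FetchState) -> bool:
        return not s.prefetched_line_valid

def grant_fetch_slot(state: FetchState, policies) -> str:
    """Grant the slot to the HPT only if no active policy suppresses the fetch;
    otherwise a lower-priority thread may use the memory port this cycle."""
    if all(p.hpt_needs_fetch(state) for p in policies):
        return "HPT"
    return "lower-priority thread"

if __name__ == "__main__":
    state = FetchState(buffered_insts=3, issue_width=2, prefetched_line_valid=False)
    print(grant_fetch_slot(state, [EnoughLikePolicy(), AheadLikePolicy()]))
    # -> "lower-priority thread": the buffer already holds enough instructions

Combining the two policies in this style costs only the additional gating logic per thread slot, which matches the small ALUT counts reported above.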

Fig. 5 shows the percentage of fetch cycles when the benchmarks are executed single-threaded on the CarCore. The percentages vary significantly depending on the benchmark. Both optimised policies reduce the number of fetches by about 5 to 10 percent and, as the benchmark average shows, ENOUGH is better than AHEAD on average, though not for every benchmark. The combination of both policies is particularly interesting: their sets of eliminated fetches are nearly disjoint, hence it is not surprising that the numbers of eliminated fetches nearly add up when both are combined. For some benchmarks, however, the saving of the combination is even bigger than the sum of the individual savings. Regrettably, the explanation of this complicated interaction is beyond the scope of this paper.
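Read purely as a set argument, the near-additivity is an instance of inclusion-exclusion: if $A$ and $B$ denote the sets of fetch cycles eliminated by AHEAD and ENOUGH, respectively, then
\[ |A \cup B| = |A| + |B| - |A \cap B| \approx |A| + |B| \quad \text{when } A \cap B \approx \emptyset. \]
A saving larger than $|A| + |B|$ cannot be produced by set overlap alone, which is why the super-additive cases must stem from the interaction of the two policies mentioned above.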

6 Conclusion

We explained how a single-threaded superscalar TriCore-compatible processor can be enhanced to provide SMT while still allowing the execution of one hard real-time thread concurrently with several non-real-time threads in the background. The techniques described can easily be transferred to any other superscalar in-order processor. The latency of the memory is the main reason for stalling threads and thus the biggest problem of the architecture. Our group is currently integrating scratchpad memory into the CarCore to ease this problem; first results are available in [5].

Acknowledgements. The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreement no. 216415 and from the Deutsche Forschungsgemeinschaft (DFG).
