01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...

Architecture of Computing Systems (Lecture Notes in Computer ...

Architecture of Computing Systems (Lecture Notes in Computer ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

�����������<br />

�����<br />

���������<br />

�����<br />

�����������<br />

�������<br />

������������������<br />

������<br />

������<br />

How to Enhance a Superscalar Processor 5<br />

��������<br />

����<br />

��������<br />

����<br />

�������<br />

�����������<br />

Fig. 1. Block diagram <strong>of</strong> the CarCore architecture<br />

����������<br />

������� ����������<br />

No Branch Prediction. The HPT does not benefit <strong>of</strong> a branch prediction,<br />

because it only decreases the average, not the worst case execution time.<br />

For lower priority threads the benefit is likewise low, as between the branch<br />

<strong>in</strong>struction and the first target <strong>in</strong>struction usually HPT <strong>in</strong>structions must be<br />

executed anyway. Omitt<strong>in</strong>g the speculative execution replaces wasted mispredicted<br />

<strong>in</strong>structions by <strong>in</strong>structions <strong>of</strong> lower priority threads that <strong>in</strong>crease<br />

the overall throughput.<br />

No Loop Pipel<strong>in</strong>e. The TriCore loop pipel<strong>in</strong>e that speeds up special loops is<br />

not implemented, as the compiler can use it only <strong>in</strong> a few special cases and<br />

with multithread<strong>in</strong>g the latencies can be used to execute concurrent threads.<br />

No Dedicated Context Save Memory. Accord<strong>in</strong>g to the TriCore <strong>in</strong>struction<br />

set architecture, at a subrout<strong>in</strong>e call, 16 registers have to be saved. To<br />

speed this up, the Tricore has a special memory area with a very wide bus<br />

for context sav<strong>in</strong>g. In our version we use the standard memory bus, therefore<br />

a function call is about ten times slower.<br />

4 SMT Enhancements<br />

To enable multihread<strong>in</strong>g, some parts <strong>of</strong> the processor must be duplicated for<br />

every thread: the register set, the program counter and the <strong>in</strong>struction w<strong>in</strong>dow.<br />

To manage these <strong>in</strong>struction w<strong>in</strong>dows where fetched <strong>in</strong>structions are buffered<br />

and to decide which <strong>in</strong>struction from which thread should be issued to which<br />

pipel<strong>in</strong>e, a further pipel<strong>in</strong>e stage is added, the Real-TimeIssue(RTI)stage.<br />

Special attention demands the fetch stage, which is <strong>in</strong> charge <strong>of</strong> prevent<strong>in</strong>g the<br />

lower priority threads from delay<strong>in</strong>g the hard real-time capable HPT. The same<br />

reason applies to the memory controller that should issue memory accesses <strong>in</strong><br />

the same prioritised order like the RTI issues <strong>in</strong>structions. The other pipel<strong>in</strong>e<br />

stages rema<strong>in</strong> unchanged, besides an additional signal that passes the thread<br />

number through the pipel<strong>in</strong>e, <strong>in</strong> order that the write back stage writes to the<br />

appropriate register file. Fig. 1 shows the result<strong>in</strong>g architecture.<br />

����������������<br />

����������������

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!