
Architecture of Computing Systems (Lecture Notes in Computer ...


J. Mische et al.

The Real-time Virtual Multiprocessor (RVMP) [15] issues multiple instructions from multiple threads to multiple pipelines, but it assumes identical pipelines and maps threads to pipelines statically. Therefore, multiple hard real-time threads can be executed, but throughput does not increase, because idle pipeline slots cannot be used dynamically by other threads.
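To make the throughput argument concrete, the following sketch counts filled issue slots under two policies in a deliberately simplified model. It is not a model of RVMP itself. It assumes one thread per pipeline, at most one instruction per pipeline per cycle, and a hypothetical input `ready[c][t]` giving how many instructions thread `t` could issue in cycle `c`. The function names are illustrative only.

```python
from typing import Sequence

# Hypothetical input: ready[c][t] = instructions thread t could issue in cycle c.
# Simplifying assumption: thread t is statically bound to pipeline t, and each
# pipeline accepts at most one instruction per cycle.


def static_slots_used(ready: Sequence[Sequence[int]]) -> int:
    """Slots filled when each pipeline may only take work from its own thread."""
    return sum(min(1, r) for cycle in ready for r in cycle)


def dynamic_slots_used(ready: Sequence[Sequence[int]]) -> int:
    """Slots filled when any idle pipeline may take any ready instruction."""
    total = 0
    for cycle in ready:
        n_pipelines = len(cycle)  # one pipeline per thread in this model
        total += min(n_pipelines, sum(cycle))
    return total


if __name__ == "__main__":
    # Thread 1 has nothing ready in the first two cycles, thread 0 in the last one.
    trace = [[1, 0], [2, 0], [0, 3]]
    print(static_slots_used(trace), dynamic_slots_used(trace))  # 3 5
```

In this trace the static mapping leaves three of the six slots empty, while dynamic reuse fills five of them. The gap is exactly the idle-slot loss described above.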

The Precision Timed (PRET) Architecture [16] is another example of a hard real-time capable multithreaded processor, but again the schedule is very static: there are six threads, executed in a fixed order, so every thread receives exactly one sixth of the execution time. If a thread is stalled, its cycle cannot be used by another thread. Because the PRET architecture supports only precisely timed hard real-time threads, no threads with softer timing demands can be executed to increase throughput.
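A small sketch of such a fixed rotation follows. It assumes one issue slot per cycle and six threads, as in the text. The function name and the stall pattern are hypothetical. The point is only that a stalled thread's slot is lost rather than handed to another thread, so no thread ever receives more than one sixth of the cycles.

```python
from typing import List, Set, Tuple


def fixed_round_robin(
    cycles: int, stalled_cycles: Set[int], num_threads: int = 6
) -> Tuple[List[int], int]:
    """Simulate a fixed rotation in which cycle c always belongs to thread c % num_threads.

    stalled_cycles is a hypothetical set of cycles in which the owning thread
    is stalled. Returns (cycles used per thread, cycles wasted).
    """
    used = [0] * num_threads
    wasted = 0
    for c in range(cycles):
        owner = c % num_threads
        if c in stalled_cycles:
            # No other thread may take over the slot, so it is lost.
            wasted += 1
        else:
            used[owner] += 1
    return used, wasted


if __name__ == "__main__":
    # Thread 1 owns cycles 1 and 7; both are stalled here.
    used, wasted = fixed_round_robin(12, {1, 7})
    print(used, wasted)  # [2, 0, 2, 2, 2, 2] 2
```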

3 Baseline

As an example, we use a TriCore-compatible processor to present the SMT enhancements, but they can easily be transferred to other superscalar in-order architectures. TriCore-specific parts are explicitly marked.

3.1 TriCore Architecture

The Infineon TriCore [17] is a microcontroller commonly used in safety-critical applications in the automotive industry. It combines a real-time capable load-store microcontroller architecture with DSP instructions. The instruction set comprises more than 700 instructions. Besides the common arithmetic, logic, branch, and load-store instructions, it provides instructions for sophisticated logic, context saving, load-modify-store, packed arithmetic, saturated math, and multiply-accumulate. The processor consists of a three-way superscalar in-order pipeline with four stages. If an address, an integer, and a loop instruction appear in this order in the instruction stream, they are issued within one cycle, even if they are data-dependent.
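The grouping rule in the last sentence can be sketched as a greedy pass over the instruction stream. This is a simplified model based only on that sentence. Instructions are labelled with hypothetical class names. Data dependences are ignored, since the text states they do not prevent grouping. We additionally assume that any in-order subsequence of the address/integer/loop order may also be issued together, for example an address instruction directly followed by a loop instruction. Real TriCore issue rules may impose further constraints.

```python
from typing import List

# Issue order taken from the text: address, then integer, then loop.
# The class names themselves are hypothetical labels.
ISSUE_ORDER = {"address": 0, "integer": 1, "loop": 2}


def form_issue_groups(stream: List[str]) -> List[List[str]]:
    """Greedily split an in-order stream into groups issued in the same cycle.

    An instruction joins the current group only if its class comes strictly
    later in ISSUE_ORDER than the last instruction of that group; otherwise
    it starts a new group (i.e. a new cycle).
    """
    groups: List[List[str]] = []
    for kind in stream:
        rank = ISSUE_ORDER[kind]
        if groups and ISSUE_ORDER[groups[-1][-1]] < rank:
            groups[-1].append(kind)
        else:
            groups.append([kind])
    return groups


if __name__ == "__main__":
    stream = ["address", "integer", "loop", "integer", "address", "integer"]
    print(form_issue_groups(stream))
    # [['address', 'integer', 'loop'], ['integer'], ['address', 'integer']]
```

Under these assumptions, the six instructions above need three cycles. The same instructions in reverse class order would need one cycle each.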

3.2 Simplifications for Single-Threaded CarCore

As the baseline for the SMT enhancement, we implemented a cycle-accurate SystemC model and a synthesisable VHDL model of a TriCore-compatible processor. It differs from the original Infineon TriCore in the following aspects:

Instruction Subset. Special DSP instructions and addressing modes that are never generated by the Hightec [18] compiler are not supported. This reduces the number of instructions to 433 without affecting execution time, because only pure C code without assembler snippets is used.

Later Address Calculation. CarCore calculates branch target and memory access addresses in the execute stage, one stage later than TriCore. Hence the branch and memory delay slots are increased by one, but the critical path of the very complex and slow decode stage is shortened, resulting in a higher overall clock rate.
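The effect on the branch delay slots can be illustrated with simple stage arithmetic. The sketch assumes hypothetical stage names for a four-stage in-order pipeline (fetch, decode, execute, write-back). It also assumes one fetch per cycle and that a target address computed in a stage can be fetched in the following cycle. Under these assumptions, the number of slots equals the index of the stage that computes the address. Moving the calculation from decode to execute therefore adds exactly one slot. The same reasoning applies to memory access addresses.

```python
# Hypothetical stage names for a four-stage in-order pipeline (assumption).
STAGES = ["fetch", "decode", "execute", "writeback"]


def delay_slots(resolve_stage: str) -> int:
    """Instructions fetched after a branch before its target can be fetched.

    The branch is fetched in cycle 0 and reaches stage i in cycle i. If the
    target is computed in stage i, it can be fetched in cycle i + 1, so the
    fetches in cycles 1..i fill i delay slots.
    """
    return STAGES.index(resolve_stage)


if __name__ == "__main__":
    tricore_like = delay_slots("decode")   # address computed in decode
    carcore_like = delay_slots("execute")  # address computed one stage later
    print(tricore_like, carcore_like, carcore_like - tricore_like)  # 1 2 1
```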
