
Architecture of Computing Systems (Lecture Notes in Computer ...


J. Mische et al.

[Figure: y-axis: % of unused pipeline cycles (0 to 100); x-axis: instruction / data memory access latency (0/0, 1/0, 2/0, 0/1, 1/1, 2/1, 0/2, 1/2, 2/2, 0/3, 1/3, 2/3, 0/1+, 1/1+, 2/1+, 0/2+, 1/2+, 2/2+, 0/3+, 1/3+, 2/3+); legend: branch, memfix, membusy, pipeline, fetch]

Fig. 3. Reason why the HPT cannot be issued, depending on the memory latencies

When executing multiple threads, the HPT reaches exactly 100% of its single-threaded speed; hence a WCET analysis for a single-threaded simplification of our architecture is also valid for the HPT in the multithreaded architecture [3]. The speed of the threads with lower priorities falls exponentially to about 50, 35 and 20 percent of single-threaded performance (measured in Instructions Per Cycle, IPC); see [3] for a more detailed discussion.

We used the Hightec GNU C/C++ compiler for TriCore [18] to compile benchmark programs from the EEMBC AutoBench 1.1 benchmark suite [20] (a2time, canrdr, aifirf, rspeed) and the Mälardalen WCET group [21] (crc, fft1, mm). 1000 task-sets of 8 threads each were randomly assembled from these seven benchmarks and executed for one million cycles each. The given figures are the average values of these 1000 runs.
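The task-set assembly described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the benchmark names come from the text, but the function names, the fixed seed, and sampling with replacement (assumed because 8 threads are drawn from only 7 programs) are not taken from the paper, and `run` is a hypothetical stand-in for the simulator.

```python
import random
from statistics import mean

# Benchmark names are taken from the text; everything else is hypothetical.
BENCHMARKS = ["a2time", "canrdr", "aifirf", "rspeed", "crc", "fft1", "mm"]


def assemble_task_sets(n_sets=1000, threads_per_set=8, seed=0):
    """Draw n_sets task sets of threads_per_set benchmarks each.

    Assumption: sampling is with replacement, since 8 threads are drawn
    from only 7 programs. The seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.choices(BENCHMARKS, k=threads_per_set) for _ in range(n_sets)]


def average_over_runs(task_sets, run):
    """Average a per-run metric; `run` stands in for one simulator run."""
    return mean(run(task_set) for task_set in task_sets)


task_sets = assemble_task_sets()
```

A real experiment would pass a function as `run` that executes one task set for one million cycles and returns the measured value.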

5.1 Reasons for Stalling Threads

The reasons why a thread cannot issue any instructions in a certain cycle can be divided into five classes:

branch. Fixed latency of a branch: 2 cycles.

memfix. Minimum latency of a memory access: 3 cycles.

membusy. Additional stall cycles when a memory instruction cannot be executed because the memory is busy with an operation from another thread.

pipeline. The desired pipeline is already occupied by a higher priority thread (never applies to the HPT).

fetch. The instruction window is empty.
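As a rough illustration of how such a classification turns into a distribution of stall reasons, the sketch below tallies a per-cycle log. The five class names follow the list above; the trace format, the function name, and the choice to normalise by the number of unused cycles are assumptions made for this example, not details from the paper.

```python
from collections import Counter
from enum import Enum


class Stall(Enum):
    BRANCH = "branch"      # fixed latency of a branch
    MEMFIX = "memfix"      # minimum latency of a memory access
    MEMBUSY = "membusy"    # memory busy with another thread's operation
    PIPELINE = "pipeline"  # pipeline occupied by a higher priority thread
    FETCH = "fetch"        # instruction window is empty


def stall_distribution(trace):
    """Return the percentage of unused cycles per stall reason.

    `trace` is a hypothetical per-cycle log for one thread: a Stall
    member for a cycle without issue, or None if the thread issued.
    """
    stalls = Counter(entry for entry in trace if entry is not None)
    unused = sum(stalls.values())
    if unused == 0:
        return {reason: 0.0 for reason in Stall}
    return {reason: 100.0 * stalls[reason] / unused for reason in Stall}


# Made-up trace, for illustration only.
trace = [None, Stall.BRANCH, Stall.BRANCH, None,
         Stall.MEMFIX, Stall.MEMFIX, Stall.MEMFIX, None, Stall.FETCH, None]
dist = stall_distribution(trace)
```

For the HPT, a trace of this kind would never contain `Stall.PIPELINE`, since that class only applies to threads with lower priority.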

Fig. 3 shows the distribution of the reasons for not issuing instructions of the highest priority thread (HPT), depending on the memory latencies. The legend categories give the reason for a delay, and the numbers on the x-axis indicate the memory latencies: the first number is the latency of the instruction memory, the second the latency of the data memory.
