01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


How to Enhance a Superscalar Processor

multithreading capabilities that completely isolate threads from each other, the tight WCETs of superscalar in-order processors can be preserved, while the utilisation and energy efficiency of the processor are increased by concurrent threads.

The contributions of this paper are:

– an in-order SMT processor that isolates the highest priority thread (HPT),

– the execution of the HPT as if the underlying processor were a single-threaded superscalar processor, to keep the WCET analysis tight,

– a detailed description of how the pipeline must be modified to enable SMT, and

– a prototype of an SMT processor with the TriCore instruction set architecture.
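
The isolation property behind the first two contributions can be illustrated with a small sketch. It is not the CarCore issue logic, which this paper describes in section 4. It is a minimal model that assumes issue slots are handed out strictly in priority order. The function name `fill_issue_slots` and its parameters are hypothetical.

```python
from typing import List, Sequence


def fill_issue_slots(ready: Sequence[int], width: int) -> List[int]:
    """Return how many instructions each thread issues in one cycle.

    ready[i] is the number of instructions thread i could issue this cycle.
    Index 0 is the highest priority thread (HPT). Threads are served strictly
    in priority order (an assumption of this sketch), so a lower priority
    thread only receives slots that every higher priority thread left unused.
    """
    if width < 0 or any(n < 0 for n in ready):
        raise ValueError("width and ready counts must be non-negative")
    issued = []
    free = width
    for n in ready:
        take = min(n, free)  # never exceed the slots still free
        issued.append(take)
        free -= take
    return issued


# The HPT issues the same number of instructions whether or not
# other threads are competing for slots.
alone = fill_issue_slots([1, 0], width=2)
shared = fill_issue_slots([1, 5], width=2)
assert alone[0] == shared[0]
```

In this model the HPT's issue count depends only on its own ready instructions and the issue width, which is the property that lets it be analysed as if it ran on a single-threaded processor. Lower priority threads only raise utilisation by using slots the HPT cannot fill.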

The resulting architecture is called CarCore, and we have already published articles on other aspects of it: [3] describes how lower priority memory accesses are delayed to avoid any influence on the HPT and presents a scheduler that executes multiple hard real-time threads by time slicing the HPT. In [4] a soft real-time scheduler with direct IPC control based on the CarCore architecture is introduced, and [5] discusses the integration of scratchpad memories.
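
To make the idea of time slicing the HPT concrete, the following sketch computes which thread holds the HPT role in a given cycle. It assumes fixed-length slices that repeat in round-robin order. The scheduler in [3] is not reproduced here and may work differently. `hpt_at_cycle` and `slice_lengths` are hypothetical names.

```python
from typing import Sequence


def hpt_at_cycle(cycle: int, slice_lengths: Sequence[int]) -> int:
    """Return the index of the thread that is the HPT in the given cycle.

    slice_lengths[i] is the number of consecutive cycles for which thread i
    holds the HPT role. The slices repeat in round-robin order (an assumption
    of this sketch, not a detail taken from the paper).
    """
    if cycle < 0:
        raise ValueError("cycle must be non-negative")
    if not slice_lengths or any(length <= 0 for length in slice_lengths):
        raise ValueError("slice_lengths must be non-empty and positive")
    offset = cycle % sum(slice_lengths)  # position within one full round
    for thread, length in enumerate(slice_lengths):
        if offset < length:
            return thread
        offset -= length
    raise AssertionError("unreachable: offset is always inside one round")


# Two hard real-time threads with slices of 3 and 2 cycles.
schedule = [hpt_at_cycle(c, [3, 2]) for c in range(7)]
assert schedule == [0, 0, 0, 1, 1, 0, 0]
```

Under this assumption each hard real-time thread runs in isolation during its own slice, so its timing during that slice can be analysed as for a single HPT.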

The rest of the paper is organised as follows: the next section presents the related work, and section 3 explains the TriCore architecture and the differences between it and the baseline single-threaded CarCore processor. In section 4 the enhancements to enable SMT are described in detail. Section 5 discusses our evaluation results and section 6 concludes the paper.

2 Related Work

Tullsen [1] defined SMT as multithreading for superscalar pipelines. He did not specify the execution order, but he used an out-of-order processor as the base architecture, and so did most later SMT researchers. Consequently, most of the work on real-time and SMT is also based on out-of-order pipelines [6,7,8,9]. However, the unpredictability of out-of-order pipelines does not allow hard real-time execution, so only soft real-time scheduling is addressed.

Although Hily [10] already showed in 1999 that in-order SMT increases total throughput, while out-of-order execution only boosts a single thread and is less cost-effective, only a few studies focus on designing SMT processors with in-order pipelines [11]. Similar results were published by Moon [12], who found that static partitioning and in-order execution have only a small negative effect on performance while significantly reducing design complexity. Other studies that divide the pipeline into an out-of-order front-end and an in-order back-end [13] or that restrict certain parts of the pipeline to in-order execution [14] confirmed the advantages of in-order execution.

Zang et al. [11] investigated issue mechanisms for in-order SMT processors. Their processor has a 7-stage pipeline and can issue up to 6 instructions from 6 concurrent threads. A well-known commercial processor with an in-order SMT architecture is the Intel Atom [2], which has a two-way in-order pipeline. However, none of the works mentioned addresses hard real-time execution. To our knowledge, our project is the first on hard real-time execution for in-order SMT processors.
