21.01.2013 Views

Lecture Notes in Computer Science 4917

Lecture Notes in Computer Science 4917

Lecture Notes in Computer Science 4917

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

354 S. Sarkar and D.M. Tullsen<br />

Fraction of all misses<br />

100<br />

80<br />

60<br />

40<br />

20<br />

0<br />

ammp<br />

applu<br />

applu apsi<br />

apsi art<br />

art bzip2<br />

bzip2 crafty<br />

<strong>in</strong>ter-thread conflict<br />

T1 <strong>in</strong>tra-thread conflict<br />

T0 <strong>in</strong>tra-thread conflict<br />

crafty eon facerec fma3d galgel gzip mesa mgrid perl sixtrack swim twolf vortex vpr wupwise<br />

eon facerec fma3d galgel gzip mesa mgrid perl sixtrack swim twolf vortex vpr wupwiseammp<br />

Fig. 1. Percentage of data cache misses that are due to conflict. The cache is 32 KB<br />

direct-mapped, shared by two contexts <strong>in</strong> an SMT processor.<br />

been shown to be a much more energy efficient approach to accelerate processor<br />

performance than other traditional performance optimizations [4,5], and thus<br />

is a strong candidate for <strong>in</strong>clusion <strong>in</strong> high performance embedded architectures.<br />

In a simultaneous multithread<strong>in</strong>g processor with shared caches, however, objects<br />

from different threads compete for the same cache l<strong>in</strong>es – result<strong>in</strong>g <strong>in</strong> potentially<br />

expensive <strong>in</strong>ter-thread conflict misses. These conflicts cannot be analyzed <strong>in</strong><br />

the same manner that was applied successfully by prior work on <strong>in</strong>tra-thread<br />

conflicts. This is because <strong>in</strong>ter-thread conflicts are not determ<strong>in</strong>istic.<br />

Figure 1, which gives the percentage of conflict misses for various pairs of<br />

co-scheduled threads, shows two important trends. First, <strong>in</strong>ter-thread conflict<br />

misses are just as prevalent as <strong>in</strong>tra-thread conflicts (26% vs. 21% of all misses).<br />

Second, the <strong>in</strong>fusion of these new conflict misses significantly <strong>in</strong>creases the overall<br />

importance of conflict misses, relative to other types of misses.<br />

This phenomenon extends beyond multithreaded processors. Multi-core architectures<br />

may share on-chip L2 caches, or possibly even L1 caches [6,7]. However,<br />

<strong>in</strong> this work we focus <strong>in</strong> particular on multithreaded architectures, because they<br />

<strong>in</strong>teract and share caches at the lowest level.<br />

In this paper, we develop new techniques that allow the ideas of CCDP to<br />

be extended to multithreaded architectures, and be effective. We consider the<br />

follow<strong>in</strong>g compilation scenarios:<br />

(1) First we solve the most general case, where we cannot assume we know<br />

which applications will be co-scheduled. This may occur, even <strong>in</strong> an embedded<br />

processor, if we have a set of applications that can run <strong>in</strong> various comb<strong>in</strong>ations.<br />

(2) In more specialized embedded applications, we will be able to more precisely<br />

exploit specific knowledge about the applications and how they will be run.<br />

We may have aprioriknowledge about application sets to be co-scheduled <strong>in</strong> the<br />

multithreaded processor. In these situations, it should be feasible to co-compile,<br />

or at least cooperatively compile, these concurrently runn<strong>in</strong>g applications.<br />

This paper makes the follow<strong>in</strong>g contributions: (1) We show that traditional<br />

multithread<strong>in</strong>g-oblivious cache-conscious data placement is not effective <strong>in</strong> a<br />

multithread<strong>in</strong>g architecture. In some cases, it does more harm than good. (2)<br />

We propose two extensions to CCDP that can identify and elim<strong>in</strong>ate most of

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!