Lecture Notes in Computer Science 4917


342 M. Moreto et al.

of ρ quickly adapt to phase changes but tend to forget the behavior of the application. Small performance variations are obtained for different values of ρ ranging from 0 to 1, with a peak for ρ = 0.5. Furthermore, this value is very convenient, as we can use a shifter to update the histograms. Next, a new period of measuring MLP-aware SDHs begins. The key contribution of this paper is the method to obtain MLP-aware SDHs, which we explain in the following subsection.
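A minimal sketch of why ρ = 0.5 is convenient in hardware (function and variable names are illustrative, not from the paper): aging each stack-distance counter by a factor of 0.5 at the end of a measuring period is just a right shift by one bit, after which the new period's counts accumulate on top.

```python
def age_histogram(sdh):
    # rho = 0.5: each counter is halved before the next period's
    # counts are added. A halving is a single right shift, so the
    # aging step needs only a shifter, not a multiplier.
    return [count >> 1 for count in sdh]

sdh = [1024, 512, 96, 8]   # per-stack-distance counters (toy values)
sdh = age_histogram(sdh)   # -> [512, 256, 48, 4]
```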

3.2 MLP-Aware Stack Distance Histogram<br />

As previously stated, MinMisses assumes that all L2 accesses are equally important in terms of performance. However, this is not always true. Cache misses affect the performance of applications differently, even inside the same application. As was said in [9], an isolated L2 data miss has a penalty cost that can be approximated by the average memory latency. In the case of a burst of L2 data misses that fit in the ROB, the penalty cost is shared among the misses, as L2 misses can be served in parallel. L2 instruction misses, in contrast, are serialized, as fetch stops. Thus, L2 instruction misses have a constant miss penalty and MLP.
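The cost model described above can be pictured with a short sketch (the latency value and function names are illustrative assumptions, not figures from the paper): data misses that overlap in the ROB split the memory latency among themselves, while instruction misses always pay it in full.

```python
AVG_MEM_LATENCY = 300  # cycles; illustrative value, not from the paper

def data_miss_cost(overlapping_misses):
    # A burst of L2 data misses that fits in the ROB is served in
    # parallel, so each miss is charged latency / burst size.
    return AVG_MEM_LATENCY / overlapping_misses

def instruction_miss_cost():
    # L2 instruction misses serialize (fetch stops), so each one
    # pays the full latency: constant penalty, MLP of 1.
    return AVG_MEM_LATENCY

data_miss_cost(1)   # isolated data miss: full 300-cycle penalty
data_miss_cost(4)   # burst of 4 parallel misses: 75 cycles each
```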

We want to assign a cost to each L2 access according to its effect on performance. In [13], a similar idea was used to modify the LRU eviction policy for single-core, single-threaded architectures. In our situation, we have a CMP scenario where the shared L2 cache has a number of reserved ways for each core. At the end of a measuring period, we can decide to continue with the same partition or change it. If we decide to modify the partition, a core i that had wi reserved ways will receive w′i ≠ wi. If wi < w′i, the thread receives more ways and some misses in the old configuration will become hits. If wi > w′i, the thread receives fewer ways and some hits in the old configuration will become misses. Thus, we want an estimation of the performance effects when misses are converted into hits and vice versa. Throughout this paper, we will call this impact on performance the MLP cost. All accesses are treated as if they were on the correct path until the branch prediction is checked. Misses on the wrong path are not considered as accesses in flight.
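One way to picture this bookkeeping (a sketch under assumptions, not the paper's exact mechanism; all names are hypothetical): suppose each stack-distance bucket accumulates the MLP cost of the accesses that hit at that depth. The estimated performance effect of moving a core from w ways to w′ ways is then the summed cost of the buckets between the two way counts, negative when misses become hits and positive when hits become misses.

```python
def partition_change_cost(mlp_sdh, w_old, w_new):
    # mlp_sdh[d] holds the accumulated MLP cost of accesses whose
    # stack distance is d (0-indexed ways). Illustrative structure only.
    if w_new >= w_old:
        # Gaining ways: misses with stack distance in [w_old, w_new)
        # become hits, saving their accumulated cost.
        return -sum(mlp_sdh[w_old:w_new])
    # Losing ways: hits with stack distance in [w_new, w_old)
    # become misses, adding their accumulated cost.
    return sum(mlp_sdh[w_new:w_old])

mlp_sdh = [0.0, 40.0, 25.0, 10.0]       # toy per-distance MLP costs
partition_change_cost(mlp_sdh, 2, 4)    # gain 2 ways: -(25 + 10) = -35
partition_change_cost(mlp_sdh, 4, 2)    # lose 2 ways: +35
```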

MLP cost of L2 misses. If we force an L2 configuration that assigns exactly w′i = di ways to thread i with w′i > wi, some of the L2 misses of this thread will

Fig. 2. Miss Status Holding Register: (a) MSHR, (b) MSHR fields
