Lecture Notes in Computer Science 4917


Compiler Techniques for Reducing Data Cache Miss Rate

each thread’s “hot” area mapped to a different half of the cache. This is simply an application of page coloring, which is not an unusual OS function.

4.2 Analysis of Data Objects

To facilitate data layout, we consider the address space of an application as partitioned into several objects. An object is loosely defined as a contiguous region in the (virtual) address space that can be relocated with little help from the compiler and/or the runtime system. The compiler typically creates several objects in the code and data segment, the starting location and size of which can be found by scanning the symbol table. A section of memory allocated by a malloc call can be considered a single dynamic object, since it can easily be relocated using an instrumented front-end to malloc. However, since the same invocation of malloc can return different addresses in different runs of an application, we need some extra information to identify the dynamic objects (that is, to associate a profiled object with the same object at runtime). Similar to [1], we use an additional tag (henceforth referred to as HeapTag) to identify the dynamic objects. HeapTag is generated by xor-folding the top four addresses of the return stack and the call-site of malloc. We do not attempt to reorder stack objects. Instead, the stack is treated as a single object.
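
The tag computation described above can be illustrated with a minimal sketch. The text does not give the tag width or the exact folding order, so the 16-bit width and the helper names `xor_fold` and `heap_tag` below are assumptions made only for illustration; they are not the authors' implementation.

```python
from typing import Sequence


def xor_fold(value: int, width: int = 16) -> int:
    """Fold a non-negative integer down to `width` bits by XOR-ing
    successive width-bit slices together. The width is an assumption."""
    mask = (1 << width) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= width
    return folded


def heap_tag(return_addresses: Sequence[int], call_site: int, width: int = 16) -> int:
    """Hypothetical HeapTag: XOR the malloc call site with the top four
    return addresses, then fold the result down to `width` bits."""
    combined = call_site
    for addr in return_addresses[:4]:  # only the top four frames are used
        combined ^= addr
    return xor_fold(combined, width)
```

Because the tag depends only on the call site and the calling context, the same allocation site reached through the same call chain yields the same tag in the profiling run and in later runs, even when malloc returns a different address.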

After the objects have been identified, their reference count and lifetime information over the simulation window can be retrieved by instrumenting the application binary with a tool such as ATOM [22]. Also obtainable are the temporal relationships between the objects, which can be captured using a temporal relationship graph (henceforth referred to as the TRGSelect graph). The TRGSelect graph contains nodes that represent objects (or portions of objects), and each edge between two nodes carries a weight that represents how many times the two objects were interleaved in the actual profiled execution.
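
One plausible way to count interleavings from a reference trace is sketched below. The text does not define the counting rule precisely, so this sketch assumes a bounded recency list: when an object is referenced again, every object referenced since its previous reference is counted as interleaved with it once. The function name `build_trg`, the `window` parameter, and the trace format are all hypothetical.

```python
from collections import defaultdict


def build_trg(trace, window=4):
    """Build edge weights of a temporal relationship graph from a trace
    of object identifiers. `window` (an assumption) bounds how many
    distinct recently referenced objects are remembered."""
    weights = defaultdict(int)
    recent = []  # distinct objects, least recently referenced first
    for obj in trace:
        if obj in recent:
            idx = recent.index(obj)
            # Objects touched since the last reference to `obj`
            # were interleaved with it in the execution.
            for other in recent[idx + 1:]:
                weights[tuple(sorted((obj, other)))] += 1
            recent.pop(idx)
        recent.append(obj)
        if len(recent) > window:
            recent.pop(0)  # forget the least recently referenced object
    return dict(weights)
```

For the trace `["A", "B", "A", "B", "C", "A"]`, this sketch yields a weight of 3 on the edge between A and B and a weight of 1 on the edge between A and C. A heavy edge signals that placing the two objects in overlapping cache regions is likely to cause conflict misses.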

Temporal relationships are collected at a finer granularity than full objects, mainly because some of the objects are much larger than others, and usually only a small portion of a bigger object has temporal association with the smaller one. It is more logical to partition the objects into fixed-size chunks, and then record the temporal relationship between chunks. Though all the chunks belonging to an object are placed sequentially in their original order, having finer-grained temporal information helps us to make more informed decisions when two conflicting objects must be put in an overlapping cache region. We have set the chunk size equal to the block size of the targeted cache. This provides the best performance, as we now track conflicts at the exact same granularity that they occur in the cache.
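
The chunking step can be sketched as a mapping from a referenced address to a (object, chunk index) pair, which then serves as the node identifier in the temporal relationship graph. The 64-byte block size, the `DataObject` record, and the choice to number chunks from the start of the object are assumptions for illustration; the text only states that the chunk size equals the block size of the targeted cache.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64  # assumed cache block size in bytes


@dataclass(frozen=True)
class DataObject:
    """Hypothetical record for a relocatable object."""
    name: str
    start: int  # starting virtual address
    size: int   # size in bytes


def chunk_of(obj: DataObject, address: int, chunk_size: int = BLOCK_SIZE):
    """Return (object name, chunk index) for an address inside `obj`."""
    if not (obj.start <= address < obj.start + obj.size):
        raise ValueError("address lies outside the object")
    return (obj.name, (address - obj.start) // chunk_size)


def chunk_count(obj: DataObject, chunk_size: int = BLOCK_SIZE) -> int:
    """Number of chunks needed to cover the object (ceiling division)."""
    return -(-obj.size // chunk_size)
```

With these assumptions, a 200-byte object starting at `0x1000` is covered by four chunks, and a reference to address `0x1040` falls in chunk 1 of that object.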

4.3 Object and Edge Filtering

Profiling a typical SPEC2000 benchmark, even for a partial profile, involves tens of thousands of objects, and generates hundreds of millions of temporal relationship edges between objects. To make this analysis manageable, we must reduce both the number of nodes (the number of objects) as well as the number
