
FIGURE 8.17 Example dynamic sequence stored in (a) an instruction cache and (b) a trace cache.

and their results [9], which would otherwise be thrown away due to branch mispredictions, but control independence mechanisms have numerous difficult implementation issues [22]. Moreover, a large instruction window does nothing to reduce the execution time of long data dependence chains, which ultimately limit performance if branch mispredictions do not. Value prediction and other forms of data speculation break data dependence chains [10], but difficult implementation issues must be resolved, such as providing high-bandwidth value prediction and high-performance recovery mechanisms. The hierarchical organization of trace processors can be leveraged to overcome implementation barriers to data speculation and control independence. The interested reader may learn more about trace processor control independence mechanisms and data speculation from other sources [21,24,25].

Trace Cache and Trace Predictor: Efficient High-Bandwidth Instruction Fetching

Conventional instruction caches are unable to meet future fetch bandwidth requirements because of taken branches in the dynamic instruction stream. A taken branch instruction and its target instruction reside in different cache lines, or in the same cache line with unwanted instructions in between, as shown in Fig. 8.17(a), which depicts a long dynamic sequence of instructions made up of four fetch blocks separated by taken branches. Ideally, to keep a 16-issue machine well supplied, the entire sequence needs to be fetched in a single cycle. But because the fetch blocks are noncontiguous, it takes at least four cycles to fetch and assemble the desired sequence.
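
To make the cycle count concrete, the following sketch (a hypothetical model, not taken from the referenced work) splits a dynamic instruction sequence into fetch blocks at taken-branch boundaries; since a conventional instruction cache supplies at most one contiguous block per cycle, the number of blocks is a lower bound on the fetch cycles.

    # Minimal model: the dynamic stream is a list of (pc, is_taken_branch) pairs.
    # A conventional instruction cache supplies at most one contiguous fetch
    # block per cycle, and a taken branch ends the current block.
    def fetch_blocks(dynamic_stream):
        """Split a dynamic sequence into fetch blocks at taken-branch boundaries."""
        blocks, current = [], []
        for pc, is_taken_branch in dynamic_stream:
            current.append(pc)
            if is_taken_branch:         # the next instruction is noncontiguous,
                blocks.append(current)  # so a new block (and fetch cycle) begins
                current = []
        if current:
            blocks.append(current)
        return blocks

    # Hypothetical 16-instruction dynamic sequence with three embedded taken
    # branches, mirroring the four noncontiguous fetch blocks of Fig. 8.17(a).
    stream = [(0x100 + 4 * i, i in (3, 8, 11)) for i in range(16)]
    print(len(fetch_blocks(stream)))    # 4 -> at least four cycles to fetch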

The fundamental problem is that instruction caches store instructions in their static order. A trace cache [8,14,18,20] stores instructions in the order they appear in the dynamic instruction stream. Figure 8.17(b) shows the same sequence of four fetch blocks stored contiguously in one trace cache line. The trace cache allows multiple, otherwise noncontiguous fetch blocks to be fetched in a single cycle. A trace in this context is a dynamic sequence of instructions with a hardware-defined length limit (e.g., 16 or 32 instructions), containing any number of embedded taken and not-taken branches.
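
As an illustration of the trace definition above, here is a minimal sketch (the names and fields are assumptions, not an actual hardware layout) of what one trace cache line might hold: a start PC, the embedded branch outcomes, and up to a hardware-defined number of instructions stored contiguously.

    from dataclasses import dataclass, field

    MAX_TRACE_LEN = 16   # hardware-defined length limit (16 or 32 in the text)

    @dataclass
    class Trace:
        """One trace cache line: a dynamic instruction sequence stored contiguously."""
        start_pc: int                                        # PC of the first instruction
        branch_outcomes: list = field(default_factory=list)  # taken/not-taken bit per branch
        instructions: list = field(default_factory=list)     # the instructions themselves

        def append(self, insn, is_branch=False, taken=False):
            """Add one instruction; return False once the length limit is reached."""
            if len(self.instructions) >= MAX_TRACE_LEN:
                return False
            self.instructions.append(insn)
            if is_branch:
                self.branch_outcomes.append(taken)
            return True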

A trace cache can be incorporated in the fetch mechanism in several ways. One possibility is to replace the conventional instruction cache with a trace cache. More likely, both a trace cache and instruction cache are used. In trace processors, described in the next section, the trace cache is accessed first and, if it does not have the desired trace, the trace is quickly constructed from the back-up instruction cache. Early trace cache fetch units [14,20] access the trace cache and instruction cache in parallel, as shown in Fig. 8.18. If the trace exists in the trace cache, it supplies instructions and the instruction cache's instructions are discarded since they are subsumed by the trace. Otherwise, the instruction cache supplies a smaller fetch block.
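
A rough sketch of the parallel fetch path just described, assuming hypothetical trace_cache.lookup and instruction_cache.fetch_block helpers: both structures are probed, and the trace cache's line is preferred when it hits because it subsumes the single fetch block.

    def fetch(start_pc, predicted_outcomes, trace_cache, instruction_cache):
        """Model the parallel probe of Fig. 8.18 (sequentially, for clarity):
        on a trace cache hit the whole trace is supplied in one cycle; on a
        miss the instruction cache supplies a smaller, contiguous fetch block."""
        trace = trace_cache.lookup(start_pc, predicted_outcomes)   # trace id lookup
        block = instruction_cache.fetch_block(start_pc)            # one fetch block
        if trace is not None:
            return trace.instructions    # instruction cache's block is discarded
        return block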

A trace is uniquely identified by the program counter of the first instruction in the trace (start PC) and embedded branch outcomes (taken/not-taken bit for every branch; this assumes indirect branches terminate a trace, since taken/not-taken is insufficient for indirect branches). The start PC and branch outcomes are collectively called the trace identifier, or trace id. Looking up a trace in the trace cache is similar to looking up instructions/data in conventional caches, except the trace id is used instead of an address. A subset of the trace id forms an index into the trace cache and the remaining bits form a tag.
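
As a concrete (hypothetical) encoding of this lookup, the sketch below packs the start PC and the embedded branch-outcome bits into a single trace id and then splits it into a set index and a tag, exactly as a conventional cache splits an address; the geometry is an assumption chosen for illustration.

    NUM_SETS = 1024                          # assumed trace cache geometry
    INDEX_BITS = NUM_SETS.bit_length() - 1   # log2(1024) = 10

    def trace_id(start_pc, branch_outcomes):
        """Pack the start PC and the taken/not-taken bits into one identifier.
        Indirect branches are assumed to terminate the trace, so one bit per
        branch suffices."""
        tid = start_pc
        for taken in branch_outcomes:
            tid = (tid << 1) | int(taken)
        return tid

    def index_and_tag(tid):
        """Low-order bits index the trace cache; the remaining bits form the tag."""
        return tid & (NUM_SETS - 1), tid >> INDEX_BITS

    # Example: start PC 0x400 with embedded outcomes taken, not-taken, taken
    idx, tag = index_and_tag(trace_id(0x400, [True, False, True]))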

