
PC Architecture. A book by Michael B. Karbo



Processor                L1 cache   L2 cache
Athlon 64                128 KB     512 KB
Athlon 64 FX             128 KB     1024 KB
Pentium 4 (“Prescott”)   28 KB      1024 KB

Figure 76. The most common processors and their caches.

Latency

A very important aspect of all RAM – cache included – is latency. All RAM storage has a certain latency, which means that a certain number of clock ticks (cycles) must pass between, for example, two reads. L1 cache has less latency than L2, which is why it is so efficient.

When the cache is bypassed to read directly from RAM, the latency is many times greater. Fig. 77 shows the number of wasted clock ticks for various CPUs. Note that when the processor core has to fetch data from the actual RAM (when both L1 and L2 have missed), it costs around 150 clock ticks. This situation is called stalling, and it needs to be avoided.

Note that the Pentium 4 has a much smaller L1 cache than the Athlon XP, but it is significantly faster. It simply takes fewer clock ticks (cycles) to fetch data:

Latency     Pentium II   Athlon     Pentium 4
L1 cache:   3 cycles     3 cycles   2 cycles
L2 cache:   18 cycles    6 cycles   5 cycles

Figure 77. Latency leads to wasted clock ticks; the fewer of these there are, the faster the processor will appear to be.

Intelligent “data prefetch”

In CPUs like the Pentium 4 and Athlon XP, a handful of support mechanisms are also used which work in parallel with the cache. These include:

A hardware auto data prefetch unit, which attempts to guess which data should be read into the cache. This unit monitors the instructions being processed and predicts what data the next job will need.

Related to this is the Translation Look-aside Buffer (TLB), which is also a kind of cache: it holds the most recently used address translations, which constantly supports the supply of data to the L1 cache. This buffer is also being optimised in new processor designs. Both mechanisms contribute to improved exploitation of the limited bandwidth in the memory system.
