Non-linear memory layout transformations and data prefetching ...

2.3 Cache Organization

...processor clocks favor this organization for first-level caches, which should be kept small in size.

• Fully associative cache: A given block of memory elements can reside anywhere in the cache. This mapping reduces the miss rate because it eliminates conflict misses. Of course, increased associativity comes at a cost: a longer hit time and a higher structural cost. However, even though fast processor clocks favor simple on-chip caches, as we move away from the processor chip to higher-level caches, the increasing miss penalty rewards associativity.

• N-way set associative cache: In a set associative organization the cache is divided into sets of N cache lines. A given block of memory can be placed anywhere within a single set, which is determined by the block address:

    (set mapped) = (block address) MOD (number of sets in cache)

A direct mapped cache can be considered a 1-way set associative cache, while a fully associative cache with a capacity of M cache lines can be considered an M-way set associative cache. As far as cache performance is concerned, an 8-way set associative cache has, in essence, the same miss rate as a fully associative cache.

2.3.1 Pseudo-associative caches

Another approach to improving miss rates without affecting the processor clock is the pseudo-associative cache. This mechanism is roughly as effective as 2-way associativity. Pseudo-associative caches therefore have one fast and one slow hit time, corresponding to a regular hit and a pseudo-hit. On a regular hit, pseudo-associative caches work just like direct mapped caches. When a miss occurs in the direct mapped entry, an alternate entry (the index with the highest index bit flipped) is checked. A hit to the alternate entry (a pseudo-hit) requires an extra cycle. The pseudo-hit results in the two entries being swapped, so that the next access to the same line will be a fast access.
A miss in both entries (fast and alternate) causes an eviction of whichever of the two lines is LRU. The new data is always placed in the fast index, so if the alternate index's line was evicted, the line in the fast index must be moved to the alternate index. Thus a regular hit takes no extra cycles, a pseudo-hit takes one extra cycle, and an access to the L2 cache or main memory takes one cycle longer in a system with a pseudo-associative cache than in one without.

2.3.2 Victim Caches

Cache conflicts can be addressed in hardware through associativity of some form. While associativity has the advantage of reducing conflicts by allowing locations to map to multiple
