13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



CODING FOR SIMD ARCHITECTURES

chunks so that the total size of two blocked A and B chunks is smaller than the cache size. This allows maximum data reuse.

[Figure 4-3. Loop Blocking Access Pattern. Panels: A(i, j) access pattern; A(i, j) access pattern after blocking; B(i, j) access pattern after blocking. The blocked A and B chunks together are smaller than the cache size.]

As one can see, all the redundant cache misses can be eliminated by applying this loop blocking technique. If MAX is huge, loop blocking can also help reduce the penalty from DTLB (data translation look-aside buffer) misses. In addition to improving the cache/memory performance, this optimization technique also saves external bus bandwidth.

4.6 INSTRUCTION SELECTION

The following section gives some guidelines for choosing instructions to complete a task.

One barrier to SIMD computation can be the existence of data-dependent branches. Conditional moves can be used to eliminate data-dependent branches. Conditional
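
The idea of removing a data-dependent branch can be sketched in plain C. The following is a minimal illustration, not code from the manual: the function names `select_branch` and `select_mask` and the choice of unsigned 16-bit elements are assumptions made for this example. The second function turns the comparison result into an all-ones or all-zeros mask and blends the two candidate values with logical operations, which is roughly the scalar analogue of a SIMD compare followed by AND, AND-NOT, and OR.

```c
#include <stdint.h>

/* Branching version: the loop body contains one data-dependent branch
   per element. */
void select_branch(const uint16_t *a, const uint16_t *b,
                   const uint16_t *c, const uint16_t *d,
                   uint16_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        if (a[i] > b[i])
            out[i] = c[i];
        else
            out[i] = d[i];
    }
}

/* Branch-free version (hypothetical helper for illustration): the
   comparison yields 0 or 1, negation turns that into 0 or -1, and the
   cast gives a mask of 0x0000 or 0xFFFF. The mask then selects c[i] or
   d[i] using only logical operations. */
void select_mask(const uint16_t *a, const uint16_t *b,
                 const uint16_t *c, const uint16_t *d,
                 uint16_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        uint16_t mask = (uint16_t)-(a[i] > b[i]);
        out[i] = (uint16_t)((c[i] & mask) | (d[i] & (uint16_t)~mask));
    }
}
```

Both functions produce the same output for the same inputs. Whether the second form is actually faster depends on the compiler and the target: a compiler may already turn the first loop into conditional moves or vector code, so any gain should be measured rather than assumed.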
