12.07.2015 Views

Non-linear memory layout transformations and data prefetching ...


CHAPTER 3

Fast Indexing for Blocked Array Layouts

This chapter proposes a new method to implement blocked array layouts, combined with a fast indexing scheme for numerical codes. We use static loop performance analysis to specify the optimal loop nesting order and the legal transformations (including tiling) that give the best recomposition of iterations. Array elements are stored in a blocked layout, exactly in the order in which they are swept by the tiled instruction stream. Finally, we apply our efficient indexing to the resulting optimized code, so that multi-dimensional array indices are translated into their blocked memory layout using quick and simple binary-mask operations.

The remainder of this chapter is organized as follows: Section 3.1 briefly discusses the problem of data locality, using the typical matrix multiplication algorithm as an example. Section 3.2 reviews definitions related to Morton ordering. Section 3.3 presents previously proposed non-linear array layouts, as well as our blocked array layouts along with our efficient array indexing. Finally, concluding remarks are presented in Section 3.4.

3.1 The problem: Improving cache locality for array computations

In this section, we elaborate on the necessity for both control (loop) and data transformations in order to fully exploit data locality. We present, stepwise, all optimization phases that improve locality of references, with the aid of the typical matrix multiplication kernel.

1. unoptimized version
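The chapter's own listing for this step is not available in this extract. As a point of reference, the following is a minimal sketch of what an unoptimized matrix multiplication kernel commonly looks like in C. The function name `matmul_naive`, the square `n x n` shape, and the row-major, flattened storage are assumptions made for illustration and are not taken from the text.

```c
#include <stddef.h>

/* Hypothetical illustration (not the chapter's listing): textbook
 * matrix multiplication C += A * B for square n x n matrices stored
 * row-major in flat arrays. No tiling or layout transformation is applied. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            for (size_t k = 0; k < n; k++) {
                /* a is read along a row (unit stride in k), while b is
                 * read down a column (stride n in k). The strided access
                 * to b is the usual source of poor cache locality. */
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
```

In this sketch the innermost loop touches a different row of `b` on every iteration, which is the kind of access pattern that loop and data transformations aim to improve.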
