Non-linear memory layout transformations and data prefetching ...
CHAPTER 3

Fast Indexing for Blocked Array Layouts

This chapter proposes a new method for implementing blocked array layouts, combined with a fast indexing scheme for numerical codes. We use static loop performance analysis to determine the optimal loop nesting order and the legal transformations (including tiling) that give the best recomposition of iterations. Array elements are stored in a blocked layout, exactly in the order in which the tiled instruction stream sweeps them. Finally, we apply our efficient indexing to the resulting optimized code, so that multi-dimensional array indices are translated into their blocked memory layout using quick and simple binary-mask operations.

The remainder of this chapter is organized as follows: Section 3.1 briefly discusses the problem of data locality, using the typical matrix multiplication algorithm as an example. Section 3.2 reviews definitions related to Morton ordering. Section 3.3 presents previously proposed non-linear array layouts, as well as our blocked array layouts along with our efficient array indexing. Finally, concluding remarks are presented in Section 3.4.

3.1 The problem: Improving cache locality for array computations

In this section, we elaborate on the necessity for both control (loop) and data transformations in order to fully exploit data locality. Using the typical matrix multiplication kernel, we present, step by step, all the optimization phases that improve locality of references.

1. unoptimized version
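For reference, the unoptimized kernel usually has the following shape. This is only a minimal sketch, assuming square n x n matrices of doubles stored row-major in flat arrays; the function name `matmul_naive` and its parameters are illustrative and are not taken from the source.

```c
#include <stddef.h>

/* Illustrative sketch (names and row-major layout are assumptions):
 * computes C = C + A * B for square n x n matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                /* a is walked along a row (unit stride), while b is
                 * walked down a column (stride n) in the inner loop. */
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

In this form the innermost loop reads `b` with a stride of n elements, so for large n each access to `b` is likely to touch a different cache line. This is the kind of locality problem that loop and data transformations are meant to address.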