Non-linear memory layout transformations and data prefetching ...

CHAPTER 3
Fast Indexing for Blocked Array Layouts

This chapter proposes a new method for applying blocked array layouts combined with a fast indexing scheme for numerical codes. We use static loop performance analysis to specify the optimal loop nesting order and the legal transformations (including tiling) that give the best recomposition of iterations. In a blocked layout, array elements are stored in exactly the order in which the tiled instruction stream sweeps them. Finally, we apply our efficient indexing to the resulting optimized code, translating multi-dimensional array indices into their blocked memory layout with quick and simple binary-mask operations.

The remainder of this chapter is organized as follows: Section 3.1 briefly discusses the problem of data locality, using the typical matrix multiplication algorithm as an example. Section 3.2 reviews definitions related to Morton ordering. Section 3.3 presents previously proposed non-linear array layouts, as well as our blocked array layouts along with our efficient array indexing. Finally, concluding remarks are presented in Section 3.4.

3.1 The problem: Improving cache locality for array computations

In this section, we elaborate on why both control (loop) and data transformations are necessary to fully exploit data locality. We present, step by step, all optimization phases for improving the locality of references, with the aid of the typical matrix multiplication kernel.

1. Unoptimized version
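Here is a minimal sketch of this starting point. It assumes square n × n matrices of doubles stored row-major in flat arrays and the conventional i-j-k loop nest. The function name `matmul_naive` and its interface are illustrative assumptions, not the thesis's own listing.

```c
#include <assert.h>
#include <stddef.h>

/* Unoptimized kernel: C = C + A * B for n x n matrices of doubles stored
   row-major in flat arrays. Function name and interface are illustrative. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    /* C is updated in place, so it must not alias A or B. */
    assert(C != A && C != B);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                /* A[i][k] advances with unit stride, but B[k][j] jumps n
                   elements per iteration, so for large n nearly every
                   access to B touches a different cache line. */
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

The innermost statement reads B with a stride of n elements. That is the poor spatial locality that tiling and blocked array layouts aim to remove.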
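The binary-mask indexing summarized at the start of this chapter can be illustrated with a minimal sketch. The sketch makes these assumptions:

- The array is N × N with N = 2^log_n.
- Tiles are square with side T = 2^log_t.
- Tiles are stored one after another in row-major order.
- Elements inside each tile are stored row-major.

This is one possible blocked arrangement, not necessarily the exact masks the thesis defines. The helper names `blk_row`, `blk_col` and `blk_index` are hypothetical. Under these assumptions the offset of element (i, j) is ((i >> log_t)·(N/T) + (j >> log_t))·T² + (i mod T)·T + (j mod T). This splits into one term that depends only on i and another that depends only on j.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for a blocked layout of an N x N array, where
   N = 2^log_n and the tile side is T = 2^log_t. Assumed layout: tiles in
   row-major order, elements row-major inside each tile. */

/* Part of the offset that depends only on the row index i. */
static inline size_t blk_row(size_t i, unsigned log_n, unsigned log_t)
{
    size_t low = ((size_t)1 << log_t) - 1;      /* selects the in-tile bits */
    /* Tile-row bits land above all column bits; in-tile row bits sit just
       above the in-tile column bits. */
    return ((i & ~low) << log_n) | ((i & low) << log_t);
}

/* Part of the offset that depends only on the column index j. */
static inline size_t blk_col(size_t j, unsigned log_t)
{
    size_t low = ((size_t)1 << log_t) - 1;
    /* Tile-column bits select one of the T*T-element tiles in a tile row;
       in-tile column bits are the lowest bits of the offset. */
    return ((j & ~low) << log_t) | (j & low);
}

/* Linear offset of element (i, j). The two parts occupy disjoint bit
   ranges, so adding them is the same as OR-ing them. */
static inline size_t blk_index(size_t i, size_t j, unsigned log_n, unsigned log_t)
{
    assert(log_t <= log_n);
    assert((i >> log_n) == 0 && (j >> log_n) == 0);  /* 0 <= i, j < N */
    return blk_row(i, log_n, log_t) + blk_col(j, log_t);
}
```

Because `blk_row(i, ...)` does not depend on j, a tiled loop nest can compute it once per row and add `blk_col(j, ...)` in the inner loop. Each access then costs a couple of shifts, masks and one addition, instead of recomputing tile coordinates for every element.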
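Morton (Z-order) indexing places the bits of the row and column indices in alternating positions of the address; Section 3.2 reviews its definitions. Below is a minimal sketch using the well-known shift-and-mask bit-spreading technique. It assumes indices below 2^16, with column bits in the even positions and row bits in the odd positions (the opposite convention is equally common). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of x so that bit b moves to bit 2b
   (standard shift-and-mask bit interleaving). */
static uint32_t spread_bits16(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Morton (Z-order) offset of element (i, j): column bits in the even
   positions, row bits in the odd positions (assumed convention). */
static uint32_t morton_index(uint32_t i, uint32_t j)
{
    assert(i < 65536u && j < 65536u);   /* 16-bit indices only */
    return (spread_bits16(i) << 1) | spread_bits16(j);
}
```

For a 4 × 4 array this numbers the elements 0, 1, 4, 5 along row 0 and 8, 9, 12, 13 along row 2, which is the recursive Z pattern.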
Page 1: NATIONAL TECHNICAL UNIVERSITY OF AT
Page 4 and 5: .................Evangelia G. Athan