Slides - Par Lab


Coalescing (slide 38/73)
• GPUs and CPUs both perform memory transactions at a larger granularity than the program requests (a "cache line")
• GPUs have a "coalescer", which dynamically examines memory requests from different SIMD lanes and coalesces them
• To use bandwidth effectively, when threads load, they should:
  - Present a set of unit-strided loads (dense accesses)
  - Keep sets of loads aligned to vector boundaries
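The effect of stride and alignment on coalescing can be sketched with a small host-side C++ model. This is only an illustration, not how any particular GPU implements its coalescer: the function name `segments_touched` is hypothetical, and the defaults (32 lanes, 4-byte elements, 128-byte segments) are assumptions chosen for the example rather than figures taken from the slides. The model counts how many distinct aligned segments one group of lanes touches; fewer segments means fewer memory transactions for the same useful data.

```cpp
#include <cstddef>
#include <set>

// Hypothetical model: count the distinct aligned memory segments touched
// when each SIMD lane loads one element.
// Lane i reads elem_bytes bytes starting at byte address
//   base_bytes + i * stride_elems * elem_bytes.
// The default lane count, element size, and segment size are assumptions.
std::size_t segments_touched(std::size_t base_bytes,
                             std::size_t stride_elems,
                             std::size_t elem_bytes = 4,
                             std::size_t lanes = 32,
                             std::size_t segment_bytes = 128) {
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        std::size_t first = base_bytes + lane * stride_elems * elem_bytes;
        std::size_t last = first + elem_bytes - 1;
        // An element may straddle a segment boundary, so record every
        // segment between its first and last byte.
        for (std::size_t s = first / segment_bytes; s <= last / segment_bytes; ++s) {
            segments.insert(s);
        }
    }
    return segments.size();
}
```

Under these assumptions, `segments_touched(0, 1)` gives 1 (dense and aligned), `segments_touched(4, 1)` gives 2 (dense but misaligned), and `segments_touched(0, 32)` gives 32 (every lane lands in its own segment), which is why the slide asks for unit-strided, aligned loads.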
Data Structure Padding (slide 39/73)
• Multidimensional arrays are usually stored as monolithic vectors in memory
• Care should be taken to ensure aligned memory accesses for the necessary access pattern
[Figure: array layout in memory (row major)]
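One common way to get aligned row accesses is to pad each row of a row-major array up to a multiple of the alignment, so that every row starts on a boundary. The following is a minimal host-side C++ sketch under stated assumptions: `round_up` and `PaddedMatrix` are hypothetical names, and the default alignment of 32 elements is illustrative. It also does not align the base address of the buffer itself, which a real allocation would need to do as well. In CUDA, `cudaMallocPitch` provides a padded allocation of this kind on the device.

```cpp
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple` (multiple must be > 0).
std::size_t round_up(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// Hypothetical row-major matrix whose rows are padded so that each row
// begins at an element offset that is a multiple of align_elems.
struct PaddedMatrix {
    std::size_t rows;
    std::size_t cols;
    std::size_t pitch;        // padded row length, in elements
    std::vector<float> data;  // rows * pitch elements, padding included

    PaddedMatrix(std::size_t r, std::size_t c, std::size_t align_elems = 32)
        : rows(r), cols(c), pitch(round_up(c, align_elems)),
          data(r * pitch, 0.0f) {}

    // Linear index of element (row, col): step by pitch, not by cols.
    std::size_t index(std::size_t row, std::size_t col) const {
        return row * pitch + col;
    }

    float& at(std::size_t row, std::size_t col) {
        return data[index(row, col)];
    }
};
```

For a 3 x 33 matrix this sketch pads each row to 64 elements, so row 1 starts at offset 64 instead of 33. The cost is the unused padding at the end of each row; the benefit is that a group of lanes reading consecutive columns of any row starts from an aligned offset.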
Page 1: An Introduction to CUDA/OpenCL a
