Slides - Par Lab

More documents

Recommendations

Info

Using per-‐block shared memory ¡ Variables shared across block __shared__ int *begin, *end; ¡ Scratchpad memory __shared__ int scratch[BLOCKSIZE]; scratch[threadIdx.x] = begin[threadIdx.x]; // … compute on scratch values … begin[threadIdx.x] = scratch[threadIdx.x]; ¡ Communicating values between threads Block scratch[threadIdx.x] = begin[threadIdx.x]; __syncthreads(); int left = scratch[threadIdx.x -‐ 1]; ¡ Per-‐block shared memory is faster than L1 cache, slower than register file ¡ It is relatively small: register file is 2-‐4x larger Shared 24/73
CUDA: Features available on GPU ¡ Double and single precision (IEEE compliant) ¡ Standard mathematical functions § sinf, powf, atanf, ceil, min, sqrtf, etc. ¡ Atomic memory operations § atomicAdd, atomicMin, atomicAnd, atomicCAS, etc. ¡ These work on both global and shared memory 25/73
Page 1: An Introduction to CUDA/OpenCL a
Page 13 and 14: What is a CUDA Thread Block?
Page 15 and 16: Synchronization ¡ Threads within
Page 17 and 18: Scalability ¡ Manycore chips exi
Page 19 and 20: Memory model Thread Per-‐threa
Page 26 and 27: CUDA: Runtime support ¡Explicit
Page 28 and 29: CUDA and OpenCL correspondence
Page 30 and 31: Imperatives for Efficient CUDA
Page 32 and 33: Mapping CUDA to a GPU, continu
Page 34 and 35: Profiling NVIDIA Visual Profiler©
Page 36 and 37: Memory, Memory, Memory ! Chapter
Page 38 and 39: Coalescing ¡ GPUs and CPUs both
Page 40 and 41: SoA, AoS ¡ Different data acces
Page 42 and 43: Efficiency and Productivity" Parall
Page 44 and 45: Abstraction, cont." Reduction is on
Page 46 and 47: Diving In#include #include #include
Page 48 and 49: Containers" Concise and readable co
Page 50 and 51: Iterators" Iterators act like point
Page 52 and 53: Algorithms" Elementwise operations"
Page 54 and 55: Algorithms" Standard data types// a
Page 56 and 57: Custom Types & Operators// compare
Page 58 and 59: Recap" Containers manage memory" He
Page 60 and 61: Explicit versus implicit parallelis
Page 62 and 63: Explicit versus implicit parallelis
Page 64 and 65: Productivity Implications" Consider
Page 66 and 67: Productivity Implications" Consider
Page 68 and 69: Leveraging Parallel Primitives" Use
Page 70 and 71: Leveraging Parallel Primitives" Com
Page 72 and 73:
Summary ¡ Throughput optimized p
show all

Slides - Par Lab

Create successful ePaper yourself

Delete template?

Save as template?