Lecture 2 - Threads - many-core.group


GPU strategy 2 - what went wrong?
• The shared memory kernel performed worse than expected, mainly because many threads do not compute (they just load into shared memory):
  • For a 16x16 block, 60 threads do not compute (23%)
  • But the maximum number of threads per block is 512 (sqrt(512) = 22.6)
• (Also: the stencil is small, so there is little reuse)
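The 23% figure can be checked by hand. The working below assumes each square block carries a one-cell halo on every side, which is consistent with 60 idle threads in a 16x16 block; the 22x22 case is an added illustration of the largest square block that fits under the 512-thread limit, not a figure from the slides.

```latex
\begin{align*}
16 \times 16 &: \quad 16^2 - 14^2 = 256 - 196 = 60, \qquad \frac{60}{256} \approx 23.4\% \\
22 \times 22 &: \quad 22^2 - 20^2 = 484 - 400 = 84, \qquad \frac{84}{484} \approx 17.4\%
\end{align*}
```

Under that assumption, even the largest square block still leaves roughly one thread in six doing no computation.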
GPU strategy 3
• Can use larger blocks (more compute threads) if:
  • For each block, start a line of threads (in the i direction)
  • Load three lines into shared memory, then compute one line
  • Then load the next line into shared memory, and proceed in the j direction
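The steps above can be sketched as a kernel. This is a minimal illustration, not the lecture's own code: the names `heat_step_lines`, `max_error_vs_cpu`, `t_old`, `t_new` and `coef`, the block length of 256, the equal grid spacing in i and j, and the fixed boundary values are all assumptions.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK_I = 256;  // assumed line length (the slides give 512 as the limit)

// One explicit time step of 2D heat conduction on an ni x nj grid (i fastest).
// coef stands for alpha*dt/dx^2 with dx == dy (assumption). Boundaries are not written.
__global__ void heat_step_lines(const float* t_old, float* t_new,
                                int ni, int nj, float coef)
{
    __shared__ float line[3][BLOCK_I];  // three rotating lines: j-1, j, j+1

    const int tx = threadIdx.x;
    // Neighbouring blocks overlap by two columns so every interior column is computed.
    const int i = blockIdx.x * (BLOCK_I - 2) + tx;
    const bool in_grid = i < ni;
    // The first and last thread of the line only load; all others compute.
    const bool computes = tx > 0 && tx < BLOCK_I - 1 && i < ni - 1;

    if (in_grid) {                      // preload lines j = 0 and j = 1
        line[0][tx] = t_old[i];
        line[1][tx] = t_old[ni + i];
    }
    for (int j = 1; j < nj - 1; ++j) {  // march in the j direction
        const int below = (j - 1) % 3, mid = j % 3, above = (j + 1) % 3;
        if (in_grid) line[above][tx] = t_old[(j + 1) * ni + i];  // load the next line
        // One barrier per step is enough here: the slot written above was last read
        // by other threads two steps ago, with a barrier in between.
        __syncthreads();
        if (computes) {
            const float c = line[mid][tx];
            t_new[j * ni + i] = c + coef * (line[mid][tx - 1] + line[mid][tx + 1]
                                          + line[below][tx] + line[above][tx] - 4.0f * c);
        }
    }
}

// Hypothetical host check: run one step on the GPU and on the CPU, return the max difference.
float max_error_vs_cpu(int ni, int nj)
{
    const float coef = 0.2f;
    const int n = ni * nj;
    std::vector<float> h_old(n), h_gpu(n);
    for (int k = 0; k < n; ++k) h_old[k] = 0.001f * (k % 1000);
    std::vector<float> h_ref = h_old;   // boundaries keep their old values
    for (int j = 1; j < nj - 1; ++j)
        for (int i = 1; i < ni - 1; ++i) {
            const int k = j * ni + i;
            h_ref[k] = h_old[k] + coef * (h_old[k - 1] + h_old[k + 1]
                                        + h_old[k - ni] + h_old[k + ni] - 4.0f * h_old[k]);
        }

    float *d_old = nullptr, *d_new = nullptr;
    const size_t bytes = sizeof(float) * n;
    cudaMalloc(&d_old, bytes);
    cudaMalloc(&d_new, bytes);
    cudaMemcpy(d_old, h_old.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_new, d_old, bytes, cudaMemcpyDeviceToDevice);  // carries the boundaries over

    const int blocks = (ni - 2 + BLOCK_I - 3) / (BLOCK_I - 2);  // ceil((ni-2)/(BLOCK_I-2))
    heat_step_lines<<<blocks, BLOCK_I>>>(d_old, d_new, ni, nj, coef);
    cudaMemcpy(h_gpu.data(), d_new, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_old);
    cudaFree(d_new);

    float err = 0.0f;
    for (int k = 0; k < n; ++k) err = std::fmax(err, std::fabs(h_gpu[k] - h_ref[k]));
    return err;
}
```

Calling `max_error_vs_cpu(600, 400)` from `main` compares one step against a CPU reference. With a line of 256 threads, only the two end threads are load-only (under 1% of the block), compared with 23% for the 16x16 block of strategy 2.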
Page 1 and 2: Lecture 2 - Threads Graham Pullan D
Page 3 and 4: Threads, thread blocks and shared m
Page 5 and 6: Threads • Example in L1 made no u
Page 7 and 8: Kernel for c = a + b __global__ voi
Page 9 and 10: Threads, blocks and grid Block is s
Page 11 and 12: More on thread blocks • Thread bl
Page 13 and 14: Streaming multiprocessors and share
Page 15 and 16: Governing equation • Heat conduct
Page 17 and 18: 2D heat conduction • In 2D: ∂T
Page 19 and 20: Domain
Page 21 and 22: Results Initial field After 50000 s
Page 23 and 24: GPU strategy 1 • Start a thread f
Page 25 and 26: GPU strategy 1 - threads and blocks
Page 27 and 28: GPU strategy 1 - kernel launch code
Page 29 and 30: Performance • CPU - 1 core, Intel
Page 31 and 32: GPU strategy 2 - threads and blocks
Page 33 and 34: GPU strategy 2 - kernel (part 1) //
Page 35: Performance • CPU - 1 core, Intel
Page 39 and 40: GPU strategy 3
Page 45 and 46: Lecture 2 summary
