Tutorial CUDA
Hardware Implementation: Execution Model
Each active block is split into warps in a well-defined way.
Warps are time-sliced.
In other words:
Threads within a warp are executed physically in parallel.
Warps and blocks are executed logically in parallel.
© NVIDIA Corporation 2008
[Figure: The host launches Kernel 1 and Kernel 2, which run on the device as Grid 1 and Grid 2. Grid 1 contains Blocks (0, 0) through (2, 1). Block (1, 1) is expanded into Threads (0, 0) through (63, 2), grouped into Warps 0 to 5: in row 0, Threads (0, 0) to (31, 0) fall in Warp 0 and Threads (32, 0) to (63, 0) in Warp 1; rows 1 and 2 are split the same way into Warps 2 and 3 and Warps 4 and 5.]
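The "well-defined way" in which a block is split into warps can be sketched as plain index arithmetic. The following is a minimal host-side C++ model, not device code. It assumes a warp size of 32 and a block of 64 x 3 threads, as the thread and warp labels on the slide suggest. `Dim3`, `kWarpSize`, and the helper functions are hypothetical names introduced for this illustration. In a real kernel the same arithmetic would use the built-in `threadIdx`, `blockDim`, and `warpSize`.

```cpp
#include <cassert>

// Assumption: 32 threads per warp. On a CUDA device the built-in variable
// warpSize reports the actual value.
const int kWarpSize = 32;

// Hypothetical stand-in for CUDA's threadIdx / blockDim triples.
struct Dim3 { int x, y, z; };

// Linear thread ID inside a block: x varies fastest, then y, then z.
inline int linearThreadId(Dim3 idx, Dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Warp that a thread falls into: consecutive linear IDs fill one warp at a time.
inline int warpIndex(Dim3 idx, Dim3 dim) {
    return linearThreadId(idx, dim) / kWarpSize;
}

// Position of the thread inside its warp (0 .. kWarpSize - 1).
inline int laneIndex(Dim3 idx, Dim3 dim) {
    return linearThreadId(idx, dim) % kWarpSize;
}

// Number of warps needed for one block; the last warp may be only partly filled.
inline int warpsPerBlock(Dim3 dim) {
    return (dim.x * dim.y * dim.z + kWarpSize - 1) / kWarpSize;
}

// Checks the mapping against the 64 x 3 block inferred from the slide.
inline void checkSlideBlock() {
    const Dim3 block = {64, 3, 1};
    assert(warpsPerBlock(block) == 6);                // Warps 0 to 5
    assert(warpIndex(Dim3{0, 0, 0}, block) == 0);     // Thread (0, 0)
    assert(warpIndex(Dim3{31, 0, 0}, block) == 0);    // Thread (31, 0)
    assert(warpIndex(Dim3{32, 0, 0}, block) == 1);    // Thread (32, 0)
    assert(warpIndex(Dim3{31, 1, 0}, block) == 2);    // Thread (31, 1)
    assert(warpIndex(Dim3{32, 1, 0}, block) == 3);    // Thread (32, 1)
    assert(warpIndex(Dim3{63, 2, 0}, block) == 5);    // Thread (63, 2)
}
```

Calling `checkSlideBlock()` confirms that this model reproduces the warp numbering on the slide. Because x varies fastest, threads that are adjacent in x share a warp, which is why each 64-thread row splits into exactly two warps.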