29.01.2013 Views

Tutorial CUDA


Maximize Occupancy to Hide Latency

© NVIDIA Corporation 2008

Sources of latency:

  Global memory access: 400-600 cycle latency

  Read-after-write register dependency

    Instruction’s result can only be read 11 cycles later

Latency blocks dependent instructions in the same thread

But instructions in other threads are not blocked

Hide latency by running as many threads per multiprocessor as possible!

Choose execution configuration to maximize occupancy:

  occupancy = (# of active warps) / (maximum # of active warps)

  Maximum # of active warps is 24 on G8x
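
The occupancy ratio above can be estimated on the host before launching a kernel. The following is a minimal sketch in plain C++, not NVIDIA's occupancy calculator. The names `SmLimits`, `activeWarps`, and `occupancy` are hypothetical. Only the 24-warp maximum comes from the slide; the other per-multiprocessor limits (8 blocks, 8192 registers, 16 KB shared memory) are the commonly cited G8x figures and are assumptions here. The sketch also ignores register and shared-memory allocation granularity, so real hardware may report slightly lower values.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits. Only maxWarps is taken from the slide; the
// remaining defaults are assumed G8x values used for illustration.
struct SmLimits {
    int maxWarps = 24;           // maximum # of active warps on G8x (from the slide)
    int warpSize = 32;           // threads per warp
    int maxBlocks = 8;           // assumed limit on resident blocks
    int registers = 8192;        // assumed register file size (32-bit registers)
    int sharedMemBytes = 16384;  // assumed shared memory size
};

// Estimate the number of active warps per multiprocessor for one kernel
// configuration. Allocation granularity is deliberately ignored.
int activeWarps(const SmLimits& sm, int threadsPerBlock,
                int regsPerThread, int sharedMemPerBlock) {
    assert(threadsPerBlock > 0);
    // A partially filled warp still occupies a whole warp slot.
    int warpsPerBlock = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;

    // Each resource independently caps the number of resident blocks.
    int byWarps = sm.maxWarps / warpsPerBlock;
    int byRegs = regsPerThread > 0
                     ? sm.registers / (regsPerThread * threadsPerBlock)
                     : sm.maxBlocks;
    int bySmem = sharedMemPerBlock > 0
                     ? sm.sharedMemBytes / sharedMemPerBlock
                     : sm.maxBlocks;

    // The tightest limit decides how many blocks are resident.
    int blocks = std::min({sm.maxBlocks, byWarps, byRegs, bySmem});
    return blocks * warpsPerBlock;
}

// occupancy = (# of active warps) / (maximum # of active warps)
double occupancy(const SmLimits& sm, int threadsPerBlock,
                 int regsPerThread, int sharedMemPerBlock) {
    return static_cast<double>(activeWarps(sm, threadsPerBlock, regsPerThread,
                                           sharedMemPerBlock)) /
           sm.maxWarps;
}
```

Under these assumed limits, a kernel launched with 256 threads per block, 10 registers per thread, and 4096 bytes of shared memory per block fits three blocks per multiprocessor, giving 24 active warps and an occupancy of 1.0. Raising register use to 16 per thread drops this to two blocks and 16 active warps, which shows how a single resource can cap occupancy.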
