30.07.2015 Views

Slides - Par Lab

Slides - Par Lab

Slides - Par Lab

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Coalescing ¡ GPUs and CPUs both perform memory transactions at a larger granularity than the program requests (“cache line”) ¡ GPUs have a “coalescer”, which examines memory requests dynamically from different SIMD lanes and coalesces them ¡ To use bandwidth effectively, when threads load, they should: § Present a set of unit strided loads (dense accesses) § Keep sets of loads aligned to vector boundaries 38/73

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!