30.04.2014 Views

CUDA Accelerated Linpack on Clusters - Nvidia

Optimizations – Auto Split

• Keep track of CPU and GPU performance and adjust the split
  – Time the CPU work with wallclock()
  – Time the GPU work with CUDA event records
  – Compute the optimal split for the next iteration

cudaEventRecord(GPU_start, 0);
Loop: launch GPU copies + kernels      // asynchronous, returns immediately
cudaEventRecord(GPU_stop, 0);

CPU_start = wallclock();
Call CPU_DGEMM                         // runs while the GPU is busy
CPU_stop = wallclock();
CPU_TIME = CPU_stop - CPU_start;

cudaEventSynchronize(GPU_stop);        // wait for the GPU work to finish
cudaEventElapsedTime(&GPU_TIME, GPU_start, GPU_stop);   // reported in milliseconds

GPU_GFLOPS = GPU_FLOPS / GPU_TIME
CPU_GFLOPS = CPU_FLOPS / CPU_TIME
SPLIT = GPU_GFLOPS / (GPU_GFLOPS + CPU_GFLOPS)
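The split arithmetic above can be sketched in host-side C. This is a minimal illustration, not the code from the slides: `dgemm_flops`, `compute_split`, and `gpu_columns` are hypothetical helper names, `wallclock()` is assumed to be a thin wrapper around the POSIX monotonic clock, and the GPU time is assumed to arrive in milliseconds, as `cudaEventElapsedTime` reports it. Falling back to the previous split when a timing is unusable is also an assumption.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds. Assumed stand-in for the slide's wallclock(). */
static double wallclock(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Floating-point operations in an m x n x k DGEMM (one multiply, one add each). */
static double dgemm_flops(int m, int n, int k) {
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Fraction of the next DGEMM to hand to the GPU.
 * gpu_ms is in milliseconds, cpu_seconds in seconds.
 * Returns previous_split if either timing is not positive (hypothetical policy). */
static double compute_split(double gpu_flops, float gpu_ms,
                            double cpu_flops, double cpu_seconds,
                            double previous_split) {
    double gpu_seconds = (double)gpu_ms * 1e-3;
    if (gpu_seconds <= 0.0 || cpu_seconds <= 0.0) {
        return previous_split;
    }
    double gpu_gflops = gpu_flops / gpu_seconds * 1e-9;
    double cpu_gflops = cpu_flops / cpu_seconds * 1e-9;
    return gpu_gflops / (gpu_gflops + cpu_gflops);
}

/* Number of the n matrix columns given to the GPU, rounded and clamped to [0, n]. */
static int gpu_columns(int n, double split) {
    assert(n >= 0);
    int cols = (int)(split * (double)n + 0.5);
    if (cols < 0) cols = 0;
    if (cols > n) cols = n;
    return cols;
}
```

In a time-stepping loop, one would call `compute_split` after each iteration with the measured times and feed the result to `gpu_columns` to size the GPU's share of the next update; the CPU takes the remaining columns.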
