30.04.2014 Views

CUDA Accelerated Linpack on Clusters - Nvidia

CUDA Accelerated Linpack on Clusters - Nvidia

CUDA Accelerated Linpack on Clusters - Nvidia

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Fermi DGEMM Strategy<br />

• Slice Matrix into several pieces<br />

• Use Stream API<br />

— Overlap COPY + Compute<br />

Copy A H2D<br />

Loop over pieces:<br />

Copy Bi,Ci H2D<br />

DGEMM A,Bi,Ci<br />

Copy Ci D2H<br />

CPU DGEMM A,B_last,C_last

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!