CUDA Accelerated Linpack on Clusters - Nvidia

More documents

Recommendations

Info

DGEMM: C = alpha A B + beta C DGEMM(A,B,C) = DGEMM(A,B1,C1) U DGEMM(A,B2,C2) (GPU) (CPU) The idea can be extended to multi-GPU configuration and to handle huge matrices Find the optimal split, knowing the relative performances of the GPU and CPU cores on DGEMM
Overlap DGEMM on CPU and GPU // Copy A cublasSetMatrix (m, k , sizeof(A[0]), A, lda, devA, m_gpu); // Copy B1 cublasSetMatrix (k ,n_gpu, sizeof(B[0]), B, ldb, devB, k_gpu); // Copy C1 cublasSetMatrix (m, n_gpu, sizeof(C[0]), C, ldc, devC, m_gpu); // DGEMM on GPU // Control returns immediately to CPU cublasDgemm('n', 'n', m, n_gpu, k, alpha, devA, m,devB, k, beta, devC, m); // DGEMM on CPU dgemm('n','n',m,n_cpu,k, alpha, A, lda,B+ldb*n_gpu, ldb, beta,C+ldc*n_gpu, ldc); // Copy C1 status = cublasGetMatrix (m, n, sizeof(C[0]), devC, m, C, *ldc); Using <strong>CUDA</strong>, it is very easy to express the workflow in the diagram
Page 1 and 2: CUDA Accel
Page 3 and 4: LINPACK Benchmark The LINPACK bench
Page 5 and 6: LINPACK Benchmark Solve a dense NxN
Page 7: NVIDIA Tesla GPU Computing Products
Page 11 and 12: DGEMM performance on GPU (T10) A DG
Page 13 and 14: Results on workstation SUN Ultra 24
Page 15 and 16: Effect of PCI-e bandwidth Page lock
Page 17 and 18: FERMI T20 GPU
Page 19 and 20: T10 DGEMM 16 • T10 DGEMM algorith
Page 21 and 22: T20 DGEMM Performance • Inner Loo
Page 23 and 24: DGEMM Compute/COPY Analysis
Page 25 and 26: Fermi DGEMM Strategy • Slice Matr
Page 27 and 28: Additional Overlap strategy • Cop
Page 29 and 30: Optimizations - Auto Split • Keep
Page 31 and 32: DTRSM Same basic splitting approach
Page 33 and 34: DTRSM • GPU implimentation based
Page 35 and 36: DTRSM • Repeat: • DTRSM diagona
Page 37 and 38: Results on single node Supermicro 6
Page 39 and 40: Linpack Score (GFL
Page 41 and 42: Nebulae Cluster #2 on Latest Top500
Page 43 and 44: Future Challenges • Compute growi

CUDA Accelerated Linpack on Clusters - Nvidia

Create successful ePaper yourself

Delete template?

Save as template?