with CUDA Fortran
Instruction-Level Parallelism with CUF Kernels<br />
• Applies when the product of the specified grid and block sizes is smaller than<br />
the loop extent in that dimension<br />
!$cuf kernel do (2) <<< (1,*), (256,1) >>><br />
do j = 1, ny<br />
do i = 1, nx<br />
c_d(i,j) = a_d(i,j) + b_d(i,j)<br />
enddo<br />
enddo<br />
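• If nx==1024 and the launch configuration supplies 256 threads in the i dimension (for example, one block of 256 threads), each thread calculates 4 elements of c_d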
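
A minimal, self-contained sketch of the loop above may help show the idea in context. It assumes a launch configuration of one block of 256 threads in the i dimension, written `<<< (1,*), (256,1) >>>`, so that 256 threads cover nx = 1024 iterations. The program name, the value of ny, the host arrays a, b, c, and the fill values are illustrative assumptions, not taken from the slide.

```fortran
program cuf_ilp_sketch
  use cudafor
  implicit none
  ! nx matches the slide; ny is an arbitrary illustrative value
  integer, parameter :: nx = 1024, ny = 512
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  real, device, allocatable :: a_d(:,:), b_d(:,:), c_d(:,:)
  integer :: i, j

  allocate(a(nx,ny), b(nx,ny), c(nx,ny))
  allocate(a_d(nx,ny), b_d(nx,ny), c_d(nx,ny))

  a = 1.0
  b = 2.0
  a_d = a          ! host-to-device copies
  b_d = b

  ! Assumed launch configuration: 1 block x 256 threads spans the i loop,
  ! so with nx = 1024 each thread handles 1024/256 = 4 values of i.
  ! The "*" lets the compiler choose the number of blocks for the j loop.
  !$cuf kernel do (2) <<< (1,*), (256,1) >>>
  do j = 1, ny
     do i = 1, nx
        c_d(i,j) = a_d(i,j) + b_d(i,j)
     enddo
  enddo

  c = c_d          ! device-to-host copy, completes after the kernel
  print *, 'max error: ', maxval(abs(c - 3.0))

  deallocate(a, b, c, a_d, b_d, c_d)
end program cuf_ilp_sketch
```

Saved with a `.cuf` extension, this should build with a CUDA Fortran compiler such as `nvfortran`. Changing the block size to (1024,1) would give one thread per element in i, which removes the per-thread reuse the slide describes.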