CUBLAS Library

More documents

Recommendations

Info

CUDACUBLAS LibraryThe code in fortran.c has been used to demonstrate interoperabilitywith the compilers g77 3.2.3 on 32‐bit Linux, g77 3.4.5 on 64‐bit Linux,and Intel Fortran 9.0 on 32‐bit and 64‐bit Microsoft Windows XP. Notethat for g77, use of the compiler flag -fno-second-underscore isrequired to use these wrappers as provided. Also, the use of thedefault calling conventions with regard to argument and return valuepassing is expected. Using the flag -fno-f2c changes the defaultcalling convention with respect to these two items.Two kinds of wrapper functions are provided. The thunking wrappersallow interfacing to existing Fortran applications without any changesto the applications. During each call, the wrappers allocate GPUmemory, copy source data from CPU memory space to GPU memoryspace, call CUBLAS, and finally copy back the results to CPU memoryspace and deallocate the GPU memory. As this process causes verysignificant call overhead, these wrappers are intended for light testing,not for production code. By default, non‐thunking wrappers are usedfor production code. To enable the thunking wrappers, symbolCUBLAS_USE_THUNKING must be defined for the compilation offortran.c.The non‐thunking wrappers, intended for production code, substitutedevice pointers for vector and matrix arguments in all BLAS functions.To use these interfaces, existing applications need to be modifiedslightly to allocate and deallocate data structures in GPU memoryspace (using CUBLAS_ALLOC and CUBLAS_FREE) and to copy databetween GPU and CPU memory spaces (using CUBLAS_SET_VECTOR,CUBLAS_GET_VECTOR, CUBLAS_SET_MATRIX, andCUBLAS_GET_MATRIX). The sample wrappers provided in fortran.cmap device pointers to 32‐bit integers on the Fortran side, regardlessof whether the host platform is a 32‐bit or 64‐bit platform.One approach to deal with index arithmetic on device pointers inFortran code is to use C‐style macros, and use the C preprocessor toexpand these, as shown in the example below. On Linux, one way ofpre‐processing is to invoke 'g77 -E -x f77-cpp-input'. On Windowsplatforms with Microsoft Visual C/C++, using 'cl -EP' achievessimilar results.When traditional fixed‐form Fortran 77 code is ported to CUBLAS, linelength often increases when the BLAS calls are exchanged for CUBLAS76 PG-00000-002_V1.1NVIDIA
CHAPTER ACUBLAS Fortran Bindingscalls. Longer function names and possible macro expansion arecontributing factors. Inadvertently exceeding the maximum linelength can lead to run‐time errors that are difficult to find, so careshould be taken not to exceed the 72‐column limit if fixed form isretained.The following examples show a small application implemented inFortran 77 on the host, and show the same application using the nonthunkingwrappers after it has been ported to use CUBLAS.Example A.1. Fortran 77 Application Executing on the Hostsubroutine modify (m, ldm, n, p, q, alpha, beta)implicit noneinteger ldm, n, p, qreal*4 m(ldm,*), alpha, betaexternal sscalcall sscal (n-p+1, alpha, m(p,q), ldm)call sscal (ldm-p+1, beta, m(p,q), 1)returnendprogram matrixmodimplicit noneinteger M, Nparameter (M=6, N=5)real*4 a(M,N)integer i, jdo j = 1, Ndo i = 1, Ma(i,j) = (i-1) * M + jenddoenddocall modify (a, M, N, 2, 3, 16.0, 12.0)do j = 1, Ndo i = 1, Mwrite(*,"(F7.0$)") a(i,j)enddowrite (*,*) ""PG-00000-002_V1.1 77NVIDIA
Page 1 and 2:
CUDACUBLAS LibraryPG-00000-002_V1.1
Page 3 and 4:
Table of ContentsThe CUBLAS Library
Page 5 and 6:
C HAPTER1The CUBLAS LibraryCUBLAS i
Page 7 and 8:
CHAPTER 1The CUBLAS LibraryExample
Page 9 and 10:
CHAPTER 1The CUBLAS LibraryExample
Page 11 and 12:
CHAPTER 1The CUBLAS LibraryCUBLAS T
Page 13:
CHAPTER 1The CUBLAS Librarycreates
Page 16 and 17:
C HAPTER2BLAS1 FunctionsLevel 1 Bas
Page 18 and 19:
CUDACUBLAS LibraryError status for
Page 20 and 21:
CUDACUBLAS Librarywherelx = 1 if in
Page 22 and 23:
CUDACUBLAS LibraryInput (continued)
Page 24 and 25:
Page 26 and 27:
CUDACUBLAS LibraryOutputx rotated v
Page 28 and 29:
Page 30 and 31: CUDACUBLAS Libraryly is defined in
Page 32 and 33: CUDACUBLAS LibraryOutputreturns sin
Page 34 and 35: CUDACUBLAS LibraryError status for
Page 36 and 37: CUDACUBLAS LibraryThe elements of x
Page 38 and 39: CUDACUBLAS LibraryOutputxycontains-
Page 40 and 41: CUDACUBLAS LibraryInputnxincxnumber
Page 42 and 43: C HAPTER3BLAS2 and BLAS3 FunctionsT
Page 44 and 45: CUDACUBLAS Libraryalpha and beta ar
Page 46 and 47: CUDACUBLAS LibraryInput (continued)
Page 52 and 53: CUDACUBLAS LibraryOutputA updated a
Page 54 and 55: CUDACUBLAS Libraryprecision element
Page 56 and 57: CUDACUBLAS LibraryFunction cublasSt
Page 62 and 63: CUDACUBLAS Libraryx is an n‐eleme
Page 64 and 65: CUDACUBLAS LibraryOutputx updated t
Page 74 and 75: CUDACUBLAS LibraryReference: http:/
Page 76 and 77: CUDACUBLAS LibrarySingle-precision
Page 78 and 79: CUDACUBLAS Library74 PG-00000-002_V
Page 82 and 83: CUDACUBLAS LibraryExample A.1. Fort
Page 84: CUDACUBLAS Library80 PG-00000-002_V
show all

CUBLAS Library

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?