19.08.2015 Views

Using MKL, the Intel Math Kernel Library (v11.0.0.079) - ICHEC


ICHEC TECHNICAL REPORT

Using MKL :: Intel Math Kernel Library

Alin M Elena & Buket B. Gursoy
ICHEC Computational Scientists

2D Helmholtz Equation

$$-\frac{\partial^2 u}{\partial x^2} - \frac{\partial^2 u}{\partial y^2} + qu = f(x, y), \quad q = \text{const} \ge 0$$

General form of the Discrete Fourier Transform

$$z_{k_1,k_2,\ldots,k_d} = \sigma \times \sum_{j_d=0}^{n_d-1} \cdots \sum_{j_2=0}^{n_2-1} \sum_{j_1=0}^{n_1-1} W_{j_1,j_2,\ldots,j_d} \exp\left(\delta i 2\pi \sum_{l=1}^{d} j_l k_l / n_l\right)$$

Intel MKL (Math Kernel Library) provides mathematical routines for scientific and engineering applications which Intel have optimised and customised for performance. The library provides sequential, threaded and distributed versions of routines. The library targets three architectures: IA-32 (32-bit machines), Intel 64 (AMD64/EM64T machines) and IA-64 (Itanium processor family).

Using MKL
• Introduction
• Linking
• Interface layer
• Threading layer
• Computational layer
• Run-time library layer
• Linking model
• Static & Dynamic
• Examples

Introduction

Intel MKL is available on Stokes. To use the last installed version, just load the intel-mkl module. This will configure your environment to use MKL for the Intel 64 architecture, as appropriate for Stokes. At the time of writing the default version is 11.0.0.079.

    $ module load intel-mkl

MKL provides routines in the following areas:
• BLAS
• Sparse BLAS
• LAPACK
• PBLAS
• ScaLAPACK
• Sparse Solver routines
• Vector Mathematical Library Functions
• Statistical Functions
• Fourier Transform Functions
• Partial Differential Equations Support
• Nonlinear Optimisation Problem Solvers
• Support Functions
• BLACS Routines
• Data Fitting Functions

MKL supports C/C++ and Fortran; however, not all the functions are available directly for both C/C++ and Fortran. In these cases mixed-language techniques should be used. Intel MKL offers Fortran 90/95 (and later) interfaces for some routines, namely BLAS and LAPACK.

Linking

Intel MKL employs a layered model for linking, to account more easily for differences in threading, computation and run-time libraries (RTL) on different operating systems. The layers are:
• Interface Layer
• Threading Layer
• Computational Layer
• Run-Time Library Layer


• Interface Layer

This layer provides matching between the compiled code of an application and the threading/computational components of the library. This layer provides:
  • an LP64 interface;
  • access to MKL ILP64;
  • handling of the way in which different compilers return function values;
  • a way to map single-precision names to double-precision names in applications that employ the ILP64 programming model (e.g. Cray-style naming).

Interface Layer Libraries
    libmkl_intel_lp64
    libmkl_intel_ilp64
    libmkl_gf_lp64
    libmkl_gf_ilp64
    libmkl_intel_sp2dp

• Threading Layer

This layer helps the threaded MKL to co-operate with compiler-level threading. It also provides the sequential version of the layer.

Threading Layer Libraries
    libmkl_intel_thread
    libmkl_gnu_thread
    libmkl_pgi_thread
    libmkl_sequential

• Computational Layer

This is the heart of MKL and has only one variant for any processor/operating system family. The computational layer accommodates multiple architectures by identifying the architecture or architectural feature and choosing the appropriate binary code at execution. Intel MKL may be thought of as the large computational layer that is unaffected by different computational environments. As it has no RTL requirements, RTLs refer not to the computational layer but to one of the layers above it: the Interface layer or the Threading layer. The most likely case is matching the threading layer with the RTL layer.

Computational Layer Libraries
    libmkl_avx
    libmkl_core
    libmkl_def
    libmkl_mc/mc3
    libmkl_scalapack_lp64/ilp64
    libmkl_vml_def/p4n/mc/mc2/mc3/avx/cmpt
    libmkl_cdft_core


• Run-Time Library Layer

This layer has run-time library support functions. For example, libiomp and libguide are run-time libraries providing threading support for the OpenMP threading in Intel MKL. In addition to the Intel compiler, it provides support for one more threading compiler on Linux OS (GNU). Note that when using libiomp you should also link against the POSIX threads library by appending -lpthread.

In addition to the libraries provided through the layered model, you have the Fortran 90/95 interfaces and the cluster components. Each of them fits in the computational or RTL layer.

Run-Time Layer Libraries
    libiomp5

Fortran 90/95 Interfaces
    libmkl_lapack95_lp64
    libmkl_blas95_lp64
    libmkl_lapack95_ilp64
    libmkl_blas95_ilp64

Cluster Components
    libmkl_blacs_intelmpi_lp64
    libmkl_blacs_openmpi_lp64
    libmkl_blacs_sgimpt_lp64
    libmkl_scalapack_lp64
    libmkl_blacs_lp64
    libmkl_blacs_intelmpi_ilp64
    libmkl_blacs_openmpi_ilp64
    libmkl_blacs_sgimpt_ilp64
    libmkl_scalapack_ilp64
    libmkl_blacs_ilp64
    libmkl_cdft_core

FFT Interfaces
    libfftw2x_cdft_DOUBLE/SINGLE
    libfftw2xc_intel/intel_sp
    libfftw2xf_intel/intel_sp
    libfftw3x_cdft/cdft_sp
    libfftw3xc_intel/intel_sp
    libfftw3xf_intel/intel_sp

LP64 and ILP64

LP64 stands for long and pointer as 64-bit types, and ILP64 stands for int, long and pointer as 64-bit types. To operate on large data arrays (of more than $2^{31}-1$ elements), you need to select the ILP64 interface, where integers are 64-bit; otherwise, use the default LP64 interface, where integers are 32-bit.
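The LP64/ILP64 rule above comes down to a single comparison against the largest 32-bit signed integer. The following is a minimal Python sketch of that decision, not part of MKL itself: the function names `choose_interface` and `interface_library` are hypothetical helpers introduced for illustration, and only the library names come from the lists above.

```python
# Largest element count that 32-bit (LP64) integers can address: 2^31 - 1.
LP64_MAX_ELEMENTS = 2**31 - 1


def choose_interface(num_elements: int) -> str:
    """Return 'ilp64' if the array needs 64-bit integers, otherwise 'lp64'."""
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    return "ilp64" if num_elements > LP64_MAX_ELEMENTS else "lp64"


def interface_library(num_elements: int, family: str = "intel") -> str:
    """Build an interface-layer library name such as 'libmkl_intel_lp64'.

    `family` is 'intel' or 'gf', matching the interface libraries listed
    in this report (hypothetical helper, for illustration only).
    """
    if family not in ("intel", "gf"):
        raise ValueError("family must be 'intel' or 'gf'")
    return f"libmkl_{family}_{choose_interface(num_elements)}"


if __name__ == "__main__":
    print(interface_library(1_000_000))      # small array: LP64 suffices
    print(interface_library(3_000_000_000))  # more than 2^31 - 1 elements
```

The threshold is exclusive: an array of exactly 2^31 - 1 elements still fits the LP64 interface, and one more element requires ILP64.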


Static/Dynamic Linking

Intel MKL supports both linking models, static and dynamic. Each has its pros and cons.

Static linking resolves all symbolic references at link time. The behaviour of statically built executables is predictable, because there are no run-time dependencies. The main disadvantage is that moving to a new version of the library may be error-prone and time-consuming, because you have to relink the entire application. Moreover, static linking results in large executables and uses memory less efficiently. If several executables are linked with the same library, each of them must load it into memory independently. This matters most for executables whose data sizes are small and comparable with the size of the executable.

Dynamic linking postpones the resolution of some undefined symbolic references until run time. Dynamically built executables contain those symbols along with a list of libraries that provide definitions of the symbols. When the executable is loaded, the final linking is done before the application runs. If several dynamically built executables reference the same library, it is loaded into memory only once and the executables share it, thereby saving memory. Dynamic linking enables you to update the libraries on which applications depend separately, without relinking the applications. The development advantages of dynamic linking are achieved at some cost to performance, because every unresolved symbol has to be looked up in a dedicated table and resolved at run time.

Linking Syntax

Dynamic Case

    -L$MKLPATH -I$MKLINCLUDE
    [-lmkl_blas{95_ilp64|95_lp64}]
    [-lmkl_lapack{95_ilp64|95_lp64}]
    [ ]
    -lmkl_{intel_ilp64|intel_lp64|intel_sp2dp|gf_ilp64|gf_lp64}
    -lmkl_{intel_thread|gnu_thread|pgi_thread|sequential}
    -lmkl_core
    [-liomp5] [-lpthread] [-lm] [-ldl]

Static Case

    -L$MKLPATH -I$MKLINCLUDE
    [$MKLPATH/libmkl_blas{95_ilp64|95_lp64}.a]
    [$MKLPATH/libmkl_lapack{95_ilp64|95_lp64}.a]
    -Wl,--start-group
    [ ]
    $MKLPATH/libmkl_{intel_ilp64|intel_lp64|intel_sp2dp|gf_ilp64|gf_lp64}.a
    $MKLPATH/libmkl_{intel_thread|gnu_thread|pgi_thread|sequential}.a
    $MKLPATH/libmkl_core.a
    -Wl,--end-group
    [-liomp5] [-lpthread] [-lm] [-ldl]

Notes: {a|b|c} means that only one of the libraries should be chosen; [a] means optional. On Stokes and Stoney:

    $MKLPATH=$MKLROOT/lib/intel64
    $MKLINCLUDE=$MKLROOT/include
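To make the syntax above concrete, here is a small Python sketch that assembles a link line from one interface-layer choice and one threading-layer choice, in either the dynamic or the static form. The helper `mkl_link_line` and its parameters are hypothetical and exist only for illustration. It covers the core three layers plus `-liomp5` for `intel_thread`, and leaves out the optional Fortran 95 and cluster components. Run-time libraries for the GNU or PGI threading layers are not covered in this report, so the sketch does not add any.

```python
# Choices taken from the braces in the linking syntax of this report.
INTERFACES = ("intel_ilp64", "intel_lp64", "intel_sp2dp", "gf_ilp64", "gf_lp64")
THREADING = ("intel_thread", "gnu_thread", "pgi_thread", "sequential")


def mkl_link_line(interface="intel_lp64", threading="sequential",
                  static=False, mklpath="$MKLROOT/lib/intel64",
                  extra=("-lpthread",)):
    """Assemble an MKL link line (illustrative sketch, not an official tool)."""
    if interface not in INTERFACES:
        raise ValueError(f"unknown interface layer: {interface}")
    if threading not in THREADING:
        raise ValueError(f"unknown threading layer: {threading}")

    # One library per layer: interface, threading, computational.
    layers = [f"mkl_{interface}", f"mkl_{threading}", "mkl_core"]

    parts = [f"-L{mklpath}"]
    if static:
        # Static archives are grouped so the linker can resolve the
        # circular references between the layers.
        parts.append("-Wl,--start-group")
        parts += [f"{mklpath}/lib{name}.a" for name in layers]
        parts.append("-Wl,--end-group")
    else:
        parts += [f"-l{name}" for name in layers]

    # Run-time library layer: only the Intel OpenMP case is handled here.
    if threading == "intel_thread":
        parts.append("-liomp5")
    parts += list(extra)
    return " ".join(parts)


if __name__ == "__main__":
    print(mkl_link_line())
    print(mkl_link_line(threading="intel_thread", static=True,
                        extra=("-lpthread", "-lm")))
```

The first call prints the dynamic sequential line and the second prints a static, threaded line with `-lm` appended. Keeping the path as the literal string `$MKLROOT/lib/intel64` leaves variable expansion to the shell or makefile that consumes the output.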


Linking examples

One should note that the dummy libraries mkl_intel, mkl_gf and mkl_scalapack just point to the _lp64 libraries. Further details about the libraries contained in MKL can be found in the "Intel MKL User Guide":
http://registrationcenter.intel.com/irc_nas/2690/mkl_userguide_lnx.pdf

On Stokes and/or Stoney you will need to load the following modules: intel-cc and/or intel-fc, intel-mkl, and mvapich2-intel for the distributed version. The $MKLROOT variable is defined when the MKL module is loaded and corresponds to a given version of MKL, e.g. it may be set to: /ichec/packages/intel/composerxe_mkl/2013.0.079/mkl.

Sequential and Threaded:

intel_lp64, sequential

This example will link all the domain-specific functions available in MKL which do not need any extra libraries or interfaces, e.g. BLAS, FFT, VML.

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Static
    -L$MKLROOT/lib/intel64 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_sequential.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread

intel_lp64, intel_threads, iomp5

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

Static
    -L$MKLROOT/lib/intel64 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread

Lapack, intel_lp64, intel_threads, iomp5

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

Static
    -L$MKLROOT/lib/intel64 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm

Lapack Fortran 90/95 interface, intel_lp64, intel_threads, iomp5

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

Static
    -L$MKLROOT/lib/intel64 $MKLROOT/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm


Parallel Direct Sparse Solver, intel_lp64, intel_threads, iomp5

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

Static
    -L$MKLROOT/lib/intel64 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm

cblas, intel_lp64, intel_threads, iomp5

Dynamic
    -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm

Static
    -L$MKLROOT/lib/intel64 -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm

To use the single-precision fftw interfaces, add the suffix _sp to the fftw interface name. Do not forget to add the preprocessor option -DFFTW_ENABLE_FLOAT at compile time and to include the headers from $MKLROOT/include/fftw.

fftw2xf interface, intel_lp64, intel_threads, iomp5

    -L$MKLROOT/lib/intel64 -I$MKLROOT/include/fftw $MKLROOT/lib/intel64/libfftw2xf_intel.a -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm

fftw2xc interface, intel_lp64, intel_threads, iomp5

    -L$MKLROOT/lib/intel64 -I$MKLROOT/include/fftw $MKLROOT/lib/intel64/libfftw2xc_intel.a -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm

fftw3xf interface, intel_lp64, intel_threads, iomp5

    -L$MKLROOT/lib/intel64 -I$MKLROOT/include/fftw $MKLROOT/lib/intel64/libfftw3xf_intel.a -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm
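The single-precision note above changes two things at once: the wrapper library gains the `_sp` suffix and the compile line gains `-DFFTW_ENABLE_FLOAT`. The Python sketch below simply encodes that note. The helper name `fftw_interface_flags` is hypothetical, and whether the preprocessor define is required for a particular interface should be checked against the Intel MKL User Guide. The flags returned are only the FFTW-specific part and still need the MKL layer libraries shown in the examples.

```python
# FFTW wrapper interfaces used in the examples and library lists of this report.
FFTW_INTERFACES = ("fftw2xc", "fftw2xf", "fftw3xc", "fftw3xf")


def fftw_interface_flags(interface="fftw3xf", single_precision=False,
                         mklroot="$MKLROOT"):
    """Return the FFTW-specific compile/link flags as a list of strings.

    Illustrative sketch only: it follows the note in this report that the
    single-precision wrappers carry an '_sp' suffix and need
    -DFFTW_ENABLE_FLOAT at compile time.
    """
    if interface not in FFTW_INTERFACES:
        raise ValueError(f"unknown FFTW interface: {interface}")

    suffix = "_sp" if single_precision else ""
    flags = [f"-I{mklroot}/include/fftw"]  # FFTW wrapper headers
    if single_precision:
        flags.append("-DFFTW_ENABLE_FLOAT")
    # Static wrapper archive, e.g. libfftw2xc_intel.a or libfftw2xc_intel_sp.a
    flags.append(f"{mklroot}/lib/intel64/lib{interface}_intel{suffix}.a")
    return flags


if __name__ == "__main__":
    print(" ".join(fftw_interface_flags("fftw2xc", single_precision=True)))
```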
