9 Intel® Math Kernel Library User’s Guide

[…] is one of -lmkl_blacs, -lmkl_blacs_intelmpi, -lmkl_blacs_intelmpi20, -lmkl_blacs_openmpi
[…] is -lmkl_scalapack_core and/or -lmkl_cdft_core
[…] is […] for ScaLAPACK, and […] for Cluster FFTs.
[…] are LAPACK, processor optimized kernels, threading library, and system library for threading support linked as described at the beginning of section Link Command Syntax in Chapter 5.

For example, if you are using Intel MPI 3.x, wish to statically use the LP64 interface with ScaLAPACK, and want only one MPI process per core (and thus do not employ threading), provide the following linker options:

-L$MKLPATH -I$MKLINCLUDE -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a
$MKLPATH/libmkl_scalapack_lp64 $MKLPATH/libmkl_blacs_intelmpi20_lp64
$MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -static_mpi
-Wl,--end-group -lpthread -lm

For more examples, see Examples for Linking with ScaLAPACK and Cluster FFT.

Note that […] and […] library should correspond to the MPI version. For instance, if it is Intel MPI 2.x, then […] and libmkl_blacs_intelmpi20 libraries are used. To link with Intel MPI 3.0 or 3.1, libmkl_blacs_intelmpi20 should also be used.

For information on linking with Intel® MKL libraries, see Chapter 5 Linking Your Application with Intel® Math Kernel Library.

Setting the Number of Threads

The OpenMP* software responds to the environment variable OMP_NUM_THREADS. Intel® MKL 10.0 has also introduced other mechanisms to set the number of threads, such as MKL_NUM_THREADS or MKL_DOMAIN_NUM_THREADS (see section “Using Additional Threading Control” in chapter 6).
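As a minimal sketch of how a value for these variables might be chosen, assuming a Bourne-compatible shell: divide the cores on a node among the MPI ranks placed on it, so that ranks times threads does not exceed the core count. The helper name threads_per_rank and the example counts (8 cores, 4 ranks) are illustrative assumptions, not part of Intel MKL.

```shell
# Hypothetical helper (not part of Intel MKL): number of threads each MPI
# rank can use so that ranks * threads stays within the cores on a node.
# Arguments: cores per node, MPI ranks per node (both assumed >= 1).
threads_per_rank() {
    cores_per_node=$1
    ranks_per_node=$2
    n=$((cores_per_node / ranks_per_node))   # integer division
    if [ "$n" -lt 1 ]; then
        n=1                                  # never request fewer than one thread
    fi
    echo "$n"
}

# Example values (assumptions): 8 cores per node, 4 MPI ranks per node.
OMP_NUM_THREADS=$(threads_per_rank 8 4)
export OMP_NUM_THREADS

# MKL_NUM_THREADS is the Intel MKL variable named above; here it is simply
# given the same value.
MKL_NUM_THREADS=$OMP_NUM_THREADS
export MKL_NUM_THREADS
```

With one rank per core, the helper yields 1, which corresponds to running without threading.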
Make certain that the relevant environment variable has the same, correct value on all the nodes. Intel MKL 10.0 also no longer sets the default number of threads to 1, but depends on the compiler to set the default number. For the threading layer based on the Intel® compiler (libmkl_intel_thread.a), this value is the number of CPUs according to the OS. Be careful not to over-prescribe the number of threads, which may occur, for instance, when the number of MPI ranks per node and the number of threads per node are both greater than one.

The best way to set an environment variable such as OMP_NUM_THREADS is in the login environment. Remember that mpirun starts a fresh default shell on all of the nodes, so changing this value on the head node and then doing the run (which works on an SMP system) will not effectively change the variable as far as your program is concerned. In .bashrc, you could add a line at the top, which looks like this:
OMP_NUM_THREADS=1; export OMP_NUM_THREADS

It is possible to run multiple CPUs per node using MPICH, but the MPICH must be built to allow it. Be aware that certain MPICH applications may not work perfectly in a threaded environment (see the Known Limitations section in the Release Notes). The safest approach for multiple CPUs, although not necessarily the fastest, is to run one MPI process per processor with OMP_NUM_THREADS set to one. Always verify that the combination with OMP_NUM_THREADS=1 works correctly.

Using Shared Libraries

All needed shared libraries must be visible on all the nodes at run time. One way to accomplish this is to point to these libraries with the LD_LIBRARY_PATH environment variable in the .bashrc file. If Intel MKL is installed only on one node, you should link statically when building your Intel MKL applications.

The Intel® compilers or GNU compilers can be used to compile a program that uses Intel MKL.
However, make certain that the MPI implementation and the compiler match up correctly.

ScaLAPACK Tests

To build NetLib ScaLAPACK tests for IA-32, IA-64, or Intel® 64 architectures, add libmkl_scalapack_core.a to your link command.

Examples for Linking with ScaLAPACK and Cluster FFT

For information on the detailed structure of the architecture-specific directories of the cluster libraries, see section Directory Structure in Detail in Chapter 3.

Examples for C Module

Suppose the following conditions are met:
• MPICH 1.2.5 or higher is installed in /opt/mpich.
• Intel® MKL 10.0 is installed in /opt/intel/mkl/10.0.xxx, where xxx is the Intel MKL package number, for example, /opt/intel/mkl/10.0.039.
• You use the Intel® C Compiler 8.1 or higher and the main module is in C.