13.07.2015

Intel(R) - Computational and Systems Biology at MIT



Intel® Math Kernel Library User's Guide

Operating on Denormals

If an Intel MKL function operates on denormals, that is, non-zero numbers that are smaller in magnitude than the smallest normalized number supported by a given floating-point format, or produces denormals during the computation (for instance, if the incoming data is too close to the underflow threshold), you may experience a considerable performance drop. The CPU state may be set so that floating-point operations on denormals invoke the exception handler, which slows down the application.

To resolve the issue, before compiling the main program, turn on the -ftz option if you are using the Intel® compiler or any other compiler that can control this feature. In this case, denormals are treated as zeros at the processor level and the exception handler is not invoked. Note, however, that setting this option slightly impacts the accuracy.

Another way to restore performance is to scale the input data properly so that it avoids numbers near the underflow threshold.

FFT Optimized Radices

You can improve the performance of Intel MKL FFT if the length of the data vector permits factorization into powers of optimized radices.

In Intel MKL, the list of optimized radices depends upon the architecture:

• 2, 3, 4, 5 for IA-32 architecture
• 2, 3, 4, 5 for Intel® 64 architecture
• 2, 3, 4, 5, 7, 11 for IA-64 architecture.

Using Intel® MKL Memory Management

Intel MKL has memory management software that controls memory buffers for use by the library functions. New buffers that the library allocates when certain functions (Level 3 BLAS or FFT) are called are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the MKL_MemStat() function. If at some point your program needs to free memory, it may do so with a call to MKL_FreeBuffers(). If another call is made to a library function that needs a memory buffer, then the memory manager will again allocate the buffers, and they will again remain allocated until either the program ends or the program deallocates the memory.

This behavior facilitates better performance. However, some tools may report the behavior as a memory leak. You can release memory in your program through the use of a function made available in Intel MKL, or you can force memory to be released after each call by setting an environment variable.
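Returning to the advice under Operating on Denormals: before rescaling input data, it can help to find out whether the data contains denormals at all. The following is a minimal sketch of such a check in C. The function name count_denormals is hypothetical and is not part of Intel MKL; the sketch relies only on the standard fpclassify macro from <math.h>.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel MKL function): returns how many
 * entries of x[0..n-1] are denormal (subnormal) doubles, that is,
 * non-zero values smaller in magnitude than the smallest normalized
 * double. */
size_t count_denormals(const double *x, size_t n)
{
    assert(x != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* fpclassify reports FP_SUBNORMAL only for denormal values;
         * zeros and normalized numbers fall into other classes. */
        if (fpclassify(x[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

A non-zero result suggests that the data sits close to the underflow threshold, so scaling the input or enabling flush-to-zero may be worth trying.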
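The condition described under FFT Optimized Radices, namely that the vector length factors into powers of the optimized radices, can be tested before choosing a transform length. The sketch below shows one way to do this in C. The function name factors_into_radices is hypothetical and is not part of Intel MKL; the radix lists passed to it are assumed to be copied from the list for your architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an Intel MKL function): returns true if n
 * can be written as a product of powers of the given radices.
 * Radix 4 is 2 squared, so it is already covered once 2 has been
 * divided out; listing it is harmless. */
bool factors_into_radices(size_t n, const unsigned *radices, size_t count)
{
    assert(radices != NULL || count == 0);

    if (n == 0) {
        return false;               /* zero is not a valid length */
    }
    for (size_t i = 0; i < count; ++i) {
        if (radices[i] < 2) {
            continue;               /* skip values that cannot divide n down */
        }
        while (n % radices[i] == 0) {
            n /= radices[i];        /* strip every power of this radix */
        }
    }
    return n == 1;                  /* nothing left: fully factorized */
}
```

For example, with the radices 2, 3, 4, 5 a length of 1000 (2^3 · 5^3) passes the check, while 1001 (7 · 11 · 13) does not.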
