13.07.2015 Views

Intel(R) - Computational and Systems Biology at MIT



10 Intel® Math Kernel Library User's Guide

"threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. There is no point in doing a residual check if the problem ended early. Compiling with -DASYOUGO2 when using -DENDEARLY also sometimes gives a better picture.

You need to know the specifics of -DENDEARLY:

- -DENDEARLY stops the problem after a few iterations of DGEMM on the blocksize (the bigger the blocksize, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 outputs before the problem completes.
- Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end (because that is what LU does). -DENDEARLY is likely to terminate before the slowdown begins.
- -DENDEARLY terminates the problem early with an HPL Error exit. This means that you need to ignore the residual results, which are missing or wrong because the problem never completed. However, you can get an idea of what the initial performance was, and if it looks good, run the problem to completion without -DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number.
- Though -DENDEARLY terminates early, HPL treats the problem as completed and computes the Gflop rating as though the problem ran to completion. Ignore this erroneously high rating.
- The bigger the problem, the closer the last update that -DENDEARLY returns is to what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. For this reason, it is recommended to use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation for problems that are just starting.

The best known compile options for the Itanium® 2 processor are with the Intel® compiler and look like this:

    -O2 -ipo -ipo_obj -ftz -IPF_fltacc -IPF_fma -unroll -w -tpp2

-DASYOUGO2: Gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. Be aware of this overhead, although for big problems it is less than 1/10th of a percent.

Here is a sample ASYOUGO2 output (the first 3 non-intrusive numbers can be found in ASYOUGO and ENDEARLY), so it suffices to describe these numbers here:

    Col=001280 Fract=0.050 Mflops=42454.99 (DT= 9.5 DF= 34.1 DMF=38322.78).

The problem size was N=16000 with a blocksize of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. Here, the fraction of columns completed is 1280/16000=0.08. Only about 20 outputs are printed, at various places through the
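When collecting many such progress lines from a run, it can help to parse them mechanically. The following is a minimal Python sketch, not part of HPL or Intel MKL. The function names `parse_asyougo2_line` and `columns_fraction` are hypothetical, and the regular expression assumes that every progress line has exactly the layout of the single sample shown above. The sketch makes no assumption about what the DT, DF, and DMF fields mean; it only extracts them as numbers and reproduces the 10-block and 1280/16000 arithmetic from the text.

```python
import re

# Pattern derived only from the one sample line in the text (an assumption:
# real output may vary in spacing or field order).
LINE_RE = re.compile(
    r"Col=(?P<col>\d+)\s+Fract=(?P<fract>[\d.]+)\s+Mflops=(?P<mflops>[\d.]+)"
    r"\s*\(\s*DT=\s*(?P<dt>[\d.]+)\s+DF=\s*(?P<df>[\d.]+)\s*DMF=\s*(?P<dmf>[\d.]+)\s*\)"
)


def parse_asyougo2_line(line):
    """Return the numeric fields of one progress line, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    record = {name: float(value) for name, value in match.groupdict().items()}
    record["col"] = int(match.group("col"))  # column count is an integer
    return record


def columns_fraction(col, n):
    """Fraction of the N matrix columns that have been processed."""
    return col / n


# Values taken from the sample in the text: N=16000, blocksize 128.
N = 16000
BLOCKSIZE = 128
sample = "Col=001280 Fract=0.050 Mflops=42454.99 (DT= 9.5 DF= 34.1 DMF=38322.78)."

record = parse_asyougo2_line(sample)
blocks_done = record["col"] // BLOCKSIZE
print("columns:", record["col"])
print("blocks completed:", blocks_done)
print("columns / N:", columns_fraction(record["col"], N))
print("reported Mflops:", record["mflops"])
```

Because -DENDEARLY runs end with an HPL Error exit, a script like this would typically read whatever progress lines were printed before the exit and ignore the final Gflop rating.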
