Intel(R) - Computational and Systems Biology at MIT
10 Intel® Math Kernel Library User's Guide

"threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. There is no point in doing a residual check if the problem ended early. Compiling with -DASYOUGO2 as well sometimes gives a better picture when using -DENDEARLY.

You need to know the specifics of -DENDEARLY:

— -DENDEARLY stops the problem after a few iterations of DGEMM on the blocksize (the bigger the blocksize, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 or so outputs before the problem completes.
— Performance for -DASYOUGO and -DENDEARLY always starts off at one speed, slowly increases, and then slows down toward the end (because that is what LU does). -DENDEARLY is likely to terminate before it starts to slow down.
— -DENDEARLY terminates the problem early with an HPL Error exit. This means that you need to ignore the missing residual results, which are wrong, as the problem never completed. However, you can get an idea of what the initial performance was, and if it looks good, then run the problem to completion without -DENDEARLY. To avoid the error check, you can set HPL's threshold parameter in HPL.dat to a negative number.
— Though -DENDEARLY terminates early, HPL treats the problem as completed and computes the Gflop rating as though the problem ran to completion.
Ignore this erroneously high rating.
— The bigger the problem, the closer the last update that -DENDEARLY returns will be to what happens when the problem runs to completion. -DENDEARLY is a poor approximation for small problems. For this reason, you are advised to use ENDEARLY in conjunction with ASYOUGO2, because ASYOUGO2 reports actual DGEMM performance, which can be a closer approximation to problems just starting.

The best known compile options for Itanium® 2 processor are with the Intel® compiler and look like this:

-O2 -ipo -ipo_obj -ftz -IPF_fltacc -IPF_fma -unroll -w -tpp2

-DASYOUGO2: Gives detailed single-node DGEMM performance information. It captures all DGEMM calls (if you use Fortran BLAS) and records their data. Because of this, the routine has a marginal intrusive overhead. Unlike -DASYOUGO, which is quite non-intrusive, -DASYOUGO2 interrupts every DGEMM call to monitor its performance. You should beware of this overhead, although for big problems it is certainly less than 1/10th of a percent.

Here is a sample ASYOUGO2 output (the first 3 non-intrusive numbers can be found in ASYOUGO and ENDEARLY), so it suffices to describe these numbers here:

Col=001280 Fract=0.050 Mflops=42454.99 (DT= 9.5 DF= 34.1 DMF=38322.78).

The problem size was N=16000 with a blocksize of 128. After 10 blocks, that is, 1280 columns, an output was sent to the screen. Here, the fraction of columns completed is 1280/16000=0.08. Only about 20 outputs are printed, at various places through the