13.07.2015 Views

Intel(R) - Computational and Systems Biology at MIT


LINPACK and MP LINPACK Benchmarks

matrix decomposition: fractions 0.005, 0.010, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045, 0.05, 0.055, 0.06, 0.065, 0.07, 0.075, 0.080, 0.085, 0.09, 0.095, .10, ..., .195, .295, .395, ..., .895. However, this problem size is so small and the block size so big by comparison that as soon as it printed the value for 0.045, it was already through 0.08 fraction of the columns. On a really big problem, the fractional number will be more accurate. It never prints more than the 46 numbers above. So, smaller problems will have fewer than 46 updates, and the biggest problems will have precisely 46 updates.

The Mflops is an estimate based on 1280 columns of LU being completed. However, with lookahead steps, sometimes that work is not actually completed when the output is made. Nevertheless, this is a good estimate for comparing identical runs.

The 3 numbers in parentheses are intrusive ASYOUGO2 add-ins. DT is the total time processor 0 has spent in DGEMM. DF is the number of billion operations that have been performed in DGEMM by one processor. Hence, the performance of processor 0 (in Gflops) in DGEMM is always DF/DT.
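The two pieces of arithmetic above can be checked with a short sketch. It assumes that the elided part of the fraction list continues in steps of 0.005 up to 0.195 and then in steps of 0.1 from 0.295 to 0.895, which is consistent with the stated total of 46 values, and that DT is measured in seconds. The function names and the sample DF and DT values are hypothetical and are not part of the benchmark.

```python
def asyougo_fractions():
    """Return the progress fractions listed in the text, as floats."""
    # Work in thousandths so that the list is built from exact integers.
    fine = range(5, 200, 5)         # 0.005, 0.010, ..., 0.195 (assumed step of 0.005)
    coarse = range(295, 900, 100)   # 0.295, 0.395, ..., 0.895
    return [t / 1000 for t in list(fine) + list(coarse)]


def dgemm_gflops(df, dt):
    """Gflops of processor 0 in DGEMM: DF (billions of operations) over DT (assumed seconds)."""
    return df / dt


fractions = asyougo_fractions()
print(len(fractions))            # 46, the maximum number of updates
print(dgemm_gflops(34.1, 9.5))   # hypothetical DF and DT values
```

A small run may skip some of these fractions entirely, so the list gives an upper bound on the number of updates rather than the exact set printed.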
Using the number of DGEMM flops as a basis instead of the number of LU flops, you get a lower bound on the performance of the run by looking at DMF, which can be compared to Mflops above. (It uses the global LU time, but the DGEMM flops are computed under the assumption that the problem is evenly distributed amongst the nodes, as only HPL's node (0,0) returns any output.)

Note that when using the above performance monitoring tools to compare different HPL.dat inputs, you should be aware that the pattern of performance drop-off that LU experiences is sensitive to some of the inputs. For instance, when you try very small problems, the performance drop-off from the initial values to end values is very rapid. The larger the problem, the less the drop-off, and it is probably safe to use the first few performance values to estimate the difference between a problem size 700000 and 701000, for instance. Another factor that influences the performance drop-off is the grid dimensions (P and Q). For big problems, the performance tends to fall off less from the first few steps when P and Q are roughly equal in value. You can make use of a large number of parameters, such as broadcast types, and change them so that the final performance is determined very closely by the first few steps.

Using these tools will greatly increase the amount of data you can test.
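As an illustration of the remark that P and Q should be roughly equal, the following minimal sketch picks the most nearly square grid for a given process count. The helper `squarest_grid` is hypothetical and is not part of HPL; the P <= Q ordering is an assumption made here only to give a single answer.

```python
import math


def squarest_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and Q - P as small as possible."""
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    # Start at the integer square root and walk down to the nearest divisor.
    p = math.isqrt(nprocs)
    while nprocs % p != 0:
        p -= 1
    return p, nprocs // p


print(squarest_grid(16))   # (4, 4)
print(squarest_grid(12))   # (3, 4)
```

A prime process count yields a 1 x N grid, which is the least square shape possible, so the total process count itself may be worth reconsidering in that case.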
