13.07.2015 Views

Intel(R) - Computational and Systems Biology at MIT



LINPACK and MP LINPACK Benchmarks 10

- Save time by compiling with -DENDEARLY -DASYOUGO2 (described in the Options to reduce search time section) and using a negative threshold. (Do not use a negative threshold on the final run that you intend to submit if you are doing a Top500 entry!) You can set the threshold in line 13 of the HPL 1.0a input file HPL.dat.
- If you are going to run a problem to completion, do it with -DASYOUGO (see the Options to reduce search time section).

5. Using the quick performance feedback, return to step 3 and iterate until you are sure that the performance is as good as possible.

Options to reduce search time

Running huge problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also huge: not only can you run any size problem, but you can also vary block sizes, grid layouts, lookahead steps, factorization methods, etc. It can be a large waste of time to run a huge problem to completion only to discover it ran 0.01% slower than your previous best problem.

There are 3 options you might want to experiment with to reduce the search time:

• -DASYOUGO
• -DENDEARLY
• -DASYOUGO2

Use -DASYOUGO2 cautiously, as it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. This will give lots of useful DGEMM performance information at the cost of around 0.2% performance loss.

If you want the old HPL back, simply don't define these options and recompile from scratch (try "make arch= clean_arch_all").

-DASYOUGO: Gives performance data as the run proceeds. The performance always starts off higher and then drops, because this is what actually happens in LU decomposition. The ASYOUGO performance estimate is usually an overestimate (because LU slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs, and this is always an overestimate as compared to ASYOUGO2, which measures actually achieved DGEMM performance. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides, so refer to the description of the -DASYOUGO2 option below for the details of the output.

-DENDEARLY: Terminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then only run the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. Because the problem terminates early, it is recommended setting the
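
The negative threshold mentioned earlier for search runs (line 13 of the HPL 1.0a input file HPL.dat) can be set by hand or with a small script when many runs are queued. The following is a minimal Python sketch, not part of HPL or Intel MKL: the function name set_hpl_threshold, its line_number parameter, and the stand-in input in the usage lines are all assumptions made for illustration. It replaces only the leading value on the chosen line and keeps the rest of the file unchanged.

```python
def set_hpl_threshold(hpl_dat_text, threshold, line_number=13):
    """Return HPL.dat text with the leading value on `line_number` replaced.

    `line_number` defaults to 13, where the text above places the threshold
    for HPL 1.0a. This helper is a hypothetical illustration; pass another
    line number if your input file is laid out differently.
    """
    lines = hpl_dat_text.splitlines()
    if len(lines) < line_number:
        raise ValueError(
            f"expected at least {line_number} lines, got {len(lines)}"
        )
    fields = lines[line_number - 1].split(None, 1)
    if not fields:
        raise ValueError(f"line {line_number} is empty")
    # Keep whatever trailing label the line carried; only the value changes.
    label = fields[1] if len(fields) > 1 else ""
    lines[line_number - 1] = f"{str(threshold).ljust(12)} {label}".rstrip()
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in input (an assumption, not a real HPL.dat): 12 filler lines
    # followed by a threshold line.
    sample = "\n".join(
        [f"filler line {i}" for i in range(1, 13)] + ["16.0         threshold"]
    )
    print(set_hpl_threshold(sample, -1.0).splitlines()[12])
```

As noted above, a negative threshold is only for search runs; restore a positive value before the final run that you intend to submit as a Top500 entry.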
