13.07.2015 Views

Intel(R) Math Kernel Library for Linux* OS User's Guide



LINPACK and MP LINPACK Benchmarks

Options to Reduce Search Time

Running huge problems to completion on large numbers of nodes can take many hours. The search space for MP LINPACK is also huge: not only can you run a problem of any size, but you can also vary the block size, grid layout, lookahead steps, factorization method, and so on. It can be a large waste of time to run a huge problem to completion only to discover that it ran 0.01% slower than your previous best problem.

There are three options to reduce the search time:

• -DASYOUGO
• -DENDEARLY
• -DASYOUGO2

Use -DASYOUGO2 cautiously because it does have a marginal performance impact. To see DGEMM internal performance, compile with -DASYOUGO2 and -DASYOUGO2_DISPLAY. These options provide a lot of useful DGEMM performance information at the cost of around 0.2% performance loss.

If you want to use the old HPL, simply omit these options and recompile from scratch. To do this, try "make arch= clean_arch_all".

-DASYOUGO: Gives performance data as the run proceeds. The performance always starts off higher and then drops because this actually happens in LU decomposition¹. The ASYOUGO performance estimate is usually an overestimate (because the LU decomposition slows down as it goes), but it gets more accurate as the problem proceeds. The greater the lookahead step, the less accurate the first number may be. ASYOUGO tries to estimate where one is in the LU decomposition that MP LINPACK performs, and this is always an overestimate compared to ASYOUGO2, which measures the DGEMM performance actually achieved. Note that the ASYOUGO output is a subset of the information that ASYOUGO2 provides. So, refer to the description of the -DASYOUGO2 option below for the details of the output.

-DENDEARLY: Terminates the problem after a few steps, so that you can set up 10 or 20 HPL runs without monitoring them, see how they all do, and then run only the fastest ones to completion. -DENDEARLY assumes -DASYOUGO. You do not need to define both, although it doesn't hurt. To avoid the residual check for a problem that terminates early, set the "threshold" parameter in HPL.dat to a negative number when testing ENDEARLY. Compiling with -DASYOUGO2 when using -DENDEARLY also sometimes gives a better picture.

Usage notes on -DENDEARLY follow:

• -DENDEARLY stops the problem after a few iterations of DGEMM on the blocksize (the bigger the blocksize, the further it gets). It prints only 5 or 6 "updates", whereas -DASYOUGO prints about 46 or so output elements before the problem completes.

1. A decomposition of a matrix into a product of lower (L) and upper (U) triangular matrices.
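The -DASYOUGO description says the option estimates where the run is in the LU decomposition. As a rough illustration of why position and completed work differ, the sketch below uses the standard leading-order operation count for LU (about (2/3)n^3 flops for an order-n matrix). The function name lu_fraction_done is hypothetical, and this is not taken from the ASYOUGO source; it only shows how much of the work sits in the early columns.

```python
def lu_fraction_done(n, k):
    """Approximate fraction of LU flops completed after k of n columns.

    Assumption: total work is ~(2/3)*n**3 flops and the trailing
    submatrix of order (n - k) still needs ~(2/3)*(n - k)**3 flops.
    This is an illustrative model, not the ASYOUGO estimator itself.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    remaining = (1.0 - k / n) ** 3  # share of flops still to do
    return 1.0 - remaining


if __name__ == "__main__":
    # Print the modeled work share at a few points in the factorization.
    for k in (0, 100, 500, 1000):
        print(k, round(lu_fraction_done(1000, k), 3))
```

Under this model, a run that stops after the first tenth of the columns has already done a bit more than a quarter of the total flops, which is one reason an early-terminated run can still be informative.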
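The advice to set the "threshold" parameter in HPL.dat to a negative number for ENDEARLY tests can be scripted when preparing many runs. The following is a minimal sketch that assumes the usual HPL.dat layout, in which the value comes first on the line and the word "threshold" follows it (for example "16.0         threshold"). The helper name set_negative_threshold is hypothetical, and the layout should be checked against your own HPL.dat.

```python
def set_negative_threshold(hpl_dat_text, value=-1.0):
    """Return HPL.dat text with the residual-check threshold set to `value`.

    Assumption: the threshold line looks like "<number>   threshold",
    i.e. the value is the first field and the label is the second.
    """
    if value >= 0:
        raise ValueError("value must be negative to skip the residual check")
    out = []
    replaced = False
    for line in hpl_dat_text.splitlines():
        fields = line.split()
        if not replaced and len(fields) >= 2 and fields[1].lower() == "threshold":
            # Rewrite only the value; keep the label so HPL.dat stays readable.
            line = str(value).ljust(13) + "threshold"
            replaced = True
        out.append(line)
    if not replaced:
        raise ValueError("no 'threshold' line found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "HPLinpack benchmark input file\n16.0         threshold\n"
    print(set_negative_threshold(sample), end="")
```

The function works on text rather than on a file path so that it can be applied to a copy of HPL.dat for each queued run; remember to restore a positive threshold before running the selected problems to completion.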
