Magellan Final Report - Office of Science - U.S. Department of Energy


As message size increases, overhead related to latency and connection instantiation becomes less significant. The results shown in Figure 9.12 demonstrate that while the raw hardware cluster has neared its practical maximum bandwidth at 16 KB, the virtual cluster is still 50 kbps below its average at 2 MB. Of particular note, variability in the results on the virtual cluster was extremely high; with a 2 MB message size, a maximum bandwidth of 181.74 kbps and a minimum of 32.24 kbps were reported.
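The spread in the virtual cluster's 2 MB results can be quantified directly from the values reported above; a quick calculation shows the best observed run outperformed the worst by more than a factor of five:

```python
# Reported virtual-cluster bandwidth at 2 MB message size (kbps),
# taken from the measurements quoted in the text.
max_bw_kbps = 181.74
min_bw_kbps = 32.24

# Ratio of best to worst observed run: a rough measure of
# run-to-run variability on the virtual cluster.
spread = max_bw_kbps / min_bw_kbps
print(f"max/min bandwidth ratio: {spread:.1f}x")
```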

The difference between clusters in zero-length message bandwidth was also of concern. While an average of 329,779 messages passed between ranks every second on the raw hardware cluster, the virtual cluster could only manage an average of 21,793 per second. This is a strong indictment of the virtual network model as it pertains to MPI workload performance.
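The zero-length message rates quoted above correspond to roughly a fifteenfold slowdown on the virtual cluster, as a minimal check using the reported averages confirms:

```python
# Average zero-length message rates reported in the text (messages/second).
raw_rate = 329_779      # raw hardware cluster
virtual_rate = 21_793   # virtual cluster

# How many times faster the raw hardware cluster exchanged messages.
slowdown = raw_rate / virtual_rate
print(f"virtual cluster message-rate slowdown: {slowdown:.1f}x")
```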

Figure 9.13: Phloem: Selected MPI method tests, average performance ratio (16 ranks, 8 ranks per node).

Figure 9.13 shows the performance penalties associated with individual one-to-many and many-to-many MPI calls, as measured by the Phloem mpiBench Bcast utility, run on a 16-rank, two-node cluster to maintain relevance with regard to the above SQMR tests. These are the averages of the per-method virtual cluster over raw hardware cluster time-to-solution ratios, chosen because they remain remarkably consistent across varying message sizes. This demonstrates that, in aggregate, the high point-to-point latency illustrated by the SQMR tests results in a consistently reproducible multiplier for general parallel coding methods.
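The per-method averaging described above can be sketched as follows; the method names and timings here are hypothetical placeholders, not values taken from the Phloem runs:

```python
# Hypothetical time-to-solution measurements (seconds) per MPI method,
# one entry per message size; values are illustrative only.
virtual_times = {"Bcast": [0.40, 0.81, 1.62], "Allreduce": [0.55, 1.10, 2.21]}
raw_times     = {"Bcast": [0.10, 0.20, 0.40], "Allreduce": [0.11, 0.22, 0.44]}

def average_ratio(virtual, raw):
    """Average the virtual/raw time-to-solution ratio across message sizes."""
    ratios = [v / r for v, r in zip(virtual, raw)]
    return sum(ratios) / len(ratios)

for method in virtual_times:
    print(method, round(average_ratio(virtual_times[method], raw_times[method]), 2))
```

When the ratio is stable across message sizes, as the text reports, this single averaged multiplier summarizes the virtualization penalty for that method.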

Figure 9.14: Phloem: Selected MPI method tests, average performance ratio across 1000 trials.

Figure 9.14 shows the scaling characteristics of the virtual cluster as compared to raw hardware, again

