Magellan Final Report - Office of Science - U.S. Department of Energy
As message size increases, overhead related to latency and connection instantiation becomes less significant. The results shown in Figure 9.12 demonstrate that while the raw hardware cluster has neared its practical maximum bandwidth at 16 KB, the virtual cluster is still 50 kbps below its average at 2 MB. Of particular note, variability in the results on the virtual cluster was extremely high: with a 2 MB message size, the reported bandwidth ranged from a minimum of 32.24 kbps to a maximum of 181.74 kbps.
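That run-to-run variability can be summarized as a best-to-worst bandwidth ratio. A minimal sketch, using the 2 MB figures quoted above (the function name is illustrative, not part of any benchmark suite):

```python
# Sketch: summarize run-to-run variability as a max/min bandwidth ratio.
# The 2 MB figures (181.74 and 32.24 kbps) are the virtual-cluster
# extremes reported above; the helper name is illustrative.

def variability_ratio(max_bw_kbps: float, min_bw_kbps: float) -> float:
    """Ratio of best to worst observed bandwidth across trials."""
    return max_bw_kbps / min_bw_kbps

ratio = variability_ratio(181.74, 32.24)
print(f"max/min bandwidth ratio: {ratio:.1f}x")  # roughly a 5.6x spread
```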
The difference between clusters in zero-length message bandwidth was also of concern. While an average of 329,779 messages passed between ranks every second on the raw hardware cluster, the virtual cluster could only manage an average of 21,793 per second. This is a strong indictment of the virtual network model as it pertains to MPI workload performance.
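The gap in zero-length message rate can likewise be expressed as a single slowdown factor; a minimal sketch using the two averages quoted above:

```python
# Sketch: express the zero-length message-rate gap as a slowdown factor.
# Both rates (messages per second) are the averages reported above.

raw_hw_rate = 329_779   # raw hardware cluster
virtual_rate = 21_793   # virtual cluster

slowdown = raw_hw_rate / virtual_rate
print(f"virtual cluster message rate is about {slowdown:.1f}x lower")
```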
Figure 9.13: Phloem: Selected MPI method tests, average performance ratio (16 ranks, 8 ranks per node).
Figure 9.13 shows the performance penalties associated with individual one-to-many and many-to-many MPI calls, as measured by the Phloem mpiBench Bcast utility, run on a 16-rank, 2-node cluster for comparability with the SQMR tests above. The values plotted are per-method averages of the ratio of virtual-cluster to raw-hardware time to solution; averages are appropriate because the ratios remain remarkably consistent across varying message sizes. This demonstrates that, in aggregate, the high point-to-point latency illustrated by the SQMR tests results in a consistently reproducible multiplier for common parallel programming methods.
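The per-method average ratio described above can be sketched as follows. The method names and timings here are hypothetical placeholders, not measured values from the report:

```python
# Sketch: per-method average of virtual-over-raw time-to-solution ratios,
# as plotted in Figure 9.13. All timings below are hypothetical.
from statistics import mean

# time to solution (seconds) per message size: (virtual, raw hardware)
timings = {
    "Bcast":     [(0.30, 0.10), (0.62, 0.20), (1.23, 0.40)],
    "Allreduce": [(0.45, 0.11), (0.90, 0.22), (1.80, 0.45)],
}

# Average the ratio across message sizes for each MPI method.
avg_ratios = {
    method: mean(virt / raw for virt, raw in pairs)
    for method, pairs in timings.items()
}

for method, ratio in avg_ratios.items():
    print(f"{method}: average slowdown {ratio:.2f}x")
```

Because the per-size ratios vary little, a single averaged multiplier per method is a faithful summary, which is the rationale the report gives for this presentation.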
Figure 9.14: Phloem: Selected MPI method tests, average performance ratio across 1000 trials.
Figure 9.14 shows the scaling characteristics of the virtual cluster as compared to raw hardware, again