Magellan Final Report
9.2 OpenStack Benchmarking
All of the tests in this section were run on the ALCF OpenStack virtual machines or virtual clusters, and a corresponding number of IBM x3650 servers were used for the bare-metal (or "raw") comparison. This allowed us to determine the precise overhead of running in an OpenStack virtualized environment. All software between the virtual and raw-hardware clusters was identical, and MPICH2 v1.4 was used by the applications.
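Since the software stacks were identical, the virtualization overhead for each benchmark can be read directly off the paired wall-clock times. As a minimal illustration (not part of the original measurement harness; the file names and format are assumptions), the comparison can be scripted as follows:

import csv

def load_times(path):
    # Read hypothetical "benchmark,seconds" rows into a dictionary.
    with open(path) as f:
        return {name: float(seconds) for name, seconds in csv.reader(f)}

virtual = load_times("virtual_times.csv")   # assumed results file from the VM cluster
raw = load_times("raw_times.csv")           # assumed results file from the bare-metal cluster

for name in sorted(raw):
    overhead = 100.0 * (virtual[name] - raw[name]) / raw[name]
    print(f"{name:15s} raw={raw[name]:8.1f}s  vm={virtual[name]:8.1f}s  overhead={overhead:+6.1f}%")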
9.2.1 SPEC CPU
SPEC CPU2006 is an intensive benchmarking suite made available by the Standard Performance Evaluation Corporation [78]. It is designed to stress the CPU, memory, and compiler of a given system for the purpose of acquiring general performance numbers. A total of 29 benchmarks were run in two passes on both the virtual machine and on raw hardware; the first pass used integer operations and the second pass used floating-point operations. Only the time-to-solution is reported below.
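The two passes can be driven with SPEC's standard runspec tool; the sketch below is illustrative only (the configuration file name is an assumption, and the actual runs may have used different tuning options):

import subprocess

CONFIG = "magellan-gcc.cfg"   # hypothetical SPEC CPU2006 configuration file

for suite in ("int", "fp"):   # integer pass, then floating-point pass
    cmd = ["runspec", f"--config={CONFIG}", "--tune=base",
           "--size=ref", "--iterations=1", suite]
    subprocess.run(cmd, check=True)   # per-benchmark run times end up in the SPEC result logs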
Figure 9.10: SPEC CPU benchmark results. (a) Integer; (b) Floating point.
At first glance, the results in Figure 9.10 seem to show greater performance for the virtualized machine; the most notable exception is MCF. This particular benchmark is noted as having been modified for increased memory-cache performance. Two other benchmarks that show better performance on the raw hardware are MILC and CactusADM; these are also known as particularly memory-cache-friendly applications. This leads us to believe that KVM hides the performance penalty associated with cache misses, so that CPU cycles spent idle while an application replenishes its L2 cache are added to the total runtime only on raw hardware. Further investigation of the impact of the memory cache was out of the scope of this project. Other studies have shown the effects of memory performance in VMs [43].
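One way to probe this hypothesis without rerunning SPEC is a small memory-access microbenchmark executed on both a virtual machine and a raw node; the sketch below (not part of the original study) contrasts a cache-friendly sequential reduction with a cache-hostile random gather over the same data:

import time
import numpy as np

N = 1 << 24                      # about 16M doubles, far larger than the L2 cache
data = np.random.rand(N)
idx = np.random.permutation(N)   # random access order defeats the caches

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

sequential = timed(lambda: data.sum())            # streaming access, cache friendly
random_gather = timed(lambda: data[idx].sum())    # gather access, dominated by cache misses

print(f"sequential: {sequential:.3f}s  random gather: {random_gather:.3f}s  "
      f"ratio: {random_gather / sequential:.1f}x")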
9.2.2 MILC
As an example of a primarily compute-bound scientific application, the MIMD Lattice Computation (MILC) [61] application was chosen as the second benchmark. MIMD Lattice Computation is a commonly used application for parallel simulation of four-dimensional lattice gauge theory. Two passes were run on each cluster, using one core per node and eight cores per node, respectively.
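With MPICH2's Hydra process manager, the processes-per-node split between the two passes is controlled with the -ppn flag. The launch sketch below is purely illustrative: the hostfile, rank counts, and MILC binary name are assumptions, and MILC's input handling is omitted.

import subprocess

hostfile = "cluster.hosts"   # hypothetical list of allocated nodes
binary = "./su3_rmd"         # a commonly built MILC executable (assumed here)

for ranks, per_node in ((32, 1), (256, 8)):   # one core per node, then eight cores per node
    subprocess.run(["mpiexec", "-f", hostfile, "-n", str(ranks),
                    "-ppn", str(per_node), binary], check=True)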
The results in Figure 9.11 show an almost six-fold increase in runtime for the one-core-per-node run, which is almost certainly a result of poor MPI performance due to high latency (quantified below in the Phloem SQMR discussion). As for the 256-core run, while the time-to-solution improves for the raw-hardware cluster, it increases by a factor of five for the virtual cluster. Again, this is a result of poor MPI performance due to high latency, a steep increase in the number of core-to-core messages being passed, and contention among processes for network resources. This is further exacerbated by the OpenStack network model, which requires a single network service to mediate all internal traffic.
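The latency penalty itself is easy to measure independently of MILC. The ping-pong sketch below uses mpi4py (an assumption; the report's own latency numbers come from the Phloem SQMR benchmark) and, run with one rank on each of two nodes on both clusters, gives a direct view of the point-to-point latency gap:

from mpi4py import MPI
import time

# Run with exactly two ranks on different nodes, e.g.:
#   mpiexec -n 2 -ppn 1 python pingpong.py
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
msg = bytearray(8)    # tiny message so latency, not bandwidth, dominates
reps = 10000

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1)
        comm.Recv(msg, source=1)
    elif rank == 1:
        comm.Recv(msg, source=0)
        comm.Send(msg, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # each repetition is one round trip; halve for the one-way latency
    print(f"one-way latency: {elapsed / reps / 2 * 1e6:.1f} microseconds")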