
9.2 OpenStack Benchmarking

All of the tests in this section were run on the ALCF OpenStack virtual machines or virtual clusters, and a corresponding number of IBM x3650 servers were used for the bare-metal (or "raw") comparison. This allowed us to determine the precise overhead of running in an OpenStack virtualized environment. All software between the virtual and raw-hardware clusters was identical, and MPICH2 v1.4 was used by the applications.

9.2.1 SPEC CPU

SPEC CPU2006 is a compute-intensive benchmark suite made available by the Standard Performance Evaluation Corporation [78]. It is designed to stress the CPU, memory, and compiler of a given system for the purpose of acquiring general performance numbers. A total of 29 benchmarks were run in two passes on both the virtual machine and the raw hardware; the first pass used integer operations and the second pass used floating-point operations. Only the time-to-solution is reported below.
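As a rough illustration of this methodology, the Python sketch below drives the two passes with the standard runspec harness and records only wall-clock time-to-solution; the config file name is a hypothetical placeholder, and this is not the harness actually used for these runs.

```python
import subprocess
import time

# Illustrative SPEC CPU2006 driver: time the integer and floating-point
# suites and keep only wall-clock time-to-solution, as reported above.
# The config name "magellan.cfg" is an assumption for this sketch.
CONFIG = "magellan.cfg"

def run_suite(suite):
    """Run one SPEC CPU2006 suite ('int' or 'fp'), return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["runspec", "--config", CONFIG, "--noreportable",
         "--iterations", "1", suite],
        check=True,
    )
    return time.perf_counter() - start

for suite in ("int", "fp"):
    print(f"{suite}: {run_suite(suite):.1f} s time-to-solution")
```

Run identically on the virtual machine and on the raw hardware, the two timings give the per-suite comparison shown in Figure 9.10.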

Figure 9.10: SPEC CPU benchmark results. (a) Integer; (b) Floating point.

At first glance, the results in Figure 9.10 seem to show greater performance for the virtualized machine, with the most notable exception being MCF. This particular benchmark is noted as having been modified for increased memory-cache performance. Two other benchmarks that show better performance on the raw hardware are MILC and CactusADM; these are also known as particularly memory-cache-friendly applications. This leads us to believe that KVM hides the performance penalty associated with cache misses: CPU cycles spent idle while an application replenishes its L2 cache are added to the total runtime only on raw hardware. Further investigation of the impact of the memory cache was out of the scope of this project. Other studies have shown the effects of memory performance in VMs [43].
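One simple way to probe this kind of cache sensitivity is a strided-access microbenchmark. The sketch below (an illustration, not part of the Magellan study) compares unit-stride traversal against a stride that touches one element per cache line, over a buffer far larger than L2; running it on both the VM and the raw hardware would expose any difference in the effective cache-miss penalty.

```python
import time
import numpy as np

# Illustrative cache-sensitivity probe: sum a buffer much larger than
# L2 with unit stride (cache friendly) and with a large stride (roughly
# one element per 64-byte cache line, cache hostile), then compare the
# per-element cost. Sizes assume int32 elements and 64-byte lines.
N = 1 << 25                   # ~33M int32 elements (~128 MB), far beyond L2
STRIDE = 16                   # 16 * 4 bytes = one cache line per access
data = np.ones(N, dtype=np.int32)

def seconds_per_element(view):
    start = time.perf_counter()
    view.sum()
    return (time.perf_counter() - start) / view.size

unit = seconds_per_element(data)
strided = seconds_per_element(data[::STRIDE])
print(f"unit stride:  {unit * 1e9:.2f} ns/element")
print(f"stride {STRIDE}:    {strided * 1e9:.2f} ns/element")
print(f"miss penalty factor: {strided / unit:.1f}x")
```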

9.2.2 MILC

As an example of a primarily compute-bound scientific application, the MIMD Lattice Computation [61] application was chosen as the second benchmark. MIMD Lattice Computation is a commonly used application for the parallel simulation of four-dimensional lattice gauge theory. Two passes were run on each cluster, using one core per node and eight cores per node, respectively.

The results in Figure 9.11 show an almost six-fold increase in runtime for the one-core-per-node run on the virtual cluster, which is almost certainly a result of poor MPI performance due to high latency (quantified below in the Phloem SQMR discussion). As for the 256-core run, while the time-to-solution improves for the raw hardware cluster, it increases by a factor of five for the virtual cluster. Again, this is a result of poor MPI performance due to high latency, a sharp increase in the number of core-to-core messages being passed, and contention among processes for network resources. This is further exacerbated by the OpenStack network model, which requires a single network service to mediate all internal traffic.
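The latency effect described here can be demonstrated with a minimal ping-pong test in the spirit of the Phloem SQMR benchmark. The sketch below, using mpi4py over the same MPICH2 stack, is an illustration of how such point-to-point latency can be measured; it is not the actual Phloem code.

```python
from mpi4py import MPI
import numpy as np

# Minimal two-rank ping-pong: ranks 0 and 1 exchange a small message
# many times; half the average round-trip time approximates the
# point-to-point latency that dominated the virtual cluster's MPI
# performance in the MILC runs.
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(8, dtype=np.uint8)  # small 8-byte message
reps = 10000

comm.Barrier()
start = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = MPI.Wtime() - start

if rank == 0:
    # Each rep is one round trip; latency is half the round-trip time.
    print(f"point-to-point latency: {elapsed / (2 * reps) * 1e6:.1f} us")
```

Launched with mpiexec -n 2 across two nodes, this isolates the network latency from any application-level effects, allowing a direct comparison between the virtual and raw-hardware clusters.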

