Magellan Final Report - Office of Science - U.S. Department of Energy
[Figure 9.22 plots omitted. Panel (a), Transient Random-Write Bandwidth Degradation (90% Capacity): bandwidth in MB/s versus time in minutes. Panel (b), Steady-State Random-Write Bandwidth Degradation: percentage of peak write bandwidth at 30%, 50%, 70%, and 90% capacity. Devices shown: Virident tachIOn (400GB), TMS RamSan 20 (450GB), Fusion IO ioDrive Duo (Single Slot, 160GB), Intel X-25M (160GB), OCZ Colossus (250GB).]<br />
Figure 9.22: Graphs illustrating the degradation in the IO bandwidth to various flash devices under a<br />
sustained random write workload. The graph on the left shows the transient behavior while the graph on<br />
the right compares the steady-state performance of the devices while varying the utilized capacity.<br />
9.5 Applications<br />
We worked closely with our users to help them port their applications to cloud environments, to compare<br />
application performance with existing environments (where applicable), and to compare and contrast performance<br />
with other applications. In this section, we outline select benchmarking results from our applications.<br />
9.5.1 SPRUCE<br />
As mentioned in Chapter 3, the Magellan staff collaborated with the SPRUCE team to understand the<br />
implications of using cloud resources for urgent computing. In this context, some benchmarking was performed<br />
to measure the allocation delay (i.e., the amount of time between a request for some number of<br />
instances and the time when all requested instances are available) as the size of the request increases. These<br />
benchmarking experiments were conducted on three separate cloud software stacks on the ALCF Magellan<br />
hardware: Eucalyptus 1.6.2, Eucalyptus 2.0 and OpenStack. The results of these benchmarks revealed some<br />
unexpected performance behaviors. First, Eucalyptus version 1.6.2 offered very poor performance. The<br />
allocation delay increased linearly with the size of the request. This was unexpected given that in<br />
the benchmark experiments, the image being created was pre-cached across all the nodes of the cloud, so<br />
one would expect that the allocation delays would have been much more stable, given that the vast majority<br />
of the work is being done by the nodes and not the centralized cloud components. Also, as the number<br />
of requested instances increased, the stability of the cloud decreased. Instances were more likely to fail to<br />
reach a running state and the cloud also required a resting period in between trials in order to recover. For<br />
example, for the 128-instance trials, the cloud needed to rest for 150 minutes in between trials, or else all of<br />
the instances would fail to start. In Eucalyptus version 2.0, the performance and stability issues appeared<br />
to be resolved. The allocation delays were much flatter (as one would expect), and the resting periods were<br />
no longer required. OpenStack, in comparison, offered shorter allocation delays, the result of an allocation<br />
process which utilized copy-on-write and sparse files for the disk images. As the images were pre-cached<br />
on the nodes, this resulted in significantly shorter allocation delays, particularly for smaller request sizes.<br />
However, the plot of the allocation delay was again not as flat as might be expected (see Figure 9.23).<br />
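To make the metric concrete, the following is a minimal Python sketch of how the allocation delay defined above could be computed from recorded timestamps, and how one might check whether the delay grows with request size or stays flat. The function names (`allocation_delay`, `delay_slope`) and the idea of logging one ready timestamp per instance are assumptions made for illustration; the report does not describe the tooling used for the SPRUCE benchmarks.

```python
from statistics import mean


def allocation_delay(request_time, ready_times):
    """Seconds from the request until ALL requested instances are available.

    request_time: timestamp (seconds) when the instances were requested.
    ready_times:  one timestamp per instance, recorded when it became available.
    (Hypothetical inputs; any consistent clock works.)
    """
    if not ready_times:
        raise ValueError("no instances reached a running state")
    # The request is only satisfied once the slowest instance is up.
    return max(ready_times) - request_time


def delay_slope(request_sizes, delays):
    """Least-squares slope of delay versus request size (seconds per instance).

    A slope near zero corresponds to a flat allocation-delay curve; a clearly
    positive slope corresponds to delay growing with the request size.
    """
    if len(request_sizes) != len(delays):
        raise ValueError("request_sizes and delays must have the same length")
    x_bar, y_bar = mean(request_sizes), mean(delays)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(request_sizes, delays))
    den = sum((x - x_bar) ** 2 for x in request_sizes)
    if den == 0:
        raise ValueError("need at least two distinct request sizes")
    return num / den
```

Taking the maximum ready time matches the definition in the text, where the delay ends only when every requested instance is available. A trial in which some instances never start has no well-defined delay under this definition and would need to be recorded separately as a failure.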