Magellan Final Report - Office of Science - U.S. Department of Energy
[Figure 9.23 plot: comparison of allocation delay for the MGRAST image on Magellan; delay (seconds) vs. number of instances (0 to 300) for Eucalyptus-1.6.2, Eucalyptus-2.0, and OpenStack.]
Figure 9.23: The allocation delays for the MGRAST image as the number of requested instances increased on
the ALCF Magellan cloud. The delays are shown for Eucalyptus (versions 1.6.2 and 2.0) and OpenStack
provisioning. The MGRAST image was pre-cached across all nodes.
[Figure 9.24 plots. (a) Performance comparison: time (minutes) vs. number of processors (16, 32, 64, 128) for EC2-taskFarmer, Franklin-taskFarmer, EC2-Hadoop, and Azure. (b) Projected cost for 12.5 million gene sequences: projected cost (in dollars), shown as MIN COST and MAX COST (cost range based on performance) for Azure, EC2 Large, and EC2 Xlarge.]
Figure 9.24: Performance and cost analysis of running BLAST on a number of cloud platforms
9.5.2 BLAST<br />
The Joint Genome Institute's Integrated Microbial Genomes (IMG) pipeline is a motivating example of
a workload whose need for computing cycles is growing with the volume of sequence data expected from
next-generation sequencers. One of the most frequently used algorithms in IMG, the Basic Local Alignment
Search Tool (BLAST), finds regions of local similarity between sequences. We benchmarked BLAST on an
HPC system (Franklin at NERSC) and on cloud platforms including Amazon EC2, Yahoo! M45, and Windows Azure.
We managed the execution of loosely coupled BLAST runs using a custom-developed task farmer and using
Hadoop.
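Because each BLAST query is independent, a task farmer only has to split the queries into chunks and hand them to whichever worker is free. The following is a minimal Python sketch of that pattern, not the custom task farmer used in this study: the names `make_chunks`, `run_chunk`, and `farm_tasks`, and the chunk size, are hypothetical, and `run_chunk` merely counts sequences where a real worker would run BLAST on its chunk against the reference database.

```python
"""Minimal task-farmer sketch (hypothetical; not the report's implementation).

The master splits the query sequences into fixed-size chunks and submits them
to a worker pool; idle workers pull the next chunk from the pool's shared work
queue. Chunks may finish in any order because BLAST queries are independent.
"""
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def make_chunks(sequences: List[str], chunk_size: int) -> List[List[str]]:
    """Split the query list into consecutive chunks of at most chunk_size items."""
    return [sequences[i:i + chunk_size] for i in range(0, len(sequences), chunk_size)]


def run_chunk(task: Tuple[int, List[str]]) -> Tuple[int, int]:
    """Hypothetical stand-in for one BLAST invocation on a chunk.

    A real worker would write the chunk to a FASTA file and launch BLAST
    against the reference database; here it only returns the chunk size so
    the sketch runs without BLAST installed.
    """
    chunk_id, chunk = task
    return chunk_id, len(chunk)


def farm_tasks(sequences: List[str], chunk_size: int, workers: int) -> Dict[int, int]:
    """Distribute chunks to a worker pool and collect results as they complete."""
    tasks = list(enumerate(make_chunks(sequences, chunk_size)))
    results: Dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_chunk, t) for t in tasks]
        for fut in as_completed(futures):
            chunk_id, n_done = fut.result()
            results[chunk_id] = n_done
    return results


if __name__ == "__main__":
    # 2500 placeholder queries, matching the evaluation size described in the text.
    queries = [f"seq{i}" for i in range(2500)]
    done = farm_tasks(queries, chunk_size=100, workers=4)
    print(len(done), sum(done.values()))  # prints: 25 2500
```

A thread pool is sufficient in this sketch because a real worker would spend most of its time waiting on an external BLAST process; spreading chunks across many nodes, as on Franklin or EC2, would require some form of distributed work queue, which this sketch does not model.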
For this evaluation, we ran 2500 sequences against a reference database of about 3 GB and compared
performance across platforms. Figure 9.24 shows the performance of the task farmer on Franklin and Amazon EC2