
Magellan Final Report - Office of Science - U.S. Department of Energy




[Figure 9.23 plot: comparison of allocation delay for the MGRAST image on Magellan; delay (seconds, 0 to 1000) versus number of instances (0 to 300) for Eucalyptus-1.6.2, Eucalyptus-2.0, and OpenStack]

Figure 9.23: The allocation delays for the MGRAST image as the number of requested instances increased on the ALCF Magellan cloud. The delays are shown for the Eucalyptus (versions 1.6.2 and 2.0) and OpenStack provisioning. The MGRAST image was pre-cached across all nodes.

[Figure 9.24 plots: (a) time (minutes) versus number of processors (16, 32, 64, 128) for EC2-taskFarmer, Franklin-taskFarmer, EC2-Hadoop, and Azure; (b) projected cost (in dollars) for Azure, EC2 Large, and EC2 Xlarge, shown as a min-to-max cost range based on performance]

(a) Performance comparison

(b) Projected cost for 12.5 million gene sequences

Figure 9.24: Performance and cost analysis of running BLAST on a number of cloud platforms.
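Panel (b) of Figure 9.24 projects the cost of processing 12.5 million gene sequences from measured performance and reports it as a range. This section does not spell out the projection method, so the following is only a minimal sketch under stated assumptions: runtime grows linearly with the number of query sequences, every instance is billed for the full run at a flat hourly price, and the run times, instance counts, and price used below are hypothetical placeholders, not values from the report. The names `projected_cost` and `cost_range` are likewise illustrative.

```python
def projected_cost(measured_hours, n_instances, price_per_instance_hour,
                   measured_sequences, target_sequences):
    """Extrapolate the dollar cost of a larger BLAST workload from one run.

    Assumption (not from the report): runtime scales linearly with the
    number of query sequences, and all instances are billed for the
    whole run at a flat hourly price.
    """
    scale = target_sequences / measured_sequences
    instance_hours = measured_hours * n_instances * scale
    return instance_hours * price_per_instance_hour


def cost_range(runs, price_per_instance_hour, measured_sequences, target_sequences):
    """Return (min, max) projected cost over several measured runs.

    `runs` is a list of (n_instances, measured_hours) pairs, for example
    one pair per processor count in a performance sweep.
    """
    costs = [
        projected_cost(hours, n, price_per_instance_hour,
                       measured_sequences, target_sequences)
        for n, hours in runs
    ]
    return min(costs), max(costs)


if __name__ == "__main__":
    # Hypothetical measurements for 2500 queries; not data from Figure 9.24.
    runs = [(32, 0.25), (64, 0.1875)]
    print(cost_range(runs, 0.5, 2500, 12_500_000))  # (20000.0, 30000.0)
```

Under these assumptions, a run that finishes sooner on more instances can still cost more, because cost tracks instance-hours rather than wall-clock time; that is one way a range, rather than a single number, can arise.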

9.5.2 BLAST

The Joint Genome Institute's Integrated Microbial Genomes (IMG) pipeline is a motivating example of a workload whose demand for computing cycles keeps growing with the volume of sequence data expected from next-generation sequencers. One of the most frequently used algorithms in IMG, the Basic Local Alignment Search Tool (BLAST), finds regions of local similarity between sequences. We benchmarked BLAST on an HPC system (Franklin at NERSC) and on cloud platforms including Amazon EC2, Yahoo! M45, and Windows Azure. We managed the execution of loosely coupled BLAST runs using a custom-developed task farmer and Hadoop.
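To make the task-farming idea concrete, the following is a minimal sketch of the general pattern, not the custom task farmer used in this study: a master splits the queries into independent chunks, and a fixed pool of workers pulls chunks from a shared queue until it is empty. `run_blast_chunk` is a hypothetical stub standing in for an actual BLAST invocation, Python threads stand in for the nodes or processes a real farmer would use, and the chunk size and worker count are illustrative.

```python
import queue
import threading


def make_chunks(query_ids, chunk_size):
    """Split the query IDs into fixed-size chunks; each chunk is one task."""
    return [query_ids[i:i + chunk_size] for i in range(0, len(query_ids), chunk_size)]


def run_blast_chunk(chunk):
    """Hypothetical stand-in for one BLAST run on a chunk of queries.

    A real task farmer would launch the BLAST executable against the
    reference database here; this stub returns one record per query.
    """
    return [f"hits:{qid}" for qid in chunk]


def task_farm(query_ids, chunk_size, n_workers):
    """Farm chunks out to n_workers threads that pull from a shared queue."""
    tasks = queue.Queue()
    for chunk in make_chunks(query_ids, chunk_size):
        tasks.put(chunk)

    results = []
    lock = threading.Lock()

    def worker():
        # Pull tasks until the queue is empty; chunks are independent,
        # so workers never coordinate beyond the queue itself.
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return
            hits = run_blast_chunk(chunk)
            with lock:
                results.extend(hits)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # 2500 queries, matching the evaluation size; chunk size and worker
    # count are illustrative choices, not values from the report.
    queries = [f"seq{i}" for i in range(2500)]
    out = task_farm(queries, chunk_size=100, n_workers=16)
    print(len(out))  # 2500
```

Pulling from a shared queue, rather than assigning each worker a fixed slice up front, lets faster workers take on more chunks, which helps when BLAST runtimes vary from query to query.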

For this evaluation, we ran 2500 sequences against a reference database of about 3 GB and compared the performance data. Figure 9.24 shows the performance of the task farmer on Franklin and Amazon EC2

