
Magellan Final Report

data collected at Brookhaven National Laboratory’s Relativistic Heavy Ion Collider. Previously, STAR has demonstrated the use of cloud resources both on Amazon and at other local sites [28, 56]. STAR used Magellan resources to process near-real-time data from Brookhaven for the 2011 run. The need for on-demand access to resources to process real-time data with a complex software stack makes it useful to consider clouds as a platform for this application.

The STAR pipeline transferred the input files from Brookhaven to a NERSC scratch directory. The files were then transferred using scp to the VMs, and after processing was done, the output files were moved back to the NERSC scratch directory. The output files were finally moved to Brookhaven, where the produced files were catalogued and stored in mass storage. STAR calibration database snapshots were generated every two hours. Those snapshots were used to update the database information in each of the running VMs once per day. Update times were randomized between VMs to optimize snapshot availability.
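
A minimal sketch of this per-file flow on one worker VM is shown below. The host names, scratch paths, and the `star-reco` command are placeholders introduced for illustration; they do not appear in the report and are not the actual STAR pipeline scripts.

```python
import random
import subprocess
import time

SCRATCH = "nersc:/scratch/star/incoming"       # hypothetical NERSC scratch path
RESULTS = "nersc:/scratch/star/outgoing"       # hypothetical output staging path
DB_SNAPSHOT = "nersc:/scratch/star/db/latest"  # hypothetical calibration snapshot location

def stage_and_process(filename):
    """Copy one input file onto the VM, run the reconstruction step,
    and push the output back to the scratch directory for return to Brookhaven."""
    subprocess.check_call(["scp", "%s/%s" % (SCRATCH, filename), "/data/input/"])
    subprocess.check_call(["star-reco", "/data/input/" + filename,      # placeholder command
                           "-o", "/data/output/" + filename + ".out"])
    subprocess.check_call(["scp", "/data/output/" + filename + ".out", RESULTS + "/"])

def refresh_calibration_db():
    """Once per day, at a per-VM randomized offset, pull the latest
    two-hourly calibration snapshot so the VMs do not all fetch it at once."""
    delay = random.uniform(0, 24 * 3600)   # randomized offset for this VM
    while True:
        time.sleep(delay)
        subprocess.check_call(["scp", "-r", DB_SNAPSHOT, "/data/calib-db/"])
        delay = 24 * 3600                   # subsequent updates once per day
```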

The STAR software stack was initially deployed on NERSC/Magellan. After two months of successful running, the software stack was expanded to a coherent cluster of over 100 VMs drawn from three resource pools: NERSC Eucalyptus (up to 60 eight-core VMs), the ALCF Nimbus cloud (up to 60 eight-core VMs), and the ALCF OpenStack cloud (up to 30 eight-core VMs). Figure 11.3 shows the virtual machines used and the corresponding aggregated load for a single run over three days.
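
Clouds of this era typically exposed an EC2-compatible interface, so a pool on any of the three clouds could be started with a boto-style client. The sketch below is an assumption about how such pools might be provisioned, not the team's actual tooling; the endpoint host names, image ID, keypair, port, and service path are hypothetical (the port and path shown follow common Eucalyptus defaults, and c1.xlarge stands in for an eight-core instance type).

```python
import boto
from boto.ec2.regioninfo import RegionInfo

def launch_pool(endpoint, n_vms, port=8773, path="/services/Eucalyptus",
                image_id="emi-star-worker", key_name="star-key"):
    """Start a pool of eight-core worker VMs against one EC2-compatible cloud endpoint."""
    region = RegionInfo(name="magellan", endpoint=endpoint)
    conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                            aws_secret_access_key="SECRET_KEY",
                            is_secure=False, port=port, path=path, region=region)
    return conn.run_instances(image_id, min_count=n_vms, max_count=n_vms,
                              key_name=key_name, instance_type="c1.xlarge")

# Hypothetical endpoints, sized to the pools described in the report.
launch_pool("euca.nersc.gov", 60)        # NERSC Eucalyptus
launch_pool("nimbus.alcf.anl.gov", 60)   # ALCF Nimbus
launch_pool("openstack.alcf.anl.gov", 30)  # ALCF OpenStack
```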

In summary, over a period of four months, about 18,000 files were processed, 70 TB of input data was moved to NERSC from Brookhaven, and about 40 TB of output data was moved back to Brookhaven. The number of simultaneous jobs over this period varied between 160 and 750. A total of about 25,000 CPU days, or roughly 70 CPU years, was used for the processing.
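
As a rough consistency check on these figures (the numbers themselves come from the report; only the arithmetic below is added, and the 120-day estimate for four months and one CPU per job are assumptions):

```python
cpu_days = 25000
print(cpu_days / 365.0)  # ~68.5, i.e. roughly 70 CPU years
print(cpu_days / 120.0)  # ~208 CPUs busy on average over ~4 months,
                         # within the reported range of 160-750 simultaneous jobs
```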

The STAR data processing is a good example of an application that can leverage a cloud computing testbed such as Magellan. The software can be easily packaged as a virtual machine image, and the jobs within the STAR analysis require little communication, so virtualization overheads have minimal impact. Using cloud resources enabled the data processing to proceed during the five months of experiments and to finish at nearly the same time as the experiments.

There were a number of lessons learned. The goal was to build a VM image that mirrored the STAR reconstruction and analysis workflow from an existing system at NERSC. Building such an image from scratch required extensive system administration skills and took weeks of effort. Additionally, several days' effort was required when the image was ported to the ALCF Nimbus and OpenStack clouds. The image creator needs to be careful not to compromise the security of the VMs by leaving personal information such as passwords or usernames in the images. Additionally, the image must be as complete as possible while limiting its size, since it resides in memory. Using cloud resources was not a turnkey operation and required significant design and development effort. The team took advantage of on-demand resources with custom scripts to automate and prioritize the workflow.
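
A minimal sketch of the kind of scrubbing pass an image creator might run before bundling is shown below. The paths are common locations for credentials, per-user state, and bulk data on a Linux guest; they are assumptions for illustration, not part of the STAR image recipe.

```python
import glob
import os

# Files that commonly carry credentials or per-user state and should not
# ship inside a shared VM image.
sensitive = [
    "/root/.ssh/authorized_keys",
    "/root/.bash_history",
    "/etc/ssh/ssh_host_rsa_key",   # host keys are typically regenerated on first boot
    "/etc/ssh/ssh_host_dsa_key",
] + glob.glob("/home/*/.ssh/authorized_keys") + glob.glob("/home/*/.bash_history")

for path in sensitive:
    if os.path.exists(path):
        os.remove(path)

# Trim obvious bulk (package caches, logs, temp files) to keep the image small.
os.system("rm -rf /var/cache/yum/* /var/log/*.log /tmp/*")
```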

11.2.2 Genome Sequencing of Soil Samples

Magellan resources at both ALCF and NERSC were used to perform genome sequencing of soil samples pulled from two plots at the Rothamsted Research Center in the UK. Rothamsted is one of the oldest soil research centers and maintains a controlled environment, off limits to farming and any other human disturbance, reaching back 350 years. The specific aim of this project was to understand the impact of long-term plant influence (rhizosphere) on microbial community composition and function. Two distinct fields were selected to understand the differences in microbial populations associated with different land management practices.

The demonstration used Argonne's MG-RAST metagenomics analysis software to gain a preliminary overview of the microbial populations of these two soil types (grassland and bare-fallow). MG-RAST is a large-scale metagenomics data analysis service. Metagenomics uses shotgun sequencing methods to gather a random sampling of short sequences of DNA from community members in the source environment. MG-RAST provides a series of tools for analyzing the community structure (which organisms are present in an environment) and the proteins present in an environment, which describe discrete functional potentials. MG-RAST also provides unique statistical tools, including novel sequence quality assessment and sample
