
Magellan Final Report

• The Argonne Laboratory Computing Resource Center (LCRC), working with Magellan project staff, developed a secure method to extend the production HPC compute cluster into the ALCF Magellan testbed, allowing jobs to utilize the additional resources easily while accessing LCRC storage and running within the same computing environment.

• The STAR experiment at the Relativistic Heavy Ion Collider (RHIC) used Magellan for real-time data processing. The goal of this particular analysis was to sift through collisions searching for the “missing spin.”

• Biologists used the ALCF Magellan cloud to quickly analyze strains suspected in the E. coli outbreak in Europe in summer 2011.

• Magellan was honored with the HPCwire Readers’ Choice Award for “Best Use of HPC in the Cloud” in 2010 and 2011.

• Magellan benchmarking work won the “Best Paper Award” at both CloudCom 2010 and DataCloud 2011.

• Magellan played a critical role in demonstrating the capabilities of the 100 Gbps network deployed by the Advanced Networking Initiative (ANI). Magellan provided storage and compute resources that were used by many of the demonstrations performed during SC11. The demonstrations typically achieved rates of 80-95 Gbps.

1.4 Approach<br />

Our approach has been to evaluate various dimensions of cloud computing and associated technologies. We adopted an application-driven approach, assessing the specific needs of scientific communities when making key decisions in the project.

A distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). The testbed consists of IBM iDataPlex servers and a mix of special servers, including Active Storage, Big Memory, and GPU servers, connected through an InfiniBand fabric. The testbed also has a mix of storage options, including both distributed and global disk storage, archival storage, and two classes of flash storage. The system provides both a high-bandwidth, low-latency quad-data-rate InfiniBand network and a commodity Gigabit Ethernet network. This configuration differs from a typical cloud infrastructure but is better suited to the needs of scientific applications. For example, InfiniBand may be unusual in a typical commercial cloud offering; however, it allowed Magellan staff to investigate a range of performance points and measure the impact on application performance.
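As an illustration of how such network performance points can be probed, the sketch below is a minimal MPI ping-pong micro-benchmark that reports point-to-point latency and bandwidth; run once over InfiniBand and once over Ethernet, it exposes the difference between the two fabrics. It is only a sketch of the technique, not the benchmark suite used in this evaluation; mpi4py, the message sizes, and the iteration count are our own choices.

```python
# Minimal MPI ping-pong sketch (illustrative only; not the Magellan benchmark
# suite). Requires mpi4py. Run with, e.g.:  mpirun -n 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

ITERS = 1000
for size in (8, 1024, 1024 * 1024):          # message sizes in bytes
    buf = np.zeros(size, dtype='b')
    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(ITERS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)     # ping
            comm.Recv(buf, source=1, tag=0)   # pong
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - start
    if rank == 0:
        # One-way latency is half the round-trip time per iteration.
        latency_us = elapsed / ITERS / 2 * 1e6
        bw_mbs = size / (elapsed / ITERS / 2) / 1e6
        print(f"{size:>8} B  latency {latency_us:8.2f} us  bandwidth {bw_mbs:8.1f} MB/s")
```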

The software stack on the Magellan testbed was diverse and flexible, allowing Magellan staff and users to explore a variety of models. Software on the testbed included multiple private cloud software solutions, a traditional batch queue with workload monitoring, and Hadoop. The software stacks were evaluated for usability, ease of administration, ability to enforce DOE security policies, stability, performance, and reliability.

We conducted a thorough benchmarking and workload analysis evaluating various aspects of cloud computing, including communication protocols and I/O, using both micro-benchmarks and application benchmarks. We also performed a cost analysis to understand the cost effectiveness of clouds. The main goal of the project was to advance the understanding of how private cloud software operates and to identify its gaps and limitations. However, for the sake of completeness, we also compared performance and cost against commercial services.
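At its core, such a cost comparison is amortization arithmetic: spread the capital and operating costs of an owned system over the core-hours it actually delivers, then compare against a commercial on-demand rate. The sketch below shows the form of that calculation with purely hypothetical inputs; the capital, operations, utilization, and on-demand figures are placeholders, not the values used in this report's analysis.

```python
# Back-of-the-envelope cost comparison (hypothetical numbers, for illustration
# only; not the figures from the Magellan cost analysis).

def cluster_cost_per_core_hour(capital, annual_ops, lifetime_years,
                               cores, utilization):
    """Amortized cost per delivered core-hour for an owned cluster."""
    total_cost = capital + annual_ops * lifetime_years
    delivered_core_hours = cores * 8760 * lifetime_years * utilization
    return total_cost / delivered_core_hours

# Hypothetical cluster: $2.5M capital, $500k/yr operations, 4-year lifetime,
# 4,000 cores, 85% average utilization.
owned = cluster_cost_per_core_hour(2.5e6, 5e5, 4, 4000, 0.85)

# Hypothetical commercial on-demand rate per core-hour.
on_demand = 0.10

print(f"owned cluster : ${owned:.3f} per core-hour")
print(f"on-demand     : ${on_demand:.3f} per core-hour")
```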

We evaluated the use of MapReduce and Hadoop for various workloads in order to understand the applicability of the model to scientific pipelines and the associated performance trade-offs.
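For readers unfamiliar with the model, the sketch below is a minimal Hadoop Streaming job in Python in the word-count style: the mapper emits key-value pairs, Hadoop groups them by key, and the reducer aggregates each group. The task and the invocation shown in the comments are illustrative only and are not drawn from the Magellan workloads.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming example (illustrative; not one of the Magellan
# workloads). The mapper emits one (word, 1) pair per token; the reducer sums
# counts per word. Hadoop sorts mapper output by key before the reducer runs,
# so equal keys arrive contiguously.
#
# Hypothetical invocation (jar path varies by Hadoop installation):
#   hadoop jar hadoop-streaming.jar \
#     -input /data/text -output /data/counts \
#     -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
#     -file wordcount.py
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```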

We worked closely with scientific groups to identify the intricacies of application design and associated