• The Argonne Laboratory Computing Resource Center (LCRC), working with Magellan project staff, developed a secure method to extend its production HPC compute cluster into the ALCF Magellan testbed, allowing jobs to easily use the additional resources while still accessing LCRC storage and running within the same computing environment.
• The STAR experiment at the Relativistic Heavy Ion Collider (RHIC) used Magellan for real-time data processing. The goal of this analysis was to sift through collisions in search of the “missing spin.”
• Biologists used the ALCF Magellan cloud to quickly analyze strains suspected in the E. coli outbreak in Europe in summer 2011.
• Magellan was honored with the HPCwire Readers’ Choice Award for “Best Use of HPC in the Cloud” in 2010 and 2011.
• Magellan benchmarking work won the “Best Paper Award” at both CloudCom 2010 and DataCloud 2011.
• Magellan played a critical role in demonstrating the capabilities of the 100 Gbps network deployed by the Advanced Networking Initiative (ANI). Magellan provided storage and compute resources used by many of the demonstrations performed during SC11; these demonstrations typically achieved rates of 80-95 Gbps.
1.4 Approach<br />
Our approach has been to evaluate various dimensions of cloud computing and associated technologies. We adopted an application-driven approach, assessing the specific needs of scientific communities while making key decisions in the project.
A distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). The testbed consists of IBM iDataPlex servers and a mix of special servers, including Active Storage, Big Memory, and GPU servers, connected through an InfiniBand fabric. The testbed also has a mix of storage options, including both distributed and global disk storage, archival storage, and two classes of flash storage. The system provides both a high-bandwidth, low-latency quad data rate (QDR) InfiniBand network and a commodity Gigabit Ethernet network. This configuration differs from a typical cloud infrastructure but is better suited to the needs of scientific applications. For example, InfiniBand is unusual in a typical commercial cloud offering; however, it allowed Magellan staff to investigate a range of performance points and measure their impact on application performance.
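As an illustration of how the two fabrics can be compared, the sketch below shows a minimal MPI ping-pong micro-benchmark of the kind commonly used to measure point-to-point latency. It assumes Python with the mpi4py and NumPy packages; the script name, message sizes, and iteration count are illustrative only and do not represent the actual Magellan benchmark suite.

    # pingpong.py -- hypothetical micro-benchmark (run with: mpirun -n 2 python pingpong.py)
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    iters = 1000

    for size in (8, 1024, 1024 * 1024):            # message sizes in bytes (illustrative)
        buf = np.zeros(size, dtype=np.uint8)
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(iters):
            if rank == 0:
                comm.Send(buf, dest=1, tag=0)
                comm.Recv(buf, source=1, tag=0)
            elif rank == 1:
                comm.Recv(buf, source=0, tag=0)
                comm.Send(buf, dest=0, tag=0)
        t1 = MPI.Wtime()
        if rank == 0:
            # One-way latency is half the round-trip time per iteration.
            print("%8d B  %10.2f us" % (size, 1e6 * (t1 - t0) / (2 * iters)))

Running the same script once over the InfiniBand interface and once over the Gigabit Ethernet interface gives a direct view of the performance points referred to above.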
The software stack on the Magellan testbed was diverse and flexible, allowing Magellan staff and users to explore a variety of usage models. Software on the testbed included multiple private cloud software solutions, a traditional batch queue with workload monitoring, and Hadoop. The software stacks were evaluated for usability, ease of administration, ability to enforce DOE security policies, stability, performance, and reliability.
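Private cloud stacks such as Eucalyptus expose an EC2-compatible API, so virtual machines can be provisioned programmatically. The following is a minimal sketch of such provisioning, assuming the Python boto library and an EC2-compatible endpoint; the endpoint name, credentials, image identifier, key name, and instance type are all placeholders, not actual Magellan values.

    # provision_vm.py -- hypothetical example; endpoint, credentials, and image id are placeholders
    import boto
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name="testbed", endpoint="cloud.example.gov")
    conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                            aws_secret_access_key="SECRET_KEY",
                            is_secure=False,
                            region=region,
                            port=8773,
                            path="/services/Eucalyptus")

    # Launch one VM from a registered machine image and report its state.
    reservation = conn.run_instances("emi-12345678",
                                     key_name="mykey",
                                     instance_type="m1.small")
    instance = reservation.instances[0]
    print(instance.id, instance.state)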
We conducted a thorough benchmarking and workload analysis evaluating various aspects of cloud computing, including communication protocols and I/O, using both micro-benchmarks and application benchmarks. We also performed a cost analysis to understand the cost effectiveness of clouds. The main goal of the project was to advance the understanding of how private cloud software operates and to identify its gaps and limitations. However, for the sake of completeness, we also compared performance and cost against commercial services.
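The flavor of such a cost analysis can be conveyed with a simple amortization calculation. The sketch below is purely illustrative: every figure is a placeholder rather than a Magellan finding, and it compares the amortized cost of a core-hour on an owned cluster against an assumed on-demand price.

    # cost_sketch.py -- illustrative cost model; all figures are placeholders, not project findings
    def cost_per_core_hour(capital, years, annual_ops, cores, utilization):
        """Amortized cost of one core-hour on an owned, operated cluster."""
        hours = years * 365 * 24
        total = capital + annual_ops * years
        return total / (cores * hours * utilization)

    owned = cost_per_core_hour(capital=5.0e6, years=4, annual_ops=1.0e6,
                               cores=8000, utilization=0.85)
    commercial = 0.10   # assumed on-demand price per core-hour
    print("owned: $%.3f/core-hour   commercial: $%.2f/core-hour" % (owned, commercial))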
We evaluated the use of MapReduce and Hadoop for various workloads in order to understand the applicability of the model to scientific pipelines as well as the associated performance tradeoffs.
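A concrete sense of the MapReduce model is given by the sketch below: a hypothetical Hadoop Streaming mapper and reducer, written in Python, that count k-mers in sequence data. The k-mer length and file layout are illustrative; in practice the two functions would live in separate mapper and reducer scripts passed to the Hadoop Streaming jar.

    # kmer_count.py -- hypothetical Hadoop Streaming task; run as "python kmer_count.py map" or "... reduce"
    import sys

    K = 8  # k-mer length, illustrative

    def mapper():
        for line in sys.stdin:
            line = line.strip()
            if not line or line.startswith(">"):   # skip FASTA headers
                continue
            for i in range(len(line) - K + 1):
                print("%s\t1" % line[i:i + K])

    def reducer():
        current, count = None, 0
        for line in sys.stdin:                     # streaming input arrives sorted by key
            key, value = line.rstrip("\n").split("\t")
            if key == current:
                count += int(value)
            else:
                if current is not None:
                    print("%s\t%d" % (current, count))
                current, count = key, int(value)
        if current is not None:
            print("%s\t%d" % (current, count))

    if __name__ == "__main__":
        (mapper if sys.argv[1] == "map" else reducer)()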
We worked closely with scientific groups to identify the intricacies of application design and associated