Magellan Final Report - Office of Science - U.S. Department of Energy
Chapter 1
Overview
Cloud computing has served the needs of enterprise web applications for the last few years. The term “cloud computing” has been used to refer to a number of different concepts (e.g., MapReduce, public clouds, private clouds, etc.), technologies (e.g., virtualization, Apache Hadoop), and service models (e.g., Infrastructure-as-a-Service [IaaS], Platform-as-a-Service [PaaS], Software-as-a-Service [SaaS]). Clouds have been shown to provide a number of key benefits including cost savings, rapid elasticity, ease of use, and reliability. Cloud computing has been particularly successful with customers lacking significant IT infrastructure or customers who have quickly outgrown their existing capacity.
The open-ended nature of scientific exploration and the increasing role of computing in performing science have resulted in a growing need for computing resources. Over the last few years, there has been increasing interest in evaluating the use of cloud computing to address these demands. In addition, a number of key features of cloud environments are attractive to some scientific applications. For example, a number of scientific applications have specific software requirements, including OS version dependencies, compilers, and libraries, and their users require the flexibility of the custom software environments that virtualized environments can provide. An example of this is the Supernova Factory, which relies on large data volumes for the supernova search and has a code base that consists of a large number of custom modules. The complexity of the pipeline necessitates having specific library and OS versions. Virtualized environments also promise to provide a portable container that will enable scientists to share an environment with collaborators. For example, the ATLAS experiment, a particle physics experiment at the Large Hadron Collider at CERN, is investigating the use of virtual machine images for distribution of all required software [10]. Similarly, the MapReduce model holds promise for data-intensive applications. Thus, cloud computing models promise to be an avenue to address new categories of scientific applications, including data-intensive science applications, on-demand/surge computing, and applications that require customized software environments. A number of groups in the scientific community have investigated and tracked how the cloud software and business model might impact the services offered to the scientific community. However, there is a limited understanding of how to operate and use clouds, how to port scientific workflows, and how to determine the cost/benefit trade-offs of clouds for scientific applications.
The Magellan project was funded by the American Recovery and Reinvestment Act to investigate the applicability of cloud computing for the Department of Energy’s Office of Science (DOE SC) applications. Magellan is a joint project at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). Over the last two years we have evaluated various dimensions of clouds: cloud models such as Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), virtual software stacks, and MapReduce and its open source implementation (Hadoop). We evaluated these on various criteria, including stability, manageability, and security from a resource provider perspective, and performance and usability from an end-user perspective.
Cloud computing has similarities with other distributed computing models such as grid and utility computing. However, the use of virtualization technology, the MapReduce programming model, and tools such