Magellan Final Report - Office of Science - U.S. Department of Energy


Chapter 1

Overview

Cloud computing has served the needs of enterprise web applications for the last few years. The term “cloud computing” has been used to refer to a number of different concepts (e.g., MapReduce, public clouds, private clouds), technologies (e.g., virtualization, Apache Hadoop), and service models (e.g., Infrastructure-as-a-Service [IaaS], Platform-as-a-Service [PaaS], Software-as-a-Service [SaaS]). Clouds have been shown to provide a number of key benefits including cost savings, rapid elasticity, ease of use, and reliability. Cloud computing has been particularly successful with customers lacking significant IT infrastructure or customers who have quickly outgrown their existing capacity.

The open-ended nature of scientific exploration and the increasing role of computing in performing science have resulted in a growing need for computing resources. There has been increasing interest over the last few years in evaluating the use of cloud computing to address these demands. In addition, a number of key features of cloud environments are attractive to some scientific applications. For example, a number of scientific applications have specific software requirements, including OS version dependencies, compilers, and libraries, and their users require the flexibility of custom software environments that virtualized environments can provide. An example of this is the Supernova Factory, which relies on large data volumes for the supernova search and has a code base that consists of a large number of custom modules. The complexity of the pipeline necessitates having specific library and OS versions. Virtualized environments also promise to provide a portable container that will enable scientists to share an environment with collaborators. For example, the ATLAS experiment, a particle physics experiment at the Large Hadron Collider at CERN, is investigating the use of virtual machine images for distribution of all required software [10]. Similarly, the MapReduce model holds promise for data-intensive applications. Thus, cloud computing models promise to be an avenue to address new categories of scientific applications, including data-intensive science applications, on-demand/surge computing, and applications that require customized software environments. A number of groups in the scientific community have investigated and tracked how the cloud software and business model might impact the services offered to the scientific community. However, there is a limited understanding of how to operate and use clouds, how to port scientific workflows, and how to determine the cost/benefit trade-offs of clouds for scientific applications.

The Magellan project was funded by the American Recovery and Reinvestment Act to investigate the applicability of cloud computing for the Department of Energy’s Office of Science (DOE SC) applications. Magellan is a joint project at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). Over the last two years we have evaluated various dimensions of clouds: cloud models such as Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), virtual software stacks, and MapReduce and its open source implementation (Hadoop). We evaluated these on various criteria, including stability, manageability, and security from a resource provider perspective, and performance and usability from an end-user perspective.

Cloud computing has similarities with other distributed computing models such as grid and utility computing. However, the use of virtualization technology, the MapReduce programming model, and tools such
