29.12.2014 Views

Magellan Final Report - Office of Science - U.S. Department of Energy



data pre- and post-processing steps, such as visualization, are often performed on the scientist’s desktop. The increased scale of digital data due to low-cost sensors and other technologies has resulted in the need for these applications to scale [58]. These applications are often penalized by scheduling policies used at supercomputing centers. The requirements of such applications are similar to those of the Internet applications that currently dominate the cloud computing space, but with far greater data storage and throughput requirements. These workloads may also benefit from the MapReduce programming model, which simplifies the programming and execution of this class of applications.

4.2 Usage Scenarios

It is critical to understand the needs of scientific applications and users and to analyze these requirements with respect to existing cloud computing platforms and solutions. In addition to the traditional DOE HPC center users, we identified three categories of scientific community users that might benefit from cloud computing resources at DOE HPC centers:

4.2.1 On-Demand Customized Environments

The Infrastructure as a Service (IaaS) facility commonly provided by commercial cloud computing addresses a key shortcoming of large-scale grid and HPC systems, namely the relative lack of application portability. This issue is considered one of the major challenges of grid systems, since significant effort is required to deploy and maintain software stacks across systems distributed across geographic locales as well as organizational boundaries [30]. A key design goal of these unified software stacks is providing the best software for the widest range of applications. Unfortunately, scientific applications frequently require specific versions of infrastructure libraries; when these libraries aren’t available, applications may run poorly or not at all. For example, the Supernova Factory project is building tools to measure the expansion of the universe and dark energy. This project has a large number of custom modules [1]. The complexity of the pipeline makes it important to have specific library and OS versions, which in turn makes it difficult to take advantage of many large resources because of conflicts or incompatibilities. User-customized operating system images provided by application groups and tuned for a particular application help address this issue.

4.2.2 Virtual Clusters

Some scientific users prefer to run their own private clusters for a number of reasons. They often don’t need the concurrency levels achievable at supercomputing centers, but do require guaranteed access to resources for specific periods of time. They also often need an environment shared among collaborators, since setting up the software environment under each user space can be complicated and time-consuming. Clouds may be a viable platform to satisfy these needs.

4.2.3 Science Gateways

Users of well-defined computational workflows often prefer to have simple web-based interfaces to their application workflow and data archives. Web interfaces enable easier access to resources by non-experts, and enable wider availability of scientific data for communities of users in a common domain (e.g., virtual organizations). Cloud computing provides a number of technologies that might facilitate such a usage scenario.

4.3 Magellan User Survey

Cloud computing introduces a new usage or business model and additional new technologies that have not previously been applied at a large scale in scientific applications. The virtualization technology that
