Magellan Final Report - Office of Science - U.S. Department of Energy
data pre- and post-processing steps, such as visualization, are often performed on the scientist's desktop. The increased scale of digital data due to low-cost sensors and other technologies has created a need for these applications to scale [58]. These applications are often penalized by the scheduling policies at supercomputing centers, which are generally designed around large, tightly coupled parallel jobs. Their requirements are similar to those of the Internet applications that currently dominate the cloud computing space, but with far greater data storage and throughput demands. These workloads may also benefit from the MapReduce programming model, which simplifies the programming and execution of this class of applications.
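To make the MapReduce model concrete, the following is a minimal, single-process Python sketch of its three phases (map, shuffle/group-by-key, reduce), applied to a word count over a few hypothetical sensor log lines. The function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions, not the API of Hadoop or any other framework. A real implementation runs the map and reduce tasks in parallel across many nodes and handles data partitioning and fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical helper names: a single-process illustration of the
# map -> shuffle -> reduce pattern, not a distributed framework.

def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """Apply the user-supplied mapper to every input record independently."""
    pairs: List[Tuple[str, int]] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as a framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the user-supplied reducer to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Application-specific code: only these two small functions.
def word_mapper(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(key: str, counts: List[int]) -> int:
    return sum(counts)

if __name__ == "__main__":
    lines = ["sensor A ok", "sensor B fault", "sensor A ok"]  # hypothetical input
    counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
    print(counts)  # prints {'sensor': 3, 'a': 2, 'ok': 2, 'b': 1, 'fault': 1}
```

Only `word_mapper` and `sum_reducer` are specific to the task; iteration and grouping are generic, which is the sense in which the model simplifies programming for data-parallel workloads.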
4.2 Usage Scenarios<br />
It is critical to understand the needs of scientific applications and users and to analyze these requirements with respect to existing cloud computing platforms and solutions. In addition to the traditional DOE HPC center users, we identified three categories of scientific community users that might benefit from cloud computing resources at DOE HPC centers:
4.2.1 On-Demand Customized Environments<br />
The Infrastructure as a Service (IaaS) model commonly offered by commercial cloud providers addresses a key shortcoming of large-scale grid and HPC systems: the relative lack of application portability. This issue is considered one of the major challenges of grid systems, since significant effort is required to deploy and maintain software stacks on systems that span geographic locations and organizational boundaries [30]. A key design goal of these unified software stacks is to provide the best software for the widest range of applications. Unfortunately, scientific applications frequently require specific versions of infrastructure libraries; when these versions aren't available, applications may run poorly or not at all. For example, the Supernova Factory project, which is building tools to measure the expansion of the universe and dark energy, has a large number of custom modules [1]. The complexity of its pipeline requires specific library and OS versions, which makes it difficult to take advantage of many large resources because of conflicts or incompatibilities. User-customized operating system images, provided by application groups and tuned for a particular application, help address this issue.
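The version-conflict problem described above can be illustrated with a small check that compares a pipeline's pinned library versions against what a target system provides. This is a hypothetical sketch: the library names (`libfoo`, `libbar`, `libbaz`) and versions are invented for illustration and are not taken from the Supernova Factory pipeline, and exact string matching stands in for the version-range rules real package managers use. A user-customized image is, in effect, an environment in which this check passes by construction.

```python
from typing import Dict, List

# Hypothetical pinned requirements for an analysis pipeline (illustrative only).
REQUIRED: Dict[str, str] = {"libfoo": "1.4.2", "libbar": "3.0", "libbaz": "2.6"}

def find_conflicts(required: Dict[str, str], available: Dict[str, str]) -> List[str]:
    """Return a description of every pinned library that is missing or
    installed at a different version on the target system."""
    problems: List[str] = []
    for name, wanted in sorted(required.items()):
        have = available.get(name)
        if have is None:
            problems.append(f"{name}: missing (need {wanted})")
        elif have != wanted:  # exact match only; a simplification
            problems.append(f"{name}: found {have}, need {wanted}")
    return problems

if __name__ == "__main__":
    shared_system = {"libfoo": "1.6.0", "libbaz": "2.6"}  # one site-wide stack
    custom_image = dict(REQUIRED)                          # image built for the pipeline
    print(find_conflicts(REQUIRED, shared_system))
    # prints ['libbar: missing (need 3.0)', 'libfoo: found 1.6.0, need 1.4.2']
    print(find_conflicts(REQUIRED, custom_image))  # prints []
```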
4.2.2 Virtual Clusters<br />
Some scientific users prefer to run their own private clusters for a number of reasons. They often don't need the concurrency levels achievable at supercomputing centers, but do require guaranteed access to resources for specific periods of time. They also often need an environment shared among collaborators, since setting up the software environment in each user's space can be complicated and time-consuming. Clouds may be a viable platform to satisfy these needs.
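Guaranteed access for a specific period amounts to an advance reservation against a finite pool of nodes. The following hypothetical sketch (the `ReservationBook` class, the node counts, and the hour-based times are illustrative assumptions, not the Magellan scheduler) accepts a virtual-cluster request only if enough nodes remain free for its entire time window.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Reservation:
    start: int  # hypothetical time unit (hours); interval is [start, end)
    end: int
    nodes: int

@dataclass
class ReservationBook:
    capacity: int  # total nodes in the hypothetical pool
    booked: List[Reservation] = field(default_factory=list)

    def peak_usage(self, start: int, end: int) -> int:
        """Maximum number of already-booked nodes at any instant in [start, end)."""
        # Usage only rises at a reservation's start, so it is enough to check
        # the window start plus every booked start that falls inside the window.
        points = {start} | {r.start for r in self.booked if start <= r.start < end}
        return max(
            sum(r.nodes for r in self.booked if r.start <= t < r.end)
            for t in points
        )

    def request(self, start: int, end: int, nodes: int) -> bool:
        """Book the request only if capacity suffices for the whole window."""
        if end <= start or nodes <= 0:
            raise ValueError("need end > start and nodes > 0")
        if self.peak_usage(start, end) + nodes > self.capacity:
            return False
        self.booked.append(Reservation(start, end, nodes))
        return True

if __name__ == "__main__":
    book = ReservationBook(capacity=10)
    print(book.request(0, 8, 6))   # True
    print(book.request(4, 12, 6))  # False: would need 12 nodes during hours 4-8
    print(book.request(8, 12, 6))  # True: the first reservation has ended
```

A production system would also handle cancellations and node failures; the point of the sketch is only that a guarantee requires checking capacity over the whole window, not just at submission time.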
4.2.3 Science Gateways
Users of well-defined computational workflows often prefer simple web-based interfaces to their application workflows and data archives. Web interfaces make resources easier for non-experts to access and make scientific data more widely available to communities of users in a common domain (e.g., virtual organizations). Cloud computing provides a number of technologies that might facilitate such a usage scenario.
4.3 Magellan User Survey
Cloud computing introduces a new usage or business model, as well as new technologies that have not previously been applied at large scale in scientific applications. The virtualization technology that