
Magellan Final Report

specific QoS guarantees to avoid users needing to periodically query to see if resources have become available.

Portability. The vision of grid computing has been to enable users to draw on resources from a diverse set of sites by leveraging a common set of job submission and data management interfaces across all sites. Experience, however, revealed challenges due to differing software stacks, software compatibility issues, and so on. In this regard, virtualization facilitates software portability. Open source tools such as Eucalyptus and OpenStack enable a transition from local sites to Amazon EC2, but site-specific capabilities and cloud interfaces in general are diverse, often making it difficult to span multiple sites. In addition, the networking models and security constraints of cloud software stacks make the cost of data movement to, and especially from, the cloud relatively expensive, discouraging data portability. These costs are also an issue when scientists would like to perform post-processing on their desktops or local clusters, or would like to share their output data with colleagues. More detailed cost discussions are presented in Chapter 12.
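To illustrate the kind of abstraction layer involved, the sketch below uses Apache Libcloud (a third-party library, not part of the Magellan software stack) to drive both an OpenStack-based private cloud and Amazon EC2 through one client-side interface. The credentials, region, endpoint URL, and tenant name are placeholders, and the example is only a sketch of the portability idea, not a recommended deployment.

```python
# Illustrative sketch: one client-side interface over two cloud back ends.
# All credentials, the auth URL, and the region below are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# Amazon EC2 driver (placeholder credentials and region).
ec2_cls = get_driver(Provider.EC2)
ec2 = ec2_cls("EC2_ACCESS_KEY", "EC2_SECRET_KEY", region="us-east-1")

# OpenStack driver for a hypothetical local/private cloud.
os_cls = get_driver(Provider.OPENSTACK)
openstack = os_cls(
    "username", "password",
    ex_force_auth_url="https://keystone.example.org:5000",
    ex_force_auth_version="3.x_password",
    ex_tenant_name="science-project",
)

# The same calls work against either site, which is the portability goal;
# site-specific extensions (ex_* arguments, image names, quotas) are where
# the abstraction typically leaks.
for driver in (ec2, openstack):
    for node in driver.list_nodes():
        print(driver.type, node.name, node.state)
```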

Performance. Traditional synchronous applications that rely on MPI perform poorly on virtual machines and incur a large performance overhead, while application codes with minimal or no synchronization, modest I/O requirements, and either large messages or very little communication tend to perform well in cloud virtual machines. Traditional cloud computing platforms are designed for applications where there is little or no communication between individual nodes and where applications are less affected by failures of individual nodes. This assumption breaks down for large-scale synchronous applications (details in Chapter 9). Additionally, in the cloud model time-to-solution is a more critical metric than individual node performance; scalability is achieved by throwing more nodes at the problem. However, for this approach to be successful for e-Science, it is important to understand the setup and configuration costs associated with porting a scientific application to the cloud; these costs must be factored into the overall time-to-solution metrics used in resource selection and management decisions.
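As a rough illustration of this point, the sketch below folds one-time porting/setup and data-staging costs into a time-to-solution comparison between a local cluster and a cloud allocation. The workload size, node counts, efficiencies, and overhead hours are invented for the example and are not measured Magellan data.

```python
# Back-of-the-envelope time-to-solution model (illustrative numbers only).
# Fixed costs (image building, configuration, data staging) are paid once,
# so they matter most for short or one-off campaigns.

def time_to_solution(serial_hours, nodes, efficiency, setup_hours, staging_hours):
    """Total wall-clock hours: one-time overheads plus scaled compute time."""
    compute_hours = serial_hours / (nodes * efficiency)
    return setup_hours + staging_hours + compute_hours

# Hypothetical workload equivalent to 2,000 node-hours of serial work.
local = time_to_solution(2000, nodes=64, efficiency=0.90,
                         setup_hours=0, staging_hours=0)
cloud = time_to_solution(2000, nodes=256, efficiency=0.60,
                         setup_hours=8, staging_hours=6)

print(f"local cluster : {local:6.1f} h")
print(f"cloud         : {cloud:6.1f} h")
# Throwing more (slower, loosely coupled) nodes at the problem can still win
# on time-to-solution, but only after setup and staging costs are amortized.
```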

Data Management. As discussed earlier, scientific applications have a number of data and storage needs. Synchronous applications need access to parallel file systems, and there is also a need for long-term data storage. None of the current cloud storage methods resemble the high-performance parallel file systems that HPC applications typically require and currently have access to at HPC centers. The general storage solutions offered by cloud software stacks include EBS volumes and bucket stores, which at the time of this writing are inadequate for scientific needs in terms of both performance and convenience.
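The mismatch shows up in the access model itself: a parallel file system lets many ranks write concurrently into a shared file through ordinary POSIX calls, while a bucket store moves whole objects over HTTP. The sketch below contrasts the two, using the boto3 S3 client purely as an illustration; the file path, bucket name, and object key are placeholders, and the shared output file is assumed to already exist.

```python
# Illustrative contrast of the two storage models; the path, bucket name,
# and object key are placeholders, not Magellan configuration.
import boto3

RANK, BLOCK = 3, b"x" * 4096          # stand-ins for one MPI rank's local data

# POSIX-style access on a parallel file system (e.g., a Lustre/GPFS mount):
# each rank seeks to its own offset and writes into one pre-existing shared file.
with open("/scratch/project/output.dat", "r+b") as f:
    f.seek(RANK * len(BLOCK))
    f.write(BLOCK)

# S3-style bucket store: data moves as whole objects over HTTP, with no
# byte-range writes into a shared object, so output is typically written
# locally first and uploaded as a separate step.
s3 = boto3.client("s3")
s3.upload_file("/scratch/project/output.dat",
               "my-science-bucket", "runs/output.dat")
```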

6.5 Summary

The open-source cloud software stacks have improved significantly over the duration of the project. However, these software stacks do not address many of the DOE security, accounting, and allocation policy requirements for scientific workloads. Thus, they could still benefit from additional development and testing of system and user tools.

