specific QoS guarantees to avoid users needing to periodically query to see if resources have become available.
Portability. The vision of grid computing has been to let users draw on resources from a diverse set of sites through a common set of job submission and data management interfaces. In practice, however, differing software stacks and software compatibility issues posed significant challenges. Virtualization helps here by making the software environment itself portable. Open source tools such as Eucalyptus and OpenStack ease the transition from local sites to Amazon EC2, but site-specific capabilities and the diversity of cloud interfaces still make it difficult to span multiple sites. In addition, the networking models and security constraints of cloud software stacks make data movement to, and especially from, the cloud relatively expensive, discouraging data portability. These costs also matter when scientists want to post-process results on their desktops or local clusters, or to share output data with colleagues. More detailed cost discussions are presented in Chapter 12.
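To make the portability point concrete, the following is a minimal sketch of launching a virtual machine through an EC2-compatible interface, assuming the boto3 client library and an OpenStack or Eucalyptus installation that exposes such an endpoint; the endpoint URL, credentials, region, and image identifiers are placeholders, not values from the Magellan testbed.

    import boto3

    def launch_instance(endpoint_url, image_id, instance_type="m1.small"):
        """Start one VM through an EC2-compatible interface."""
        ec2 = boto3.client(
            "ec2",
            endpoint_url=endpoint_url,          # None targets public Amazon EC2
            region_name="us-east-1",            # placeholder region
            aws_access_key_id="ACCESS_KEY",     # placeholder credentials
            aws_secret_access_key="SECRET_KEY",
        )
        resp = ec2.run_instances(ImageId=image_id, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        return resp["Instances"][0]["InstanceId"]

    # Same code path, two providers: only the endpoint and image differ.
    # launch_instance(None, "ami-12345678")                         # Amazon EC2
    # launch_instance("https://cloud.example.org:8773/services/Cloud", "emi-87654321")  # local site

The uniform API is what makes moving between a local site and EC2 straightforward; the diversity of site-specific capabilities and non-EC2 interfaces is what breaks it.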
Performance. Tightly coupled synchronous applications that rely on MPI perform poorly on virtual machines and incur a large performance overhead, while codes with minimal or no synchronization, modest I/O requirements, and either large messages or very little communication tend to perform well in cloud virtual machines. Traditional cloud computing platforms are designed for applications with little or no communication between individual nodes and whose results are less affected by failures of individual nodes. This assumption breaks down for large-scale synchronous applications (details in Chapter 9). Additionally, in the cloud model, time-to-solution is a more critical metric than individual node performance; scalability is achieved by adding more nodes to the problem. For this approach to succeed for e-Science, however, the setup and configuration costs of porting a scientific application to the cloud must be understood and factored into the overall time-to-solution metric used in resource selection and management decisions.
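As a rough illustration of why setup and configuration costs belong in the time-to-solution metric, the sketch below compares a cloud scenario (one-time porting effort, short start-up delay) with a batch-system scenario (no porting effort, longer queue wait); all numbers are hypothetical and serve only to show how the one-time cost amortizes over repeated runs.

    def time_to_solution(setup_hours, wait_hours, run_hours, n_runs):
        """One-time setup plus per-run wait and execution time."""
        return setup_hours + n_runs * (wait_hours + run_hours)

    # Hypothetical numbers: 40 hours to port and configure the cloud image,
    # versus a 6-hour average batch queue wait at an HPC center.
    cloud = time_to_solution(setup_hours=40, wait_hours=0.5, run_hours=3.0, n_runs=50)
    hpc   = time_to_solution(setup_hours=0,  wait_hours=6.0, run_hours=2.0, n_runs=50)
    print(f"cloud: {cloud:.0f} h, HPC center: {hpc:.0f} h")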
Data Management. As discussed earlier, scientific applications have a number of data and storage needs. Synchronous applications need access to parallel file systems, and there is also a need for long-term data storage. None of the current cloud storage offerings resembles the high-performance parallel file systems that HPC applications typically require and currently have available at HPC centers. General storage solutions offered by cloud software stacks include EBS volumes and bucket stores, which at the time of this writing are inadequate for scientific needs in both performance and convenience.
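The contrast with a parallel file system shows up in how output data must be handled: instead of a shared POSIX namespace that all compute nodes read and write concurrently, files are copied in and out as whole objects. A minimal sketch follows, assuming the boto3 library and an S3-compatible bucket store; the endpoint, bucket name, and paths are placeholders.

    import pathlib
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://objects.example.org")  # placeholder endpoint
    bucket = "simulation-output"                                         # placeholder bucket

    # Each output file becomes a separate whole-object transfer; unlike
    # Lustre or GPFS, there is no shared file system for nodes to stripe across.
    for path in pathlib.Path("/scratch/run42").glob("*.nc"):
        s3.upload_file(str(path), bucket, f"run42/{path.name}")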
6.5 Summary<br />
The open-source cloud software stacks have improved significantly over the duration of the project. However, these software stacks do not address many of the DOE security, accounting, and allocation policy requirements for scientific workloads, and they would still benefit from additional development and testing of system and user tools.