Magellan Final Report - Office of Science - U.S. Department of Energy
for optimizations necessary for particular hardware). In cloud systems, the end-user is responsible for<br />
managing all aspects of the operating system and other software that might be required by the application.<br />
The user has to identify, manage, operate, and maintain the operating system and dependent libraries<br />
in addition to their specific application software. This requires end-users to have system administration<br />
skills themselves, or additional support. Thus, centers will need to provide tools and support for managing a<br />
diverse set of kernels and operating systems that might be required by specific groups. The clear separation<br />
of responsibilities for software upgrades and operating system patches no longer exists, and sites will need<br />
mechanisms to bridge the gap between user-supported images and site security policies.<br />
Tools exist to bundle a running operating system and upload it to the cloud system. However, some<br />
customization is typically required. Users need to have an understanding of standard Linux system administration,<br />
including managing ssh daemons, ntp, etc. Furthermore, debugging and testing can be tedious,<br />
since it often requires repacking and booting instances to verify the correct behavior. This also requires<br />
experimentation to determine which applications and data are best included in the image and which are best handled through<br />
some other mechanism. The simplicity of bundling everything in the image needs to be balanced with the<br />
need to make dynamic changes to applications and data. This process is complex and often requires users to<br />
carefully analyze what software pieces will be required for their application, including libraries, utilities, and<br />
supporting datasets. If the application or supporting datasets are extremely large or change quickly, then<br />
the user stores the data outside of the image because of limits on image size and its impact on virtual machine<br />
boot-up times and on the memory that is available to the application at run time. All our scientific users who had<br />
to create images ranked the task as medium or hard.<br />
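The trade-off described above (bundle in the image for simplicity, keep large or fast-changing pieces outside it) can be sketched as a simple placement rule. This is only an illustration: the `Component` type, the `placement` and `plan` functions, and both thresholds are hypothetical, since the report gives no numeric limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the report does not state numeric limits.
MAX_BUNDLE_BYTES = 2 * 1024**3   # assumed per-component size cap for the image
MAX_CHANGES_PER_MONTH = 1        # assumed cutoff for "changes quickly"


@dataclass
class Component:
    """One software piece or dataset the application depends on."""
    name: str
    size_bytes: int
    changes_per_month: float


def placement(c: Component) -> str:
    """Return 'image' or 'external' for a single component."""
    if c.size_bytes > MAX_BUNDLE_BYTES:
        return "external"   # too large: hurts image size and boot-up time
    if c.changes_per_month > MAX_CHANGES_PER_MONTH:
        return "external"   # changes too often: would force frequent repacking
    return "image"


def plan(components):
    """Map each component name to its placement."""
    return {c.name: placement(c) for c in components}
```

In practice the thresholds would come from the experimentation the text describes, not from fixed constants.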
To help user groups try out cloud computing without the setup costs, we set up a virtual cluster<br />
with a PBS queue that users could submit jobs to. This made the virtual environment look largely similar to<br />
the HPC environments. This environment allowed users to try out the virtual environment and compare it<br />
to bare-metal performance. The success of this approach is reflected in the remark of one user:<br />
“Everything works beautifully on the cluster! I deployed our software on the gatekeeper and ran a job<br />
through PBS. I didn’t hit a single problem. From my perspective it makes no difference.”<br />
11.4.3 On-Demand Bare-Metal Provisioning<br />
As illustrated through the use cases presented in Section 11.1, on-demand bare-metal provisioning through<br />
serial queues, reservations, and HaaS can meet many of the needs of scientific users seeking the extra<br />
features of cloud environments such as custom environments. Bare-metal provisioning provides a number of<br />
advantages, including access to high-performance parallel file systems, better performance for applications<br />
that cannot tolerate the virtualization overhead, and access to specialized hardware for which virtualization<br />
does not currently exist (e.g., GPUs).<br />
11.4.4 User Support<br />
It is important to understand that a large number of the users of Magellan had already used cloud resources,<br />
had significant experience with virtual environments and system administration, or had computer scientists<br />
or IT personnel on the project helping. In spite of this previous experience, these users faced numerous challenges.<br />
Scientific users with less experience in cloud tools require additional user support and training to help them<br />
port and manage applications in these environments; 96% of the users who responded to the survey (including<br />
computer science and IT personnel) said they needed help from the Magellan staff. Most Magellan users were<br />
able to overcome the challenges of using these environments and would consider using these environments<br />
again.<br />
11.4.5 Workflow Management<br />
Another challenge in using cloud systems is developing a mechanism to distribute work. This is complicated<br />
by the fact that cloud systems like Amazon Web Services are inherently ephemeral and subject to failure.<br />
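One common way to cope with ephemeral instances is to requeue any task whose worker disappears. The sketch below illustrates that idea only; it is not the mechanism used by Magellan users. `WorkerLost`, `run_all`, and the `run_task` callback are hypothetical names, and a real system would dispatch tasks to remote instances rather than call them in-process.

```python
from collections import deque


class WorkerLost(Exception):
    """Hypothetical signal that the instance running a task went away."""


def run_all(tasks, run_task, max_attempts=3):
    """Run every task, requeueing any whose worker was lost.

    tasks:     iterable of task identifiers
    run_task:  callable(task) -> result; raises WorkerLost on instance failure
    Returns (results, failed), where failed lists tasks that used up
    all their attempts.
    """
    pending = deque((task, 1) for task in tasks)
    results, failed = {}, []
    while pending:
        task, attempt = pending.popleft()
        try:
            results[task] = run_task(task)
        except WorkerLost:
            if attempt < max_attempts:
                # Put the task at the back of the queue for another try.
                pending.append((task, attempt + 1))
            else:
                failed.append(task)
    return results, failed
```

Capping the number of attempts keeps a task that always fails from blocking the rest of the work; such tasks are reported back to the caller instead.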