
Magellan Final Report - Office of Science - U.S. Department of Energy


for optimizations necessary for particular hardware). In cloud systems, the end-user is responsible for managing all aspects of the operating system and any other software the application requires. The user has to identify, manage, operate, and maintain the operating system and dependent libraries in addition to their specific application software. This requires end-users either to have system administration skills themselves or to obtain additional support. Thus, centers will need to provide tools and support for managing the diverse set of kernels and operating systems that specific groups might require. The clear separation of responsibilities for software upgrades and operating system patches no longer exists, and sites will need mechanisms to bridge the gap between supporting user-maintained images and enforcing site security policies.

Tools exist to bundle a running operating system and upload it to the cloud system. However, some customization is typically required. Users need to understand standard Linux system administration, including managing ssh daemons, ntp, etc. Furthermore, debugging and testing can be tedious, since they often require repacking the image and booting instances to verify correct behavior. Users also need to experiment to determine which applications and data are best included in the image and which are better handled through some other mechanism. The simplicity of bundling everything in the image needs to be balanced against the need to make dynamic changes to applications and data. This process is complex and often requires users to carefully analyze which software pieces their application will require, including libraries, utilities, and supporting datasets. If the application or supporting datasets are extremely large or change quickly, the user stores the data outside the image, both because of limits on image size and because of the image's impact on virtual machine boot-up times and on the memory available to the application at run time. All of our scientific users who had to create images ranked the task as medium or hard in difficulty.
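The trade-off described above, between bundling components into the image and handling them through another mechanism, can be sketched as a simple planning function. This is a minimal illustration under assumed inputs, not a tool used on Magellan: the `Component` model, the 10 GiB image budget, and the one-change-per-month stability threshold are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds for illustration only; real limits depend on the
# cloud software stack and on site policy.
MAX_IMAGE_BYTES = 10 * 1024**3   # assumed image-size budget: 10 GiB
MAX_CHANGES_PER_MONTH = 1.0      # assumed: anything updated more often stays outside


@dataclass
class Component:
    """A library, tool, or dataset the application needs (hypothetical model)."""
    name: str
    size_bytes: int
    changes_per_month: float


def plan_image(components: List[Component]) -> Tuple[List[str], List[str]]:
    """Split components into those baked into the image and those kept outside.

    Stable components are packed smallest-first until the size budget is used up;
    fast-changing or oversized components are left for another mechanism, such as
    a shared file system or object store accessed at run time.
    """
    bundled: List[str] = []
    external: List[str] = []
    used = 0
    for comp in sorted(components, key=lambda c: c.size_bytes):
        is_stable = comp.changes_per_month <= MAX_CHANGES_PER_MONTH
        if is_stable and used + comp.size_bytes <= MAX_IMAGE_BYTES:
            bundled.append(comp.name)
            used += comp.size_bytes
        else:
            external.append(comp.name)
    return bundled, external


parts = [
    Component("libraries", 2 * 1024**3, 0.1),
    Component("analysis-scripts", 50 * 1024**2, 4.0),
    Component("reference-db", 200 * 1024**3, 0.5),
]
bundled, external = plan_image(parts)
print(bundled)   # ['libraries']
print(external)  # ['analysis-scripts', 'reference-db']
```

Packing stable components smallest-first keeps as many of them in the image as the budget allows. A real site would substitute its own image-size limit and would also weigh boot time and run-time memory, which this sketch does not model.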

To help user groups try out cloud computing without the setup costs, we set up a virtual cluster with a PBS queue to which users could submit jobs. This made the virtual environment look largely similar to the HPC environments. This environment allowed users to try out the virtual environment and compare it to bare-metal performance. The success of this approach can be seen in the remark of one user: “Everything works beautifully on the cluster! I deployed our software on the gatekeeper and ran a job through PBS. I didn’t hit a single problem. From my perspective it makes no difference.”
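From the user's side, submitting to such a virtual cluster looks the same as submitting to a conventional PBS system. The sketch below builds a batch script with common Torque/PBS directives (`-N`, `-l nodes=...:ppn=...`, `-l walltime=...`) and pipes it to `qsub` on standard input. The job name, resource requests, and application command are hypothetical; the report does not describe how the Magellan queue was configured, and directive syntax varies between PBS flavors.

```python
import shutil
import subprocess


def pbs_script(job_name: str, nodes: int, ppn: int, walltime: str, command: str) -> str:
    """Build a minimal PBS batch script using common Torque/PBS directives."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        'cd "$PBS_O_WORKDIR"',  # PBS sets this to the directory qsub was run from
        command,
        "",
    ])


def submit(script: str) -> str:
    """Pipe the script to qsub on standard input and return the job ID it prints."""
    if shutil.which("qsub") is None:
        raise RuntimeError("qsub not found; run this on the cluster's submit host")
    result = subprocess.run(["qsub"], input=script, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical job: 2 virtual nodes with 8 cores each running an MPI application.
    script = pbs_script("magellan-test", nodes=2, ppn=8, walltime="01:00:00",
                        command="mpirun ./my_app input.dat")
    print(script)
    # job_id = submit(script)  # uncomment on a host where qsub is available
```

Because the same script would run unchanged on a bare-metal PBS cluster, this is the kind of familiarity that let users compare the virtual environment with bare-metal performance.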

11.4.3 On-Demand Bare-Metal Provisioning

As illustrated through the use cases presented in Section 11.1, on-demand bare-metal provisioning through serial queues, reservations, and HaaS can meet many of the needs of scientific users seeking the extra features of cloud environments, such as custom environments. Bare-metal provisioning provides a number of advantages, including access to high-performance parallel file systems, better performance for applications that cannot tolerate the virtualization overhead, and access to specialized hardware for which virtualization support does not currently exist (e.g., GPUs).
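The advantages listed above amount to a rough decision rule for when a job should request bare-metal nodes rather than virtual machines. The following is a hypothetical sketch of that rule; the `JobNeeds` fields, the default 10% overhead tolerance, and the overhead figures passed in are assumptions for illustration, since the acceptable virtualization overhead depends on the application and must be measured.

```python
from dataclasses import dataclass


@dataclass
class JobNeeds:
    """Hypothetical summary of what a job requires from the platform."""
    needs_gpu: bool = False            # specialized hardware without virtualization support
    needs_parallel_fs: bool = False    # direct access to a high-performance parallel file system
    overhead_tolerance: float = 0.10   # assumed largest acceptable slowdown vs. bare metal


def choose_provisioning(job: JobNeeds, expected_vm_overhead: float) -> str:
    """Return "bare-metal" or "virtual-machine" following the trade-offs in the text.

    expected_vm_overhead is the measured (or estimated) fractional slowdown of the
    application under virtualization, e.g. 0.05 for 5%.
    """
    if job.needs_gpu or job.needs_parallel_fs:
        return "bare-metal"
    if expected_vm_overhead > job.overhead_tolerance:
        return "bare-metal"
    return "virtual-machine"


print(choose_provisioning(JobNeeds(needs_gpu=True), expected_vm_overhead=0.05))  # bare-metal
print(choose_provisioning(JobNeeds(overhead_tolerance=0.2), expected_vm_overhead=0.05))  # virtual-machine
print(choose_provisioning(JobNeeds(), expected_vm_overhead=0.3))  # bare-metal
```

A need for a custom software environment is deliberately not an input here: as the text notes, bare-metal provisioning can also supply custom environments, so that requirement alone does not force the choice.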

11.4.4 User Support

It is important to understand that many of the users of Magellan had already used cloud resources, had significant experience with virtual environments and system administration, or had computer scientists or IT personnel on the project helping them. Despite this previous experience, these users faced numerous challenges. Scientific users with less experience with cloud tools require additional user support and training to help them port and manage applications in these environments; 96% of the users who responded to the survey (including computer science and IT personnel) said they needed help from the Magellan staff. Most Magellan users were able to overcome the challenges of using these environments and would consider using them again.

11.4.5 Workflow Management

Another challenge in using cloud systems is developing a mechanism to distribute work. This is complicated by the fact that cloud systems such as Amazon Web Services are inherently ephemeral and subject to failure.
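One common way to distribute work across instances that may disappear without warning is to lease tasks rather than hand them off permanently: a worker must acknowledge completion before its lease expires, or the task is returned to the queue for another worker. This is similar in spirit to the visibility timeout in Amazon SQS. The in-memory `LeaseQueue` below is a hypothetical sketch of the idea, not the mechanism Magellan users adopted. In practice the queue itself must live on a durable service outside the ephemeral workers, and tasks should be idempotent, because a slow worker's task may end up executed twice.

```python
import itertools
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Iterator, Optional, Set, Tuple


@dataclass
class LeaseQueue:
    """In-memory work queue that leases tasks to workers instead of removing them.

    A task whose lease expires before it is acknowledged (for example, because the
    worker's instance was terminated) is put back on the queue for another worker.
    """
    lease_seconds: float = 300.0
    clock: Callable[[], float] = time.monotonic
    done: Set[str] = field(default_factory=set)
    _pending: Deque[str] = field(default_factory=deque, init=False)
    _leased: Dict[int, Tuple[str, float]] = field(default_factory=dict, init=False)
    _ids: Iterator[int] = field(default_factory=itertools.count, init=False)

    def put(self, task: str) -> None:
        self._pending.append(task)

    def _requeue_expired(self) -> None:
        """Return every task whose lease has lapsed to the back of the queue."""
        now = self.clock()
        for lease_id, (task, expiry) in list(self._leased.items()):
            if expiry <= now:
                del self._leased[lease_id]
                self._pending.append(task)

    def get(self) -> Optional[Tuple[int, str]]:
        """Lease the next task as (lease_id, task), or return None if none is available."""
        self._requeue_expired()
        if not self._pending:
            return None
        task = self._pending.popleft()
        lease_id = next(self._ids)
        self._leased[lease_id] = (task, self.clock() + self.lease_seconds)
        return lease_id, task

    def ack(self, lease_id: int) -> bool:
        """Mark a leased task complete; returns False if that lease expired (or is unknown)."""
        self._requeue_expired()
        entry = self._leased.pop(lease_id, None)
        if entry is None:
            return False
        self.done.add(entry[0])
        return True


# Simulate a worker failure with a controllable clock.
now = [0.0]
q = LeaseQueue(lease_seconds=60.0, clock=lambda: now[0])
q.put("chunk-1")
lease_a, task_a = q.get()   # worker A leases chunk-1, then its instance disappears
now[0] = 61.0               # A's lease expires without an acknowledgement
lease_b, task_b = q.get()   # chunk-1 is leased again, this time to worker B
print(task_b, q.ack(lease_b), q.ack(lease_a))   # chunk-1 True False
```

Injecting the clock makes the failure case easy to simulate, as in the last few lines of the block; with the default `time.monotonic`, the same queue would requeue real tasks after `lease_seconds` of silence from a worker.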
