29.12.2014 Views

Magellan Final Report - Office of Science - U.S. Department of Energy


configurations. User expertise plays a role here, as it does on HPC systems: some users are comfortable
building new configurations and designing system infrastructure, while many others are not.
This difference in capabilities demonstrates an important set of trade-offs between cloud and HPC system
support models: purpose-built APIs (like those provided by HPC centers) can provide deep support
for a relatively fixed set of services, while general-purpose APIs (like those provided by IaaS cloud systems)
can only provide high-level support efficiently. The crux of the user support issue on cloud systems is to
determine the level of support, from complete to partial, for each kind of user activity.

7.2 Magellan User Support Model and Experience

Both Magellan sites leveraged their extensive experience supporting diverse user communities to support
the Magellan user groups. In addition, the Argonne Magellan user support model benefited from extensive
experience supporting users of experimental hardware testbeds. These testbeds are similar to clouds in that
users often need administrative access to resources. This experience enabled us to understand the trade-off
between depth of support and range of user activities required for private cloud environments. Both sites
also relied heavily on existing systems to manage mailing lists, track tickets, and manage user accounts. Our
user support model consisted of five major areas: monitoring system services, direct support of the high-level
tools and APIs provided by the system, construction and support of a set of baseline VM configurations,
training sessions, and building a community support base for more complex problems. We discuss each
of these areas below.

Monitoring. Monitoring is a critical component of providing any user service, even if the services are being
offered as a testbed. While cloud APIs provide access to more generic functionality than the traditional
HPC system software stack, basic testing is quite straightforward. An initial step to monitor system operational
status was to extend the existing monitoring infrastructure to cover the cloud systems. At ALCF, an
existing test harness was extended to run system correctness tests on Magellan easily and often. The
test harness ran a variety of tests, from allocating VM instances to running performance benchmarks. These
tests confirmed that the system was performing reliably and consistently over time. A similar approach
was used at NERSC, where tests were run on a routine basis to ensure that instances could be spawned,
networking could be configured, and the storage system was functioning correctly. These tests enabled the
centers to proactively address the most routine problems.
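
The routine checks described above can be pictured as a small harness that runs a list of named tests and records which ones fail. The following is only a minimal sketch in Python, not the harness used at ALCF or NERSC: every function name is hypothetical, and the check bodies are placeholders where a real deployment would call its own cloud API. The storage check raises a simulated error purely to show how a failure is reported.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    seconds: float
    detail: str = ""


def run_checks(checks: List[Tuple[str, Callable[[], None]]]) -> List[CheckResult]:
    """Run each (name, check) pair; a check passes if it returns without raising."""
    results = []
    for name, check in checks:
        start = time.monotonic()
        try:
            check()
            passed, detail = True, ""
        except Exception as exc:  # record the failure and continue with the next check
            passed, detail = False, f"{type(exc).__name__}: {exc}"
        results.append(CheckResult(name, passed, time.monotonic() - start, detail))
    return results


def failed_checks(results: List[CheckResult]) -> List[str]:
    """Return the names of the checks that did not pass."""
    return [r.name for r in results if not r.passed]


# Hypothetical placeholder checks (assumptions, not from the report).
def check_instance_spawn() -> None:
    """Placeholder: request a VM instance and wait until it is running."""


def check_network() -> None:
    """Placeholder: configure networking for the instance and reach it."""


def check_storage() -> None:
    """Placeholder: attach storage, then write and read back a file."""
    raise RuntimeError("simulated: volume attach timed out")


if __name__ == "__main__":
    results = run_checks([
        ("spawn", check_instance_spawn),
        ("network", check_network),
        ("storage", check_storage),
    ])
    for r in results:
        print(f"{r.name}: {'ok' if r.passed else 'FAILED ' + r.detail}")
```

Catching the exception inside the loop means one failing check does not prevent the remaining checks from running, so a scheduled run always yields a complete status report.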

Cloud setup support. The second component of the support model was helping users make use of the
basic cloud services provided by Magellan. These activities included generating user credentials for access
to the cloud services and providing documentation to address routine questions about how to set up simple
VM instance deployments, storage, networks, and security policies. This approach provided enough help for
users to get started, and is analogous in an HPC setting to ensuring that users can log in to a system and
submit jobs.

Image support. Our initial assessment suggested that supporting more complex user activities would be
prohibitively expensive, so we opted to take a different approach. Instead of direct support, we provided
pre-configured baselines that users could build on without needing to start from scratch: a number of
starting VM instance configurations that were verified to work, as well as
several recipes for building common network configurations and storage setups. This approach was relatively
effective; users could easily build on the basic images and documented examples. NERSC also provided tools
to automate building virtual clusters with pre-configured services such as a batch system and an NFS file
system.
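
To illustrate what automating a virtual cluster build involves, the sketch below turns a short cluster description into a per-node list of services. It is an assumption-laden illustration, not the NERSC tooling: the `ClusterSpec` fields, the service labels, and the choice to place the batch server and NFS server on a single head node are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClusterSpec:
    base_image: str            # hypothetical name of a verified baseline image
    workers: int               # number of worker instances besides the head node
    batch_system: bool = True  # include a batch system
    nfs: bool = True           # include a shared NFS file system


def plan_cluster(spec: ClusterSpec) -> Dict[str, List[str]]:
    """Map each node name to the services it should run (hypothetical labels)."""
    if spec.workers < 1:
        raise ValueError("a virtual cluster needs at least one worker")
    head: List[str] = []
    worker: List[str] = []
    if spec.batch_system:
        head.append("batch-server")
        worker.append("batch-client")
    if spec.nfs:
        head.append("nfs-server")
        worker.append("nfs-client")
    plan = {"head": head}
    for i in range(spec.workers):
        plan[f"worker{i}"] = list(worker)  # each worker gets its own copy
    return plan


if __name__ == "__main__":
    for node, services in plan_cluster(ClusterSpec("baseline", workers=2)).items():
        print(node, services)
```

Separating the plan from the code that launches instances keeps the description easy to inspect and test before any resources are allocated.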

Documentation and Tutorials. Both sites also provided tutorials and online documentation to introduce
users to cloud models and address the most common questions and issues. Tutorials were organized for both
