Magellan Final Report - Office of Science - U.S. Department of Energy
configurations. User expertise plays a role here, as it does on HPC systems; some users are comfortable
building new configurations and designing system infrastructure, while many others are not.
This difference in capabilities demonstrates an important set of trade-offs between cloud and HPC system
support models: purpose-built APIs (like those provided by HPC centers) can provide deep support
for a relatively fixed set of services, while general-purpose APIs (like those provided by IaaS cloud systems)
can only provide high-level support efficiently. The crux of the user support issue on cloud systems is to
determine the level of support for various activities, ranging from complete to partial support.
7.2 Magellan User Support Model and Experience
Both Magellan sites leveraged their extensive experience supporting diverse user communities to support
the Magellan user groups. In addition, the Argonne Magellan user support model benefited from extensive
experience supporting users of experimental hardware testbeds. These testbeds are similar to clouds in that
users often need administrative access to resources. This experience enabled us to understand the trade-off
between depth of support and range of user activities required for private cloud environments. Both sites
also relied heavily on existing systems to manage mailing lists, track tickets, and manage user accounts. Our
user support model consisted of five major areas: monitoring system services, direct support of the high-level
tools and APIs provided by the system, construction and support of a set of baseline VM configurations,
training sessions, and building a community support base for more complex problems. We discuss each
of these areas in turn.
Monitoring. Monitoring is a critical component of providing any user service, even if the services are being
offered as a testbed. While cloud APIs provide access to more generic functionality than the traditional
HPC system software stack, basic testing is quite straightforward. An initial step in monitoring the systems'
operational status was to extend the existing monitoring infrastructure to cover the cloud systems. At ALCF, an
existing test harness was extended to run system correctness tests on Magellan easily and often. The
test harness ran a variety of tests, from allocating VM instances to running performance benchmarks. These
tests confirmed that the system was performing reliably and consistently over time. A similar approach
was used at NERSC, where tests were run on a routine basis to ensure that instances could be spawned,
networking could be configured, and the storage system was functioning correctly. These tests enabled the
centers to proactively address the most routine problems.
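The kind of routine correctness testing described above can be sketched in a few lines. The following is a minimal illustration only, not the ALCF or NERSC harness: the check functions (`spawn_instance`, `configure_network`, `storage_round_trip`) are hypothetical placeholders that a site would replace with calls to its own cloud API, and the failure in the storage check is simulated to show how a problem would be reported.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    seconds: float
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[CheckResult]:
    """Run each check once; a check passes if it returns without raising."""
    results = []
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            results.append(CheckResult(name, True, time.monotonic() - start))
        except Exception as exc:
            # Record the failure and keep going so one broken service
            # does not hide the status of the others.
            results.append(
                CheckResult(name, False, time.monotonic() - start, str(exc))
            )
    return results


def failed(results: List[CheckResult]) -> List[str]:
    """Names of the checks that did not pass."""
    return [r.name for r in results if not r.passed]


# Hypothetical placeholder checks (assumptions, not real cloud API calls).
def spawn_instance() -> None:
    pass  # e.g. request a VM instance and wait until it is running


def configure_network() -> None:
    pass  # e.g. assign an address and verify the instance is reachable


def storage_round_trip() -> None:
    # Simulated failure, to show how a problem is surfaced.
    raise RuntimeError("simulated: volume attach timed out")


CHECKS = {
    "spawn_instance": spawn_instance,
    "configure_network": configure_network,
    "storage_round_trip": storage_round_trip,
}

if __name__ == "__main__":
    for r in run_checks(CHECKS):
        status = "ok" if r.passed else "FAIL"
        print(f"{r.name}: {status} ({r.seconds:.3f}s) {r.detail}")
```

Run on a schedule, a script of this shape gives a per-service pass/fail history, which is the kind of signal that lets operators catch routine problems before users report them.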
Cloud setup support. The second component of the support model was helping users make use of the
basic cloud services provided by Magellan. These activities included generating user credentials for access
to the cloud services and providing documentation to address routine questions about how to set up simple
VM instance deployments, storage, networks, and security policies. This approach provided enough help for
users to get started, and is analogous in an HPC setting to ensuring that users can log in to a system and
submit jobs.
Image support. Our initial assessment suggested that supporting more complex user activities would be
prohibitively expensive, so we opted to take a different approach. Instead of direct support, we provided a
number of pre-configured baseline configurations that users could build on, without needing to start from
scratch. These included several starting VM instance configurations that were verified to work, as well as
several recipes for building common network configurations and storage setups. This approach was relatively
effective; users could easily build on the basic images and documented examples. NERSC also provided tools
to automate building virtual clusters with pre-configured services such as a batch system and an NFS file
system.
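The idea of building on a verified baseline rather than starting from scratch can be illustrated with a small sketch. Everything here is hypothetical: the `ImageSpec` type, the `derive` helper, and the package and service names are illustrative assumptions and do not describe the actual Magellan images or the NERSC virtual cluster tools.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical description of a VM image configuration."""
    name: str
    base_os: str
    packages: Tuple[str, ...] = ()
    services: Tuple[str, ...] = ()


def derive(
    base: ImageSpec,
    name: str,
    extra_packages: Iterable[str] = (),
    extra_services: Iterable[str] = (),
) -> ImageSpec:
    """Return a new spec that adds to a baseline without modifying it."""
    return replace(
        base,
        name=name,
        packages=base.packages + tuple(extra_packages),
        services=base.services + tuple(extra_services),
    )


# A verified starting point maintained by the support staff (illustrative).
BASELINE = ImageSpec(
    name="baseline",
    base_os="linux",
    packages=("openssh-server",),
)

# A user-level customization layered on top of the baseline (illustrative).
cluster_node = derive(
    BASELINE,
    "cluster-node",
    extra_packages=("nfs-client",),
    extra_services=("batch-worker",),
)
```

Because the baseline is immutable, the supported configuration stays intact while each user variant records only what it adds, which keeps the boundary between supported and user-maintained pieces clear.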
Documentation and Tutorials. Both sites also provided tutorials and online documentation to introduce<br />
users to cloud models and address the most common questions and issues. Tutorials were organized for both<br />