Magellan Final Report - Office of Science - U.S. Department of Energy
Chapter 7
User Support
Many of the aspects of cloud computing that make it so powerful also introduce new complexities and
challenges for both users and user support staff. Cloud computing provides users the flexibility to customize
their software stack, but it comes with the additional burden of managing the stack. Commercial cloud
providers have a limited user support model and typically additional support comes at an extra cost. This
chapter describes the user support model that was used for the Magellan project, including some of the
challenges that emerged during the course of the project. We discuss the key aspects of cloud computing
architecture that have bearing on user support. We discuss several examples of usage patterns of users and
how these were addressed. Finally, we summarize the overall assessment of the user support process for
mid-range computing users on cloud platforms.
7.1 Comparison of User Support Models
HPC centers provide a well-curated environment for robust, high-performance computing, which they make
accessible to non-expert users through a variety of activities. In these environments, substantial effort is put
into helping users to be productive and successful on the hosted platform. These efforts take a number of
forms, from building a tuned software environment that is optimized for HPC workloads, to user education,
and application porting and optimization. These efforts are important to the success of current and new
users in HPC facilities, as many computational scientists are not necessarily deeply knowledgeable in terms
of the details of modern computing hardware and software architecture.
HPC centers typically provide a single system software stack, paired with purpose-built hardware, and a
set of policies for user access and prioritization. Users rely on a relatively fixed set of interfaces for interaction
with the resource manager, file system, and other facility services. Many HPC use cases are well covered
within this scope; for example, this environment is well suited to MPI applications that perform I/O to a
parallel file system. Other use cases, such as high-throughput computing and data-intensive computing, may
not be so well supported at HPC centers. For example, computer scientists developing low-level runtime
software for HPC applications have a particularly difficult time performing this work at production computing
centers. Also, deploying Hadoop on demand for computations could be performed within the framework of
a traditional HPC system, albeit with significant effort and in a less optimized fashion.
Cloud systems provide Application Programming Interfaces (APIs) for low-level resource provisioning.
These APIs enable users to provision new virtual machines, storage, and network resources. These resources
are configured by the user and can be built into complex networks including dozens, hundreds, or potentially
even thousands of VMs with distinct software configurations, security policies, and service architectures. The
flexibility of the capabilities provided by cloud APIs is substantial, allowing users to manage clusters built
out of virtual machines hosted inside of a cloud. This power comes at some cost in terms of support. The
cloud model offers a large amount of flexibility, making it difficult and expensive to provide support to cloud
users. The opportunities for errors or mistakes greatly increase once a user begins to modify virtual machine