Magellan Final Report - Office of Science - U.S. Department of Energy

Chapter 8 Security

Security personnel at both Argonne and NERSC worked alongside the systems engineers, support staff, users, and others during this project. The goals for the security work on Magellan included assessing the system and designing security controls to protect the cloud testbed. In addition, the security teams were expected to view the testbed through the lens of DOE security requirements and report on whether a cloud system can meet those requirements.

In this chapter, we discuss the security work done on the Magellan system as well as assessments of how DOE standards apply to the cloud testbed. It is important to keep in mind that these discussions are based on the Magellan cloud testbed and would not necessarily extend to public cloud systems such as Amazon EC2. Assessment of Amazon EC2 and other public cloud offerings is outside the scope of this project.

The two cloud testbeds had similar architectures, but each had features unique to its site. Argonne used both the Eucalyptus and OpenStack cloud stacks on its system during the project. NERSC used Eucalyptus for its testbed. The basic configuration of both Eucalyptus and OpenStack included a server that provided user-accessible web services, a server that handled network connectivity with the virtualized systems, a server that handled storage, and the compute nodes that ran the virtualized system images. Each of these services can be installed on a single system or distributed. In the case of Eucalyptus, the three core services were installed on a single system. ALCF also experimented with distributing the services over a number of systems with OpenStack. Both systems provided a similar architecture and common aggregation points for user interaction and for network activity generated by the virtualized systems.

8.1 Experiences on Deployed Security

There are limited service provider security tools available today.
Thus, we faced a number of challenges in identifying how to perform the usual work of observation and analysis. As with any new resource, we examined what (if anything) differentiated cloud computing from any other large multiuser system. In particular, we evaluated how the cloud model differs from the shell access/job submission model that we currently provide across our production systems. From this, we were able to define the problems we had to solve and identify how to reconfigure the basic system configuration or build tools to solve them. Most of this work is an early prototype: the initial steps toward a production-quality set of tools and procedures that would work in conjunction with the more traditional tools and procedures used in HPC. Our work could be thought of as a special case of the more general problem of HPC security, i.e., a continuum of risk ranging from nominal user interaction to user shell, root shell, and finally whole system image.

The set of defenses we implemented can be roughly divided into two groups: static and dynamic. Examples of static defenses are account management, port scanning, and firewalls. Dynamic defenses interact with the cloud instance, interpreting actions and recording behaviors based on well-defined local security policy.
Firewalls are a natural tool to protect sensitive infrastructure components from intrusion from unwanted address space. One of our first steps was to isolate the cloud, cluster, and node controllers from the address space used by the virtual instances, so that the infrastructure was reachable only from a small address space. It is also necessary to prevent spoofing of infrastructure address space by user-initiated images, since VM instances can create arbitrarily addressed network traffic. Another simple test is to scan public-facing address space for unexpected ssh ports as well as exceptionally susceptible accounts (such as password-less root accounts). Mandating a full system scan before releasing a VM for public access was discussed and a method was proposed, but this method was not implemented on the Magellan deployment.

Taking a closer look at user activities, both in terms of site security policy and machine-readable logging, was another step in making the cloud systems more equivalent to our current batch systems. Enforcing local computer security policy on dynamic user-run virtual machine systems (such as Eucalyptus or EC2) is not something that traditional intrusion detection systems excel at, since they tend to be designed with fairly rigid notions of the mappings between systems, users, and addresses. To resolve this problem, we used data from two locations. First, we used the EC2 branch of the Python Boto package (via euca2ools) to query the Cloud Controller for information about registered instances, IP addresses, users, and instance group firewall rules. This provided a clean interface and data source for actively querying the Cloud Controller. The second source takes advantage of the Eucalyptus network architecture, since all ingress/egress as well as intra-instance network traffic passes through a choke point on the cloud controller.
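As a rough illustration of the kind of check that can be applied to instance group firewall rules once they have been retrieved from the Cloud Controller, the sketch below flags rules that expose ssh to every source address. The record layout (FirewallRule), the RESTRICTED_PORTS policy, and the function names are assumptions made for this example; they are not the scripts used on Magellan, and the Boto query that would supply the records is deliberately omitted.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List


@dataclass(frozen=True)
class FirewallRule:
    """One ingress rule of an instance group (hypothetical record shape)."""
    group: str
    protocol: str
    from_port: int
    to_port: int
    cidr: str


# Hypothetical local policy: ports that must not be open to every source.
RESTRICTED_PORTS = {22}  # ssh


def is_world_open(cidr: str) -> bool:
    """Return True if the CIDR block covers the entire address space."""
    return ip_network(cidr, strict=False).prefixlen == 0


def violations(rules: List[FirewallRule]) -> List[FirewallRule]:
    """Return the TCP rules that expose a restricted port to any source."""
    flagged = []
    for rule in rules:
        covers_restricted = any(
            rule.from_port <= port <= rule.to_port for port in RESTRICTED_PORTS
        )
        if rule.protocol == "tcp" and covers_restricted and is_world_open(rule.cidr):
            flagged.append(rule)
    return flagged
```

A site would call violations() on the rules collected for each instance group and act on whatever is returned; a real policy would likely cover more ports and protocols than this single ssh check.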
The data from both sources (the Cloud Controller queries and the network choke point) was then processed by an instance of the Bro intrusion detection system. This provides two principal benefits. The first is the ability to express local site security policy by configuring a set of scripts we created for this purpose. The data from Bro helps track successful and failed logins, VM instantiation, inter-cluster scan detection, and rules about system firewall configurations. Once the local policy is defined, activities that violate that policy can be quickly identified and acted on. The second is that all activity was logged in a well-defined format.

8.2 Challenges Meeting Assessment and Authorization Standards

While the technical aspects of computer security tend to gather the most attention, significant work is required to ensure that any large-scale technology can be placed in the context of the current language found in Security Controls and Security Assessment and Authorization (A&A). Since any detailed interpretation of these documents is far outside the scope of this project, we will only discuss a few key points. The best current documentation on cloud resources and security guidance is the FedRAMP [27] document, which covers three main areas:

• List of baseline security controls for low and moderate impact cloud systems. NIST Special Publication 800-53R3 provides the foundation for the development of these security controls.

• Processes by which authorized cloud computing systems will be monitored continuously. This draft defines continuous monitoring deliverables, reporting frequency, and responsibility for cloud service provider compliance with the Federal Information Security Management Act.
• Proposed operational approaches for assessment and authorization of cloud computing systems that reflect all aspects of an authorization, including sponsorship, leveraging, maintenance, and continuous monitoring; a joint authorization process; and roles and responsibilities for federal agencies and cloud service providers. This is detailed under the risk management framework in NIST Special Publication 800-37R1.

Since a traditional HPC site will have covered a reasonable number of these controls in its regular A&A work, we will focus on a small number of unique issues presented by providers of cloud computing infrastructure.