9.1 Understanding the Impact of Virtualization and Interconnect Performance
The benchmarking was conducted in three phases. In Phase 1, we compared the performance of a commercial cloud platform (Amazon EC2) with NERSC systems (Carver and Franklin) and a mid-range computing system (Lawrencium) at LBNL. The Carver and Magellan systems are identical in hardware configuration, and the traditional cluster configuration for Magellan uses the same software configuration as Carver. The typical problem configurations for the NERSC-6 benchmarks (described below) are defined for much larger systems, so we constructed reduced problem size configurations to target the requirements of mid-range workloads. In Phase 2, we repeated the experiments with the reduced problem size configurations on Magellan hardware to further understand and characterize the impact of virtualization. Finally, in Phase 3, we studied the effect of scaling and interconnects on the Magellan testbed and Amazon.
9.1.1 Applications Used in Study<br />
DOE supercomputer centers typically serve a diverse user community. In NERSC’s case, the community comprises over 4,000 users working on more than 400 distinct projects and using some 600 codes that serve the diverse science needs of the DOE Office of Science research community. Abstracting the salient performance-critical features of such a workload is a challenging task. However, significant workload characterization efforts have resulted in a set of full application benchmarks that span a range of science domains, algorithmic methods, and concurrencies, as well as machine-based characteristics that influence performance, such as message size, memory access pattern, and working set size. These applications form the basis for the Sustained System Performance (SSP) metric, which represents the effectiveness of a system in terms of delivered application performance rather than peak FLOP rate [55].
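As a point of reference, one common formulation of an SSP-style metric (a simplified sketch; the precise rates and weighting used for NERSC-6 are those defined in [55]) scales the geometric mean of the measured per-core processing rates $P_i$ of the $N$ benchmark applications by the number of computational cores $C$ in the system:

\[ \mathrm{SSP} = C \left( \prod_{i=1}^{N} P_i \right)^{1/N}, \]

where $P_i$ is the delivered rate per core (e.g., GFlops/s per core) for application $i$, so the metric reflects sustained application throughput rather than peak capability.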
As well as being representative of the DOE Office of Science workload, some of these applications have also been used by other federal agencies because they represent significant parts of those agencies’ workloads: MILC and PARATEC were used by the NSF, CAM by NCAR, and GAMESS by the NSF and DoD HPCMO. The methods embodied in our benchmark suite are representative well beyond the particular codes employed. For example, PARATEC is representative of methods that constitute one of the largest consumers of supercomputing cycles in computer centers around the world [64]. Therefore, although our benchmark suite was developed with the NERSC workload in mind, we are confident that it is broadly representative of the workloads of many supercomputing centers today. More details about these applications and their computational characteristics in the NERSC-6 benchmark suite can be found in Ref. [3].
The typical problem configurations for these benchmarks are defined for much larger “capability” systems, so we had to construct reduced size problem configurations to target the requirements of mid-range workloads that are the primary subject of this study. For example, many of the input configurations were constructed for a system acquisition (begun during 2008) that resulted in a 1 petaflop peak resource containing over 150,000 cores—considerably larger than is easily feasible to run in today’s commercial cloud infrastructures. Thus, problem sets were modified to use smaller grids and concomitant concurrencies along with shorter iteration spaces and/or shorter simulation durations. We also modified the problem configurations to eliminate any significant I/O in order to focus on the computational and network characteristics. For the scaling studies described in Section 9.1.5, we used the larger problem sets from MILC and PARATEC as described below in the application descriptions.
Next we describe each of the applications that make up our benchmarking suite, the parameters they were run with, and their computation and communication characteristics.
CAM. The Community Atmosphere Model (CAM) is the atmospheric component of the Community Climate System Model (CCSM) developed at NCAR and elsewhere for the weather and climate research communities [7, 6]. In this work we use CAM v3.1 with a finite volume (FV) dynamical core and a “D” grid (about 0.5 degree resolution). In this case we used 120 MPI tasks and ran for three days of simulated time (144 timesteps).
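For context, the stated run length corresponds to a 30-minute timestep, since

\[ \Delta t = \frac{3 \times 86{,}400\ \mathrm{s}}{144\ \mathrm{steps}} = 1800\ \mathrm{s} = 30\ \mathrm{minutes}. \]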