Magellan Final Report - Office of Science - U.S. Department of Energy
[Figure 9.2 plot: runtime relative to Lawrencium (y-axis, 0–25) versus %Communication (x-axis, 0–80) for IMPACT-T, GTC, MAESTRO, MILC, fvCAM, and PARATEC; the legend also lists Ping Pong Latency, Random Ring Latency, Ping Pong Bandwidth, and Random Ring Bandwidth.]
Figure 9.2: Correlation between the runtime of each application on EC2 and the amount of time an application spends communicating.
has a significant impact upon the performance of even very simple application proxies. This is illustrated clearly by the HPL results, which are significantly worse than would be expected from simply looking at the DGEMM performance.
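The gap between DGEMM and HPL rates can be expressed as a simple efficiency ratio. A minimal sketch follows; the GFLOP/s values are placeholders for illustration, not measurements from this study.

```python
# Hypothetical illustration of the DGEMM-to-HPL gap; the GFLOP/s values
# below are placeholders, not measured results from this report.
dgemm_gflops = 80.0  # assumed sustained per-node DGEMM rate (placeholder)
hpl_gflops = 40.0    # assumed per-node HPL rate (placeholder)

# On a well-balanced system HPL approaches the DGEMM rate; a low ratio
# points to overheads (e.g., communication) beyond raw floating-point speed.
efficiency = hpl_gflops / dgemm_gflops
print(f"HPL achieves {efficiency:.0%} of the DGEMM rate")
```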
Application Performance
Figure 9.1 shows the runtime of each of our test applications relative to Carver, which is the newest and therefore fastest machine in this study. For these applications, at these concurrencies, Franklin and Lawrencium are between 1.4× and 2.6× slower than Carver. For EC2, the range of performance observed is significantly greater. In the best case, GAMESS, EC2 is only 2.7× slower than Carver; in the worst case, PARATEC, EC2 is more than 50× slower. This large spread in performance simply reflects the different demands each application places upon the network, as was the case for the compact applications described in the previous section. Qualitatively, we can understand the differences in terms of the performance characteristics of each of the applications described earlier. PARATEC shows the worst performance on EC2, 52× slower than Carver. It performs 3-D FFTs, and the global (i.e., all-to-all) data transposes within these FFT operations can incur a large communication overhead. MILC (20×) and MAESTRO (17×) also stress global communication, but to a lesser extent than PARATEC. CAM (11×), IMPACT (9×), and GTC (6×) are all characterized by large point-to-point communications, which do not induce as much contention as global communication; hence their performance is not impacted as severely. GAMESS (2.7×), for this benchmark problem, places relatively little demand upon the network and is therefore hardly slowed down at all on EC2.
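To make the spread concrete, the slowdown factors quoted above can be tabulated. The sketch below uses only the values reported in the text:

```python
# EC2 slowdown factors relative to Carver, as quoted in the text above.
slowdowns = {
    "PARATEC": 52.0, "MILC": 20.0, "MAESTRO": 17.0,
    "CAM": 11.0, "IMPACT": 9.0, "GTC": 6.0, "GAMESS": 2.7,
}

# Ratio between the most and least affected applications.
spread = max(slowdowns.values()) / min(slowdowns.values())
for app, factor in sorted(slowdowns.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{app:8s} {factor:5.1f}x slower than Carver")
print(f"max/min spread: {spread:.1f}x")
```

The roughly 19× gap between PARATEC and GAMESS underlines how strongly network demand, rather than raw compute, determines performance on EC2.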
Qualitatively, it seems that the applications that perform the most collective communication with the most messages are those that perform the worst on EC2. To gain a more quantitative understanding, we performed a more detailed analysis, which is described in the next section.
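The scaling argument behind this observation can be sketched with message counts. The functions below are illustrative assumptions (an idealized all-to-all transpose versus a fixed-neighbor halo exchange), not measurements from this study:

```python
# Rough sketch of why collectives scale worse than point-to-point exchanges:
# an all-to-all transpose over P tasks moves P*(P-1) messages per step,
# while a nearest-neighbor halo exchange sends a constant count per task.
def all_to_all_messages(p: int) -> int:
    """Messages per all-to-all step among p tasks (idealized)."""
    return p * (p - 1)

def halo_exchange_messages(p: int, neighbors: int = 6) -> int:
    """Messages per halo-exchange step, assuming a 3-D decomposition
    with 6 face neighbors per task (an assumption for illustration)."""
    return p * neighbors

for p in (64, 256, 1024):
    print(p, all_to_all_messages(p), halo_exchange_messages(p))
```

The quadratic growth of the all-to-all count is consistent with the heavy penalty seen for FFT-dominated codes such as PARATEC on a slow network.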
Performance Analysis Using IPM
To understand more deeply the reasons for the poor performance of EC2 in comparison to the other platforms, we performed additional experiments using the Integrated Performance Monitoring (IPM) framework [77, 84]. IPM is a profiling tool that uses the MPI profiling interface to measure the time taken by an application in MPI on a task-by-task basis. This allows us to examine the relative amounts of time taken by an application