Magellan Final Report - Office of Science - U.S. Department of Energy


Chapter 12 Cost Analysis

One of the most common arguments made for the adoption of cloud computing is the potential cost saving compared to deploying and operating in-house infrastructure. There are several reasons in support of this argument. Commercial clouds consolidate demand across a large customer base, resulting in economies of scale that small departmental clusters cannot achieve. These include a lower number of FTEs per core, stronger purchasing power when negotiating with vendors, and better power efficiency, since large systems can justify investing more in the design of the cooling infrastructure. It is worth noting that large DOE HPC centers also consolidate demand and achieve many of the same benefits.

Low upfront costs and pay-as-you-go models are also considered advantages of clouds. Clouds allow users to avoid both the time investment and the costs associated with building out facilities, procuring hardware and services, and deploying systems. Users are able to get access to on-demand resources and pay only for the services they use.

Ultimately, whether a cloud offering is less costly than owning and operating in-house resources depends heavily on the details of the workload, including characteristics such as scaling requirements, overall utilization, and time criticality.

In this chapter, we review some of the cost model approaches, consider the costs for typical DOE centers, and discuss other aspects that can impact the cost analysis.

12.1 Related Studies

Walker [81] proposed a modeling tool that allows organizations to compare the cost of leasing CPU time from a cloud provider with the cost of a server purchase. The paper provides an analysis comparing three options for NSF’s Ranger supercomputing resources (purchase, lease, or purchase-upgrade) and shows that a three-year purchase investment is the best strategy for this system. The analysis also considers a one-rack server and shows that leasing is the better option for that system.
Carlyle et al. [9] provide a case study of costs incurred by end users of Purdue’s HPC community cluster program and conclude that users of the cluster program would incur higher costs if they purchased commercial cloud computing HPC offerings such as the Amazon Cluster Compute program. Hamilton published an analysis of data center costs in 2008 [38] and an updated analysis in 2010 [39]. His primary conclusion was that, despite the widespread belief that power costs dominate data center costs, server costs remained the primary cost factor.

12.2 Cost Analysis Models

As discussed above, there are various approaches to conducting a cost comparison. We present three: computing the hourly cost of an HPC system, computing the cost of a DOE center in a commercial cloud, and a cost analysis using the HPC Linpack benchmark as a stand-in for DOE applications. These three models are useful because they address the question from the system, center, and user perspectives, respectively. The first approach translates the operational cost of an HPC system into the typical pricing unit used in commercial clouds. The second essentially compares the costs of an entire center. The final approach is application centric.

12.2.1 Assumptions and Inputs

In all of the cost analyses, we have attempted to use the most cost-effective option available. For example, based on our benchmarking analysis, the Cluster Compute offering is the most cost-effective option from Amazon for tightly coupled MPI applications, and even for most CPU-intensive applications, since the instances are dedicated, which results in less interference from other applications. Furthermore, most of the instance pricing works out to a roughly constant cost per core hour. For example, a Cluster Compute instance is approximately 16x more capable than a regular small instance, and its cost is approximately 16x higher. Using smaller, less expensive instances is therefore not more cost effective if the application can effectively utilize all of the cores, which is true of most CPU-intensive scientific applications. In contrast, many web applications underutilize the CPU, making small instances more cost effective for those use cases.

For compute instances, we use a one-year reserved instance and assume the nodes are fully utilized over the entire year to compute an effective core-hour cost. With reserved instances, you pay a fixed up-front cost in exchange for a lower per-hour cost. If the instance is used for a majority of the reserved period (one year in our analysis), this results in a lower effective rate. For example, an on-demand Cluster Compute instance costs $1.60 per hour, but a reserved instance that is used during the entire one-year period results in an effective rate of $1.05 per hour (roughly a 30% reduction). We further divide this by the number of cores in the instance to arrive at an effective core-hour cost, which simplifies comparisons with other systems. Table 12.1 summarizes this calculation.
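The reserved-instance calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the $1.60 and $1.05 hourly figures come from the text, and the core count of 8 per Cluster Compute instance is an assumption made for the example rather than a value stated in this section.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def effective_hourly_rate(upfront_fee, reserved_hourly_rate, hours_used=HOURS_PER_YEAR):
    """Spread the one-time reservation fee over the hours actually used
    and add the discounted per-hour charge."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return upfront_fee / hours_used + reserved_hourly_rate


def core_hour_cost(hourly_rate, cores_per_instance):
    """Convert an instance-hour price into a price per core hour."""
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    return hourly_rate / cores_per_instance


# Figures quoted in the text (dollars per instance hour).
ON_DEMAND_RATE = 1.60
EFFECTIVE_RESERVED_RATE = 1.05

# Assumption for illustration only: cores per Cluster Compute instance.
ASSUMED_CORES = 8

discount = 1 - EFFECTIVE_RESERVED_RATE / ON_DEMAND_RATE
per_core = core_hour_cost(EFFECTIVE_RESERVED_RATE, ASSUMED_CORES)

print(f"Discount versus on-demand: {discount:.1%}")
print(f"Effective cost per core hour: ${per_core:.3f}")
```

Because the up-front fee is divided by the hours actually used, the effective rate rises quickly when the instance sits idle, which is why the analysis assumes full utilization over the reserved year.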
It is worth noting that the lowest spot instance pricing is approximately 50% of this effective core-hour cost. We do not use this offering as a basis for the cost analysis, since the runtime of a spot instance is unpredictable and application programmers would need to design their applications to handle pre-emption, which does not match the requirements of our applications. However, spot pricing does provide an estimate of the absolute lower bound for pricing, since it essentially reflects the price threshold below which Amazon is unwilling to offer a service.

For file system costs, we use elastic block storage pricing. This most likely underestimates the costs, since it omits the costs for I/O requests and the costs for instances that would be required to act as file system servers. S3 is used to compute the costs for archival storage. S3 uses a tiered cost system in which the incremental storage costs decline as more data is stored in the system. For example, the monthly cost to store the first terabyte of data using reduced redundancy is $0.093 per GB, while the monthly cost to store data between 1 TB and 49 TB is $0.083 per GB. For simplicity, we compute all S3 costs at the lowest applicable rate. For example, since the NERSC archival system has 19 PB of data stored, we use Amazon’s rate of $0.037 per GB per month for data stored above 5 PB with reduced redundancy. The cost for transactions is also omitted for simplicity, but would further increase the cost of using the commercial offering. The pricing data was collected from the Amazon website on September 30, 2011.

12.2.2 Computed Hourly Cost of an HPC System

One of the more direct methods to compare the cost of a DOE HPC system with cloud offerings is to compute the effective cost per core hour. This makes it relatively straightforward to compare it with similar commercial cloud systems.
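Returning briefly to the archival storage assumption in Section 12.2.1, the flat-rate simplification can be sketched as follows. The 19 PB volume and the $0.037 per GB monthly rate are taken from the text; the function name and the use of decimal units (1 PB = 1,000,000 GB) are assumptions made for this illustration, and transaction costs are left out, as in the analysis.

```python
# Assumption: decimal storage units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def monthly_archive_cost(petabytes, rate_per_gb_month):
    """Storage-only monthly cost with every gigabyte billed at one flat rate.

    This mirrors the simplification in the text: tiering is ignored and
    all data is charged at the lowest applicable rate.
    """
    if petabytes < 0 or rate_per_gb_month < 0:
        raise ValueError("inputs must be non-negative")
    return petabytes * GB_PER_PB * rate_per_gb_month


# Figures quoted in the text.
ARCHIVE_SIZE_PB = 19
LOWEST_TIER_RATE = 0.037  # dollars per GB per month, reduced redundancy

monthly = monthly_archive_cost(ARCHIVE_SIZE_PB, LOWEST_TIER_RATE)
annual = monthly * 12

print(f"Monthly storage cost: ${monthly:,.0f}")
print(f"Annual storage cost:  ${annual:,.0f}")
```

Billing everything at the lowest tier understates the true tiered charge, so a figure computed this way is best read as a lower bound on the storage cost alone.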
However, determining this cost is problematic, since many of the costs used in the calculation are indirect or business sensitive. For the sake of comparison, we use Hopper, a Cray XE-6 system recently deployed at NERSC. This system was selected because it is a relatively recent deployment, is large enough to capture economies of scale, and is tuned for scientific applications relevant to the DOE-SC community. In lieu of providing detailed costs that may be business sensitive, we use conservative values that are higher than the actual costs. The Hopper contract has been valued at approximately $52M. We use the peak power of 3 MW (the system typically uses around 2 MW), which translates into a power cost of $2.6M per year assuming a cost of $0.10 per kWh. In general, $0.10
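The annual power figure quoted above follows from simple arithmetic, sketched here in Python. The 3 MW peak draw, the roughly 2 MW typical draw, and the $0.10 per kWh price come from the text; the function name and the assumption of continuous operation for 8760 hours per year are illustrative choices.

```python
HOURS_PER_YEAR = 365 * 24  # assumes the system draws power all year


def annual_power_cost(power_mw, price_per_kwh, hours=HOURS_PER_YEAR):
    """Annual electricity cost for a constant power draw given in megawatts."""
    kilowatts = power_mw * 1000
    return kilowatts * hours * price_per_kwh


PRICE_PER_KWH = 0.10  # dollars, from the text

peak_cost = annual_power_cost(3, PRICE_PER_KWH)     # conservative peak draw
typical_cost = annual_power_cost(2, PRICE_PER_KWH)  # typical draw

print(f"Annual cost at 3 MW peak:    ${peak_cost:,.0f}")
print(f"Annual cost at 2 MW typical: ${typical_cost:,.0f}")
```

Using the peak draw rather than the typical draw raises the estimate by half, which is in keeping with the stated intent of using conservative, higher-than-actual values.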