Magellan Final Report - Office of Science - U.S. Department of Energy
<table>
<caption>Table 12.4: Comparison of cost of a Teraflop Year between DOE Center and Amazon</caption>
<tr><th></th><th>Amazon (1-year reserved instances)</th><th>NERSC</th><th>ALCF</th></tr>
<tr><td>HPL Peak</td><td>42 TF</td><td>1361 TF (Hopper 1054 TF, Franklin 266 TF, Carver 41 TF)</td><td>481 TF (Intrepid 459 TF, Surveyor 11 TF, Challenger 11 TF)</td></tr>
<tr><td>Cost</td><td>$7.5M/year for 42 TF</td><td>$55M/year (entire annual center budget)</td><td>$41M/year (average annual center budget)</td></tr>
<tr><td>Cost per TF to operate for 1 year</td><td>$179K/year per TF</td><td>$40K/year per TF</td><td>$85K/year per TF</td></tr>
</table>
services, storage, network, and other benefits. For example, if we isolate just the Hopper system (1054 TF)<br />
and use the annual costs from Section 12.2.2 ($21.1M per year), we arrive at $20K per TF-Year. The ALCF<br />
numbers demonstrate the impact of older equipment on cost analysis, as the ALCF systems are coming up<br />
on their fifth year of operations. These systems will soon be replaced with new, next-generation IBM Blue<br />
Gene systems. A discussion of this impact is provided in the section on historical trends in pricing below.<br />
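The per-TF figures in Table 12.4 and the Hopper-only estimate follow from dividing annual cost by HPL teraflops. The short Python sketch below reproduces that arithmetic from the numbers quoted in the text; the dictionary layout and the function name `cost_per_tf_year` are illustrative assumptions, not part of the report.

```python
# Inputs copied from Table 12.4 and the Hopper example in the text.
# Costs are in dollars per year; HPL figures are in teraflops (TF).
systems = {
    "Amazon": {"annual_cost": 7.5e6, "hpl_tf": 42},
    "NERSC": {"annual_cost": 55e6, "hpl_tf": 1361},
    "ALCF": {"annual_cost": 41e6, "hpl_tf": 481},
    "Hopper only": {"annual_cost": 21.1e6, "hpl_tf": 1054},
}


def cost_per_tf_year(annual_cost, hpl_tf):
    """Return the annual cost divided by the HPL teraflops (hypothetical helper)."""
    return annual_cost / hpl_tf


for name, s in systems.items():
    thousands = cost_per_tf_year(s["annual_cost"], s["hpl_tf"]) / 1e3
    print(f"{name}: ${thousands:.0f}K per TF-year")
```

Rounded to the nearest thousand dollars, the results match the published values: $179K, $40K, $85K, and $20K per TF-year.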
There are scientific applications for which the performance penalty of running in a virtualized environment,<br />
including cloud systems, is less significant. For example, many high-throughput applications such as those<br />
used in bioinformatics run relatively efficiently in cloud systems since they are computationally intensive and<br />
can run independently of other tasks. One way of understanding the potential improvements for applications<br />
that run more efficiently in clouds is to recalculate the Linpack result assuming the same efficiency across<br />
systems. Amazon’s Linpack result achieved 50% of peak compared with 82% of peak for the NERSC and<br />
ALCF systems. If Amazon were able to achieve the same efficiency as the DOE HPC systems, then its<br />
cost would drop to $109K/year per TF, which is still higher than the cost of the DOE systems.<br />
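The efficiency adjustment described above amounts to scaling Amazon's cost per TF by the ratio of its achieved Linpack efficiency to the DOE efficiency. A minimal Python sketch of that calculation follows; the function name `efficiency_adjusted_cost` is a hypothetical helper, and the inputs are the figures quoted in the text.

```python
def efficiency_adjusted_cost(cost_per_tf, achieved_eff, target_eff):
    """Scale a cost per TF-year as if Linpack efficiency rose from achieved_eff to target_eff."""
    return cost_per_tf * achieved_eff / target_eff


# Amazon: $179K per TF-year at 50% of peak, rescaled to the 82% of peak
# reported for the NERSC and ALCF systems.
adjusted = efficiency_adjusted_cost(179_000, 0.50, 0.82)
print(f"${adjusted / 1e3:.0f}K per TF-year")
```

The result rounds to $109K per TF-year, which remains above both the NERSC ($40K) and ALCF ($85K) figures in Table 12.4.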
12.3 Other Cost Factors<br />
In addition to the analysis above, we briefly discuss other factors that can impact costs and should be considered<br />
when performing a cost/benefit analysis for moving to a cloud model.<br />
Utilization. Consolidation has been cited as one of the primary motivators for using cloud resources. Utilization<br />
of IT hardware assets has been reported at between 5% and 20% [5]. This is not a typical utilization<br />
rate at HPC Centers, as they already consolidate demand across a large user base. For example, DOE HPC<br />
Centers provide resources to hundreds of projects and thousands of users. Consequently, many DOE HPC<br />
Centers have very high utilization rates. In the above analysis, we have assumed a utilization of 85%, which<br />
is relatively conservative.<br />
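To illustrate why utilization matters in a cost comparison, the sketch below uses a simple model in which the cost of each unit of capacity actually used equals the cost of provisioned capacity divided by utilization. This model, the function name, and the $100 base cost are assumptions for illustration only; the report does not state how the 85% figure enters its own calculations.

```python
def cost_per_utilized_unit(cost_per_provisioned_unit, utilization):
    """Cost of each unit of capacity actually used, under a simple inverse model (assumption)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the interval (0, 1]")
    return cost_per_provisioned_unit / utilization


base_cost = 100.0  # hypothetical cost per provisioned unit, in dollars

# 5% and 20% are the reported range for IT hardware assets; 85% is the
# utilization assumed for the DOE HPC centers in the text.
for utilization in (0.05, 0.20, 0.85):
    effective = cost_per_utilized_unit(base_cost, utilization)
    print(f"utilization {utilization:.0%}: ${effective:.2f} per utilized unit")
```

Under this model, moving from 5% to 85% utilization lowers the effective cost by a factor of 17, which is why consolidation offers far less headroom to centers that are already heavily utilized.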
Security. Security is another area where outsourcing to the cloud is often expected to reap cost savings,<br />
since the responsibility for security can potentially be shifted to the commercial cloud provider. This may be<br />
more accurate for particular cloud models than others. For example, when outsourcing services like e-mail,<br />
the service provider can centrally manage applying updates, protecting against common spam and phishing<br />
attacks, etc. This can likely be done more efficiently and effectively than typical in-house operations<br />
can achieve. However, for infrastructure services the answer is more complicated. In an IaaS model, the<br />
commercial cloud provider is responsible for certain aspects of security such as physical security, network<br />
security, and maintaining security domains. However, the end user must still ensure that the operations of<br />
the virtual resources under their control are compliant with relevant security requirements. Many of the<br />
security mechanisms are implemented and enforced at the host level, such as maintaining the security of<br />
the operating system, configuration management, and securing services. As illustrated in the case studies<br />
(Chapter 11), most scientific users are not experts in these areas and thus will require user support services<br />