Magellan Final Report - Office of Science - U.S. Department of Energy
compute instance type. Linpack is able to take full advantage of these increases. However, Linpack is not representative of all scientific applications. Thus, as shown in our benchmarking (Chapter 9), scientific applications do not typically realize the same benefits. The increase in FLOPs per cycle often occurs every other generation of processor from Intel, since it is typically a byproduct of a change in core architecture. This is the “Tock” of the “Tick-Tock” development cycle in Intel parlance. AMD typically follows a similar roadmap for its processor architecture. Consequently, these types of improvements typically occur every 3-4 years and will be seen in DOE centers in the next few years.
Amazon announced new pricing in November 2011. This lowered the price for some instance types and introduced new tiers of reserved instances. Some instances did not change in cost; e.g., the m1.small remains at $0.085 per hour. However, prices for some of the higher-end instances did drop. For example, the on-demand price of a cc1.4xlarge instance (the type used in much of the analysis above) dropped by 19%. Amazon also expanded the reserved instances to three different tiers. The new tiers allow customers to match the reserved instance to their level of utilization, essentially trading higher upfront costs for lower per-usage costs. The tier with the highest upfront cost, “Heavy Utilization Reserved Instances”, provides the lowest effective rate if the instance is fully utilized over the reservation period.
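The trade-off between the tiers can be made concrete by amortizing the upfront fee over the hours actually used. A minimal sketch, using illustrative placeholder prices rather than actual 2011 Amazon rates:

```python
# Effective hourly cost of a reserved instance: amortize the upfront
# fee over the hours the instance actually runs, then add the
# per-hour usage rate. Prices below are illustrative, not real tiers.

HOURS_PER_YEAR = 8760

def effective_hourly_rate(upfront, hourly, term_years=1, utilization=1.0):
    """Blended $/hour for a reserved instance at a given utilization."""
    hours_used = HOURS_PER_YEAR * term_years * utilization
    return upfront / hours_used + hourly

# A heavy-utilization tier (high upfront, low hourly) beats a lighter
# tier only when the instance runs most of the reservation period.
heavy = effective_hourly_rate(upfront=1500.0, hourly=0.25, utilization=1.0)
light = effective_hourly_rate(upfront=300.0, hourly=0.60, utilization=1.0)
print(f"heavy: ${heavy:.3f}/hr, light: ${light:.3f}/hr")
```

At low utilization the comparison reverses: the amortized upfront fee of the heavy tier dominates, which is why matching the tier to expected utilization matters.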
Applying the earlier analysis to the new instance type yields some interesting results. The per-core-hour cost of a 1-year reserved instance of the new type works out to $0.058 per core hour, versus $0.13 per core hour before. This is partly due to a heavily discounted reserved instance cost for this instance type. For example, fully utilizing a 1-year reserved instance provides a discount of around 40% over the on-demand option for most instance types; for the new instance type, this discount is 63%. More impressively, the cost of a Teraflop Year (Section 12.2.4) drops to approximately $36k per TF-Year. This is a 5x improvement over the previous calculation. Roughly half of this improvement comes from the switch to Intel Sandy Bridge and the resulting doubling in FLOPs per cycle. Since Amazon essentially scales its pricing with the number of available cores in the instance, not floating point performance, there is no premium charge for the doubling of FLOPs per cycle in the new instance type. Another fraction of the improvement comes from the higher efficiency of the Linpack execution (68% versus 50%). The remainder comes from the drop in pricing, and especially the heavily discounted reserved instance pricing.
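The ~$36k per TF-Year figure can be reconstructed from the numbers above. The per-core-hour price ($0.058) and Linpack efficiency (68%) come from the text; the 8 double-precision FLOPs per cycle and the ~2.6 GHz core clock are our assumptions about the Sandy Bridge cores in the new instance type:

```python
# Reconstruct the ~$36k per TF-Year figure cited in the text.

HOURS_PER_YEAR = 8760

price_per_core_hour = 0.058   # $/core-hour, 1-yr reserved (from text)
flops_per_cycle = 8           # Sandy Bridge AVX double precision (assumed)
clock_ghz = 2.6               # core clock (assumed)
linpack_efficiency = 0.68     # sustained/peak ratio (from text)

# Sustained GF/s per core, then cores needed for one sustained TF/s.
sustained_gf_per_core = flops_per_cycle * clock_ghz * linpack_efficiency
cores_per_tf = 1000.0 / sustained_gf_per_core

# A TF-Year is one sustained teraflop delivered for a full year.
cost_per_tf_year = cores_per_tf * HOURS_PER_YEAR * price_per_core_hour
print(f"~${cost_per_tf_year / 1000:.0f}k per TF-Year")  # prints ~$36k per TF-Year
```

Under these assumptions the calculation lands within rounding of the $36k figure, which suggests the report's estimate follows this sustained-Linpack accounting.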
The DOE systems used in the previous analysis were deployed over a year ago; ALCF’s Intrepid was deployed almost four years ago. So, much of the improvement in Amazon’s numbers comes from the deployment of a very new architecture. Ultimately, the DOE Labs and the commercial cloud vendors are relying on the same technology trends to deliver improved performance over time. As DOE centers deploy new technologies like Intel Sandy Bridge and AMD Interlagos, their costs will drop in a similar manner. DOE deployments are typically timed about three years apart at each center, with these deployments staggered across centers. In addition, centers will often perform mid-life upgrades to systems to remain closer to the technology edge. This combination of strategies enables DOE to deliver a portfolio of systems that closely tracks technology improvements. For example, the Mira system is projected to achieve a cost of less than $8k per TF-Year when it is deployed next year. This is approximately 4x better than Amazon’s improved result, and represents an improvement of around 10x over the previous ALCF system, Intrepid, which was deployed around four years ago. Over time, DOE HPC centers and commercial cloud providers like Amazon are likely to track each other in cost-effectiveness. However, pricing changes in the commercial cloud will not address the various other challenges in moving scientific computing and HPC workloads to cloud offerings, such as workflow and data management, high-performance parallel file systems, and access to legacy data sets (see Sections 6.4 and 11.4).
The recent announcement by Amazon highlights several lessons. First, it shows that Amazon is responding to feedback from the HPC community on improving its offerings in this space. This process started with the introduction of the Cluster Compute and GPU offerings over a year ago and continues with the recent announcement of the new instance type and associated pricing. Second, it demonstrates the importance of tracking changes in the cloud space and routinely updating cost analyses to ensure that the appropriate choices are being made.