
Magellan Final Report - Office of Science - U.S. Department of Energy



compute instance type. Linpack is able to take full advantage of these increases. However, Linpack is not representative of all scientific applications. Thus, as shown in our benchmarking (Chapter 9), scientific applications do not typically realize the same benefits. The increase in FLOPs per cycle often occurs every other generation of processor from Intel, since it is typically a byproduct of a change in core architecture. This is the “Tock” of the “Tick-Tock” development cycle in Intel parlance. AMD typically follows a similar roadmap for their processor architecture. Consequently, these types of improvements typically occur every 3-4 years and will be seen in DOE centers in the next few years.

Amazon announced new pricing in November 2011. This lowered the price for some instance types and introduced new tiers of reserved instances. Some instances did not change in cost, e.g., the m1.small remains at $0.085 per hour. However, some of the higher-end instances did drop. For example, the on-demand price of a cc1.4xlarge instance (the type used in much of the analysis above) dropped by 19%. Amazon also expanded the reserved instances to three different tiers. The new tiers allow customers to match the reserved instance to the level of utilization. These different tiers essentially trade higher upfront costs for lower per-usage costs. The tier with the highest upfront costs, “Heavy Utilization Reserved Instances”, provides the lowest effective rate if the instance is fully utilized over the reservation period.
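The trade-off between upfront and per-usage cost can be illustrated with a minimal sketch. The tier names and all prices below are hypothetical placeholders, not Amazon's actual rates, and the model is a simplification that assumes the hourly fee is charged only for hours actually used.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year

# Hypothetical tiers: (upfront fee in dollars, hourly fee in dollars).
# These numbers are illustrative only and are not taken from any price list.
TIERS = {
    "on-demand": (0.0, 1.00),
    "light": (1000.0, 0.60),
    "medium": (2000.0, 0.40),
    "heavy": (3000.0, 0.25),
}


def effective_hourly_rate(upfront_fee, hourly_fee, hours_used):
    """Average cost per hour used, with the upfront fee amortized over usage."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return (upfront_fee + hourly_fee * hours_used) / hours_used


def cheapest_tier(utilization):
    """Return the tier with the lowest effective rate at a given utilization (0-1]."""
    hours = utilization * HOURS_PER_YEAR
    return min(TIERS, key=lambda name: effective_hourly_rate(*TIERS[name], hours))


for utilization in (0.1, 0.5, 0.7, 1.0):
    print(f"{utilization:.0%} utilization: {cheapest_tier(utilization)}")
```

With these placeholder prices, the tier with the highest upfront fee only wins once the instance is busy most of the year, which is the pattern described above.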

Applying the earlier analysis to the new instance type yields some interesting results. The cost of a 1-year reserved instance of the new type works out to $0.058 per core hour, versus $0.13 per core hour before. This is partly due to a heavily discounted reserved instance cost for this instance type. For example, fully utilizing a 1-year reserved instance for most types provides a discount of around 40% over the on-demand option. However, for the new instance type this discount is 63%. More impressive, the cost of a Teraflop Year (Section 12.2.4) drops to approximately $36k per TF-Year. This is a 5x improvement over the previous calculation. Roughly half of this improvement comes from the switch to Intel Sandy Bridge and the resulting doubling in FLOPs per cycle. Since Amazon essentially scales its pricing on the number of available cores in the instance, not floating point performance, there is no premium charge for the doubling of the FLOPs per cycle for the new instance type. Another fraction of the improvement comes from the higher efficiency of the Linpack execution (68% versus 50%). The remainder comes from the drop in pricing and especially the heavily discounted reserved instance pricing.
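As a rough check on the TF-Year figure, the arithmetic can be sketched as follows. The per-core-hour cost and the Linpack efficiency come from the text above. The 2.6 GHz clock rate and 8 double-precision FLOPs per cycle are assumptions about the Sandy Bridge processor in the new instance type, and the formula is one plausible reading of the TF-Year metric rather than the exact method of Section 12.2.4.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def cost_per_tf_year(cost_per_core_hour, clock_ghz, flops_per_cycle, linpack_efficiency):
    """Dollars to sustain one teraflop of Linpack performance for one year."""
    # Sustained GFLOPs per core = peak GFLOPs per core * measured Linpack efficiency.
    sustained_gflops_per_core = clock_ghz * flops_per_cycle * linpack_efficiency
    # Number of cores needed to sustain 1 TFLOP (1000 GFLOPs).
    cores_per_teraflop = 1000.0 / sustained_gflops_per_core
    return cores_per_teraflop * cost_per_core_hour * HOURS_PER_YEAR


new_type = cost_per_tf_year(
    cost_per_core_hour=0.058,   # from the text
    clock_ghz=2.6,              # assumption, not stated in the text
    flops_per_cycle=8,          # assumption: doubled FLOPs per cycle on Sandy Bridge
    linpack_efficiency=0.68,    # from the text
)
print(f"${new_type:,.0f} per TF-Year")
```

Under these assumptions the result lands close to the $36k per TF-Year quoted above. Because the price enters per core rather than per FLOP, doubling `flops_per_cycle` at the same per-core price halves the result.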

The DOE systems used in the previous analysis were deployed over a year ago. ALCF’s Intrepid was deployed almost 4 years ago. So, much of the improvement in Amazon’s values comes from the deployment of a very new architecture. Ultimately, the DOE Labs and the commercial cloud vendors are relying on the same technology trends to deliver improved performance over time. As DOE Centers deploy new technologies like Intel Sandy Bridge and AMD Interlagos, their costs will drop in a similar manner. DOE deployments are typically timed about three years apart at each center, with these deployments staggered across centers. In addition, centers will often perform mid-life upgrades to systems to remain closer to the technology edge. This combination of strategies enables DOE to deliver a portfolio of systems that closely tracks the technology improvements. For example, the Mira system is projected to achieve a cost of less than $8k per TF-Year when it is deployed next year. This is approximately 4x better than Amazon’s improved result, and represents an improvement of around 10x over the previous ALCF system, Intrepid, which was deployed around 4 years ago. Eventually, DOE HPC centers and commercial cloud providers like Amazon are likely to track each other, aiming for cost-effectiveness. However, the pricing changes in the commercial cloud will not address the various other challenges in moving scientific computing and HPC workloads to cloud offerings, such as workflow and data management challenges, high-performance parallel file systems, and access to legacy data sets (see Sections 6.4 and 11.4).

The recent announcement by Amazon highlights two lessons. First, it shows that Amazon is responding to feedback from the HPC community on improving its offerings in this space. This process started with the introduction of the Cluster Compute and GPU offerings over a year ago and continues with the recent announcement of the new instance type and associated pricing. Second, it demonstrates the importance of tracking the changes in the cloud space and routinely updating cost analyses to ensure that the appropriate choices are being made.
