Magellan Final Report - Office of Science - U.S. Department of Energy
pacity.
Sporadic Demand. One of the more common cases for using commercial cloud offerings is when demand
is highly variable, especially if there are also time-sensitive requirements for the service. In the analysis
of DOE centers above, the demand is still highly variable. However, scientists can typically tolerate reasonable
delays in the start of an application, especially if this results in access to more cycles. For cases where
demand must be met quickly, the ability to rapidly add resources can mean the difference
between completing a project and not. One example is a DOE lab science project that is conceived
suddenly and requires more resources than can be obtained quickly from a DOE HPC Center. DOE HPC
Centers typically do not have a viable way to add the necessary resources on a short timescale, and the
existing resources are heavily utilized, so the only way to make room for the new project would be to push
out existing projects. Other examples have real time-critical requirements, e.g., modeling an oil
spill to direct containment efforts, tracking a hurricane, or simulating a critical piece of equipment when
it fails and causes an expensive resource to go down. Additionally, some experiments, such as those at light
sources and accelerators, require or benefit from real-time analysis resources. If those experiments only
run a fraction of the time, they may benefit from on-demand cloud-type models. These kinds of opportunity
costs can be difficult to quantify but should be weighed when deciding whether to move to a cloud model.
Facility Constrained. Some sites are severely infrastructure-limited. This could be due to building restrictions,
insufficient power at the site, or other limitations. In these cases, commercial offerings may be the only
reasonable option available. However, if expansion is an option, the long-term cost should be considered:
while infrastructure expansion can be costly, those costs can be amortized over a long period, typically 15
years.
The potential cost savings to the customer in these cases come from a few common sources. One is
the ability to avoid purchasing and deploying computing resources when demand is unclear. The
other is avoiding the operation of resources that are needed only infrequently and would therefore sit at very
low utilization. Once a project can maintain reasonably high utilization of a resource, the cost savings
typically vanish.
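The break-even reasoning above can be made concrete with a small sketch. All of the prices below are illustrative assumptions, not figures from this report: a hypothetical on-demand cloud rate, a hypothetical node purchase price amortized over four years, and hypothetical annual operating costs. The point of the sketch is the shape of the comparison, not the specific numbers.

```python
# Break-even utilization sketch: cloud is pay-per-use, while an owned
# node's cost is fixed, so idle time inflates the owned cost per
# useful hour. All prices below are hypothetical assumptions.

CLOUD_RATE = 1.30            # $/hour on-demand, hypothetical
OWNED_CAPITAL = 6000.0       # $ per node, hypothetical
OWNED_YEARS = 4              # amortization period, hypothetical
OWNED_OPS_PER_YEAR = 1500.0  # $/year power, space, admin, hypothetical

HOURS_PER_YEAR = 24 * 365

def owned_cost_per_hour():
    """Effective $/hour of an owned node, independent of utilization."""
    annual = OWNED_CAPITAL / OWNED_YEARS + OWNED_OPS_PER_YEAR
    return annual / HOURS_PER_YEAR

def owned_cost_per_useful_hour(utilization):
    """Fixed cost spread over only the hours actually used."""
    return owned_cost_per_hour() / utilization

def cloud_cost_per_useful_hour(utilization):
    """Pay-per-use: cost per useful hour is flat regardless of utilization."""
    return CLOUD_RATE

def break_even_utilization():
    """Utilization above which owning becomes cheaper than renting."""
    return owned_cost_per_hour() / CLOUD_RATE
    # about 26% with the hypothetical prices above
```

With these assumed prices, a project idle 95% of the time pays several times the cloud rate per useful hour on owned hardware, while a project above roughly 26% utilization does better owning, which illustrates why the savings vanish at sustained high utilization.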
12.6 Late Update<br />
As this report was being finalized, Amazon announced several important updates. Due to the timing of
these announcements, we have left most of the analysis unchanged, but we considered it important to
discuss their impact on the analysis. There were three significant developments: an updated
Top500 entry from Amazon, the release of a new instance type that was used for the Top500 entry, and
new pricing. We discuss each of these and its impact on the analysis below.
Amazon’s Top500 entry for November 2011 achieved 240 TF, placing 42nd on the list. More interesting
than the absolute numbers or position is the improvement in efficiency to 68% of peak; on previous Top500
entries, Amazon had achieved approximately 50% of peak. This is likely due to better tuning of the Linpack
execution. Traditional HPC systems typically achieve between 80% and 90% of peak. Eventually, virtualized
systems may approach these efficiencies through improved integration with the interconnect and continued
improvements in the virtualization stacks.
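The efficiency figures above can be checked with simple arithmetic. The 240 TF sustained result and the 68% efficiency come from the report; the implied theoretical peak, and the comparison at a traditional-HPC efficiency of 85%, are derived values rather than published figures.

```python
# Back-of-the-envelope check of the Linpack efficiency figures.
# Rmax (240 TF) and 68% efficiency are from the report; the implied
# Rpeak and the 85% comparison point are derived assumptions.

RMAX_TF = 240.0     # sustained Linpack result (from the report)
EFFICIENCY = 0.68   # fraction of theoretical peak (from the report)

def implied_rpeak(rmax_tf, efficiency):
    """Theoretical peak implied by a sustained rate and its efficiency."""
    return rmax_tf / efficiency

def sustained(rpeak_tf, efficiency):
    """Sustained Linpack rate at a given fraction of peak."""
    return rpeak_tf * efficiency

rpeak = implied_rpeak(RMAX_TF, EFFICIENCY)  # roughly 353 TF of peak
# The same hardware at a traditional-HPC efficiency of 85% would
# sustain about 300 TF, a 25% gain with no additional peak capacity.
hpc_equivalent = sustained(rpeak, 0.85)
```

This is why the efficiency improvement matters more than the list position: closing the gap between 68% and the 80-90% typical of traditional HPC systems recovers substantial sustained performance from the same peak.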
In parallel with the new Top500 result, Amazon announced a new instance type, cc2.8xlarge. This
instance type is notable for several reasons. It is the first significant deployment of Intel Sandy Bridge EP.
In addition to increasing the number of cores per socket from four to eight compared with the
Nehalem processor used in the previous cluster compute instance type, the Sandy Bridge processor also
effectively doubles the number of floating-point operations per cycle. However, the processors used in the
new instance type run at a slightly lower clock rate (2.6 GHz versus 2.95 GHz). As a result of these differences,
the new instance has a theoretical peak FLOP rate that is approximately 3.5x that of the previous cluster