

pacity.

Sporadic Demand. One of the more common cases for using commercial cloud offerings is when demand is highly variable, especially if there are also time-sensitive requirements for the service. In the analysis of DOE centers above, the demand is still highly variable; however, scientists can typically tolerate reasonable delays in the start of an application, especially if this results in access to more cycles. For cases where demand must be met quickly, the ability to rapidly add resources can determine whether a project can be completed at all. One example is a DOE lab science project that is conceived suddenly and requires more resources than can be obtained quickly from a DOE HPC Center. DOE HPC Centers typically do not have a viable way to add the necessary resources on a short timescale, and the existing resources are heavily utilized, so the only way to make room for the new project would be to push out existing projects. Other examples have hard real-time requirements, e.g., modeling an oil spill to direct containment efforts, tracking a hurricane, or simulating a critical piece of equipment when it fails and causes an expensive resource to go down. Additionally, some experiments, such as those at light sources and accelerators, require or benefit from real-time analysis resources. If those experiments only run a fraction of the time, they may benefit from on-demand cloud-type models. These kinds of opportunity costs can be difficult to quantify but should be weighed when deciding whether to move to a cloud model.

Facility Constrained. Some sites are severely infrastructure-limited. This could be due to building restrictions, insufficient power at the site, or other limitations. In these cases, commercial offerings may be the only reasonable option available. However, if expansion is an option, the long-term cost should be considered. While infrastructure expansion can be costly, those costs can be amortized over a long period, typically 15 years.

The potential cost savings to the customer in these cases come from a few common sources. One is the ability to avoid purchasing and deploying computing resources when the demand is unclear. The other is avoiding the operation of resources that are needed only infrequently and would therefore sit at very low utilization. Once a project can maintain reasonably high utilization of a resource, the cost savings typically vanish.
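This break-even behavior can be illustrated with a simple back-of-the-envelope comparison. The sketch below is not drawn from the report's cost data; the purchase price, operating cost, amortization period, system size, and on-demand rate are hypothetical placeholders chosen only to show how the calculation works.

```python
# Hypothetical break-even sketch: owned cluster vs. on-demand cloud.
# All numbers below are illustrative assumptions, not figures from this report.

CAPITAL_COST = 2_000_000.0          # assumed purchase price of an owned system ($)
AMORTIZATION_YEARS = 5              # assumed amortization period for the hardware
ANNUAL_OPERATING_COST = 400_000.0   # assumed power, space, and staff cost per year ($)
NODE_COUNT = 200                    # assumed size of the owned system (nodes)
ON_DEMAND_RATE = 1.30               # assumed on-demand price per node-hour ($)

HOURS_PER_YEAR = 24 * 365

def owned_cost_per_node_hour(utilization: float) -> float:
    """Effective cost per *used* node-hour of the owned system at a given utilization."""
    annual_cost = CAPITAL_COST / AMORTIZATION_YEARS + ANNUAL_OPERATING_COST
    used_node_hours = NODE_COUNT * HOURS_PER_YEAR * utilization
    return annual_cost / used_node_hours

for utilization in (0.05, 0.25, 0.50, 0.90):
    owned = owned_cost_per_node_hour(utilization)
    cheaper = "cloud" if ON_DEMAND_RATE < owned else "owned"
    print(f"utilization {utilization:4.0%}: owned ${owned:6.2f}/node-hr "
          f"vs cloud ${ON_DEMAND_RATE:.2f}/node-hr -> {cheaper} cheaper")
```

With these illustrative numbers the crossover falls somewhere between 25% and 50% utilization: below that point the on-demand model is cheaper per used node-hour, while a well-utilized owned system is cheaper above it, consistent with the qualitative conclusion above.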

12.6 Late Update<br />

As this report was being finalized, Amazon announced several important updates. Due to the timing of these announcements, we have left most of the analysis unchanged, but we considered it important to discuss the impact of these changes on the analysis. There were three significant developments: an updated Top500 entry from Amazon, the release of a new instance type which was used for that Top500 entry, and new pricing. We will discuss each of these and their impact on the analysis.

Amazon's Top500 entry for November 2011 achieved 240 TF, placing it at number 42 on the list. More interesting than the absolute numbers or position is the improvement in efficiency to 68% of peak. On previous Top500 entries, Amazon had achieved approximately 50% of peak. This is likely due to better tuning of the Linpack execution. Traditional HPC systems typically achieve between 80% and 90% of peak. Eventually, virtualized systems may approach these efficiencies through improved integration with the interconnect and continued improvements in the virtualization stacks.
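As a quick sanity check on the figures above, Linpack efficiency is simply the measured performance (Rmax) divided by the theoretical peak (Rpeak). The short sketch below uses only the numbers quoted in this section.

```python
# Back-of-the-envelope check of the Linpack efficiency figures quoted above.
rmax_tf = 240.0      # measured Linpack performance of the Nov. 2011 entry (TF)
efficiency = 0.68    # quoted efficiency (fraction of theoretical peak)

rpeak_tf = rmax_tf / efficiency
print(f"Implied theoretical peak: {rpeak_tf:.0f} TF")          # roughly 353 TF

# For comparison, reaching the same Rmax at the ~50% efficiency of earlier
# entries would have required a system with a peak near 480 TF.
print(f"Peak needed at 50% efficiency: {rmax_tf / 0.50:.0f} TF")
```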

In parallel with the new Top500 result, Amazon announced a new instance type, cc2.8xlarge. This instance type is notable for several reasons. It is the first significant deployment of Intel Sandy Bridge EP. In addition to increasing the number of cores per socket from four to eight compared with the Nehalem processor used in the previous cluster compute instance type, the Sandy Bridge processor also effectively doubles the number of floating-point operations per cycle. However, the processors used in the new instance type run at a slightly lower clock rate (2.6 GHz versus 2.95 GHz). As a result of these differences, the new instance has a theoretical peak FLOP rate approximately 3.5x larger than that of the previous cluster compute instance type.
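The roughly 3.5x increase in theoretical peak follows directly from the core count, the per-cycle floating-point throughput, and the clock rate. The sketch below uses the core counts and clock rates cited in this section; the two-socket node layout and the FLOPs-per-cycle values (4 double-precision FLOPs per cycle per core for Nehalem via SSE, 8 for Sandy Bridge via AVX) are our assumptions about the underlying processors rather than figures stated in the report.

```python
# Theoretical peak per instance = sockets x cores/socket x FLOPs/cycle x clock (GHz).
# Core counts and clock rates are from the text; FLOPs/cycle and the two-socket
# layout are assumptions about the underlying hardware.

def peak_gflops(sockets: int, cores_per_socket: int, flops_per_cycle: int, ghz: float) -> float:
    return sockets * cores_per_socket * flops_per_cycle * ghz

nehalem_peak = peak_gflops(sockets=2, cores_per_socket=4, flops_per_cycle=4, ghz=2.95)  # previous cluster compute instance
sandy_peak   = peak_gflops(sockets=2, cores_per_socket=8, flops_per_cycle=8, ghz=2.6)   # cc2.8xlarge

print(f"Nehalem-based instance peak:      {nehalem_peak:6.1f} GFLOPS")
print(f"Sandy Bridge-based instance peak: {sandy_peak:6.1f} GFLOPS")
print(f"Ratio: {sandy_peak / nehalem_peak:.2f}x")   # roughly 3.5x, as stated above
```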

