
that will require additional personnel and possibly training. In the end, overall security costs in clouds are unlikely to go down but will more likely just shift from one area to another.

Power Efficiency. Power efficiency is often noted as a reason commercial cloud providers are cost effective. Most of the power efficiency comes from consolidation, which was discussed elsewhere. This consolidation results in higher utilization and less wasted power. In addition, warehouse-scale data centers can easily justify the additional design and operations effort to optimize the facilities and hardware for improved power efficiency. Consequently, best-in-class commercial cloud providers boast power usage effectiveness (PUE) values below 1.2 [35]. The PUE is the ratio of the total power used to operate a system to the power used directly by the IT equipment. Higher PUEs are typically due to inefficiencies in cooling, power supplies, air handlers, and similar infrastructure, so a PUE below 1.2 represents a very efficient design. In comparison, a 2007 report cited an estimated average PUE of 2.0. More recent reports estimate a range between 1.83 and 1.92, but these surveys likely include results from commercial cloud data centers [54]. DOE HPC Centers have raised concerns about the power requirements of large systems for nearly a decade, triggered by the growing cost of the electricity needed to operate large HPC systems. Consequently, many of the large DOE HPC data centers are working toward improving their PUE and the overall efficiency of their systems. While DOE facilities may not quite match the best commercial warehouse-scale data centers, recent deployments are approaching PUE values between 1.2 and 1.3. These efficiencies are achieved through novel cooling designs, architectural choices, and power management. For example, the Magellan system at NERSC used rear-door heat exchangers fed with return water that had already been used to cool other systems. Another example is the cooling for the ALCF data centers, which has been designed to maximize free cooling when weather conditions are favorable. With high-efficiency chillers installed to allow partial free cooling and centrifugal chiller cooling to operate simultaneously, up to 17,820 kWh can be saved per day. Furthermore, new planned facilities, such as the LBNL Computational Research and Theory Building, will incorporate free-air cooling to further improve energy efficiency. Looking toward the future, a major thrust of the DOE Exascale vision is to research new ways to deliver more computing capability per watt for DOE applications. This will likely be achieved through a combination of energy-efficient system designs and novel packaging and cooling techniques.

Personnel. A large fraction of the staff effort at DOE HPC Centers does not go directly toward hardware support and basic system administration but is instead focused on developing and maintaining the environment, applications, and tools that support the scientists. Consequently, many of these functions would still need to be carried out in a cloud model. Moving toward a cloud model in which individual research teams manage their own environments and cloud instances could actually increase the total effort, since many of those functions would go from being centralized at DOE centers or institutional clusters to being decentralized and spread across individual research groups. This could reverse the trend in DOE labs toward consolidating support staff for IT systems.

Storage. While most of the cost analysis has focused on the cost of computation in the cloud, storage cost is also an important consideration. Calculating the TCO for a recent disk storage procurement at NERSC yields a cost of approximately $85 per TB-year, or about $0.007 per GB-month. This is more than 14 times less expensive than the current rate for Amazon's EBS of $0.10 per GB-month, and still 5 times less expensive than Amazon's least expensive incremental rate for S3 with reduced redundancy ($0.037 per GB-month). In addition, Amazon imposes extra charges for I/O transactions and outbound data transfers. For example, at Amazon's lowest published rates, transferring 10 TB of data out of Amazon would cost around $500. For some data-intensive science applications, this could add considerable cost. Other cloud providers have similar prices; for example, Google's Cloud Storage offerings are essentially the same ($0.105 per GB-month at its lowest published rate). These higher costs do not factor in the performance differences discussed in Section 9.3, nor the need of many HPC applications for high-performance parallel file systems. The cost, performance, and access models for storage are a significant barrier to adopting cloud-based solutions for scientific applications.
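The comparison above is simple unit arithmetic; the sketch below reproduces it from the quoted rates. The decimal 1 TB = 1,000 GB conversion and the roughly $0.05 per GB egress rate (inferred from the ~$500 for 10 TB figure) are assumptions for illustration, not figures stated in the report.

```python
# Illustrative sketch of the storage cost arithmetic in this section.
# Rates are those quoted in the text; the egress rate is an assumption
# back-calculated from the "~$500 for 10 TB" figure.

GB_PER_TB = 1000          # assumed decimal convention for the rounding above
MONTHS_PER_YEAR = 12

nersc_tb_year = 85.0      # NERSC disk TCO, $ per TB-year
nersc_gb_month = nersc_tb_year / (GB_PER_TB * MONTHS_PER_YEAR)
print(f"NERSC: ${nersc_gb_month:.3f} per GB-month")                       # ~$0.007

ebs_gb_month = 0.10       # Amazon EBS rate quoted in the text
s3_rr_gb_month = 0.037    # S3 reduced redundancy, lowest incremental rate
print(f"EBS is {ebs_gb_month / nersc_gb_month:.0f}x the NERSC cost")      # ~14x
print(f"S3 RR is {s3_rr_gb_month / nersc_gb_month:.0f}x the NERSC cost")  # ~5x

# Outbound transfer: 10 TB at an assumed ~$0.05 per GB lowest published rate.
egress_rate_per_gb = 0.05
print(f"10 TB egress: ${10 * GB_PER_TB * egress_rate_per_gb:,.0f}")       # ~$500
```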

