29.12.2014 Views

Magellan Final Report - Office of Science - U.S. Department of Energy

Magellan Final Report - Office of Science - U.S. Department of Energy

Magellan Final Report - Office of Science - U.S. Department of Energy

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Chapter 4<br />

Application Characteristics<br />

A challenge with exploiting cloud computing for science is that the scientific community has needs that<br />

can be significantly different from typical enterprise customers. Applications <strong>of</strong>ten run in a tightly coupled<br />

manner at scales greater than most enterprise applications require. This <strong>of</strong>ten leads to bandwidth and<br />

latency requirements that are more demanding than most cloud customers. Scientific applications also<br />

typically require access to large amounts <strong>of</strong> data including large volumes <strong>of</strong> legacy data. This can lead to<br />

large startup cost and data storage cost. The goal <strong>of</strong> <strong>Magellan</strong> has been to analyze these requirements<br />

using an advanced testbed, a flexible s<strong>of</strong>tware stack, and performing a range <strong>of</strong> data gathering efforts. In<br />

this chapter we summarize some <strong>of</strong> the key application characteristics that are necessary to understand the<br />

feasibility <strong>of</strong> clouds for these scientific applications. In Section 4.1 we summarize the computational models,<br />

and in Section 4.2 we discuss some diverse usage scenarios. We summarize the results <strong>of</strong> the user survey in<br />

Section 4.3 and discuss some scientific use cases in Section 4.4.<br />

4.1 Computational Models<br />

Scientific workloads can be classified into three broad categories based on their resource requirements: largescale<br />

tightly coupled computations, mid-range computing, and high throughput computing. In this section,<br />

we provide a high-level classification <strong>of</strong> workloads in the scientific space, based on their resource requirements<br />

and delve into the details <strong>of</strong> why cloud computing is attractive to these application spaces.<br />

Large-Scale Tightly Coupled. These are complex scientific codes generally running at large-scale supercomputing<br />

centers across the nation. Typically, these are MPI codes using a large number <strong>of</strong> processors<br />

(<strong>of</strong>ten in the order <strong>of</strong> thousands). A single job can use thousands to millions <strong>of</strong> core hours depending on<br />

the scale. These jobs are usually run at supercomputing centers through batch queue systems. Users wait<br />

in a managed queue to access the resources requested, and their jobs are run when the required resources<br />

are available and no other jobs are ahead <strong>of</strong> them in the priority list. Most supercomputing centers provide<br />

archival storage and parallel file system access for the storage and I/O needs <strong>of</strong> these applications. Applications<br />

in this class are expected to take a performance hit when working in virtualized cloud environments [46].<br />

Mid-Range Tightly Coupled. These applications run at a smaller scale than the large-scale jobs. There<br />

are a number <strong>of</strong> codes that need tens to hundreds <strong>of</strong> processors. Some <strong>of</strong> these applications run at supercomputing<br />

centers and backfill the queues. More commonly, users rely on small compute clusters that are<br />

managed by the scientific groups themselves to satisfy these needs. These mid-range applications are good<br />

candidates for cloud computing even though they might incur some performance hit.<br />

High Throughput. Some scientific explorations are performed on the desktop or local clusters and have<br />

asynchronous, independent computations. Even in the case <strong>of</strong> large-scale science problems, a number <strong>of</strong> the<br />

15

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!