
Figure 9.25 shows some preliminary analysis of the data collected on the NERSC Magellan system over a period of seven months, covering over 220K runs of several classes of HPC applications. The Magellan batch queue was available to NERSC users; however, it is not clear how the workload on Magellan compares with the general workload on NERSC systems. Figure 9.25a shows a scatter plot of the number of nodes used by a job vs the duration of the job. Most of the jobs in the period that we profiled used fewer than 256 cores. The total wall-clock time used by a job across all processors tended to vary from a few hundred to a few thousand hours. Figure 9.25b shows a similar trend, but the X-axis shows the number of processes instead of tasks. Figure 9.25c shows a scatter plot of each job's percentage of I/O vs its percentage of communication. The jobs closer to the origin of the graph (i.e., less communication and less I/O) are likely to do well in cloud environments. The graph also seems to indicate that there are clear clusters of I/O vs communication. Figure 9.25d shows the percentage of CPU vs the percentage of I/O. A number of these applications tend to be CPU intensive, and thus cluster in the >50% part of the X-axis. Applications that are closer to the X-axis are likely to do better in virtual environments. Additional analysis will be needed to classify jobs by user and application, both to better understand these patterns and to understand the volume of I/O and communication. The IPM data is a rich source of workload information that helps us understand the types of applications that run well on systems such as Magellan.
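To make the screening criterion above concrete, the listing below sketches how jobs could be grouped by their communication and I/O percentages, as IPM reports them in per-job summaries, flagging jobs near the origin of Figure 9.25c as candidates for cloud environments. This is an illustrative sketch, not the actual analysis pipeline used for Figure 9.25; the record fields and the 10% cutoffs are assumptions.

# Hypothetical sketch: screen IPM-style job summaries for cloud-friendly jobs.
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class JobSummary:
    job_id: str
    nodes: int
    wall_hours: float
    pct_comm: float   # % of wall time spent in MPI communication (from IPM)
    pct_io: float     # % of wall time spent in I/O (from IPM)

def likely_cloud_friendly(job: JobSummary,
                          comm_threshold: float = 10.0,
                          io_threshold: float = 10.0) -> bool:
    """Jobs near the origin of Figure 9.25c (low communication and low I/O)
    are the ones most likely to run well in a virtualized cloud."""
    return job.pct_comm < comm_threshold and job.pct_io < io_threshold

jobs = [
    JobSummary("a1", nodes=4, wall_hours=120.0, pct_comm=5.2, pct_io=1.3),
    JobSummary("a2", nodes=32, wall_hours=900.0, pct_comm=38.0, pct_io=4.7),
]
candidates = [j.job_id for j in jobs if likely_cloud_friendly(j)]
print("cloud candidates:", candidates)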

9.7 Discussion

Virtualized cloud computing environments promise to be useful for scientific applications that need customized software environments. However, virtualization is known to have a significant performance impact on scientific applications. We analyzed the performance of a number of different interconnect technologies and of I/O to understand the trade-offs in this space and to gain an understanding of what a private cloud configured for scientific computing should look like.

9.7.1 Interconnect

Our benchmarking approach enabled us to understand the impact of the layers in the hierarchy of protocols. Although performance is highly dependent on the type of workload, we see that the performance decrease from virtualization is largely due to overheads in communication, i.e., the reliance of the virtual machines on TCP over Ethernet as the communication mechanism.
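As an illustration of how the communication layer can be isolated, the listing below sketches a simple ping-pong measurement of point-to-point latency; mpi4py is an assumed stand-in for the MPI benchmarks actually used, and the message size and iteration count are arbitrary. Running the same script natively over InfiniBand and inside a virtual machine restricted to TCP over Ethernet exposes the latency overhead attributed to the protocol stack.

# Hypothetical ping-pong latency sketch (assumes mpi4py is available); run with:
#   mpirun -np 2 python pingpong.py
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 8            # small message: dominated by latency, not bandwidth
iters = 10000
buf = bytearray(nbytes)

comm.Barrier()
start = time.perf_counter()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # One-way latency estimate: total time / (2 * iterations).
    print(f"approx. one-way latency: {elapsed / (2 * iters) * 1e6:.2f} microseconds")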

Our results show that while the difference between the interconnects is small at lower core counts, the impact is significant even at mid-range problem sizes of 256 to 1,024 cores. While bandwidth is only slightly impacted at higher concurrency, latency takes a much larger hit. Scientific applications tend to run at higher concurrencies to achieve the best time to solution. Thus, to serve the needs of these applications, high-performance interconnects are needed.

In addition to the network fabric and the protocol, the virtualization software stack imposes additional overhead on application performance. This overhead is also affected by the communication pattern of the application and the concurrency of the run.

The upper bound on the performance of virtual machines today is what is achievable with TCP-over-Ethernet clusters. Our experiments show that the availability of InfiniBand interconnects in virtual machines would boost the performance of scientific applications by reducing communication overheads. Scientific applications would be able to leverage the benefits of virtualization and see a significant increase in performance if future environments made InfiniBand available as the communication network.

Our performance evaluation illustrates the importance of a high-bandwidth, low-latency network. However, combining virtual machines and an InfiniBand (IB) network presents several challenges. Possible methods for extending the InfiniBand network inside a virtual machine include routing IP over InfiniBand from the hypervisor to the virtual machine; bridging Ethernet over InfiniBand through the Linux bridge to the virtual machine; and full virtualization of the NIC. We briefly discuss these options below, including their benefits, trade-offs, and current issues.

