Magellan Final Report - Office of Science - U.S. Department of Energy
Figure 9.25 shows some preliminary analysis of the data collected on the NERSC Magellan system over a period of seven months, covering over 220K runs of several classes of HPC applications. The Magellan batch queue was available to NERSC users; however, it is not clear how the workload on Magellan compares with the general workload on NERSC systems. Figure 9.25a shows a scatter plot of the number of nodes used by a job vs the duration of the job. Most of the jobs in the period we profiled used fewer than 256 cores. The total wall clock time used by a job across all processors tended to vary from a few hundred to a few thousand hours. Figure 9.25b shows a similar trend, but the X-axis shows the number of processes instead of tasks. Figure 9.25c shows a scatter plot of each job's percentage of I/O vs percentage of communication. The jobs closer to the origin of the graph (i.e., less communication and less I/O) are likely to do well in cloud environments. The graph also indicates that there are clear clusters in the I/O vs communication space. Figure 9.25d shows the percentage of CPU time vs the percentage of I/O. A number of these applications tend to be CPU intensive, and thus cluster in the >50% region of the X-axis. Applications that are closer to the X-axis are likely to do better in virtual environments. Additional analysis is needed to classify jobs by user and application in order to better understand these patterns, as well as the volume of I/O and communication.

The IPM data is a rich source of workload data that helps us understand the types of applications that run well on systems such as Magellan.
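The screening described above can be sketched in a few lines. This is an illustrative sketch only: the job records and the 25% threshold below are assumptions for demonstration, not Magellan data or an analysis method from the report.

```python
# Hypothetical sketch: flagging jobs whose profiles sit near the origin of
# the communication% vs I/O% scatter plot, i.e., likely cloud candidates.
# The job records and the 25% threshold are invented for illustration.

def cloud_suitable(comm_pct, io_pct, threshold=25.0):
    """A job low on both the communication and I/O axes is a reasonable
    cloud candidate; the threshold value is an assumption."""
    return comm_pct < threshold and io_pct < threshold

jobs = [
    {"name": "md_sim",   "comm_pct": 10.0, "io_pct": 5.0},   # low on both axes
    {"name": "cfd_run",  "comm_pct": 55.0, "io_pct": 8.0},   # communication heavy
    {"name": "genomics", "comm_pct": 12.0, "io_pct": 40.0},  # I/O heavy
]

candidates = [j["name"] for j in jobs if cloud_suitable(j["comm_pct"], j["io_pct"])]
print(candidates)  # → ['md_sim']
```

A real classification would replace the synthetic records with per-job IPM profiles and likely cluster on more dimensions (message volume, I/O volume, concurrency) rather than use a fixed threshold.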
9.7 Discussion
Virtualized cloud computing environments promise to be useful for scientific applications that need customized software environments. However, virtualization is known to have a significant performance impact on scientific applications. We analyzed the performance of a number of different interconnect technologies, as well as I/O performance, to understand the performance trade-offs in this space and to gain an understanding of what a private cloud configured for scientific computing should look like.
9.7.1 Interconnect
Our benchmarking approach enabled us to understand the impact of each layer in the hierarchy of protocols. Although performance is highly dependent on the type of workload, we see that the performance decrease from virtualization is largely due to communication overheads, i.e., the reliance of the virtual machines on TCP over Ethernet as the communication mechanism.
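The kind of overhead at issue here is round-trip message latency through the TCP stack. As a self-contained illustration (not one of the report's benchmarks), the sketch below measures mean round-trip time for a one-byte message over a local TCP connection; a guest-to-guest measurement would additionally include virtualization and physical-network costs on top of this software-stack baseline.

```python
import socket
import time

def tcp_round_trip_latency(iterations=1000, payload=b"x"):
    """Mean round-trip time for a 1-byte message over a loopback TCP
    connection. Illustrative only: real VM-to-VM latency also includes
    hypervisor, virtual NIC, and physical network overheads."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))       # let the OS pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    # Disable Nagle's algorithm so small messages are sent immediately.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    start = time.perf_counter()
    for _ in range(iterations):
        client.sendall(payload)
        conn.recv(1)                    # "server" side receives...
        conn.sendall(payload)           # ...and echoes the byte back
        client.recv(1)
    elapsed = time.perf_counter() - start

    for s in (client, conn, server):
        s.close()
    return elapsed / iterations

print(f"mean TCP round-trip: {tcp_round_trip_latency() * 1e6:.1f} us")
```

Even on loopback this typically reports tens of microseconds per round trip, an order of magnitude or more above native InfiniBand latencies, which is consistent with the communication overheads discussed above.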
Our results show that while the difference between the interconnects is small at lower core counts, the impact is significant even at mid-range problem sizes of 256 to 1,024 cores. While bandwidth is only slightly impacted at higher concurrency, latency takes a much larger hit. Scientific applications tend to run at higher concurrencies to achieve the best time to solution; thus, to serve the needs of these applications, we need high-performance interconnects.
In addition to the network fabric and the protocol, the virtualization software stack imposes additional overhead on application performance. This overhead is also affected by the communication pattern of the application and the concurrency of the run.
The upper bound on the performance of virtual machines today is what is achievable with TCP over Ethernet clusters. Our experiments show that the availability of InfiniBand interconnects on virtual machines would boost the performance of scientific applications by reducing communication overheads. Scientific applications would be able to leverage the benefits of virtualization and see a significant increase in performance if future environments made InfiniBand available for the communication network.
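Why latency-bound workloads benefit most can be seen from the standard alpha-beta model of message cost, t = alpha + n/beta. The sketch below uses rough, assumed figures for a 10 Gb Ethernet (TCP) path and a QDR InfiniBand path; these numbers are illustrative, not measurements from the Magellan testbed.

```python
# Back-of-the-envelope alpha-beta model: t(n) = alpha + n / bandwidth.
# The latency/bandwidth figures are rough assumed values, not measured data.

def message_time(n_bytes, alpha_s, bw_bytes_per_s):
    """Time to send one n-byte message under the alpha-beta model."""
    return alpha_s + n_bytes / bw_bytes_per_s

ETH = {"alpha": 50e-6, "bw": 1.25e9}  # ~50 us latency, ~10 Gb/s (assumed)
IB  = {"alpha": 2e-6,  "bw": 4.0e9}   # ~2 us latency,  ~32 Gb/s (assumed)

for n in (64, 4096, 1 << 20):
    t_eth = message_time(n, ETH["alpha"], ETH["bw"])
    t_ib  = message_time(n, IB["alpha"],  IB["bw"])
    print(f"{n:>8} B message: Ethernet/InfiniBand time ratio = {t_eth / t_ib:.1f}")
```

Under these assumptions the ratio is largest for small messages, where latency dominates; since high-concurrency runs exchange many small messages, this is consistent with the observation above that latency, more than bandwidth, limits virtualized performance at scale.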
Our performance evaluation illustrates the importance of a high-bandwidth, low-latency network. However, combining virtual machines and an InfiniBand (IB) network presents several challenges. Possible methods for extending the InfiniBand network inside a virtual machine include routing IP over InfiniBand from the hypervisor to the virtual machine; bridging Ethernet over InfiniBand through the Linux bridge to the virtual machine; and full virtualization of the NIC. We will briefly discuss these options, including their benefits, trade-offs, and current issues.