
Routing IP packets from the virtual machine (VM) to the hypervisor (HV) and then using IP over InfiniBand is the most readily available approach today. In this approach, the VM communicates through the HV using a virtual network interface, and standard IP routing is then used to route the packets over the InfiniBand interface. This approach provides the lowest performance of the various options: since each packet must traverse multiple TCP stacks (in the VM and through the HV), the overhead is high. There are also numerous challenges in integrating this approach with cloud management software such as Eucalyptus and OpenStack.
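
As a rough illustration (not part of the report's evaluation), the sketch below assumes a VM whose virtual interface is routed over IPoIB to a peer at a hypothetical address running a simple echo service. The point is that the application traffic is ordinary TCP/IP, so every message pays the cost of the VM's TCP stack and the routing hop through the HV before it ever reaches the InfiniBand fabric.

```python
# Minimal sketch: traffic between a VM and a peer over IP-over-InfiniBand is
# ordinary TCP/IP, so it pays the full cost of the VM's TCP stack plus routing
# through the hypervisor onto the IB fabric. The peer address and the echo
# service listening there are assumptions for illustration only.
import socket
import time

PEER_ADDR = ("10.0.0.2", 5001)   # hypothetical peer reached via the routed IPoIB path
PAYLOAD = b"x" * 4096            # small message, latency-dominated

def round_trip(sock: socket.socket) -> float:
    """Send one payload and wait for the echoed copy, returning elapsed seconds."""
    start = time.perf_counter()
    sock.sendall(PAYLOAD)
    received = 0
    while received < len(PAYLOAD):
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        received += len(chunk)
    return time.perf_counter() - start

if __name__ == "__main__":
    with socket.create_connection(PEER_ADDR, timeout=10) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        samples = [round_trip(sock) for _ in range(100)]
        print(f"median round trip: {sorted(samples)[len(samples) // 2] * 1e6:.1f} us")
```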

Utilizing Ethernet over InfiniBand (EoIB) is another approach. In EoIB, first proposed by Mellanox, Ethernet frames are encapsulated in InfiniBand packets, and the InfiniBand stack presents a virtual Ethernet NIC. This virtual NIC would be presented on the HV and then bridged to the VM. This approach has several advantages over the others. VLANs should be supported in this model, which would provide some support for isolating networks. Bridging the virtual Ethernet interface is similar to how Ethernet NICs are commonly bridged for a virtual machine, so packages like Eucalyptus could more easily support it. Since the communication would be TCP-based, some performance would be lost compared with the remote memory access offered by native InfiniBand. However, the packets would only traverse a single TCP stack (on the VM) on each side, so performance should exceed the IP over InfiniBand method. While this approach looks promising, the current EoIB implementation does not support Linux bridging. Another disadvantage is that only Mellanox currently plans to support EoIB, and it requires a special bridge device connected to the InfiniBand network. However, these devices are relatively low cost.
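
For context, the sketch below (with hypothetical interface names, not taken from the report) shows the standard Linux bridging steps a hypervisor performs to attach an Ethernet NIC to a VM's tap device. An EoIB virtual NIC would need to participate in exactly this kind of bridge, which is the support that was missing in the EoIB implementation at the time.

```python
# Minimal sketch, not from the report: the standard iproute2 steps a hypervisor
# uses to bridge a (virtual) Ethernet NIC to a VM's tap device. The interface
# names "eoib0" and "tap0" are hypothetical. VLAN sub-interfaces (e.g.
# "ip link add link eoib0 name eoib0.100 type vlan id 100") could provide the
# network isolation mentioned above.
import subprocess

def sh(*args: str) -> None:
    """Run a single iproute2 command, raising on failure."""
    subprocess.run(args, check=True)

def bridge_vnic_to_vm(vnic: str = "eoib0", tap: str = "tap0", bridge: str = "br0") -> None:
    sh("ip", "link", "add", "name", bridge, "type", "bridge")  # create the bridge
    sh("ip", "link", "set", vnic, "master", bridge)            # enslave the virtual Ethernet NIC
    sh("ip", "link", "set", tap, "master", bridge)             # enslave the VM's tap device
    for dev in (vnic, tap, bridge):
        sh("ip", "link", "set", dev, "up")                     # bring all interfaces up

if __name__ == "__main__":
    bridge_vnic_to_vm()
```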

Full virtualization of the InfiniBand adapter is just starting to emerge. This method would provide the best performance, since it would support full RDMA. However, it would be difficult to isolate networks and to protect the network from actions by the VM. Furthermore, images would have to include and configure the InfiniBand stack, which would dramatically complicate the process of creating and maintaining images. Finally, only the latest adapters from Mellanox are expected to support full virtualization; the version of the adapters in Magellan will likely not support this method. We are exploring options for evaluating virtualization of the NIC and looking at alternative methods.
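
As a small illustration of the image-maintenance burden, the hypothetical check below probes the paths a Linux guest exposes when an InfiniBand stack is installed and configured; with a fully virtualized adapter, every image would need the drivers and userspace packages that make these paths appear.

```python
# Minimal sketch, not from the report: probe whether a guest image has a
# configured InfiniBand stack, which full virtualization of the adapter would
# require inside every image.
import os

def infiniband_present() -> bool:
    """Return True if the kernel exposes at least one IB device via sysfs."""
    sysfs = "/sys/class/infiniband"
    return os.path.isdir(sysfs) and bool(os.listdir(sysfs))

def userspace_verbs_present() -> bool:
    """Return True if userspace verbs device files exist (e.g. /dev/infiniband/uverbs0)."""
    devdir = "/dev/infiniband"
    return os.path.isdir(devdir) and any(name.startswith("uverbs") for name in os.listdir(devdir))

if __name__ == "__main__":
    print("IB devices in sysfs:", infiniband_present())
    print("userspace verbs devices:", userspace_verbs_present())
```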

High-performance networks like InfiniBand can dramatically improve performance, but integrating them into virtual machine-based systems requires further research and development. Alternatively, it may be more attractive to look at how some of the benefits of virtualization can be provided to users without using true virtualization. For example, our early experiments with MOAB/xCAT show that dynamic imaging of nodes or software overlay methods allow users to tailor the environment for their application without incurring the cost of virtualization.

9.7.2 I/O on Virtual Machines<br />

The I/O performance results clearly highlight that I/O can be one of the causes of performance bottlenecks in virtualized cloud environments. Performance in VMs is lower than on physical machines, which may be attributed to the additional level of abstraction between the VM and the hardware. It is also evident from the results that local disks perform better than block store volumes. Block store volumes can be mounted on multiple instances and used as a shared file system to run MPI programs, but performance degrades significantly; thus they may not be suitable for all of the MPI applications that compose the HPC workload. Our large-scale tests also capture the variability associated with virtual machine performance on Amazon.
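
The kind of comparison behind these observations can be approximated with a simple streaming-write test. The sketch below is not the benchmark used in the report; it simply times a sequential write against any mount point (for example, a local ephemeral disk versus an attached block store volume) so the two can be compared side by side.

```python
# Minimal sketch, not the benchmark used in the report: time a sequential write
# to a given path so a local disk and a block store volume can be compared.
# Paths are hypothetical; the page cache is not bypassed, but fsync() ensures
# the data reaches the device before timing stops.
import os
import sys
import time

def write_throughput(path: str, size_mb: int = 1024, block_kb: int = 1024) -> float:
    """Write size_mb of data to `path` and return throughput in MB/s."""
    block = b"\0" * (block_kb * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # make sure the data actually hits the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed

if __name__ == "__main__":
    # e.g. python io_test.py /mnt/local/testfile /mnt/blockstore/testfile
    for target in sys.argv[1:]:
        print(f"{target}: {write_throughput(target):.1f} MB/s")
```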

HPC systems typically provide a high-performance scratch file system that is used for reading initial conditions, saving simulation output, or writing checkpoint files. This capability is typically provided by parallel file systems such as GPFS, Lustre, or Panasas. Providing this capability for virtual machine-based systems presents several challenges including security, scalability, and stability.

The security challenges center on the level of control a user has within the virtual machine. Most parallel file systems use a host-based trust model, i.e., a server trusts that a client will enforce access rules and permissions. Since the user would likely have root privileges on the VM, they would be capable of viewing and modifying other users' data on the file system.
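
The consequence of host-based trust is straightforward to illustrate: the file server accepts whatever numeric UID the client kernel presents, and a root user inside the VM can assume any UID. The sketch below is purely illustrative, with a hypothetical UID and path, and shows why POSIX permissions alone do not protect data on such a mount.

```python
# Purely illustrative sketch of why host-based trust fails when the client is
# root inside a VM: the parallel file system server trusts the UID the client
# kernel reports, and root can take on any UID it likes. UID and path below
# are hypothetical.
import os

VICTIM_UID = 1042                          # hypothetical UID of another user
VICTIM_FILE = "/pfs/home/otheruser/data"   # hypothetical path on the shared mount

def read_as(uid: int, path: str) -> bytes:
    """Temporarily assume `uid` (requires root) and read `path`."""
    os.seteuid(uid)                  # the server only sees the asserted UID
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.seteuid(0)                # drop back to root

if __name__ == "__main__":
    data = read_as(VICTIM_UID, VICTIM_FILE)
    print(f"read {len(data)} bytes that POSIX permissions were supposed to protect")
```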

Accessing shared file systems from user-controlled virtual machines can also create scalability and stability challenges.

