
Magellan Final Report

9.1.2 Machines

All results were obtained during normal, multi-user, production periods on all machines unless specified otherwise.

Magellan. All the experiments described in this section were performed on the Magellan system at NERSC. Chapter 5 described the testbed in detail; here we summarize the key characteristics relevant to understanding the benchmarking results. Magellan is a 720-node IBM iDataPlex cluster. Each node has two quad-core Intel Nehalem processors running at 2.67 GHz, 24 GB of RAM, and two network connections: a single Quad Data Rate (QDR) InfiniBand connection and a Gigabit Ethernet connection. The InfiniBand network is locally a fat tree with a global 2-D mesh. Our 10G network is based on the Juniper QFabric switching system with four QF/Nodes, two QF/Interconnects, and Junos 11.3.

Some early experiments were performed on Carver, a 400-node IBM iDataPlex cluster also located at NERSC, whose nodes have a hardware configuration identical to the Magellan cloud testbed. All of the codes compiled on Carver used version 10.0 of the Portland Group compiler suite and version 1.4.1 of OpenMPI.

The virtual machine environment on Magellan at NERSC is based on Eucalyptus 1.6 (studies in Section 9.1.4) and 2.0 (studies in Section 9.1.5), an open source software platform that allows organizations to leverage existing Linux-based infrastructure to create private clouds. Our Eucalyptus installation uses the Kernel-based Virtual Machine (KVM) as its hypervisor. We modified Eucalyptus to use virtio for disk access, and we use KVM's emulated e1000 NIC for the network. All codes were compiled with the Intel compiler version 11.1 and used version 1.4.2 of OpenMPI.
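As an illustration of what these hypervisor options amount to, the sketch below builds a minimal libvirt domain definition with a virtio disk bus and an emulated e1000 NIC and boots it through the libvirt Python bindings. This is not the template Eucalyptus itself generates; the instance name, image path, and bridge name are placeholder assumptions.

```python
# Hypothetical sketch: a minimal KVM guest definition with virtio for the
# disk bus and the emulated e1000 model for the NIC. The name, image path,
# and bridge are placeholders, not values from the Magellan installation.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>instance-sketch</name>
  <memory>20971520</memory>            <!-- 20 GB, expressed in KiB -->
  <vcpu>8</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/instances/root.img'/>  <!-- placeholder image -->
      <target dev='vda' bus='virtio'/>              <!-- virtio disk access -->
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>                        <!-- placeholder bridge -->
      <model type='e1000'/>                         <!-- emulated e1000 NIC -->
    </interface>
  </devices>
</domain>
"""

conn = libvirt.open("qemu:///system")   # connect to the local KVM hypervisor
dom = conn.createXML(DOMAIN_XML, 0)     # boot a transient guest from the XML
print("started", dom.name())
```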

All tests were performed on an instance type configured with 8 CPUs, 20 GB of memory, and 20 GB of disk. The guest OS is CentOS release 5.5 (Linux kernel 2.6.28-11). We set up a virtual cluster where the head node mounts a block store volume that holds all the application binaries and data; the block store volume is mounted on the other virtual machines via NFS. All virtual machine communication traffic goes over the Ethernet network.
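The virtual-cluster storage layout can be made concrete with a small sketch. Assuming a hypothetical block-device name, mount point, export network, and head-node hostname (none of which appear in the report), the head node mounts the block store volume and exports it over NFS, and each worker mounts the export at the same path:

```python
# Hypothetical sketch of the virtual-cluster storage layout described above.
# Device name, mount point, export network, and hostname are assumptions.
import subprocess

BLOCK_DEV = "/dev/vdb"   # attached block store volume (assumed device name)
SHARE = "/share"         # holds all application binaries and data

def setup_head_node():
    # Mount the block store volume on the head node and export it over NFS.
    subprocess.check_call(["mount", BLOCK_DEV, SHARE])
    with open("/etc/exports", "a") as exports:
        exports.write(SHARE + " 10.0.0.0/24(rw,sync,no_root_squash)\n")
    subprocess.check_call(["exportfs", "-ra"])

def mount_share_on_worker(head_node="vm-head"):
    # Run on each worker VM: mount the head node's export at the same path.
    subprocess.check_call(
        ["mount", "-t", "nfs", head_node + ":" + SHARE, SHARE])
```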

Amazon EC2. Amazon EC2 is a virtual computing environment that provides a web services API for launching and managing virtual machine instances. Amazon provides a number of different instance types with varying performance characteristics. CPU capacity is defined in terms of an abstract Amazon EC2 Compute Unit; one EC2 Compute Unit is approximately equivalent to a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor. For our tests we used the m1.large instance type, which has four EC2 Compute Units (two virtual cores with two EC2 Compute Units each) and 7.5 GB of memory. The nodes are connected with Gigabit Ethernet. To ensure consistency, the binaries compiled on Lawrencium were used on EC2. All of the nodes are located in the US East region in the same availability zone. Amazon offers no guarantees on the proximity of nodes allocated together, and there is significant variability in latency between nodes.
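For illustration only (not necessarily the tooling used in this study), the sketch below drives the EC2 web services API through the boto3 client library to request a few m1.large instances pinned to a single US East availability zone. The AMI ID and zone name are placeholders.

```python
# Illustrative sketch: request a small set of m1.large instances in one
# US East availability zone via the EC2 API. AMI ID and zone are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",              # placeholder machine image
    InstanceType="m1.large",             # 4 ECUs, 2 virtual cores, 7.5 GB RAM
    MinCount=4,
    MaxCount=4,
    Placement={"AvailabilityZone": "us-east-1a"},  # keep all nodes in one zone
)

for instance in response["Instances"]:
    print(instance["InstanceId"], instance["Placement"]["AvailabilityZone"])
```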

In addition to the variability in network latency, we also observed variability in the underlying hardware on which the virtual machines are running. By examining /proc/cpuinfo, we are able to identify the actual CPU type of the non-virtualized hardware. In our test runs, we identified three different CPUs: the Intel Xeon E5430 2.66 GHz quad-core processor, the AMD Opteron 270 2.0 GHz dual-core processor, and the AMD Opteron 2218 HE 2.6 GHz dual-core processor. Amazon does not provide a mechanism to control which underlying hardware virtual machines are instantiated on. Thus, we almost always end up with a virtual cluster running on a heterogeneous set of processors. This heterogeneity also meant we were unable to use any of the processor-specific compiler options.
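A minimal sketch of that check, run inside each instance, reads /proc/cpuinfo and reports the model name of the processor the virtual machine actually landed on:

```python
# Minimal sketch of the /proc/cpuinfo check described above: report the model
# name of the underlying (non-virtualized) CPU hosting this instance.
def underlying_cpu_model(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("model name"):
                # e.g. "model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz"
                return line.split(":", 1)[1].strip()
    return "unknown"

if __name__ == "__main__":
    print(underlying_cpu_model())
```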

During the course of the project, Amazon started providing access to Cluster Compute (CC) instances in addition to the other instance types previously available. These new CC instances are significantly different from the other types in terms of performance characteristics. This is the first instance type that guarantees the hardware architecture of the nodes, reducing the chance of performance variation due to varying hardware types. Amazon has defined the specification of these CC instances to have dual-socket Intel Nehalem CPUs at 2.97 GHz and 23 GB of memory per node. The nodes are connected by 10 Gigabit Ethernet.

