
[Figure: y-axis "Percentage Performance Relative to Native" (0-100); bars for TCP over IB, TCP over Ethernet, and VM for each of GTC, PARATEC, and CAM.]

Figure 9.4: Relative performance of each of the cases examined for the three applications: GTC, PARATEC and CAM.

[Figure: two panels; (a) y-axis "Percentage of Computation Time Relative to Native" (0-100), (b) y-axis "Percentage of Communication Time Relative to Native" (0-100); bars for TCP over IB, TCP over Ethernet, and VM for each of GTC, PARATEC, and CAM.]

Figure 9.5: Performance ratio for the compute (a) and communication (b) parts of the runtime for each of the three applications GTC, PARATEC and CAM for each of the cases under consideration.

The results for these configurations are shown in Figure 9.4. For all the codes except CAM using TCP over Ethernet, the performance shows the expected trend, with each successive configuration showing decreased performance. To understand the reasons for the differing performance impact on each of the three codes, we analyzed them using the Integrated Performance Monitoring (IPM) framework [77, 84].

The principal performance measurement made by IPM is the percentage of the runtime spent communicating (the time in the MPI library), which also allows us to infer the time spent computing. The results of this analysis are shown in Figure 9.5, which shows the compute time ratios for the various configurations.
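The bookkeeping behind these ratios is straightforward, and the sketch below illustrates it with hypothetical numbers. The runs table, the configuration names, and the convention that 100% means parity with native are assumptions made for illustration, not values taken from the measurements.

```python
# Minimal sketch (not from the report): reconstructing the quantities behind
# Figures 9.4 and 9.5 from IPM-style measurements.  All timings in the
# 'runs' table are hypothetical placeholders, not measured data.

# For each configuration: total wallclock time (s) and the percentage of the
# runtime spent in the MPI library, as reported by IPM.
runs = {
    "native":  {"wall": 100.0, "mpi_pct": 20.0},
    "TCPoIB":  {"wall": 110.0, "mpi_pct": 27.0},
    "TCPoEth": {"wall": 130.0, "mpi_pct": 38.0},
    "VM":      {"wall": 170.0, "mpi_pct": 50.0},
}

def split_times(run):
    """Split total runtime into compute time and communication (MPI) time."""
    comm = run["wall"] * run["mpi_pct"] / 100.0
    return run["wall"] - comm, comm

native_comp, native_comm = split_times(runs["native"])

for name, run in runs.items():
    comp, comm = split_times(run)
    perf_rel = 100.0 * runs["native"]["wall"] / run["wall"]  # cf. Figure 9.4
    comp_rel = 100.0 * native_comp / comp                    # cf. Figure 9.5a
    comm_rel = 100.0 * native_comm / comm                    # cf. Figure 9.5b
    print(f"{name:8s} perf={perf_rel:5.1f}%  "
          f"compute={comp_rel:5.1f}%  comm={comm_rel:5.1f}%")
```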

As expected from the HPCC results, only the VM configuration shows significant performance degradation for the compute portion of the runtime. Why the different applications show different slowdowns is unclear at this point, but we suspect it is related to the mix of instructions each application performs and the different slowdowns those instructions experience in the VM. The communication ratios are shown in Figure 9.5b. In this case the differences between the applications and between the various configurations are much more pronounced. As well as providing overall information about the amount of time spent in the MPI library, IPM also reports the type of MPI routine called, the size of message sent, and the destination of the message.
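As a rough illustration of how such a breakdown can be used, the sketch below tallies time by MPI routine and message-size class from a list of hypothetical call records; the record layout, the size cutoff, and the sample data are assumptions made for illustration and do not reflect IPM's actual output format.

```python
from collections import defaultdict

# Hypothetical (routine, message_bytes, seconds) records for one run; this
# only illustrates the kind of aggregation used to compare communication
# profiles across applications.
calls = [
    ("MPI_Allreduce", 8,       12.0),
    ("MPI_Allreduce", 8,       11.5),
    ("MPI_Isend",     262144,   4.2),
    ("MPI_Wait",      0,        9.8),
    ("MPI_Alltoallv", 1048576, 30.1),
]

def size_bucket(nbytes):
    """Crude split between latency-sensitive and bandwidth-bound messages."""
    return "small (<64 KiB)" if nbytes < 64 * 1024 else "large (>=64 KiB)"

time_by_kind = defaultdict(float)
for routine, nbytes, seconds in calls:
    time_by_kind[(routine, size_bucket(nbytes))] += seconds

# Runs dominated by small messages and collectives tend to be more sensitive
# to the added latency of TCP and virtualized networking than bandwidth-bound
# transfers.
for (routine, bucket), seconds in sorted(time_by_kind.items(),
                                         key=lambda kv: -kv[1]):
    print(f"{routine:15s} {bucket:18s} {seconds:6.1f} s")
```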

Using this information, we can understand why the different applications show different slowdowns. In the<br />

