Magellan Final Report - Office of Science - U.S. Department of Energy


[Figure: two panels plotting time (ms, 0-35000) against the number of maps (20-100) for 20, 60, and 100 reduces. (a) 100 lines. (b) 1000 lines.]

Figure 10.1: Hadoop MRBench.

MRBench evaluates the performance of MapReduce systems while varying key parameters such as data size and the number of map/reduce tasks. We varied the number of lines of data written from 100 to 1000 and varied the number of maps and reduces. Figure 10.1 shows the time with varying maps and reduces for (a) 100 and (b) 1000 lines. As the number of maps and reduces increases, the time increases; however, the difference is less than 10 seconds. The number of lines written, at the orders of magnitude we measured, shows no perceptible effect. MRBench can be provided with custom mapper and reducer implementations to measure specific system or application behavior. This could be used to develop benchmarks that emphasize the nature of scientific workloads.
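A parameter sweep of this kind can be driven directly from MRBench's command-line options. The sketch below assumes a Hadoop 0.20/1.x-era installation; the benchmark jar name and exact flag set vary by release, so check your version's `mrbench` usage output before relying on it.

```shell
# Sweep map and reduce counts for a fixed input size, as in Figure 10.1(a).
# MRBench generates its own input (-inputLines lines per run) and reports
# average job time over -numRuns repetitions.
for maps in 20 40 60 80 100; do
  for reduces in 20 60 100; do
    hadoop jar "$HADOOP_HOME"/hadoop-*-test.jar mrbench \
      -inputLines 100 \
      -maps "$maps" \
      -reduces "$reduces" \
      -numRuns 3
  done
done
```

Repeating the sweep with `-inputLines 1000` reproduces the second panel; averaging over several runs (`-numRuns`) smooths out job-launch jitter, which dominates at these small data sizes.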

TestDFSIO measures the I/O performance of HDFS. Figure 10.2 shows the throughput for small and large file sizes with a varying number of concurrent writers/files. For small file sizes, the throughput remains fairly constant as the number of concurrent writers varies. For large files, however, the throughput decreases rapidly as the number of concurrent files/writers increases. This appears to depend on the HDFS block size and will require tuning and additional benchmarking to understand the configuration necessary for scientific workloads at a specific site.
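The measurement above maps onto TestDFSIO's write mode, where each file is written by its own map task, so the file count sets the number of concurrent writers. A sketch, assuming Hadoop 1.x-style options (`-fileSize` in MB; newer releases take `-size` with a unit suffix):

```shell
# Write throughput vs. concurrency: -nrFiles controls the number of
# concurrent writers, since TestDFSIO runs one map task per file.
for writers in 1 10 20 30 40 50 60; do
  hadoop jar "$HADOOP_HOME"/hadoop-*-test.jar TestDFSIO \
    -write -nrFiles "$writers" -fileSize 10    # small-file case: 10 MB each
done

# Remove the benchmark's output directory from HDFS when done.
hadoop jar "$HADOOP_HOME"/hadoop-*-test.jar TestDFSIO -clean
```

By default TestDFSIO appends aggregate throughput and average I/O rate per run to a local results log, which is the source for plots like Figure 10.2.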

[Figure: throughput (MB/s, 0-70) against the number of concurrent writers (0-60) for 10 MB and 10 GB file sizes.]

Figure 10.2: HDFS throughput for different file sizes with varying number of concurrent writers.

