Magellan Final Report - Office of Science - U.S. Department of Energy
[Figure 10.1 plots: Time (ms, 0 to 35000) versus number of Maps (20 to 100) and Reduces (20 to 100). Panels: (a) 100 lines, (b) 1000 lines.]<br />
Figure 10.1: Hadoop MRBench.<br />
MRBench evaluates the performance of MapReduce systems while varying key parameters such as data<br />
size and the number of Map/Reduce tasks. We varied the number of lines of data written from 100 to 1000<br />
and varied the number of maps and reduces. Figure 10.1 shows the time with varying maps and reduces<br />
for (a) 100 and (b) 1000 lines. As the number of maps and reduces increases, the time increases; however,<br />
the difference is less than 10 seconds. The number of lines written, at the orders of magnitude we measured,<br />
has no perceptible effect. MRBench can be provided with custom mapper and reducer implementations to<br />
measure specific system or application behavior. This could be used to develop benchmarks that emphasize<br />
the nature of scientific workloads.<br />
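A parameter sweep like the one described above can be scripted. The following is a minimal sketch, not the setup used for these measurements: it only builds the MRBench command lines for each combination of maps, reduces, and input lines. The jar path `hadoop-test.jar` is a placeholder (the test jar name differs between Hadoop releases), and the 20-to-100 grid in steps of 20 is an assumption read off the axis ticks of Figure 10.1.<br />

```python
from itertools import product

# Placeholder path: the jar that ships MRBench is named differently
# across Hadoop releases, so adjust this for the local installation.
TEST_JAR = "hadoop-test.jar"


def mrbench_commands(maps, reduces, input_lines, num_runs=1):
    """Return one MRBench command (as an argv list) per parameter combination."""
    commands = []
    for lines, m, r in product(input_lines, maps, reduces):
        commands.append([
            "hadoop", "jar", TEST_JAR, "mrbench",
            "-maps", str(m),
            "-reduces", str(r),
            "-inputLines", str(lines),
            "-numRuns", str(num_runs),
        ])
    return commands


if __name__ == "__main__":
    # Assumed grid: 20..100 maps and reduces in steps of 20, 100 and 1000 lines.
    grid = range(20, 101, 20)
    for argv in mrbench_commands(grid, grid, [100, 1000]):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` on a node with a configured Hadoop client; the sketch prints the commands instead so that it can be inspected without a cluster.<br />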
TestDFSIO measures the I/O performance of HDFS. Figure 10.2 shows the throughput for small (10 MB) and large (10 GB)<br />
file sizes with a varying number of concurrent writers/files. For small files, the throughput remains fairly constant<br />
as the number of concurrent writers varies. For large files, however, the throughput decreases rapidly as the number of<br />
concurrent files/writers increases. This seems to depend on the HDFS block size and will require<br />
tuning and additional benchmarking to understand the configuration necessary for scientific workloads at a<br />
specific site.<br />
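When interpreting throughput figures of this kind, it helps to be explicit about how per-writer measurements are aggregated. The sketch below assumes the convention that TestDFSIO is generally understood to use: "throughput" is total data written divided by the sum of per-task write times (not wall-clock time), and "average IO rate" is the mean of the per-task rates. The function name and the sample numbers are hypothetical and are not taken from the Magellan measurements.<br />

```python
def dfsio_summary(results):
    """Aggregate per-writer results into two summary rates in MB/s.

    results: list of (megabytes_written, seconds_taken), one entry per writer.
    Returns (throughput, average_io_rate) under the assumed convention:
      throughput      = total MB / sum of per-writer times
      average_io_rate = mean of the per-writer MB/s rates
    """
    if not results:
        raise ValueError("at least one writer result is required")
    total_mb = sum(mb for mb, _ in results)
    total_seconds = sum(seconds for _, seconds in results)
    throughput = total_mb / total_seconds
    average_io_rate = sum(mb / seconds for mb, seconds in results) / len(results)
    return throughput, average_io_rate


if __name__ == "__main__":
    # Hypothetical example: two writers, each writing 10 MB.
    sample = [(10, 0.5), (10, 1.0)]
    throughput, average_io_rate = dfsio_summary(sample)
    print(f"throughput = {throughput:.2f} MB/s")
    print(f"average IO rate = {average_io_rate:.2f} MB/s")
```

The two numbers diverge when writers finish at very different speeds, which is one reason to record both when comparing block-size configurations.<br />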
[Figure 10.2 plot: Throughput (MB/s, 0 to 70) versus number of concurrent writers (0 to 60); series: 10MB and 10GB files.]<br />
Figure 10.2: HDFS throughput for different file sizes with varying number of concurrent writers.<br />