28.07.2013 Views

Performance Analysis and Optimization of the Hurricane File System ...

Performance Analysis and Optimization of the Hurricane File System ...

Performance Analysis and Optimization of the Hurricane File System ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 5. MICROBENCHMARKS – READ OPERATION 50<br />

avg # cycles per processor<br />

1e+07<br />

5e+06<br />

0e+00<br />

2 4 6 8 10 12<br />

# processors<br />

Fine grain++ & no free list<br />

All optimizations<br />

Figure 5.31: Regular read – all optimizations.<br />

2. Padded ORS <strong>and</strong> block cache entries.<br />

avg # cycles per processor<br />

3e+06<br />

2e+06<br />

1e+06<br />

0e+00<br />

2 4 6 8 10 12<br />

# processors<br />

All optimizations<br />

Figure 5.32: Regular read – all optimizations,<br />

magnified.<br />

3. Padded all o<strong>the</strong>r critical data structures in <strong>the</strong> ORS <strong>and</strong> block cache systems, such as locks <strong>and</strong> waiting<br />

I/O data structures (shown in Figure 3.4 <strong>and</strong> Figure 3.5).<br />

4. Modified hash functions. (ORS <strong>and</strong> block cache)<br />

5. Increased size <strong>of</strong> hash tables. (ORS <strong>and</strong> block cache)<br />

6. Elimination <strong>of</strong> <strong>the</strong> block cache free list.<br />

7. Use <strong>of</strong> fine grain locks in <strong>the</strong> block cache.<br />

For simplicity, <strong>the</strong> K42 padding facilities were used ra<strong>the</strong>r than manually padding <strong>and</strong> aligning <strong>the</strong> data<br />

structures. The results are shown in Figure 5.31 17 . Scalability showed fur<strong>the</strong>r improvement when compared<br />

to Section 5.5.7. A magnification <strong>of</strong> <strong>the</strong> results, shown in Figure 5.32, indicate that scalability is good<br />

although not quite ideal. The 8 processor results exhibited <strong>the</strong> same anomaly as described in Section 5.5.7,<br />

resulting in <strong>the</strong> lower error bar in <strong>the</strong> graph. Figure 5.21 compares <strong>the</strong> fully optimized results against <strong>the</strong><br />

original unoptimized results. At this point, ra<strong>the</strong>r than attempting to achieve ideal scalability, we decided<br />

to examine o<strong>the</strong>r workloads that stress o<strong>the</strong>r basic file system operations.<br />

5.5.10 Summary<br />

For regular, r<strong>and</strong>om-access files, good scalability was achieved in <strong>the</strong> read experiments using a combination<br />

<strong>of</strong> modified hash functions, large hash tables, padded hash list headers, padded cache entries, o<strong>the</strong>r padded<br />

data structures, fine grain locks, <strong>and</strong> a modified block cache free list.<br />

5.6 Summary<br />

Determining <strong>and</strong> eliminating bottlenecks in parallel system s<strong>of</strong>tware is difficult, requiring a long process <strong>of</strong><br />

experimentation. In general, we have found it important to apply <strong>the</strong> principles <strong>of</strong> maximizing locality <strong>and</strong><br />

17 Error bars were removed from <strong>the</strong> 12 processor configuration.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!