28.07.2013 Views

Performance Analysis and Optimization of the Hurricane File System ...

Performance Analysis and Optimization of the Hurricane File System ...

Performance Analysis and Optimization of the Hurricane File System ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

CHAPTER 5. MICROBENCHMARKS – READ OPERATION 42<br />

avg # cycles per processor<br />

4e+07<br />

3e+07<br />

2e+07<br />

1e+07<br />

0e+00<br />

2 4 6 8 10 12<br />

# processors<br />

Unoptimized<br />

Padded ORS entries & hash table<br />

Perfect memory<br />

Figure 5.16: Extent read –<br />

padded ORS cache entries & hash<br />

list headers.<br />

avg # cycles per processor<br />

2e+07<br />

1.5e+7<br />

1e+07<br />

5e+06<br />

0e+00<br />

2 4 6 8 10 12<br />

# processors<br />

Padded ORS entries & hash table<br />

All optimizations<br />

Perfect memory<br />

Figure 5.17: Extent read – all optimizations.<br />

avg # cycles per processor<br />

1e+07<br />

8e+06<br />

6e+06<br />

4e+06<br />

2e+06<br />

0e+00<br />

2 4 6 8 10 12<br />

# processors<br />

All optimizations<br />

All optimizations (manual pad)<br />

Figure 5.18: Extent read – fully<br />

optimized.<br />

Uniprocessor performance was worse than <strong>the</strong> original unoptimized version, taking twice as long to<br />

complete <strong>the</strong> task. This unexpected behavior was also seen in Section 5.4.6. We briefly ran a similar<br />

experiment where <strong>the</strong> hash list headers were manually padded instead <strong>of</strong> using <strong>the</strong> K42 padding facilities<br />

<strong>and</strong> found that <strong>the</strong> uniprocessor anomaly was no longer present. Again, this is fur<strong>the</strong>r evidence that <strong>the</strong><br />

K42 padding facilities may have added unnecessary overhead to <strong>the</strong> uniprocessor configuration.<br />

As shown in Figure 5.16 11 , results between 2 <strong>and</strong> 8 processors approach <strong>the</strong> performance <strong>of</strong> <strong>the</strong> perfect<br />

memory configuration. Secondary cache line conflicts from <strong>the</strong> ORS cache entries caused execution time to<br />

double. The original hash function was used so <strong>the</strong> expected poor performance for 12 processors appeared<br />

again. A change in <strong>the</strong> hash function could lead to near-ideal performance <strong>and</strong> scalability <strong>and</strong> is attempted<br />

in Section 5.4.9.<br />

5.4.9 All <strong>Optimization</strong>s Combined<br />

This experiment combined <strong>the</strong> individual optimizations to determine whe<strong>the</strong>r ideal performance <strong>and</strong> scal-<br />

ability could be achieved. The original configuration from Section 5.2 was used along with <strong>the</strong> following<br />

modifications.<br />

1. Modified ORS hash function (Section 5.4.5).<br />

2. Increased ORS hash table size (Section 5.4.7).<br />

3. Padded ORS hash list headers (Section 5.4.6).<br />

4. Padded ORS cache entries (Section 5.4.8).<br />

For simplicity, <strong>the</strong> K42 padding facilities were used ra<strong>the</strong>r than manually padding <strong>and</strong> aligning <strong>the</strong> data<br />

structures. The results are shown in Figure 5.17 12 . <strong>Performance</strong> mirrors that from Section 5.4.8 between<br />

11 Error bars were removed from <strong>the</strong> unoptimized 12 processor configuration.<br />

12 Error bars were removed from <strong>the</strong> padded 12 processor configuration.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!