
Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 5. MICROBENCHMARKS – READ OPERATION

5.5.1 Unoptimized HFS

This experiment was based on the configuration of Section 5.2, except that regular 4 MB files were used rather than extent-based 12 MB files. There were two reasons for this selection. First, the file size was chosen so that the block index occupied exactly 1 meta-data cache entry, making it simple to trace cache entries. Second, initial trials using 12 MB regular files were very time-consuming. Using 4 MB files was adequate to show the performance characteristics.

The results, shown in Figure 5.20, indicate extremely poor performance for more than 2 processors, especially when compared with results from the extent-based file experiments in Section 5.2, despite reading only 1/3 as much data (4 MB vs. 12 MB). The obvious bottlenecks identified by code inspection were (1) the hash function of the block cache system, (2) the limited number of hash lists (only 4) in the block cache system, and (3) the global lock used in the block cache system. The original hash function resulted in all processors sharing a single hash list. The reason for having only a limited number of hash lists in the original HFS design has not been determined. The global lock is used to protect a variety of data structures from conflicting concurrent accesses.
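
The effect of a hash function that funnels every processor onto one hash list can be illustrated with a small sketch. The two hash functions below are hypothetical and are not the HFS functions (the text does not give them); only the count of 4 hash lists comes from the description above. The sketch assumes each processor touches the same block number of its own file.

```python
from collections import Counter

NUM_HASH_LISTS = 4  # the original block cache had only 4 hash lists


def poor_hash(file_id, block_no, num_lists=NUM_HASH_LISTS):
    # Hypothetical hash that ignores file_id, so the same block number
    # of different files always lands on the same hash list.
    return block_no % num_lists


def mixed_hash(file_id, block_no, num_lists=NUM_HASH_LISTS):
    # Hypothetical hash that folds file_id into the key so that
    # different files spread across the hash lists.
    return (file_id * 31 + block_no) % num_lists


def list_usage(hash_fn, requests, num_lists=NUM_HASH_LISTS):
    # Count how many requests fall on each hash list.
    counts = Counter(hash_fn(f, b, num_lists) for f, b in requests)
    return [counts.get(i, 0) for i in range(num_lists)]


# Assumed workload: 8 processors, each touching block 0 of its own file.
requests = [(file_id, 0) for file_id in range(8)]

poor = list_usage(poor_hash, requests)    # every request hits list 0
mixed = list_usage(mixed_hash, requests)  # requests spread over all lists
print(poor, mixed)
```

With the hypothetical poor hash, all eight requests contend for one list; with the mixed hash they spread evenly. As the later sections report, improving the distribution alone was not sufficient for scalability in HFS.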

Outline of Optimizations

We subsequently applied a number of optimizations. These include (1) padded hash list headers, cache entries, and other critical data structures, (2) modified hash functions, (3) larger hash tables, (4) a modified free list, and (5) use of fine-grain locks. The fully optimized results are shown in Figure 5.21. With the optimizations, good scalability was achieved. Sections 5.5.2 to 5.5.9 describe the investigation process that led to these results. Neither a modified block cache hash function nor an increased number of block cache hash lists was effective, as shown in Sections 5.5.2 and 5.5.3, respectively. Surprisingly, implementing fine-grain locks in Section 5.5.5, along with the padding of critical data structures in Section 5.5.6, did not result in scalability either. As demonstrated in Section 5.5.7, the first critical bottleneck was the block cache free list, followed by the global block cache lock, as shown in Section 5.5.8. Finally, Section 5.5.9 describes the fully optimized configuration in detail.
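
The idea behind replacing a single global lock with fine-grain locks can be sketched as follows. This is a minimal illustration under stated assumptions, not the HFS implementation: the class `FineGrainBlockCache`, the modulo hash, and the table size of 64 lists are all hypothetical. It shows only the locking structure (one lock per hash list) and says nothing about performance, particularly since Python threads do not run bytecode in parallel.

```python
import threading


class FineGrainBlockCache:
    """Hypothetical block cache with one lock per hash list."""

    def __init__(self, num_lists):
        self.lists = [{} for _ in range(num_lists)]
        # One lock per hash list instead of a single global lock.
        self.locks = [threading.Lock() for _ in range(num_lists)]

    def _index(self, block_id):
        # Assumed hash: simple modulo over the number of lists.
        return block_id % len(self.lists)

    def insert(self, block_id, data):
        i = self._index(block_id)
        with self.locks[i]:  # only this hash list is locked
            self.lists[i][block_id] = data

    def lookup(self, block_id):
        i = self._index(block_id)
        with self.locks[i]:
            return self.lists[i].get(block_id)


def worker(cache, start, count):
    # Each thread inserts its own disjoint range of block ids.
    for block_id in range(start, start + count):
        cache.insert(block_id, f"block-{block_id}")


cache = FineGrainBlockCache(num_lists=64)  # table size is an assumption
threads = [
    threading.Thread(target=worker, args=(cache, p * 100, 100))
    for p in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_entries = sum(len(lst) for lst in cache.lists)
print(total_entries, cache.lookup(250))
```

Threads that touch different hash lists never wait on the same lock in this layout. Any structure still shared by all threads, such as a single free list, remains a point of contention, which is consistent with the finding above that fine-grain locks alone did not produce scalability.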

Many of the experiments are described in part to demonstrate how non-trivial it is to find bottlenecks in large, complex, parallel system software.

5.5.2 Improved Hash Function

This experiment examined the impact of modifying the hash function of the block cache system. The configuration is similar to Section 5.5.1 except that the hash function was modified to better distribute the use of hash lists, as shown in Table 5.4. The hash table size used by the block cache remained unmodified at
