28.07.2013 Views

Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 5. MICROBENCHMARKS – READ OPERATION

[Figure 5.14: Extent read – rehashed & padded ORS hash table. Plot of avg # cycles per processor (0e+00 to 4e+07) against # processors (2 to 12). Series: Unoptimized; Rehashed & padded; Perfect memory.]

CPU   New hash list #s   Old hash list #s
0     12, 13             12, 13
1     20, 21             20, 21
2     28, 29             28, 29
3     36, 37             36, 37
4     44, 45             44, 45
5     52, 53             52, 53
6     60, 61             60, 61
7     68, 69             4, 5
8     76, 77             12, 13
9     84, 85             20, 21
10    92, 93             28, 29
11    100, 101           36, 37

Table 5.3: ORS hash lists accessed.
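As a reading aid for Table 5.3, the following minimal sketch tallies which hash lists are accessed by more than one CPU under each hash function. The list numbers are transcribed from the table; the names `TABLE_5_3` and `contended_lists` are illustrative only. The final check records an observation about the table (each old list number equals the new one modulo 64); whether this reflects the size of the original hash table is an inference, not something stated in this excerpt.

```python
from collections import defaultdict

# CPU -> (new hash lists, old hash lists), transcribed from Table 5.3.
TABLE_5_3 = {
    0: ((12, 13), (12, 13)),
    1: ((20, 21), (20, 21)),
    2: ((28, 29), (28, 29)),
    3: ((36, 37), (36, 37)),
    4: ((44, 45), (44, 45)),
    5: ((52, 53), (52, 53)),
    6: ((60, 61), (60, 61)),
    7: ((68, 69), (4, 5)),
    8: ((76, 77), (12, 13)),
    9: ((84, 85), (20, 21)),
    10: ((92, 93), (28, 29)),
    11: ((100, 101), (36, 37)),
}


def contended_lists(column):
    """Return {hash list: [CPUs]} for lists accessed by more than one CPU.

    column is 0 for the new hash function and 1 for the old one.
    """
    users = defaultdict(list)
    for cpu, row in sorted(TABLE_5_3.items()):
        for hash_list in row[column]:
            users[hash_list].append(cpu)
    return {lst: cpus for lst, cpus in users.items() if len(cpus) > 1}


new_conflicts = contended_lists(0)
old_conflicts = contended_lists(1)

# Observation about the table only: old list number == new list number mod 64.
old_is_new_mod_64 = all(
    tuple(n % 64 for n in new) == old for new, old in TABLE_5_3.values()
)
```

Under the old hash function, CPUs 8 to 11 land on the same lists as CPUs 0 to 3, which is consistent with the hash list lock contention the text attributes to the unoptimized configuration on 12 processors. Under the new hash function no list is shared.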

[Figure 5.15: Extent read – larger ORS hash table. Plot of avg # cycles per processor (0e+00 to 4e+07) against # processors (2 to 12). Series: Unoptimized; Larger ORS hash table; Perfect memory.]

in Figure 5.12 (see footnote 7). The performance was worse than the original unoptimized configuration between 1 and 8 processors but significantly better on 12 processors. The degradation between 1 and 8 processors was due to the denser usage of the hash table. Since the hash list headers and corresponding embedded locks reside contiguously in memory due to the K42 memory allocation algorithm, a denser usage of the locks led to increased false sharing of secondary cache lines. Figure 5.13 illustrates an example where false sharing can occur. In this example, processors 0 and 1 cause false sharing because they concurrently access hash list headers 12 and 13, and 15 and 16, respectively. Since these hash list headers all occupy the same cache line, this cache line will "ping-pong" between the two processors and result in reduced performance.
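The false sharing example can be modelled with simple address arithmetic. The sketch below is illustrative only: the header size (16 bytes), cache line size (128 bytes), and table base offset (64 bytes) are hypothetical values chosen so that headers 12, 13, 15, and 16 fall on one line as in the example; the text does not give the real sizes. The function names are likewise assumptions.

```python
from collections import defaultdict


def cache_line(index, header_bytes, line_bytes, base_offset=0):
    """Cache line number holding the first byte of hash list header `index`."""
    return (base_offset + index * header_bytes) // line_bytes


def lines_shared_by_cpus(accesses, header_bytes, line_bytes, base_offset=0):
    """Map each cache line touched by more than one processor to those processors.

    accesses maps a processor number to the hash list headers it uses.
    """
    users = defaultdict(set)
    for cpu, headers in accesses.items():
        for h in headers:
            users[cache_line(h, header_bytes, line_bytes, base_offset)].add(cpu)
    return {line: sorted(cpus) for line, cpus in users.items() if len(cpus) > 1}


# Accesses from the example in the text: processor 0 uses headers 12 and 13,
# processor 1 uses headers 15 and 16. The sizes below are hypothetical.
example = {0: [12, 13], 1: [15, 16]}
shared = lines_shared_by_cpus(example, header_bytes=16, line_bytes=128, base_offset=64)
```

With these assumed sizes, `shared` reports a single cache line used by both processors even though they never touch the same header, which is the situation that makes the line bounce between their caches.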

The increased false sharing caused increased memory and cache coherence traffic. Performance on 12 processors was similar to performance on 8 processors. A major scalability bottleneck was not introduced when moving from 8 to 12 processors, unlike in the original unoptimized configuration, where hash list lock contention occurred. In the rehashed experiment, the only difference between 8 and 12 processors was a marginal increase in memory bus traffic and memory bank contention. The increasing range of the error bars in Figure 5.12 between 4 and 12 processors was due to the increasing memory bus contention from additional false sharing between additional pairs of processors.

5.4.6 Improved Hash Function & Padded ORS Hash Table

False sharing can often be eliminated by properly padding the affected data structures. This experiment is based on the configuration in Section 5.4.5 with the modified ORS hash function, where each hash list header was padded to the end of the secondary cache line to eliminate false sharing problems. The results

Footnote 7: Error bars were removed from the unoptimized 12-processor configuration.
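The padding described in Section 5.4.6 can be sketched with Python's `ctypes`, which lays out structures the way a C compiler would. This is a model under stated assumptions, not the K42 code: the header fields (`lock`, `first`) and the 128-byte line size are hypothetical, and the table base is assumed to be aligned to a cache line.

```python
import ctypes

LINE_BYTES = 128  # assumed secondary cache line size; not stated in the text


class ListHeader(ctypes.Structure):
    """Hypothetical unpadded hash list header: embedded lock plus list pointer."""
    _fields_ = [("lock", ctypes.c_uint64), ("first", ctypes.c_void_p)]


class PaddedListHeader(ctypes.Structure):
    """Same fields, followed by filler bytes up to the end of the cache line."""
    _fields_ = [
        ("lock", ctypes.c_uint64),
        ("first", ctypes.c_void_p),
        ("pad", ctypes.c_char * (LINE_BYTES - ctypes.sizeof(ListHeader))),
    ]


def lines_touched(header_type, indices):
    """Cache lines, relative to a line-aligned table base, covered by the headers."""
    size = ctypes.sizeof(header_type)
    lines = set()
    for i in indices:
        first_byte, last_byte = i * size, i * size + size - 1
        lines.update(range(first_byte // LINE_BYTES, last_byte // LINE_BYTES + 1))
    return lines


# Headers used by processors 0 and 1 in the earlier false sharing example.
unpadded_overlap = lines_touched(ListHeader, [12, 13]) & lines_touched(ListHeader, [15, 16])
padded_overlap = lines_touched(PaddedListHeader, [12, 13]) & lines_touched(PaddedListHeader, [15, 16])
```

In this model the unpadded headers of the two processors overlap on at least one cache line, while the padded headers never do, because each padded header occupies exactly one line. The cost is memory: every header grows to a full cache line.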
