Performance Analysis and Optimization of the Hurricane File System ...

CHAPTER 5. MICROBENCHMARKS – READ OPERATION 37

[Figure 5.5: Extent read – fully optimized. Average # cycles per processor vs. # processors (2 to 12); series: Unoptimized, All optimizations (manual pad).]
[Figure 5.6: Extent read – perfect memory. Average # cycles per processor vs. # processors (2 to 12); series: Unoptimized, Perfect memory.]
[Figure 5.7: Extent read – perfect memory, magnified. Average # cycles per processor vs. # processors (2 to 12); series: Perfect memory.]

cache performance. Regular, random-access, block-based files are used in the subsequent section (Section 5.5) to explore block cache performance.

5.4.1 Perfect Memory Model

This experiment is based on the previous experiment (Section 5.2) but uses the perfect memory model instead of the standard memory model. The difference between the perfect and standard memory model results indicates the time spent waiting for memory operations, including any memory contention. The results, shown in Figure 5.6², indicate that the number of cycles required by each thread remained fairly constant across all processor configurations, especially when compared with the standard memory model results. On the 2- to 8-processor configurations, memory access time under the standard memory model contributed approximately 6,300,000 additional cycles to each thread. This more than doubled thread execution time compared with the perfect memory model. Hash list lock contention, visible in the 12-processor configuration, caused only a minor performance degradation, increasing average thread execution time by approximately 500,000 cycles. A magnification of the results is shown in Figure 5.7. This increase is due solely to hash list lock contention, that is, the sharing of hash list locks described in Section 5.3. Operations performed under the protection of a hash list lock complete very quickly under the perfect memory model.
Since each hash list lock is held for only a short time, contention is low, which leads to the relatively small increase in the 12-processor configuration. These results suggest that the severe degradation seen under the standard memory model on 12 processors may be largely attributed to hash list lock contention, exacerbated by memory contention.

² Error bars were removed from the unoptimized 12-processor configuration.
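The attribution argument above amounts to simple arithmetic on the two measurement series. The following is a minimal sketch under assumptions. It assumes the per-thread averages are available as Python dictionaries keyed by processor count; this is a hypothetical layout, not the thesis's data format, and all function names are illustrative.

The sketch applies three rules:

- The memory-stall component is the standard-model average minus the perfect-model average.
- Because standard = perfect + stall, "more than doubling" holds exactly when the stall component exceeds the perfect-memory time.
- The perfect memory model has no memory stalls, so any rise at 12 processors above configurations assumed to be contention-free is attributed to lock contention.

```python
from statistics import mean


def memory_stall_cycles(standard, perfect):
    """Cycles per thread attributable to the memory system, per configuration.

    `standard` and `perfect` map a processor count to the average number of
    cycles per thread under the standard and perfect memory models
    (hypothetical input layout).
    """
    return {p: standard[p] - perfect[p] for p in standard if p in perfect}


def more_than_doubled(standard, perfect, p):
    """True if memory more than doubles execution time at p processors.

    Since standard = perfect + stall, standard > 2 * perfect holds exactly
    when the stall component exceeds the perfect-memory time.
    """
    return standard[p] > 2 * perfect[p]


def contention_increment(perfect, p, reference):
    """Rise in perfect-memory cycles at p processors over the mean of the
    `reference` configurations, which are assumed to be contention-free.

    The perfect memory model has no memory stalls, so this rise is the part
    attributed to lock contention (here, shared hash list locks).
    """
    return perfect[p] - mean(perfect[q] for q in reference)
```

Per the text, applying this to the measured 2- to 8-processor averages would give a stall component of roughly 6,300,000 cycles per thread. Using those configurations as the reference would give a 12-processor contention increment of roughly 500,000 cycles.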
[Figure 5.8: Extent read – baseline. Average # cycles per processor vs. # processors (2 to 12); series: Unoptimized, Baseline.]
[Figure 5.9: Extent read – baseline, magnified. Average # cycles per processor vs. # processors (2 to 12); series: Baseline.]
[Figure 5.10: Extent read – spin-only locks. Average # cycles per processor vs. # processors (2 to 12); series: Unoptimized, Spin lock, Perfect memory.]
[Figure 5.11: Extent read – 4-way primary and 2-way secondary cache. Average # cycles per processor vs. # processors (2 to 12); series: Unoptimized, N-way set-associative cache, Perfect memory.]
[Figure 5.12: Extent read – rehashed. Average # cycles per processor vs. # processors (2 to 12); series: Rehashed, Unoptimized, Perfect memory.]

5.4.2 Baseline

This experiment assesses whether K42 and the microbenchmark facilities themselves are scalable, independent of HFS. Because of the rapidly changing development environment and the experimental nature of K42, the performance and scalability of K42 need to be verified frequently. The configuration for this experiment was the same as in Section 5.2, using the standard memory model. The code path was identical except that calls to the core of HFS were short-circuited: calls reached the HFS interface layer but went no further. More specifically, the calls into the HFS core that acquire and update meta-data and perform the read operation were excluded. The results are shown in Figure 5.8³, and a magnification is shown in Figure 5.9. As expected, the results showed ideal scalability. This indicates that K42 and the microbenchmark mechanisms were indeed scalable and accounted for a negligible portion of the overhead.

³ Error bars were removed from the unoptimized 12-processor configuration.
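The baseline technique short-circuits requests at the interface layer, so that only the operating system and benchmark path are measured. It can be sketched as follows. Everything here is hypothetical: `HFSCore`, `HFSInterface`, the `baseline` flag and the method names are illustrative stand-ins, not K42 or HFS code. The order of the core calls (acquire meta-data, read, update meta-data) is also an assumption.

The `scales_ideally` helper encodes one reasonable reading of "ideal scalability" for a fixed per-thread workload: the average cycles per processor stays within a tolerance of the smallest configuration's value. The 5% default tolerance is an arbitrary choice.

```python
class HFSCore:
    """Hypothetical stand-in for the HFS core: meta-data handling and the read."""

    def __init__(self):
        self.calls = 0  # counts how many core operations were invoked

    def acquire_metadata(self, file_id):
        self.calls += 1
        return {"file_id": file_id}

    def read(self, meta, offset, length):
        self.calls += 1
        return bytes(length)  # placeholder data of the requested length

    def update_metadata(self, meta):
        self.calls += 1


class HFSInterface:
    """Hypothetical interface layer; with baseline=True, calls stop here."""

    def __init__(self, core, baseline=False):
        self.core = core
        self.baseline = baseline

    def read(self, file_id, offset, length):
        if self.baseline:
            # Baseline run: the request travels through the OS and benchmark
            # path to this layer but never enters the core.
            return b""
        meta = self.core.acquire_metadata(file_id)
        data = self.core.read(meta, offset, length)
        self.core.update_metadata(meta)
        return data


def scales_ideally(avg_cycles, rel_tol=0.05):
    """True if the average cycles per processor (keyed by processor count)
    stays within rel_tol of the smallest configuration's value."""
    base = avg_cycles[min(avg_cycles)]
    return all(abs(c - base) <= rel_tol * base for c in avg_cycles.values())
```

Running the same benchmark once with `baseline=True` and once without separates the cost of the HFS core, because everything above the interface layer is shared by both runs.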