Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 6. MICROBENCHMARKS – OTHER FUNDAMENTAL OPERATIONS

lookatme.txt 1000 times, where XXX refers to the processor that the thread is executing on. The files were regular, random-access files. All threads were executed in parallel, and each was assigned its own processor and disk. The file was located several levels down the directory tree to provide a realistic file location and, in turn, more representative coverage of the cache entries examined. However, the parent directories were empty except for the leaf directory that contained the target file. This configuration may have provided a slightly unfair workload since, in practice, each directory would have been populated, causing the lookup scheme to linearly traverse each directory to find the entry representing the next directory down the path. Since we were concerned not with absolute performance results but with relative scalability results, we considered the configuration appropriate.
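The linear directory traversal described above can be sketched as follows. This is a minimal in-memory model; `dir_entry` and `lookup_entry` are illustrative names and layouts, not the actual HFS structures:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical in-memory directory entry (illustrative only). */
struct dir_entry {
    const char *name;
    unsigned long inode;
};

/* Linear scan of one directory's entries. The cost of resolving a path
   component grows with the number of populated entries in the directory,
   which is why near-empty parent directories make the benchmark workload
   slightly unrealistic. */
long lookup_entry(const struct dir_entry *entries, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i].name, name) == 0)
            return (long)entries[i].inode;
    return -1; /* not found */
}
```

Resolving a full path repeats this scan once per component, so a deep path over populated directories multiplies the per-directory cost.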

The results, shown in Figure 6.7, indicate that scalability was extremely poor beyond 2 processors. Since the experiment was run from within the file system address space (server-level), the normal file name caching provided by the VFS was not utilized. Under normal usage, the VFS would handle the majority of repeated lookups of the same path names, making the performance of the file lookup operation less critical.

6.3.1 Increased Initial Pool of Block Cache Entries

This experiment examined the effect of increasing the initial pool of block cache entries from 20 to 200. Under the chosen workload, each thread required approximately 10 block cache entries in its working set. Due to the method of managing block cache entries, the characteristics of the chosen workload, and a temporary patch to avoid reader/writer problems in the original HFS implementation, the number of block cache entries never increased and remained at 20. Correct behavior of the block cache system would have grown the pool from 20 entries to 200. When the experiment was run on more than 2 processors, this flaw caused the 20 block cache entries to thrash between all processors.
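The thrashing mechanism can be illustrated with a small simulation (hypothetical code, not the HFS implementation): a shared LRU pool stuck at 20 entries misses on every access once the combined working set of the threads exceeds the pool, whereas a 200-entry pool holds the full working set after warm-up.

```c
#include <string.h>

#define MAX_POOL 256

/* Hypothetical shared LRU pool of block cache entries; index 0 is the
   most recently used slot. Illustrative only. */
static int pool[MAX_POOL];
static int pool_size, pool_used;

/* Returns 1 on hit, 0 on miss. */
static int cache_access(int block)
{
    for (int i = 0; i < pool_used; i++) {
        if (pool[i] == block) {                       /* hit: move to front */
            memmove(&pool[1], &pool[0], i * sizeof(int));
            pool[0] = block;
            return 1;
        }
    }
    if (pool_used < pool_size)                        /* miss: grow the pool */
        pool_used++;
    memmove(&pool[1], &pool[0], (pool_used - 1) * sizeof(int)); /* evict LRU */
    pool[0] = block;
    return 0;
}

/* Misses on the second pass, for `threads` threads each cycling through a
   10-block working set, with accesses interleaved round-robin as they would
   be across separate processors. */
static int second_pass_misses(int threads, int entries)
{
    pool_size = entries;
    pool_used = 0;
    int misses = 0;
    for (int pass = 0; pass < 2; pass++)
        for (int b = 0; b < 10; b++)
            for (int t = 0; t < threads; t++) {
                int hit = cache_access(t * 100 + b);  /* disjoint per-thread blocks */
                if (pass == 1 && !hit)
                    misses++;
            }
    return misses;
}
```

With 4 threads the combined working set is 40 blocks: a 20-entry pool evicts every block before it is reused, so all 40 second-pass accesses miss, while a 200-entry pool produces no second-pass misses at all.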

The results from increasing the initial pool of block cache entries from 20 to 200 are shown in Figure 6.8. Performance improved but the scalability problems were not resolved, as shown in the magnified version of the results in Figure 6.9.

6.3.2 Applying the Read Test Optimizations

In this experiment, we additionally applied the optimizations developed from the read test workload. This configuration included all the optimizations from Section 5.5.9 and the increased initial block entry pool size from Section 6.3.1. These optimizations included the following.

1. Padded ORS hash list headers.

2. Padded ORS and block cache entries.
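The padding optimizations above can be sketched as follows, assuming a 64-byte cache line (the structure name and layout are illustrative, not the actual ORS or block cache definitions):

```c
#define CACHE_LINE 64  /* assumed cache-line size for this sketch */

/* Hypothetical hash list header, padded out to a full cache line so that
   adjacent headers never share a line. Without padding, processors updating
   neighboring bucket heads would invalidate each other's cache lines (false
   sharing) even though they touch logically independent data. */
struct hash_list_head {
    void *first;                           /* head of this bucket's chain */
    char pad[CACHE_LINE - sizeof(void *)]; /* fill the rest of the line */
};
```

The same technique applies to the ORS and block cache entries themselves: rounding each entry up to a cache-line multiple keeps per-processor accesses from contending on shared lines.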
