
Chapter 4

Experimental Setup

This chapter describes the environment in which we ran our experiments. Topics we discuss include our measurement approach, workload selection, simulated hardware setup, simulated disk setup, system environment configuration, graph interpretation, and simulated hardware characteristics.

In general, the experiments consisted mainly of a thread per processor accessing a disk exclusively. As the number of threads is increased, the number of processors and disks is increased proportionately. The time for each thread to perform its task is measured.
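To make the structure of these runs concrete, the sketch below shows what such a measurement loop might look like. It is only an illustration under assumed names and parameters: the actual experiments ran against the Hurricane File System on simulated hardware inside K42, not the POSIX calls, file paths, thread counts, or request counts used here, all of which are hypothetical.

/*
 * Illustrative sketch only, not the thesis's benchmark code.
 * One thread per "disk", each accessing its own file exclusively,
 * with the elapsed time of each thread measured independently.
 */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <fcntl.h>

#define NUM_THREADS 4          /* threads == processors == disks  */
#define REQUESTS    1000       /* requests issued by each thread  */
#define BLOCK_SIZE  4096       /* size of each read request       */

struct worker {
    int    id;
    double seconds;            /* measured execution time         */
};

static void *run(void *arg)
{
    struct worker *w = arg;
    char path[64], buf[BLOCK_SIZE];
    struct timespec t0, t1;

    /* Each thread reads its own file, standing in for a disk it
     * uses exclusively, so threads do not contend for data blocks. */
    snprintf(path, sizeof path, "/tmp/disk%d.dat", w->id);
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return NULL; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < REQUESTS; i++) {
        if (pread(fd, buf, sizeof buf, (off_t)i * BLOCK_SIZE) < 0)
            break;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    w->seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    struct worker w[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++) {
        w[i].id = i;
        pthread_create(&tid[i], NULL, run, &w[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tid[i], NULL);
        /* Ideally this per-thread time stays flat as NUM_THREADS grows. */
        printf("thread %d: %.3f s\n", i, w[i].seconds);
    }
    return 0;
}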

4.1 Measurement Approach

The goal of our experiments was to obtain an understanding of file system performance scalability from the bottom up. In this way, we could gain an understanding of scalability at various levels of complexity, from simple operations to complicated real-world workloads. We began with simple and easy-to-understand microbenchmarks in order to obtain basic performance numbers and an idea of scalability trends. Scalability bottlenecks were identified and incremental optimizations were implemented. We then iteratively added more complexity to the workload and repeated the process several times. Each time, we verified expectations and ensured that the results had logical explanations. The gradual progression helped in dealing with the many variables of the file system and workload. We were primarily interested in the scalability of meta-data management since, in our experience, this is the source of most performance problems. File cache management and latency hiding using file block prefetching are responsibilities of the K42 file cache manager and were not examined in this thesis.

Speed-up, in terms of throughput, is the typical unit of measurement used in scalability results; however, it is not as intuitively meaningful under our particular workloads. We were concerned with maintaining constant execution time per thread as the number of requesting threads, processors, and disks was increased.
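Stated a bit more formally (our own notation, added here for clarity and not taken from the thesis): let $T(p)$ be the time one thread needs to complete its fixed set of requests when the system runs $p$ threads on $p$ processors with $p$ disks. The criterion we look for under this workload is then

\[
T(p) \approx T(1) \quad \text{for all } p,
\qquad\text{whereas throughput speed-up would report}\qquad
S(p) = \frac{p\,T(1)}{T(p)},
\]

which equals $p$ (linear speed-up) exactly when the per-thread execution time stays constant.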

For example, let us assume that it takes X cycles to execute 1 request on a system with 1 processor and
