

when compared against general-purpose time-sharing, server, and workstation environments. Although large I/O requests are common, small requests are fairly common as well. They believe this is a natural result of parallelization and an inherent characteristic of most parallel programs. Therefore, file systems for these computers must provide low latency for small requests and high bandwidth for large requests. They concluded that a distributed file system (such as NFS or AFS) would not provide adequate performance since it is designed for completely different workload characteristics.

2.5.3 File System Benchmarking

A classic synthetic benchmark is the Andrew benchmark [27], which was originally designed to measure the performance scalability of the Andrew File System. It claims to be representative of the workload of an average user, but it is more accurately classified as the workload of a typical software developer. The benchmark consists of creating directories that mirror a source tree, copying files from the source tree, obtaining the attributes (stat) of all files, reading all files, and building the source tree. Developed in 1987 to reflect the workload of five software developers, with approximately 70 files totalling 200 kilobytes, the benchmark environment does not reflect current reality. Consequently, a few researchers have used modified versions of the benchmark with parameters scaled to the current state of technology [60, 14, 28, 68]. Despite these modifications, the Andrew benchmark has other limitations: it does not truly stress the I/O subsystem, since less than 25% of the time is spent performing I/O [15].
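To make the structure of the benchmark concrete, the following sketch walks through the five phases in C. It is an illustration only, not the original benchmark code: the SRC_TREE and DST_TREE paths are hypothetical, phases one and two (directory creation and file copying) are collapsed into a single recursive copy invoked through the shell, and the final phase simply runs make in the copied tree.

/*
 * Illustrative sketch of the five Andrew benchmark phases (not the
 * original benchmark code).  SRC_TREE and DST_TREE are assumed paths;
 * the real benchmark operates on a small source tree of roughly 70 files.
 */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define SRC_TREE "./andrew-src"   /* assumed location of the source tree   */
#define DST_TREE "./andrew-copy"  /* working copy created by the benchmark */

static int stat_entry(const char *path, const struct stat *sb,
                      int type, struct FTW *ftw)
{
    (void)sb; (void)type; (void)ftw;
    struct stat st;
    stat(path, &st);                 /* phase 3: stat every entry */
    return 0;
}

static int read_entry(const char *path, const struct stat *sb,
                      int type, struct FTW *ftw)
{
    (void)sb; (void)ftw;
    if (type != FTW_F)
        return 0;
    char buf[8192];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    while (read(fd, buf, sizeof buf) > 0)   /* phase 4: read every file */
        ;
    close(fd);
    return 0;
}

int main(void)
{
    char cmd[512];

    /* Phases 1 and 2: recreate the directory hierarchy and copy the files.
     * A recursive copy stands in for the separate MakeDir and Copy phases. */
    snprintf(cmd, sizeof cmd, "cp -r %s %s", SRC_TREE, DST_TREE);
    if (system(cmd) != 0)
        return EXIT_FAILURE;

    /* Phase 3: obtain the attributes of every file and directory. */
    nftw(DST_TREE, stat_entry, 16, FTW_PHYS);

    /* Phase 4: read the contents of every file. */
    nftw(DST_TREE, read_entry, 16, FTW_PHYS);

    /* Phase 5: build the copied source tree. */
    snprintf(cmd, sizeof cmd, "cd %s && make", DST_TREE);
    return system(cmd) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

In an actual run, each phase would be timed separately, since the phases stress different parts of the file system: metadata creation, data transfer, attribute lookups, and compilation.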

The Standard Performance Evaluation Corporation (SPEC) System File Server (SFS) 97 R1 V3.0 benchmark [74] is popular in the computer industry. It measures the performance of the file system running as an NFS server. This benchmark is not suitable for our use since it introduces complications and interference from NFS protocol processing and UDP/IP network protocol traffic.

Bonnie, written by Tim Bray in 1990, is a classic microbenchmark that performs sequential reads and writes. It allows for comparison between read and write performance, block access versus character access, and random versus sequential access [15]. Bonnie was designed to reveal bottlenecks in the file system [77]. However, its range of tests is fairly narrow, since it stresses only the read and write operations of the file system and not operations such as file/directory creation/deletion, path name lookup, and obtaining/modifying file attributes, making it unsuitable for our use.
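The minimal sketch below, written in the spirit of Bonnie's block-transfer tests, shows why such a microbenchmark exercises only the read and write paths. The scratch-file name, 64-megabyte working set, and 8-kilobyte block size are assumptions for illustration; Bonnie itself also measures per-character access and random seeks, and a test file much larger than physical memory is typically used so that reads are not served entirely from the buffer cache.

/*
 * Minimal sketch of a Bonnie-style sequential throughput test.  The file
 * name and sizes are assumptions; this is not Bonnie's actual code.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SCRATCH   "bonnie.scratch"           /* hypothetical test file */
#define FILE_SIZE (64L * 1024 * 1024)        /* 64 MB working set      */
#define BLOCK     8192                       /* block-sized transfers  */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    static char buf[BLOCK];
    memset(buf, 'b', sizeof buf);

    /* Sequential block write. */
    int fd = open(SCRATCH, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }
    double t0 = seconds();
    for (long off = 0; off < FILE_SIZE; off += BLOCK)
        if (write(fd, buf, BLOCK) != BLOCK) { perror("write"); return EXIT_FAILURE; }
    fsync(fd);                               /* force the data to disk */
    close(fd);
    double wsec = seconds() - t0;

    /* Sequential block read. */
    fd = open(SCRATCH, O_RDONLY);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }
    t0 = seconds();
    while (read(fd, buf, BLOCK) > 0)
        ;
    close(fd);
    double rsec = seconds() - t0;

    printf("write: %.1f MB/s  read: %.1f MB/s\n",
           FILE_SIZE / wsec / 1e6, FILE_SIZE / rsec / 1e6);
    unlink(SCRATCH);
    return EXIT_SUCCESS;
}

Note that the sketch never creates directories, looks up deep path names, or modifies attributes, which is precisely the limitation described above.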

IOStone [54] simulates the locality found in the BSD file system workload study by Ousterhout et al. [53]. According to Tang [77], the workload does not scale well and is not I/O bound. Parallel file accesses do not occur since only one process is used. Chen and Patterson [15] found that IOStone spends less than 25% of the time doing I/O, making it unsuitable for our use.

SDET [21] simulates a time-sharing system used in a software development environment. It simulates a software developer at a terminal typing and executing shell commands. Each user, simulated by a shell
