28.07.2013 Views

Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 7. MACROBENCHMARK

7.4.1 Results

With the user-level configuration, the stress placed on the file system was significantly reduced and shifted instead to the VFS and FCM components of the operating system. As shown in Figure 7.11 (see footnote 8), user-level scalability was largely unaffected by the optimizations: the unoptimized and optimized user-level versions performed similarly. Further investigation revealed that, in the unoptimized user-level version on 8 processors, only about 9% of a thread’s execution time was spent in the file system. At best, we could reduce file system overhead to 0% and improve user-level performance by at most 9% (see footnote 9). In fact, in the fully optimized user-level version (optimized 2), only 2% of a thread’s execution time was spent in the file system. Based on cycle counts, these improvements reduced file system usage time by 90% (on an 8-processor system). From this perspective, the optimizations were effective; however, in the larger context of overall execution time, they had little impact.
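The 9% ceiling above is an instance of Amdahl's law: if only the file system portion of a thread's time is sped up, the remaining 91% bounds the overall gain. The following is a minimal sketch of that arithmetic, not code from the thesis; the function name `amdahl_speedup` and its parameters are hypothetical, and the 9% figure is the only value taken from the text.

```python
import math


def amdahl_speedup(fs_fraction: float, fs_speedup: float) -> float:
    """Overall speedup when only the file system share of time is accelerated.

    fs_fraction: share of a thread's execution time spent in the file system.
    fs_speedup:  factor by which that share alone is sped up (may be math.inf).
    Both names are illustrative assumptions, not terms from the thesis.
    """
    remaining = 1.0 - fs_fraction            # time the optimizations cannot touch
    return 1.0 / (remaining + fs_fraction / fs_speedup)


# Best case from the text: file system overhead driven to 0% of execution time.
best_speedup = amdahl_speedup(0.09, math.inf)

# Fraction of the original execution time saved in that best case.
best_time_saved = 1.0 - 1.0 / best_speedup
```

Under this model the saved time can never exceed the file system's original share, which is why an improvement beyond 9% has to come from an effect outside the file system itself.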

When the Web application was run at user level, scalability problems were caused not by the file system but by other components of the operating system, such as the FCM and the VFS. Optimizing these components was beyond the scope of this thesis.

Since only 9% of a thread’s execution time was spent in the file system, a larger Web log trace could have enabled the Web server simulator to place the necessary stress on the file system, causing this portion of time to increase significantly. A larger workload would cause more data to be cached and accumulated in the FCM until physical memory was depleted to the point where cache pages must be evicted. The subsequent reloading of previously evicted file data into the FCM would invoke the file system and increase its load. This could transform the FCM/VFS-bound workload into an HFS-bound workload. However, this configuration was not possible because of time constraints, speed limitations of the SimOS simulator, and the lack of page eviction support in the version of K42 used. Regardless, it is important to ensure that the file system component is scalable when highly stressed, as the server-level experiments have demonstrated.
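The eviction argument can be illustrated with a toy cache model. The sketch below is not K42 or FCM code: it assumes an LRU replacement policy (the text does not state which policy the FCM would use), and the names `count_fs_loads` and `two_runs` are hypothetical. It counts how often a page has to be fetched from the file system when the same trace is replayed twice.

```python
from collections import OrderedDict


def count_fs_loads(accesses, cache_pages):
    """Count file system loads (cache misses) for a page access trace.

    Models a page cache of `cache_pages` entries with LRU eviction.
    LRU is an assumption made for illustration only.
    """
    cache = OrderedDict()
    loads = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)          # hit: mark as most recently used
        else:
            loads += 1                       # miss: the file system is invoked
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict the least recently used page
    return loads


def two_runs(n_pages):
    """A trace that touches pages 0..n_pages-1 in order, twice."""
    return list(range(n_pages)) * 2


# Working set fits in the cache: only the first run reaches the file system.
small_trace_loads = count_fs_loads(two_runs(4), cache_pages=8)

# Working set exceeds the cache: evicted pages are reloaded in the second run.
large_trace_loads = count_fs_loads(two_runs(16), cache_pages=8)
```

With the small trace the second run is served entirely from the cache, which mirrors the measured second-run behaviour; with the large trace every access in both runs reaches the file system, which is the shift toward an HFS-bound workload that the paragraph describes.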

Since the measurements were taken during the second run of the experiment, we were curious about the results from the first run. There should be more stress on the file system in the first run, since the FCM and VFS caches are initially empty and must be filled by acquiring information from the file system. The results, shown in Figure 7.12, indicate that the unoptimized version experienced scalability problems while the optimized version scaled much better. Upon further examination, we found that at least 74% of a thread’s execution time was spent in the file system on an unoptimized 8-processor configuration. Such a large proportion meant that the file system was adequately stressed. Using the fully optimized configuration reduced this portion to less than 25%. These results indicate that, under appropriate conditions, the optimizations were effective on the user-level Web server simulator.
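A share of execution time is not the same as an amount of time, because the total shrinks as the file system gets faster. The sketch below converts the two shares into a relative change in file system time under one loudly stated assumption: the time spent outside the file system is the same before and after optimization. That assumption is ours, not the thesis's, and the function names are hypothetical. The inputs treat "at least 74%" and "less than 25%" as exact values, so the outputs are rough estimates rather than measured results.

```python
def fs_time_ratio(share_before: float, share_after: float) -> float:
    """Ratio of file system time after vs. before optimization.

    Assumes non-file-system time is unchanged, so each share can be
    converted to file system time per unit of other time (its "odds").
    """
    odds_before = share_before / (1.0 - share_before)
    odds_after = share_after / (1.0 - share_after)
    return odds_after / odds_before


def overall_speedup(share_before: float, share_after: float) -> float:
    """Total-time speedup implied by the same assumption."""
    return (1.0 - share_after) / (1.0 - share_before)


# First-run shares quoted in the text, taken as exact for illustration.
ratio = fs_time_ratio(0.74, 0.25)        # file system time remaining
speedup = overall_speedup(0.74, 0.25)    # implied whole-run speedup
```

Under these assumptions roughly 12% of the original file system time remains. Because the quoted shares are bounds ("at least" and "less than"), the true remaining share would be no larger than this if the assumption held.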

8 User-level results on 12 processors could not be obtained due to OS stability problems.<br />

9 In fact, overall user-level performance improved by 23%. The additional improvement was most likely due to an overall reduction in memory-bus contention.
