Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 6. MICROBENCHMARKS – OTHER FUNDAMENTAL OPERATIONS

lookatme.txt 1000 times, where XXX refers to the processor that the thread is executing on. The files were regular, random-access files. All threads were executed in parallel, and each was assigned its own processor and disk. The file was located several levels down the directory tree to provide a realistic file location and, in turn, more representative coverage of the cache entries examined. However, the parent directories were empty except for the leaf directory that contained the target file. This configuration may have provided a slightly unfair workload since, in practice, each directory would have been populated, causing the lookup scheme to linearly traverse each directory to find the entry representing the next directory down the path. Since we were concerned not with absolute performance results but with relative scalability results, we considered the configuration appropriate.
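The linear directory traversal described above can be sketched as follows. This is a minimal in-memory model; `dir_entry` and `lookup_entry` are illustrative names and layouts, not the actual HFS structures:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical in-memory directory entry (illustrative only). */
struct dir_entry {
    const char *name;
    unsigned long inode;
};

/* Linear scan of one directory's entries. The cost of resolving a path
   component grows with the number of populated entries in the directory,
   which is why near-empty parent directories make the benchmark workload
   slightly unrealistic. */
long lookup_entry(const struct dir_entry *entries, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i].name, name) == 0)
            return (long)entries[i].inode;
    return -1; /* not found */
}
```

Resolving a full path repeats this scan once per component, so a deep path over populated directories multiplies the per-directory cost.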

The results, shown in Figure 6.7, indicate that scalability was extremely poor beyond 2 processors. Since the experiment was run from within the file system address space (server-level), the normal file name caching provided by the VFS was not utilized. Under normal usage, the VFS would handle the majority of repeated lookups of the same path names, making the performance of the file lookup operation less critical.

6.3.1 Increased Initial Pool of Block Cache Entries

This experiment examined the effect of increasing the initial pool of block cache entries from 20 to 200. Under the chosen workload, each thread required approximately 10 block cache entries in its working set. Due to the method of managing block cache entries, the characteristics of the chosen workload, and a temporary patch to avoid reader/writer problems in the original HFS implementation, the number of block cache entries never increased and remained at 20. Correct behavior of the block cache system would have grown the pool from 20 entries to 200. When the experiment was run on more than 2 processors, this flaw caused the 20 block cache entries to thrash between all processors.
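The thrashing mechanism can be illustrated with a small simulation (hypothetical code, not the HFS implementation): a shared LRU pool stuck at 20 entries misses on every access once the combined working set of the threads exceeds the pool, whereas a 200-entry pool holds the full working set after warm-up.

```c
#include <string.h>

#define MAX_POOL 256

/* Hypothetical shared LRU pool of block cache entries; index 0 is the
   most recently used slot. Illustrative only. */
static int pool[MAX_POOL];
static int pool_size, pool_used;

/* Returns 1 on hit, 0 on miss. */
static int cache_access(int block)
{
    for (int i = 0; i < pool_used; i++) {
        if (pool[i] == block) {                       /* hit: move to front */
            memmove(&pool[1], &pool[0], i * sizeof(int));
            pool[0] = block;
            return 1;
        }
    }
    if (pool_used < pool_size)                        /* miss: grow the pool */
        pool_used++;
    memmove(&pool[1], &pool[0], (pool_used - 1) * sizeof(int)); /* evict LRU */
    pool[0] = block;
    return 0;
}

/* Misses on the second pass, for `threads` threads each cycling through a
   10-block working set, with accesses interleaved round-robin as they would
   be across separate processors. */
static int second_pass_misses(int threads, int entries)
{
    pool_size = entries;
    pool_used = 0;
    int misses = 0;
    for (int pass = 0; pass < 2; pass++)
        for (int b = 0; b < 10; b++)
            for (int t = 0; t < threads; t++) {
                int hit = cache_access(t * 100 + b);  /* disjoint per-thread blocks */
                if (pass == 1 && !hit)
                    misses++;
            }
    return misses;
}
```

With 4 threads the combined working set is 40 blocks: a 20-entry pool evicts every block before it is reused, so all 40 second-pass accesses miss, while a 200-entry pool produces no second-pass misses at all.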

The results from increasing the initial pool of block cache entries from 20 to 200 are shown in Figure 6.8. Performance improved but the scalability problems were not resolved, as shown in the magnified version of the results in Figure 6.9.

6.3.2 Applying the Read Test Optimizations

In this experiment, we additionally applied the optimizations developed from the read test workload. This configuration included all the optimizations from Section 5.5.9 and the increased initial block entry pool size from Section 6.3.1. These optimizations included the following.

1. Padded ORS hash list headers.

2. Padded ORS and block cache entries.
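The padding optimizations above can be sketched as follows, assuming a 64-byte cache line (the structure name and layout are illustrative, not the actual ORS or block cache definitions):

```c
#define CACHE_LINE 64  /* assumed cache-line size for this sketch */

/* Hypothetical hash list header, padded out to a full cache line so that
   adjacent headers never share a line. Without padding, processors updating
   neighboring bucket heads would invalidate each other's cache lines (false
   sharing) even though they touch logically independent data. */
struct hash_list_head {
    void *first;                           /* head of this bucket's chain */
    char pad[CACHE_LINE - sizeof(void *)]; /* fill the rest of the line */
};
```

The same technique applies to the ORS and block cache entries themselves: rounding each entry up to a cache-line multiple keeps per-processor accesses from contending on shared lines.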
