
Performance Analysis and Optimization of the Hurricane File System ...


CHAPTER 4. EXPERIMENTAL SETUP

memory results is shown in Figure 4.3. These results demonstrate that the perfect memory model eliminated memory bank access time, bus access time, and cache coherency overhead.

4.7.2 Memory System Saturation

The SimOS memory system parameters are given in Table 4.1. The critical parameters include a bus bandwidth of 177 MB/s, a single memory bank, and a memory access time of 2100 ns. However, these parameters alone do not describe how the memory system behaves under increasing contention. Because file system scalability is limited by the maximum throughput of these hardware components, we ran a simple experiment to characterize the behavior and limits of the memory system. In the experiment, each processor executes a thread that sequentially traverses its own independent 2 MB array, reading and modifying each element. The modification is a simple increment of the element's current value, and each element is 1 byte in size. Each array is allocated in memory local to its processor using K42's memory allocation routines, which pad the array boundaries to prevent false sharing of cache lines. The 2 MB array size ensured that the data set would not fit in the 1 MB secondary cache, so read and write accesses to main memory were necessary. The goal of the experiment was to saturate the bus or the sole memory bank with traffic from the array accesses and thereby determine the maximum load the system can sustain.
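The experiment can be approximated on an ordinary POSIX machine with a small pthreads microbenchmark. This is a sketch under stated assumptions, not the code used in this work: it substitutes `malloc` plus explicit padding for K42's processor-local allocator, it does not pin threads to processors, and the names `run_saturation_test`, `traverse`, `MAX_THREADS`, and `PAD_BYTES` (assumed to be at least one cache line) are hypothetical. Each thread allocates its own array, increments every 1-byte element in order, and records how long its traversal loop took.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 64  /* hypothetical upper bound on worker threads */
#define PAD_BYTES 128   /* assumed >= one cache line on the target machine */

/* Per-thread state; each thread owns one private array. */
struct worker {
    size_t bytes;    /* array size, e.g. 2 MB in the experiment */
    unsigned passes; /* number of sequential traversals */
    double seconds;  /* time spent in the traversal loop only */
    int ok;          /* 1 if every element ended at passes (mod 256) */
};

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void *traverse(void *arg) {
    struct worker *w = arg;
    /* The owning thread allocates its own array. Padding on both sides keeps
       neighboring allocations off the cache lines this array touches.
       (Stands in for K42's processor-local allocator, which is not modeled.) */
    unsigned char *raw = malloc(w->bytes + 2 * PAD_BYTES);
    if (raw == NULL)
        return NULL; /* w->ok stays 0 */
    unsigned char *a = raw + PAD_BYTES;
    memset(a, 0, w->bytes);

    /* volatile forces a real load, increment and store for every element. */
    volatile unsigned char *v = a;
    double start = now_seconds();
    for (unsigned p = 0; p < w->passes; p++)
        for (size_t i = 0; i < w->bytes; i++)
            v[i]++;
    w->seconds = now_seconds() - start;

    /* Sanity check: every byte was incremented once per pass. */
    w->ok = 1;
    for (size_t i = 0; i < w->bytes; i++) {
        if (v[i] != (unsigned char)w->passes) {
            w->ok = 0;
            break;
        }
    }
    free(raw);
    return NULL;
}

/* Runs nthreads concurrent traversals of private arrays of `bytes` bytes.
   Returns the slowest thread's traversal time in seconds, or -1.0 on error. */
double run_saturation_test(unsigned nthreads, size_t bytes, unsigned passes) {
    assert(bytes > 0 && passes > 0);
    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1.0;

    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    unsigned started = 0;
    for (; started < nthreads; started++) {
        w[started] = (struct worker){ .bytes = bytes, .passes = passes,
                                      .seconds = 0.0, .ok = 0 };
        if (pthread_create(&tid[started], NULL, traverse, &w[started]) != 0)
            break; /* join whatever did start, then report failure */
    }

    double slowest = 0.0;
    int all_ok = (started == nthreads);
    for (unsigned i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        all_ok = all_ok && w[i].ok;
        if (w[i].seconds > slowest)
            slowest = w[i].seconds;
    }
    return all_ok ? slowest : -1.0;
}
```

Calling `run_saturation_test(n, 2u * 1024 * 1024, passes)` for n = 1, 2, 4, 8, ... and comparing the slowest-thread time with the single-thread time would show where per-thread time starts to climb. On current hardware the last-level cache is often larger than 2 MB, so the array size would likely need to grow for the loop to measure main-memory contention rather than cache hits. Compile with `-pthread`.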

Figure 4.4 shows that performance was satisfactory for up to 4 processors but degraded drastically beyond that point. Under continuous, intense load, the bus or memory bank could adequately service at most 4 processors. In practice, this worst-case scenario should rarely occur, since the processor caches reduce the amount of traffic reaching the memory bus.
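A back-of-envelope bound from the Table 4.1 bus bandwidth illustrates why a shared bus caps aggregate throughput. It rests on assumptions not stated in the text: every element is modified, so each cache line of the 2 MB array is read from memory once and written back once per pass; address, coherence, and other bus traffic are ignored; and "MB" is taken to mean the same unit in the array size and the bandwidth. Under those assumptions each pass moves at least 4 MB across the bus:

$$
t_{\text{pass}} \;\ge\; \frac{2\,\text{MB} + 2\,\text{MB}}{177\,\text{MB/s}} \approx 22.6\ \text{ms},
\qquad
t_{\text{total}}(n) \;\ge\; n \cdot 22.6\ \text{ms}
$$

where $t_{\text{total}}(n)$ is the time for $n$ processors sharing the one bus to each complete a pass (about 90 ms at $n = 4$). The bound only shows that combined traffic cannot exceed 177 MB/s; where performance actually degrades also depends on how quickly each processor issues cache misses, which these parameters alone do not determine.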
