
4th International Conference on Principles and Practices ... - MADOC



Figure 1: Throughput degradation behaviors of JBoss and Apache.

Figure 2: Throughput comparison with respect to heap sizes.

The workload of the benchmark is measured by transaction rate, which specifies the number of Dealer and Manufacturing threads. The throughput of the benchmark is measured in JOPS (job operations per second). The SPECjAppServer2004 Design Document [19] includes a complete description of the workload and the application environment in which it is executed.

3.2 Experimental Platforms

To deploy SPECjAppServer2004, we used four machines to construct the three-tier architecture. Because our experiments used both a uniprocessor system and a multiprocessor system, we describe the two configurations as follows.

Uniprocessor application server (System A): The client machine is a dual-processor Apple PowerMac with two 2 GHz PowerPC G5 processors and 2 GB of memory. The application server is a single-processor 1.6 GHz Athlon with 1 GB of memory. The MySQL (footnote 8) database server is a Sun Blade with dual 2 GHz AMD Opteron processors and 2 GB of memory, running Fedora Core 2.

Multiprocessor application server (System B): The client machine is the same as in System A. However, we swapped the application server machine and the database server machine. Thus, the dual-processor Sun Blade is used as the application server, and the single-processor Athlon is used as the database server.

In all experiments, we used Sun's J2SE 1.5.0 on the server side. The young generation area was set to its default size, which is 1/9 of the entire heap and has been shown to minimize the number of expensive mature collections. We ran all experiments in standalone mode with all non-essential daemons and services shut down.
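To make the 1/9 setting concrete, the following sketch works out how a heap would be divided under that fraction. It is an illustration only: the 900 MB heap is an assumed value that does not come from the experiments, and treating the remainder of the heap as the mature space is a simplification. A young generation of 1/9 of the heap is equivalent to a mature-to-young ratio of 8:1; in HotSpot that ratio is what the `-XX:NewRatio` option expresses, although the text does not say whether the option was set explicitly.

```python
from fractions import Fraction

# From the text: the young generation is 1/9 of the entire heap.
YOUNG_FRACTION = Fraction(1, 9)


def split_heap(heap_mb, young_fraction=YOUNG_FRACTION):
    """Return (young_mb, mature_mb) for a heap of heap_mb megabytes.

    Simplifying assumption: everything outside the young generation
    is counted as the mature space.
    """
    young = heap_mb * young_fraction
    mature = heap_mb - young
    return young, mature


def mature_to_young_ratio(young_fraction=YOUNG_FRACTION):
    """Ratio of mature space to young space implied by the fraction."""
    return (1 - young_fraction) / young_fraction


# Assumed example heap size (not taken from the paper).
young_mb, mature_mb = split_heap(900)
print(young_mb, mature_mb, mature_to_young_ratio())
```

Exact fractions are used so that the ratio comes out as a whole number rather than a rounded float.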

The virtual machine is instrumented to generate trace information pertaining to runtime behavior, such as object allocation information, reference assignments, execution thread information, and garbage collection (GC) information. Such trace files are often several gigabytes in size. These trace files are then used as inputs to our analysis tool, which performs lifetime analysis similar to the Merlin algorithm proposed by Hertz et al. [10]. The major difference between our approach and theirs is that ours uses off-line analysis and theirs uses on-line analysis. To obtain micro-architecture information, we utilize model-specific performance monitoring registers.

Footnote 8: MySQL is available from http://www.mysql.com

3.3 Variables and Measures

We utilized several workload configurations to vary the level of stress on the selected applications. In all experiments, we increased the workload from the minimum value available to the maximum value that still allows the application to operate. For example, we began our experiment by setting the workload value of SPECjAppServer2004 to 1. In each subsequent experiment, we increased the workload value until JBoss encountered a failure. The failure point is considered to be the maximum workload that the system (combination of application server, JVM, OS, etc.) can handle.
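The ramp-up procedure described above can be sketched as follows. This is a minimal illustration rather than the harness used in the experiments: `run_benchmark` is a hypothetical callable standing in for one SPECjAppServer2004 run, assumed to return the measured JOPS or `None` when the application server fails, and the step size and upper limit are likewise assumptions.

```python
from typing import Callable, Dict, Optional, Tuple


def ramp_until_failure(
    run_benchmark: Callable[[int], Optional[float]],  # hypothetical: JOPS, or None on failure
    start: int = 1,
    step: int = 1,
    limit: int = 1000,  # assumed safety bound so the loop always terminates
) -> Tuple[Optional[int], Dict[int, float]]:
    """Increase the workload until a run fails.

    Returns the workload at which the first failure occurred (None if the
    limit was reached without a failure) and the JOPS of every successful run.
    """
    throughput: Dict[int, float] = {}
    workload = start
    while workload <= limit:
        jops = run_benchmark(workload)
        if jops is None:
            return workload, throughput
        throughput[workload] = jops
        workload += step
    return None, throughput
```

Returning the per-workload throughput alongside the failure point keeps the data needed to plot throughput against workload, as in the degradation figures.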

As shown in Section 2, the throughput degrades dramatically as the workload increases. This degradation is likely caused by the runtime overhead. To address RO1, we monitor the overall execution time ($T$), which is defined as:

$$T = T_{app} + T_{gc} + T_{jit} + T_{sync}$$

Here, $T_{app}$ is the time spent executing the application itself, and $T_{gc}$ is the time spent on garbage collection. $T_{jit}$ is the time spent on runtime compilation. Many modern virtual machines use Just-In-Time (JIT) compilers to translate byte-code into native instructions when a method is first executed. This time does not include the execution of compiled methods; instead, it is the time spent on the actual method compilation and code cache management. Finally, $T_{sync}$ is the time spent on synchronization. We monitored synchronization operations such as lock/unlock and notify/wait, as well as the number of threads that yield due to lock contention. We chose these time components because they have historically been used to measure the performance of Java Virtual Machines [2].
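The decomposition of the execution time can be illustrated with a small sketch. The class and function names are hypothetical, and the numbers are made up for illustration; they are not measurements from the experiments. The sketch sums the four components into $T$, reports each component's share of $T$, and, as one assumed way of comparing two runs, names the component whose share grows most between a lighter and a heavier workload.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class RunTimes:
    """Time components of one run, in seconds (illustrative values only)."""
    app: float   # time executing the application itself
    gc: float    # time spent on garbage collection
    jit: float   # time spent on runtime compilation
    sync: float  # time spent on synchronization

    def total(self) -> float:
        # T = T_app + T_gc + T_jit + T_sync
        return self.app + self.gc + self.jit + self.sync

    def shares(self) -> Dict[str, float]:
        # Fraction of the total time spent in each component.
        t = self.total()
        return {name: value / t for name, value in asdict(self).items()}


def fastest_growing(low: RunTimes, high: RunTimes) -> str:
    """Name of the component whose share of T rises most from low to high."""
    lo, hi = low.shares(), high.shares()
    return max(lo, key=lambda name: hi[name] - lo[name])


# Made-up numbers for a light and a heavy workload.
light = RunTimes(app=80.0, gc=10.0, jit=6.0, sync=4.0)
heavy = RunTimes(app=50.0, gc=35.0, jit=5.0, sync=10.0)
print(light.total(), heavy.shares(), fastest_growing(light, heavy))
```

Comparing shares rather than absolute times is a design choice of this sketch; it keeps the comparison meaningful when the two runs have different total durations.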

By measuring the execution time of each runtime function, we can identify the function that is most sensitive to the increasing workload. The result of this research objective is used as the focal point in RO2. To address RO2, we further investigated the runtime behaviors of these factors. Once again we varied the workload, but this time we also measured other performance parameters, such as the number of page faults, in addition to the throughput. These parameters give us more insight into the effect of these factors on the throughput. Specifically, we closely examined the governing policies of these runtime factors (causes) to better understand the effects they have on the throughput. To address RO3, we conducted experiments that adjust both the fundamental algorithms and the policies used by the runtime factors and observed their ef-
