
Parallelizing the Construction of Static Single Assignment Form


(e.g. the control flow graph and basic block data structures) could perhaps have been optimized for parallel access. However, since these data structures cross-cut the entire analysis framework, any changes would have far-reaching implications. Instead, we modify the existing codebase as little as possible, only where absolutely necessary to support parallel SSA construction.

This involves replacing some non-thread-safe objects (e.g. HashMap) with thread-safe alternatives (e.g. ConcurrentHashMap). We also had to introduce a limited number of Semaphore objects to guard multi-threaded access to data, e.g. the first and last fields in Soot HashChain objects. The doall loops from the parallelized versions of Algorithms 1 and 3 are implemented as invokeAll() operations on ArrayLists of RecursiveTask objects.
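The doall pattern just described can be sketched as follows. This is not the paper's code: the per-task work here (summing array slices) is a hypothetical stand-in for the real per-method SSA work, but the structure — an ArrayList of RecursiveTask objects executed via ForkJoinTask.invokeAll() from inside a fork-join pool — matches the mechanism the text names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

public class DoAllSketch {
    // One independent "iteration" of the doall loop; here it just sums a slice.
    static class SliceTask extends RecursiveTask<Long> {
        final int[] data; final int from, to;
        SliceTask(int[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }
        @Override protected Long compute() {
            long s = 0;
            for (int i = from; i < to; i++) s += data[i];
            return s;
        }
    }

    // Root task: builds the ArrayList of subtasks and runs them all in parallel.
    static class DoAll extends RecursiveTask<Long> {
        final int[] data;
        DoAll(int[] data) { this.data = data; }
        @Override protected Long compute() {
            List<SliceTask> tasks = new ArrayList<>();
            for (int start = 0; start < data.length; start += 256)
                tasks.add(new SliceTask(data, start, Math.min(start + 256, data.length)));
            long total = 0;
            // invokeAll() forks every task and waits for all of them to complete.
            for (SliceTask t : ForkJoinTask.invokeAll(tasks))
                total += t.join();
            return total;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        Arrays.fill(data, 1);
        long total = new ForkJoinPool().invoke(new DoAll(data));
        System.out.println(total); // 1000
    }
}
```

Because the loop iterations are independent (a doall loop), no ordering constraints exist between the subtasks, so invokeAll() can schedule them freely across worker threads.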

Most of the rewritten code is localized to the soot.shimple.internal package. We intend to contribute a patch back to the Soot maintainers. The whole implementation took around three man-weeks of development effort.

4. EVALUATION

4.1 Benchmarks

We use standard Java benchmark programs from the DaCapo and Java Grande suites to evaluate our parallel SSA construction algorithm.

The DaCapo suite of Java benchmarks [4], version 9.12 (bach), consists of large, widely used, real-world open-source applications: in fact, typical input for a client-side JIT compiler. The Java Grande suite of benchmarks [27] contains five large-scale scientific applications. We use the sequential versions of these programs. We observe that the Java Grande codebase is significantly smaller than DaCapo's; however, it is representative of code from the scientific domain.

We execute each analysed application with a standard workload (default for DaCapo and SizeA for Java Grande) and record the classes that the JVM classloader accesses. We ignore classes that do not belong to the benchmark distribution, e.g. Java standard library classes. Given a list of classes for each application, we then run these classes through the Soot analysis. Table 1 summarizes the applications we select. For each benchmark, we report (i) the number of methods that we will analyse in Soot, (ii) the arithmetic mean of the bytecode instruction lengths of these selected methods, and (iii) the bytecode instruction length of the longest method.

4.2 Platform

Our commodity multi-core evaluation platform is described in Table 2. We use a Java fork-join pool with twice the number of worker threads as hardware threads in the system. This parameter may be tuned; it has some effect on performance, but we cannot explore tuning due to space restrictions.
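The pool configuration described above can be expressed directly with the standard java.util.concurrent API; this is a generic sketch, not the paper's code (Soot and the experiments are not shown):

```java
import java.util.concurrent.ForkJoinPool;

public class PoolConfig {
    public static void main(String[] args) {
        // Number of hardware threads visible to the JVM.
        int hw = Runtime.getRuntime().availableProcessors();
        // Oversubscribe 2x, as in the evaluation setup (16 workers on 8 hardware threads).
        ForkJoinPool pool = new ForkJoinPool(2 * hw);
        System.out.println(pool.getParallelism() == 2 * hw); // prints true
    }
}
```

Oversubscription can help when some workers block (e.g. on the Semaphore guards mentioned in Section 3), since spare workers keep the cores busy.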

4.3 Experiments

We compare two different SSA construction techniques. The sequential version is the default Shimple builder pass supplied in Soot v2.4.0. The parallel version is our custom pass that uses Java fork/join parallelism to implement the parallel SSA construction algorithm outlined in Section 2 earlier in this paper.

Vendor                 Intel
Codename               Nehalem
Architecture           Core i7-920
Cores x SMT contexts   4x2
Per-core L1 i/d        32KB/32KB
Per-core L2            256KB
Shared L3              8MB
Core freq              2.67GHz
RAM size               6GB
OS                     Linux 2.6.31 (64-bit)
JVM (1.6)              14.0-b16
Max JVM heap           4GB
# FJ threads           16

Table 2: Evaluation platform for experiments

For all tests, we measure execution times of compilation phases using the Java library call System.nanoTime(). All times are the arithmetic means of five measurements, which have a low coefficient of variation. We reduce timing variance by using large, fixed-size JVM heaps. We compute speedups as: mean sequential time / mean parallel time.
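The measurement methodology is simple enough to sketch. The harness below is illustrative only — busyWork is a hypothetical stand-in for a compilation phase — but the timing call (System.nanoTime()), the five-run arithmetic mean, and the speedup ratio match the procedure described:

```java
public class SpeedupTiming {
    static volatile long sink; // defeats dead-code elimination of busyWork

    // Hypothetical workload standing in for an SSA construction phase.
    static void busyWork(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        sink = s;
    }

    // Arithmetic mean of five timed runs, in nanoseconds.
    static double meanNanos(Runnable r, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            r.run();
            total += System.nanoTime() - start;
        }
        return total / (double) runs;
    }

    public static void main(String[] args) {
        double seq = meanNanos(() -> busyWork(4_000_000), 5);
        double par = meanNanos(() -> busyWork(1_000_000), 5);
        // Speedup = mean sequential time / mean parallel time.
        System.out.printf("speedup = %.2f%n", seq / par);
    }
}
```

A real harness would also warm up the JIT before timing; the paper's use of means over five runs and large fixed heaps serves the same goal of stable measurements.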

For the benchmark applications, we report the speedup in overall SSA construction time, i.e. the sum of φ-function insertion and variable renaming. This SSA construction time is summed over all methods analysed, for each individual benchmark.

4.3.1 Comparison on Standard Benchmarks

Figure 1 shows speedups for parallel SSA construction on the selected benchmark applications, on the Intel Core i7 platform. The speedup varies with the method size threshold t. For a particular point on a benchmark's speedup curve, we apply the sequential SSA construction algorithm to all methods with size < t, whereas we apply the parallel SSA construction algorithm to all methods with size ≥ t.
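The threshold policy amounts to a one-line dispatch per method. The sketch below uses hypothetical names (chooseBuilder is not a Soot API) purely to make the t-based rule concrete:

```java
public class ThresholdDispatch {
    // Dispatch rule from the text: methods smaller than the size threshold t
    // use the sequential builder; all others use the parallel builder.
    static String chooseBuilder(int methodSize, int t) {
        return (methodSize < t) ? "sequential" : "parallel";
    }

    public static void main(String[] args) {
        int t = 100; // hypothetical threshold, in bytecode instructions
        System.out.println(chooseBuilder(40, t));  // prints sequential
        System.out.println(chooseBuilder(400, t)); // prints parallel
    }
}
```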

If t is set too small, then almost all methods are analysed indiscriminately using the parallel algorithm. For many small methods, the overhead of parallelism outweighs any overlapped execution gains. In many cases, this causes an overall slowdown (speedup scores below 1 in the graph). On the other hand, if t is set too large, then hardly any methods are analysed in parallel, so the SSA construction is almost always sequential. In all cases, the curves tend to speedup = 1 as t → ∞.

Parallel SSA construction is more beneficial for large methods. The benchmarks with the largest mean method sizes in Table 1, namely fop and batik in DaCapo and euler and moldyn in Java Grande, show the best speedups in Figure 1.

We have not investigated alternative heuristics for selecting the individual methods for which the parallel SSA construction algorithm is to be preferred to the sequential algorithm. Since the parallelism depends on (i) the number of variables, and (ii) the dominance properties of the control flow graph, method size seems like a simple proxy measure for overall complexity. Other heuristics might include software metrics such as cyclomatic complexity [20].

4.3.2 Comparison on Inlined Benchmarks

The reason why many benchmarks do not give significant speedups with parallel SSA construction is that most meth-
