
[Figure 2: Speedups for parallel SSA construction on selected benchmarks with method inlining applied. The plot shows speedup (0.2 to 1.6) against method length threshold (1 to 10000, logarithmic scale) for the montecarlo, raytracer and search benchmarks.]

…methods are small, so the overhead of the parallel threads is not mitigated by sufficient work for these threads to perform. Method inlining is one optimization technique that reduces this problem. The code of callee methods is reproduced at the corresponding call-sites in the caller method bodies. This increases the overall size of the code, but it often reduces execution time. Thus it is a valuable optimization for JIT compilers [30] to apply to the most frequently executed regions of code.

We apply static method inlining in the Soot framework to rewrite the Java bytecode class files for some of our benchmarks. The parameters used are: expansion-factor 20, max-container-size 10000, max-inlinee-size 500. Table 3 shows the new statistics for methods in each inlined benchmark, and how these relate to the original benchmark code. A negative % change indicates a reduction relative to the original; a positive % change indicates an expansion relative to the original.
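For readers wishing to reproduce this setup, a programmatic invocation of Soot along roughly the following lines should apply its whole-program static inliner with the parameters above. The phase name (wjop.si), the directory paths, and the exact option syntax are our assumptions and may need adjusting to the Soot version in use.

    public class InlineBenchmarks {
        public static void main(String[] args) {
            soot.Main.main(new String[] {
                "-w",                                   // whole-program mode
                "-p", "wjop.si",                        // static inliner phase (assumed name)
                "enabled:true,expansion-factor:20,"
                    + "max-container-size:10000,max-inlinee-size:500",
                "-process-dir", "benchmarks/classes",   // hypothetical input directory
                "-d", "benchmarks/inlined"              // hypothetical output directory
            });
        }
    }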

Now we compare the sequential versus the parallel implementation of SSA construction on these inlined benchmarks, in the same way as before. Figure 2 shows the speedups at various method thresholds. As before, at method threshold t, all methods with size < t are transformed using the sequential algorithm, whereas all methods with size >= t are transformed using the parallel algorithm. The benchmarks shown exhibit some speedups at method thresholds around 500, whereas in the non-inlined versions (Figure 1) no speedup was possible.
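The threshold-based selection can be pictured with the following minimal Java sketch. The type and method names (MethodBody, buildSsaSequential, buildSsaParallel) are hypothetical placeholders rather than the paper's actual implementation, but the per-method dispatch over a fork/join pool mirrors the scheme described above.

    import java.util.List;
    import java.util.concurrent.ForkJoinPool;

    public class ThresholdDispatch {
        static final int THRESHOLD = 500;           // method length threshold t
        static final ForkJoinPool POOL = new ForkJoinPool();

        // Placeholder for the compiler infrastructure's method representation.
        interface MethodBody { int size(); }

        static void constructSsa(List<MethodBody> methods) {
            for (MethodBody m : methods) {
                if (m.size() < THRESHOLD) {
                    buildSsaSequential(m);           // small method: classical algorithm
                } else {
                    // Large method: hand the work to the fork/join pool so that
                    // its subtasks can run on worker threads.
                    POOL.submit(() -> buildSsaParallel(m)).join();
                }
            }
        }

        static void buildSsaSequential(MethodBody m) { /* sequential construction */ }
        static void buildSsaParallel(MethodBody m)   { /* parallel construction   */ }
    }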

5. RELATED WORK

To the best of our knowledge, no-one has previously reported on a parallelization strategy for the construction of SSA.

However, given the sudden abundance of parallel threads in commodity hardware, there has been significant recent interest in adapting classical data flow analysis frameworks to take advantage of light-weight threading. Knoop [11] argues that reverse data flow analysis, or demand-driven data flow analysis problems (i.e. multiple data flow queries at specific program points), are embarrassingly parallel, and therefore good candidates for deployment on multi-core systems. Edvinsson and Lowe [9] motivate parallel data flow analysis for rapid points-to analysis of an object-oriented program as part of an integrated development environment. They analyse the target methods of polymorphic calls in parallel. They introduce heuristics to avoid spawning new analysis threads if the thread management overhead is large in comparison with the amount of parallel execution. They evaluate a Java implementation of their points-to analysis on a range of open-source Java applications. The average parallel speedup is 1.78 with their best heuristic, on an eight-core IA-32 system.

Rodriguez [26, 25] implements an interprocedural data flow analysis algorithm using light-weight multithreading based on the Actor model in Scala. Note that Scala Actors are implemented using the Java fork/join framework, which we have used in our implementation work. Rodriguez presents an object-oriented type analysis algorithm, and shows that there is abundant potential parallelism when this analysis is applied to three DaCapo Java applications (up to 1000-way parallelism on an ideal machine).

Méndez-Lojo et al. [21] present an Andersen-style points-to analysis based on constraint graphs. The graphs can be manipulated in parallel with simple rewrite rules. Their implementation gives a 3x speedup over the best sequential version on an 8-core machine for substantial C benchmarks.

There is a body of older work on parallelism in data flow analysis. For instance, Lee et al. [18, 17] show how classical problems such as reaching definitions analysis can be parallelized. They define three kinds of parallelism for data flow problems. Our work falls into the third category: algorithmic parallelism. They note that the granularity of the problem decomposition is critical here. Such parallelization is only effective for 'large procedures or interprocedural problems.' Our results confirm that this observation still holds. They give an empirical study, analysing reaching definitions in Fortran benchmark programs on a special-purpose research machine with a hypercube processor interconnect. They show speedups and some scalability for up to eight processors.

6. CONCLUSIONS

In this paper we have shown that the standard sequential algorithm for SSA construction may be parallelized. We have implemented the parallel algorithm in an existing compiler infrastructure using light-weight threading mechanisms for Java. The implementation has been tested on a representative set of Java benchmark programs. For shorter methods, the benefit of parallel execution is outweighed by the overhead of the fork/join thread library. However, the parallel speedups are significant for larger methods, suggesting that a threshold size is required to select which methods should be subject to parallel SSA construction.

We anticipate that commodity hardware with increasingly large numbers of cores and threads will become common over the next few years: the challenge for programmers is to make the best use of these cores.

We have three reasons for believing that the size of SSA-based representations is also set to grow:

1. Given the current trend towards more aggressive optimisation, including loop unrolling and method inlining, compilers will have large method bodies to analyse.
