13.10.2014 Views

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...



16 despecializations were performed. However, the gain appears to be the cumulative result of the combination of several despecializations, making it difficult to isolate exactly which combination of despecializations is of benefit.

In general, the performance gains achieved through despecializations are relatively minor and occur only for a small number of benchmarks. Consequently, it is not generally recommended that despecialization be used as a technique for attempting to improve application performance. Instead, it is simply observed that Java virtual machines are complex systems and that subtle interactions between different parts of that system, including the selection of Java bytecodes and the optimizations performed at various stages, can have unanticipated results.

The final sub-category covers those benchmarks that showed irregular performance with no apparent trend as the number of despecializations performed increased. Of these benchmarks, three showed a single large spike in performance that was subsequently overcome by performing additional despecializations, returning to a runtime similar to the baseline. In at least some cases, this behaviour appears to be the result of performing ‘one half’ of a pair of despecializations without performing the despecialization on the bytecode’s mate. For example, when 48 despecializations were performed, several specialized istore bytecodes had been removed, including istore_1, istore_2 and istore_3. However, the corresponding specialized load bytecodes had not yet been despecialized because they were executed with greater frequency. This may cause the optimizer difficulty when optimizations are performed on a sequence of bytecodes in which values are loaded from and stored to the same local variable positions, because the loads are performed with specialized bytecodes while the stores are performed with general purpose bytecodes. Consequently, the optimizer may fail to recognize that the sequence is a candidate for such optimizations. The final two benchmarks in this group show additional variability rather than a single large spike in performance. It is believed that their performance is also the result of an interaction between the bytecodes selected and the virtual machine’s optimizer.
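The ‘one half of a pair’ situation can be illustrated with a minimal sketch. It assumes a toy representation in which bytecodes are plain mnemonic strings; real class files use numeric opcodes, and the general form carries an extra index operand. The class name `DespecializeSketch` and its methods are hypothetical and are not taken from the tooling used in this study.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DespecializeSketch {
    // Load/store families that have specialized _0 .. _3 forms in the JVM instruction set.
    static final Set<String> FAMILIES = Set.of(
        "iload", "lload", "fload", "dload", "aload",
        "istore", "lstore", "fstore", "dstore", "astore");

    // Rewrites a specialized mnemonic such as "istore_1" into its general
    // form "istore 1". Anything else is returned unchanged.
    static String despecialize(String insn) {
        int us = insn.lastIndexOf('_');
        if (us < 0 || us != insn.length() - 2) {
            return insn; // no single-character suffix after an underscore
        }
        String family = insn.substring(0, us);
        char slot = insn.charAt(us + 1);
        if (FAMILIES.contains(family) && slot >= '0' && slot <= '3') {
            return family + " " + slot;
        }
        return insn;
    }

    // Despecializes only the mnemonics listed in targets (hypothetical helper),
    // mimicking a run in which some bytecodes have been removed and others have not.
    static List<String> despecializeOnly(List<String> code, Set<String> targets) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            out.add(targets.contains(insn) ? despecialize(insn) : insn);
        }
        return out;
    }

    public static void main(String[] args) {
        // Increment local variable 1: load, add one, store back.
        List<String> code = List.of("iload_1", "iconst_1", "iadd", "istore_1");
        // Only the stores have been despecialized so far.
        Set<String> stores = Set.of("istore_1", "istore_2", "istore_3");
        System.out.println(despecializeOnly(code, stores));
        // prints [iload_1, iconst_1, iadd, istore 1]
    }
}
```

In the output, the load of local variable 1 still uses the specialized form while the store uses the general form. This is the mixed sequence that, as suggested above, an optimizer matching on paired load and store bytecodes might fail to recognize.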

5.2.5 Performance Summary

The average performance change across all virtual machines is presented for each benchmark in Figure 5.1. This figure also includes the overall average change in performance across all benchmarks, virtual machines and computer architectures tested. Overall, the average performance change observed when 32 or fewer despecializations
