
OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...



machine code for the idioms. As a result, good performance can be achieved without relying on optimizations performed by later optimization stages.

Before despecialization is performed, the bytecodes around the ifeq bytecode in question match one of the idioms tested for by the virtual machine's optimizer, so high-quality machine code is generated for those specific bytecodes. After despecialization, ifeq has been replaced by an equivalent sequence of general-purpose bytecodes, and no idiom exists to handle that sequence. Although the lack of a matching idiom might initially be expected to reduce performance, the general-purpose sequence leaves additional information available to later optimization stages. Ironically, it appears that removing ifeq, and thereby avoiding a bytecode idiom intended to produce high-quality machine code, actually allows other optimizations to achieve a substantial performance gain.
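To make the mechanism concrete, the Java sketch below models bytecode as a list of mnemonics. It despecializes ifeq into the general-purpose pair iconst_0; if_icmpeq, which also branches when the int on top of the stack is zero. It then checks a peephole-style idiom before and after the rewrite. The idiom used here, lcmp immediately followed by ifeq, is a hypothetical stand-in: this excerpt does not say which idiom IBM's RVM actually tests for. Operands and branch offsets are omitted. A real rewrite would also have to patch branch targets, because the replacement is one byte longer than the three-byte ifeq.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class IfeqDespecialization {

    // Replace each ifeq with iconst_0; if_icmpeq. Both branch when the int on top
    // of the operand stack is zero. Branch offsets are deliberately not modeled.
    static List<String> despecializeIfeq(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (insn.equals("ifeq")) {
                out.add("iconst_0");
                out.add("if_icmpeq");
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    // Hypothetical peephole check: does the idiom occur as a contiguous run of opcodes?
    static boolean matchesIdiom(List<String> code, List<String> idiom) {
        return Collections.indexOfSubList(code, idiom) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical idiom: a long comparison immediately followed by ifeq.
        List<String> idiom = List.of("lcmp", "ifeq");
        List<String> original = List.of("lload_1", "lload_3", "lcmp", "ifeq");
        List<String> despecialized = despecializeIfeq(original);

        System.out.println(despecialized);                      // [lload_1, lload_3, lcmp, iconst_0, if_icmpeq]
        System.out.println(matchesIdiom(original, idiom));      // true
        System.out.println(matchesIdiom(despecialized, idiom)); // false
    }
}
```

The matcher looks only for the exact opcode sequence. As a result, the semantically identical despecialized code is not caught by it, and code generation falls through to the general path that the paragraph describes.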

Performance gains were also observed for a small number of benchmarks executed using Sun virtual machines: three benchmarks executed on the SunBlade 100 and one benchmark executed on the Pentium III. These gains were much smaller than the change observed for JGF LUFact on IBM's RVM. Of the four benchmarks that showed improved performance on a Sun virtual machine, the largest improvement was approximately nine percent relative to the original benchmark runtime. The source code for the Sun virtual machines has not yet been examined, so it is not currently possible to provide a detailed explanation of the performance improvements observed for these benchmarks. However, as was the case for IBM's RVM, it was often possible to isolate the performance increase to a small number of despecializations. For example, the performance gain achieved for the JGF Search benchmark was also present when only the bytecodes lstore_2 and lload_2 were despecialized. Similarly, despecializing only lconst_0 achieved much of the performance gain observed for the JGF Series benchmark, and a small additional gain was observed for this benchmark when both iconst_1 and lconst_0 were despecialized.
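The experiments above despecialize a chosen subset of bytecodes and leave the rest specialized. A minimal Java sketch of that selective rewrite follows, again operating on a list of mnemonics. Each right-hand side is semantically equivalent to its specialized form. However, it is an assumption that these are exactly the replacements the study used. In the sketch, lload_2 and lstore_2 become the indexed forms lload 2 and lstore 2, iconst_1 becomes bipush 1, and lconst_0 becomes an ldc2_w that loads 0L from the constant pool. The index #k is a hypothetical placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelectiveDespecializer {

    // Assumed general-purpose equivalents for the specialized bytecodes named in the text.
    // "#k" is a hypothetical constant-pool index of a CONSTANT_Long entry holding 0L.
    static final Map<String, String> EQUIVALENTS = Map.of(
            "lload_2", "lload 2",    // 1 byte -> 2 bytes
            "lstore_2", "lstore 2",  // 1 byte -> 2 bytes
            "iconst_1", "bipush 1",  // 1 byte -> 2 bytes
            "lconst_0", "ldc2_w #k"  // 1 byte -> 3 bytes
    );

    // Rewrite only the specialized bytecodes listed in `enabled`; all others are kept.
    static List<String> despecialize(List<String> code, Set<String> enabled) {
        List<String> out = new ArrayList<>(code.size());
        for (String insn : code) {
            if (enabled.contains(insn) && EQUIVALENTS.containsKey(insn)) {
                out.add(EQUIVALENTS.get(insn));
            } else {
                out.add(insn);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("lconst_0", "lstore_2", "lload_2", "l2i", "iconst_1", "iadd", "ireturn");

        // Like the JGF Search experiment: despecialize only lstore_2 and lload_2.
        System.out.println(despecialize(code, Set.of("lstore_2", "lload_2")));
        // [lconst_0, lstore 2, lload 2, l2i, iconst_1, iadd, ireturn]

        // Like the JGF Series experiment: despecialize only lconst_0.
        System.out.println(despecialize(code, Set.of("lconst_0")));
        // [ldc2_w #k, lstore_2, lload_2, l2i, iconst_1, iadd, ireturn]
    }
}
```

Each replacement is encoded in more bytes than its one-byte specialized form: two bytes for lload 2, lstore 2 and bipush 1, and three for ldc2_w. A despecialized method therefore grows slightly even though it computes the same result.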

The final benchmark that showed a performance gain on the SunBlade 100 was _201_compress. Unlike the other benchmarks that showed a performance gain, this benchmark's performance improved gradually as the number of despecializations increased, rather than jumping distinctly when specific despecializations were performed. As a result, it was not possible to isolate one or two specific bytecodes responsible for the observed improvement in performance. A similar situation also occurred for the JGF Series benchmark executed using Sun's virtual machine on a Pentium III. In this case, a performance gain was observed after
