13.10.2014 Views

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...


Compared to the complete despecialization performed in Chapter 4, this table omits 17 bytecodes. Five of these bytecodes are bipush, sipush, ldc, goto and jsr. Each of them is despecialized through a widening operation, which increases the number of bytes used to represent an operand accessed by the bytecode. The performance results from Chapter 4 revealed that widening despecializations generally had a more harmful impact on application runtime than most other despecialization categories. As a result, these bytecodes were not despecialized.
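
To make the cost of a widening operation concrete, the following is a minimal sketch rather than the rewriting tool used in this work. It assumes, for illustration only, that bipush is widened to sipush; the opcode values 0x10 and 0x11 come from the Java Virtual Machine Specification, while the class and method names are hypothetical. The instruction grows from two bytes to three, which is the operand growth described above.

```java
public class WideningSketch {
    static final int BIPUSH = 0x10; // JVM opcode: push a sign-extended 1-byte immediate
    static final int SIPUSH = 0x11; // JVM opcode: push a sign-extended 2-byte immediate

    /** Rewrites a 2-byte "bipush n" as the equivalent 3-byte "sipush n" (hypothetical helper). */
    static byte[] widenBipush(byte[] insn) {
        if (insn.length != 2 || (insn[0] & 0xFF) != BIPUSH) {
            throw new IllegalArgumentException("expected a 2-byte bipush instruction");
        }
        int value = insn[1]; // byte-to-int conversion sign-extends the operand
        return new byte[] {
            (byte) SIPUSH,
            (byte) (value >> 8), // high byte: 0x00 or 0xFF after sign extension
            (byte) value         // low byte: the original operand
        };
    }

    public static void main(String[] args) {
        byte[] widened = widenBipush(new byte[] { (byte) BIPUSH, (byte) -3 });
        // Prints the widened instruction as hex bytes.
        System.out.printf("%02x %02x %02x%n",
                widened[0] & 0xFF, widened[1] & 0xFF, widened[2] & 0xFF);
    }
}
```

A complete rewriter would also have to adjust branch offsets and exception table entries, since every widened instruction shifts the code that follows it.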

The second category of despecializations identified in Chapter 4 as having a larger negative impact on application runtime was branch despecialization. These despecializations also increase the total number of bytecodes executed because they replace one specialized bytecode with two general purpose bytecodes. As a result, all of the bytecodes in the branch despecialization category, which comprises the if<cond> bytecodes, if_icmplt and if_icmple, were omitted from this list of despecializations.
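
The one-for-two replacement can be illustrated with a small sketch. The replacement pairs below are assumptions chosen because they preserve the semantics defined in the Java Virtual Machine Specification (comparing against a pushed zero, or swapping the operands and reversing the comparison); the exact pairs used in Chapter 4 are not restated here. Branch targets are omitted from the mnemonics, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BranchDespecializationSketch {
    // Assumed, semantics-preserving replacements (not taken from this chapter).
    static final Map<String, List<String>> REPLACEMENTS = Map.of(
            "ifeq", List.of("iconst_0", "if_icmpeq"),   // value == 0
            "if_icmplt", List.of("swap", "if_icmpgt"),  // a < b  is  b > a
            "if_icmple", List.of("swap", "if_icmpge")); // a <= b is  b >= a

    /** Expands each specialized branch into its two-bytecode general form. */
    static List<String> despecialize(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String op : code) {
            out.addAll(REPLACEMENTS.getOrDefault(op, List.of(op)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> before = List.of("iload", "ifeq", "return");
        List<String> after = despecialize(before);
        System.out.println(before.size() + " bytecodes -> " + after.size() + " bytecodes: " + after);
    }
}
```

Every branch that is rewritten this way adds one bytecode to the executed instruction stream, which is why this category raises the number of bytecodes executed.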

Removing the widening and branch bytecodes from consideration left a total of 54 specialized bytecodes. Four additional bytecodes were omitted to bring the total number of despecializations to 50: aload_0, aload_1, iload_1 and iload_3. They were selected because the profiling work performed in Chapter 5 revealed that these were the four most frequently executed specialized bytecodes, accounting for between 1.8 and 8.5 percent of the bytecodes executed by an application on average.
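
The selection step described above amounts to picking the most frequently executed members of a candidate set from a profile. The sketch below shows one way to express that. The class name, method name and profile format are hypothetical, and no counts from Chapter 5 are reproduced.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class RetainedBytecodesSketch {
    /**
     * Returns the k candidate bytecodes with the highest execution counts.
     * Ties are broken alphabetically so the result is deterministic.
     */
    static List<String> mostFrequent(Map<String, Long> profile, Set<String> candidates, int k) {
        return profile.entrySet().stream()
                .filter(e -> candidates.contains(e.getKey()))
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Called with a profile of execution counts, the set of 54 remaining specialized bytecodes and k = 4, a routine of this shape would return the bytecodes to leave in their specialized form.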

Profiling was conducted to determine which sequences of bytecodes were executed with greatest frequency once the 50 specialized bytecodes listed in Table 8.1 were replaced with their equivalent general purpose forms. Multicodes were identified from this profile data using the same techniques employed previously, comparing both the transfer reduction and total bytecodes replaced scoring strategies for maximum multicode lengths from 5 through 50. The performance of each of the six benchmarks under consideration is shown in Figure 8.8 through Figure 8.13.
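
As a rough illustration of how candidate multicodes might be ranked from a profile, the sketch below counts contiguous bytecode sequences in an executed trace and scores them in two ways. The scoring formulas are assumptions, since the strategies are defined earlier in the thesis and not restated here: total bytecodes replaced is taken to be occurrences times sequence length, and transfer reduction to be occurrences times (length minus one), because fusing n bytecodes removes n - 1 transfers of control between them. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MulticodeScoringSketch {
    /** Counts every contiguous bytecode sequence of length 2..maxLen in the trace. */
    static Map<List<String>, Integer> countSequences(List<String> trace, int maxLen) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (int start = 0; start < trace.size(); start++) {
            for (int len = 2; len <= maxLen && start + len <= trace.size(); len++) {
                List<String> seq = List.copyOf(trace.subList(start, start + len));
                counts.merge(seq, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Assumed score: number of bytecodes covered by all occurrences of the sequence. */
    static long totalBytecodesReplaced(List<String> seq, int count) {
        return (long) seq.size() * count;
    }

    /** Assumed score: number of bytecode-to-bytecode transfers removed by fusing the sequence. */
    static long transferReduction(List<String> seq, int count) {
        return (long) (seq.size() - 1) * count;
    }
}
```

This sketch counts overlapping occurrences and ignores basic block boundaries, both of which a real multicode selection procedure would need to handle before committing to a substitution.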

The performance results achieved by performing multicode substitution in the presence of 50 despecializations are highly similar to the results achieved by performing multicode substitution alone. The maximum improvement in performance observed was 1.45 percent of the original runtime for _222_mpegaudio, while the maximum performance loss observed was 1.0 percent of the original runtime for _209_db. The overall average change in performance across all six benchmarks was an improvement of 0.2 percent of the original benchmark run times.

Analyzing the total number of transfers removed for each benchmark revealed that
