13.10.2014 Views

OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...




However, while considering longer sequences may allow a larger number of bytecodes to be replaced, or more transfers to be removed, there is also a potential cost associated with selecting many long multicodes. Since a new codelet must be added to the interpreter’s execution loop for each multicode selected, it is possible that choosing many long multicodes may increase the size of the interpreter’s main loop beyond the size of the instruction cache. Should this occur, it is likely that the performance loss due to decreased cache utilization will outweigh the performance gain achieved by multicode substitution.
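The effect of a single substitution on the number of transfers can be illustrated with a small sketch. It is not the interpreter described in this work: the opcode stream, the multicode name `mc_0`, and the one-transfer-per-opcode cost model are all assumptions made for illustration, and operands are omitted.

```python
def substitute(stream, sequence, multicode):
    """Replace each non-overlapping occurrence of `sequence` in `stream`
    with the single opcode `multicode`, scanning left to right."""
    out, i, n = [], 0, len(sequence)
    while i < len(stream):
        if stream[i:i + n] == sequence:
            out.append(multicode)
            i += n
        else:
            out.append(stream[i])
            i += 1
    return out


def dispatch_count(stream):
    """Assumption: the interpreter performs one transfer per opcode executed."""
    return len(stream)


# Hypothetical straight-line opcode stream (operands omitted for brevity).
original = ["iload_1", "iload_2", "iadd", "istore_3",
            "iload_1", "iload_2", "iadd", "istore_3",
            "return"]
sequence = ["iload_1", "iload_2", "iadd", "istore_3"]

rewritten = substitute(original, sequence, "mc_0")  # "mc_0" is a made-up name
before = dispatch_count(original)
after = dispatch_count(rewritten)
print(rewritten, before, after)
```

Under these assumptions the nine transfers of the original stream fall to three. The cost side of the trade-off is not modelled here: each distinct multicode adds one codelet to the interpreter loop, and a codelet for a longer sequence is presumably larger.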

7.6.1 Individual Benchmark Performance

Six benchmarks from the SPEC JVM98 benchmark suite were tested. These were described previously in Section 4.2.2 as part of the discussion presented on despecialization. The performance results achieved by performing multicode identification and substitution for each benchmark are shown in Figure 7.10 through Figure 7.21.

Each of the following pages contains two graphs. The upper graph shows the performance achieved for each combination of scoring system and maximum multicode length considered. Fifty multicode substitutions were performed for each point shown on the graph. For example, Figure 7.10 shows that, for the _201_compress benchmark, application run time was reduced by approximately 18 percent when 50 multicodes were identified using Total Bytecodes Replaced Scoring to identify multicodes of up to 5 bytecodes in length. Similarly, Figure 7.10 shows that the best technique for identifying multicodes for the _201_compress benchmark was to consider sequences up to 35 bytecodes in length using Transfer Reduction Scoring. Using this combination of scoring system and multicode length reduced application run time by over 24 percent.
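The interaction between scoring system and maximum multicode length can be sketched as follows. This section does not restate how the two scoring systems are defined, so the formulas below are assumptions inferred from their names: Total Bytecodes Replaced is taken to be sequence length times occurrence count, and Transfer Reduction to be (length - 1) times occurrence count. The trace, the function names, and the counting of overlapping occurrences are also illustrative choices rather than the procedure used in this work.

```python
from collections import Counter


def count_sequences(trace, max_len):
    """Count every contiguous opcode sequence of length 2..max_len in `trace`.
    Overlapping occurrences are counted (a simplifying assumption)."""
    counts = Counter()
    for length in range(2, max_len + 1):
        for start in range(len(trace) - length + 1):
            counts[tuple(trace[start:start + length])] += 1
    return counts


def total_bytecodes_replaced(seq, occurrences):
    # Assumed definition: bytecodes removed from the stream.
    return len(seq) * occurrences


def transfer_reduction(seq, occurrences):
    # Assumed definition: a sequence of n bytecodes needs n transfers,
    # its multicode needs one, so n - 1 are saved per occurrence.
    return (len(seq) - 1) * occurrences


def best_candidate(trace, max_len, score):
    """Return the (sequence, occurrences) pair with the highest score."""
    counts = count_sequences(trace, max_len)
    return max(counts.items(), key=lambda item: score(item[0], item[1]))


# Hypothetical profiled trace of executed opcodes.
trace = ["a", "b", "c", "a", "b", "c", "a", "b"]
counts = count_sequences(trace, 3)
best_short = best_candidate(trace, 2, transfer_reduction)
print(best_short)
```

Raising `max_len` enlarges the candidate set, which is why the best-performing combination can differ from one benchmark to the next.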

The lower graph on each page shows the performance achieved using the best technique identified in the upper graph. This graph shows how performing additional multicode substitutions results in further performance gains by examining the application’s performance when 10, 20, 30, 40 and 50 multicode substitutions are performed. The specific list of bytecode sequences replaced with multicodes can be found in Table A.1 through Table A.6, located in Appendix 11.

Table 7.4 presents a summary of the performance results achieved for each of the benchmarks. It lists the multicode selection technique and maximum multicode length that offered the best performance. Each performance value is expressed as a percentage of the original, unmodified application run time. The table also shows the average change in performance achieved across the six benchmarks tested. The
