
OPTIMIZING THE JAVA VIRTUAL MACHINE INSTRUCTION SET BY ...


$\Delta_{\overrightarrow{x_1/x_2/x_3/\ldots/x_n}}$ and $\Delta_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$ and the cost of the transfers that are removed. Unfortunately, each of these values is also difficult to determine.

Attempts were made to estimate the difference in cost between $\Delta_{\overrightarrow{x_1/x_2/x_3/\ldots/x_n}}$ and $\Delta_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$ by examining the assembly language implementations of each. The number of Intel assembly language instructions was counted and the difference was used as an estimate of the change in performance. Unfortunately, this technique did not give good results. Because the Pentium III is a complex instruction set computer (CISC), the time required to execute its instructions varies from one instruction to another. Furthermore, simply counting the number of assembly language lines failed to consider many other interactions within the assembly language, such as the cost of branches within the codelet and pipeline stalls associated with adjacent accesses to the same data. As a result, this technique was abandoned in favour of a strategy that takes these factors into consideration.

While it is difficult to determine analytically the change in performance associated with a multicode substitution, that change can be measured relatively easily. A baseline execution time is established by running an application in its original, unmodified form. Once the multicode substitution has been performed, the application is executed a second time. The difference in execution time is $\varepsilon_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$. Since the number of times the multicode sequence is executed is known, the value of $\Delta_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$ can also be determined by dividing $\varepsilon_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$ by that execution count.
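The arithmetic in the measurement strategy can be sketched in a few lines of Java. This is a minimal illustration only: the class and method names (`MulticodeDelta`, `epsilon`, `delta`) and the timings in `main` are hypothetical, and the sign convention (modified time minus baseline time, so a negative value means the multicode made the program faster) is an assumption, not something the text specifies.

```java
/**
 * Hypothetical sketch: derive the per-execution change in cost of a
 * multicode substitution from two measured run times.
 * Assumed sign convention: a negative result means the substitution saved time.
 */
public final class MulticodeDelta {

    /** Total change in execution time (epsilon): modified run minus baseline run. */
    static double epsilon(double baselineSeconds, double modifiedSeconds) {
        return modifiedSeconds - baselineSeconds;
    }

    /** Per-execution change (Delta): epsilon divided by how often the sequence ran. */
    static double delta(double baselineSeconds, double modifiedSeconds, long executions) {
        if (executions <= 0) {
            throw new IllegalArgumentException("executions must be positive");
        }
        return epsilon(baselineSeconds, modifiedSeconds) / executions;
    }

    public static void main(String[] args) {
        // Made-up numbers purely for illustration, not measured results.
        double baseline = 12.0;     // seconds, unmodified application
        double modified = 11.5;     // seconds, after multicode substitution
        long count = 1_000_000L;    // times the multicode sequence executed
        System.out.println("epsilon = " + epsilon(baseline, modified));
        System.out.println("Delta   = " + delta(baseline, modified, count));
    }
}
```

Dividing by the execution count is what turns one whole-program measurement into a per-occurrence cost, which is the quantity needed to compare candidate sequences that execute different numbers of times.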

Initial attempts used micro-benchmarking to determine $\Delta_{\overrightarrow{x_1 x_2 x_3 \ldots x_n}}$ for each bytecode sequence being considered. This involved constructing a new Java class file that contained a main method and a timing method. The main method invoked a static method which performed the timing. It was not possible to perform the timing directly in the main method because the Java Language Specification requires the main method to take a single object reference parameter (a String array). Because main is static, that reference occupies local variable 0, making it impossible to test bytecode sequences that made use of any of the load_0 or store_0 bytecodes other than aload_0 or astore_0. The algorithm followed by the timing method is shown in Figure 7.23.
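The reason for moving the timing out of main can be illustrated with a small Java-source harness. This is a hypothetical sketch, not the algorithm of Figure 7.23: the class `SequenceTimer`, its `time` method, and the loop body standing in for the sequence under test are all assumptions, and the thesis built class files directly rather than compiling Java source. What it shows is that a static method whose first parameter is an `int` receives that value in local variable 0, so javac compiles reads of `seed` with `iload_0`, which main can never contain because its slot 0 holds the `String[]` reference.

```java
/**
 * Hypothetical micro-benchmark harness (not the thesis's Figure 7.23).
 * main() only delegates to a static timing method whose int parameter
 * occupies local variable slot 0.
 */
public final class SequenceTimer {

    // Result sink so the measured loop is not discarded as dead code.
    static int sink;

    /** 'seed' is in local slot 0 of this static method, so reading it compiles to iload_0. */
    static long time(int seed, int iterations) {
        long start = System.nanoTime();
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += seed; // stand-in for the bytecode sequence being measured
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        // Here slot 0 holds the String[] reference, so an iload_0 in main
        // would be rejected by the bytecode verifier.
        long nanos = time(7, 10_000_000);
        System.out.println("elapsed ns: " + nanos + " (sink=" + sink + ")");
    }
}
```

Running `javap -c SequenceTimer` on the compiled class should show `iload_0` inside `time`, which is the kind of slot-0 access the separate timing method makes testable.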

Building the timing method was a non-trivial undertaking. There was no guarantee that the bytecode sequence being tested would be stack neutral. This meant that it could require certain data values to be present on the stack before it executed and that it might leave some data values on the stack after it completed. Determining what these values were was complicated by the fact that some Java bytecodes work with stack values of variable type. Examples of such bytecodes are getstatic, putstatic, getfield and putfield, whose value type depends on the descriptor of the field they reference. In addition, other bytecodes such
