30 6 A BACKEND FOR PYPY’S JIT

structions the entry point is set up and we can proceed to load the input arguments and compile the operations in the trace.

After generating the function interface we generate instructions to load the input arguments of the loop to locations controlled by the register allocator to set up the state for the loop. Input values for a loop are passed to the backend through pre-allocated arrays. There is one such array for each different argument type. These types can be floats, integers, long longs and pointers, although at the current time the backend only supports integers and pointers. At runtime we allocate a register for each box passed to the loop using these lists of input arguments. For each box we then generate the instructions to load the value stored in the corresponding list in memory to an allocated register.

6.4 Compiling Trace Operations

Once the frame is set up and the instructions to load the input arguments into registers have been generated, the next step is to generate instructions for each of the operations in the trace. The code generation for operations is rather straightforward and is divided into two steps. For each operation the first step is allocating the registers for the operation’s arguments and, if present, for the result. The second step is the instruction selection and generation step. This step actually emits the instructions that implement the operations using the allocated registers.
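The two steps can be sketched with a toy backend that looks up a prepare function and an emit function for each operation by name. This is only an illustration under stated assumptions: `ToyRegAlloc`, `ToyBackend`, `reg_for` and `compile_trace` are hypothetical names, registers are handed out without any spilling, and instructions are collected as strings rather than encoded as machine code.

```python
# Toy model of the two-step scheme: (1) allocate registers, (2) emit instructions.
# All class and method names are illustrative assumptions, not PyPy's actual API.

class ToyRegAlloc:
    """Hands out registers r0, r1, ... and remembers which variable lives where."""

    def __init__(self):
        self.bindings = {}

    def reg_for(self, var):
        # Reuse the register if the variable already has one, else take the next.
        if var not in self.bindings:
            self.bindings[var] = "r%d" % len(self.bindings)
        return self.bindings[var]


class ToyBackend:
    def __init__(self):
        self.regalloc = ToyRegAlloc()
        self.code = []  # emitted instructions, kept as strings

    # Step 1: allocate registers for the arguments and for the result.
    def prepare_op_int_add(self, args, result):
        arglocs = [self.regalloc.reg_for(a) for a in args]
        arglocs.append(self.regalloc.reg_for(result))
        return arglocs

    # Step 2: emit the instruction using the allocated registers.
    def emit_op_int_add(self, arglocs):
        l0, l1, res = arglocs
        self.code.append("ADD %s, %s, %s" % (res, l0, l1))

    def compile_trace(self, trace):
        # Dispatch on the operation name to find the matching pair of functions.
        for name, args, result in trace:
            prepare = getattr(self, "prepare_op_" + name)
            emit = getattr(self, "emit_op_" + name)
            emit(prepare(args, result))
        return self.code


trace = [("int_add", ["i0", "i1"], "i2"), ("int_add", ["i2", "i0"], "i3")]
code = ToyBackend().compile_trace(trace)
print(code)
```

Keeping allocation and emission in separate functions means the emit step only ever sees locations, never trace variables, which is the separation the text describes.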
A goal we try to achieve during instruction selection is to emit the instructions that provide the best execution speed for the kind of arguments an operation has, e.g. selecting the appropriate instruction to add a register and a small constant.

We are going to look at how these operations are implemented by taking a look at the different groups of trace operations that are handled by the backend. The operations can be categorized as follows:

• arithmetic operations
• comparison operations
• memory allocation
• memory access
• calls
• forcing of frames
• guards
• jumps

6.4.1 Arithmetic Operations

The JIT implements unary and binary arithmetic operations for signed and unsigned integers and long longs as well as for floating point numbers.
According to the scheme described above, for arithmetic operations we first allocate registers for the operands and for the result. How this is done for the int_sub operation is shown in the function prepare_op_int_sub in Figure 18. This function allocates the registers for the variables involved in the operation and makes sure the corresponding values are stored in the registers. Once we have allocated registers for the current operation we emit the actual instructions, done by the function emit_op_int_sub also shown in Figure 18, taking care to select the most suitable instructions to implement the operation depending on the input variables. For this operation we check if the second operand fits in an immediate value and, if so, emit an instruction which takes one register and one immediate operand. If the first argument is an immediate value we make use of ARM’s reverse subtract instruction (RSB), which subtracts the value stored in the register passed as its first operand from its second operand. Finally, if both values are stored in registers we emit an instruction which takes both arguments in registers.
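What "fits in an immediate value" means on ARM can be made concrete. In the classic ARM encoding, a data-processing immediate is an 8-bit constant rotated right by an even number of bits. The sketch below checks that rule; the function name `fits_arm_immediate` is hypothetical, and the text does not say whether the backend applies this general rule or a simpler range check.

```python
def fits_arm_immediate(value):
    """Return True if value is an 8-bit constant rotated right by an even amount.

    This models the immediate operand encoding of ARM data-processing
    instructions; it is an illustration, not the backend's actual check.
    """
    value &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit word
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


for candidate in (255, 256, 257, 0xFF000000, 0x102, 0x104):
    print(hex(candidate), fits_arm_immediate(candidate))
```

Values such as 257 (0x101) do not fit because their set bits span more than eight positions, so an operand like this would have to be loaded into a register first.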
This procedure is similar for all arithmetic operations, except for those not provided by the underlying platform.

    def prepare_op_int_sub(self, op, fcond):
        # Step 1: allocate registers for both operands and for the result.
        a0, a1 = op.getarglist()
        l0 = self.make_sure_var_in_reg(a0)
        l1 = self.make_sure_var_in_reg(a1)
        res = self.force_allocate_reg(op.result)
        return [l0, l1, res]

    def emit_op_int_sub(self, op, arglocs, regalloc, fcond):
        # Step 2: select the instruction depending on where the operands live.
        l0, l1, res = arglocs
        if l0.is_imm():
            # First operand is an immediate: reverse subtract, res = l0 - l1.
            self.mc.RSB_ri(res.value, l1.value, l0.value)
        elif l1.is_imm():
            # Second operand is an immediate: register minus immediate.
            self.mc.SUB_ri(res.value, l0.value, l1.value)
        else:
            # Both operands are in registers.
            self.mc.SUB_rr(res.value, l0.value, l1.value)

Figure 18: Implementation of the int_sub operation

For operations such as division and modulo, which are not supported by the ARMv7-A profile of the ARM instruction set, we provide pre-compiled wrapper functions that implement the behaviour expected by the JIT operations. The ARMv7-R profile supports these operations and would not need these helper functions. The functions are compiled when the backend is translated and rely on the implementations of these operations provided by the compiler vendors, which usually comply with ARM’s EABI. The EABI [Smi09] provides a binary interface which, among other things, defines interfaces for arithmetic functions to be followed by compiler vendors. As defined in the EABI, the compiler vendors can reuse the implementation of functions defined in the EABI from other libraries at link time or they can generate code for the used functions from the EABI.
When generating EABI functions the compiler can emit code that uses the features provided by the selected version of the EABI and the hardware platform, exploiting the presence of vector and/or floating point units or falling back to software based implementations, to make the best use of the platform while still adhering to a standardized interface which is reusable across toolchains.
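As a rough illustration of how an operation without a native instruction might be lowered to a call to such a helper, the sketch below emits a call sequence instead of a single instruction. It is a toy model under stated assumptions: `ToyCodeBuilder`, `emit_helper_call`, `HELPER_ADDRESSES` and the addresses in it are hypothetical. Only the convention of passing the first two integer arguments in r0 and r1 and returning the result in r0 is taken from the ARM procedure call standard. A real backend would also have to preserve caller-saved registers around the call.

```python
# Hypothetical addresses of the pre-compiled helper functions; a real backend
# would obtain them when the backend is translated.
HELPER_ADDRESSES = {"int_floordiv": 0x1000, "int_mod": 0x2000}


class ToyCodeBuilder:
    """Collects instructions as strings instead of encoding machine code."""

    def __init__(self):
        self.instructions = []

    def emit(self, text):
        self.instructions.append(text)


def emit_helper_call(mc, opname, l0, l1, res):
    """Emit a call to the helper for opname; l0, l1 and res are register names."""
    # Move the arguments through the stack so that the sequence stays correct
    # even if l0 or l1 is itself r0 or r1.
    mc.emit("PUSH {%s}" % l1)
    mc.emit("PUSH {%s}" % l0)
    mc.emit("POP {r0}")  # first argument
    mc.emit("POP {r1}")  # second argument
    mc.emit("BL 0x%x" % HELPER_ADDRESSES[opname])
    # The helper returns its result in r0; copy it to the result register.
    if res != "r0":
        mc.emit("MOV %s, r0" % res)


mc = ToyCodeBuilder()
emit_helper_call(mc, "int_mod", "r4", "r5", "r6")
print("\n".join(mc.instructions))
```

Routing the arguments through the stack costs extra memory accesses; it is used here only because it keeps the sketch correct for any assignment of input registers.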