An ARM Backend for PyPy’s Tracing JIT - STUPS Group

6 A Backend for PyPy’s JIT

... instructions the entry point is set up and we can proceed to load the input arguments and compile the operations in the trace.

After generating the function interface we generate instructions to load the input arguments of the loop into locations controlled by the register allocator, which sets up the state for the loop. Input values for a loop are passed to the backend through pre-allocated arrays, one for each argument type. These types can be floats, integers, long longs and pointers, although at the current time the backend only supports integers and pointers. At runtime we allocate a register for each box passed to the loop using these lists of input arguments. For each box we then generate the instructions to load the value stored in the corresponding list in memory into the allocated register.

6.4 Compiling Trace Operations

Once the frame is set up and the instructions to load the input arguments into registers have been generated, the next step is to generate instructions for each of the operations in the trace. The code generation for operations is rather straightforward and is divided into two steps. For each operation the first step is allocating the registers for the operation’s arguments and, if present, for the result. The second step is instruction selection and generation. This step actually emits the instructions that implement the operation using the allocated registers. During instruction selection we try to emit the instructions that provide the best execution speed for the kind of arguments an operation has, e.g. selecting the correct instruction to add a register and a small constant.

We are going to look at how these operations are implemented by taking a look at the different groups of trace operations that are handled by the backend.
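
The two-step scheme described above, register allocation followed by instruction selection, can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `ToyBackend`, `Reg`, `Imm`, `fits_arm_imm` and the recorded instruction names are hypothetical and are not part of the PyPy backend. The immediate check assumes the A32 data-processing rule (an 8-bit value rotated right by an even amount), and the model never spills registers and only accepts a constant as second argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    value: int  # register number

    def is_imm(self):
        return False


@dataclass(frozen=True)
class Imm:
    value: int  # constant encoded directly in the instruction

    def is_imm(self):
        return True


def fits_arm_imm(n):
    """True if n is an 8-bit value rotated right by an even amount."""
    n &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((n << rot) | (n >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


class ToyBackend:
    """Hypothetical model: records instruction tuples instead of machine code."""

    def __init__(self):
        self.emitted = []
        self.bindings = {}  # variable name (or constant key) -> Reg

    def _reg_for(self, name):
        # Naive allocator: hand out the next unused register number.
        if name not in self.bindings:
            self.bindings[name] = Reg(len(self.bindings))
        return self.bindings[name]

    def _load_const(self, n):
        # Constants that do not fit an immediate are loaded into a register.
        reg = self._reg_for(("const", n))
        self.emitted.append(("MOV_const", reg.value, n))
        return reg

    def prepare_int_add(self, a0, a1, result):
        # Step 1: decide where each argument and the result live.
        l0 = self._reg_for(a0)
        if isinstance(a1, int):
            l1 = Imm(a1) if fits_arm_imm(a1) else self._load_const(a1)
        else:
            l1 = self._reg_for(a1)
        res = self._reg_for(result)
        return [l0, l1, res]

    def emit_int_add(self, arglocs):
        # Step 2: select the instruction form matching the locations.
        l0, l1, res = arglocs
        name = "ADD_ri" if l1.is_imm() else "ADD_rr"
        self.emitted.append((name, res.value, l0.value, l1.value))


backend = ToyBackend()
backend.emit_int_add(backend.prepare_int_add("i0", 4, "i1"))     # register + small constant
backend.emit_int_add(backend.prepare_int_add("i1", "i0", "i2"))  # register + register
print(backend.emitted)
```

Keeping the location decision separate from instruction selection means the emit step only has to inspect the kind of each location, which is the same split the backend uses for every trace operation.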
The operations can be categorized as follows:
• arithmetic operations
• comparison operations
• memory allocation
• memory access
• calls
• forcing of frames
• guards
• jumps

6.4.1 Arithmetic Operations

The JIT implements unary and binary arithmetic operations for signed and unsigned integers and long longs as well as for floating point numbers.
According to the scheme described above, for arithmetic operations we first allocate registers for the operands and for the result. How this is done for the int_sub operation is shown in the function prepare_op_int_sub in Figure 18. This function allocates the registers for the variables involved in the operation and makes sure the corresponding values are stored in the registers. Once we have allocated registers for the current operation we emit the actual instructions. This is done by the function emit_op_int_sub, also shown in Figure 18, which takes care to select the most suitable instructions to implement the operation depending on the input variables. For this operation we check if the second operand fits in an immediate value and, if so, emit an instruction which takes one register and one immediate operand. If the first argument is an immediate value we make use of ARM’s reverse subtract instruction, which subtracts the value stored in the register passed as its first operand from its second operand. Finally, if both values are stored in registers we emit an instruction which takes both arguments in registers. This procedure is similar for all arithmetic operations, except for those not provided by the underlying platform.

    def prepare_op_int_sub(self, op, fcond):
        # Step 1: get locations for both arguments and for the result.
        a0, a1 = op.getarglist()
        l0 = self.make_sure_var_in_reg(a0)
        l1 = self.make_sure_var_in_reg(a1)
        res = self.force_allocate_reg(op.result)
        return [l0, l1, res]

    def emit_op_int_sub(self, op, arglocs, regalloc, fcond):
        # Step 2: select the instruction matching the argument locations.
        l0, l1, res = arglocs
        if l0.is_imm():
            # First argument is an immediate: reverse subtract, res = l0 - l1.
            self.mc.RSB_ri(res.value, l1.value, l0.value)
        elif l1.is_imm():
            # Second argument is an immediate: res = l0 - l1.
            self.mc.SUB_ri(res.value, l0.value, l1.value)
        else:
            # Both arguments are in registers.
            self.mc.SUB_rr(res.value, l0.value, l1.value)

Figure 18: Implementation of the int_sub operation

For operations such as division and modulo, which are not supported by the ARMv7-A profile of the ARM instruction set, we provide pre-compiled wrapper functions that implement the behaviour expected by the JIT operations.
The ARMv7-R profile supports these operations and would not need these helper functions. The functions are compiled when the backend is translated and rely on the implementations of these operations provided by the compiler vendors, which usually comply with ARM’s EABI. The EABI [Smi09] provides a binary interface which, among other things, defines interfaces for arithmetic functions to be followed by compiler vendors. As defined in the EABI, compiler vendors can either reuse implementations of these functions from other libraries at link time or generate code for the functions that are used. When generating EABI functions the compiler can emit code that uses the features provided by the selected version of the EABI and the hardware platform, exploiting the presence of vector and/or floating point units or falling back to software-based implementations. This makes the best use of the platform while still adhering to a standardized interface which is reusable across toolchains.
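
To make the role of the division and modulo helpers more concrete, the following sketch models in Python what such a wrapper might compute. It assumes the helpers follow C-style semantics (the quotient truncates toward zero and the remainder takes the sign of the dividend), as the EABI integer division routines such as `__aeabi_idiv` do. The names `c_style_div`, `c_style_mod` and the `HELPERS` table are hypothetical and are not taken from the backend, and 32-bit overflow is deliberately left out.

```python
def c_style_div(a, b):
    """Integer division truncating toward zero (assumed helper semantics)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(a) // abs(b)
    # The quotient is negative exactly when the operand signs differ.
    return -q if (a < 0) != (b < 0) else q


def c_style_mod(a, b):
    """Remainder matching c_style_div; takes the sign of the dividend."""
    return a - c_style_div(a, b) * b


# Hypothetical lookup table: instead of emitting a single instruction,
# a backend would emit a call to the helper registered for the operation.
HELPERS = {
    "int_floordiv": c_style_div,
    "int_mod": c_style_mod,
}

print(HELPERS["int_floordiv"](-7, 2), HELPERS["int_mod"](-7, 2))
print(-7 // 2, -7 % 2)  # Python's own operators round toward negative infinity
```

The second line of output differs from the first because Python's `//` and `%` floor the quotient, which is why a model of truncating division cannot simply reuse them.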
