An ARM Backend for PyPy's Tracing JIT - STUPS Group

6 A Backend for PyPy's JIT

to be spilled is based on the previously calculated longevity of the variables, spilling the variable that survives for the longest time. This selection scheme should avoid blocking registers by linking them to very long-lived variables. A possible drawback, however, is that a variable that is both long-lived and often used may need to be read frequently from memory if spilling occurs between its usages. Once a variable is selected for spilling, we generate an instruction to move it to the spilling area on the stack and record in the register allocator where on the stack it is stored. After moving the variable away we can free the previously bound register and associate it with a new variable.

6.3 Setup to Execute a Compiled Loop

Apart from compiling a trace, we need a way to actually execute the compiled instructions. This means that we need a mechanism to call into the compiled code once the frontend tries to execute the same loop again. Similar to the approach described in [GFE+09], we generate code that follows the platform calling conventions (see Section 2.2.3 for how to create a procedure interface [Ear09]) so it can be cast to a function pointer at runtime and called as a normal function/procedure.

The compilation process for a loop is started when the frontend passes an optimized trace to the backend. Before the instructions contained in the trace can be compiled, the backend needs to take some preparatory steps to provide an executable interface to the compiled trace. This consists of creating a callable interface and generating the instructions to set up the frame and load the arguments for the loop.

6.3.1 Frame Layout

Before we can execute instructions we need to set up the frame of the procedure.
For this we generate instructions that create a frame as described below.

The frame layout is composed of four parts:

• callee-saved registers, according to the calling convention
• a slot to store the force index, a value used to check if the interpreter-level frame was forced (see Section 6.4.6)
• an area to store spilled variables
• the stack where push and pop instructions operate

The frame begins with the registers the function has to save according to the calling convention described in Section 2.2.3, which on ARM are registers r4 through r11 (r11 is also called FP). After these registers, one word on the stack is left to store the force index (see Section 6.4.6). After the location for the force index, the frame contains space for spilled register values. The address of the beginning of this area is stored in the FP register, and spilled values are addressed by their offset from the FP. After this area the classic stack area begins
where intermediate results are stored and registers are pushed and popped around subprocedure calls. The stack pointer (SP) always points to the last value pushed on the stack, so at the beginning of the execution it points to the end of the spilling area and is modified automatically every time a value is pushed on the stack or popped from it. Deviating from the calling convention, we use the frame pointer to mark the location where the spilled registers are stored in the frame. The exact size of the spilling area is determined by the number of registers the register allocator needs to spill for a particular loop and is only known after compiling the instructions for the loop, so the exact position of the SP is patched after compilation, once the size of the spilling area is known. Figure 17 shows the layout of the frame at the beginning of the execution of a compiled loop.

[Figure 17: Frame layout used by the ARM backend. In order: previous frame, r4, r5, r6, r7, r8, r9, r10, FP, force index, spilling area. The frame pointer marks the start of the spilling area and the stack pointer marks its end.]

6.3.2 Function Interface

The interface mentioned above to call the compiled code is generated by creating an interface that follows the rules imposed by the AAPCS and then performs a set of operations that set up the frame and state to execute the loop.

The function interface generated for ARM starts by pushing the callee-saved registers onto the stack. As a next step we generate instructions to move the stack pointer by one word to leave room for the force index. Then we generate four no-ops, which we later patch with the instructions that set up the size of the spilling area. With these in-
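
The placeholder-and-patch scheme for the function interface can be illustrated with a minimal Python sketch. It is not the backend's actual code: instructions are modelled as symbolic tuples rather than encoded ARM machine code, and the names gen_prologue, patch_spill_area and frame_size are hypothetical. The text does not say which instructions replace the four no-ops, so the "mov fp, sp" and "sub sp, sp, #size" pair used here is an assumption chosen only to match the described roles of FP and SP. It fills two of the four slots and leaves the rest as no-ops.

```python
WORD = 4  # bytes per word on 32-bit ARM
CALLEE_SAVED = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "fp"]
N_PLACEHOLDERS = 4  # the text reserves four no-ops for later patching


def gen_prologue():
    """Return a symbolic prologue and the index of the first placeholder."""
    code = [
        ("push", tuple(CALLEE_SAVED)),  # save the callee-saved registers
        ("sub", "sp", "sp", WORD),      # leave one word for the force index
    ]
    patch_pos = len(code)
    code.extend([("nop",)] * N_PLACEHOLDERS)  # patched after compilation
    return code, patch_pos


def patch_spill_area(code, patch_pos, n_spilled):
    """Overwrite the placeholders once the number of spilled values is known.

    The two instructions below are an assumption, not taken from the text.
    """
    setup = [
        ("mov", "fp", "sp"),                    # FP marks the start of the spilling area
        ("sub", "sp", "sp", n_spilled * WORD),  # SP moves to the end of the spilling area
    ]
    assert len(setup) <= N_PLACEHOLDERS
    setup += [("nop",)] * (N_PLACEHOLDERS - len(setup))  # keep code size unchanged
    code[patch_pos:patch_pos + N_PLACEHOLDERS] = setup
    return code


def frame_size(n_spilled):
    """Bytes occupied by saved registers, force index and spilling area."""
    return len(CALLEE_SAVED) * WORD + WORD + n_spilled * WORD


if __name__ == "__main__":
    code, pos = gen_prologue()
    # ... the loop body would be compiled here, determining n_spilled ...
    for instr in patch_spill_area(code, pos, n_spilled=3):
        print(instr)
    print("frame size in bytes:", frame_size(3))
```

Reserving a fixed number of slots keeps every address after the prologue stable, so the loop body can be compiled before the size of the spilling area is known.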
