12.07.2015 Views

CS 5470 Compiler Techniques and Principles


CS 5470 Compiler Techniques and Principles
March 17, 2010, Lecture 22
● Dynamic-programming algorithm
● Efficiency of tiling algorithms
● CISC machines


Maximal Munch Algorithm
● Starting at the root of the IR tree, find the largest tile that fits.
● This tile covers the root node (and perhaps more), leaving several subtrees. Repeat for each subtree.
● As each tile is placed, the corresponding instruction is generated. Instructions are generated in reverse order.
● The “largest tile” is the one that covers the most nodes. If two tiles are of equal size, the choice is arbitrary.
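The steps above can be sketched in Python. This is only an illustration under stated assumptions: the IR tree is encoded as nested tuples such as ("MEM", subtree) or ("CONST", 1), the tile set is a small hypothetical one (ADD, ADDI, LOAD), and every instruction writes a fresh temporary. At each node the largest patterns are tried first. Because the recursion handles a tile's subtrees before appending the tile's own instruction, the list comes out in execution order even though tiles are chosen from the root down.

```python
from itertools import count

def maximal_munch(tree):
    """Tile `tree` top-down, trying the largest matching pattern first."""
    code = []
    temps = count(1)                      # fresh temporary numbers r1, r2, ...

    def is_const(n):
        return n[0] == "CONST"

    def emit(template, *args):
        # args (including recursive munch results) are evaluated before this
        # body runs, so child instructions are appended before the parent's.
        dst = f"r{next(temps)}"
        code.append(template.format(dst, *args))
        return dst

    def munch(n):
        op = n[0]
        if op == "MEM":
            addr = n[1]
            # 3-node tiles: MEM(+(e, CONST)) and MEM(+(CONST, e))
            if addr[0] == "+" and is_const(addr[2]):
                return emit("LOAD {} <- M[{} + {}]", munch(addr[1]), addr[2][1])
            if addr[0] == "+" and is_const(addr[1]):
                return emit("LOAD {} <- M[{} + {}]", munch(addr[2]), addr[1][1])
            # 1-node tile: MEM(e)
            return emit("LOAD {} <- M[{} + 0]", munch(addr))
        if op == "+":
            # 2-node tiles: +(e, CONST) and +(CONST, e)
            if is_const(n[2]):
                return emit("ADDI {} <- {} + {}", munch(n[1]), n[2][1])
            if is_const(n[1]):
                return emit("ADDI {} <- {} + {}", munch(n[2]), n[1][1])
            # 1-node tile: +(e1, e2)
            left = munch(n[1])
            right = munch(n[2])
            return emit("ADD {} <- {} + {}", left, right)
        if op == "CONST":
            return emit("ADDI {} <- r0 + {}", n[1])
        raise ValueError(f"no tile matches node {op!r}")

    munch(tree)
    return code

code = maximal_munch(("MEM", ("+", ("CONST", 1), ("CONST", 2))))
```

For this tree the sketch picks the three-node LOAD tile at the root and then an ADDI for the remaining constant, so `code` holds two instructions.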


Dynamic-Programming Algorithm
● Finds the optimum tiling of an IR tree.
● In general, dynamic programming finds the optimum solution to a whole problem from the optimum solutions to its subproblems.
● Assigns a cost to every node in the IR tree.
● COST(n) = sum of the instruction costs of the best instruction sequence that tiles the subtree rooted at n.


Dynamic-Programming Algorithm
● For each tile of cost c matching node n, there are zero or more subtrees s_i corresponding to the leaves of the tile (tile leaves ≠ node leaves).
● The cost c_i of each subtree is already computed.
● The cost of matching tile t to node n is c + ∑ c_i.
● Of all tiles t_j that match node n, choose one with minimum cost.
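A minimal Python sketch of this cost computation follows. The encoding is an assumption made for illustration: trees are nested tuples, a tile pattern uses "_" for a tile leaf, and the tile list TILES is a small hypothetical instruction set in which every instruction costs 1. The function `cost` returns the minimum total cost for a node together with the chosen instruction and the subtrees left at the tile's leaves.

```python
# Hypothetical tile set: (instruction, tile cost, pattern); "_" marks a tile leaf.
TILES = [
    ("ADDI", 1, ("CONST",)),
    ("ADD",  1, ("+", "_", "_")),
    ("ADDI", 1, ("+", "_", ("CONST",))),
    ("ADDI", 1, ("+", ("CONST",), "_")),
    ("LOAD", 1, ("MEM", "_")),
    ("LOAD", 1, ("MEM", ("+", "_", ("CONST",)))),
    ("LOAD", 1, ("MEM", ("+", ("CONST",), "_"))),
]

def match(pattern, node):
    """Return the subtrees at the tile's leaves, or None if the tile does not fit."""
    if pattern == "_":
        return [node]
    if pattern[0] != node[0]:
        return None
    if pattern[0] == "CONST":
        return []                 # the constant is absorbed into the instruction
    if len(pattern) != len(node):
        return None
    leaves = []
    for sub_pattern, child in zip(pattern[1:], node[1:]):
        sub = match(sub_pattern, child)
        if sub is None:
            return None
        leaves += sub
    return leaves

def cost(node, table):
    """Return (total cost, instruction, tile leaves) for the best tile at `node`."""
    if node not in table:
        best = None
        for name, c, pattern in TILES:
            leaves = match(pattern, node)
            if leaves is None:
                continue
            # c + sum of c_i: tile cost plus the already-optimal leaf costs
            total = c + sum(cost(leaf, table)[0] for leaf in leaves)
            if best is None or total < best[0]:
                best = (total, name, leaves)
        if best is None:
            raise ValueError(f"no tile matches node {node[0]!r}")
        table[node] = best
    return table[node]

tree = ("MEM", ("+", ("CONST", 1), ("CONST", 2)))
root_cost, root_instr, root_leaves = cost(tree, {})
```

The table memoizes costs by subtree, so each distinct subtree is costed once. Ties are broken in favor of the tile listed first, which is one arbitrary but deterministic choice.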


Example
Consider the IR tree MEM(+(CONST 1, CONST 2)):

MEM
  +
    CONST 1
    CONST 2

● Only one tile matches node CONST 1: the ADDI instruction, with cost 1.
● The same is true for CONST 2.


Example
Several tiles match the + node:

TILE            INSTRUCTION   TILE COST   LEAVES COST   TOTAL COST
+(e1, e2)       ADD           1           1+1           3
+(e, CONST)     ADDI          1           1             2
+(CONST, e)     ADDI          1           1             2


Example
Several tiles match the MEM node:

TILE                INSTRUCTION   TILE COST   LEAVES COST   TOTAL COST
MEM(e)              LOAD          1           2             3
MEM(+(e, CONST))    LOAD          1           1             2
MEM(+(CONST, e))    LOAD          1           1             2


Instruction Emission
● Once the cost of the root node (and thus of the entire tree) is found, instruction emission begins.
● Emission(node n): for each leaf l_i of the tile selected at node n, perform Emission(l_i). Then emit the instruction matched at node n.
● For our example, the following is emitted:

ADDI r1 ← r0 + 1
LOAD r1 ← M[r1 + 2]
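The Emission procedure can be sketched as a short recursion. The data layout here is hypothetical: each node is assumed to have been annotated by the cost pass with the instruction template of its selected tile and with the nodes at that tile's leaves. The sketch also gives every instruction a fresh destination register, whereas the slide reuses r1.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Tiled:
    """A node annotated with its selected tile (hypothetical layout)."""
    template: str                                # "{}" slots: destination, then sources
    leaves: list = field(default_factory=list)   # Tiled nodes at the tile's leaves

def emission(node, code, temps):
    """Emit code for the tile's leaves first, then the instruction at `node`."""
    sources = [emission(leaf, code, temps) for leaf in node.leaves]
    dst = f"r{next(temps)}"
    code.append(node.template.format(dst, *sources))
    return dst

# The example tree: LOAD covers MEM, + and CONST 2; its only leaf is CONST 1.
const1 = Tiled("ADDI {} <- r0 + 1")
root = Tiled("LOAD {} <- M[{} + 2]", [const1])

code = []
emission(root, code, count(1))
```

Recursing into the leaves before appending guarantees that every operand is computed before the instruction that uses it.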


Efficiency of Tiling Algorithms
Suppose that
● there are T tiles for an instruction set,
● the average matching tile contains K non-leaf nodes,
● the largest number of nodes ever examined to match a tile is K' (about the size of the largest tile),
● the average number of matching tiles at each node is T'.
For a typical RISC machine, T = 50, K = 2, K' = 4, and T' = 5.


Efficiency of Tiling Algorithms
● If the input tree has N nodes, maximal munch (MM) considers matches at only N/K nodes. Why?
● To find all tiles matching a node, MM examines at most K' nodes. It then compares the successful matches to find the match with minimal cost.
● Cost of matching each node using MM? Total cost?
● The dynamic-programming algorithm must find all matches at every node. Its cost?
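One way to work through these questions, following the usual textbook analysis and ignoring constants of proportionality: each tile that MM places covers about K non-leaf nodes, so matching is attempted at roughly N/K nodes. Matching at one node costs about K' + T' (examine up to K' nodes, then compare T' successful matches). The small Python sketch below just evaluates those two expressions; the function name `tiling_costs` and its parameter names are chosen here for illustration.

```python
def tiling_costs(n_nodes, k, k_prime, t_prime):
    """Estimate matching work, up to a constant factor, for both algorithms."""
    per_node = k_prime + t_prime          # examine K' nodes, compare T' matches
    return {
        # maximal munch matches at only N/K nodes
        "maximal_munch": per_node * n_nodes / k,
        # dynamic programming matches at every one of the N nodes
        "dynamic_programming": per_node * n_nodes,
    }

# Typical RISC values: K = 2, K' = 4, T' = 5, on a tree of N = 1000 nodes.
estimate = tiling_costs(1000, 2, 4, 5)
```

With these values the estimate is 4500 units of work for maximal munch and 9000 for dynamic programming. Both are linear in N, and T itself does not appear in either expression.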


RISC v. CISC

RISC                                                   CISC
1. 32 registers                                        1. few registers (16, 8, 6)
2. one class of registers                              2. multiple classes of registers
3. arithmetic ops between registers only               3. arithmetic ops can access registers or memory
4. 3-address instructions, r1 ← r2 op r3               4. 2-address instructions, r1 ← r1 op r2
5. load/store only with M[reg+const] addressing mode   5. several different addressing modes
6. 32-bit instructions                                 6. variable-length instructions
7. one result or effect per instruction                7. instructions with side effects, “autoincrement”


CISC Machine: Pentium
● Has six general-purpose registers, plus SP and FP.
● Multiply/divide instructions can operate only on the eax register.
● Two-address arithmetic instructions: r1 ← r1 op r2 (the destination register is the same as the left source register).
● Also allows one register operand and one memory operand: M[r1+c] ← M[r1+c] op r2, r1 ← r1 op M[r2+c].


Instruction Selection for CISC
1. few registers: Generate TEMP nodes freely and leave the rest to the register allocator.
2. register classes: To implement t1 ← t2 * t3:
     mov eax, t2
     mul t3
     mov t1, eax
3. 2-address instructions: To implement t1 ← t2 + t3:
     mov t1, t2
     add t1, t3
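The two rewrites above can be sketched as a small lowering function in Python. The function name `lower_binop` and the operand strings are illustrative assumptions; operands are kept as temporaries and left to the register allocator, as point 1 suggests. The sketch assumes the destination differs from the right operand, since the initial mov would otherwise overwrite it.

```python
def lower_binop(dst, op, left, right):
    """Lower the three-address form dst <- left op right to Pentium-style code."""
    if op == "*":
        # Multiply works only on eax: route the left operand and result through it.
        return [f"mov eax, {left}", f"mul {right}", f"mov {dst}, eax"]
    if op == "+":
        # Two-address add: copy the left operand into dst, then add in place.
        return [f"mov {dst}, {left}", f"add {dst}, {right}"]
    raise ValueError(f"unsupported operator {op!r}")

product = lower_binop("t1", "*", "t2", "t3")
total = lower_binop("t1", "+", "t2", "t3")
```

Both results reproduce the instruction sequences listed on the slide.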


Instruction Selection for CISC
4. arithmetic ops can address memory: A TEMP node could be a memory location. Can either fetch all operands into registers and move the result back to memory after the operation, or use a memory-mode operand.

     mov eax, [ebp - 8]
     add eax, ecx              add [ebp - 8], ecx
     mov [ebp - 8], eax

   Both sequences are equally fast (3 cycles), but the one on the left trashes the value in eax (a register allocator issue).


Instruction Selection for CISC
5. several addressing modes
● An addressing-mode instruction that accomplishes x things usually requires x cycles.
● Such an instruction is no faster than the corresponding multi-instruction sequence.
● Tree matching can be made to select either CISC addressing-mode instructions or simpler instructions.
6. variable-length instructions: Not of concern.


Instruction Selection for CISC
7. instructions with side effects
● Some machines have an “autoincrement” memory fetch instruction whose effect is r2 ← M[r1]; r1 ← r1 + 4
● Such an instruction is difficult to model using tree patterns because it produces two results. Two ways to handle it:
  1. Ignore it (few modern machines have autoincrement).
  2. Do an ad-hoc match within the code generator.
