Chapter 4 Introduction
Chapter 4 Introduction
Chapter 4 Introduction
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Morgan Kaufmann Publishers 31 January 2013Loop Unrollingn Replicate loop body to expose moreparallelismn Reduces loop-control overheadn Use different registers per replicationn Called “register renaming”n Avoid loop-carried “anti-dependencies”n Store followed by a load of the same registern Aka “name dependence”n Reuse of a register nameCSE 420 <strong>Chapter</strong> 4 — The Processor — 117Loop Unrolling ExampleLoop: lw $t0, 0($s1) # $t0=array elementaddu $t0, $t0, $s2 # add scalar in $s2sw $t0, 0($s1) # store resultaddi $s1, $s1,–4 # decrement pointerbne $s1, $zero, Loop # branch $s1!=0ALU/branch Load/store cycleLoop: addi $s1, $s1,–16 lw $t0, 0($s1) 1nop lw $t1, 12($s1) 2addu $t0, $t0, $s2 lw $t2, 8($s1) 3addu $t1, $t1, $s2 lw $t3, 4($s1) 4addu $t2, $t2, $s2 sw $t0, 16($s1) 5addu $t3, $t4, $s2 sw $t1, 12($s1) 6nop sw $t2, 8($s1) 7bne $s1, $zero, Loop sw $t3, 4($s1) 8n IPC = 14/8 = 1.75nCloser to 2, but at cost of registers and code size<strong>Chapter</strong> 4 — The Processor — 118