20.01.2015 Views

Midterm Solution

Midterm Solution

Midterm Solution

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

0907432 Computer Design <strong>Midterm</strong> Exam Spring 2011<br />

4 Problems, 5 Pages 75 minutes March 31, 2011; 4 PM<br />

Your Name (in Arabic):___________________________<br />

Your Student ID: ________________________________<br />

Read The Following Carefully:<br />

(a) This exam booklet has 8 numbered pages and 4 problems. Check that your exam includes all 8<br />

pages. Show ALL of your work on these pages<br />

(b) All problems have weights assigned to them to help you budget your time. It is highly<br />

advisable that you manage your time carefully<br />

Good luck!<br />

Problem Points Score<br />

1 6<br />

2 8<br />

3 6<br />

4 10<br />

Total 30<br />

Page 1 of 5


0907432 Computer Design <strong>Midterm</strong> Exam Spring 2011<br />

4 Problems, 5 Pages 75 minutes March 31, 2011; 4 PM<br />

Problem 1 Amdahl’s Law (6 points)<br />

Three enhancements with the following speedups are proposed for a new architecture:<br />

Speedup1 = 30<br />

Speedup2 = 20<br />

Speedup3 = 15<br />

Only one enhancement is usable at a time.<br />

If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must<br />

enhancement 3 be used to achieve an overall speedup of 10<br />

Page 2 of 5


0907432 Computer Design <strong>Midterm</strong> Exam Spring 2011<br />

4 Problems, 5 Pages 75 minutes March 31, 2011; 4 PM<br />

Problem 2 Loop unrolling [8 points]<br />

Unroll the following loop twice and schedule it, then find how many cycles are needed to execute one<br />

iteration of the unrolled loop. Assume that you have the five-stage pipeline with multi-cycle<br />

operations, FP operations takes 2 cycles, branch instructions are resolved in the Decode stage, and the<br />

processor can write one integer result and one FP result in the Write Back stage every cycle.<br />

The number of cycles needed is: ___15____<br />

Loop: l.d F0, 0(R1) F D E M W<br />

l.d F4, 0(R2) F D E M W<br />

mul.d F0, F0, F4 F D – E E M W<br />

add.d F2, F2, F0 F – D – E E M W<br />

daddui R1, R1, #-8 F – D E M W<br />

daddui R2, R2, #-8 F D E M W<br />

bnez R1, Loop F D E M W<br />

1 2 3 4 5 6 7 8 9 0 1 2 3 4 5<br />

Loop: l.d F0, 0(R1) F D E M W<br />

l.d F4, 0(R2) F D E M W<br />

l.d F6, -8(R1) F D E M W<br />

mul.d F0, F0, F4 F D E E M W<br />

l.d F8, -8(R2) F D E M W<br />

add.d F2, F2, F0 F D E E M W<br />

mul.d F6, F6, F8 F D E E M W<br />

daddui R1, R1, #-16 F D E M W<br />

add.d F2, F2, F6 F D E E M W<br />

daddui R2, R2, #-16 F D E M W<br />

bnez R1, Loop F D E M W<br />

Page 3 of 5


0907432 Computer Design <strong>Midterm</strong> Exam Spring 2011<br />

4 Problems, 5 Pages 75 minutes March 31, 2011; 4 PM<br />

Problem 3 Branch Prediction [6 points]<br />

Consider the following piece of code:<br />

DADDI R1, R0, #100<br />

L1: DADDI R1, R1, #-1<br />

BEQZ R1, END -- Branch 1<br />

DADDI R12, R0, #2<br />

L2: DADDI R12, R12, #-1<br />

BNEZ R12, L2 -- Branch 2<br />

J<br />

L1<br />

END: ...<br />

Assume R0 stores 0. Branch 1 is executed 100 times and branch 2 is executed a total of 198 times.<br />

For each branch, how many correct predictions will occur if we use the following prediction schemes<br />

Assume at the beginning of execution, the last branch was not taken. Please explain your answers.<br />

Part A [3 points]: 1-bit predictor initialized to T (taken)<br />

<strong>Solution</strong> :<br />

Branch 1, N,N,N,…,N,T is predicted correctly every time except the first and last.<br />

Branch 2 alternates T,N,T,N,.. and thus is predicted incorrectly every time except the first.<br />

Branch 1 : 98 correct, Branch 2 : 1 correct prediction<br />

Grading : ½ point for each number and 1 point for each explanation.<br />

Part B [3 points]: 2-bit saturating predictor initialized to 10 (taken)<br />

<strong>Solution</strong> :<br />

Branch 1 is predicted correctly every time except the first and last.<br />

Branch 2’s predictor alternates between 10 and 11, and thus predicts correctly every other<br />

time.<br />

Branch 1 : 98 and Branch 2 : 99 correct predictions<br />

Grading : ½ point for each number and 1 point for each explanation.<br />

Problem 4 Tomasulo’s algorithm [10 points]<br />

Page 4 of 5


0907432 Computer Design <strong>Midterm</strong> Exam Spring 2011<br />

4 Problems, 5 Pages 75 minutes March 31, 2011; 4 PM<br />

This problem concerns Tomasulo’s algorithm. Consider the following architecture specifications:<br />

Functional Unit Type Cycles in EX Number of Functional Units<br />

Integer 1 1<br />

FP Adder 2 1<br />

FP Multiplier 10 1<br />

1) Assume that you have unlimited reservation stations.<br />

2) Memory accesses use the integer functional unit to perform effective address calculation during the<br />

EX stage. For stores, memory is accessed during the EX stage. All loads access memory during the<br />

EX stage. Loads and Stores stay in EX for 1 cycle.<br />

3) Functional units are not pipelined.<br />

4) If an instruction moves to its WB stage in cycle x, then an instruction that is waiting on the same<br />

functional unit (due to a structural hazard) can start executing in cycle x.<br />

5) An instruction waiting for data on the CDB can move to its EX stage in the cycle after the CDB<br />

broadcast.<br />

6) Only one instruction can write to the CDB in a cycle. Stores do not need the CDB.<br />

7) Whenever there is a conflict for a functional unit or the CDB, assume that the oldest (by program<br />

order) of the conflicting instructions gets access, while others are stalled.<br />

8) Assume that the result from the integer functional unit is also broadcast on the CDB and forwarded<br />

to dependent instructions through the CDB (just like any floating point instruction).<br />

Complete the following table using Tomasulo’s algorithm. Only one instruction could be issued in a<br />

clock cycle. Fill in the cycle numbers in each pipeline stage. The entries for the first instruction are<br />

filled in for you. Explain the reason for any stalls.<br />

# Instruction IS EX WB Reason for Stalls<br />

1 L.D F0, 0(R1) 1 2 3<br />

2 L.D F2, 8(R1) 2 3 4<br />

3 MUL.D F4, F0, F2 3 5-14 15 Data dependence F2 (from 2)<br />

4 S.D F4, 24(R1) 4 16 -- Data dependence F4 (from 3)<br />

5 L.D F0, 16(R1) 5 6 7<br />

6 ADD.D F4, F0, F6 6 8-9 10 Data dependence F0 (from 5)<br />

7 ADD.D F2, F4, F8 7 11-12 13 Data dependence F4 (from 6)<br />

8 MUL.D F4, F4, F10 8 15-24 25 MUL FU occupied (from 3)<br />

9 S.D F2, 32(R1) 9 14 -- Data dependence F2 (from 7)<br />

10 S.D F4, 40(R1) 10 26 -- Data dependence F4 (from 8)<br />

11 DADDI R1, R1, #40 11 12 14 CDB Occupied<br />

Grading:<br />

-1 for each missed dependency (data or structural). –1 for any other errors. Cascading errors should not<br />

be penalized additionally as long as the relevant dependencies are still observed.<br />

Page 5 of 5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!