01.09.2013 Views

Appendix G - Clemson University


Exercises



In these exercises, assume VMIPS has a clock rate of 500 MHz and that T_loop = 15. Use the start-up times from Figure G.4, and assume that the store latency is always included in the running time.
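The assumptions above feed a running-time estimate for strip-mined vector loops. The following is a minimal Python sketch of that estimate, assuming the model T_n = ceil(n / MVL) × (T_loop + T_start) + n × T_chime with a maximum vector length (MVL) of 64. The model form, the MVL value, and the function names `vector_cycles` and `mflops` are assumptions made for illustration; the start-up time and chime count are left as parameters because Figure G.4 is not reproduced here.

```python
import math

CLOCK_MHZ = 500   # clock rate stated in the exercise preamble
T_LOOP = 15       # loop overhead in cycles, stated in the exercise preamble
MVL = 64          # assumed maximum vector length (not stated in this section)


def vector_cycles(n, t_start, t_chime, t_loop=T_LOOP, mvl=MVL):
    """Estimated clock cycles for a strip-mined loop over n elements.

    Assumed model: ceil(n / mvl) * (t_loop + t_start) + n * t_chime.
    t_start and t_chime must come from the caller (for example from
    Figure G.4 and a convoy analysis); no values are assumed for them.
    """
    strips = math.ceil(n / mvl)
    return strips * (t_loop + t_start) + n * t_chime


def mflops(flops_per_iteration, n, cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to MFLOPS: total flops / time in microseconds."""
    return flops_per_iteration * n * clock_mhz / cycles


# Placeholder inputs only: t_start=20 and t_chime=3 are NOT from Figure G.4.
example_cycles = vector_cycles(64, t_start=20, t_chime=3)
print(example_cycles, mflops(2, 64, example_cycles))
```

An exercise that says to ignore strip-mining overhead corresponds to calling `vector_cycles` with `t_loop=0`.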

G.1 [10] Write a VMIPS vector sequence that achieves the peak MFLOPS performance of the processor (use the functional unit and instruction description in Section G.2). Assuming a 500-MHz clock rate, what is the peak MFLOPS?

G.2 [20/15/15] Consider the following vector code run on a 500-MHz version of VMIPS for a fixed vector length of 64:

    LV      V1,Ra
    MULV.D  V2,V1,V3
    ADDV.D  V4,V1,V3
    SV      Rb,V2
    SV      Rc,V4

Ignore all strip-mining overhead, but assume that the store latency must be included in the time to perform the loop. The entire sequence produces 64 results.

a. [20] Assuming no chaining and a single memory pipeline, how many chimes are required? How many clock cycles per result (including both stores as one result) does this vector sequence require, including start-up overhead?

b. [15] If the vector sequence is chained, how many clock cycles per result does this sequence require, including overhead?

c. [15] Suppose VMIPS had three memory pipelines and chaining. If there were no bank conflicts in the accesses for the above loop, how many clock cycles are required per result for this sequence?

G.3 [20/20/15/15/20/20/20] Consider the following FORTRAN code:

          do 10 i=1,n
             A(i) = A(i) + B(i)
             B(i) = x * B(i)
    10    continue

Use the techniques of Section G.6 to estimate performance throughout this exercise, assuming a 500-MHz version of VMIPS.
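As an aid for checking any vectorized version, the loop's data flow can be written as a scalar reference in Python. This is a sketch for validating results, not VMIPS code, and the function name `update_arrays` is hypothetical. It makes explicit that each iteration performs one addition and one multiplication, and that A(i) is updated with the value of B(i) from before the scaling.

```python
def update_arrays(a, b, x):
    """Scalar reference for: A(i) = A(i) + B(i); B(i) = x * B(i)."""
    for i in range(len(a)):
        a[i] = a[i] + b[i]   # uses B(i) before it is scaled
        b[i] = x * b[i]      # B(i) is scaled only after the add has read it
    return a, b


a, b = update_arrays([1.0, 2.0], [3.0, 4.0], 2.0)
print(a, b)
```

Iterations are independent of one another, so the elements can be processed in any order, or all at once as vectors, without changing the result.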

a. [20] Write the best VMIPS vector code for the inner portion of the loop. Assume x is in F0 and the addresses of A and B are in Ra and Rb, respectively.
