Appendix G - Clemson University
Watanabe, T. [1987]. “Architecture and performance of the NEC supercomputer SX system,”
Parallel Computing 5, 247–255.
Watson, W. J. [1972]. “The TI ASC—a highly modular and flexible super processor
architecture,” Proc. AFIPS Fall Joint Computer Conf., 221–228.
Exercises
In these exercises assume VMIPS has a clock rate of 500 MHz and that T_loop =
15. Use the start-up times from Figure G.4, and assume that the store latency is
always included in the running time.
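The assumptions above feed the strip-mined running-time model used in this appendix, T_n = ceil(n / MVL) × (T_loop + T_start) + n × T_chime. The following Python sketch shows one way to turn that model into cycle counts and MFLOPS. The function names are illustrative, MVL = 64 is assumed for VMIPS, and the start-up and chime values in the usage lines are placeholders, not the Figure G.4 numbers or the answer to any exercise.

```python
import math

CLOCK_MHZ = 500  # clock rate stated for these exercises
T_LOOP = 15      # loop overhead in clock cycles, as stated for these exercises
MVL = 64         # assumption: maximum vector length of VMIPS


def t_n(n, t_start, t_chime, t_loop=T_LOOP, mvl=MVL):
    """Clock cycles for a strip-mined vector loop over n elements.

    t_start is the total start-up overhead per strip (sum over convoys),
    t_chime is the number of chimes (convoys) per element.
    """
    strips = math.ceil(n / mvl)
    return strips * (t_loop + t_start) + n * t_chime


def mflops(flops, cycles, clock_mhz=CLOCK_MHZ):
    """Millions of floating-point operations per second."""
    # time in microseconds = cycles / clock_mhz, so MFLOPS = flops / that time
    return flops * clock_mhz / cycles


# Placeholder inputs only: 200 elements, 40 start-up cycles, 3 chimes,
# and 2 floating-point operations per element.
cycles = t_n(200, t_start=40, t_chime=3)
print(cycles, round(mflops(2 * 200, cycles), 1))
```

To apply it to an exercise, substitute the start-up total read from Figure G.4 and the chime count of the vector sequence in question.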
G.1 [10] Write a VMIPS vector sequence that achieves the peak MFLOPS
performance of the processor (use the functional unit and instruction description
in Section G.2). Assuming a 500-MHz clock rate, what is the peak MFLOPS?
G.2 [20/15/15] Consider the following vector code run on a 500-MHz
version of VMIPS for a fixed vector length of 64:
LV V1,Ra<br />
MULV.D V2,V1,V3<br />
ADDV.D V4,V1,V3<br />
SV Rb,V2<br />
SV Rc,V4<br />
Ignore all strip-mining overhead, but assume that the store latency must be
included in the time to perform the loop. The entire sequence produces 64 results.
a. [20] Assuming no chaining and a single memory pipeline, how
many chimes are required? How many clock cycles per result (including both
stores as one result) does this vector sequence require, including start-up
overhead?
b. [15] If the vector sequence is chained, how many clock cycles per
result does this sequence require, including overhead?
c. [15] Suppose VMIPS had three memory pipelines and chaining.
If there were no bank conflicts in the accesses for the above loop, how many
clock cycles are required per result for this sequence?
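Counting chimes by hand means grouping the instructions into convoys. The Python sketch below models that grouping in a simplified way, as an aid for checking hand work and not as a definitive solution. It assumes in-order issue, one pipeline per arithmetic unit unless stated otherwise, and only read-after-write dependences on vector registers. The class and function names are hypothetical. An instruction starts a new convoy when its functional unit is already fully used in the current convoy, or when it reads a register written in the current convoy and chaining is disabled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VInstr:
    op: str                 # mnemonic, for display only
    unit: str               # functional unit class: "mem", "mul", or "add"
    dest: Optional[str]     # vector register written (None for stores)
    srcs: Tuple[str, ...]   # vector registers read


def group_convoys(program, units, chaining):
    """Greedily group instructions into convoys; one convoy takes one chime.

    units maps a unit class to its number of pipelines (default 1).
    """
    convoys, current = [], []
    for ins in program:
        in_use = sum(1 for c in current if c.unit == ins.unit)
        structural = in_use >= units.get(ins.unit, 1)
        written = {c.dest for c in current if c.dest is not None}
        dependent = any(s in written for s in ins.srcs)
        if current and (structural or (dependent and not chaining)):
            convoys.append(current)   # close the convoy, start a new one
            current = []
        current.append(ins)
    if current:
        convoys.append(current)
    return convoys


# The sequence from this exercise, encoded under the assumptions above.
G2 = [
    VInstr("LV", "mem", "V1", ()),
    VInstr("MULV.D", "mul", "V2", ("V1", "V3")),
    VInstr("ADDV.D", "add", "V4", ("V1", "V3")),
    VInstr("SV", "mem", None, ("V2",)),
    VInstr("SV", "mem", None, ("V4",)),
]

for mem_pipes, chaining in [(1, False), (1, True), (3, True)]:
    convoys = group_convoys(G2, {"mem": mem_pipes}, chaining)
    print(mem_pipes, chaining, [[i.op for i in c] for c in convoys])
```

The model ignores start-up overhead and bank conflicts, so it yields only the chime count. Clock cycles per result still require adding the start-up times from Figure G.4.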
G.3 [20/20/15/15/20/20/20] Consider the following FORTRAN code:
do 10 i=1,n<br />
A(i) = A(i) + B(i)<br />
B(i) = x * B(i)<br />
10 continue<br />
Use the techniques of Section G.6 to estimate performance throughout this exercise,
assuming a 500-MHz version of VMIPS.
a. [20] Write the best VMIPS vector code for the inner portion of
the loop. Assume x is in F0 and the addresses of A and B are in Ra and Rb,
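Before vectorizing the loop, it can help to pin down what it computes. The Python sketch below is a reference model only, not VMIPS code and not the answer to any part of this exercise. It gives the scalar semantics of the FORTRAN loop and a strip-mined version that processes at most MVL elements at a time, with the odd-size piece first. MVL = 64 and both function names are assumptions made for illustration.

```python
MVL = 64  # assumption: maximum vector length of VMIPS


def scalar_loop(a, b, x):
    """Element-by-element semantics of the FORTRAN loop."""
    a, b = list(a), list(b)
    for i in range(len(a)):
        a[i] = a[i] + b[i]  # uses B(i) before it is scaled
        b[i] = x * b[i]
    return a, b


def strip_mined_loop(a, b, x, mvl=MVL):
    """Same computation, done in pieces of at most mvl elements."""
    a, b = list(a), list(b)
    n = len(a)
    low = 0
    vl = n % mvl                       # odd-size piece goes first
    for _ in range(n // mvl + 1):
        high = low + vl
        va, vb = a[low:high], b[low:high]              # vector loads
        a[low:high] = [p + q for p, q in zip(va, vb)]  # vector add
        b[low:high] = [x * q for q in vb]              # vector-scalar multiply
        low = high
        vl = mvl                       # all later pieces are full length
    return a, b


a0 = [float(i) for i in range(130)]
b0 = [2.0 * i for i in range(130)]
print(strip_mined_loop(a0, b0, 3.0) == scalar_loop(a0, b0, 3.0))
```

Each strip loads A and B once and reuses the loaded B values for both results, which is the property to preserve when writing the vector sequence.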
respectively.