Appendix G - Clemson University
G-52 ■ Appendix G Vector Processors
tional units and the increased complexity of assigning operations to units, all the
overheads (T_loop and T_start) are doubled.
a. [15] Find the number of clock cycles for code sequence 1 on VMIPS.
b. [20] Find the number of clock cycles for code sequence 1 on VMIPS-II. How does this compare to VMIPS?
c. [15] Find the number of clock cycles for code sequence 2 on VMIPS.
d. [15] Find the number of clock cycles for code sequence 2 on VMIPS-II. How does this compare to VMIPS?
G.9 [20] Here is a tricky piece of code with two-dimensional arrays. Do these
loops have dependences? Can they be written so that they are parallel? If so,
how? Rewrite the source code so that it is clear that the loop can be vectorized, if
possible.
      do 290 j = 2,n
         do 290 i = 2,j
            aa(i,j) = aa(i-1,j)*aa(i-1,j) + bb(i,j)
  290 continue
G.10 [12/15] Consider the following loop:
      do 10 i = 2,n
         A(i) = B
   10    C(i) = A(i-1)
a. [12] Show there is a loop-carried dependence in this code fragment.
b. [15] Rewrite the code in FORTRAN so that it can be vectorized as two
separate vector sequences.
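One way to check a candidate answer to part (b) is to run the original loop and the rewritten loops on the same data and compare the results. The following is a minimal sketch in free-form Fortran, not a definitive solution. It assumes that B is a scalar and that A(1) holds some value before the loop starts; the array size, the test values, and the names a1, c1, a2, and c2 are hypothetical and chosen only for illustration.

```fortran
program split_loop_check
  implicit none
  integer, parameter :: n = 8          ! illustrative size (assumption)
  real :: a1(n), c1(n)                 ! arrays used by the original loop
  real :: a2(n), c2(n)                 ! arrays used by the split loops
  real :: b
  integer :: i

  ! Arbitrary test data (assumption): B is a scalar, A(1) is set beforehand
  b  = 3.0
  a1 = 0.0
  c1 = 0.0
  a1(1) = 1.0
  a2 = a1
  c2 = c1

  ! Original loop: iteration i reads A(i-1), which iteration i-1 wrote
  do i = 2, n
     a1(i) = b
     c1(i) = a1(i-1)
  end do

  ! Candidate rewrite: first complete every write to A ...
  do i = 2, n
     a2(i) = b
  end do
  ! ... then read A in a second loop with no dependence between iterations
  do i = 2, n
     c2(i) = a2(i-1)
  end do

  print *, 'A matches:', all(a1 == a2)
  print *, 'C matches:', all(c1 == c2)
end program split_loop_check
```

Both comparisons should report true for this data. Each of the two rewritten loops writes one array and reads only values that are already complete when the loop begins, which is the property a vectorizer needs.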
G.11 [15/25/25] As we saw in Section G.5, some loop structures are not easily
vectorized. One common structure is a reduction: a loop that reduces an array to
a single value by repeated application of an operation. This is a special case of a
recurrence. A common example occurs in dot product:
      dot = 0.0
      do 10 i = 1,64
   10 dot = dot + A(i) * B(i)
This loop has an obvious loop-carried dependence (on dot) and cannot be vectorized
in a straightforward fashion. The first thing a good vectorizing compiler
would do is split the loop to separate out the vectorizable portion and the recurrence
and perhaps rewrite the loop as
      do 10 i = 1,64
   10 dot(i) = A(i) * B(i)
      do 20 i = 2,64
   20 dot(1) = dot(1) + dot(i)
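The split form can be compared against the original scalar loop with a short test program. The sketch below is written in free-form Fortran and is only an illustration under stated assumptions: the input data are arbitrary, and the names prod (standing in for the array dot(1:64) of the split loops) and dot_ref (the scalar result of the original loop) are hypothetical.

```fortran
program dot_split
  implicit none
  integer, parameter :: n = 64
  real :: a(n), b(n)
  real :: prod(n)        ! plays the role of the array dot(1:64) in the text
  real :: dot_ref        ! result of the original scalar loop
  integer :: i

  ! Arbitrary test data (assumption, for illustration only)
  do i = 1, n
     a(i) = real(i)
     b(i) = 1.0
  end do

  ! Original loop: every iteration depends on the previous value of dot_ref
  dot_ref = 0.0
  do i = 1, n
     dot_ref = dot_ref + a(i) * b(i)
  end do

  ! First split loop: independent element-wise multiplies, vectorizable
  do i = 1, n
     prod(i) = a(i) * b(i)
  end do

  ! Second split loop: the recurrence, still a serial sum into prod(1)
  do i = 2, n
     prod(1) = prod(1) + prod(i)
  end do

  print *, 'original loop:', dot_ref
  print *, 'split loops:  ', prod(1)
end program dot_split
```

For this data both results equal 2080, the sum of the integers 1 through 64. Only the first split loop is free of loop-carried dependences. The second still accumulates serially into a single element, so the recurrence has been isolated rather than removed.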