Appendix G - Clemson University
Appendix G Vector Processors
The extension that is commonly used for this capability is vector-mask
control. The vector-mask control uses a Boolean vector of length MVL to control
the execution of a vector instruction just as conditionally executed instructions
use a Boolean condition to determine whether an instruction is executed. When
the vector-mask register is enabled, any vector instructions executed operate only
on the vector elements whose corresponding entries in the vector-mask register
are 1. The entries in the destination vector register that correspond to a 0 in the
mask register are unaffected by the vector operation. If the vector-mask register is
set by the result of a condition, only elements satisfying the condition will be
affected. Clearing the vector-mask register sets it to all 1s, making subsequent
vector instructions operate on all vector elements. The following code can now be
used for the previous loop, assuming that the starting addresses of A and B are in
Ra and Rb, respectively:
    LV       V1,Ra       ;load vector A into V1
    LV       V2,Rb       ;load vector B
    L.D      F0,#0       ;load FP zero into F0
    SNEVS.D  V1,F0       ;sets VM(i) to 1 if V1(i)!=F0
    SUBV.D   V1,V1,V2    ;subtract under vector mask
    CVM                  ;set the vector mask to all 1s
    SV       Ra,V1       ;store the result in A
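The effect of this sequence on the data can be sketched in ordinary Python. This is only an illustrative model, not part of the instruction set: the function name `masked_subtract` and the use of Python lists in place of vector registers are assumptions made for the example.

```python
def masked_subtract(a, b):
    """Model SNEVS.D, SUBV.D under mask, and CVM on two equal-length vectors.

    Hypothetical helper: Python lists stand in for the vector registers.
    """
    # SNEVS.D V1,F0: set VM(i) to 1 where A(i) != 0, else 0
    vm = [1 if x != 0.0 else 0 for x in a]
    # SUBV.D V1,V1,V2 under mask: only elements with VM(i) == 1 are updated;
    # elements with VM(i) == 0 keep their previous value
    result = [x - y if m else x for x, y, m in zip(a, b, vm)]
    # CVM: restore the mask to all 1s for later instructions
    vm = [1] * len(a)
    return result, vm


a = [1.0, 0.0, 3.0, 0.0]
b = [0.5, 2.0, 1.0, 4.0]
result, vm = masked_subtract(a, b)
print(result)  # [0.5, 0.0, 2.0, 0.0]
```

The zero elements of A pass through unchanged, which is exactly what the conditional in the original loop requires.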
Most recent vector processors provide vector-mask control. The vector-mask
capability described here is available on some processors, but others allow the
use of the vector mask with only a subset of the vector instructions.
Using a vector-mask register does, however, have disadvantages. First, recall
that when we examined conditionally executed instructions, we saw that such
instructions still require execution time when the condition is not satisfied.
Nonetheless, the elimination of a branch and the associated control dependences
can make a conditional instruction faster even if it sometimes does useless work.
Similarly, vector instructions executed with a vector mask still take execution
time, even for the elements where the mask is 0. Likewise, even with a significant
number of 0s in the mask, using vector-mask control may still be significantly
faster than using scalar mode. In fact, the large difference in potential
performance between vector and scalar mode makes the inclusion of vector-mask
instructions critical.
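A toy cost model makes this trade-off concrete. Everything here is assumed for illustration: the function names, the per-element costs of 1 and 10 cycles, and the simplification that the scalar loop is charged only for elements whose condition holds (which favors scalar mode). None of these numbers comes from a real machine.

```python
def masked_vector_time(mask, cycles_per_element):
    """Masked vector instruction: every element takes a slot, even where mask is 0."""
    return len(mask) * cycles_per_element


def scalar_time(mask, cycles_per_element):
    """Scalar loop, optimistically charged only for elements whose mask bit is 1."""
    return sum(mask) * cycles_per_element


mask = [1, 0, 0, 0] * 16  # 64 elements, only one quarter enabled
vec = masked_vector_time(mask, cycles_per_element=1)   # hypothetical vector cost
sca = scalar_time(mask, cycles_per_element=10)         # hypothetical scalar cost
print(vec, sca)  # 64 160
```

Under these assumed costs the masked vector instruction wastes three quarters of its element slots and is still faster than the scalar loop. With a smaller gap between the vector and scalar per-element costs the conclusion could reverse.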
Second, in some vector processors the vector mask serves only to disable the
storing of the result into the destination register, and the actual operation still
occurs. Thus, if the operation in the previous example were a divide rather than a
subtract and the test was on B rather than A, false floating-point exceptions might
result since a division by 0 would occur. Processors that mask the operation as
well as the storing of the result avoid this problem.
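The difference between the two masking styles can be sketched as follows. This is a simplified model under stated assumptions: both function names are hypothetical, Python lists stand in for vector registers, and Python's `ZeroDivisionError` stands in for a hardware floating-point exception.

```python
def divide_write_masked(a, b, vm):
    """Mask disables only the write-back; the divide runs on every element."""
    out, spurious = list(a), []
    for i, (x, y, m) in enumerate(zip(a, b, vm)):
        try:
            q = x / y  # executed even when vm[i] == 0
        except ZeroDivisionError:
            if m:
                raise  # a genuine exception on an enabled element
            spurious.append(i)  # false exception from a masked-off element
            continue
        if m:
            out[i] = q
    return out, spurious


def divide_op_masked(a, b, vm):
    """Mask disables the operation itself, so masked-off elements never divide."""
    return [x / y if m else x for x, y, m in zip(a, b, vm)]


a = [6.0, 1.0, 9.0]
b = [2.0, 0.0, 3.0]
vm = [1 if y != 0.0 else 0 for y in b]  # test on B, as in the text
print(divide_write_masked(a, b, vm))  # ([3.0, 1.0, 3.0], [1])
print(divide_op_masked(a, b, vm))     # [3.0, 1.0, 3.0]
```

Both versions produce the same destination vector, but the write-masked version records a spurious exception at index 1, where B is 0 and the result would have been discarded anyway.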
Sparse Matrices
There are techniques for allowing programs with sparse matrices to execute in
vector mode. In a sparse matrix, the elements of a vector are usually stored in