01.09.2013 Views

Appendix G - Clemson University



G-26 ■ Appendix G Vector Processors

The extension commonly used for this capability is vector-mask control.
Vector-mask control uses a Boolean vector of length MVL to control the
execution of a vector instruction, just as conditionally executed instructions
use a Boolean condition to determine whether an instruction is executed. When
the vector-mask register is enabled, any vector instruction executed operates
only on the vector elements whose corresponding entries in the vector-mask
register are 1. The entries in the destination vector register that correspond
to a 0 in the mask register are unaffected by the vector operation. If the
vector-mask register is set by the result of a condition, only elements
satisfying the condition are affected. Clearing the vector-mask register sets
it to all 1s, so that subsequent vector instructions operate on all vector
elements. The following code can now be used for the previous loop, assuming
that the starting addresses of A and B are in Ra and Rb, respectively:

    LV       V1,Ra      ;load vector A into V1
    LV       V2,Rb      ;load vector B
    L.D      F0,#0      ;load FP zero into F0
    SNEVS.D  V1,F0      ;sets VM(i) to 1 if V1(i)!=F0
    SUBV.D   V1,V1,V2   ;subtract under vector mask
    CVM                 ;set the vector mask to all 1s
    SV       Ra,V1      ;store the result in A
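The effect of the masked sequence can be sketched in Python. This is only an illustrative software model, not part of the VMIPS instruction set: the helper names `snevs` and `subv_masked` are hypothetical stand-ins for SNEVS.D and a SUBV.D executed under the mask, and ordinary lists stand in for vector registers.

```python
def snevs(v, scalar):
    """Model of SNEVS.D: mask entry is 1 where v[i] != scalar, else 0."""
    return [1 if x != scalar else 0 for x in v]

def subv_masked(dest, a, b, vm):
    """Model of SUBV.D under a mask: write a[i] - b[i] only where vm[i] is 1.

    Destination entries whose mask entry is 0 keep their old value.
    """
    return [x - y if m else d for d, x, y, m in zip(dest, a, b, vm)]

def masked_subtract(a, b):
    """Model of the whole sequence: A(i) = A(i) - B(i) wherever A(i) != 0."""
    vm = snevs(a, 0.0)                # set the mask from the comparison
    return subv_masked(a, a, b, vm)   # subtract under the mask

a = [1.0, 0.0, 3.0, 0.0]
b = [0.5, 2.0, 1.0, 4.0]
print(masked_subtract(a, b))  # [0.5, 0.0, 2.0, 0.0]
```

The elements of A that were 0 are left untouched, which is the behavior the masked SUBV.D provides in the vector code.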

Most recent vector processors provide vector-mask control. The vector-mask
capability described here is available on some processors, but others allow
the use of the vector mask with only a subset of the vector instructions.

Using a vector-mask register does, however, have disadvantages. First, when we
examined conditionally executed instructions, we saw that such instructions
still require execution time when the condition is not satisfied. Nonetheless,
the elimination of a branch and the associated control dependences can make a
conditional instruction faster even if it sometimes does useless work.
Similarly, vector instructions executed with a vector mask still take
execution time, even for the elements where the mask is 0. Even so, with a
significant number of 0s in the mask, using vector-mask control may still be
significantly faster than using scalar mode. In fact, the large difference in
potential performance between vector and scalar mode makes the inclusion of
vector-mask instructions critical.
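A toy cost model may make this trade-off concrete. Everything numeric here is an assumption chosen for illustration, not a measurement of any real processor: the start-up cost, the per-element vector cost, and the scalar test and operation costs are all hypothetical parameters.

```python
def vector_masked_cycles(n, startup=12, cycles_per_element=1):
    """Masked vector op: every element takes a slot, whatever its mask bit.

    The mask density does not appear in the formula at all.
    All cycle counts are hypothetical.
    """
    return startup + n * cycles_per_element

def scalar_cycles(n, active, test_cycles=4, op_cycles=10):
    """Scalar loop: test every element, operate only on the active ones.

    All cycle counts are hypothetical.
    """
    return n * test_cycles + active * op_cycles

n = 64
for active in (64, 16, 4):
    print(active, vector_masked_cycles(n), scalar_cycles(n, active))
```

Under these assumed costs the masked vector version takes the same time no matter how many mask entries are 0, yet it stays ahead of the scalar loop even when only 4 of the 64 elements are active. Different assumed costs would move the crossover point.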

Second, in some vector processors the vector mask serves only to disable the
storing of the result into the destination register, and the actual operation
still occurs. Thus, if the operation in the previous example were a divide
rather than a subtract and the test were on B rather than A, spurious
floating-point exceptions might result, since a division by 0 would occur.
Processors that mask the operation as well as the storing of the result avoid
this problem.
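The difference between the two masking styles can be sketched in Python. This is an illustrative model only: the function names are hypothetical, and Python's `ZeroDivisionError` stands in for a hardware floating-point exception.

```python
def divide_store_masked(dest, a, b, vm):
    """Mask only suppresses the write-back: every quotient is still computed."""
    quotients = [x / y for x, y in zip(a, b)]  # raises if any b[i] == 0
    return [q if m else d for d, q, m in zip(dest, quotients, vm)]

def divide_op_masked(dest, a, b, vm):
    """Mask suppresses the operation itself: masked-off elements are never divided."""
    return [x / y if m else d for d, x, y, m in zip(dest, a, b, vm)]

a = [6.0, 5.0, 8.0]
b = [2.0, 0.0, 4.0]
vm = [1 if y != 0.0 else 0 for y in b]  # test on B: mask off the zero divisor

print(divide_op_masked(a, a, b, vm))  # [3.0, 5.0, 2.0]

try:
    divide_store_masked(a, a, b, vm)
except ZeroDivisionError:
    print("exception raised for an element the mask had disabled")
```

In the store-masked model the exception comes from an element whose result would have been discarded anyway, which is why it counts as spurious.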

Sparse Matrices

There are techniques for allowing programs with sparse matrices to execute in
vector mode. In a sparse matrix, the elements of a vector are usually stored in
