Appendix G - Clemson University
Appendix G - Clemson University
Appendix G - Clemson University
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
G-30 ■ <strong>Appendix</strong> G Vector Processors<br />
A[9]<br />
A[8]<br />
A[7]<br />
A[6]<br />
A[5]<br />
A[4]<br />
A[3]<br />
A[2]<br />
A[1]<br />
+<br />
C[0]<br />
B[9]<br />
B[8]<br />
B[7]<br />
B[6]<br />
B[5]<br />
B[4]<br />
B[3]<br />
B[2]<br />
B[1]<br />
Figure G.11 Using multiple functional units to improve the performance of a single<br />
vector add instruction, C = A + B. The machine shown in (a) has a single add pipeline<br />
and can complete one addition per cycle. The machine shown in (b) has four add pipelines<br />
and can complete four additions per cycle. The elements within a single vector<br />
add instruction are interleaved across the four pipelines. The set of elements that move<br />
through the pipelines together is termed an element group. (Reproduced with permission<br />
from Asanovic [1998].)<br />
Adding multiple lanes is a popular technique to improve vector performance<br />
as it requires little increase in control complexity and does not require changes to<br />
existing machine code. Several vector supercomputers are sold as a range of<br />
models that vary in the number of lanes installed, allowing users to trade price<br />
against peak vector performance. The Cray SV1 allows four two-lane CPUs to be<br />
ganged together using operating system software to form a single larger eightlane<br />
CPU.<br />
Pipelined Instruction Start-Up<br />
Adding multiple lanes increases peak performance, but does not change start-up<br />
latency, and so it becomes critical to reduce start-up overhead by allowing the<br />
start of one vector instruction to be overlapped with the completion of preceding<br />
vector instructions. The simplest case to consider is when two vector instructions<br />
access a different set of vector registers. For example, in the code sequence<br />
ADDV.D V1,V2,V3<br />
ADDV.D V4,V5,V6<br />
A[8]<br />
A[4]<br />
+<br />
C[0]<br />
B[8]<br />
B[4]<br />
A[9] B[9]<br />
A[5]<br />
+<br />
C[1]<br />
B[5]<br />
A[6]<br />
Element group<br />
(a) (b)<br />
+<br />
C[2]<br />
B[6]<br />
A[7]<br />
+<br />
C[3]<br />
B[7]