01.09.2013 Views

Appendix G - Clemson University


G-14 ■ <strong>Appendix</strong> G Vector Processors<br />

Typically, penalties for start-ups on load-store units are higher than those for<br />

arithmetic functional units—over 100 clock cycles on some processors. For<br />

VMIPS we will assume a start-up time of 12 clock cycles, the same as the Cray-1.<br />

Figure G.6 summarizes the start-up penalties for VMIPS vector operations.<br />

To maintain an initiation rate of 1 word fetched or stored per clock, the memory<br />

system must be capable of producing or accepting this much data. This is<br />

usually done by creating multiple memory banks, as discussed in Section 5.8. As<br />

we will see in the next section, having significant numbers of banks is useful for<br />

dealing with vector loads or stores that access rows or columns of data.<br />
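The link between bank count and a sustained rate of one word per clock can be sketched with a small simulation. This is an illustrative model only, not a description of VMIPS or any real memory controller: the function name `cycles_to_stream`, the low-order mapping of word `i` to bank `i % num_banks`, and the sample parameters are all assumptions made for the example.

```python
def cycles_to_stream(num_words, num_banks, bank_busy_cycles):
    """Clock cycles needed to start num_words sequential accesses.

    Assumed model: at most one access starts per clock, word i maps to
    bank i % num_banks, and a bank cannot start a new access until
    bank_busy_cycles clocks after its previous access started.
    """
    free_at = [0] * num_banks  # earliest clock at which each bank is free
    clock = 0
    for i in range(num_words):
        bank = i % num_banks
        start = max(clock, free_at[bank])  # stall if the bank is still busy
        free_at[bank] = start + bank_busy_cycles
        clock = start + 1  # the next access can start one clock later
    return clock


# Enough banks to hide the bank busy time: one word per clock.
print(cycles_to_stream(64, num_banks=8, bank_busy_cycles=6))  # 64
# Too few banks: the stream stalls while it waits for busy banks.
print(cycles_to_stream(64, num_banks=4, bank_busy_cycles=6))  # 94
```

Under this model, sequential accesses run stall-free whenever the number of banks is at least the bank busy time measured in clock cycles.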

Most vector processors use memory banks rather than simple interleaving for<br />

three primary reasons:<br />

1. Many vector computers support multiple loads or stores per clock, and the<br />

memory bank cycle time is often several times larger than the CPU cycle<br />

time. To support multiple simultaneous accesses, the memory system needs to<br />

have multiple banks and be able to control the addresses to the banks independently.<br />

2. As we will see in the next section, many vector processors support the ability<br />

to load or store data words that are not sequential. In such cases, independent<br />

bank addressing, rather than interleaving, is required.<br />

3. Many vector computers support multiple processors sharing the same memory<br />

system, and so each processor will be generating its own independent<br />

stream of addresses.<br />

In combination, these features lead to a large number of independent memory<br />

banks, as shown by the following example.<br />

Example The Cray T90 has a CPU clock cycle of 2.167 ns and in its largest configuration<br />

(Cray T932) has 32 processors each capable of generating four loads and two<br />

stores per CPU clock cycle. The SRAMs used in the memory system have a cycle<br />

time of 15 ns. Calculate the minimum<br />

number of memory banks required to allow all CPUs to run at full memory bandwidth.<br />

Answer The maximum number of memory references each cycle is 192 (32 CPUs times 6<br />

references per CPU). Each SRAM bank is busy for 15/2.167 = 6.92 clock cycles,<br />

which we round up to 7 CPU clock cycles. Therefore we require a minimum of<br />

192 × 7 = 1344 memory banks!<br />
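The arithmetic in this answer can be reproduced with a few lines of Python. This is a minimal sketch; the function name `min_banks` and its parameter names are hypothetical and chosen only for illustration, while the numbers are those given in the example.

```python
import math


def min_banks(num_cpus, refs_per_cpu_per_cycle, cpu_cycle_ns, bank_cycle_ns):
    """Minimum bank count so every reference can start each CPU clock cycle."""
    # Peak memory references issued per CPU clock across all processors.
    refs_per_cycle = num_cpus * refs_per_cpu_per_cycle
    # Whole CPU clock cycles for which one bank stays busy (rounded up).
    busy_cycles = math.ceil(bank_cycle_ns / cpu_cycle_ns)
    return refs_per_cycle * busy_cycles


# Cray T932: 32 CPUs, 4 loads + 2 stores per clock, 2.167 ns clock, 15 ns SRAM.
print(min_banks(32, 4 + 2, 2.167, 15.0))  # 1344
```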

The Cray T932 actually has 1024 memory banks, and so the early models<br />

could not sustain full bandwidth to all CPUs simultaneously. A subsequent memory<br />

upgrade replaced the 15 ns asynchronous SRAMs with pipelined synchronous<br />

SRAMs that more than halved the memory cycle time, thereby providing<br />

sufficient bandwidth.
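A rough check of this claim, as a sketch under stated assumptions: the text gives no exact cycle time for the upgraded SRAMs, so the code below uses 7.5 ns (exactly half of 15 ns) as a conservative stand-in for "more than halved". The constant and function names are hypothetical.

```python
import math

CPU_CYCLE_NS = 2.167       # CPU clock cycle from the example
REFS_PER_CYCLE = 32 * 6    # 32 CPUs, each with 4 loads + 2 stores per clock
BANKS = 1024               # banks actually present in the Cray T932


def banks_needed(bank_cycle_ns):
    """Banks required for full bandwidth at a given SRAM cycle time."""
    busy_cycles = math.ceil(bank_cycle_ns / CPU_CYCLE_NS)
    return REFS_PER_CYCLE * busy_cycles


# Longest bank busy time, in whole CPU clocks, that 1024 banks can cover.
max_busy_cycles = BANKS // REFS_PER_CYCLE

print(banks_needed(15.0))  # 1344, more than the 1024 banks available
print(banks_needed(7.5))   # 768, within the 1024 banks available
print(max_busy_cycles)     # 5
```

With 1024 banks, full bandwidth requires each bank to be busy for at most 5 CPU clock cycles, a bound that the original 15 ns parts (7 cycles) miss and that any cycle time of 7.5 ns or less (at most 4 cycles) meets.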
