20.11.2014 Views

PPKE ITK PhD and MPhil Thesis Classes


2.1 Introduction

Figure 2.5: Instruction histogram in case of one and multiple SPEs
[Bar chart. Vertical axis: clock cycles. Horizontal axis: SPE 1/1, SPE 2/2, SPE 4/2, SPE 4/4, SPE 8/2, SPE 8/4, SPE 8/6, SPE 8/8. Legend: Single cycle, Nop cycle, Stall due to prefetch miss, Stall due to waiting for hint target, Dual cycle, Stall due to branch miss, Stall due to dependency, Channel stall cycle.]

communication of the state values is required between the adjacent SPEs when
the first or last line of the stripe is computed. Because the state values are
arranged row-wise, this communication between adjacent SPEs can be carried
out by a single DMA operation. Additionally, the ring structure of the EIB is
well suited for communication between neighboring SPEs.
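
The stripe decomposition and the boundary-row exchange described above can be illustrated with a minimal Python sketch. It is not the code used in this work: the helper names `stripe_bounds` and `halo_transfers` are hypothetical, the 256-row array and the 4 workers are illustrative sizes, and it is assumed that each stripe needs exactly one neighboring row on each side (a radius-1 neighborhood). The point it shows is that, with row-major storage, a whole row is one contiguous range of elements, so each boundary row corresponds to a single block transfer.

```python
def stripe_bounds(rows, n_workers):
    """Split `rows` rows into n_workers contiguous stripes.

    Returns a list of (start, end) row indices, end exclusive.
    """
    base, extra = divmod(rows, n_workers)
    bounds = []
    start = 0
    for i in range(n_workers):
        # The first `extra` stripes get one additional row.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def halo_transfers(bounds, worker, cols):
    """Return (offset, length) element ranges that `worker` needs from its neighbors.

    Assumes row-major storage and a radius-1 neighborhood, so each
    neighboring row is one contiguous block of `cols` elements.
    """
    start, end = bounds[worker]
    transfers = []
    if worker > 0:
        # Row just above the stripe, owned by the previous worker.
        transfers.append(((start - 1) * cols, cols))
    if worker < len(bounds) - 1:
        # Row just below the stripe, owned by the next worker.
        transfers.append((end * cols, cols))
    return transfers


bounds = stripe_bounds(256, 4)
print(bounds)                          # four stripes of 64 rows each
print(halo_transfers(bounds, 1, 256))  # one contiguous block per neighbor
```

In this sketch the first and last workers have only one neighbor and therefore only one transfer; how the array edges are actually treated depends on the boundary condition, which is left open here.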

To measure the performance of the optimized program, 16 iterations were
computed on a 256×256 cell array. The number of required clock cycles is
summarized in Figure 2.5. Using only one SPE, the computation is carried
out in 3.3 million clock cycles, or 1.04 ms assuming a 3.2 GHz clock frequency. If
2 SPEs are used to perform the computation, the cycle count is reduced by about
half, so nearly linear speedup is achieved. However, with 4 or 8
SPEs the performance does not improve further. When 4 SPEs are used, SPE number
2 requires more than 5 million clock cycles to compute its stripe. This is larger
than the cycle count in the single-SPE case, so the performance is degraded.
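
The relation between cycle counts, execution time and speedup used in this paragraph can be checked with a short Python sketch. The function names are hypothetical, and only figures quoted in the text are used: 3.3 million cycles for one SPE, a 3.2 GHz clock, and 5 million cycles as a lower bound for the slowest SPE in the 4-SPE case. The sketch assumes that the parallel run time is determined by the slowest SPE.

```python
CLOCK_HZ = 3.2e9  # clock frequency assumed in the text


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a clock-cycle count to milliseconds."""
    return cycles / clock_hz * 1e3


def speedup(single_cycles, slowest_parallel_cycles):
    """Speedup relative to one SPE; values below 1 indicate a slowdown."""
    return single_cycles / slowest_parallel_cycles


single_spe = 3.3e6        # rounded cycle count quoted for one SPE
slowest_of_four = 5.0e6   # lower bound quoted for SPE 2 in the 4-SPE case

print(round(cycles_to_ms(single_spe), 2))
print(round(speedup(single_spe, slowest_of_four), 2))
```

With the rounded value of 3.3 million cycles the conversion gives about 1.03 ms; the 1.04 ms quoted above presumably reflects the unrounded cycle count. Since the slowest SPE needs more than 5 million cycles, the speedup in the 4-SPE case is at most about 0.66, that is, a slowdown.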

The examination of the utilization of the SPEs shows that SPE 1 and SPE
2 are stalled most of the time and wait for the completion of the memory
