
Lecture Notes in Computer Science 4917


Drug Design Issues on the Cell BE

Fig. 2. Eklundh matrix transposition idea for a 4 × 4 matrix

one SPE, no synchronization is needed to DMA transfer (swap) the submatrices to main memory after they are transposed in the SPE. Indeed, double buffering can be implemented by overlapping the DMA transfers (put or get) of one submatrix with the transposition of the other submatrix. In this case, the blocksize B can be up to 64, since one SPE has to keep two B × B submatrices in its LS.
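The overlap of transfer and transposition described above can be sketched as follows. This is a minimal illustration in Python, not Cell SDK code: the single-worker thread pool stands in for the asynchronous DMA engine, and `fetch`, `store`, and `process_blocks_double_buffered` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def transpose(block):
    """Return the transpose of a square block given as a list of rows."""
    return [list(row) for row in zip(*block)]


def process_blocks_double_buffered(fetch, store, num_blocks):
    """Transpose num_blocks blocks, overlapping each fetch with computation.

    fetch(i) and store(i, block) are hypothetical stand-ins for DMA get/put.
    """
    if num_blocks == 0:
        return
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, 0)  # start the transfer of the first block
        for i in range(num_blocks):
            current = pending.result()  # wait until block i has arrived
            if i + 1 < num_blocks:
                # Start fetching block i+1 into the "other" buffer ...
                pending = dma.submit(fetch, i + 1)
            # ... while block i is transposed and written back.
            store(i, transpose(current))


# Tiny demonstration with two 2 x 2 blocks held in a list ("main memory").
main_memory = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
result = {}
process_blocks_double_buffered(
    lambda i: main_memory[i], result.__setitem__, len(main_memory)
)
```

Only two blocks are alive at any time (the one being transposed and the one in flight), which mirrors the requirement that two B × B submatrices fit in the LS.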

When two SPEs are used, they have to be synchronized to DMA transfer (swap) the transposed submatrices to main memory. The synchronization is done by using the atomic read intrinsic of the IBM SDK. In this case, the blocksize B can be up to 128, since each SPE has to keep only one B × B submatrix in its LS.
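The two blocksize limits can be checked with a short calculation. The sketch below assumes 8-byte elements (single-precision complex values), which is an assumption not stated in this passage, and uses the 256 KB local store of an SPE; the function name is hypothetical. Since code and stack share the LS with the data, the buffers must occupy strictly less than the whole LS.

```python
LS_BYTES = 256 * 1024   # local store size of one SPE
BYTES_PER_ELEMENT = 8   # assumption: single-precision complex (2 x 4-byte floats)


def largest_pow2_blocksize(num_buffers):
    """Largest power-of-two B such that num_buffers B x B submatrices
    occupy strictly less than the local store."""
    b = 1
    while num_buffers * (2 * b) ** 2 * BYTES_PER_ELEMENT < LS_BYTES:
        b *= 2
    return b


one_spe = largest_pow2_blocksize(2)    # one SPE holding two submatrices -> 64
two_spes = largest_pow2_blocksize(1)   # each SPE holding one submatrix  -> 128
```

Under these assumptions, two 64 × 64 submatrices take 64 KB, while two 128 × 128 submatrices would fill the entire 256 KB; a single 128 × 128 submatrix takes 128 KB.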

In any case, for 128 × 128 planes, one SPE can locally transpose a complete plane, since a plane fits in the LS of the SPE. Moreover, in this case, Steps 1, 2, and 3, and likewise Steps 4 and 5, of the 3D FFT can be performed together in the same SPE before transferring the data back to main memory.

Summarizing, our 3D FFT may follow different execution paths depending on the plane transposition strategy (one or two SPEs swapping submatrices) and the blocksize parameter B of the algorithm:

1. Case: the B × B submatrix is not the complete plane and/or does not fit in the LS of an SPE. Plane transposition is done by blocking. We evaluate two different strategies:

(a) One SPE transposes two B × B submatrices in its LS and DMA transfers (swaps) them to main memory. The blocksize B can be up to 64.

(b) Two SPEs transpose two B × B submatrices and synchronize with each other to DMA transfer them to main memory. The blocksize B can be up to 128.

2. Case: the planes are 128 × 128 and the blocksize B is 128. One SPE can locally transpose a complete plane. Steps 1, 2, and 3 of the 3D FFT are done together in one SPE. Steps 4 and 5 are also done together.
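The blocking strategy of case 1 can be illustrated with a simple sequential sketch: each pair of mirrored B × B submatrices is transposed and then swapped, and diagonal submatrices are transposed in place. This is plain Python on nested lists, not the authors' SPE code; the helper names are hypothetical, and the slicing stands in for the DMA get and put transfers.

```python
def transpose_block(block):
    """Transpose one square block given as a list of rows."""
    return [list(row) for row in zip(*block)]


def get_block(m, bi, bj, b):
    """Copy the b x b block at block coordinates (bi, bj) out of matrix m."""
    return [m[bi * b + r][bj * b:(bj + 1) * b] for r in range(b)]


def put_block(m, bi, bj, b, block):
    """Write a b x b block back into matrix m at block coordinates (bi, bj)."""
    for r in range(b):
        m[bi * b + r][bj * b:(bj + 1) * b] = block[r]


def blocked_transpose(m, b):
    """Transpose the square matrix m in place using b x b blocks."""
    n = len(m)
    if n % b != 0:
        raise ValueError("matrix size must be a multiple of the blocksize")
    nb = n // b
    for bi in range(nb):
        for bj in range(bi, nb):
            upper = transpose_block(get_block(m, bi, bj, b))
            if bi == bj:
                put_block(m, bi, bi, b, upper)   # diagonal block: no swap
            else:
                lower = transpose_block(get_block(m, bj, bi, b))
                put_block(m, bi, bj, b, lower)   # swap the two transposed
                put_block(m, bj, bi, b, upper)   # blocks across the diagonal
    return m


plane = [[4 * r + c for c in range(4)] for r in range(4)]
transposed = blocked_transpose([row[:] for row in plane], 2)
```

In strategy (a) one SPE would hold both `upper` and `lower`; in strategy (b) each of two SPEs would hold one of them and the write-back would need synchronization. When the blocksize equals the plane size, the loop degenerates to a single local transposition, as in case 2.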

Finally, Steps 6 and 7 of the 3D FFT are optional in the context of FTDock because the complex multiplication ((F_A^*)(F_B)) can be done using the element orientation obtained from the matrix transposition of Step 4 of the 3D FFT. Therefore, we do not have to perform Steps 6 and 7 for each 3D FFT in our FTDock
