17.01.2013 Views

Algorithms and Data Structures for External Memory


7.2 Matrix Transposition

When the matrices are stored using a sparse representation, transposition is always as hard as sorting, unlike the B² ≤ M case for dense matrix transposition (cf. Theorem 7.2).

Theorem 7.3 ([23]). For a matrix stored in sparse format and containing N_z nonzero elements, the number of I/Os required to transpose the matrix from row-major order to column-major order, and vice versa, is Θ(Sort(N_z)).

Sorting suffices to perform the transposition. The lower bound follows by reduction from sorting: If the ith item to sort has key value x ≠ 0, there is a nonzero element in matrix position (i, x).
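Both directions of this argument can be illustrated with a minimal in-memory sketch. It assumes a coordinate-list representation of (row, column, value) triples, and Python's built-in `sorted` stands in for an external-memory sort; the function names `transpose_sparse` and `sort_via_transpose` are hypothetical and chosen only for illustration.

```python
def transpose_sparse(entries):
    """Reorder (row, col, value) triples from row-major to column-major order.

    Assumption: the sparse matrix is a list of coordinate triples.
    The in-memory sort here stands in for an external sort of N_z items.
    """
    return sorted(entries, key=lambda e: (e[1], e[0]))


def sort_via_transpose(keys):
    """Sketch of the reduction: sort nonzero keys using only a transposition.

    The ith key x becomes a nonzero element at position (i, x). The triples
    are already in row-major order because i is increasing. Reading the
    column indices after transposition yields the keys in sorted order.
    """
    entries = [(i, x, x) for i, x in enumerate(keys)]
    return [col for (_row, col, _val) in transpose_sparse(entries)]


row_major = [(0, 2, 5.0), (1, 0, 3.0), (1, 2, 7.0), (2, 1, 4.0)]
col_major = transpose_sparse(row_major)
# col_major == [(1, 0, 3.0), (2, 1, 4.0), (0, 2, 5.0), (1, 2, 7.0)]

sorted_keys = sort_via_transpose([7, 2, 9, 4])
# sorted_keys == [2, 4, 7, 9]
```

The second function is circular as runnable code, since the transposition itself is implemented by sorting; its only purpose is to show why any transposition routine for sparse matrices must do as much work as a sort.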

Matrix transposition is a special case of a more general class of permutations called bit-permute/complement (BPC) permutations, which in turn is a subset of the class of bit-matrix-multiply/complement (BMMC) permutations. BMMC permutations are defined by a log N × log N nonsingular 0–1 matrix A and a (log N)-length 0–1 vector c. An item with binary address x is mapped by the permutation to the binary address given by Ax ⊕ c, where ⊕ denotes bitwise exclusive-or. BPC permutations are the special case of BMMC permutations in which A is a permutation matrix, that is, each row and each column of A contain a single 1. BPC permutations include matrix transposition, bit-reversal permutations (which arise in the FFT), vector-reversal permutations, hypercube permutations, and matrix reblocking. Cormen et al. [120] characterize the optimal number of I/Os needed to perform any given BMMC permutation solely as a function of the associated matrix A, and they give an optimal algorithm for implementing it.
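The address mapping x ↦ Ax ⊕ c can be made concrete with a small sketch. It assumes that bit 0 of an address is the least significant bit and corresponds to row and column 0 of A, with arithmetic over GF(2); the helper names `bmmc_target` and `bit_reversal_matrix` are hypothetical. The sketch computes target addresses only and says nothing about how to perform the permutation I/O-efficiently.

```python
def bmmc_target(x, A, c, nbits):
    """Return the address y = A x XOR c, computed over GF(2).

    Assumption: bit j of the integer x is the jth component of the
    address vector (bit 0 least significant), and A[i][j] is the
    coefficient of x_j in the output bit y_i.
    """
    y = 0
    for i in range(nbits):
        bit = c[i]
        for j in range(nbits):
            bit ^= A[i][j] & ((x >> j) & 1)
        y |= bit << i
    return y


def bit_reversal_matrix(nbits):
    """Permutation matrix with ones on the antidiagonal (a BPC permutation)."""
    return [[1 if j == nbits - 1 - i else 0 for j in range(nbits)]
            for i in range(nbits)]


nbits = 3
identity = [[1 if i == j else 0 for j in range(nbits)] for i in range(nbits)]

# Bit reversal: A is the antidiagonal permutation matrix, c is zero.
bit_reversal = [bmmc_target(x, bit_reversal_matrix(nbits), [0] * nbits, nbits)
                for x in range(1 << nbits)]
# bit_reversal == [0, 4, 2, 6, 1, 5, 3, 7]

# Vector reversal: A is the identity, c complements every bit.
vector_reversal = [bmmc_target(x, identity, [1] * nbits, nbits)
                   for x in range(1 << nbits)]
# vector_reversal == [7, 6, 5, 4, 3, 2, 1, 0]
```

Because A is nonsingular, the mapping is a bijection on the 2^nbits addresses, which is what makes it a permutation.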

Theorem 7.4 ([120]). With D disks, the number of I/Os required to perform the BMMC permutation defined by matrix A and vector c is

$$\Theta\left(\frac{n}{D}\left(1 + \frac{\mathrm{rank}(\gamma)}{\log m}\right)\right), \tag{7.2}$$

where γ is the lower-left log n × log B submatrix of A.
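To see how the quantity inside the Θ in (7.2) depends on A, the following sketch extracts γ and computes its rank over GF(2). Several things here are assumptions rather than statements from the text: n = N/B and m = M/B denote the problem size and memory size in blocks, all sizes are powers of 2, row and column 0 of A correspond to the least significant address bit (so that "lower-left" means high-order output bits against low-order input bits), and the function names are hypothetical. The returned value is the expression inside the Θ, not an exact I/O count.

```python
import math


def gf2_rank(matrix):
    """Rank of a 0/1 matrix over GF(2), by XOR-based Gaussian elimination."""
    # Pack each row into an integer bitmask.
    masks = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    rank = 0
    while masks:
        pivot = masks.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        masks = [m ^ pivot if m & low else m for m in masks]
    return rank


def lower_left(A, log_n, log_B):
    """Assumed convention: rows log_B .. log_B+log_n-1, columns 0 .. log_B-1."""
    return [row[:log_B] for row in A[log_B:log_B + log_n]]


def bmmc_io_term(A, N, M, B, D):
    """Evaluate (n/D) * (1 + rank(gamma) / log m) with n = N/B, m = M/B."""
    log_B = B.bit_length() - 1
    log_n = (N // B).bit_length() - 1
    gamma = lower_left(A, log_n, log_B)
    return (N // B) / D * (1 + gf2_rank(gamma) / math.log2(M // B))


size = 4  # log N = 4, so N = 16 addresses
identity = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
bit_rev = [[1 if j == size - 1 - i else 0 for j in range(size)]
           for i in range(size)]

term_identity = bmmc_io_term(identity, N=16, M=8, B=4, D=1)  # rank(gamma) = 0
term_bit_rev = bmmc_io_term(bit_rev, N=16, M=8, B=4, D=1)    # rank(gamma) = 2
# term_identity == 4.0 and term_bit_rev == 12.0
```

Under these assumptions, a permutation that keeps every item within its block position (γ = 0) costs on the order of a single pass over the data, while bit reversal, whose γ has full rank, incurs the additional rank(γ)/log m factor.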
