14.07.2014 Views

Scheduling Linear Algebra Operations on Multicore ... - Netlib

Scheduling Linear Algebra Operations on Multicore ... - Netlib

Scheduling Linear Algebra Operations on Multicore ... - Netlib

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

33, 34]. Gunter et al. presented an ”out-of-core”<br />

(out-of-memory) implementati<strong>on</strong> [35], Buttari et al.<br />

an implementati<strong>on</strong> for ”standard” (x86 and alike)<br />

multicore processors [27, 28], and Kurzak et al.<br />

an implementati<strong>on</strong> for the CELL processor [22].<br />

The LU algorithm used here was originally devised<br />

by Quintana-Ortí et al. for ”out-of-core”<br />

(out-of-memory) executi<strong>on</strong> [45].<br />

Seminal work <strong>on</strong> performance-oriented data layouts<br />

for dense linear algebra was d<strong>on</strong>e by Gustavs<strong>on</strong><br />

et al. [36, 37] and Elmroth et al. [38] and was also<br />

investigated by Park et al. [39, 40].<br />

of Cilk implementati<strong>on</strong>s, which favor the most aggressive<br />

right-looking variant.<br />

The tile Cholesky algorithm is identical to the<br />

block Cholesky algorithm implemented in LAPACK,<br />

except for processing the matrix by tiles. Otherwise,<br />

the exact same operati<strong>on</strong>s are applied. The algorithm<br />

relies <strong>on</strong> four basic operati<strong>on</strong>s implemented by four<br />

computati<strong>on</strong>al kernels (Figure 1).<br />

DSYRK<br />

DSYRK<br />

DPOTRF<br />

A T T<br />

4 Cholesky Factorizati<strong>on</strong><br />

The Cholesky factorizati<strong>on</strong> (or Cholesky decompositi<strong>on</strong>)<br />

is mainly used for the numerical soluti<strong>on</strong> of<br />

linear equati<strong>on</strong>s Ax = b, where A is symmetric and<br />

positive definite. Such systems arise often in physics<br />

applicati<strong>on</strong>s, where A is positive definite due to the<br />

nature of the modeled physical phenomen<strong>on</strong>. This<br />

happens frequently in numerical soluti<strong>on</strong>s of partial<br />

differential equati<strong>on</strong>s.<br />

The Cholesky factorizati<strong>on</strong> of an n × n real symmetric<br />

positive definite matrix A has the form<br />

DGEMM<br />

DGEMM<br />

DGEMM DTRSM<br />

A<br />

T<br />

B C C<br />

DGEMM DTRSM<br />

A = LL T ,<br />

where L is an n × n real lower triangular matrix<br />

with positive diag<strong>on</strong>al elements. In LAPACK the<br />

double precisi<strong>on</strong> algorithm is implemented by the<br />

DPOTRF routine. A single step of the algorithm is<br />

implemented by a sequence of calls to the LAPACK<br />

and BLAS routines: DSYRK, DPOTF2, DGEMM,<br />

DTRSM. Due to the symmetry, the matrix can be<br />

factorized either as upper triangular matrix or as<br />

lower triangular matrix. Here the lower triangular<br />

case is c<strong>on</strong>sidered.<br />

The algorithm can be expressed using either the<br />

top-looking versi<strong>on</strong>, the left-looking versi<strong>on</strong> of the<br />

right-looking versi<strong>on</strong>, the first being the most lazy algorithm<br />

(depth-first explorati<strong>on</strong> of the task graph)<br />

and the last being the most aggressive algorithm<br />

(breadth-first explorati<strong>on</strong> of the task graph). The<br />

left-looking variant is used here, with the excepti<strong>on</strong><br />

Figure 1: Tile operati<strong>on</strong>s in the tile Cholesky factorizati<strong>on</strong>.<br />

DSYRK: The kernel applies updates to a diag<strong>on</strong>al<br />

(lower triangular) tile T of the input matrix, resulting<br />

from factorizati<strong>on</strong> of the tiles A to the<br />

left of it. The operati<strong>on</strong> is a symmetric rank-k<br />

update.<br />

DPOTRF: The kernel performance the Cholesky<br />

factorizati<strong>on</strong> of a diag<strong>on</strong>al (lower triangular) tile<br />

T of the input matrix and overrides it with the<br />

final elements of the output matrix.<br />

DGEMM: The operati<strong>on</strong> applies updates to an<br />

off-diag<strong>on</strong>al tile C of the input matrix, resulting<br />

from factorizati<strong>on</strong> of the tiles to the left of<br />

it. The operati<strong>on</strong> is a matrix multiplicati<strong>on</strong>.<br />

5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!