Scheduling Linear Algebra Operations on Multicore ... - Netlib
Scheduling Linear Algebra Operations on Multicore ... - Netlib
Scheduling Linear Algebra Operations on Multicore ... - Netlib
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
33, 34]. Gunter et al. presented an ”out-of-core”<br />
(out-of-memory) implementati<strong>on</strong> [35], Buttari et al.<br />
an implementati<strong>on</strong> for ”standard” (x86 and alike)<br />
multicore processors [27, 28], and Kurzak et al.<br />
an implementati<strong>on</strong> for the CELL processor [22].<br />
The LU algorithm used here was originally devised<br />
by Quintana-Ortí et al. for ”out-of-core”<br />
(out-of-memory) executi<strong>on</strong> [45].<br />
Seminal work <strong>on</strong> performance-oriented data layouts<br />
for dense linear algebra was d<strong>on</strong>e by Gustavs<strong>on</strong><br />
et al. [36, 37] and Elmroth et al. [38] and was also<br />
investigated by Park et al. [39, 40].<br />
of Cilk implementati<strong>on</strong>s, which favor the most aggressive<br />
right-looking variant.<br />
The tile Cholesky algorithm is identical to the<br />
block Cholesky algorithm implemented in LAPACK,<br />
except for processing the matrix by tiles. Otherwise,<br />
the exact same operati<strong>on</strong>s are applied. The algorithm<br />
relies <strong>on</strong> four basic operati<strong>on</strong>s implemented by four<br />
computati<strong>on</strong>al kernels (Figure 1).<br />
DSYRK<br />
DSYRK<br />
DPOTRF<br />
A T T<br />
4 Cholesky Factorizati<strong>on</strong><br />
The Cholesky factorizati<strong>on</strong> (or Cholesky decompositi<strong>on</strong>)<br />
is mainly used for the numerical soluti<strong>on</strong> of<br />
linear equati<strong>on</strong>s Ax = b, where A is symmetric and<br />
positive definite. Such systems arise often in physics<br />
applicati<strong>on</strong>s, where A is positive definite due to the<br />
nature of the modeled physical phenomen<strong>on</strong>. This<br />
happens frequently in numerical soluti<strong>on</strong>s of partial<br />
differential equati<strong>on</strong>s.<br />
The Cholesky factorizati<strong>on</strong> of an n × n real symmetric<br />
positive definite matrix A has the form<br />
DGEMM<br />
DGEMM<br />
DGEMM DTRSM<br />
A<br />
T<br />
B C C<br />
DGEMM DTRSM<br />
A = LL T ,<br />
where L is an n × n real lower triangular matrix<br />
with positive diag<strong>on</strong>al elements. In LAPACK the<br />
double precisi<strong>on</strong> algorithm is implemented by the<br />
DPOTRF routine. A single step of the algorithm is<br />
implemented by a sequence of calls to the LAPACK<br />
and BLAS routines: DSYRK, DPOTF2, DGEMM,<br />
DTRSM. Due to the symmetry, the matrix can be<br />
factorized either as upper triangular matrix or as<br />
lower triangular matrix. Here the lower triangular<br />
case is c<strong>on</strong>sidered.<br />
The algorithm can be expressed using either the<br />
top-looking versi<strong>on</strong>, the left-looking versi<strong>on</strong> of the<br />
right-looking versi<strong>on</strong>, the first being the most lazy algorithm<br />
(depth-first explorati<strong>on</strong> of the task graph)<br />
and the last being the most aggressive algorithm<br />
(breadth-first explorati<strong>on</strong> of the task graph). The<br />
left-looking variant is used here, with the excepti<strong>on</strong><br />
Figure 1: Tile operati<strong>on</strong>s in the tile Cholesky factorizati<strong>on</strong>.<br />
DSYRK: The kernel applies updates to a diag<strong>on</strong>al<br />
(lower triangular) tile T of the input matrix, resulting<br />
from factorizati<strong>on</strong> of the tiles A to the<br />
left of it. The operati<strong>on</strong> is a symmetric rank-k<br />
update.<br />
DPOTRF: The kernel performance the Cholesky<br />
factorizati<strong>on</strong> of a diag<strong>on</strong>al (lower triangular) tile<br />
T of the input matrix and overrides it with the<br />
final elements of the output matrix.<br />
DGEMM: The operati<strong>on</strong> applies updates to an<br />
off-diag<strong>on</strong>al tile C of the input matrix, resulting<br />
from factorizati<strong>on</strong> of the tiles to the left of<br />
it. The operati<strong>on</strong> is a matrix multiplicati<strong>on</strong>.<br />
5