APCOM'07 in conjunction with EPMESC XI, December 3-6, 2007, Kyoto, JAPAN

Efficient Sequential and Parallel Solvers for hp Finite Element Method

Maciej Paszyński 1*, David Pardo 2, Carlos Torres-Verdin 2, Paweł Matuszyk 3
1 Department of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, Kraków, 30-059, Poland
2 Department of Petroleum and Geosystems Engineering, The University of Texas, 1 University Station C0300, Austin, Texas, 78712, USA
3 Department of Applied Computer Science and Modeling, AGH University of Science and Technology, Al. Mickiewicza 30, Kraków, 30-059, Poland
e-mail: paszynsk@agh.edu.pl, dzubiaur@gmail.com, cverdin@uts.cc.utexas.edu, pjm@agh.edu.pl
Abstract We present a sequential and parallel direct solver designed for the hp Finite Element Method (FEM), applied to numerous problems including the non-stationary heat transfer problem, the Stokes problem, and resistivity logging measurement simulations. The hp FEM incorporates a self-adaptive strategy that generates a sequence of hp-refined meshes, delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes generated by the self-adaptive strategy are obtained by multiple h or p refinements of the initial mesh. The self-adaptive mesh generated in this way is stored as refinement trees growing down from nodes of the initial mesh. First, we eliminate degrees of freedom starting from the leaves of the refinement trees, and then we eliminate common degrees of freedom traveling up the refinement trees. The solver is parallelized by utilizing the domain decomposition paradigm: it generates Schur complements of local sub-systems, from the bottom of the refinement trees, through initial mesh elements and sub-domains. The global problem then reduces to one relatively small common "interface" problem, and finally backward substitution is executed to propagate the solution from the common interface, through sub-domains and initial mesh elements, down to the leaves of the refinement trees. The LU factorizations computed at different levels of the elimination trees are stored at tree nodes, to be reutilized by the solver after the computational mesh is locally refined. We also present performance measurements of the solver.
Key words: Finite Element Method, hp adaptivity, direct solver, parallel direct solver
INTRODUCTION
We present the data structures and efficient direct solvers for computational meshes utilized by fully automatic hp-adaptive 2D and 3D Finite Element Method (FEM) codes [1,2,3,4]. The codes generate a sequence of hp meshes delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes consist of finite elements with varying sizes and polynomial orders of approximation, changing locally on finite element faces, edges, and interiors. The final optimal mesh is constructed by a sequence of h or p refinements executed on the initial mesh. An h refinement consists of breaking some finite elements into smaller son elements, whilst a p refinement consists of adjusting the polynomial orders of approximation on some element faces, edges, and interiors. In the utilized data structure, the h refinements are stored as trees growing from initial mesh elements. This allows us to propose an efficient direct solver working on the level of the refinement trees. The degrees of freedom are eliminated by traveling the refinement trees, from leaf nodes to the level of initial mesh element nodes. The local Schur complements associated with particular elimination levels can be stored in the tree nodes. Each time the mesh is locally refined, only the local Schur complements associated with newly refined nodes must be updated. The Schur complements associated with unrefined nodes can still be utilized when the solver executes over the newly refined mesh. The idea of the recursive solver working on the refinement trees is generalized into an elimination tree constructed out of initial mesh elements, as well as into an elimination tree built from the sub-domains obtained by domain decomposition of the entire mesh. The solvers were tested on a sequence of problems, including the non-stationary heat transfer problem, the Stokes problem [5], and the resistivity logging measurements simulation [6].
SEQUENTIAL AND PARALLEL DIRECT SOLVERS
In this section we present a classification of state-of-the-art sequential and parallel direct solvers dedicated to FEM computations.
1. Frontal solvers The solver browses finite elements in the order prescribed by the user and aggregates their degrees of freedom into the so-called frontal matrix. Based on the element connectivity information, it recognizes fully assembled degrees of freedom and eliminates them from the frontal matrix [7]. This is done to keep the size of the frontal matrix as small as possible. The key to efficient operation of the frontal solver is an optimal ordering of the finite elements.
2. Multifrontal solvers The solver constructs a degrees-of-freedom connectivity tree based on an analysis of the geometry of the computational domain [7]. This is usually done by utilizing a graph representation of the computational domain and a graph partitioning algorithm. The frontal elimination pattern is utilized on every tree branch. Finite elements are joined into pairs, and their degrees of freedom are assembled into the frontal matrix associated with the branch. The process is repeated until the root of the assembly tree is reached. Finally, the common dense problem is solved, and partial backward substitutions are recursively executed over the assembly tree.
3. Sub-structuring method solver This is a parallel solver working over a computational domain partitioned into multiple sub-domains. It works in the following steps [8]. First, the sub-domains' internal degrees of freedom are eliminated with respect to the interface degrees of freedom. Second, the interface problem is solved. Finally, the internal problems are solved by executing backward substitution on each sub-domain, utilizing the interface problem solution computed in the second step.
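The three steps can be exercised on a minimal model problem: 1D Poisson with a single interface node shared by two sub-domains. The Python sketch below is our own illustration (the toy matrices and function names are not from the paper's implementation):

```python
def condense(Aii, Aib, Abi, Abb, fi, fb):
    """Step 1: eliminate the sub-domain interior dof with respect to the
    interface dof; returns the Schur complement and the condensed load."""
    return Abb - Abi * Aib / Aii, fb - Abi * fi / Aii

def back_substitute(Aii, Aib, fi, ub):
    """Step 3: recover the interior dof from the interface solution."""
    return (fi - Aib * ub) / Aii

# Model: -u'' = 1 on (0, 4), u(0) = u(4) = 0, four unit finite elements.
# Unknowns sit at nodes 1, 2, 3; node 2 is the interface between the
# left sub-domain (interior node 1) and the right one (interior node 3).
sub = dict(Aii=2.0, Aib=-1.0, Abi=-1.0, Abb=1.0, fi=1.0, fb=0.5)

S1, g1 = condense(**sub)            # left sub-domain contribution
S2, g2 = condense(**sub)            # right sub-domain contribution
ub = (g1 + g2) / (S1 + S2)          # Step 2: assembled interface problem
u1 = back_substitute(sub["Aii"], sub["Aib"], sub["fi"], ub)
u3 = back_substitute(sub["Aii"], sub["Aib"], sub["fi"], ub)
print(u1, ub, u3)                   # 1.5 2.0 1.5
```

With many processors, Step 1 runs concurrently on every sub-domain, and only the small condensed pairs (S, g) need to be communicated.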
4. Multiple fronts solver This is the simplest implementation of the sub-structuring method solver [9]. It performs a partial frontal decomposition on each sub-domain, then sums up the contributions from particular sub-domains into one common interface problem. Finally, it solves the common interface problem by utilizing a sequential frontal solver.
5. Direct sub-structuring method solver In this version of the sub-structuring method solver, the interface problem is solved by utilizing a parallel solver [8].
6. Sparse direct method solver This is a parallel implementation of the multifrontal solver. An example of the sparse direct method solver is the MUltifrontal Massively Parallel Solver (MUMPS) [10-12].
DATA STRUCTURE SUPPORTING HP REFINEMENTS
This section introduces the data structure storing the history of mesh transformations, which can be further utilized by the direct solver. We propose to store two levels of connectivity trees:
• The initial mesh elements connectivity tree
• The h refinements connectivity trees, which grow down from the level of initial mesh elements.
For the parallel implementation of the algorithm we propose three levels of connectivity trees:
• The connectivity tree for sub-domains, built out of the computational mesh distributed into sub-domains
• The connectivity trees for initial mesh elements, built separately on every sub-domain
• The h refinements connectivity trees, which grow down from every refined initial mesh element.
The three connectivity trees related to the computational mesh presented in Fig. 1 are presented in Fig. 2. The partial LU factorizations performed by the solver will be stored at tree nodes for further reutilization.
Fig. 1 Exemplary computational domain with 4 initial mesh elements partitioned into 2 sub-domains. Each initial mesh element is broken into 4 son elements.
Fig. 2 Connectivity trees for the computational mesh presented in Fig. 1
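A minimal sketch of such a multi-level connectivity tree in Python (the class and field names are our own illustration, not the paper's Fortran 90 data structure):

```python
class TreeNode:
    """One node of a connectivity tree: a sub-domain, an initial mesh
    element, or an h-refinement son element."""
    def __init__(self, level, label, sons=None):
        self.level = level        # "subdomain" | "initial" | "refinement"
        self.label = label
        self.sons = sons or []
        self.lu = None            # cached partial LU / Schur complement

def build_tree():
    """Mesh of Fig. 1: 2 sub-domains, 2 initial elements each, every
    initial element broken into 4 son elements."""
    subdomains = []
    for s in range(2):
        initial = []
        for e in range(2):
            sons = [TreeNode("refinement", (s, e, k)) for k in range(4)]
            initial.append(TreeNode("initial", (s, e), sons))
        subdomains.append(TreeNode("subdomain", s, initial))
    return TreeNode("mesh", "root", subdomains)

def leaves(node):
    """Leaf elements are where the elimination starts."""
    if not node.sons:
        return [node]
    return [leaf for son in node.sons for leaf in leaves(son)]

print(len(leaves(build_tree())))  # 16 active son elements
```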
DIRECT SOLVER FOR hp FINITE ELEMENT METHOD
In this section we describe the proposed new direct solver dedicated to the fully automatic hp Finite Element Method [1-4]. The software incorporates a self-adaptive strategy that generates a sequence of hp-refined meshes, delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes generated by the self-adaptive strategy are obtained by multiple h or p refinements of the initial mesh. An h refinement consists of breaking some finite elements into 2 (in the horizontal or vertical direction) or 4 son elements, while a p refinement consists of increasing the polynomial order of approximation on some finite element edges, faces, and interiors.
The self-adaptive mesh generated in this way is stored as refinement trees growing down from the initial mesh; thus we utilize a tree-like structure for the computational mesh. First, we eliminate degrees of freedom starting from the leaves of the refinement trees, and then we eliminate common degrees of freedom traveling up the refinement trees. In other words, we compute a sequence of Schur complements, starting from the bottom level and traveling up the structure of the refinement trees. Then, we utilize the nested dissection scheme to eliminate degrees of freedom on the level of initial mesh elements.
The parallel version of the solver utilizes the domain decomposition paradigm. The computational mesh is partitioned into multiple sub-domains, with each sub-domain assigned to a separate processor. The solver generates Schur complements of local sub-systems, from the bottom of the refinement trees, through initial mesh elements and sub-domains. The global problem then reduces to one relatively small common "interface" problem, and finally backward substitution is executed to propagate the solution from the common interface, through sub-domains and initial mesh elements, down to the leaves of the refinement trees.
The algorithm of the recursive solver can be summarized in the following pseudo-code:

matrix function recursive_solver(tree_node)
  if tree_node has no son nodes then
    eliminate leaf element stiffness matrix internal nodes
    return Schur complement sub-matrix
  else if tree_node has son nodes then
    do for each son
      son_matrix = recursive_solver(tree_node_son)
      merge son_matrix into new_matrix
    enddo
    decide which unknowns of new_matrix can be eliminated
    perform partial forward elimination on new_matrix
    return Schur complement sub-matrix
  endif
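The pseudo-code can be exercised end-to-end on a tiny 1D model problem. The sketch below is our own Python illustration of the same recursion (the paper's solver is written in Fortran 90): each call merges the sons' Schur complements, eliminates the fully assembled unknowns, and records the eliminated rows so that one final backward substitution recovers the solution. Which unknowns are eliminable at each node is supplied here through an explicit `keep` set.

```python
class Node:
    def __init__(self, sons=(), dofs=None, K=None, f=None, keep=()):
        self.sons = list(sons)
        self.dofs, self.K, self.f = dofs, K, f   # leaf element system
        self.keep = set(keep)                    # dofs visible outside

def merge(systems):
    """Assemble the sons' Schur complements into one new matrix/rhs."""
    dofs = sorted({d for ds, _, _ in systems for d in ds})
    idx = {d: i for i, d in enumerate(dofs)}
    A = [[0.0] * len(dofs) for _ in dofs]
    f = [0.0] * len(dofs)
    for ds, As, fs in systems:
        for i, di in enumerate(ds):
            f[idx[di]] += fs[i]
            for j, dj in enumerate(ds):
                A[idx[di]][idx[dj]] += As[i][j]
    return dofs, A, f

def eliminate(dofs, A, f, elim, steps):
    """Partial forward elimination; returns the Schur complement system
    and records each eliminated row for the backward substitution."""
    order = elim + [d for d in dofs if d not in elim]
    p = [dofs.index(d) for d in order]
    A = [[A[i][j] for j in p] for i in p]
    f = [f[i] for i in p]
    for k in range(len(elim)):
        steps.append((order[k], order[k+1:], A[k][k+1:], f[k], A[k][k]))
        for i in range(k + 1, len(order)):
            c = A[i][k] / A[k][k]
            for j in range(k + 1, len(order)):
                A[i][j] -= c * A[k][j]
            f[i] -= c * f[k]
    ne = len(elim)
    return order[ne:], [row[ne:] for row in A[ne:]], f[ne:]

def recursive_solver(node, steps):
    if node.sons:
        dofs, A, f = merge([recursive_solver(s, steps) for s in node.sons])
    else:
        dofs, A, f = node.dofs[:], [r[:] for r in node.K], node.f[:]
    elim = [d for d in dofs if d not in node.keep]
    return eliminate(dofs, A, f, elim, steps)

def backward(steps):
    u = {}
    for dof, others, row, rhs, piv in reversed(steps):
        u[dof] = (rhs - sum(c * u[d] for c, d in zip(row, others))) / piv
    return u

# -u'' = 1 on (0, 4), u(0) = u(4) = 0, four linear unit elements; dofs
# 1..3 remain after the Dirichlet rows are dropped. Linear elements have
# no interior dofs, so leaves eliminate nothing (keep = all their dofs);
# with p > 1 the leaf step would already condense out interior unknowns.
Ke, fe = [[1.0, -1.0], [-1.0, 1.0]], [0.5, 0.5]
e0 = Node(dofs=[1], K=[[1.0]], f=[0.5], keep=[1])
e1 = Node(dofs=[1, 2], K=Ke, f=fe, keep=[1, 2])
e2 = Node(dofs=[2, 3], K=Ke, f=fe, keep=[2, 3])
e3 = Node(dofs=[3], K=[[1.0]], f=[0.5], keep=[3])
left = Node(sons=[e0, e1], keep=[2])    # eliminates its interior dof 1
right = Node(sons=[e2, e3], keep=[2])   # eliminates its interior dof 3
root = Node(sons=[left, right])         # eliminates the last dof 2

steps = []
recursive_solver(root, steps)
print(backward(steps))  # {2: 2.0, 3: 1.5, 1: 1.5}
```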
The solver can be used to effectively solve problems with multiple right-hand sides, since for each new right-hand side only a new backward substitution must be executed. This is needed in the context of goal-oriented adaptivity, where the solution of the dual problem is required. The solver is written in Fortran 90. The parallelization of the solver consists of assigning tree branches to particular processors and sending the Schur complement contributions from one branch to another. We implemented the parallel version of the solver by utilizing the Message Passing Interface (MPI).
Fig. 3 Elimination patterns over the distributed connectivity tree
It should be emphasized that the communication cost of the solver is related to the size of the local systems of equations associated with common edges between adjacent elements. The interior and edge degrees of freedom that are eliminated on the current level of the connectivity tree are denoted in Fig. 3 by dashed lines, whilst the degrees of freedom that remain uneliminated are denoted by solid lines.
COMPUTATIONAL PROBLEMS
1. 3D DC resistivity logging measurements simulations in deviated wells The problem consists of solving the conductive media equation

∇·(σ∇u) = −∇·J^imp   (1)

in a 3D domain with different formation layers, presented in Fig. 4. A tool with one transmitter and two receiver electrodes is placed in the borehole and shifted along it. The reflected waves are recorded by the receiver electrodes in order to determine the location of the oil formation in the ground. Of particular interest to the oil industry are 3D simulations with deviated wells, where the angle between the borehole and the formation layers is sharp, θ ≠ 90°. This fully 3D problem can be reduced to 2D by considering the three non-orthogonal systems of coordinates presented in Fig. 4. The variational formulation in the new system of coordinates consists in finding u ∈ u_D + H¹(Ω) such that:

( σ̂ ∂u/∂ξ , ∂v/∂ξ )_{L²(Ω)} = ( v , f̂ )_{L²(Ω)}   ∀v ∈ H¹_D(Ω)   (2)
Fig. 4 Three non-orthogonal systems of coordinates in the borehole and formation layers
where the new electrical conductivity of the media is σ̂ := J⁻¹ σ J⁻ᵀ |J| and f̂ := f |J|, with f = ∇·J^imp the divergence of the impressed current, and

J = ∂(x₁, x₂, x₃) / ∂(ζ₁, ζ₂, ζ₃)   (3)

stands for the Jacobian matrix of the change of variables from the Cartesian reference system to the non-orthogonal systems of coordinates, and |J| = det(J) is its determinant.
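As a concrete illustration of definition (3) and of σ̂ = J⁻¹σJ⁻ᵀ|J|, consider a simple shear map x₁ = ζ₁, x₂ = ζ₂, x₃ = ζ₃ + cζ₁ (our own toy example, not the actual borehole coordinate systems of Fig. 4), for which J and σ̂ can be computed by hand and checked in Python:

```python
c, sigma0 = 0.5, 2.0          # shear slope and isotropic conductivity

# Jacobian of the map x1 = z1, x2 = z2, x3 = z3 + c*z1
J = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [c,   0.0, 1.0]]
Jinv = [[1.0, 0.0, 0.0],      # inverse of the unit lower-triangular shear
        [0.0, 1.0, 0.0],
        [-c,  0.0, 1.0]]
detJ = 1.0                    # the shear is volume-preserving

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

sigma = [[sigma0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# sigma_hat = J^{-1} sigma J^{-T} |J|
sigma_hat = [[detJ * x for x in row]
             for row in matmul(matmul(Jinv, sigma), transpose(Jinv))]
print(sigma_hat)
# [[2.0, 0.0, -1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, 2.5]]
```

Even an isotropic σ becomes a full anisotropic tensor in the non-orthogonal coordinates, which is what couples the Fourier modes below.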
We take Fourier series expansions in the azimuthal direction ζ₂:

u(ζ₁, ζ₂, ζ₃) = Σ_{l=−∞}^{+∞} u_l(ζ₁, ζ₃) e^{jlζ₂}   (4)

σ̂(ζ₁, ζ₂, ζ₃) = Σ_{m=−∞}^{+∞} σ̂_m(ζ₁, ζ₃) e^{jmζ₂}   (5)

f̂(ζ₁, ζ₂, ζ₃) = Σ_{n=−∞}^{+∞} f̂_n(ζ₁, ζ₃) e^{jnζ₂}   (6)
The final variational formulation for zero frequency (DC) is the following: Find u ∈ u_D + H¹(Ω) such that:

Σ_{n=k−2}^{k+2} ( σ̂_{k−n} (∂u/∂ξ)_n , (∂v/∂ξ)_k )_{L²(Ω_2D)} = ( v_k , f̂_k )_{L²(Ω_2D)}   ∀v_k   (7)

since five Fourier modes are enough to represent exactly the new material coefficients [13].

Fig. 5 Geometry of the cavity problem

In a similar way we can derive the variational formulation for non-zero frequency (AC): Find E ∈ E_D + H_{Γ_D}(curl; Ω) such that:

Σ_{n=s−2}^{s+2} [ ( μ̂⁻¹_{s−n} (∇×E)_n , (∇×F)_s )_{L²(Ω_2D)} − ( k̂²_{s−n} E_n , F_s )_{L²(Ω_2D)} ] = −jω ( Ĵ^imp_s , F_s )_{L²(Ω_2D)}   ∀F_s   (8)
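The pentadiagonal coupling in the formulations above (mode k interacts only with modes k−2, …, k+2) follows from the convolution of Fourier coefficients when the material coefficient has just five non-zero modes; a quick illustrative check in Python (the coefficient values are arbitrary):

```python
def convolve(a, b):
    """Fourier coefficients of the product of two series:
    (a*b)_k = sum_m a_{k-m} b_m, for coefficients stored in dicts."""
    out = {}
    for m, am in a.items():
        for n, bn in b.items():
            out[m + n] = out.get(m + n, 0.0) + am * bn
    return out

# sigma_hat has exactly five non-zero modes (l = -2..2), as in [13]
sigma_hat = {-2: 0.1, -1: 0.2, 0: 1.0, 1: 0.2, 2: 0.1}
u = {l: 1.0 for l in range(-6, 7)}          # some solution modes

prod_full = convolve(sigma_hat, u)
u_window = {l: u[l] for l in range(-2, 3)}  # only modes 0-2 .. 0+2
prod_win = convolve(sigma_hat, u_window)
# the test equation for mode k = 0 sees only u_{-2}..u_{2}:
print(prod_full[0] == prod_win[0])  # True
```

Hence each Fourier-mode equation couples to at most five unknown modes, and the resulting system over the modes is banded.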
2. 2D Stokes problem We consider the SUPG (Streamline Upwind Petrov-Galerkin [14]) stabilized weak formulation of the Stokes problem: Find the velocity and pressure fields (u, p) ∈ (u_D + V) × Q, where V = {v ∈ H¹(Ω) : v = 0 on Γ_D} and Q = L²(Ω), such that (9)-(10):

∫_Ω 2μ ε̇(u) : ε̇(v) dΩ − ∫_Ω p ∇·v dΩ = ∫_Ω ρb·v dΩ + ∫_{Γ_N} t·v dΓ   (9)

∫_Ω q ∇·u dΩ − Σ_{K∈T_h} ∫_K τ ∇q·∇p dK = Σ_{K∈T_h} ∫_K τ ∇q·(∇·s(û_h)) dK   (10)

The SUPG formulation is utilized to solve the plane flow of an isothermal fluid in the square lid-driven cavity (0,1) × (0,1) presented in Fig. 5. The fluid dynamic viscosity is defined as μ = 1 and the body force is b = 0. The stabilization coefficient is defined as τ_K = αh_K²/(2μ) with α = 0.01.
3. Non-stationary heat transfer problem The weak form of the non-stationary heat transfer problem reads: Find the temperature distribution u ∈ u_D + V, where V = {v ∈ H¹(Ω) : v = 0 on Γ_D}, satisfying:

( ρc_p u̇ , v )_Ω + ∫_Ω k∇u·∇v dΩ + ∫_{Γ_N} βuv dΓ_N = ∫_Ω fv dΩ + ∫_{Γ_N} (βu_N + q)v dΓ_N   ∀v ∈ V   (11)

( ρc_p u(0) , v )_Ω = ( ρc_p u_0 , v )_Ω   ∀v ∈ V   (12)

FE discretization in space gives the following matrix system:

M u̇ + K u = f   (13)

Applying the trapezoidal rule for the time discretization we obtain

(M + αδK) u^{k+1} = [M − (1 − α)δK] u^k + δf   (14)

where M is the mass matrix, δ is the time step, and α ∈ [0, 1] selects among different time integration schemes. We focus on the solution of the heat transfer problem in the L-shape domain presented in Fig. 6.

Fig. 6 Geometry of the step problem

The initial temperature distribution is u = 0 at t = 0. The L-shape domain is heated/cooled with u_N = ±1, with β = 1 and no internal heating, f = 0.
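Scheme (14) can be sketched in a few lines of Python on a toy 2x2 system (the matrices are chosen by us for illustration); α = 0.5 gives the Crank-Nicolson variant of the trapezoidal rule:

```python
def solve2(A, b):
    """Cramer's rule for a 2x2 linear system A x = b."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def step(M, K, f, u, delta, alpha):
    """One step of (M + a d K) u^{k+1} = [M - (1 - a) d K] u^k + d f."""
    L = [[M[i][j] + alpha * delta * K[i][j] for j in range(2)]
         for i in range(2)]
    R = [[M[i][j] - (1 - alpha) * delta * K[i][j] for j in range(2)]
         for i in range(2)]
    b = [R[i][0] * u[0] + R[i][1] * u[1] + delta * f[i] for i in range(2)]
    return solve2(L, b)

M = [[1.0, 0.0], [0.0, 1.0]]       # lumped mass matrix
K = [[2.0, -1.0], [-1.0, 2.0]]     # stiffness matrix
f = [0.0, 0.0]                     # no internal heating, as in the text
u = [1.0, 1.0]                     # initial temperature
for _ in range(100):
    u = step(M, K, f, u, delta=0.05, alpha=0.5)
# the symmetric state decays toward 0 and stays symmetric: u[0] == u[1]
```

Note that the system matrix M + αδK is the same for every step as long as δ, α, and the mesh are fixed, which is exactly why reusing its factorization pays off.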
SOLVER PERFORMANCE AND REUTILIZATION OF PARTIAL LU FACTORIZATIONS
1. 3D DC resistivity logging measurements simulations in deviated wells We performed measurements of the execution time and relative efficiency on the LONESTAR Linux cluster [15] for the 3D resistivity logging measurements simulation problem, with the 2D formulation based on the non-orthogonal system of coordinates and Fourier series expansions. From these measurements it follows that the solver attains 60% relative efficiency on up to 192 processors; compare Figs. 7 and 8.
Fig. 7 Parallel solver execution time for increasing numbers of processors
Fig. 8 Relative efficiency E = T₁/(p·T_p) of the parallel solver
2. 2D Stokes problem We analyzed the percentage of reutilized LU factorizations on a sequence of meshes generated by the self-adaptive hp FEM for the cavity problem, presented in Fig. 9. Some LU factorizations computed in a previous iteration can be effectively reutilized in the next iteration. This is because mesh refinements occur only in the close neighborhood of the two local singularities located at the top left and top right parts of the mesh. LU factorizations associated with elements denoted by the white color in Fig. 10 are not recomputed, but reutilized from the previous mesh. However, at the top of the elimination tree, where refined parts of the mesh are merged with unrefined parts, it is necessary to recompute the LU factorization, since one of the matrix contributions, coming from the refined parts of the mesh, is completely new.
Notice that LU factorizations from the previous mesh cannot be reutilized if the local part of the mesh is either h refined (finite elements are broken) or p refined (the local polynomial order of approximation is changed). This is because an increase of the polynomial order of approximation changes the number of degrees of freedom in the local matrix, and the LU factorization is no longer valid.
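The reuse-or-recompute decision can be sketched as a cache keyed by a per-subtree "refinement signature" (our own illustrative scheme; the actual solver stores the partial LU factors at the elimination tree nodes):

```python
class ElimNode:
    def __init__(self, sons=()):
        self.sons = list(sons)
        self.version = 0          # bumped on any h or p refinement here
        self.cache = None         # (signature, partial LU factors)

computed = []                     # record of recomputed nodes

def factorize(node, name):
    """Return (signature, factors); reuse the cached pair whenever the
    whole subtree is unchanged since the previous solver call."""
    son_sigs = tuple(factorize(s, f"{name}.{i}")[0]
                     for i, s in enumerate(node.sons))
    sig = (node.version, son_sigs)
    if node.cache and node.cache[0] == sig:
        return node.cache          # reutilized, not recomputed
    computed.append(name)
    node.cache = (sig, object())   # stand-in for the real LU factors
    return node.cache

a, b = ElimNode(), ElimNode()
root = ElimNode(sons=[a, b])
factorize(root, "root")            # first call factorizes everything
a.version += 1                     # locally refine the subtree under a
factorize(root, "root")            # recomputes only a and the root
print(computed)
# ['root.0', 'root.1', 'root', 'root.0', 'root']
```

The signature covers both h and p refinements, matching the observation above that either kind of refinement invalidates the stored factorization, while untouched subtrees (here `b`) are reused as-is.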
3. Non-stationary heat transfer problem Finally, we tested the reutilization of partial factorizations for the non-stationary heat transfer problem, see Fig. 11. In this kind of problem, it is possible to reutilize LU factorizations within the sequence of meshes generated for one time step. However, it is not possible to reutilize LU factorizations from one time step to the next, since the problem is non-stationary and changes from one time step to the next, even if the computational mesh is the same.
Fig. 9 Sequence of meshes for the cavity problem. Different colors denote different polynomial orders of approximation, varying from p=1 to p=9, on finite element edges and interiors (in both directions).
RESULTS
Fig. 10 Reutilization of LU factorizations from previous meshes during the first three iterations.
We conclude our presentation by presenting in Fig. 12 numerical results for the 3D resistivity logging measurements problem. In Fig. 13 we also present the velocity distribution for the cavity problem and the final temperature distribution for the non-stationary heat transfer problem. Thanks to the self-adaptive hp FEM, all these results have been computed on optimal meshes delivering the numerical solution with less than 3% relative error.
CONCLUSIONS
We have proposed an efficient sequential and parallel solver for the hp FEM. The solver scales well up to 200 processors. It provides an infrastructure for storing partial LU factorizations at elimination tree nodes, to be reutilized in further calls to the solver after the mesh is hp refined. We showed that partial LU factorizations can be effectively reutilized within iterations of the self-adaptive hp FEM. However, it is not possible to reutilize partial factorizations from the previous time step in non-stationary problems.
Fig. 11 Optimal meshes for particular time steps of the non-stationary heat transfer problem.
Fig. 12 Results for 3D AC resistivity logging measurements with a 20 kHz wireline tool, for resistivities of formation layers presented in the second panel, for an axi-symmetric as well as 30- and 60-degree tilted well.
Fig. 13 Results (horizontal and vertical velocity components) for the cavity problem, as well as the temperature distribution for the non-stationary heat transfer problem in the final time step t = 0.127 s.
Acknowledgements The support of Polish MNiSW grant no. 3 T08B 055 29 is gratefully acknowledged. The first author is also supported by the Foundation for Polish Science under the Homing Programme. The second and third authors are supported by The University of Texas at Austin's Joint Industry Research Consortium on Formation Evaluation sponsored by Aramco, Baker Atlas, BP, British Gas, ConocoPhillips, Chevron, ENI E&P, ExxonMobil, Halliburton Energy Services, Hydro, Marathon Oil Corporation, Mexican Institute for Petroleum, Occidental Petroleum Corporation, Petrobras, Schlumberger, Shell International E&P, Statoil, TOTAL, and Weatherford.
REFERENCES
[1] L. Demkowicz, 2D hp-Adaptive Finite Element Package, TICAM Report 02-06, The University of Texas at Austin (2002)
[2] M. Paszyński, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp Adaptive 2D Finite Element Package, Computer Methods in Applied Mechanics and Engineering, 195, 7-8 (2006), pp. 711-741
[3] L. Demkowicz, D. Pardo, W. Rachowicz, 3D hp-Adaptive Finite Element Package (3Dhp90). The Ultimate Data Structure for Three Dimensional, Anisotropic hp Refinements, TICAM Report 02-24, The University of Texas at Austin (2002)
[4] M. Paszyński, L. Demkowicz, Parallel Fully Automatic hp Adaptive 3D Finite Element Package, Engineering with Computers, 22, 3-4 (2006), pp. 255-276
[5] P. Matuszyk, M. Paszyński, Extensions of the 2D fully automatic hp adaptive Finite Element Method for Stokes and non-stationary heat transfer problems, 9th US National Congress on Computational Mechanics, USACM, San Francisco, USA (2007)
[6] D. Pardo, L. Demkowicz, C. Torres-Verdin, M. Paszyński, Simulation of Resistivity Logging-While-Drilling (LWD) Measurements Using a Self-Adaptive Goal-Oriented hp-Finite Element Method, SIAM Journal on Applied Mathematics, 66 (2006), pp. 2085-2106
[7] I. S. Duff, J. K. Reid, The multifrontal solution of indefinite sparse symmetric linear systems, ACM Transactions on Mathematical Software, 9 (1983), pp. 302-325
[8] L. Giraud, A. Marocco, J.-C. Rioual, Iterative versus direct parallel substructuring methods in semiconductor device modelling, Numerical Linear Algebra with Applications, 12, 1 (2005), pp. 33-55
[9] J. A. Scott, Parallel Frontal Solvers for Large Sparse Linear Systems, ACM Transactions on Mathematical Software, 29, 4 (2003), pp. 395-417
[10] P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent, Multifrontal parallel distributed symmetric and unsymmetric solvers, Computer Methods in Applied Mechanics and Engineering, 184 (2000), pp. 501-520
[11] P. R. Amestoy, I. S. Duff, J. Koster, J.-Y. L'Excellent, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM Journal on Matrix Analysis and Applications, 23, 1 (2001), pp. 15-41
[12] P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent, S. Pralet, Hybrid scheduling for the parallel solution of linear systems, accepted to Parallel Computing (2005)
[13] D. Pardo, V. Calo, C. Torres-Verdin, M. J. Nam, Fourier Series Expansion in a Non-Orthogonal System of Coordinates for Simulation of 3D Borehole Resistivity Measurements. Part I: DC, submitted to Computer Methods in Applied Mechanics and Engineering (2007)
[14] T. J. R. Hughes, L. P. Franca, A New FEM for Computational Fluid Dynamics: VII. The Stokes Problem with Various Well-Posed Boundary Conditions: Symmetric Formulations that Converge for All Velocity/Pressure Spaces, Computer Methods in Applied Mechanics and Engineering, 65 (1987), pp. 85-96
[15] Lonestar Cluster Users' Manual, http://www.tacc.utexas.edu/services/userguides/lonestar