
APCOM’07 in conjunction with EPMESC XI, December 3-6, 2007, Kyoto, JAPAN

Efficient Sequential and Parallel Solvers for hp Finite Element Method

Maciej Paszyński 1 *, David Pardo 2, Carlos Torres-Verdin 2, Paweł Matuszyk 3

1 Department of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, Kraków, 30-059, Poland

2 Department of Petroleum and Geosystems Engineering, The University of Texas, 1 University Station C0300, Austin, Texas, 78712, USA

3 Department of Applied Computer Science and Modeling, AGH University of Science and Technology, Al. Mickiewicza 30, Kraków, 30-059, Poland

e-mail: paszynsk@agh.edu.pl, dzubiaur@gmail.com, cverdin@uts.cc.utexas.edu, pjm@agh.edu.pl

Abstract We present sequential and parallel direct solvers designed for the hp Finite Element Method (FEM) and applied to several problems, including a non-stationary heat transfer problem, the Stokes problem, and resistivity logging measurement simulations. The hp FEM incorporates a self-adaptive strategy that generates a sequence of hp refined meshes, delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes generated by the self-adaptive strategy are obtained by multiple h or p refinements of the initial mesh. The resulting mesh is stored as refinement trees growing down from the nodes of the initial mesh. First, we eliminate degrees of freedom starting from the leaves of the refinement trees, and then we eliminate common degrees of freedom while traveling up the refinement trees. The solver is parallelized using the domain decomposition paradigm: it generates Schur complements of local sub-systems, from the bottom of the refinement trees, through initial mesh elements, up to sub-domains. The global problem then reduces to one relatively small common "interface" problem. Finally, backward substitution is executed to propagate the solution from the common interface, through sub-domains and initial mesh elements, down to the leaves of the refinement trees. The LU factorizations computed at different levels of the elimination trees are stored at tree nodes, to be reutilized by the solver after the computational mesh is locally refined. We also present performance measurements of the solver.

Key words: Finite Element Method, hp adaptivity, direct solver, parallel direct solver

INTRODUCTION

We present the data structures and efficient direct solvers for computational meshes used by fully automatic hp adaptive 2D and 3D Finite Element Method (FEM) codes [1,2,3,4]. The codes generate a sequence of hp meshes delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes consist of finite elements of various sizes and various polynomial orders of approximation, which change locally on finite element faces, edges and interiors. The final optimal mesh is constructed by a sequence of h or p refinements executed on the initial mesh. The h refinements consist in breaking some finite elements into smaller son elements, whilst the p refinements consist in adjusting the polynomial orders of approximation on some element faces, edges and interiors. In the data structure used here, the h refinements are stored as trees growing from initial mesh elements. This allows us to propose an efficient direct solver working on the level of refinement trees. The degrees of freedom are eliminated by traversing the refinement trees from the leaf nodes up to the level of the initial mesh element nodes. The local Schur complements associated with particular elimination levels can be stored in tree nodes. Each time the mesh is locally refined, only the local Schur complements associated with newly refined nodes must be updated. The Schur complements associated with unrefined nodes can still be used when the solver is executed over the newly refined mesh. The idea of the recursive solver working on the refinement trees is generalized to the elimination tree constructed out of initial mesh elements, as well as to the elimination tree built from sub-domains obtained by the domain decomposition of the entire mesh. The solvers were tested on a sequence of problems, including the non-stationary heat transfer problem, the Stokes problem [5], and the resistivity logging measurements simulation [6].

SEQUENTIAL AND PARALLEL DIRECT SOLVERS

In this section we present a classification of current sequential and parallel direct solvers dedicated to FEM computations.

1. Frontal solvers The solver browses finite elements in the order prescribed by the user and aggregates degrees of freedom into the so-called frontal matrix. Based on the element connectivity information, it recognizes fully assembled degrees of freedom and eliminates them from the frontal matrix [7]. This keeps the size of the frontal matrix as small as possible. The key to the efficiency of the frontal solver is an optimal ordering of finite elements.

2. Multifrontal solvers The solver constructs the degrees of freedom connectivity tree based on an analysis of the geometry of the computational domain [7]. This is usually done by using a graph representation of the computational domain and a graph partitioning algorithm. The frontal elimination pattern is applied on every tree branch. Finite elements are joined into pairs and their degrees of freedom are assembled into the frontal matrix associated with the branch. The process is repeated until the root of the assembly tree is reached. Finally, the common dense problem is solved and partial backward substitutions are recursively executed on the assembly tree.

3. Sub-structuring method solver This is a parallel solver working over a computational domain partitioned into multiple sub-domains. It works in the following steps [8]. First, the internal degrees of freedom of each sub-domain are eliminated with respect to the interface degrees of freedom. Second, the interface problem is solved. Finally, the internal problems are solved by executing backward substitution on each sub-domain, using the interface problem solution computed in the second step.

4. Multiple fronts solver This is the simplest implementation of the sub-structuring method solver [9]. It performs a partial frontal decomposition on each sub-domain. Then, it sums up the contributions from particular sub-domains into one common interface problem. Finally, it solves the common interface problem with a sequential frontal solver.

5. Direct sub-structuring method solver In this version of the sub-structuring method solver, the interface problem is solved by a parallel solver [8].

6. Sparse direct method solver This is a parallel implementation of the multifrontal solver. An example of the sparse direct method solver is the MUltifrontal Massively Parallel Solver (MUMPS) [10-12].
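The three steps of the sub-structuring method (item 3) can be illustrated on a deliberately tiny example. The following Python sketch is not taken from any of the cited solvers. It assumes two sub-domains, each with a single internal unknown, sharing one interface unknown, so every matrix block is a scalar. The function names `condense`, `back_substitute` and `solve_substructured` and the numerical data are hypothetical.

```python
# Sub-structuring on a 3-unknown model problem (illustrative assumption):
# sub-domain 1 owns internal unknown x0, sub-domain 2 owns internal unknown x2,
# and both share the interface unknown x1.

def condense(sub):
    """Step 1: eliminate the internal unknown; return (Schur complement, reduced load)."""
    s = sub["a_bb"] - sub["a_bi"] * sub["a_ib"] / sub["a_ii"]
    g = sub["f_b"] - sub["a_bi"] * sub["f_i"] / sub["a_ii"]
    return s, g


def back_substitute(sub, x_b):
    """Step 3: recover the internal unknown from the interface solution."""
    return (sub["f_i"] - sub["a_ib"] * x_b) / sub["a_ii"]


def solve_substructured(subdomains):
    contributions = [condense(sub) for sub in subdomains]
    s_total = sum(s for s, _ in contributions)   # assembled interface matrix
    g_total = sum(g for _, g in contributions)   # assembled interface load
    x_interface = g_total / s_total              # Step 2: interface problem
    x_internal = [back_substitute(sub, x_interface) for sub in subdomains]
    return x_interface, x_internal


# Local blocks: i = internal unknown, b = interface (boundary) unknown.
subdomain = {"a_ii": 2.0, "a_ib": -1.0, "a_bi": -1.0, "a_bb": 1.0,
             "f_i": 1.0, "f_b": 0.0}
x_interface, x_internal = solve_substructured([dict(subdomain), dict(subdomain)])
print(x_interface, x_internal)
```

In an actual solver the scalar blocks become dense or sparse matrices and the divisions become partial factorizations, but the order of the three steps is the same.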

DATA STRUCTURE SUPPORTING HP REFINEMENTS

This section introduces the data structure that stores the history of mesh transformations, which can be further utilized by the direct solver. We propose to store two levels of connectivity trees:

• The initial mesh elements connectivity tree

• h refinements connectivity trees, which grow down from the level of initial mesh elements.

For the parallel implementation of the algorithm we propose three levels of connectivity trees:

• The connectivity tree for sub-domains, built out of the computational mesh distributed into sub-domains

• The connectivity trees for initial mesh elements, built separately on every sub-domain

• h refinements connectivity trees, which grow down from every refined initial mesh element.

The three connectivity trees for the computational mesh of Fig. 1 are presented in Fig. 2. Partial LU factorizations performed by the solver are stored at tree nodes for further reutilization.
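A minimal Python sketch of such a three-level tree is given here for orientation only. It mirrors the mesh described in the caption of Fig. 1 (2 sub-domains, 4 initial mesh elements, each broken into 4 son elements) and assumes, since the text does not say so, that each sub-domain holds two initial mesh elements. The class `TreeNode`, its fields and the function `build_fig1_tree` are hypothetical names, not the data structure of the actual FORTRAN 90 code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    label: str
    level: str                                   # "mesh", "subdomain", "initial_element", "son_element"
    sons: List["TreeNode"] = field(default_factory=list)
    factorization: Optional[object] = None       # slot for a stored partial LU factorization

    def leaves(self) -> List["TreeNode"]:
        """Return the active (unbroken) elements below this node."""
        if not self.sons:
            return [self]
        return [leaf for son in self.sons for leaf in son.leaves()]


def build_fig1_tree() -> TreeNode:
    """Sub-domain tree -> initial mesh element trees -> h refinement trees."""
    root = TreeNode("mesh", "mesh")
    element_id = 0
    for s in range(2):                           # 2 sub-domains
        sub = TreeNode(f"subdomain {s}", "subdomain")
        for _ in range(2):                       # assumed: 2 initial elements per sub-domain
            elem = TreeNode(f"element {element_id}", "initial_element")
            elem.sons = [TreeNode(f"element {element_id}.{k}", "son_element")
                         for k in range(4)]      # each element broken into 4 sons
            sub.sons.append(elem)
            element_id += 1
        root.sons.append(sub)
    return root


tree = build_fig1_tree()
print(len(tree.leaves()), "active elements")
```

The `factorization` slot is where a solver could attach the partial LU factorization computed at that node.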


Fig. 1 Exemplary computational domain with 4 initial mesh elements partitioned into 2 sub-domains. Each initial mesh element is broken into 4 son elements.

Fig. 2 Connectivity trees for computational mesh presented in Fig. 1

DIRECT SOLVER FOR hp FINITE ELEMENT METHOD

In this section we describe the proposed new direct solver dedicated to the fully automatic hp Finite Element Method [1-4]. The software incorporates a self-adaptive strategy that generates a sequence of hp refined meshes, delivering exponential convergence of the numerical error with respect to the number of degrees of freedom (mesh size or CPU time). The hp meshes generated by the self-adaptive strategy are obtained by multiple h or p refinements of the initial mesh. The h refinement consists in breaking some finite elements into 2 (in the horizontal or vertical direction) or 4 son elements, while the p refinement consists in increasing the polynomial order of approximation on some finite element edges, faces and interiors.

The self-adaptive mesh generated in this way is stored as refinement trees growing down from the initial mesh, so the computational mesh has a tree-like structure. First, we eliminate degrees of freedom starting from the leaves of the refinement trees, and then we eliminate common degrees of freedom while traveling up the refinement trees. In other words, we compute a sequence of Schur complements, starting from the bottom level and traveling up the structure of refinement trees. Then, we utilize the nested dissection scheme to eliminate degrees of freedom on the level of initial mesh elements.

The parallel version of the solver utilizes the domain decomposition paradigm. The computational mesh is partitioned into multiple sub-domains, with each sub-domain assigned to a separate processor. The solver generates Schur complements of local sub-systems, from the bottom of the refinement trees, through initial mesh elements, up to sub-domains. The global problem then reduces to one relatively small common "interface" problem, and finally backward substitution is executed to propagate the solution from the common interface, through sub-domains and initial mesh elements, down to the leaves of the refinement trees.

The algorithm of the recursive solver can be summarized in the following pseudo-code:

matrix function recursive_solver(tree_node)
  if tree_node has no son nodes then
    eliminate leaf element stiffness matrix internal nodes
    return Schur complement sub-matrix
  else if tree_node has son nodes then
    do for each son
      son_matrix = recursive_solver(tree_node_son)
      merge son_matrix into new_matrix
    enddo
    decide which unknowns of new_matrix can be eliminated
    perform partial forward elimination on new_matrix
    return Schur complement sub-matrix
  endif
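The pseudo-code can be made concrete with a small runnable sketch. The Python version here is an illustration under several assumptions that are not taken from the paper: fronts are dense lists of lists, no pivoting is performed (the model matrices are symmetric positive definite), a degree of freedom is considered ready for elimination once every element containing it lies below the current tree node, and the right-hand side is reduced together with the matrix. All names (`merge`, `eliminate`, `solve`, `element`) are hypothetical.

```python
def merge(fronts):
    """Assemble son fronts into one dense front over the union of their dofs."""
    dofs = sorted({d for f in fronts for d in f["dofs"]})
    pos = {d: i for i, d in enumerate(dofs)}
    A = [[0.0] * len(dofs) for _ in dofs]
    b = [0.0] * len(dofs)
    for f in fronts:
        for i, di in enumerate(f["dofs"]):
            b[pos[di]] += f["b"][i]
            for j, dj in enumerate(f["dofs"]):
                A[pos[di]][pos[dj]] += f["A"][i][j]
    return {"dofs": dofs, "A": A, "b": b}


def eliminate(front, ready, rows):
    """Partial forward elimination of `ready` dofs; returns the Schur complement front."""
    dofs, A, b = front["dofs"], front["A"], front["b"]
    for d in ready:
        p = dofs.index(d)
        rows.append((d, list(dofs), list(A[p]), b[p]))   # kept for backward substitution
        for i in range(len(dofs)):
            if i != p:
                m = A[i][p] / A[p][p]
                b[i] -= m * b[p]
                A[i] = [aij - m * apj for aij, apj in zip(A[i], A[p])]
        keep = [i for i in range(len(dofs)) if i != p]
        A = [[A[i][j] for j in keep] for i in keep]
        b = [b[i] for i in keep]
        dofs = [dofs[i] for i in keep]
    return {"dofs": dofs, "A": A, "b": b}


def recursive_solver(node, total, rows):
    if "sons" not in node:                               # leaf: element matrix
        front = {"dofs": list(node["dofs"]),
                 "A": [row[:] for row in node["A"]], "b": list(node["b"])}
        count = {d: 1 for d in node["dofs"]}
    else:
        results = [recursive_solver(son, total, rows) for son in node["sons"]]
        front = merge([f for f, _ in results])
        count = {}
        for _, c in results:
            for d, k in c.items():
                count[d] = count.get(d, 0) + k
    # A dof is fully assembled when all elements containing it are below this node.
    ready = [d for d in front["dofs"] if count[d] == total[d]]
    return eliminate(front, ready, rows), count


def leaves(node):
    if "sons" not in node:
        return [node]
    return [leaf for son in node["sons"] for leaf in leaves(son)]


def solve(tree):
    total = {}
    for leaf in leaves(tree):
        for d in leaf["dofs"]:
            total[d] = total.get(d, 0) + 1
    rows = []
    recursive_solver(tree, total, rows)                  # forward elimination, bottom-up
    x = {}
    for d, dofs, row, rhs in reversed(rows):             # backward substitution, top-down
        known = sum(a * x[e] for a, e in zip(row, dofs) if e != d)
        x[d] = (rhs - known) / row[dofs.index(d)]
    return x


def element(left, right):
    """Hypothetical 1D two-node element with an SPD local matrix."""
    return {"dofs": [left, right], "A": [[2.0, -1.0], [-1.0, 2.0]], "b": [1.0, 1.0]}


tree = {"sons": [{"sons": [element(0, 1), element(1, 2)]},
                 {"sons": [element(2, 3), element(3, 4)]}]}
solution = solve(tree)
print(solution)
```

For this chain of four elements the assembled system has the constant solution 1 at all five degrees of freedom, which makes the result easy to check by hand.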

The solver can be used to solve problems with multiple right-hand sides efficiently, since for each new right-hand side only a new backward substitution must be executed. This is needed in the context of goal-oriented adaptivity, where the solution of the dual problem is also required. The solver is written in FORTRAN 90. The parallelization of the solver consists in assigning tree branches to particular processors and sending Schur complement contributions from one branch to another. We implemented the parallel version of the solver using the Message Passing Interface (MPI).
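The benefit for multiple right-hand sides can be sketched with a plain dense LU factorization in Python. This is an assumption-laden illustration, not the authors' FORTRAN 90 implementation: it uses Doolittle elimination without pivoting on a small, made-up 2 by 2 matrix, and the "primal" and "dual" load vectors are hypothetical. The point is only that `lu_factor` runs once while `lu_solve` runs once per right-hand side.

```python
def lu_factor(A):
    """In-place style Doolittle LU without pivoting; L (unit diagonal) and U share one array."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def lu_solve(LU, b):
    """Forward then backward substitution using a stored factorization."""
    n = len(LU)
    y = b[:]
    for i in range(n):
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
LU = lu_factor(A)                      # expensive step, done once

primal = lu_solve(LU, [5.0, 4.0])      # hypothetical primal load
dual = lu_solve(LU, [1.0, 0.0])        # hypothetical dual (goal functional) load
print(primal, dual)
```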

Fig. 3 Elimination patterns over the distributed connectivity tree

It should be emphasized that the communication cost of the solver depends on the size of the local systems of equations associated with common edges between adjacent elements. The interior and edge degrees of freedom that are eliminated on the current level of the connectivity tree are denoted in Fig. 3 by dashed lines, whilst degrees of freedom that remain uneliminated are denoted by solid lines.

COMPUTATIONAL PROBLEMS

1. 3D DC Resistivity logging measurements simulations in deviated wells The problem consists in solving the conductive media equation

$$\nabla \circ (\sigma \nabla u) = -\nabla \circ J^{imp} \tag{1}$$

in the 3D domain with different formation layers, presented in Fig. 4. A tool with one transmitter and two receiver electrodes is placed in the borehole and shifted along it. The reflected waves are recorded by the receiver electrodes in order to determine the location of the oil formation in the ground. Of particular interest to the oil industry are 3D simulations with deviated wells, where the angle between the borehole and the formation layers is not a right angle, $\theta_0 \neq 90^\circ$. This fully 3D problem can be reduced to 2D by considering three non-orthogonal systems of coordinates presented in Fig. 4. The variational formulation in the new system of coordinates consists in finding $u \in u_D + H^1_D(\Omega)$ such that:

$$\left\langle \frac{\partial v}{\partial \zeta}, \hat{\sigma} \frac{\partial u}{\partial \zeta} \right\rangle_{L^2(\Omega)} = \left\langle v, \hat{f} \right\rangle_{L^2(\Omega)} \quad \forall v \in H^1_D(\Omega) \tag{2}$$

where the new electrical conductivity of the media is $\hat{\sigma} := J^{-1} \sigma J^{-1T} |J|$ and $\hat{f} := f |J|$, with $f = \nabla \circ J^{imp}$ the divergence of the impressed current. Here

$$J = \frac{\partial (x_1, x_2, x_3)}{\partial (\zeta_1, \zeta_2, \zeta_3)} \tag{3}$$

stands for the Jacobian matrix of the change of variables from the Cartesian reference system to the non-orthogonal systems of coordinates, and $|J| = \det(J)$ is its determinant. We take Fourier series expansions in the azimuthal $\zeta_2$ direction

$$u(\zeta_1, \zeta_2, \zeta_3) = \sum_{l=-\infty}^{+\infty} u_l(\zeta_1, \zeta_3) \, e^{jl\zeta_2} \tag{4}$$

$$\sigma(\zeta_1, \zeta_2, \zeta_3) = \sum_{m=-\infty}^{+\infty} \sigma_m(\zeta_1, \zeta_3) \, e^{jm\zeta_2} \tag{5}$$

$$f(\zeta_1, \zeta_2, \zeta_3) = \sum_{n=-\infty}^{+\infty} f_n(\zeta_1, \zeta_3) \, e^{jn\zeta_2} \tag{6}$$

Fig. 4 Three non-orthogonal systems of coordinates in the borehole and formation layers

Fig. 5 Geometry of the cavity problem

The final variational formulation for zero frequency (DC) is the following: Find $u \in u_D + H^1_D(\Omega)$ such that:

$$\sum_{n=k-2}^{k+2} \left\langle \left( \frac{\partial v}{\partial \zeta} \right)_k, \hat{\sigma}_{k-n} \left( \frac{\partial u}{\partial \zeta} \right)_n \right\rangle_{L^2(\Omega_{2D})} = \left\langle v_k, \hat{f}_k \right\rangle_{L^2(\Omega_{2D})} \quad \forall v_k \tag{7}$$

where the subscripts $k$ and $n$ denote Fourier modes. The sum runs only over $n = k-2, \ldots, k+2$, since five Fourier modes are enough to represent exactly the new material coefficients [13].
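The consequence of the five-mode representation of the material coefficients is that each Fourier mode k of the DC formulation couples only with modes k-2, ..., k+2. The short Python sketch here prints the resulting coupling pattern. It assumes that the series is truncated to modes with |n| <= `max_mode`, which is an assumption made only for this illustration, and the function names are hypothetical.

```python
def coupled_modes(k, max_mode, half_width=2):
    """Modes n with k-2 <= n <= k+2 that survive truncation to |n| <= max_mode."""
    return [n for n in range(k - half_width, k + half_width + 1)
            if abs(n) <= max_mode]


def coupling_pattern(max_mode):
    """Row k, column n is 1 when mode k of the test function couples with mode n of u."""
    modes = range(-max_mode, max_mode + 1)
    return [[1 if n in coupled_modes(k, max_mode) else 0 for n in modes]
            for k in modes]


for row in coupling_pattern(3):
    print(" ".join(str(entry) for entry in row))
```

The printed pattern is banded with two sub-diagonals and two super-diagonals, that is, block pentadiagonal when each entry is read as a 2D finite element block.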

In a similar way we can derive the variational formulation for non-zero frequency (AC): Find $E \in E_D + H_{\Gamma_D}(\mathrm{curl}; \Omega)$ such that:

$$\sum_{n=s-2}^{s+2} \left[ \left\langle (\nabla^{\zeta} \times F)_s, \hat{\mu}^{-1}_{s-n} (\nabla^{\zeta} \times E)_n \right\rangle_{L^2(\Omega_{2D})} - \left\langle F_s, \hat{k}^2_{s-n} E_n \right\rangle_{L^2(\Omega_{2D})} \right] = -j\omega \left\langle F_s, \hat{J}^{imp}_s \right\rangle_{L^2(\Omega_{2D})} \quad \forall F_s \tag{8}$$

2. 2D Stokes problem We consider the SUPG (Streamline Upwind Petrov-Galerkin [14]) stabilized weak formulation of the Stokes problem: Find velocity and pressure fields $(u, p) \in (u_D + V) \times Q$, where $V = \{ v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \}$ and $Q = L^2(\Omega)$, such that (9-10):

$$\int_\Omega 2\mu \, \dot{\varepsilon}(u) : \dot{\varepsilon}(v) \, d\Omega - \int_\Omega p \, \nabla \circ v \, d\Omega = \int_\Omega \rho b \circ v \, d\Omega + \int_{\Gamma_N} t \circ v \, d\Gamma \tag{9}$$

$$\int_\Omega q \, \nabla \circ u \, d\Omega - \sum_{K \in T_h} \int_K \tau_K \nabla q \circ \nabla p \, dK = \sum_{K \in T_h} \int_K \tau_K \nabla q \circ \left( \nabla \circ s(\hat{u}) \right) dK \tag{10}$$

The SUPG formulation is utilized to solve the plane flow of an isothermal fluid in a square lid-driven cavity $(0,1) \times (0,1)$ presented in Fig. 5. The fluid dynamic viscosity is $\mu = 1$ and the body force is $b = 0$. The stabilization coefficient is defined as $\tau_K = \frac{\alpha h_K^2}{2\mu}$ with $\alpha = 0.01$.
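As a quick numerical check of the stabilization coefficient, the following Python sketch evaluates the formula for a few element sizes with the values of the viscosity and of α quoted above. The element sizes and the function name `stabilization_coefficient` are chosen only for illustration.

```python
def stabilization_coefficient(h_k, mu=1.0, alpha=0.01):
    """tau_K = alpha * h_K**2 / (2 * mu) for an element of diameter h_K."""
    return alpha * h_k ** 2 / (2.0 * mu)


for h_k in (0.5, 0.25, 0.125):          # hypothetical element sizes
    print(f"h_K = {h_k:5.3f}  tau_K = {stabilization_coefficient(h_k):.3e}")
```

Each halving of the element size by an h refinement reduces the coefficient by a factor of four, so the stabilization term weakens in the refined regions of the mesh.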

3. Non-stationary heat transfer problem The weak form of the non-stationary heat transfer problem reads: Find the temperature distribution $u \in u_D + V$, where $V = \{ v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \}$, satisfying

$$(\rho c_p \dot{u}, v)_\Omega + \int_\Omega k \nabla u \circ \nabla v \, d\Omega + \int_{\Gamma_N} \beta u v \, d\Gamma_N = \int_\Omega f v \, d\Omega + \int_{\Gamma_N} (\beta u_N + q) v \, d\Gamma_N \quad \forall v \in V \tag{11}$$

$$(\rho c_p u(0), v)_\Omega = (\rho c_p u_0, v)_\Omega \quad \forall v \in V \tag{12}$$

The FE discretization in space gives the following matrix system:

$$M \dot{u} + K u = f \tag{13}$$

Applying the trapezoidal rule for the time discretization we obtain

$$(M + \alpha \delta K) \, u^{k+1} = [M - (1 - \alpha) \delta K] \, u^k + \delta f^k \tag{14}$$

where $M$ is the mass matrix, $K$ is the stiffness matrix, $\delta$ is the time step, and the choice of $\alpha \in [0, 1]$ gives different time integration schemes. We focus on the solution of the heat transfer problem in the L-shape domain presented in Fig. 6.
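The role of α can be seen on a scalar model problem. The Python sketch here applies the time-stepping formula with scalar stand-ins `m`, `k` and `f` for the mass matrix, stiffness matrix and load vector. This is a simplification made for illustration only, and the test equation u' + u = 0 with u(0) = 1 is not one of the problems solved in the paper. The values α = 0, 1/2 and 1 correspond to the forward Euler, Crank-Nicolson and backward Euler schemes.

```python
import math


def theta_step(u, m, k, f, delta, alpha):
    """One step of (m + alpha*delta*k) u_next = (m - (1 - alpha)*delta*k) u + delta*f."""
    return ((m - (1.0 - alpha) * delta * k) * u + delta * f) / (m + alpha * delta * k)


def integrate(u0, m, k, f, delta, alpha, steps):
    u = u0
    for _ in range(steps):
        u = theta_step(u, m, k, f, delta, alpha)
    return u


exact = math.exp(-1.0)                   # solution of u' + u = 0, u(0) = 1, at t = 1
for alpha, name in [(0.0, "forward Euler"), (0.5, "Crank-Nicolson"), (1.0, "backward Euler")]:
    approx = integrate(1.0, 1.0, 1.0, 0.0, 0.1, alpha, 10)
    print(f"{name:15s} alpha = {alpha}: u(1) ~ {approx:.6f}, error {abs(approx - exact):.2e}")
```

In the matrix case the division becomes a linear solve with the matrix M + αδK, which is the system handed to the direct solver at every time step.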

Fig. 6 Geometry of the step problem

The initial temperature distribution is $u_0 = 0$ at $t = 0$. The L-shape domain is heated/cooled with $u_N = \pm 1$, with $\beta = 1$ and no internal heating, $f = 0$.


SOLVER PERFORMANCE AND REUTILIZATION OF PARTIAL LU FACTORIZATIONS

1. 3D DC Resistivity logging measurements simulations in deviated wells We measured the execution time and relative efficiency on the LONESTAR Linux cluster [15] for the 3D resistivity logging measurements simulation problem with the 2D formulation based on the non-orthogonal system of coordinates and Fourier series expansions. From these measurements it follows that the solver attains 60% relative efficiency up to 192 processors, compare Fig. 7 and 8.

Fig. 7 Parallel solver execution time for increasing number of processors (vertical axis: Time [s], logarithmic scale; horizontal axis: number of processors, from 1 to 240)

Fig. 8 Relative efficiency E=T1/(pTp) of the parallel solver (vertical axis: Relative efficiency E; horizontal axis: number of processors, from 1 to 240)
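The relative efficiency plotted in Fig. 8 is computed from the formula E = T1/(pTp), where T1 is the execution time on one processor and Tp the execution time on p processors. The Python sketch here shows the arithmetic. The timings in the dictionary are made up for the example and are not the measurements reported in the paper.

```python
def relative_efficiency(t1, p, tp):
    """E = T1 / (p * Tp): 1.0 means perfect speed-up on p processors."""
    if p < 1 or tp <= 0.0:
        raise ValueError("need p >= 1 and a positive execution time")
    return t1 / (p * tp)


# Hypothetical timings in seconds, keyed by number of processors.
timings = {1: 120.0, 4: 36.0, 16: 12.5}
for p, tp in timings.items():
    print(f"p = {p:3d}  Tp = {tp:6.1f} s  E = {relative_efficiency(timings[1], p, tp):.2f}")
```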

2. 2D Stokes problem We analyzed the percentage of reutilized LU factorizations on a sequence of meshes generated by the self-adaptive hp FEM for the cavity problem, presented in Fig. 9. Some LU factorizations computed in the previous iteration can be effectively reutilized in the next iteration, because mesh refinements occur only in the close neighborhood of the two singularities located in the top left and top right parts of the mesh. LU factorizations associated with elements drawn in white in Fig. 10 are not recomputed, but reutilized from the previous mesh. However, at the top of the elimination tree, where refined parts of the mesh are merged with unrefined parts, it is necessary to recompute the LU factorization, since one of the matrix contributions, coming from the refined parts of the mesh, is completely new.

Notice that LU factorizations from the previous mesh cannot be reutilized if the local part of the mesh is either h refined (finite elements are broken) or p refined (the local polynomial order of approximation is changed). In the p refinement case this is because a change of the polynomial order of approximation changes the number of degrees of freedom in the local matrix, so the stored LU factorization is no longer valid.
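The reutilization rule described above can be phrased as a simple tree traversal: a stored factorization stays valid only if nothing below its tree node has been h or p refined. The Python sketch here implements that rule on a small hypothetical elimination tree. The dictionary layout, the labels and the function name `mark_reusable` are illustrative assumptions, not the bookkeeping used in the actual solver.

```python
def mark_reusable(node, changed, verdicts):
    """Record for every tree node whether its stored factorization may be reutilized."""
    if not node["sons"]:
        ok = node["label"] not in changed        # element itself was h or p refined?
    else:
        # Visit every son (no short-circuit) so that all verdicts get recorded.
        son_ok = [mark_reusable(son, changed, verdicts) for son in node["sons"]]
        ok = all(son_ok)                         # any new contribution invalidates the merge
    verdicts[node["label"]] = ok
    return ok


def leaf(label):
    return {"label": label, "sons": []}


tree = {"label": "root", "sons": [
    {"label": "left", "sons": [leaf("e0"), leaf("e1")]},
    {"label": "right", "sons": [leaf("e2"), leaf("e3")]},
]}

verdicts = {}
mark_reusable(tree, changed={"e3"}, verdicts=verdicts)
reused = sorted(label for label, ok in verdicts.items() if ok)
print(f"reutilized {len(reused)} of {len(verdicts)} factorizations: {reused}")
```

With only element e3 refined, the factorizations of e0, e1, e2 and of the merged node "left" are kept, while those of e3, "right" and the root must be recomputed, which matches the observation that the top of the elimination tree always has to be refactorized.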



3. Non-stationary heat transfer problem Finally, we tested the reutilization of partial factorizations for the non-stationary heat transfer problem, see Fig. 11. In this kind of problem, it is possible to reutilize LU factorizations within a sequence of meshes generated for one time step. However, it is not possible to reutilize LU factorizations from one time step to the next, since the problem is non-stationary and changes from one time step to the next, even if the computational mesh is the same.

Fig. 9 Sequence of meshes for the cavity problem. Different colors denote different polynomial orders of approximation, varying from p=1 to p=9, on finite element edges and interiors (in both directions).

Fig. 10 Reutilization of LU factorizations from previous meshes during the first three iterations.

RESULTS

We conclude by presenting in Fig. 12 numerical results for the 3D resistivity logging measurements problem. In Fig. 13 we also present the velocity distribution for the cavity problem and the final temperature distribution for the non-stationary heat transfer problem. Thanks to the self-adaptive hp FEM, all these results have been computed on optimal meshes delivering numerical solutions with less than 3% relative error.

CONCLUSIONS

We have proposed efficient sequential and parallel direct solvers for hp FEM. The parallel solver scales up to about 200 processors, attaining 60% relative efficiency up to 192 processors. It provides an infrastructure for storing partial LU factorizations at elimination tree nodes, to be reutilized in further calls to the solver after the mesh is hp refined. We showed that partial LU factorizations can be effectively reutilized within iterations of the self-adaptive hp FEM. However, it is not possible to reutilize partial factorizations from the previous time step in non-stationary problems.

Fig. 11 Optimal meshes for particular time steps for the non-stationary heat transfer problem.

Fig. 12 Results for 3D AC resistivity logging measurements with a 20 kHz wireline tool, for resistivities of formation layers presented in the second panel, for an axi-symmetric well as well as for wells tilted by 30 and 60 degrees.

Fig. 13 Results (horizontal and vertical velocity components) for the cavity problem, as well as the temperature distribution for the non-stationary heat transfer problem in the final time step t=0.127 s.


Acknowledgements The support of Polish MNiSW grant no. 3 T08B 055 29 is gratefully acknowledged. The first author is also supported by the Foundation for Polish Science under the Homing Programme. The second and third authors are supported by The University of Texas at Austin’s Joint Industry Research Consortium on Formation Evaluation sponsored by Aramco, Baker Atlas, BP, British Gas, ConocoPhillips, Chevron, ENI E&P, ExxonMobil, Halliburton Energy Services, Hydro, Marathon Oil Corporation, Mexican Institute for Petroleum, Occidental Petroleum Corporation, Petrobras, Schlumberger, Shell International E&P, Statoil, TOTAL, and Weatherford.

REFERENCES

[1] L. Demkowicz, 2D hp-Adaptive Finite Element Package, TICAM Report 02-06, The University of Texas at Austin (2002)

[2] M. Paszyński, J. Kurtz, L. Demkowicz, Parallel Fully Automatic hp Adaptive 2D Finite Element Package, Computer Methods in Applied Mechanics and Engineering, 195, 7-8 (2006) pp. 711-741

[3] L. Demkowicz, D. Pardo, W. Rachowicz, 3D hp-Adaptive Finite Element Package (3Dhp90) The Ultimate Data Structure for Three Dimensional, Anisotropic hp Refinements, TICAM Report 02-24, The University of Texas at Austin (2002)

[4] M. Paszyński, L. Demkowicz, Parallel Fully Automatic hp Adaptive 3D Finite Element Package, Engineering with Computers, 22, 3-4 (2006) pp. 255-276

[5] P. Matuszyk, M. Paszyński, Extensions of the 2D fully automatic hp adaptive Finite Element Method for Stokes and non-stationary heat transfer problems, 9th US National Congress on Computational Mechanics, USACM, San Francisco, USA (2007)

[6] D. Pardo, L. Demkowicz, C. Torres-Verdin, M. Paszyński, Simulation of Resistivity Logging-While-Drilling (LWD) Measurements Using a Self-Adaptive Goal-Oriented hp-Finite Element Method, SIAM Journal on Applied Mathematics, 66 (2006) pp. 2085-2106

[7] I. S. Duff, J. K. Reid, The multifrontal solution of indefinite sparse symmetric linear systems, ACM Trans. on Math. Soft., 9 (1983) pp. 302-325

[8] L. Giraud, A. Marocco, J.-C. Rioual, Iterative versus direct parallel substructuring methods in semiconductor device modelling, Numerical Linear Algebra with Applications, 12, 1 (2005) pp. 33-55

[9] J. A. Scott, Parallel Frontal Solvers for Large Sparse Linear Systems, ACM Trans. on Math. Soft., 29, 4 (2003) pp. 395-417

[10] P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, Multifrontal parallel distributed symmetric and unsymmetric solvers, Comput. Methods in Appl. Mech. Eng., 184 (2000) pp. 501-520

[11] P. R. Amestoy, I. S. Duff, J. Koster, J.-Y. L’Excellent, A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM Journal of Matrix Analysis and Applications, 23, 1 (2001) pp. 15-41

[12] P. R. Amestoy, A. Guermouche, J.-Y. L’Excellent, S. Pralet, Hybrid scheduling for the parallel solution of linear systems, accepted to Parallel Computing (2005)

[13] D. Pardo, V. Calo, C. Torres-Verdin, M. J. Nam, Fourier Series Expansion in a Non-Orthogonal System of Coordinates for Simulation of 3D Borehole Resistivity Measurements. Part I: DC, submitted to Computer Methods in Applied Mechanics and Engineering (2007)

[14] T. J. R. Hughes, L. P. Franca, A New FEM for Computational Fluid Dynamics: VII The Stokes Problem with Various Well-Posed Boundary Conditions: Symmetric Formulations that Converge for All Velocity/Pressure Spaces, Computer Methods in Applied Mechanics and Engineering, 65 (1987) pp. 85-96

[15] Lonestar Cluster Users’ Manual http://www.tacc.utexas.edu/services/userguides/lonestar
