Using MPI - Institut für Theoretische Chemie und Computerchemie

theochem.uni.duesseldorf.de


Using MPI - Parallelization of the Spin-Orbit-Free GAS-CI Program LUCITA

Stefan Knecht

Institut für Theoretische Chemie und Computerchemie

Heinrich-Heine-Universität Düsseldorf

Talk given at the institute seminar on 11 May 2006


Outline

1. Motivation

2. Theoretical Background

3. An Initial Application

4. Concluding Remarks


1. Motivation


SFB 663: molecular response upon electronic excitation

Figure 1: adenine-thymine base pair.

❍ the photophysics of the human DNA/RNA base pairs (BP) and of their corresponding monomers (M) is of particular interest
❍ lifetimes in the ps or fs regime of excited (singlet) states of the BPs and Ms are known from gas-phase spectroscopy
❍ the challenge for theoretical chemists consists of:
   ✗ a precise determination and identification of excited states
   ✗ a proper characterization of charge-transfer (CT) states
   ✗ providing explanations for the observed behaviour


Requirements

For an accurate treatment of the DNA/RNA bases in calculations, three features are clearly mandatory:

❍ large basis sets containing diffuse and polarization functions, e.g. aug-cc-pVXZ (X=D,T,Q,...)
❍ high-level ab initio electron correlation methods (DFT-methods):
   ✗ perturbation theory methods like MP2
   ✗ Coupled-Cluster (CC) methods: CCSD, CCSD(T), ..., CC2, CC3 ...
   ✗ Configuration Interaction (CI) and MR-CI methods
❍ the size of the molecules requires further approximations like the resolution of the identity (RI):
   ✗ reduction of the number of (ij|kl)-integrals
   ✗ significant reduction of computational time
   ✗ employment of larger basis sets becomes possible
   ✗ already available: RI-MP2, RI-CC2, ...


Requirements II

❍ the ground states of interesting (e.g. biological) molecules may be multi-reference cases:
   ✗ single-determinant based methods fail in this case
   ✗ MR-CI methods are particularly suitable for such systems
   → LUCITA: a string-driven GAS-CI program
❍ even with the intended RI approximation, today's computational resources demand a highly parallelized code
❍ the way the requested eigenvalues are computed offers two possible parallelization routes in LUCITA:
   ✗ coarse-grain
   ✗ fine-grain
   → Message Passing Interface (MPI) model


2. Theoretical Background

[Figure: GAS orbital-space diagram with the labels Frozen Core, GAS I, GAS II, GAS III, External and Truncated]


LUCITA - a GAS-CI program I

❍ molecules with ≫ 2 atoms: FCI wave functions are not feasible → truncated CI wave functions:
   ✗ complete active space (CAS) wave functions
   ✗ restricted active space (RAS) wave functions
   ✗ generalized active space (GAS) wave functions
❍ GAS wave functions:
   ✗ complete generalization of the RAS-CI method
   ✗ allow for an arbitrary division of the orbital spaces w.r.t. the chemical or physical problem considered
   ✗ no occupation constraints → guarantees a very flexible definition of a wave function
❍ $HC = EC$ with $C = \sum_i C_i |i\rangle = \dfrac{|0\rangle + \hat{P}|c\rangle}{\sqrt{1 + \langle c|\hat{P}|c\rangle}}$
❍ large CI expansions (more than $10^6$ determinants): setting up and diagonalizing $H$ in a straightforward fashion is not possible
❍ typically, only a few of the lowest eigenvalues are of interest

LUCITA - use of the modified Davidson-Algorithm

→ modified Davidson algorithm, a quasi-Newton method, featuring linear transformations: $\sigma = HC$

algorithm:

1. calculate $H_{ii}$
2. set up trial vectors building a subspace $\{C_k^{(0)}\}$ of dimension $M$
3. calculate the linear transformation $H C_k^{(0)} = \sigma_k$
4. calculate and diagonalize the projected matrix $\tilde{H}$ → $\lambda_k^{(M)}$, $\alpha_k^{(M)}$, with $\langle C_l^{(0)} | \sigma_k \rangle = \tilde{H}_{lk}$
5. calculate residuals $\| R_k \| = \left\| \sigma_k - \lambda_k^{(M)} \mathbf{1} C_k^{(0)} \right\|$
6. if not converged → new reference vectors
   $$C'_k = \frac{\left( H_{ii} - \lambda_k^{(M)} \mathbf{1} \right)^{-1} \left( \sigma_k - \lambda_k^{(M)} \mathbf{1} C_k^{(0)} \right)}{\left\| \left( H_{ii} - \lambda_k^{(M)} \mathbf{1} \right)^{-1} \left( \sigma_k - \lambda_k^{(M)} \mathbf{1} C_k^{(0)} \right) \right\|}$$
7. orthogonalize $C'_k$ to previous vectors
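The seven steps can be illustrated by a minimal Python sketch for a single lowest root. It assumes a small dense symmetric matrix held in memory and uses plain Davidson with a diagonal preconditioner, so it is not LUCITA's modified multi-root variant. The names `matvec`, `dot`, `jacobi_eigh` and `davidson_lowest` are hypothetical, the Jacobi routine merely stands in for the diagonalization of the projected matrix, and the residual is evaluated for the current Ritz vector.

```python
import math

def matvec(H, v):
    """Linear transformation sigma = H v (dense matrix as nested lists)."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def jacobi_eigh(A, sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(p + 1, n)) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):              # column rotation: M <- M J
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):            # row rotation: A <- J^T A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V

def davidson_lowest(H, tol=1e-8, max_iter=50):
    n = len(H)
    diag = [H[i][i] for i in range(n)]                     # step 1
    start = diag.index(min(diag))
    basis = [[float(i == start) for i in range(n)]]        # step 2
    sigmas = [matvec(H, basis[0])]                         # step 3
    for _ in range(max_iter):
        m = len(basis)
        # step 4: projected matrix <C_l|sigma_k>, built symmetrically
        Hproj = [[dot(basis[min(l, k)], sigmas[max(l, k)]) for k in range(m)]
                 for l in range(m)]
        evals, evecs = jacobi_eigh(Hproj)
        j = evals.index(min(evals))
        lam, alpha = evals[j], [evecs[i][j] for i in range(m)]
        ritz = [sum(alpha[i] * basis[i][x] for i in range(m)) for x in range(n)]
        # step 5: residual of the current Ritz pair
        resid = [sum(alpha[i] * sigmas[i][x] for i in range(m)) - lam * ritz[x]
                 for x in range(n)]
        if math.sqrt(dot(resid, resid)) < tol:
            break
        # step 6: diagonally preconditioned correction vector
        new = [r / (d - lam) if abs(d - lam) > 1e-10 else 0.0
               for r, d in zip(resid, diag)]
        for b in basis:                                    # step 7
            overlap = dot(b, new)
            new = [x - overlap * y for x, y in zip(new, b)]
        norm = math.sqrt(dot(new, new))
        if norm < 1e-12:
            break
        basis.append([x / norm for x in new])
        sigmas.append(matvec(H, basis[-1]))                # step 3 for the new vector
    return lam, ritz
```

Only one new σ-vector is generated per iteration, which is why this step dominates the cost for large expansions and is the natural target for parallelization.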


The two parallel routes - benefit from the Sigma-Vector scheme


time-consuming step in the algorithm: σ-vector generation → 2 schemes:

[Figures: schematic distribution of the σ-vector work among the master and the slaves (proc 1, proc 2, ..., proc 5, ...), relating blocks of C(J_α, J_β) to σ(I_α, I_β), which is collected on the master]

Figure 2: coarse-grain version.

Figure 3: fine-grain version.

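where

$$\sigma(I_\alpha, I_\beta) = \sum_{J_\alpha, J_\beta} \sum_{ijkl} \langle S(I_\beta) | a^\dagger_{i\beta} a_{j\beta} | S(J_\beta) \rangle \langle S(I_\alpha) | a^\dagger_{k\alpha} a_{l\alpha} | S(J_\alpha) \rangle \, (ij|kl) \, C(J_\alpha, J_\beta)$$

note: the strings $S(I_\alpha)$ and $S(I_\beta)$ are ordered products of creation operators $a^\dagger$ for MOs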
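Because σ is linear in C, the sum over the J strings can be cut into blocks whose partial σ contributions are simply added. The following Python sketch shows this for a generic dense matrix-vector product. It is an illustration under assumptions: a plain matrix `H` stands in for the string-driven integral contraction, and the names `sigma_full` and `sigma_from_batches` are hypothetical.

```python
def sigma_full(H, C):
    """One process computes the complete sigma vector: sigma_I = sum_J H_IJ C_J."""
    return [sum(H[i][j] * C[j] for j in range(len(C))) for i in range(len(H))]

def sigma_from_batches(H, C, n_batches):
    """Split the J range into batches; each batch yields a partial sigma
    that a collecting process sums up."""
    n = len(C)
    bounds = [(b * n) // n_batches for b in range(n_batches + 1)]
    sigma = [0] * len(H)
    for lo, hi in zip(bounds, bounds[1:]):
        # work that could be handed to one process
        partial = [sum(H[i][j] * C[j] for j in range(lo, hi)) for i in range(len(H))]
        sigma = [s + p for s, p in zip(sigma, partial)]
    return sigma

H = [[4, 1, 0, 2], [1, 3, 1, 0], [0, 1, 5, 1], [2, 0, 1, 6]]
C = [1, -2, 3, 0]
print(sigma_full(H, C), sigma_from_batches(H, C, 3))
```

Each batch only touches the part of `H` that belongs to its own J range, which mirrors the idea that a process needs only the integrals relevant to its share of the work.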


coarse-grain ↔ fine-grain

❍ a comparison of both variants points out the superiority of the fine-grain variant over the coarse-grain one:

coarse-grain                                    fine-grain
# of procs based on the # of eigenvalues (EV)   # of procs arbitrary
every proc → one σ-vector                       σ-vector split into an arbitrary # of batches
every proc → all MO integrals                   only the relevant MO integrals
'static' parallel version                       'dynamic' parallel version

❍ the fine-grain version meets the demand for more flexibility
   → the fine-grain version of LUCITA will soon be implemented in the DIRAC program package
❍ the coarse-grain version is already available in two different variants
❍ both versions have been or will be implemented using the MPI library
❍ no © → MPI is freeware

❍ master: orthogonalization of the current reference vectors
❍ master informs the slaves about the current # of roots
❍ slaves that are not required return to a waiting loop
❍ master and slaves build a new communication group

'broadcast route':
   1. distribution of all c-coefficients from the master (M) to the slaves (S) by MPI_Bcast
   2. σ-vector calculation
   3. each slave sends its new sigma-vector and the subspace matrix elements to M

'send route':
   1. stepwise distribution of individual c-coefficients from M to S by MPI_Send
   2. σ-vector calculation
   3. each slave sends its new sigma-vectors to M by MPI_Send
   4. master computes the subspace matrix elements

❍ computation of the projected Hamiltonian matrix
❍ slaves return to the waiting loop
❍ master: diagonalization of the projected Hamiltonian matrix

The MPI library - a short review

some essential features:

❍ communication between processes
❍ widely used: SPMD scheme (single program, multiple data)
❍ processes are uniquely identified by their rank (within a communicator)
❍ MPI offers a large number of library routines that cover nearly every need
❍ MPI_Send() and MPI_Recv() are the basic routines for (blocking) point-to-point communication
❍ MPI_Bcast() provides a very effective way of one-to-all communication

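[Figure: broadcast among 8 processes along a time axis; processes holding the data after each step: 0 → 0, 4 → 0, 2, 4, 6 → 0, 1, 2, 3, 4, 5, 6, 7]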
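The process numbers in the figure (0; then 0 and 4; then 0, 2, 4, 6; then 0 to 7) match a binomial-tree broadcast, in which the set of processes holding the data doubles in every round. The Python sketch below simulates such a tree without MPI. It assumes rank 0 as root; which algorithm a given MPI library actually uses inside MPI_Bcast() is implementation-dependent, and the name `binomial_broadcast` is hypothetical.

```python
def binomial_broadcast(n_procs):
    """Return, for each round, the sorted list of (sender, receiver) pairs."""
    step = 1
    while step * 2 < n_procs:      # largest power of two below n_procs
        step *= 2
    have = [0]                     # ranks that already hold the data
    rounds = []
    while step >= 1:
        pairs = sorted((s, s + step) for s in have if s + step < n_procs)
        if pairs:
            rounds.append(pairs)
            have.extend(r for _, r in pairs)
        step //= 2
    return rounds

for number, pairs in enumerate(binomial_broadcast(8), start=1):
    print(f"round {number}: {pairs}")
```

With 8 processes the data reaches everyone after 3 rounds, compared with 7 consecutive messages if the root sent to each process in turn.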


3. An Initial Application

’TIME IS MONEY !’


H2O - an appropriate test molecule

[Figure: GAS set-up for H2O with the spaces Frozen Core, GAS I, GAS II, GAS III, External and Truncated; orbital labels 1s, 2s, 2p, 2p(y), 1s(H) and correlating orbitals; excitation levels SD and SDTQ]

Figure 2: GAS-CI scheme for H2O used in the timing tests.








❍ employing the Dirac-Coulomb Hamiltonian → spin-orbit interactions neglected
❍ correlation of all 10 electrons
❍ 147 basis functions in total (L+S)
❍ cut-off for the MO transformation: 25 a.u.
❍ 5 roots requested
❍ the GAS-CI set-up (SDTQ-mrSD) results in a maximum of 9,472,264 determinants/combinations
❍ three routes of calculation:
   ✗ serial run (1 proc)
   ✗ parallel run ('send' version; 5 procs)
   ✗ parallel run ('broadcast' version; 5 procs)

H2O - timing test results

❍ calculations were performed on FATS and JUMP and stopped after 20 iterations

                 FATS                            JUMP
                 serial   'send'   'broadcast'   serial   'send'   'broadcast'
CPU-time (s)     70014    46033    56713         38275    26931    25182
WALL-time (s)    70005    47380    64683         56511    55567    38310

❍ FATS and JUMP: the parallel calculations need between about 19% and 34% less CPU time than the serial one
❍ comparing WALL-time with CPU-time suggests an I/O or communication-traffic overload
❍ in a fine-grain variant these problems should be minimized because of a reduced I/O and communication load
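The percentages can be recomputed from the table with a few lines of Python. The sketch reads "speed-up" as the relative saving in CPU time with respect to the serial run, which is an assumption about how the figures were derived. The WALL/CPU ratio is added as a rough indicator of time spent outside pure computation, and the function names are hypothetical.

```python
cpu = {
    "FATS": {"serial": 70014, "send": 46033, "broadcast": 56713},
    "JUMP": {"serial": 38275, "send": 26931, "broadcast": 25182},
}
wall = {
    "FATS": {"serial": 70005, "send": 47380, "broadcast": 64683},
    "JUMP": {"serial": 56511, "send": 55567, "broadcast": 38310},
}

def reduction(serial, parallel):
    """Relative saving of a parallel run with respect to the serial run."""
    return 1.0 - parallel / serial

def report():
    """One row per machine and parallel variant: CPU saving and WALL/CPU ratio."""
    rows = []
    for machine in cpu:
        for variant in ("send", "broadcast"):
            saving = reduction(cpu[machine]["serial"], cpu[machine][variant])
            overhead = wall[machine][variant] / cpu[machine][variant]
            rows.append((machine, variant, saving, overhead))
    return rows

for machine, variant, saving, overhead in report():
    print(f"{machine:5s} {variant:10s} CPU saving {saving:6.1%}   WALL/CPU {overhead:5.2f}")
```

A WALL/CPU ratio well above 1 means that the run spent a large share of its elapsed time waiting, for instance on I/O or on messages.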


H2O - more numbers ...

❍ another set of timing tests was carried out for the H2O molecule with the following set-up:
   ✗ 10 electrons, two generalized active spaces (GAS), 4 roots
   ✗ a maximum of either quadruple (SDTQ) or quintuple (SDTQ5) excitations → 5,354,278 and 62,529,742 determinants, respectively
❍ the tests were run on FATS with 4 procs on 4 different nodes (4/1) or with 2 procs on each of 2 nodes (2/2); serial calculations were performed with a single proc (1/2)

SDTQ, 35 iterations
                 4/1 'send'   2/2 'send'   4/1 'bcast'   2/2 'bcast'   1/2
CPU-time (s)     35586        39883        43596         35655         41877
WALL-time (s)    36124        39926        44576         37926         41878

SDTQ5, 41 iterations
                 4/1 'send'   2/2 'send'   4/1 'bcast'   2/2 'bcast'   1/2
CPU-time (s)     422460       394740       383460        393660        544200
WALL-time (s)    555660       653940       541980        576660        689580


4. Concluding Remarks


Summary and Outlook


❍ a variety of methods can be employed to calculate excited states of molecules

❍ however, the size and the electronic structure of most molecules limit the options

❍ the GAS-CI concept is a powerful technique with a wide range of applications

❍ computational limitations will be reduced significantly, in particular by an efficient parallelization route

❍ the current coarse-grain variants show a moderate to satisfactory speed-up w.r.t. a serial run → the fine-grain version, which is not yet implemented, should provide a more significant reduction in CPU and WALL time, as should the implementation of an RI approximation

❍ a fine-grain version will also be included in the 4c-GAS-CI code LUCIAREL, which is likewise available within the DIRAC program package + ...


I would like to thank:

⋆ Priv.-Doz. Dr. Timo Fleig

⋆ Prof. Dr. Christel Marian

⋆ Lasse Sørensen, Stephan Raub, Martin Kleinschmidt, ..., just the whole group


Thank You For Your Attention !
