
PhD Thesis

Optimization of Densities in Hartree-Fock and Density-functional Theory

Atomic Orbital Based Response Theory

and

Benchmarking for Radicals

Lea Thøgersen

Department of Chemistry

University of Aarhus

2005


“Experiments are the only means of knowledge at our disposal. The rest is poetry, imagination.”

Max Planck


Contents

Preface

List of Publications

Part 1 Improving Self-consistent Field Convergence

1.1 Introduction

1.2 The Self-consistent Field Method

1.3 A Survey of Methods for Improving SCF Convergence

1.3.1 Energy Minimization

1.3.2 Damping and Extrapolation

1.3.3 Level Shifting

1.4 Development of SCF Optimization Algorithms

1.4.1 Dynamically Level Shifted Roothaan-Hall

1.4.1.1 RH Step with Control of Density Change

1.4.1.2 The Trust Region RH Level Shift

1.4.1.3 DIIS and Dynamically Level Shifted RH

1.4.1.4 Line Search TRRH

1.4.1.5 Optimal Level Shift without MO Information

1.4.1.6 The Trace Purification Scheme

1.4.2 Density Subspace Minimization

1.4.2.1 The Trust Region DSM Parameterization

1.4.2.2 The Trust Region DSM Energy Function

1.4.2.3 The Trust Region DSM Minimization

1.4.2.4 Line Search TRDSM

1.4.2.5 The Missing Term

1.4.3 Energy Minimization Exploiting the Density Subspace

1.4.3.1 The Augmented RH Energy Model

1.4.3.2 The Augmented RH Optimization

1.4.3.3 Applications

1.5 The Quality of the Energy Models for HF and DFT

1.5.1 The Quality of the TRRH Energy Model

1.5.2 The Quality of the TRDSM Energy Model

1.6 Convergence for Problems with Several Stationary Points

1.6.1 Walking Away from Unstable Stationary Points

1.6.1.1 Theory

1.6.1.2 Examples

1.7 Scaling

1.7.1 Scaling of TRRH

1.7.2 Scaling of TRDSM

1.8 Applications

1.8.1 Calculations on Small Molecules

1.8.2 Calculations on Metal Complexes

1.9 Conclusion

Part 2 Atomic Orbital Based Response Theory

2.1 Introduction

2.2 AO Based Response Equations in Second Quantization

2.2.1 The Parameterization

2.2.2 The Linear Response Function

2.2.3 The Time Development of the Reference State

2.2.4 The First-order Equation

2.2.5 Pairing

2.3 Solving the Response Equations

2.3.1 Preconditioning

2.3.2 Projections

2.4 The Excited State Gradient

2.4.1 Construction of the Lagrangian

2.4.2 The Lagrange Multipliers

2.4.3 The Geometrical Gradient

2.4.4 The First-order Excited State Properties

2.5 Test Calculations

2.6 Conclusion

Part 3 Benchmarking for Radicals

3.1 Introduction

3.2 Computational Methods

3.3 Numerical Results

3.3.1 Convergence of CC and CI Hierarchies

3.3.2 The Potential Curve for CN

3.3.3 Spectroscopic Constants and Atomization Energy for CN

3.3.4 The Vertical Electron Affinity of CN

3.3.5 The Equilibrium Geometry of CCH

3.4 Conclusion

Summary

Dansk Resumé (Danish Summary)

Appendix A

Appendix B

Acknowledgements

References


Preface

The present PhD thesis is the outcome of four years of PhD studies at the Faculty of Science, University of Aarhus, Denmark.

The thesis is divided into three distinct parts, which can be read independently. Part 1 deals with the optimization of the one-electron density in Hartree-Fock and density functional theory. Part 2 deals with atomic orbital based response theory for Hartree-Fock and density functional theory, and it thus follows naturally after Part 1. Part 3 presents benchmark results from full configuration interaction (FCI) calculations on the radicals CN and CCH.

The work presented in Part 1 has resulted in papers I–III, as listed in the List of Publications that follows. The work presented in Part 3 has resulted in papers V–VI. The work presented in Part 2 was initiated in the fall of 2004 and will result in paper IV. I have spent most of my time on developing improved optimization algorithms for self-consistent field calculations, and Part 1 therefore makes up the larger part of this thesis.

The work has been carried out under the supervision of, and in collaboration with, Dr. Jeppe Olsen and Professor Poul Jørgensen at the University of Aarhus. Some of the work was carried out during visits to the Royal Institute of Technology in Stockholm, Sweden, the University of Trieste, Italy, and the University of Oslo, Norway. The following people have also contributed to the work presented in this thesis (see List of Publications): Paweł Sałek (The Royal Institute of Technology in Stockholm), Sonia Coriani (University of Trieste), Trygve Helgaker (University of Oslo), Stinne Høst (University of Aarhus), Danny Yeager (Texas A&M University), Andreas Köhn (University of Aarhus), Jürgen Gauss (University of Mainz), Péter Szalay (Eötvös Loránd University) and Mihály Kállay (University of Mainz).

The thesis is outlined as follows. Part 1 is based on the published papers I–II and the unpublished paper III, but it can be read independently of the papers. Certain discussions in papers I–II are omitted from the thesis and only referred to, since they can just as well be read in the papers themselves. Other discussions that are not published in the papers are presented in this thesis, including the latest developments of the algorithms. Part 2 is simply paper IV, which is in preparation. Part 3 is based on the published papers V–VI and is essentially a short version of paper V combined with selected results from paper VI. This part can also be read independently of the papers.


List of Publications

This thesis includes the following papers. Papers I, II, V and VI have already been published and are attached to this thesis, whereas III and IV are in preparation.

Part 1

I. The Trust-region Self-consistent Field Method: Towards a Black Box optimization in Hartree-Fock and Kohn-Sham Theories,

L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker,

J. Chem. Phys. 121, 16 (2004)

II. The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory,

L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker,

J. Chem. Phys. 123, 074103 (2005)

III. Augmented Roothaan-Hall for converging Densities in Hartree-Fock and Density-functional Theory,

S. Høst, L. Thøgersen, P. Jørgensen and J. Olsen

Part 2

IV. Atomic Orbital Based Response Theory,

L. Thøgersen, P. Jørgensen, J. Olsen and S. Coriani

Part 3

V. A Coupled Cluster and Full Configuration Interaction Study of CN and CN⁻,

L. Thøgersen and J. Olsen,

Chem. Phys. Lett. 393, 36 (2004)

VI. Equilibrium Geometry of the Ethynyl (CCH) Radical,

P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss,

J. Phys. Chem. A 108, 3030 (2004)


Part 1<br />

Improving Self-consistent Field Convergence

1.1 Introduction

The Hartree-Fock (HF) self-consistent field (SCF) method has existed in an orbital formulation since 1951, when it was introduced by Roothaan [1] and Hall [2], and today it is as significant as ever. Numerous more highly correlated methods with superior accuracy have been developed since then. Most of them, however, still use the Hartree-Fock wave function as the reference function, and they therefore still depend on a functioning Hartree-Fock optimization. When Kohn and Sham [3] recognized in 1965 that the Roothaan-Hall SCF scheme had much to offer the density optimization in density functional theory (DFT), the DFT methods entered the chemical scene. In principle, it was now also possible to obtain exact results from SCF calculations, if only the correct functional could be found. Over the last decade, developments in computer hardware and in linear-scaling SCF algorithms have made it possible to carry out ab initio quantum chemical calculations on biomolecules with hundreds of amino acids and on large molecules relevant for nanoscience.

Quantum chemical calculations are thus becoming a widespread tool in many scientific fields. It is therefore important that the algorithms work as black boxes, so that users outside quantum chemistry need not be concerned with the details of the calculations. If the SCF optimization does not converge, neither the highly correlated calculations nor the large-scale calculations can produce any results. It is therefore necessary to develop a sound, stable optimization scheme that can handle the complexity of future problems.

This part of my thesis is a contribution to the quest for a black-box SCF optimization algorithm with optimal convergence properties. Section 1.2 presents the basic Hartree-Fock/Kohn-Sham theory and the notation used in this part of the thesis. Section 1.3 reviews the efforts made over the years to improve the Roothaan-Hall SCF scheme. Our contributions to the development of stable and physically sound SCF optimization schemes are presented in Section 1.4, and in Section 1.5 we study the quality of the schemes when applied to HF and DFT. Section 1.6 discusses the optimization of problems with several stationary points, and Section 1.7 accounts for the scaling of the algorithms. Section 1.8 contains convergence examples for HF and DFT calculations that use the algorithms presented in Section 1.4. Finally, Section 1.9 reviews the results of this part of the thesis in some concluding remarks.

1.2 The Self-consistent Field Method

In the following we consider a closed-shell system with N/2 electron pairs. The basic theory of the Hartree-Fock (HF) and Kohn-Sham (KS) density optimizations will be described simultaneously, and the differences will be noted as they appear. Since we are interested in extending the algorithms to large-scale calculations, a formulation without reference to the delocalized molecular orbitals (MOs) is essential. The focus will therefore be on the density in the atomic orbital (AO) basis rather than on the MOs themselves. Throughout the thesis, SCF will be used as a general term for the HF and KS-DFT methods, since they have the SCF optimization scheme in common. The orbital index convention used in this thesis is i, j, k, l for occupied MOs, a, b, c, d for virtual MOs, p, q for general MOs, and Greek letters µ, ν, ρ, σ for AOs.

For closed-shell restricted Hartree-Fock or DFT, the electronic energy is given by

$$E_{\mathrm{SCF}} = 2\,\mathrm{Tr}\,\mathbf{h}\mathbf{D} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D}) + h_{\mathrm{nuc}} + E_{\mathrm{XC}}(\mathbf{D}), \tag{1.1}$$

where h is the one-electron Hamiltonian matrix in the AO basis and $h_{\mathrm{nuc}}$ is the nuclear-nuclear repulsion contribution. D is the (scaled) one-electron density matrix in the AO basis, $\mathbf{D} = \tfrac{1}{2}\mathbf{D}_{\mathrm{AO}}$, which satisfies the symmetry, trace, and idempotency conditions of a valid one-electron density matrix,

$$\mathbf{D}^{T} = \mathbf{D}, \qquad \mathrm{Tr}\,\mathbf{D}\mathbf{S} = \frac{N}{2}, \qquad \mathbf{D}\mathbf{S}\mathbf{D} = \mathbf{D}. \tag{1.2}$$

S is the AO overlap matrix. The elements of G(D) are given by

$$G_{\mu\nu}(\mathbf{D}) = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{1.3}$$

where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term in Eq. (1.3) represents the Coulomb contribution. The second term is the contribution from exact exchange, with γ = 1 in Hartree-Fock theory, γ = 0 in pure DFT, and γ ≠ 0 in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(\mathbf{D})$ in Eq. (1.1) is a nonlinear and non-quadratic functional of the electronic density. This term is present only in the energy expression at the DFT level of theory; the Hartree-Fock energy is given by the first three terms of Eq. (1.1) alone. The form of $E_{\mathrm{XC}}$ depends on the DFT functional chosen for the calculation.
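The energy expression can be made concrete with a small sketch. The Python/NumPy functions below evaluate G(D) from Eq. (1.3) and E_SCF from Eq. (1.1). They assume a dense four-index array `g[mu, nu, rho, sigma]` holding g_µνρσ. The function names, the dense storage, and the optional callable `e_xc` standing in for E_XC(D) are illustrative assumptions, not the implementation used in this work; a large-scale code would never store the full integral array.

```python
import numpy as np

def g_matrix(D, g, gamma=1.0):
    """Eq. (1.3): G_mn = 2 sum_rs g[m,n,r,s] D[r,s] - gamma * sum_rs g[m,s,r,n] D[r,s]."""
    coulomb = np.einsum("mnrs,rs->mn", g, D)    # Coulomb term
    exchange = np.einsum("msrn,rs->mn", g, D)   # exact-exchange term
    return 2.0 * coulomb - gamma * exchange

def scf_energy(D, h, g, h_nuc=0.0, gamma=1.0, e_xc=None):
    """Eq. (1.1); e_xc is a hypothetical callable returning E_XC(D) (None for Hartree-Fock)."""
    energy = 2.0 * np.trace(h @ D) + np.trace(D @ g_matrix(D, g, gamma)) + h_nuc
    if e_xc is not None:
        energy += e_xc(D)
    return energy
```

With `gamma=1.0` and `e_xc=None` this gives the Hartree-Fock energy. With `gamma=0.0` and a supplied `e_xc` it corresponds to pure DFT.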

The first derivative of the electronic energy with respect to the density is

$$\mathbf{E}^{(1)}_{\mathrm{SCF}}(\mathbf{D}) = \frac{\partial E_{\mathrm{SCF}}(\mathbf{D})}{\partial \mathbf{D}} = 2\,\mathbf{F}(\mathbf{D}), \tag{1.4}$$

where

$$\mathbf{F}(\mathbf{D}) = \mathbf{h} + \mathbf{G}(\mathbf{D}) + \tfrac{1}{2}\mathbf{E}^{(1)}_{\mathrm{XC}}(\mathbf{D}) \tag{1.5}$$

is the Kohn-Sham matrix in DFT and, if the last term is excluded, the Fock matrix in Hartree-Fock theory. From now on F(D) is simply referred to as the Fock matrix. $\mathbf{E}^{(1)}_{\mathrm{XC}}(\mathbf{D})$ is the first derivative of $E_{\mathrm{XC}}$ with respect to the density.
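Equation (1.4) can be checked numerically. The sketch below assumes Hartree-Fock (no XC term) and a dense integral array `g[mu, nu, rho, sigma]`. It compares a central finite difference of the energy along a symmetric direction X with the analytic directional derivative Tr(2F X). The HF energy is quadratic in D, so the two agree to rounding error. The helper names are hypothetical.

```python
import numpy as np

def fock_matrix(D, h, g, gamma=1.0, e_xc_grad=None):
    """Eq. (1.5): F(D) = h + G(D) + 1/2 E_XC^(1)(D); e_xc_grad is an optional hypothetical callable."""
    coulomb = np.einsum("mnrs,rs->mn", g, D)
    exchange = np.einsum("msrn,rs->mn", g, D)
    F = h + 2.0 * coulomb - gamma * exchange
    if e_xc_grad is not None:
        F = F + 0.5 * e_xc_grad(D)
    return F

def hf_energy(D, h, g, gamma=1.0):
    """Electronic part of Eq. (1.1) without E_XC and h_nuc."""
    G = fock_matrix(D, h, g, gamma) - h
    return 2.0 * np.trace(h @ D) + np.trace(D @ G)

def directional_derivative_check(D, X, h, g, gamma=1.0, eps=1e-5):
    """Return (finite-difference, analytic) derivatives of E along a symmetric direction X."""
    numeric = (hf_energy(D + eps * X, h, g, gamma)
               - hf_energy(D - eps * X, h, g, gamma)) / (2.0 * eps)
    analytic = np.trace(2.0 * fock_matrix(D, h, g, gamma) @ X)   # Tr(E^(1) X) with E^(1) = 2F
    return numeric, analytic
```

The factor 2 in Eq. (1.4) arises because Tr DG(D) is quadratic in D: differentiating it gives 2G(D), not G(D).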

By design, the Fock matrix is an effective one-electron Hamiltonian that depends, through the density, on its own eigenfunctions. Optimizing the electronic energy is thus a nonlinear problem, and an iterative scheme must be applied. In 1951 Roothaan and Hall suggested an iterative procedure [1,2]. In each step it constructs a set of molecular orbitals (MOs) by diagonalizing the current Fock matrix, which in the AO formulation is written as

$$\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\varepsilon}, \tag{1.6}$$

where S is the AO overlap matrix, ε is a diagonal matrix containing the orbital energies, and the eigenvectors C contain the MO coefficients. The MOs, $\phi_p$, are linear combinations of a finite set of one-electron basis functions, $\chi_\mu$, with $C_{\mu p}$ as expansion coefficients

$$\phi_p = \sum_{\mu} \chi_\mu C_{\mu p}. \tag{1.7}$$

For the closed-shell case the MOs can be divided into an occupied ($\phi_{\mathrm{occ}}$) and a virtual ($\phi_{\mathrm{virt}}$) part, where the occupied MOs each contain two electrons and the virtual orbitals are empty. If the aufbau ordering rule is applied, the occupied MOs are chosen as those with the lowest eigenvalues. A new trial density D can then be constructed from the occupied orbitals as

$$\mathbf{D} = \mathbf{C}_{\mathrm{occ}}\mathbf{C}_{\mathrm{occ}}^{T}. \tag{1.8}$$

From this density a new Fock matrix can be evaluated from Eq. (1.5), and diagonalizing it according to Eq. (1.6) establishes the iterative procedure. The iterative cycle stops when self-consistency is obtained, that is, when the new density, energy or molecular orbitals no longer change, within some convergence threshold, compared to the previous ones.
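The sketch below shows one Roothaan-Hall step, assuming a dense symmetric Fock matrix and a positive-definite overlap matrix. The generalized eigenvalue problem of Eq. (1.6) is reduced to a standard one by Löwdin (symmetric) orthogonalization with S^(-1/2); this choice is made here for illustration. The lowest `n_occ` orbitals are then occupied (aufbau), and Eq. (1.8) gives the new density. The function name `roothaan_hall_step` is hypothetical.

```python
import numpy as np

def roothaan_hall_step(F, S, n_occ):
    """Solve FC = SC eps (Eq. 1.6) and return (D, eps, C) with D = C_occ C_occ^T (Eq. 1.8)."""
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T   # S^(-1/2), symmetric
    eps, C_orth = np.linalg.eigh(X @ F @ X)           # eigenvalues in ascending order
    C = X @ C_orth                                    # MO coefficients with C^T S C = 1
    C_occ = C[:, :n_occ]                              # aufbau: lowest n_occ orbitals
    return C_occ @ C_occ.T, eps, C
```

Because the MO coefficients obey CᵀSC = 1, the returned density satisfies the symmetry, trace and idempotency conditions of Eq. (1.2) by construction.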


An iterative scheme needs a start guess. For the SCF case, the start guess should be a one-electron density that fulfils Eq. (1.2). It can be created directly, or from a start guess for the molecular orbitals as in Eq. (1.8). Different approaches are used. A simple and easily applicable possibility is to obtain the starting orbitals by diagonalizing the one-electron Hamiltonian (H1-core). This is the start guess used most widely in this thesis, since it is always available. Another popular possibility is a semi-empirical start guess, in which the orbitals from a semi-empirical calculation (e.g. Hückel) on the molecule are fitted to the current basis.

[Fig. 1.1 Flow diagram of the SCF scheme: $D_0 \rightarrow F(D_n) \rightarrow D_{n+1}$; if $D_{n+1} \approx D_n$ (yes), the converged density $D_{\mathrm{conv}}$ is obtained, otherwise (no) n = n+1 and the cycle is repeated.]

Fig. 1.1 summarizes the steps of the self-consistent field (SCF) scheme from the density point of view. A Fock matrix is constructed from a start guess for the density matrix. A new density matrix is then found from this Fock matrix. This establishes an iterative procedure that continues until self-consistency is reached. Throughout this thesis, the step that creates a new density from a Fock matrix will be referred to as the Roothaan-Hall (RH) step, regardless of whether it is a diagonalization of the Fock matrix or some alternative scheme.

The purpose of an SCF optimization is typically to find the global minimum. Since the HF/KS equations are nonlinear, several stationary points may exist. Depending on the start guess and the optimization procedure, the converged result may be a local minimum, the global minimum, or even a saddle point. The sign of the lowest Hessian eigenvalue reveals whether the stationary point is a minimum or a saddle point, but no simple test can reveal whether a minimum is global. In this thesis, the term “convergence” simply refers to the iterative development from the start guess to a self-consistent density with a gradient below the convergence threshold. The issues connected with problems where several stationary points can be found are discussed in Section 1.6.
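The fixed-point scheme of Fig. 1.1 can be sketched as follows. The loop starts from the H1-core guess and stops when the largest change in a density element falls below a threshold. This is a toy illustration under stated assumptions: Hartree-Fock only (no XC term), a dense integral array `g[mu, nu, rho, sigma]`, and hypothetical function names. As noted in the text, this plain iteration is not guaranteed to converge. In the usage example below, the two-electron interaction is made deliberately weak so that it does.

```python
import numpy as np

def fock_matrix(D, h, g, gamma=1.0):
    """Hartree-Fock Fock matrix: Eq. (1.5) without the XC term, G(D) from Eq. (1.3)."""
    coulomb = np.einsum("mnrs,rs->mn", g, D)
    exchange = np.einsum("msrn,rs->mn", g, D)
    return h + 2.0 * coulomb - gamma * exchange

def rh_step(F, S, n_occ):
    """RH step: solve FC = SC eps via S^(-1/2) and occupy the n_occ lowest MOs (aufbau)."""
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    _, C_orth = np.linalg.eigh(X @ F @ X)
    C_occ = (X @ C_orth)[:, :n_occ]
    return C_occ @ C_occ.T

def plain_scf(h, S, g, n_occ, gamma=1.0, tol=1e-10, max_iter=100):
    """Fixed-point loop of Fig. 1.1; returns (density, iterations used, converged flag)."""
    D = rh_step(h, S, n_occ)                  # D_0 from the H1-core guess
    for iteration in range(1, max_iter + 1):
        F = fock_matrix(D, h, g, gamma)       # F(D_n)
        D_new = rh_step(F, S, n_occ)          # D_{n+1}
        if np.max(np.abs(D_new - D)) < tol:   # D_{n+1} ≈ D_n: self-consistent
            return D_new, iteration, True
        D = D_new
    return D, max_iter, False                 # no convergence within max_iter
```

At a converged density, FDS − SDF vanishes. This is a common stationarity check in AO-based codes; using it here is an assumption of this sketch, not a criterion stated above.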

Roothaan and Hall suggested the iterative diagonalization procedure as a means to solve the Hartree-Fock equations, and Kohn and Sham suggested using the same scheme to optimize the electron density in density functional theory [3]. Since then, the SCF methods have been used extensively in quantum chemistry. Unfortunately, it turned out that the simple fixed-point scheme sketched in Fig. 1.1 converges only in simple cases. Already around 1960 it was recognized that the method sometimes fails to converge and that in some cases the divergent behavior is intrinsic [4,5].


1.3 A Survey of Methods for Improving SCF Convergence

Numerous suggestions have been made to improve upon the convergence of Roothaan and Hall’s original scheme or to replace it with an alternative scheme. The suggestions can be crudely divided into three categories: energy minimization, damping/extrapolation, and level shifting. Furthermore, the suggestions in these categories have been combined in various ways. The two latter categories are modifications to the Roothaan-Hall scheme. Energy minimization, in contrast, avoids the iterative diagonalization scheme and instead applies an optimization scheme to an energy function.

To my knowledge, these categories embrace all convergence improvements suggested over the years, except for the method of fractionally occupying orbitals around the Fermi level [6], which does not fit in any of the categories. As mentioned, the start guess has a great impact on the optimization. A poor start guess with the wrong electron configuration may spend many iterations changing to a more favorable electron configuration. In some cases the proper electron configuration is never found, and the calculation diverges. In the methods using fractional occupations, a number of orbitals around the Fermi level are allowed to have non-integral occupations. These occupations are determined from the Fermi-Dirac distribution, which is a function of the temperature. They are updated in each iteration and corrected such that the total number of electrons is constant. During the optimization, either the temperature is decreased to T = 0 K or the number of orbitals allowed to have non-integral occupation is decreased, so that only integer occupations remain at the end of the optimization. The electron configuration can thus be optimized effectively at the beginning of the SCF optimization. Once the proper configuration has been found, the rest of the optimization has a better chance of converging, since the start guess has in effect been improved.
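The fractional-occupation idea can be illustrated with a short pure-Python sketch. It assumes a closed-shell system in which each orbital holds at most two electrons. The occupations follow the Fermi-Dirac distribution at temperature kT. The chemical potential (Fermi level) is found by bisection so that the total number of electrons is fixed. The function name, the bisection, and the energy units are assumptions; the schemes cited in [6] may differ in detail.

```python
import math

def fermi_dirac_occupations(orbital_energies, n_electrons, kT, tol=1e-12):
    """Closed-shell occupations n_p = 2 / (1 + exp((eps_p - mu) / kT)), with mu chosen by
    bisection so that sum(n_p) = n_electrons (requires 0 < n_electrons < 2 * len(energies))."""
    def occupations(mu):
        occ = []
        for e in orbital_energies:
            x = (e - mu) / kT
            if x > 0:                      # numerically safe form of 1 / (1 + exp(x))
                z = math.exp(-x)
                f = z / (1.0 + z)
            else:
                f = 1.0 / (1.0 + math.exp(x))
            occ.append(2.0 * f)
        return occ

    lo = min(orbital_energies) - 50.0 * kT - 1.0   # all orbitals essentially empty
    hi = max(orbital_energies) + 50.0 * kT + 1.0   # all orbitals essentially doubly occupied
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if sum(occupations(mu)) < n_electrons:     # too few electrons: raise the Fermi level
            lo = mu
        else:
            hi = mu
    return occupations(0.5 * (lo + hi))
```

Lowering kT towards zero recovers integer (aufbau) occupations. This mirrors the gradual temperature decrease to T = 0 K described above.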

In the following, the focus will be on efforts to improve the convergence behavior of the SCF scheme through optimization algorithm development in the three categories listed above. Other efforts are equally significant and should also be acknowledged. In particular, many well-functioning schemes have been generalized to the unrestricted level of theory, which has its own challenges. The quest for an improved start guess is also important: with a better start guess, less is demanded of the optimization method, and some convergence problems inherent in the methods can thus be avoided. In the last decade, much of the effort in SCF scheme development has gone into decreasing the scaling of the methods to allow calculations on larger molecules. Scaling is a very important subject that should not be ignored, and Section 1.7 therefore discusses the scaling of the algorithms presented in this thesis. Despite the importance of these three SCF-related subjects, the rest of this section focuses almost solely on efforts to improve convergence through optimization algorithm development.

1.3.1 Energy Minimization

One of the problems in the simple Roothaan-Hall procedure is that the iterative steps are not guaranteed to lower the energy. McWeeny pointed this out and therefore introduced a steepest descent procedure [7,8] as an energy minimization alternative to Roothaan and Hall’s repeated diagonalizations. Steepest descent optimizations have the benefit that a decrease in energy can be guaranteed in each step. McWeeny’s scheme suffers, however, from a slow convergence rate [5], as is often the case for steepest descent methods. Fletcher and Reeves instead proposed the conjugate gradient optimization method [9], which is often more efficient than steepest descent. For a quadratic function with exact line searches, it is guaranteed to converge in a number of steps equal to the dimension of the problem.
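The finite-termination property mentioned above can be illustrated with linear conjugate gradients. The sketch applies exact line searches and the Fletcher-Reeves choice of β to the quadratic model q(x) = ½xᵀAx − bᵀx with a symmetric positive-definite A. This is a generic textbook algorithm, not an SCF implementation. For non-quadratic SCF energies the n-step guarantee no longer holds.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12):
    """Minimize q(x) = 1/2 x^T A x - b^T x for symmetric positive-definite A.
    Returns (x, number of steps taken); at most len(b) steps in exact arithmetic."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                          # residual = minus the gradient of q
    p = r.copy()                           # first direction: steepest descent
    for k in range(n):
        if np.linalg.norm(r) < tol:
            return x, k
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)         # exact line search along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)   # Fletcher-Reeves formula
        p = r_new + beta * p               # new A-conjugate search direction
        r = r_new
    return x, n
```

Replacing the update of `p` by `p = r_new` turns the method back into steepest descent with exact line searches, which typically needs many more iterations.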

A decade later, Hillier and Saunders suggested an improvement to the McWeeny scheme called energy-weighted steepest descent [10], in which the coordinates in the orbital space are energy-weighted. In 1976 this work was generalized by Seeger and Pople. They realized that another problem in the simple Roothaan procedure is the possibility of discontinuous changes in the orbitals, which do not necessarily lower the energy. To ensure energy descent, such changes must be followed continuously, and methods like steepest descent are able to do so. Their procedure proceeds in small steps, where the new occupied trial orbitals are selected based on a criterion of overlap with the previous set. This technique ensures stability and avoids switching of orbital occupations. The step is found by a univariate search [11] in the energy, on a path that passes through the point corresponding to the next iteration step of the classical procedure. Their scheme can therefore also be seen as a polynomial interpolation along a path joining successive SCF cycles. Half a decade later, Camp and King followed the same strategy with a univariate cubic fit technique [12], but with a different parameterization. Stanton also suggested a similar approach [13]. The Seeger-Pople approach requires the evaluation of the Fock matrix at interior points on the interpolative path. Stanton’s scheme instead uses a cubic interpolation in which only the end-point properties are needed, which makes it a less expensive method.
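The end-point cubic interpolation used in approaches like Stanton’s [13] can be sketched generically. Given the energy and its directional derivative at the start (t = 0) and end (t = 1) of a path, the sketch fits a cubic and takes its minimizer on [0, 1]. The parameterization of the path by t and the function name are assumptions; only the idea of using end-point information comes from the text.

```python
import math

def cubic_line_search(f0, d0, f1, d1):
    """Fit phi(t) = a t^3 + b t^2 + c t + f0 to phi(0)=f0, phi'(0)=d0, phi(1)=f1, phi'(1)=d1
    and return the t in [0, 1] that minimizes the fitted cubic."""
    a = d0 + d1 - 2.0 * (f1 - f0)
    b = 3.0 * (f1 - f0) - 2.0 * d0 - d1
    c = d0
    phi = lambda t: ((a * t + b) * t + c) * t + f0
    candidates = [0.0, 1.0]                   # end points are always candidates
    if abs(a) > 1e-14:                        # genuine cubic: roots of 3a t^2 + 2b t + c = 0
        disc = b * b - 3.0 * a * c
        if disc >= 0.0:
            sq = math.sqrt(disc)
            candidates += [(-b + sq) / (3.0 * a), (-b - sq) / (3.0 * a)]
    elif abs(b) > 1e-14:                      # degenerate (quadratic) case
        candidates.append(-c / (2.0 * b))
    return min((t for t in candidates if 0.0 <= t <= 1.0), key=phi)
```

For φ(t) = t³ − 0.75t, the end-point data (0, −0.75, 0.25, 2.25) reproduce the cubic exactly, and the function returns its minimizer t = 0.5.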

Another way of improving the convergence properties is to evaluate the gradient and Hessian of the<br />

electronic energy analytically with respect to some variational parameter, and then optimize the<br />

energy through Newton-Raphson steps resulting in a quadratically convergent 14 scheme, at least in<br />

the region close to the optimized state where a second order approximation is reasonable. These<br />

methods are computationally very expensive since a four index transformation is required to obtain<br />

the Hessian information. In 1981 Bacskay proposed a quadratically convergent SCF (QC-SCF)<br />

method 15 which escapes the four index transformation while requiring four or five micro iterations<br />

6


A Survey of Methods for Improving SCF Convergence<br />

per step (in non-problematic cases), each of which is about as expensive computationally as<br />

building a Fock matrix. His method was inspired from single excitation configuration interaction<br />

(SX-CI) and multi-configurational SCF (MC-SCF). A possible divergence of the scheme can be<br />

overcome by moderating the orbital update step by the augmented Hessian method 16 or trust radius<br />

techniques 17 . Even though it is still quite expensive, the method is also used today for cases with<br />

convergence problems, since a decrease in energy can be ensured step by step and it has quadratic<br />

convergence properties near the optimized state.<br />
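The augmented Hessian idea mentioned above can be sketched generically. Assume a gradient vector g and a Hessian H in some set of orbital-rotation parameters. Their construction is the expensive part in QC-SCF and is not shown. The lowest eigenvector of the bordered matrix gives a step x that solves (H − εI)x = −g. Here ε lies below the lowest eigenvalue of H, so the step is downhill even when H is indefinite. The function name is hypothetical. This is a textbook version of the technique, not Bacskay's algorithm.<br />

```python
import numpy as np


def augmented_hessian_step(grad, hess):
    """Return (step, eps) from the lowest eigenpair of the augmented Hessian.

    The lowest eigenvector v of [[0, g^T], [g, H]] gives step = v[1:] / v[0],
    which solves (H - eps*I) step = -g.  By eigenvalue interlacing, eps lies at or
    below the lowest eigenvalue of H, so the step points downhill even if H does not.
    """
    n = grad.size
    aug = np.zeros((n + 1, n + 1))
    aug[0, 1:] = grad
    aug[1:, 0] = grad
    aug[1:, 1:] = hess
    eigvals, eigvecs = np.linalg.eigh(aug)
    v = eigvecs[:, 0]                 # eigenvector of the lowest eigenvalue
    if abs(v[0]) < 1e-12:
        raise ValueError("degenerate case: gradient orthogonal to the lowest mode")
    return v[1:] / v[0], eigvals[0]
```

In a trust-region variant, the step would additionally be scaled or the bordered matrix modified until the step fits within the trust radius. That refinement is not shown here.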

Around 1995, interest in linear-scaling SCF methods took off, since developments in<br />

computer hardware had made calculations on large molecules possible. With newly developed<br />

algorithms, the evaluation of the Fock matrix, which formally scales as $N^4$ because of the four-index<br />

integrals, could now routinely be reduced to near-linear scaling. The diagonalization in standard<br />

Roothaan-Hall, which scales as $N^3$, then became the bottleneck. Inspiration was found in tight<br />

binding theory 18-20 , where a number of linear-scaling approaches had been suggested earlier 21 . To<br />

obtain linear scaling of the RH step, it is necessary to avoid the diagonalization and to ensure<br />

sparsity in the matrices. This is a problem, since the convenient canonical MO basis is inherently<br />

delocalized. Some of the well-known schemes were reformulated in localized MOs 22 , while other<br />

authors developed strict AO formulations 20,23-25 . Most of the suggested linear-scaling methods were developed<br />

not so much to improve convergence as to improve scaling, and they will therefore not be discussed in<br />

further detail.<br />

Very recently Francisco, Martínez and Martínez introduced their globally convergent trust region<br />

methods for SCF 26 , where the standard fixed-point Roothaan-Hall step is replaced by a trust region<br />

optimization of a model energy function. This algorithm has attractive features: it can be<br />

proved to be globally convergent, and the step sizes are controlled dynamically through a trust<br />

region update scheme. Its convergence rate appears erratic, however, ranging from excellent to<br />

very poor; since only small test examples have been published, its general performance remains to be seen.<br />

1.3.2 Damping and Extrapolation<br />

In his SCF study of atoms, Hartree noted convergence difficulties and suggested a so-called<br />

damping scheme 27 as a modification to the iterative procedure. Instead of using the newly<br />

constructed density $D_{n+1}$, which corresponds to a full step, a linear combination of the new density<br />

matrix with the previous one is constructed<br />

$$D^{\text{damp}}_{n+1} = D_n + \lambda\,(D_{n+1} - D_n) = \lambda D_{n+1} + (1 - \lambda)\,D_n, \tag{1.9}$$


where the damping factor λ is a scalar chosen between zero and one. The iterative sequence is<br />

then continued with $D^{\text{damp}}_{n+1}$ as the new density. Hartree found that this scheme could force<br />

convergence in problematic cases.<br />
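Eq. (1.9) amounts to a one-line update. The minimal sketch below uses a hypothetical function name. It also illustrates a point that matters later in this part: a damped combination of two idempotent density matrices is in general not idempotent.<br />

```python
import numpy as np


def damped_density(d_old, d_new, lam):
    """Hartree damping, Eq. (1.9): D_damp = D_n + lam * (D_{n+1} - D_n), 0 < lam <= 1."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("the damping factor must lie in (0, 1]")
    return d_old + lam * (d_new - d_old)
```

With D_n = diag(1, 0), D_{n+1} = diag(0, 1) and λ = 0.5, the damped density is 0.5·I. Its square, 0.25·I, differs from it. λ = 1 recovers the undamped (full) step.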

To get an idea of the effect of the damping factor, we consider the Fock matrix in the MO basis,<br />

partitioned into occupied and virtual blocks, with diagonal occupied-occupied and virtual-virtual blocks<br />

$$F^{\text{MO}} = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix}, \tag{1.10}$$

where ‘o’ denotes occupied, ‘v’ virtual, and $[\varepsilon_o]_{ij} = \delta_{ij}\varepsilon_i$ and $[\varepsilon_v]_{ab} = \delta_{ab}\varepsilon_a$. The change in electronic<br />

energy from the first-order variation of the occupied orbitals is then, according to first-order perturbation theory,<br />

given as<br />

$$\Delta E^{(1)}_{\text{SCF}} = 4 \sum_{a}^{\text{virtual}} \sum_{i}^{\text{occupied}} \frac{-F_{ai}^2}{\varepsilon_a - \varepsilon_i}. \tag{1.11}$$

If this first-order term is negative and sufficiently small that the higher-order contributions are<br />

insignificant, the electronic energy decreases. If the MOs obey the aufbau principle,<br />

then all $\varepsilon_i < \varepsilon_a$ and the term is clearly negative, as desired. The Hartree damping of Eq. (1.9)<br />

roughly corresponds to multiplying the numerator of Eq. (1.11) by the factor λ, which is positive<br />

and less than one<br />

$$\Delta E^{(1)}_{\text{SCF}} = 4 \sum_{a}^{\text{virtual}} \sum_{i}^{\text{occupied}} \frac{-\lambda F_{ai}^2}{\varepsilon_a - \varepsilon_i}, \tag{1.12}$$

thus making it possible to obtain a negative first-order change of arbitrarily small magnitude, for<br />

which the higher-order terms are insignificant. Though this would seem promising, the aufbau<br />

principle is seldom obeyed throughout the optimization, and any term with $\varepsilon_a < \varepsilon_i$ is then positive whatever the value of λ.<br />
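To make Eqs. (1.11) and (1.12) concrete, the following sketch evaluates the first-order energy change from the virtual-occupied Fock elements and the orbital energies. The function name and the plain-list data layout are illustrative assumptions. The last test case shows the aufbau problem discussed above. When a virtual orbital lies below an occupied one, the term is positive, and no damping factor 0 < λ ≤ 1 changes its sign.<br />

```python
def first_order_energy_change(f_vo, eps_occ, eps_virt, lam=1.0):
    """Eq. (1.12): 4 * sum_a sum_i (-lam * F_ai**2) / (eps_a - eps_i).

    f_vo[a][i] is the virtual-occupied Fock element; lam = 1 recovers Eq. (1.11).
    """
    de = 0.0
    for a, eps_a in enumerate(eps_virt):
        for i, eps_i in enumerate(eps_occ):
            de += 4.0 * (-lam * f_vo[a][i] ** 2) / (eps_a - eps_i)
    return de
```

With one occupied orbital at −1.0, one virtual orbital at 0.5 and F_ai = 0.3, the change is −0.24. λ = 0.5 halves it to −0.12. Moving the virtual orbital to −1.5 makes the change positive.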

If λ could be freely chosen, the damping technique would lead to an extrapolation scheme in the<br />

densities. Since SCF generates an iterative sequence in which each step depends only on the<br />

preceding one, it was natural to apply mathematical extrapolation methods (e.g. the Aitken<br />

extrapolation 28 procedures) to SCF, in particular to improve the convergence rate close to the<br />

minimum. When the individual MO expansion coefficients are chosen as the extrapolated<br />

parameters, as Winter and Dunning Jr. 29 suggested, unphysical results may be obtained, although these<br />

can be corrected at the end of the calculation. Nielsen instead used the density matrix as the<br />

extrapolated parameter 30 , together with an eigenvalue extrapolation instead of the Aitken method. This led to a<br />

scheme more similar to Hartree damping, but with λ determined within the eigenvalue extrapolation<br />

scheme.<br />
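As an example of the extrapolation methods referred to above, the classical Aitken Δ² formula extrapolates a scalar sequence from three successive iterates. For vectors or matrices, such as MO coefficients or density matrix elements, it would be applied element by element. The sketch below is the textbook formula, not the specific procedure of ref. 28 or of Winter and Dunning. It is exact for a geometrically converging sequence.<br />

```python
def aitken(x0, x1, x2):
    """Aitken delta-squared extrapolation from three successive iterates x_n, x_{n+1}, x_{n+2}."""
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:          # no curvature in the sequence; nothing to extrapolate
        return x2
    return x2 - (x2 - x1) ** 2 / denom
```

For the iterates 2.0, 1.5, 1.25 of the sequence 1 + 0.5ⁿ, the formula returns the limit 1.0 directly.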


Different approaches have been taken to find the damping factor λ dynamically. Zerner and<br />

Hehenberger 31 determined it from an extrapolation of the Mulliken gross populations. Karlström 32<br />

expressed the electronic energy as a function of the damped density, $E(D^{\text{damp}})$, and used its first derivative with<br />

respect to λ to choose, in each iteration, the λ that minimized the electronic energy.<br />

None of these schemes was very successful in solving the convergence problems. Each of them had some<br />

particular problematic cases it could handle better than its predecessors, but in general they did<br />

not catch on. Pulay then suggested in the early 1980s to use the norm of a linear combination of<br />

error vectors $\mathbf{e}_i$ from the individual iterations, where the vanishing of the error vector is a necessary<br />

and sufficient condition for SCF convergence. The norm is then minimized with respect to the<br />

coefficients $c_i$<br />

$$\mathbf{e}(\mathbf{c}) = \sum_{i=1}^{n} c_i \mathbf{e}_i, \tag{1.13}$$

where n is the number of previous iterations, and the coefficients are restricted to add up to 1<br />

$$\sum_{i=1}^{n} c_i = 1. \tag{1.14}$$

The resulting coefficients are used to construct a favorable linear combination of the previous Fock<br />

matrices<br />

$$\bar{F} = \sum_{i=1}^{n} c_i F_i, \tag{1.15}$$

which is diagonalized to obtain a new density, and so the iterative procedure is re-established. This<br />

was the first density subspace minimization scheme that deliberately exploited the information<br />

obtained in the previous iterations, and Pulay named the approach DIIS 33 for “Direct Inversion in the<br />

Iterative Subspace”. For the special case of two matrices, the DIIS density corresponds to the<br />

damped density of Eq. (1.9), but with no restrictions on λ. A decade later the DIIS algorithm was a<br />

standard option in most ab initio programs and had effectively solved a number of the convergence<br />

problems. The orbital rotation gradient was typically used as the error vector for wave function<br />

optimizations, and Sellers pointed out 34 that the DIIS algorithm exploits the second-order<br />

information contained in a set of gradients to obtain quadratic convergence behavior. Numerical<br />

instabilities were, however, observed, caused by linear dependencies in the space of error vectors.<br />

Sellers therefore introduced the C2-DIIS method 34 , which is similar to DIIS except that the restriction is on the squares of the coefficients<br />

$$\sum_{i=1}^{n} c_i^2 = 1, \tag{1.16}$$


with a renormalization at the end. This gives an eigenvalue problem to be solved instead of the set<br />

of linear equations in standard DIIS, and singularities are thus more easily handled. However, one of<br />

the examples (Pd$_2$ in the Hyla-Kryspin basis set 35 ) given in ref. 34 , for which DIIS supposedly diverges,<br />

converges with our plain DIIS implementation to $10^{-7}$ in the energy in 14 iterations.<br />
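The DIIS equations (1.13)-(1.15) can be solved with the usual bordered linear system, obtained from a Lagrange multiplier for the constraint (1.14). C2-DIIS is solved by diagonalizing the same error overlap matrix B_ij = ⟨e_i, e_j⟩. In this sketch the error vectors are real arrays; for SCF these could be, for example, the gradient SDF − FDS. The function names are hypothetical. For C2-DIIS, the selection rule is our assumption: among the renormalized eigenvectors, it keeps the one with the smallest error norm.<br />

```python
import numpy as np


def _overlap_matrix(errors):
    """B_ij = <e_i, e_j> for real error vectors (matrices are flattened by np.vdot)."""
    return np.array([[np.vdot(ei, ej) for ej in errors] for ei in errors])


def diis_coefficients(errors):
    """DIIS, Eqs. (1.13)-(1.14): minimize |sum_i c_i e_i|^2 subject to sum_i c_i = 1."""
    n = len(errors)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = _overlap_matrix(errors)
    a[n, n] = 0.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0                      # last row enforces sum_i c_i = 1
    return np.linalg.solve(a, rhs)[:n]


def c2_diis_coefficients(errors):
    """C2-DIIS, Eq. (1.16): eigenvectors of B (sum c_i^2 = 1), renormalized to sum c_i = 1."""
    b = _overlap_matrix(errors)
    _, vecs = np.linalg.eigh(b)
    best, best_err = None, np.inf
    for v in vecs.T:
        total = v.sum()
        if abs(total) < 1e-8:         # cannot be renormalized to sum to one
            continue
        c = v / total
        err = c @ b @ c               # squared norm of the combined error vector
        if err < best_err:
            best, best_err = c, err
    return best


def averaged_fock(coeffs, focks):
    """Eq. (1.15): F_bar = sum_i c_i F_i."""
    return sum(c * f for c, f in zip(coeffs, focks))
```

For the two one-dimensional errors e₁ = 2 and e₂ = −1, both variants give c = (1/3, 2/3), which makes the combined error vanish.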

Even though DIIS is successful, examples of divergence unrelated to numerical instabilities<br />

have been encountered over the years. In 2000 Cancès and Le Bris presented a damping<br />

algorithm named the Optimal Damping Algorithm 36 (ODA) that ensures a decrease in energy at each<br />

iteration and converges toward a solution of the HF equations. In ODA the damping factor λ is<br />

determined as the minimizer of the Hartree-Fock energy for the damped density of Eq. (1.9)<br />

$$E^{\text{damp}}_{\text{HF}}(D_{n+1}, \lambda) = E_{\text{HF}}(D_n) + 2\lambda\,\mathrm{Tr}\,F(D_n)(D_{n+1} - D_n) + \lambda^2\,\mathrm{Tr}\,(D_{n+1} - D_n)\,G(D_{n+1} - D_n) + h_{\text{nuc}}, \tag{1.17}$$

much as Karlström did in 1979. The damping factor is thus optimized in each iteration, hence<br />

the name of the algorithm.<br />
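Because Eq. (1.17) is quadratic in λ, the ODA step reduces to minimizing a parabola on [0, 1]. In the sketch below, the two traces are passed in as numbers. The constant terms E_HF(D_n) and h_nuc do not affect the minimizer. Assume the convention F(D) = h + G(D) with G linear in D, as in Hartree-Fock. Under that assumption, the curvature can be obtained without an extra Fock build, as Tr ΔD [F(D_{n+1}) − F(D_n)]. The function name is hypothetical.<br />

```python
def optimal_damping(slope, curvature):
    """Minimize the lambda-dependent part of Eq. (1.17) over 0 <= lam <= 1.

    Energy change relative to E(D_n):  2*lam*slope + lam**2*curvature, with
      slope     = Tr F(D_n) (D_{n+1} - D_n)
      curvature = Tr (D_{n+1} - D_n) G(D_{n+1} - D_n)
    """
    def model(lam):
        return 2.0 * lam * slope + lam * lam * curvature

    candidates = [0.0, 1.0]           # end points are always admissible
    if curvature > 0.0:               # convex parabola: add the clipped stationary point
        candidates.append(min(1.0, max(0.0, -slope / curvature)))
    return min(candidates, key=model)
```

A downhill slope with large curvature gives an interior λ. For example, slope = −1 and curvature = 4 give λ = 0.25. A small or negative curvature gives the full step λ = 1. An uphill slope gives λ = 0.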

Recently Kudin, Scuseria, and Cancès proposed a method in which the gradient-norm minimization<br />

in DIIS is replaced by a minimization of an approximation to the true energy function; they<br />

named it the energy DIIS (EDIIS) method 37 . Whereas ODA uses the energy expression of Eq.<br />

(1.17) to find the optimal λ, EDIIS uses an approximation of the Hartree-Fock energy for the<br />

averaged density<br />

$$\bar{D} = \sum_{i=1}^{n} c_i D_i, \tag{1.18}$$

$$E^{\text{EDIIS}}(\bar{D}, \mathbf{c}) = \sum_{i=1}^{n} c_i E_{\text{SCF}}(D_i) - \frac{1}{2} \sum_{i,j=1}^{n} c_i c_j\, \mathrm{Tr}\big((F_i - F_j)(D_i - D_j)\big), \tag{1.19}$$

where the sum of the coefficients $c_i$ is still restricted to 1. They combine the scheme with DIIS, such<br />

that the EDIIS-optimized coefficients are used to construct the averaged Fock matrix if all<br />

coefficients fall between 0 and 1; otherwise, the coefficients from the DIIS scheme are used instead. The<br />

EDIIS scheme introduces some Hessian information not found in DIIS and thus improves<br />

convergence in cases where the starting guess has a Hessian structure far from the optimized one. For<br />

non-problematic cases and near the optimized state, EDIIS converges more slowly than DIIS,<br />

but it has been demonstrated that EDIIS can converge cases where DIIS diverges.<br />
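The EDIIS model of Eq. (1.19) is cheap to evaluate from quantities stored in previous iterations. The sketch below evaluates it for arbitrary coefficients. For the simplest case of two stored densities, it also minimizes the model over c = (1 − t, t) with 0 ≤ t ≤ 1, where the model is a parabola in t. The function names are hypothetical. A general n-point minimization over the simplex is not shown.<br />

```python
import numpy as np


def ediis_energy(coeffs, energies, focks, densities):
    """EDIIS model of Eq. (1.19) for coefficients c_i (assumed to sum to one)."""
    n = len(coeffs)
    e = sum(c * en for c, en in zip(coeffs, energies))
    for i in range(n):
        for j in range(n):
            tr = np.trace((focks[i] - focks[j]) @ (densities[i] - densities[j]))
            e -= 0.5 * coeffs[i] * coeffs[j] * tr
    return float(e)


def ediis_two_point(energies, focks, densities):
    """Minimize Eq. (1.19) over c = (1 - t, t), 0 <= t <= 1 (two stored densities only)."""
    coupling = np.trace((focks[0] - focks[1]) @ (densities[0] - densities[1]))
    # Along c = (1 - t, t) the model is E_1 + t (E_2 - E_1 - T) + t^2 T, with T = coupling
    linear = energies[1] - energies[0] - coupling
    candidates = [0.0, 1.0]
    if coupling > 0.0:
        candidates.append(min(1.0, max(0.0, -linear / (2.0 * coupling))))

    def model(t):
        return ediis_energy([1.0 - t, t], energies, focks, densities)

    t_best = min(candidates, key=model)
    return np.array([1.0 - t_best, t_best])
```

For the 1×1 toy data F₁ = 1, D₁ = 0, F₂ = 3, D₂ = 1, E₁ = 0, E₂ = 0.5, the optimal coefficients are (0.625, 0.375). The model energy there is −0.28125.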

Recently, we suggested another subspace minimization algorithm along the same lines as EDIIS, but<br />

with a smaller idempotency error in the energy model and with the same orbital rotation gradient in the<br />

subspace as the SCF energy (the EDIIS energy model actually has a different gradient). We named<br />

it TRDSM 38 , for trust region density subspace minimization, since the energy model is optimized in the<br />

subspace of previous densities by a trust region method. The second paper on<br />

TRDSM 39 contains a comparison with the EDIIS and DIIS models, showing explicitly that the<br />

EDIIS energy model does not have the correct gradient and that it is deficient for other reasons as well at the<br />

DFT level of theory.<br />

Many of the energy minimization techniques can be combined with a damping or extrapolation<br />

scheme to improve the convergence. Typically, DIIS has been the choice 24,40,41 , but TRDSM could<br />

be used just as well.<br />

1.3.3 Level Shifting<br />

In 1973 Saunders and Hillier introduced the level shift concept 42 . They suggested adding a positive<br />

scalar µ to the diagonal of the virtual-virtual block of the Fock matrix in the MO basis, Eq. (1.10),<br />

before diagonalizing<br />

$$\big(F^{\text{MO}} + \mu\,(I - D^{\text{MO}})\big)\,C = C\,\varepsilon, \tag{1.20}$$

where I is the identity matrix and $D^{\text{MO}}$ is the scaled one-electron density matrix in the MO basis,<br />

with ones on the diagonal of the occupied-occupied block and zeros elsewhere.<br />
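In an orthonormal MO basis, Eq. (1.20) is an ordinary symmetric eigenvalue problem. The following sketch applies the shift and measures how much of the new occupied space still lies in the old one. It assumes the MOs are ordered with the occupied orbitals first, and its function names are hypothetical. A larger µ keeps the new orbitals closer to the old ones.<br />

```python
import numpy as np


def level_shifted_orbitals(f_mo, n_occ, mu):
    """Eq. (1.20): diagonalize F_MO + mu (I - D_MO) in an orthonormal MO basis.

    D_MO has ones on the diagonal of the occupied-occupied block, so only the
    virtual diagonal is raised by mu before the diagonalization.
    """
    n = f_mo.shape[0]
    d_mo = np.zeros((n, n))
    d_mo[:n_occ, :n_occ] = np.eye(n_occ)
    eps, c = np.linalg.eigh(f_mo + mu * (np.eye(n) - d_mo))
    return eps, c


def occupied_overlap(c, n_occ):
    """Sum of squared projections of the new occupied MOs onto the old occupied MOs."""
    return float(np.sum(c[:n_occ, :n_occ] ** 2))
```

For a diagonal Fock matrix, the shift simply raises the virtual orbital energies by µ. With virtual-occupied couplings present, increasing µ shrinks the occupied-virtual mixing of the new orbitals.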

To compare level shifting with the damping scheme of Hartree 27 , consider the first-order energy<br />

change of Eq. (1.11); the level shift µ then corresponds to adding a positive constant to<br />

the denominator<br />

$$\Delta E^{(1)}_{\text{SCF}} = 4 \sum_{a}^{\text{virtual}} \sum_{i}^{\text{occupied}} \frac{-F_{ai}^2}{\varepsilon_a - \varepsilon_i + \mu}. \tag{1.21}$$

Like the damping factor, the level shift can thus decrease the magnitude of the term.<br />

The problems with respect to the aufbau principle mentioned in connection with damping can, however, be<br />

overcome with the level shift: it can separate the occupied orbitals from the virtuals<br />

and thereby ensure positive denominators and an overall decrease in energy. As the level shift is<br />

increased towards infinity, the obtained decrease in energy will correspond to that of the steepest<br />

descent method, as explained in Section 1.4.1.4, and the convergence will therefore be slow. This<br />

connection between a large gap between the occupied and the virtual orbitals (HOMO-LUMO gap)<br />

and slow convergence was exploited by Bhattacharyya in 1978 to accelerate convergence for cases<br />

with large HOMO-LUMO gaps. His “reverse level shift” technique 43 uses a negative level shift<br />

instead of a positive one, thus decreasing the gap and accelerating the convergence.<br />
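Eq. (1.21) can be explored in the same way. The sketch below evaluates the shifted first-order energy change. It also illustrates how a level shift restores positive denominators when the aufbau ordering is violated: it computes the smallest µ for which every ε_a − ε_i + µ is at least a chosen margin. The `margin` parameter and both function names are hypothetical. In practice, level shifts are chosen differently, as discussed in the remainder of this section.<br />

```python
def shifted_energy_change(f_vo, eps_occ, eps_virt, mu=0.0):
    """Eq. (1.21): 4 * sum_a sum_i (-F_ai**2) / (eps_a - eps_i + mu)."""
    de = 0.0
    for a, eps_a in enumerate(eps_virt):
        for i, eps_i in enumerate(eps_occ):
            de += 4.0 * (-f_vo[a][i] ** 2) / (eps_a - eps_i + mu)
    return de


def smallest_safe_shift(eps_occ, eps_virt, margin=0.1):
    """Smallest mu >= 0 making every denominator eps_a - eps_i + mu at least `margin`."""
    worst_gap = min(ea - ei for ea in eps_virt for ei in eps_occ)
    return max(0.0, margin - worst_gap)
```

With a virtual orbital at −1.5 below an occupied orbital at −1.0, the unshifted term is positive. A shift of µ = 0.6 makes it negative. A negative ("reverse") shift applied to a large gap increases the magnitude of the energy change.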

In 1977, Carbó, Hernández and Sanz claimed unconditional convergence for an SCF process with a<br />

properly used level shift 44 , and two decades later, Cancès and Le Bris 45 gave a formal proof that for<br />

any initial guess $D_0$, there exists a level shift $\mu_0 > 0$ such that, for level shift parameters $\mu > \mu_0$, the<br />

energy decreases at each step and converges towards a stationary value.<br />

The level shift technique is still routinely used for cases where the DIIS scheme has problems. The<br />

level shifts are typically found by trial and error. Recently, we advocated the use of a level<br />

shift to control the changes introduced in the Roothaan-Hall step 38 , and we suggested a way of<br />

optimizing the level shift at each iteration based on physical arguments and without guesswork. The<br />

algorithm is based on the trust region philosophy in which a model energy function is optimized,<br />

but restricted with respect to the step length. We thus named the algorithm trust region Roothaan-<br />

Hall (TRRH), even though it is not a true trust region optimization scheme like e.g. the energy<br />

minimization of Francisco, Martínez, and Martínez 26 or our TRDSM scheme 38 .<br />

Level shifting can be combined with a damping or extrapolation scheme. When the TRRH approach<br />

is combined with the subspace minimization method TRDSM it seems to outperform DIIS in<br />

stability and to have a better or similar convergence rate, as will be illustrated in the following<br />

sections. Combining level shifting with DIIS can occasionally be beneficial, but DIIS and level<br />

shifting typically do not work well together; in Section 1.4.1.3 we try to explain why.<br />

1.4 Development of SCF Optimization Algorithms<br />

The SCF scheme as it typically looks today is sketched in Fig. 1.2. Compared to Fig. 1.1, a density<br />

subspace minimization step has been inserted, in which some function f is minimized with respect to the<br />

coefficients $c_i$ that expand the previous densities $D_i$. The function f could be the gradient norm, as in<br />

DIIS, or some energy model approximating the SCF energy in the subspace of the previous densities, as in<br />

EDIIS and TRDSM. In the subsequent Roothaan-Hall step, the averaged Fock matrix $\bar{F}$ found from the<br />

subspace optimization is then used instead of the most recent Fock matrix $F(D_n)$ to find a new trial density<br />

$D_{n+1}$. In general, the averaged density matrix $\bar{D}$ is not idempotent and therefore does not represent a<br />

valid density matrix; moreover, since the Kohn-Sham matrix (unlike the Fock matrix) is nonlinear in the<br />

density matrix, the averaged Kohn-Sham matrix $\bar{F}$ differs from $F(\bar{D})$. For these reasons, the<br />

averaged Fock matrix $\bar{F}$ cannot be associated uniquely with a valid Fock matrix. Usually, this does not<br />

matter much, since the subsequent diagonalization of the Fock matrix nevertheless produces a valid<br />

density matrix according to Eq. (1.8). The complications arising from the use of the averaged Fock<br />

matrix are disregarded in the following, noting that the errors introduced by this approach may easily be<br />

corrected for, if necessary.<br />

[Fig. 1.2 Flow diagram of the SCF scheme including the density subspace minimization step: starting from $D_0$, build $F(D_n)$; minimize $f(\mathbf{c})$ for $\bar{D} = \sum_i c_i D_i$ and form $\bar{F} = \sum_i c_i F(D_i)$; diagonalize $\bar{F}$ to obtain $D_{n+1}$; if $D_{n+1} \approx D_n$, stop with $D_{\text{conv}}$, otherwise set $n = n + 1$ and repeat.]<br />
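The loop of Fig. 1.2 can be sketched on a toy problem. Everything model-specific here is an assumption:<br />

- an orthonormal basis (S = I);
- a hypothetical Hubbard-like "Fock" matrix F(D) = h + U diag(D), standing in for a real Fock or Kohn-Sham build;
- DIIS as the density subspace minimization step;
- a convergence test on the commutator FD − DF rather than on D_{n+1} ≈ D_n.

```python
import numpy as np


def toy_fock(d, h, u):
    """Hypothetical model 'Fock' matrix F(D) = h + U diag(D) in an orthonormal basis (S = I)."""
    return h + u * np.diag(np.diag(d))


def rh_density(f, n_occ):
    """Roothaan-Hall step for S = I: occupy the n_occ lowest eigenvectors of F."""
    _, c = np.linalg.eigh(f)
    c_occ = c[:, :n_occ]
    return c_occ @ c_occ.T


def diis_average(focks, errors):
    """Density subspace step (here DIIS): return F_bar = sum_i c_i F_i."""
    n = len(errors)
    b = np.array([[np.vdot(ei, ej) for ej in errors] for ei in errors])
    b /= np.abs(b).max()              # rescaling B leaves the coefficients c unchanged
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = b
    a[n, n] = 0.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0                      # constraint sum_i c_i = 1
    c = np.linalg.lstsq(a, rhs, rcond=None)[0][:n]
    return sum(ci * fi for ci, fi in zip(c, focks))


def scf(h, u, n_occ, d0, max_iter=50, tol=1e-10, max_subspace=5):
    """Loop of Fig. 1.2: build F(D_n), subspace step, RH step, convergence test."""
    focks, errors = [], []
    d = d0
    for it in range(1, max_iter + 1):
        f = toy_fock(d, h, u)
        err = f @ d - d @ f           # FDS - SDF with S = I; zero at convergence
        if np.linalg.norm(err) < tol:
            return d, it
        focks = (focks + [f])[-max_subspace:]
        errors = (errors + [err])[-max_subspace:]
        d = rh_density(diis_average(focks, errors), n_occ)   # RH step with averaged F
    raise RuntimeError("SCF did not converge")
```

The returned density is idempotent by construction, because every trial density comes from a diagonalization. The averaged Fock matrix itself is never required to correspond to a valid density, in line with the remarks above.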

The rest of this part of the thesis will focus on the work we have done over the last couple of years<br />

to improve SCF convergence. We have made developments in all of the three categories of the<br />

previous section. The density subspace minimization scheme TRDSM and the level shift scheme in<br />

TRRH, both briefly described in the previous section, together make up the scheme we have named<br />

TRSCF, where each SCF iteration contains a TRDSM and a TRRH step. The first subsection will<br />

go into further detail on TRRH and will thus be concerned with our modifications to the Roothaan-Hall step in Fig.<br />

1.2. The second subsection will likewise go into further detail on TRDSM and will describe the<br />

scheme we apply in the density subspace minimization step. In the third subsection, a recently developed energy minimization<br />

procedure will be presented. The procedure merges these two steps, integrating a subspace<br />

minimization in the optimization of a new trial density.<br />

This section will primarily take the Hartree-Fock point of view, acknowledging that with small<br />

adjustments and the word Fock replaced by Kohn-Sham, it would describe the DFT situation as<br />

well. In Section 1.5 the differences appearing when the algorithms are applied to the HF and DFT<br />

cases, respectively, will be discussed.<br />

1.4.1 Dynamically Level Shifted Roothaan-Hall<br />

The problems inherent to the RH diagonalization method are the discontinuous changes in the<br />

density and the lack of any guarantee of an energy decrease. To overcome these problems, we introduced<br />

in 2004 a means to restrict the RH step to the trust region of the RH energy model, with the purpose<br />

of both controlling the changes in the density and ensuring an energy decrease. Since then, the same<br />

ideas have also been put forward by Francisco et al. 26 , who suggested a trust region optimization of<br />

an RH energy model.<br />

In this section, our trust region Roothaan-Hall scheme and related subjects are discussed. In<br />

particular, we present two different schemes for dynamic level shifting and an alternative to<br />

diagonalization.<br />

1.4.1.1 RH Step with Control of Density Change<br />

The solution of the traditional Roothaan–Hall eigenvalue problem Eq. (1.6) may be regarded as the<br />

minimization of the sum of the energies of the occupied MOs 8,46<br />

$$E^{\text{RH}}(D) = 2\sum_i \varepsilon_i = 2\,\mathrm{Tr}\,F_0 D \tag{1.22}$$

subject to the MO orthonormality constraints<br />

$$C_{\text{occ}}^{T} S\, C_{\text{occ}} = I_N, \tag{1.23}$$

where $F_0$ is typically obtained as a weighted sum of the previous Fock matrices, such as $\bar{F}$ in Eq.<br />

(1.15). Since Eq. (1.22) represents a crude model of the true Hartree-Fock energy (with the same<br />

first-order term, but different zero- and second-order terms), it has a rather small trust radius. A<br />

global minimization of $E^{\text{RH}}(D)$, as accomplished by the solution of the Roothaan–Hall eigenvalue<br />

problem Eq. (1.6), may therefore easily lead to steps that are longer than the trust radius and hence<br />

unreliable. To avoid such steps, we impose on the optimization of Eq. (1.22) the constraint that<br />

the new density matrix D does not differ much from the old $D_0$; that is, the squared S-norm of the density<br />

difference should be equal to a small number ∆<br />

$$\lVert D - D_0 \rVert_S^2 = \mathrm{Tr}\,(D - D_0)S(D - D_0)S = -2\,\mathrm{Tr}\,D_0 S D S + N = \Delta, \tag{1.24}$$

where N is the number of electrons – see Eq. (1.2) – and the S-norm used throughout this thesis is<br />

defined as<br />

$$\lVert A \rVert_S^2 = \mathrm{Tr}\,ASAS \tag{1.25}$$

for symmetric A. The optimization of Eq. (1.22) subject to the constraints Eq. (1.23) and Eq. (1.24)<br />

may be carried out by introducing the Lagrangian<br />

$$L = 2\,\mathrm{Tr}\,F_0 D - 2\mu\Big(\mathrm{Tr}\,D S D_0 S - \tfrac{1}{2}(N - \Delta)\Big) - 2\,\mathrm{Tr}\,\eta\big(C_{\text{occ}}^{T} S C_{\text{occ}} - I_N\big), \tag{1.26}$$

where µ is the undetermined multiplier associated with the constraint Eq. (1.24), whereas the<br />

symmetric matrix η contains the multipliers associated with the MO orthonormality constraints.<br />

Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to<br />

zero, we arrive at the level-shifted Roothaan–Hall equations:<br />

$$(F_0 - \mu S D_0 S)\,C_{\text{occ}}(\mu) = S\,C_{\text{occ}}(\mu)\,\lambda(\mu). \tag{1.27}$$

Since the density matrix, Eq. (1.8), is invariant to unitary transformations among the occupied MOs<br />

in $C_{\text{occ}}(\mu)$, we may transform this eigenvalue problem to the canonical basis:<br />

$$(F_0 - \mu S D_0 S)\,C_{\text{occ}}(\mu) = S\,C_{\text{occ}}(\mu)\,\varepsilon(\mu), \tag{1.28}$$

where the diagonal matrix ε(µ) contains the orbital energies. Note that, since $D_0 S$ projects onto the<br />

part of $C_{\text{occ}}$ that is occupied in $D_0$ (see ref. 46 ), the level-shift parameter µ shifts only the energies of<br />

the occupied MOs. Therefore, the role of µ is to modify the difference between the energies of the<br />

occupied and virtual MOs, in particular the HOMO–LUMO gap.<br />
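The level-shifted Roothaan–Hall equations (1.28) form a generalized symmetric eigenvalue problem. One standard way to solve them is Löwdin orthogonalization with S^{-1/2}. It is chosen here for the sketch and is not necessarily what the thesis implementation does. The sketch returns D(µ) and provides the squared S-norm of Eq. (1.25), so the step length of Eq. (1.24) can be monitored as µ varies. All function names are hypothetical.<br />

```python
import numpy as np


def inv_sqrt(s):
    """S^{-1/2} of a symmetric positive definite overlap matrix (Loewdin orthogonalization)."""
    w, v = np.linalg.eigh(s)
    return (v / np.sqrt(w)) @ v.T


def level_shifted_rh(f0, s, d0, n_occ, mu):
    """Solve (F0 - mu S D0 S) C = S C eps, Eq. (1.28); return D(mu) = C_occ C_occ^T and eps."""
    x = inv_sqrt(s)
    eps, c_orth = np.linalg.eigh(x @ (f0 - mu * s @ d0 @ s) @ x)
    c = x @ c_orth                   # back-transform, so that C^T S C = I
    c_occ = c[:, :n_occ]             # the n_occ lowest solutions are occupied
    return c_occ @ c_occ.T, eps


def s_norm_sq(a, s):
    """Squared S-norm of a symmetric matrix, Eq. (1.25): Tr(A S A S)."""
    return float(np.trace(a @ s @ a @ s))
```

The identity in Eq. (1.24) holds for these densities when N = 2·n_occ, that is, when Tr DS = N/2 as for the scaled closed-shell density. A large µ gives a much shorter step than µ = 0.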

Clearly, the success of the trust region Roothaan–Hall (TRRH) method will depend on our ability to<br />

make a judicious choice of the level-shift parameter µ in Eq. (1.28). In our standard TRRH<br />

implementation, we determine µ by requiring that D(µ) does not differ much from D 0 in the sense of<br />


Eq. (1.24), thereby ensuring a continuous and controlled development of the density matrix from the<br />

initial guess to the converged one.<br />

1.4.1.2 The Trust Region RH Level Shift<br />

The constraint on the change in the AO density Eq. (1.24) refers to a change which may arise not<br />

only from small changes in many MOs but also from large changes in a few MOs or even in a<br />

single MO. To obtain a high level of control, we shall require that the changes in the individual<br />

MOs are all small. Expanding the MOs $\varphi_i^{\text{new}}$, obtained by diagonalization of Eq. (1.28), in the old<br />

MOs, we obtain<br />

$$\varphi_i^{\text{new}} = \sum_{j}^{\text{occ}} \varphi_j^{\text{old}} \langle \varphi_j^{\text{old}} | \varphi_i^{\text{new}} \rangle + \sum_{a}^{\text{virt}} \varphi_a^{\text{old}} \langle \varphi_a^{\text{old}} | \varphi_i^{\text{new}} \rangle, \tag{1.29}$$

where the first summation is over the occupied MOs and the second over the virtual MOs. The<br />

squared norm of the projection of $\varphi_i^{\text{new}}$ onto the MO space associated with $D_0$ is therefore<br />

$$a_i^{\text{orb}} = \sum_{j} \big|\langle \varphi_j^{\text{old}} | \varphi_i^{\text{new}} \rangle\big|^2. \tag{1.30}$$

To ensure small individual MO changes in each iteration (to within a unitary transformation of the<br />

occupied MOs), we shall therefore require<br />

$$a_{\min}^{\text{orb}} = \min_i a_i^{\text{orb}} \geq A_{\min}^{\text{orb}}, \tag{1.31}$$

where $A_{\min}^{\text{orb}}$ is close to one (0.98 or 0.975 in practice). This way of controlling the changes in the<br />

density was also used by Seeger and Pople in their steepest descent method 11 .<br />
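The choice of µ from Eq. (1.31) can be sketched as a one-dimensional root search. Assume D₀ = C_occ^old (C_occ^old)^T is idempotent. Eq. (1.30) then becomes a_i^orb = c_iᵀ S D₀ S c_i for each new occupied orbital c_i. The sketch makes two further assumptions. First, the bisection assumes a_min^orb grows with µ. Second, it returns µ = 0 when the unshifted step already satisfies the requirement. The function names and the bracket `mu_max` are hypothetical.<br />

```python
import numpy as np


def inv_sqrt(s):
    """S^{-1/2} of a symmetric positive definite overlap matrix."""
    w, v = np.linalg.eigh(s)
    return (v / np.sqrt(w)) @ v.T


def shifted_occupied_orbitals(f0, s, d0, n_occ, mu):
    """Occupied solutions of (F0 - mu S D0 S) C = S C eps, Eq. (1.28), via S^{-1/2}."""
    x = inv_sqrt(s)
    _, c_orth = np.linalg.eigh(x @ (f0 - mu * s @ d0 @ s) @ x)
    return (x @ c_orth)[:, :n_occ]


def a_min_orb(c_occ, s, d0):
    """Eqs. (1.30)-(1.31): a_i = c_i^T S D0 S c_i for idempotent D0; return min_i a_i."""
    proj = s @ d0 @ s
    return min(float(ci @ proj @ ci) for ci in c_occ.T)


def trrh_level_shift(f0, s, d0, n_occ, a_target=0.975, mu_max=1.0e3, tol=1.0e-8):
    """Smallest mu in [0, mu_max] with a_min^orb(mu) >= a_target, found by bisection."""
    def amin(mu):
        return a_min_orb(shifted_occupied_orbitals(f0, s, d0, n_occ, mu), s, d0)

    if amin(0.0) >= a_target:         # the unshifted RH step is already small enough
        return 0.0
    if amin(mu_max) < a_target:
        raise ValueError("mu_max is too small to satisfy the overlap requirement")
    lo, hi = 0.0, mu_max              # invariant: amin(lo) < a_target <= amin(hi)
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if amin(mid) >= a_target:
            hi = mid
        else:
            lo = mid
    return hi
```

Because the upper end of the bracket always satisfies the requirement, the returned µ meets a_min^orb ≥ A_min^orb even if the monotonicity assumption fails. In that case it may simply not be the smallest such µ.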

To illustrate how this scheme is used in practice, detailed information from the TRRH step in<br />

iteration 7 of an HF/6-31G and an LDA/6-31G calculation on the zinc complex depicted in Fig. 1.3 is<br />

displayed in Fig. 1.4 and Fig. 1.5, respectively. The upper panels illustrate how a search for<br />

$a_{\min}^{\text{orb}} = A_{\min}^{\text{orb}}$ determines the optimal level shift µ for the TRRH step. The TRRH energy model<br />

is more accurate for HF than for DFT (see Section 1.5.1), and consequently larger changes can be<br />

handled in the TRRH step for HF than for DFT. $A_{\min}^{\text{orb}}$ is thus set to 0.975 for HF and 0.98 for<br />

DFT. The lower panels show that the chosen level shifts avoid the increase in energy that would<br />

have occurred if the Roothaan-Hall step had not been level shifted (µ = 0). Notice also that an even<br />

lower energy would have been obtained by reducing the level shift, but then the restriction on the<br />

overlap would have to be loosened, and this would result in energy increases in other iterations. In short,<br />

the identification of µ from the overlap requirement $a_{\min}^{\text{orb}} = A_{\min}^{\text{orb}}$ appears to be a good and secure<br />

way to control the step sizes in the optimization.<br />

[Fig. 1.3: Zn$^{2+}$ in complex with ethylenediamine-N,N'-disuccinic acid (EDDS).]<br />

[Fig. 1.4 HF/6-31G, iteration 7. (A) The overlap $a_{\min}^{\text{orb}}$ as a function of the level shift µ (0–10), with $A_{\min}^{\text{orb}} = 0.975$ marked, and (B) the changes in the HF energy $\Delta E_{\text{HF}}^{\text{RH}}$ and in the RH energy model $\Delta E^{\text{RH}}$ (a.u.) as a function of the level shift µ.]<br />

[Fig. 1.5 LDA/6-31G, iteration 7. (A) The overlap $a_{\min}^{\text{orb}}$ as a function of the level shift µ (0–10), with $A_{\min}^{\text{orb}} = 0.98$ marked, and (B) the changes in the LDA energy $\Delta E_{\text{LDA}}^{\text{RH}}$ and in the RH energy model $\Delta E^{\text{RH}}$ (a.u.) as a function of the level shift µ.]<br />

1.4.1.3 DIIS and Dynamically Level Shifted RH<br />

For accelerating SCF convergence, DIIS is a simple and in general very successful scheme. We<br />

would expect even better performance and improved stability if DIIS were<br />

combined with a dynamically level shifted RH step like TRRH instead of the standard RH step with no<br />

control of the step length. To investigate how a combination of DIIS and TRRH performs, we carried out a<br />

number of DIIS-TRRH optimizations. A typical example is shown in Fig. 1.7 and an exceptional<br />

example in Fig. 1.8.<br />

[Fig. 1.6: Cd$^{2+}$ complexed with an imidazole ring.]<br />

[Fig. 1.7 LDA/STO-3G calculations with an H1-core start guess on the cadmium complex in Fig. 1.6: error in energy ($E_h$, logarithmic scale, $10^{2}$ to $10^{-8}$) versus iteration (0–25) for DIIS, DIIS-TRRH and TRSCF.]<br />

[Fig. 1.8 LDA/STO-3G calculations with a Hückel start guess on the zinc complex in Fig. 1.3: error in energy ($E_h$, logarithmic scale, $10^{2}$ to $10^{-6}$) versus iteration (0–30) for TRSCF, DIIS-TRRH and DIIS.]<br />

Somewhat surprisingly, the calculations rarely converge with the DIIS-TRRH method. To<br />

understand this behavior, we note that, in the global region, the TRRH method typically produces<br />

gradients that do not change much, even though large changes may occur in the energy. In such<br />

cases, the DIIS method may stall: since the DIIS coefficients sum to one, any combination of nearly identical<br />

gradients has nearly the same norm, so DIIS cannot identify a good combination of density matrices.<br />

This behavior is illustrated in Table 1-1, where the gradient norm and Kohn–Sham energy of the<br />

first six iterations of the cadmium complex calculations in Fig. 1.7 are listed.<br />

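Table 1-1. The Kohn–Sham energy $E_{\text{KS}}$ ($E_h$) and the gradient norm ||g|| = ||4(SDF − FDS)|| in the first six iterations of the cadmium complex calculations of Fig. 1.7.<br />

<table>
<tr><th rowspan="2">It.</th><th colspan="2">DIIS</th><th colspan="2">DIIS-TRRH</th><th colspan="2">TRSCF</th></tr>
<tr><th>E<sub>KS</sub></th><th>||g||</th><th>E<sub>KS</sub></th><th>||g||</th><th>E<sub>KS</sub></th><th>||g||</th></tr>
<tr><td>1</td><td>-5597.0</td><td>7.8</td><td>-5597.0</td><td>7.8</td><td>-5597.0</td><td>7.8</td></tr>
<tr><td>2</td><td>-5502.3</td><td>14.9</td><td>-5598.4</td><td>7.2</td><td>-5598.3</td><td>7.1</td></tr>
<tr><td>3</td><td>-5602.1</td><td>9.7</td><td>-5600.3</td><td>8.5</td><td>-5603.7</td><td>9.3</td></tr>
<tr><td>4</td><td>-5628.5</td><td>2.1</td><td>-5599.9</td><td>7.7</td><td>-5611.1</td><td>9.1</td></tr>
<tr><td>5</td><td>-5627.4</td><td>3.5</td><td>-5599.9</td><td>7.8</td><td>-5616.8</td><td>7.7</td></tr>
<tr><td>6</td><td>-5628.8</td><td>0.8</td><td>-5600.2</td><td>8.1</td><td>-5622.7</td><td>7.5</td></tr>
<tr><td>Converged</td><td colspan="2">yes</td><td colspan="2">no</td><td colspan="2">yes</td></tr>
</table>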

The TRSCF and DIIS-TRRH gradients stay almost the same during these iterations, stalling the<br />

DIIS-TRRH optimization but not the TRSCF optimization, whose energy decreases in each<br />

iteration. In the pure DIIS optimization, by contrast, the gradient changes significantly from<br />

iteration to iteration; at the same time, the energy decreases at each iteration except the second and<br />

fifth, where the gradient norms also increase. Eventually, DIIS enters the local region with its rapid<br />

rate of convergence, although we note a sudden, large increase in the energy in iterations 10 and 11.<br />

However, these changes are accompanied by large increases in the gradient norm, allowing DIIS<br />

to recover safely.<br />


In the example of Fig. 1.8, standard DIIS diverges. TRSCF converges, using a minimum level shift of 0.1<br />

throughout the calculation. When DIIS is combined with TRRH in this case, also using a<br />

minimum level shift of 0.1, it converges as well as TRSCF. Table 1-2 contains the gradient norm<br />

and Kohn-Sham energy of the first six iterations of the calculations in Fig. 1.8.<br />

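Table 1-2. The Kohn–Sham energy $E_{\text{KS}}$ ($E_h$) and the gradient norm ||g|| = ||4(SDF − FDS)|| in the first six iterations of the zinc complex calculations of Fig. 1.8.<br />

<table>
<tr><th rowspan="2">It.</th><th colspan="2">DIIS</th><th colspan="2">DIIS-TRRH</th><th colspan="2">TRSCF</th></tr>
<tr><th>E<sub>KS</sub></th><th>||g||</th><th>E<sub>KS</sub></th><th>||g||</th><th>E<sub>KS</sub></th><th>||g||</th></tr>
<tr><td>1</td><td>-2826.95</td><td>11.6</td><td>-2826.95</td><td>11.6</td><td>-2826.95</td><td>11.6</td></tr>
<tr><td>2</td><td>-2745.49</td><td>24.0</td><td>-2830.11</td><td>3.3</td><td>-2830.06</td><td>3.4</td></tr>
<tr><td>3</td><td>-2809.38</td><td>13.6</td><td>-2831.04</td><td>1.6</td><td>-2831.11</td><td>1.5</td></tr>
<tr><td>4</td><td>-2819.16</td><td>9.7</td><td>-2831.44</td><td>0.8</td><td>-2831.42</td><td>1.1</td></tr>
<tr><td>5</td><td>-2776.74</td><td>15.4</td><td>-2831.34</td><td>1.5</td><td>-2831.40</td><td>1.5</td></tr>
<tr><td>6</td><td>-2826.55</td><td>7.0</td><td>-2831.41</td><td>1.5</td><td>-2831.47</td><td>0.9</td></tr>
<tr><td>Converged</td><td colspan="2">no</td><td colspan="2">yes</td><td colspan="2">yes</td></tr>
</table>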

In this case the gradient norms for the TRSCF calculation change significantly, and a decrease in the<br />

gradient relates directly to a decrease in the energy, whereas in the first example there was no direct<br />

connection between the gradient norm and the energy. The DIIS-TRRH calculation follows the<br />

same gradient behavior as TRSCF, just as in the first example, and both converge. The DIIS<br />

gradient norm changes, but does not decrease as in the first example. There is still a connection<br />

between small gradients and low energies, however, so it is not evident why DIIS cannot find the proper<br />

directions in this case.<br />

In our experience, DIIS should not be used in connection with a dynamic level shift scheme like<br />

TRRH: for all but the simplest cases, DIIS-TRRH diverged where plain DIIS converged. We did,<br />

however, encounter the example in Fig. 1.8, where DIIS does not converge and DIIS-TRRH does,<br />

but it was the exception.<br />

1.4.1.4 Line Search TRRH<br />

In view of the relative crudeness of the $E^{\mathrm{RH}}(\mathbf{D})$ model, a more robust approach for choosing the level shift µ than the one presented in Section 1.4.1.2 consists of performing a line search along the path defined by µ to obtain the minimum of the energy $E_{\mathrm{SCF}}(\mathbf{D}^{\mathrm{RH}}(\mu))$. Strictly speaking, this optimization is not a line search but rather a univariate search. A univariate search has previously been used by Seeger and Pople<sup>11</sup> to stabilize convergence of the RH procedure.<br />

For µ → ∞, Eq. (1.28) becomes equivalent to solving the eigenvalue equation<br />

$$ \mathbf{S}\mathbf{D}_0\mathbf{S}\mathbf{C}^{0}_{\mathrm{occ}} = \mathbf{S}\mathbf{C}^{0}_{\mathrm{occ}}\boldsymbol{\eta} , \qquad (1.32) $$<br />


where η has eigenvalues 1 for the set of orbitals that are occupied in $\mathbf{D}_0$ and eigenvalues 0 for the set of virtual orbitals. Eq. (1.32) thus effectively divides the molecular orbitals into a set that is occupied and a set that is unoccupied. If $\mathbf{D}_0$ is idempotent, it can be reconstructed from the occupied set of eigenvectors $\mathbf{C}^{0}_{\mathrm{occ}}$. If $\mathbf{D}_0$ is not idempotent, a purification of $\mathbf{D}_0$ is obtained as<br />

$$ \mathbf{D}_0^{\mathrm{idem}} = \mathbf{C}^{0}_{\mathrm{occ}}\left(\mathbf{C}^{0}_{\mathrm{occ}}\right)^{T} . \qquad (1.33) $$<br />
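The µ → ∞ limit of Eqs. (1.32) and (1.33) doubles as a purification. The sketch below is a hypothetical helper, not the thesis implementation: it solves $\mathbf{SD}_0\mathbf{SC} = \mathbf{SC}\boldsymbol{\eta}$ for all orbitals through the Cholesky factor of S, keeps the eigenvectors with the largest η as $\mathbf{C}^{0}_{\mathrm{occ}}$, and forms $\mathbf{D}_0^{\mathrm{idem}}$. The test densities are random.

```python
import numpy as np

def occupied_projection_purify(D0, S, n_occ):
    """Idempotent density from the mu -> infinity limit, Eqs. (1.32)-(1.33).

    Solves S D0 S C = S C eta for all orbitals, keeps the n_occ eigenvectors
    with the largest eta as C_occ and returns D_idem = C_occ C_occ^T.
    """
    L = np.linalg.cholesky(S)                # S = L L^T
    eta, Y = np.linalg.eigh(L.T @ D0 @ L)    # equals L^-1 (S D0 S) L^-T
    order = np.argsort(eta)[::-1]             # largest eta first
    C = np.linalg.solve(L.T, Y[:, order])    # C = L^-T Y, hence C^T S C = 1
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T, eta[order]

def aufbau_density(F, S, n_occ):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    Linv = np.linalg.inv(np.linalg.cholesky(S))
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

# Hypothetical random test data
rng = np.random.default_rng(1)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
D_a = aufbau_density(sym(rng.normal(size=(n, n))), S, n_occ)
D_b = aufbau_density(sym(rng.normal(size=(n, n))), S, n_occ)

D_same, eta = occupied_projection_purify(D_a, S, n_occ)   # idempotent input is reproduced
D_mix = 0.7 * D_a + 0.3 * D_b                               # correct trace, not idempotent
D_pur, _ = occupied_projection_purify(D_mix, S, n_occ)
print(np.round(eta, 6))                                     # n_occ ones, then zeros
```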

Since $\mathbf{F}_0$ is the gradient of $E(\mathbf{D}_0)$, the step from Eq. (1.28) corresponding to a large µ is a short step in the steepest descent direction and will therefore decrease the Hartree-Fock energy relative to the energy at $\mathbf{D}_0$. Thus a µ exists for which the energy decreases, and a line search can then find the µ leading to the largest decrease in the energy. Using the same example as in Section 1.4.1.2, Figs. 1.9 and 1.10 illustrate how the optimal µ is chosen in the line search TRRH (TRRH-LS) algorithm. A simple search in the energy change for the RH step is carried out, where the energy change is found as<br />

$$ \Delta E^{\mathrm{RH}}_{\mathrm{SCF}}(\mu) = E_{\mathrm{SCF}}\big(\mathbf{D}^{\mathrm{RH}}(\mu)\big) - E_{\mathrm{SCF}}\big(\mathbf{D}_0^{\mathrm{idem}}\big) , \qquad (1.34) $$<br />

and the µ leading to the largest decrease in energy is chosen, as marked in the figures.<br />
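The µ scan behind Figs. 1.9 and 1.10 can be mimicked on a toy closed-shell model. In this sketch, h, S and the two-electron tensor are random numbers (not a real molecule), the energy is $E(\mathbf{D}) = 2\operatorname{Tr}\mathbf{hD} + \operatorname{Tr}\mathbf{DG}(\mathbf{D})$ with $\mathbf{F} = \mathbf{h} + \mathbf{G}(\mathbf{D})$, and the level-shifted RH step is assumed to have the form $(\mathbf{F}_0 - \mu\mathbf{SD}_0\mathbf{S})\mathbf{C} = \mathbf{SC}\boldsymbol{\epsilon}$. That form is consistent with the µ → ∞ limit in Eq. (1.32), but it is our reading of Eq. (1.28), not a quotation of it. Every trial µ costs a new energy evaluation, which is the expense that makes TRRH-LS costly.

```python
import numpy as np

# Hypothetical closed-shell toy model (random numbers, not a molecule):
# E(D) = 2 Tr hD + Tr D G(D), F(D) = h + G(D).
rng = np.random.default_rng(2)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
h = sym(rng.normal(size=(n, n)))
V = 0.2 * rng.normal(size=(n, n, n, n))
for axes in [(1, 0, 2, 3), (0, 1, 3, 2), (2, 3, 0, 1)]:
    V = (V + V.transpose(axes)) / 2            # permutational symmetry of real integrals
G = lambda D: np.einsum("pqrs,rs->pq", V, D)
fock = lambda D: h + G(D)
energy = lambda D: 2 * np.trace(h @ D) + np.trace(D @ G(D))
Linv = np.linalg.inv(np.linalg.cholesky(S))    # S = L L^T, factored once

def rh_density(F):
    """Roothaan-Hall density from the n_occ lowest solutions of F C = S C eps."""
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

def shifted_rh_density(F0, D0, mu):
    # Assumed form of Eq. (1.28): (F0 - mu S D0 S) C = S C eps
    return rh_density(F0 - mu * S @ D0 @ S)

D0 = rh_density(h)                               # core-Hamiltonian start guess
F0 = fock(D0)
E0 = energy(D0)                                  # D0 is idempotent, so D0_idem = D0
mus = np.concatenate([np.linspace(0.0, 10.0, 101), [1e2, 1e3]])
dE = np.array([energy(shifted_rh_density(F0, D0, mu)) - E0 for mu in mus])  # Eq. (1.34)
k = int(np.argmin(dE))
mu_best, dE_best = mus[k], dE[k]
print(f"best mu = {mu_best:.2f}, energy change = {dE_best:.4f}")
```

Including a few very large shifts in the grid guarantees that at least one short downhill step is among the candidates.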

Fig. 1.9 HF/6-31G, iteration 7. The changes in the HF energy $\Delta E^{\mathrm{RH}}_{\mathrm{HF}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ (ΔE / a.u. versus µ).<br />

Fig. 1.10 LDA/6-31G, iteration 7. The changes in the LDA energy $\Delta E^{\mathrm{RH}}_{\mathrm{LDA}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ (ΔE / a.u. versus µ).<br />

The TRRH-LS algorithm thus ensures an energy decrease in the RH step, but it is of course much more expensive than the standard method, since it requires the repeated construction of the Fock matrix for a single RH step. However, the first derivative $\mathrm{d}E^{\mathrm{RH}}_{\mathrm{SCF}}/\mathrm{d}\mu$ can be evaluated from the Fock matrix, and a cubic spline interpolation can thus be made from only two points on the $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ curve.<br />
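The two-point cubic interpolation mentioned above can be sketched as follows: given the energy and its µ-derivative at two level shifts, a cubic Hermite polynomial is fitted and its minimum on the interval is returned. The function name `cubic_line_min` is hypothetical, and the thesis does not spell out its interpolation formula; this is one standard way to do it, checked here on a cubic, for which the interpolant is exact.

```python
import numpy as np

def cubic_line_min(a, fa, ga, b, fb, gb):
    """Minimize the cubic Hermite interpolant of f on [a, b].

    Uses f(a), f'(a), f(b), f'(b): two energy points plus the two slopes.
    """
    w = b - a
    d0, d1 = ga * w, gb * w                        # slopes w.r.t. t = (x - a) / w
    c0, c1 = fa, d0
    c2 = 3.0 * (fb - fa) - 2.0 * d0 - d1
    c3 = 2.0 * (fa - fb) + d0 + d1                 # p(t) = c0 + c1 t + c2 t^2 + c3 t^3
    p = lambda t: c0 + c1 * t + c2 * t**2 + c3 * t**3
    stationary = np.roots([3.0 * c3, 2.0 * c2, c1])   # p'(t) = 0; leading zeros are dropped
    candidates = [0.0, 1.0] + [r.real for r in stationary
                               if abs(r.imag) < 1e-12 and 0.0 <= r.real <= 1.0]
    t_best = min(candidates, key=p)                # compare interior minima with endpoints
    return a + t_best * w, p(t_best)

# Check on a cubic, f(x) = x (x - 3)^2, whose minimum in [0.5, 4] is at x = 3.
f = lambda x: x * (x - 3.0) ** 2
df = lambda x: 3.0 * x**2 - 12.0 * x + 9.0
x_min, f_min = cubic_line_min(0.5, f(0.5), df(0.5), 4.0, f(4.0), df(4.0))
print(x_min, f_min)
```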

1.4.1.5 Optimal Level Shift without MO Information<br />

As seen from Eq. (1.29), the individual MOs are used to find a suitable level shift in the TRRH scheme. We are very much aware that this is the most important point to improve on in our scheme. To obtain this MO information, the cubically scaling diagonalization of the Fock matrix is necessary, and furthermore the MO coefficient matrices C are inherently non-sparse. Several linear or near-linear scaling alternatives to diagonalization have been suggested in the literature<sup>18-20</sup>. These methods could be combined with a dynamic level shift scheme like ours if the scheme could do without the MO information, but it is not an easy task to find a good dynamic level shift scheme with a high level of control without knowledge of the changes in the individual MOs. The search used to find the level shift in TRRH-LS is directly applicable, since it does not depend on the MO information; the only problem is the number of Fock evaluations. The Fock evaluation remains expensive, even though algorithms that make the evaluation of the Fock matrix cheaper are continually being developed.<br />

This section describes a very recently developed approach to finding the optimal level shift in the TRRH step without the use of individual MOs or knowledge of the HOMO-LUMO gap. So far, it has proven to be the most successful level shift scheme we have studied. The scheme is built on the assumption that the TRRH step is taken in connection with a TRDSM step (or some other density subspace minimization method). In this case we can exploit the fact that TRDSM is a very good energy model (see Section 1.4.2.2), which can be trusted to find the best direction as long as not too much new information is introduced into the density subspace in each step.<br />

A new density, found by diagonalization of a level shifted Fock matrix or by some alternative, can be split into a part $\mathbf{D}^{\parallel}$ that can be described in terms of the previous densities and a part $\mathbf{D}^{\perp}$ with new information orthogonal to the existing subspace<br />

$$ \mathbf{D}(\mu) = \mathbf{D}^{\parallel} + \mathbf{D}^{\perp} . \qquad (1.35) $$<br />

$\mathbf{D}^{\parallel}$ can be expanded in the previous densities as<br />

$$ \mathbf{D}^{\parallel} = \sum_{i=1}^{n} \omega_i \mathbf{D}_i , \qquad (1.36) $$<br />

where n is the number of previously stored densities $\mathbf{D}_i$, and the expansion coefficients $\omega_i$ depend on µ and are determined in a least-squares manner<br />

$$ \omega_i(\mu) = \sum_{j=1}^{n} \left[\mathbf{M}^{-1}\right]_{ij} \operatorname{Tr}\mathbf{D}_j\mathbf{S}\mathbf{D}(\mu)\mathbf{S}, \qquad M_{ij} = \operatorname{Tr}\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S} . \qquad (1.37) $$<br />

When µ → ∞, $\mathbf{D}^{\perp} \to \mathbf{0}$, since the new density then approaches the initial density $\mathbf{D}_0$ (see Eqs. (1.32) and (1.33)), which belongs to the set of previous densities. Thus, there is a connection between $\mathbf{D}^{\perp}$ and µ which we can exploit. If the ratio $d_{\mathrm{orth}}$ of the squared norm $\|\mathbf{D}^{\perp}\|_S^2$ relative to $\|\mathbf{D}\|_S^2$ is small, only small changes to the density subspace are introduced:<br />

$$ d_{\mathrm{orth}} = \frac{\|\mathbf{D}^{\perp}\|_S^2}{\|\mathbf{D}\|_S^2} = \frac{\operatorname{Tr}\mathbf{D}^{\perp}\mathbf{S}\mathbf{D}^{\perp}\mathbf{S}}{\operatorname{Tr}\mathbf{D}\mathbf{S}\mathbf{D}\mathbf{S}} < \delta , \qquad (1.38) $$<br />

where δ is some small number and $\mathbf{D}^{\perp}$ can be found as $\mathbf{D}^{\perp} = \mathbf{D} - \mathbf{D}^{\parallel}$. To illustrate how this is used in a dynamic level shift scheme, the examples from the previous sections are shown again in Figs. 1.11 and 1.12.<br />
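The splitting of Eqs. (1.35) to (1.38) needs only the stored densities and the S-metric inner product $\langle\mathbf{A},\mathbf{B}\rangle_S = \operatorname{Tr}\mathbf{ASBS}$. A minimal sketch, with hypothetical function names and random densities standing in for an actual SCF history:

```python
import numpy as np

def s_inner(A, B, S):
    """S-metric inner product <A, B>_S = Tr(A S B S); ||A||_S^2 = <A, A>_S."""
    return np.trace(A @ S @ B @ S)

def d_orth(D_new, D_prev, S):
    """Ratio ||D_perp||_S^2 / ||D_new||_S^2 of Eq. (1.38) and the orthogonal part D_perp."""
    M = np.array([[s_inner(Di, Dj, S) for Dj in D_prev] for Di in D_prev])
    b = np.array([s_inner(Dj, D_new, S) for Dj in D_prev])
    omega = np.linalg.solve(M, b)                          # Eq. (1.37)
    D_par = sum(w * Di for w, Di in zip(omega, D_prev))    # Eq. (1.36)
    D_perp = D_new - D_par                                 # D_perp = D - D_par
    return s_inner(D_perp, D_perp, S) / s_inner(D_new, D_new, S), D_perp

def aufbau_density(F, S, n_occ):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    Linv = np.linalg.inv(np.linalg.cholesky(S))
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

# Hypothetical stored densities D_1..D_3 and a trial density D(mu)
rng = np.random.default_rng(3)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
D_prev = [aufbau_density(sym(rng.normal(size=(n, n))), S, n_occ) for _ in range(3)]
D_trial = aufbau_density(sym(rng.normal(size=(n, n))), S, n_occ)

ratio_old, _ = d_orth(D_prev[0], D_prev, S)       # a stored density: nothing new
ratio_new, D_perp = d_orth(D_trial, D_prev, S)    # generic trial density
print(ratio_old, ratio_new)
```

Because $\mathbf{D}^{\parallel}$ is an orthogonal projection in this inner product, $0 \le d_{\mathrm{orth}} \le 1$, and a density already in the subspace gives $d_{\mathrm{orth}} = 0$.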

In the rest of the thesis, the level shift scheme described in Section 1.4.1.2 will be referred to as the C-shift scheme, since it involves the eigenvectors C from the diagonalization of the Fock matrix, and the level shift scheme described in this section will be referred to as the $d_{\mathrm{orth}}$-shift scheme. If the level shift scheme is not specified, the C-shift is implied.<br />

Fig. 1.11 HF/6-31G, iteration 7. (A) The ratio $d_{\mathrm{orth}}$ (threshold δ = 0.08 marked) and (B) the changes in the HF energy $\Delta E^{\mathrm{RH}}_{\mathrm{HF}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ.<br />

Fig. 1.12 LDA/6-31G, iteration 7. (A) The ratio $d_{\mathrm{orth}}$ (threshold δ = 0.03 marked) and (B) the changes in the LDA energy $\Delta E^{\mathrm{RH}}_{\mathrm{LDA}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ as a function of the level shift µ.<br />

The upper panels now display the search made in $d_{\mathrm{orth}}$, and it is clearly seen that $d_{\mathrm{orth}} \to 0$ for µ → ∞, as expected, and that it increases for µ → 0. As for the C-shift scheme, we can allow larger changes in the HF method than in DFT, and δ is thus set to 0.08 for HF and 0.03 for DFT. The lower panels show that this level shift, like the C-shift scheme, avoids an increase in the energy, but the level shift chosen here is closer to the optimal line search level shift and thus leads to a larger decrease in the energy than the C-shift scheme.<br />
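A level shift satisfying Eq. (1.38) can be located with a simple bracketing search. The sketch below uses bisection, which is our choice rather than a documented detail of the scheme, and it assumes that $d_{\mathrm{orth}}(\mu)$ decreases monotonically with µ, as it does in Figs. 1.11 and 1.12. The smooth test function $1/(1+\mu)^2$ and the name `choose_dorth_shift` are purely illustrative.

```python
def choose_dorth_shift(d_orth_of_mu, delta, mu_hi=1.0, tol=1e-6, max_doublings=60):
    """Smallest level shift mu (to within tol) with d_orth(mu) < delta.

    Assumes d_orth(mu) decreases monotonically with mu.
    """
    if d_orth_of_mu(0.0) < delta:
        return 0.0                          # the unshifted RH step already qualifies
    for _ in range(max_doublings):          # bracket: find d_orth(mu_hi) < delta
        if d_orth_of_mu(mu_hi) < delta:
            break
        mu_hi *= 2.0
    else:
        raise RuntimeError("no level shift gives d_orth < delta")
    mu_lo = 0.0                             # invariant: d_orth(mu_lo) >= delta > d_orth(mu_hi)
    while mu_hi - mu_lo > tol:
        mid = 0.5 * (mu_lo + mu_hi)
        if d_orth_of_mu(mid) < delta:
            mu_hi = mid
        else:
            mu_lo = mid
    return mu_hi

# Illustrative stand-in for d_orth(mu); a real TRRH step would build D(mu) and apply Eq. (1.38).
toy_d_orth = lambda mu: 1.0 / (1.0 + mu) ** 2
print(choose_dorth_shift(toy_d_orth, delta=0.08))   # HF threshold quoted in the text
print(choose_dorth_shift(toy_d_orth, delta=0.03))   # DFT threshold
```

In a TRRH step, `d_orth_of_mu` would wrap a level-shifted RH diagonalization (or a trace purification) followed by Eq. (1.38); each trial µ needs a new density but no new Fock matrix.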


In the C-shift scheme of Eq. (1.31), the changes introduced are controlled relative to the previous density, whereas in the $d_{\mathrm{orth}}$-shift scheme they are controlled relative to the subspace of all the previous densities. The $d_{\mathrm{orth}}$-shift scheme is thus less restrictive, but the C-shift scheme appears to be too restrictive, ignoring the stability gained from the subspace information. To compare the overall effect of the two level shift schemes on the SCF convergence, calculations are shown in Figs. 1.13 and 1.14 for HF and LDA, respectively. The HF calculations are on CrC with a bond distance of 2.00 Å in the STO-3G basis, and the LDA calculations are on the zinc complex of Fig. 1.3 in the 6-31G basis; DIIS diverges in both cases. The starting orbitals have been obtained by diagonalization of the one-electron Hamiltonian (H1-core start guess).<br />

Fig. 1.13 SCF convergence for HF/STO-3G calculations on CrC (error in energy / $E_h$ versus iteration; TRSCF/$d_{\mathrm{orth}}$-shift, TRSCF/C-shift and DIIS).<br />

Fig. 1.14 SCF convergence for LDA/6-31G calculations on the zinc complex in Fig. 1.3 (error in energy / $E_h$ versus iteration; TRSCF/$d_{\mathrm{orth}}$-shift, TRSCF/C-shift and DIIS).<br />

The only difference between the “TRSCF/$d_{\mathrm{orth}}$-shift” and the “TRSCF/C-shift” optimizations is the way the level shift is found in the TRRH step. Since DIIS diverges, the examples demonstrate the stability of the TRSCF algorithm and the ability of the two level shift schemes to handle problematic cases. In all examples studied so far, both problematic and simple, the $d_{\mathrm{orth}}$-shift has proven to be as good as or better than the C-shift. The cost of the level shift search is similar in the two schemes: the matrix M in Eq. (1.37) is updated in each iteration as part of TRDSM and is then reused for the $d_{\mathrm{orth}}$-shift scheme in TRRH.<br />

In Table 1-3, the SCF energy change in each iteration is divided into the contributions from the RH and DSM steps. The table shows that the RH step is now allowed to accept larger changes in the density, still in a controlled manner, which leads to larger decreases in the energy and improved convergence.<br />


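Table 1-3. The SCF energy change for each RH and DSM step in the TRSCF calculations in Fig. 1.13. Energies in $E_h$.<br />

<table>
<tr><th rowspan="2">It.</th><th colspan="2">C-shift</th><th colspan="2">$d_{\mathrm{orth}}$-shift</th></tr>
<tr><th>$\Delta E^{\mathrm{RH}}_{\mathrm{HF}}$</th><th>$\Delta E^{\mathrm{DSM}}_{\mathrm{HF}}$</th><th>$\Delta E^{\mathrm{RH}}_{\mathrm{HF}}$</th><th>$\Delta E^{\mathrm{DSM}}_{\mathrm{HF}}$</th></tr>
<tr><td>2</td><td>-1.1768</td><td>0.0000</td><td>-1.3976</td><td>0.0000</td></tr>
<tr><td>3</td><td>-1.8964</td><td>-3.8998</td><td>-4.1319</td><td>-4.5865</td></tr>
<tr><td>4</td><td>-1.6764</td><td>-1.9603</td><td>-1.8021</td><td>-1.0448</td></tr>
<tr><td>5</td><td>-0.3655</td><td>-1.7543</td><td>-0.2103</td><td>-0.1200</td></tr>
<tr><td>6</td><td>-0.1881</td><td>-0.1624</td><td>-0.0111</td><td>-0.0463</td></tr>
<tr><td>7</td><td>-0.0932</td><td>-0.1505</td><td>-0.0036</td><td>-0.0037</td></tr>
<tr><td>8</td><td>0.0065</td><td>-0.0212</td><td>-0.0001</td><td>-0.0008</td></tr>
<tr><td>9</td><td>-0.0039</td><td>-0.0154</td><td></td><td></td></tr>
<tr><td>10</td><td>0.0002</td><td>-0.0009</td><td></td><td></td></tr>
</table>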

1.4.1.6 The Trace Purification Scheme<br />

The dynamic level shift scheme described in the previous section makes no reference to the MO basis. This opens the possibility of replacing the diagonalizations in the TRRH step with an alternative scheme without affecting the overall result.<br />

There have been many suggestions as to how the diagonalization can be replaced by a linear scaling algorithm<sup>47</sup>. The trace purification (TP) scheme<sup>19,48</sup> is a simple and useful approach, and it has therefore been implemented in our SCF program in a local version of DALTON<sup>38,49</sup>. The trace purification scheme was originally formulated for tight binding theory by Palser and Manolopoulos<sup>19</sup> and later improved by Niklasson<sup>48</sup>, and it is linear scaling when formulated in an orthogonal basis. The scheme uses the trace and idempotency properties of the density to find the new density iteratively from a suitable start guess constructed from the Fock matrix.<br />

Since the SCF optimization is formulated in the non-orthogonal AO basis to avoid the delocalized MO basis, the matrices must be transformed to an orthogonal basis. This is done by a Cholesky decomposition<sup>50</sup> of the AO overlap matrix S<br />

$$ \mathbf{S} = \mathbf{L}\mathbf{L}^{T} , \qquad (1.39) $$<br />

where L is then used to transform the Fock matrix to an orthogonal basis<br />

$$ \mathbf{F}^{\mathrm{orth}} = \mathbf{L}^{-1}\mathbf{F}\mathbf{L}^{-T} . \qquad (1.40) $$<br />

The density resulting from the trace purification scheme will also be in the orthogonal basis and should be transformed back as<br />

$$ \mathbf{D} = \mathbf{L}^{-T}\mathbf{D}^{\mathrm{orth}}\mathbf{L}^{-1} . \qquad (1.41) $$<br />

Since the AO overlap matrix does not change during the optimization, the Cholesky decomposition and the inversion of L can be done once and for all at the beginning of the calculation.<br />
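The basis transformation of Eqs. (1.39) to (1.41) is short in NumPy. In this sketch, with random stand-ins for S and F, a diagonalization plays the role that trace purification plays in the actual scheme, and the back-transformed density is checked for the AO-basis conditions $\mathbf{DSD} = \mathbf{D}$ and $\operatorname{Tr}\mathbf{DS} = N_{\mathrm{occ}}$ (closed-shell normalization with occupation 1 per orbital, our assumption).

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_occ = 6, 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n        # hypothetical AO overlap matrix
B = rng.normal(size=(n, n))
F = (B + B.T) / 2                         # hypothetical AO Fock matrix

# S is fixed during the optimization, so these two lines are done once.
L = np.linalg.cholesky(S)                 # Eq. (1.39): S = L L^T, L lower triangular
Linv = np.linalg.inv(L)

def to_orth(F):
    """Eq. (1.40): F_orth = L^-1 F L^-T."""
    return Linv @ F @ Linv.T

def from_orth(D_orth):
    """Eq. (1.41): D = L^-T D_orth L^-1."""
    return Linv.T @ D_orth @ Linv

eps, Y = np.linalg.eigh(to_orth(F))       # stand-in for trace purification
D_orth = Y[:, :n_occ] @ Y[:, :n_occ].T    # idempotent projector in the orthogonal basis
D = from_orth(D_orth)                      # back to the AO basis
C = Linv.T @ Y                             # AO orbital coefficients
print(np.trace(D @ S))                     # equals n_occ
```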


Fig. 1.15 Flow diagram for the trace purification (TP) scheme. N is the number of electrons. Diagram: estimate $\lambda_{\min}$ and $\lambda_{\max}$ for $\mathbf{F}^{\mathrm{orth}}$; set $\mathbf{R}_0 = (\lambda_{\max}\mathbf{I} - \mathbf{F}^{\mathrm{orth}})/(\lambda_{\max} - \lambda_{\min})$; then, for n = 0, 1, 2, …, set $\mathbf{R}_{n+1} = \mathbf{R}_n^2$ if $\operatorname{Tr}\mathbf{R}_n > N$ and $\mathbf{R}_{n+1} = 2\mathbf{R}_n - \mathbf{R}_n^2$ otherwise, until $|\operatorname{Tr}\mathbf{R}_{n+1} - N| < \varepsilon$; finally $\mathbf{D}^{\mathrm{orth}} = \mathbf{R}_{n+1}$.<br />

Fig. 1.16 The purifying polynomials used in the trace purification scheme, $x_{n+1} = x_n^2$ and $x_{n+1} = 2x_n - x_n^2$, shown for $0 \le x_n \le 1$. The orange line is the McWeeny purification polynomial $x_{n+1} = 3x_n^2 - 2x_n^3$.<br />

The trace purification is carried out with Niklasson's scheme using second-order purification polynomials, as outlined in Fig. 1.15. The initial density guess $\mathbf{R}_0$ is obtained by normalizing the Fock matrix such that it only has eigenvalues between 0 and 1. To do this, the bounds for the Fock eigenvalues, $\lambda_{\min}$ and $\lambda_{\max}$, must be found. They can be estimated using Gerschgorin’s theorem or the Lanczos algorithm for eigenvalues<sup>51</sup> at only a small extra computational cost. R is then iteratively purified, and the purification function applied in each iteration is chosen based on the trace of the matrix R, always moving towards the correct trace. The purification functions are sketched in Fig. 1.16, together with the McWeeny purification function<sup>8</sup>. One of the functions used in the scheme has a stationary point at x = 1 and the other has a stationary point at x = 0; depending on the function chosen, we thus move towards a larger or a smaller trace. When R fulfils the trace and/or idempotency conditions Eq. (1.2) of the one-electron density within some threshold ε, the new density $\mathbf{D}^{\mathrm{orth}} = \mathbf{R}$ has been found, and the density to use in the next TRSCF iteration can be evaluated from Eq. (1.41).<br />
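A compact sketch of the second-order trace-correcting purification of Fig. 1.15 follows, with Gerschgorin bounds for $\lambda_{\min}$ and $\lambda_{\max}$. The target trace is written `n_occ`, the number of occupied orbitals in our occupation-1 normalization (the figure caption calls it N). Requiring a small idempotency error in addition to the trace error is our choice, and the Fock matrix is a random stand-in already in the orthogonal basis. Dense matrices are used, so the linear scaling of sparse implementations is not reproduced.

```python
import numpy as np

def gershgorin_bounds(F):
    """Lower and upper bounds on the eigenvalues of a symmetric matrix (Gerschgorin discs)."""
    radius = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    return np.min(np.diag(F) - radius), np.max(np.diag(F) + radius)

def trace_purify(F_orth, n_occ, eps=1e-7, eps_idem=1e-9, max_iter=200):
    """Second-order trace-correcting purification of F_orth (orthogonal basis).

    Returns the projector onto the n_occ lowest eigenvectors and the iteration count.
    """
    lam_min, lam_max = gershgorin_bounds(F_orth)
    R = (lam_max * np.eye(len(F_orth)) - F_orth) / (lam_max - lam_min)  # spectrum in [0, 1]
    for it in range(1, max_iter + 1):
        R2 = R @ R
        R = R2 if np.trace(R) > n_occ else 2.0 * R - R2   # x^2 lowers, 2x - x^2 raises the trace
        if abs(np.trace(R) - n_occ) < eps and np.linalg.norm(R @ R - R) < eps_idem:
            return R, it
    raise RuntimeError("trace purification did not converge")

rng = np.random.default_rng(5)
B = rng.normal(size=(8, 8))
F_orth = (B + B.T) / 2                       # hypothetical Fock matrix, orthogonal basis
D_tp, n_iter = trace_purify(F_orth, n_occ=3)

_, Y = np.linalg.eigh(F_orth)                # reference: diagonalization
D_ref = Y[:, :3] @ Y[:, :3].T
print(n_iter, np.abs(D_tp - D_ref).max())
```

Both polynomials are monotonic on [0, 1], so the ordering of eigenvalues is preserved and the lowest Fock eigenvectors end up occupied.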

The number of purification iterations required to obtain a new density depends on the threshold ε. For the test calculations carried out so far, the threshold has been an error of 10<sup>-7</sup> in the trace, and the number of iterations ranges from 30 to 70 for a single RH step, with the typical number being closer to 30 than 70. Even so, trace purification is less expensive than diagonalization as soon as more than a couple of thousand basis functions are needed. The scaling of the TRRH step in general, and of the trace purification scheme in particular, is illustrated and discussed in Section 1.7.1.<br />

1.4.2 Density Subspace Minimization<br />

The DIIS scheme appears to have been the most successful overall of the suggestions for improving SCF convergence described in Section 1.3. DIIS was the first scheme to take advantage of the information contained in the densities and Fock matrices of the previous iterations, and this made the difference.<br />

This is also exploited in the EDIIS scheme by Kudin et al.<sup>37</sup>, in which an energy model is optimized with respect to the linear combination of previous densities. The density subspace minimization presented in this section improves on EDIIS: it has a smaller idempotency error in the density and the correct gradient with respect to the SCF energy, and thus better convergence properties in both the local and the global region of the optimization.<br />

1.4.2.1 The Trust Region DSM Parameterization<br />

After a sequence of Roothaan-Hall iterations, we have determined a set of density matrices $\mathbf{D}_i$ and a corresponding set of Fock matrices $\mathbf{F}_i = \mathbf{F}(\mathbf{D}_i)$. An improved density $\bar{\mathbf{D}}$ and Fock matrix $\bar{\mathbf{F}}$ should now be found as a linear combination of the previous n + 1 stored matrices. Taking $\mathbf{D}_0$ as the reference density matrix, the improved density matrix can be written<br />

$$ \bar{\mathbf{D}} = \mathbf{D}_0 + \sum_{i=0}^{n} c_i \mathbf{D}_i , \qquad (1.42) $$<br />

which, ideally, should satisfy the symmetry, trace and idempotency conditions Eq. (1.2) of a valid one-electron density matrix. Whereas the symmetry condition is trivially satisfied for any such linear combination, the trace condition holds only for combinations that satisfy the constraint<br />

$$ \sum_{i=0}^{n} c_i = 0 , \qquad (1.43) $$<br />

leading to a set of n + 1 constrained parameters $c_i$ with 0 ≤ i ≤ n. Alternatively, an unconstrained set of n parameters $c_i$ with 1 ≤ i ≤ n can be used, with $c_0$ defined so that the trace condition is fulfilled:<br />

$$ c_0 = -\sum_{i=1}^{n} c_i . \qquad (1.44) $$<br />

In terms of these independent parameters, the density matrix $\bar{\mathbf{D}}$ becomes<br />

$$ \bar{\mathbf{D}} = \mathbf{D}_0 + \mathbf{D}_+ , \qquad (1.45) $$<br />

where we have introduced the notation<br />

$$ \mathbf{D}_+ = \sum_{i=1}^{n} c_i \mathbf{D}_{i0} , \qquad \mathbf{D}_{i0} = \mathbf{D}_i - \mathbf{D}_0 . \qquad (1.46) $$<br />
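The equivalence of the constrained parameterization, Eqs. (1.42) to (1.44), and the unconstrained one, Eqs. (1.45) and (1.46), together with the trace property, can be checked numerically. The helper `averaged_density` and the random densities are hypothetical:

```python
import numpy as np

def averaged_density(c, D_list):
    """D_bar = D_0 + sum_{i=1}^n c_i (D_i - D_0), Eqs. (1.45)-(1.46); D_list[0] is D_0."""
    D0 = D_list[0]
    return D0 + sum(ci * (Di - D0) for ci, Di in zip(c, D_list[1:]))

rng = np.random.default_rng(10)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
Linv = np.linalg.inv(np.linalg.cholesky(S))

def aufbau_density(F):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

D_list = [aufbau_density(sym(rng.normal(size=(n, n)))) for _ in range(4)]   # D_0..D_3
c = np.array([0.4, -0.25, 0.1])                    # independent parameters c_1..c_3
D_bar = averaged_density(c, D_list)

c_all = np.concatenate([[-c.sum()], c])            # c_0 from Eq. (1.44)
D_constrained = D_list[0] + sum(ci * Di for ci, Di in zip(c_all, D_list))   # Eq. (1.42)
print(np.trace(D_bar @ S))                         # n_occ for any choice of c
```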

Unlike the symmetry and trace conditions in Eq. (1.2), the idempotency condition is in general not fulfilled for linear combinations of $\mathbf{D}_i$. Still, for any averaged density matrix $\bar{\mathbf{D}}$ in Eq. (1.45) that does not fulfill the idempotency condition, we may generate a purified density matrix with a smaller idempotency error by the transformation<sup>8</sup><br />

$$ \tilde{\mathbf{D}} = 3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} . \qquad (1.47) $$<br />

Introducing the idempotency correction<br />

$$ \mathbf{D}_\delta = \tilde{\mathbf{D}} - \bar{\mathbf{D}} , \qquad (1.48) $$<br />

we may then write the purified averaged density matrix in the form<br />

$$ \tilde{\mathbf{D}} = \mathbf{D}_0 + \mathbf{D}_+ + \mathbf{D}_\delta . \qquad (1.49) $$<br />
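The McWeeny step of Eq. (1.47) takes only a few lines of NumPy. The sketch measures the idempotency error in the S metric, $\|\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \bar{\mathbf{D}}\|_S$ (our choice of measure). This error cannot increase under the transformation as long as the eigenvalues of $\bar{\mathbf{D}}\mathbf{S}$ lie between 0 and 1, as they do for the convex combination of idempotent densities used here; the densities and function names are hypothetical.

```python
import numpy as np

def mcweeny(D, S):
    """McWeeny purification, Eq. (1.47): 3 DSD - 2 DSDSD."""
    DS = D @ S
    return 3.0 * DS @ D - 2.0 * DS @ DS @ D

def idempotency_error(D, S):
    """S-metric norm of E = DSD - D, i.e. sqrt(Tr(E S E S)); clipped at 0 against round-off."""
    E = D @ S @ D - D
    return np.sqrt(max(np.trace(E @ S @ E @ S), 0.0))

rng = np.random.default_rng(11)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
Linv = np.linalg.inv(np.linalg.cholesky(S))

def aufbau_density(F):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

D_a = aufbau_density(sym(rng.normal(size=(n, n))))
D_b = aufbau_density(sym(rng.normal(size=(n, n))))
D_bar = 0.8 * D_a + 0.2 * D_b          # averaged, non-idempotent density
D_tilde = mcweeny(D_bar, S)
D_delta = D_tilde - D_bar              # idempotency correction, Eq. (1.48)
print(idempotency_error(D_bar, S), idempotency_error(D_tilde, S))
```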

1.4.2.2 The Trust Region DSM Energy Function<br />

Having established a useful parameterization of the averaged density matrix, Eq. (1.45), and having considered its purification, Eq. (1.47), let us now consider how to determine the best set of coefficients $c_i$. Expanding the energy in the purified averaged density matrix, Eq. (1.49), around the reference density matrix $\mathbf{D}_0$, we obtain to second order<br />

$$ E^{(2)}_{\mathrm{SCF}}(\tilde{\mathbf{D}}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + (\mathbf{D}_+ + \mathbf{D}_\delta)^{T}\mathbf{E}^{(1)}_0 + \tfrac{1}{2}(\mathbf{D}_+ + \mathbf{D}_\delta)^{T}\mathbf{E}^{(2)}_0(\mathbf{D}_+ + \mathbf{D}_\delta) . \qquad (1.50) $$<br />

To evaluate the terms containing $\mathbf{E}^{(1)}_0$ and $\mathbf{E}^{(2)}_0$, we make the identifications<br />

$$ \mathbf{E}^{(1)}_0 = 2\mathbf{F}_0 , \qquad (1.51) $$<br />

$$ \mathbf{E}^{(2)}_0\mathbf{D}_+ = 2\mathbf{F}_+ + O(\mathbf{D}_+^2) , \qquad (1.52) $$<br />

which follow from Eq. (1.4) and from a second-order Taylor expansion about $\mathbf{D}_0$. The notation of Eq. (1.46) has now been generalized to the Fock matrix, $\mathbf{F}_+ = \sum_{i=1}^{n} c_i \mathbf{F}_{i0}$. Ignoring the terms quadratic in $\mathbf{D}_\delta$ in Eq. (1.50) and quadratic in $\mathbf{D}_+$ in Eq. (1.52), we then obtain the DSM energy<br />

$$ E^{\mathrm{DSM}}(\mathbf{c}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + 2\operatorname{Tr}\mathbf{D}_+\mathbf{F}_0 + \operatorname{Tr}\mathbf{D}_+\mathbf{F}_+ + 2\operatorname{Tr}\mathbf{D}_\delta\mathbf{F}_0 + 2\operatorname{Tr}\mathbf{D}_\delta\mathbf{F}_+ . \qquad (1.53) $$<br />

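Finally, for a more compact notation, we introduce the weighted Fock matrix<br />

$$ \bar{\mathbf{F}} = \mathbf{F}_0 + \mathbf{F}_+ = \mathbf{F}_0 + \sum_{i=1}^{n} c_i \mathbf{F}_{i0} , \qquad (1.54) $$<br />

and find that the DSM energy may be written in the form<br />

$$ E^{\mathrm{DSM}}(\mathbf{c}) = E(\bar{\mathbf{D}}) + 2\operatorname{Tr}\mathbf{D}_\delta\bar{\mathbf{F}} , \qquad (1.55) $$<br />

where the first term is quadratic in the expansion coefficients $c_i$,<br />

$$ E(\bar{\mathbf{D}}) = E_{\mathrm{SCF}}(\mathbf{D}_0) + 2\operatorname{Tr}\mathbf{D}_+\mathbf{F}_0 + \operatorname{Tr}\mathbf{D}_+\mathbf{F}_+ , \qquad (1.56) $$<br />

and the second, idempotency-correction term is quartic in these coefficients:<br />

$$ 2\operatorname{Tr}\mathbf{D}_\delta\bar{\mathbf{F}} = \operatorname{Tr}\left(6\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 4\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\right)\bar{\mathbf{F}} . \qquad (1.57) $$<br />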

The derivatives of $E^{\mathrm{DSM}}(\mathbf{c})$ are straightforwardly obtained by inserting the expansions of $\bar{\mathbf{F}}$ and $\bar{\mathbf{D}}$, using the independent parameter representation. The expressions are given in […].<br />

The energy function $E^{\mathrm{DSM}}(\mathbf{c})$ in Eq. (1.55) provides an excellent approximation to the exact SCF energy $E_{\mathrm{SCF}}(\mathbf{c})$ about $\mathbf{D}_0$, with an error quadratic in $\mathbf{D}_\delta$ (see Section 1.5.2). The EDIIS energy model corresponds to the first term $E(\bar{\mathbf{D}})$ in Eq. (1.55) and thus has an error linear in $\mathbf{D}_\delta$.<br />
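For a Hartree-Fock-like model, the statements about Eqs. (1.55) and (1.56) can be verified directly. In the toy model below (random h, S and two-electron tensor with the usual permutational symmetries; not a molecule), $E(\bar{\mathbf{D}})$ of Eq. (1.56) coincides with the exact energy of $\bar{\mathbf{D}}$. The DSM model misses the exact energy of $\tilde{\mathbf{D}}$ only by $\operatorname{Tr}\mathbf{D}_\delta\mathbf{G}(\mathbf{D}_\delta)$, whereas the EDIIS-like first term alone misses it by a term linear in $\mathbf{D}_\delta$. All names are hypothetical.

```python
import numpy as np

# Hypothetical closed-shell HF-like toy model (random numbers, not a molecule):
# E(D) = 2 Tr hD + Tr D G(D) and F(D) = h + G(D), so the gradient is 2F (Eq. (1.51)).
rng = np.random.default_rng(7)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
h = sym(rng.normal(size=(n, n)))
V = 0.2 * rng.normal(size=(n, n, n, n))
for axes in [(1, 0, 2, 3), (0, 1, 3, 2), (2, 3, 0, 1)]:
    V = (V + V.transpose(axes)) / 2            # permutational symmetry of real integrals
G = lambda D: np.einsum("pqrs,rs->pq", V, D)
fock = lambda D: h + G(D)
energy = lambda D: 2 * np.trace(h @ D) + np.trace(D @ G(D))
tr = lambda X, Y: np.trace(X @ Y)
Linv = np.linalg.inv(np.linalg.cholesky(S))

def aufbau_density(F):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

# Stored densities D_0, D_1, D_2 and their Fock matrices
D_list = [aufbau_density(sym(rng.normal(size=(n, n)))) for _ in range(3)]
F_list = [fock(D) for D in D_list]
D0, F0 = D_list[0], F_list[0]
c = np.array([0.15, -0.1])                     # independent parameters c_1, c_2

D_plus = sum(ci * (Di - D0) for ci, Di in zip(c, D_list[1:]))   # Eq. (1.46)
F_plus = sum(ci * (Fi - F0) for ci, Fi in zip(c, F_list[1:]))
D_bar, F_bar = D0 + D_plus, F0 + F_plus                          # Eqs. (1.45), (1.54)
DS = D_bar @ S
D_tilde = 3 * DS @ D_bar - 2 * DS @ DS @ D_bar                   # Eq. (1.47)
D_delta = D_tilde - D_bar                                        # Eq. (1.48)

E_bar = energy(D0) + 2 * tr(D_plus, F0) + tr(D_plus, F_plus)     # Eq. (1.56)
E_dsm = E_bar + 2 * tr(D_delta, F_bar)                           # Eq. (1.55)
E_true = energy(D_tilde)
print("error of E(D_bar) alone:", E_true - E_bar)   # linear in D_delta
print("error of E_DSM:         ", E_true - E_dsm)   # equals Tr D_delta G(D_delta)
```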

1.4.2.3 The Trust Region DSM Minimization<br />

The DSM energy, Eq. (1.55), is minimized with respect to the independent parameters $c_i$ with 1 ≤ i ≤ n. The parameter vector is initialized to zero, $\mathbf{c}^{(0)} = \mathbf{0}$, such that $\bar{\mathbf{D}} = \mathbf{D}_0$, where $\mathbf{D}_0$ is chosen as the density matrix with the lowest energy $E_{\mathrm{SCF}}(\mathbf{D}_i)$, usually the one from the latest TRRH step. The minimization is then carried out by the trust region method<sup>52</sup>, taking a number of steps from the initial parameters $\mathbf{c}^{(0)}$ to the final optimized parameters $\mathbf{c}^*$, as illustrated in Fig. 1.17.<br />

Fig. 1.17 Steps in the trust region minimization of the DSM energy: $\mathbf{c}^{(0)} = \mathbf{0} \to \mathbf{c}^{(1)} \to \mathbf{c}^{(2)} \to \mathbf{c}^{(3)} \to \dots \to \mathbf{c}^*$.<br />

We thus consider in each step the second-order Taylor expansion of the DSM energy in Eq. (1.55). Introducing the step vector<br />

$$ \Delta\mathbf{c} = \mathbf{c}^{(i+1)} - \mathbf{c}^{(i)} , \qquad (1.58) $$<br />

we obtain<br />

$$ E^{\mathrm{DSM}}_{(2)}(\mathbf{c}^{(i)} + \Delta\mathbf{c}) = E_0 + \Delta\mathbf{c}^{T}\mathbf{g} + \tfrac{1}{2}\Delta\mathbf{c}^{T}\mathbf{H}\Delta\mathbf{c} , \qquad (1.59) $$<br />

where the energy, gradient, and Hessian at the expansion point are given by<br />

$$ E_0 = E^{\mathrm{DSM}}(\mathbf{c}^{(i)}) , \qquad \mathbf{g} = \left.\frac{\partial E^{\mathrm{DSM}}(\mathbf{c})}{\partial \mathbf{c}}\right|_{\mathbf{c}=\mathbf{c}^{(i)}} , \qquad \mathbf{H} = \left.\frac{\partial^2 E^{\mathrm{DSM}}(\mathbf{c})}{\partial \mathbf{c}^2}\right|_{\mathbf{c}=\mathbf{c}^{(i)}} . \qquad (1.60) $$<br />


We then introduce a trust region of radius h for $E^{\mathrm{DSM}}_{(2)}(\mathbf{c}^{(i)} + \Delta\mathbf{c})$ and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length h in the S-metric norm defined by M<br />

$$ \|\Delta\mathbf{c}\|_S^2 = \sum_{i,j=1}^{n} \Delta c_i M_{ij} \Delta c_j = h^2 . \qquad (1.61) $$<br />

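In the unconstrained formulation defined by Eq. (1.44), the metric M of Eq. (1.37) is found as<br />

$$ M_{ij} = \operatorname{Tr}\mathbf{D}_i\mathbf{S}\mathbf{D}_j\mathbf{S} - \operatorname{Tr}\mathbf{D}_i\mathbf{S}\mathbf{D}_0\mathbf{S} - \operatorname{Tr}\mathbf{D}_0\mathbf{S}\mathbf{D}_j\mathbf{S} + \operatorname{Tr}\mathbf{D}_0\mathbf{S}\mathbf{D}_0\mathbf{S} , \qquad i, j \neq 0 . \qquad (1.62) $$<br />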
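Expanding Eq. (1.62) shows that $M_{ij} = \operatorname{Tr}\mathbf{D}_{i0}\mathbf{S}\mathbf{D}_{j0}\mathbf{S}$, the Gram matrix of the differences $\mathbf{D}_{i0}$ in the S metric, so the norm of Eq. (1.61) is the S norm of the density change $\sum_i \Delta c_i \mathbf{D}_{i0}$. A quick numerical check with random densities and a hypothetical helper:

```python
import numpy as np

def metric_unconstrained(D_list, S):
    """M_ij of Eq. (1.62) for i, j = 1..n; D_list[0] is the reference D_0."""
    t = lambda X, Y: np.trace(X @ S @ Y @ S)
    D0 = D_list[0]
    return np.array([[t(Di, Dj) - t(Di, D0) - t(D0, Dj) + t(D0, D0)
                      for Dj in D_list[1:]] for Di in D_list[1:]])

rng = np.random.default_rng(6)
n, n_occ = 6, 2
sym = lambda X: (X + X.T) / 2
A = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * A @ A.T / n
Linv = np.linalg.inv(np.linalg.cholesky(S))

def aufbau_density(F):
    """Closed-shell density from the n_occ lowest solutions of F C = S C eps."""
    _, Y = np.linalg.eigh(Linv @ F @ Linv.T)
    C_occ = (Linv.T @ Y)[:, :n_occ]
    return C_occ @ C_occ.T

D_list = [aufbau_density(sym(rng.normal(size=(n, n)))) for _ in range(4)]
M = metric_unconstrained(D_list, S)
D0 = D_list[0]
M_gram = np.array([[np.trace((Di - D0) @ S @ (Dj - D0) @ S) for Dj in D_list[1:]]
                   for Di in D_list[1:]])          # Tr D_i0 S D_j0 S
print(np.linalg.eigvalsh(M))                       # positive: M is a Gram matrix
```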

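Introducing the undetermined multiplier ν for the step-size constraint, we arrive at the following Lagrangian for minimization on the boundary of the trust region:<br />

$$ L(\Delta\mathbf{c}, \nu) = E_0 + \Delta\mathbf{c}^{T}\mathbf{g} + \tfrac{1}{2}\Delta\mathbf{c}^{T}\mathbf{H}\Delta\mathbf{c} - \tfrac{1}{2}\nu\left(\Delta\mathbf{c}^{T}\mathbf{M}\Delta\mathbf{c} - h^2\right) . \qquad (1.63) $$<br />

Differentiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations<br />

$$ \frac{\partial L}{\partial \Delta\mathbf{c}} = \mathbf{g} + \mathbf{H}\Delta\mathbf{c} - \nu\mathbf{M}\Delta\mathbf{c} = \mathbf{0} , \qquad (1.64) $$<br />

$$ \frac{\partial L}{\partial \nu} = -\tfrac{1}{2}\left(\Delta\mathbf{c}^{T}\mathbf{M}\Delta\mathbf{c} - h^2\right) = 0 . \qquad (1.65) $$<br />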

The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:<br />

$$ (\mathbf{H} - \nu\mathbf{M})\Delta\mathbf{c} = -\mathbf{g} , \qquad (1.66) $$<br />

where the multiplier ν is iteratively adjusted until the step lies on the boundary of the trust region, Eq. (1.65). The step length restriction may be lifted by setting ν = 0, as needed for steps inside the trust region.<br />

To illustrate how the level shift parameter ν in Eq. (1.66) is determined, we consider in Figs. 1.18 and 1.19 the third and fourth DSM steps, respectively, in iteration five of the HF/STO-3G calculation on CrC shown in Fig. 1.13. The step length $\|\Delta\mathbf{c}\|_S$ is plotted as a function of ν. The plots consist of branches between asymptotes where ν makes the matrix on the left-hand side of Eq. (1.66) singular. This happens whenever ν equals one of the Hessian eigenvalues. The lowest eigenvalue $\omega_1$ of the Hessian H is found, and the level shift parameter is chosen in the interval $-\infty < \nu < \min(0, \omega_1)$. The proper value is found where the step length function crosses the line representing the trust radius h, as marked in Fig. 1.18. If the step that minimizes $E^{\mathrm{DSM}}_{(2)}$ is inside the trust region, ν = 0 is chosen, as is the case in Fig. 1.19. The trust region is updated during the iterative procedure, and h is therefore different in the two steps.<br />
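The choice of ν described above can be sketched as follows. We read "Hessian eigenvalues" as the eigenvalues ω of H in the M metric ($\mathbf{Hx} = \omega\mathbf{Mx}$), which are exactly the values of ν that make $\mathbf{H} - \nu\mathbf{M}$ singular; below $\min(0, \omega_1)$ the step length grows monotonically with ν, so bisection finds the crossing with h. The function name, the bisection and the neglect of the so-called hard case (no gradient component along the lowest eigenvector) are our assumptions, not details taken from the thesis.

```python
import numpy as np

def trust_region_step(H, g, M, h, n_bisect=200):
    """Step dc solving (H - nu M) dc = -g with ||dc||_M <= h (Eqs. (1.61)-(1.66))."""
    Linv = np.linalg.inv(np.linalg.cholesky(M))        # M = L L^T
    omega, U = np.linalg.eigh(Linv @ H @ Linv.T)       # eigenvalues of H in the M metric
    a = U.T @ (Linv @ g)                               # gradient in that eigenbasis

    def step(nu):
        x = -a / (omega - nu)                          # (H - nu M) dc = -g, diagonalized
        return Linv.T @ (U @ x), np.linalg.norm(x)     # dc and ||dc||_M

    if omega[0] > 0:
        dc, length = step(0.0)
        if length <= h:
            return dc, 0.0                             # Newton step lies inside: nu = 0
    nu_hi = min(0.0, omega[0])                         # ||dc(nu)||_M increases towards nu_hi
    width = 1.0
    while step(nu_hi - width)[1] > h:                  # move left until the step is short enough
        width *= 2.0
    nu_lo = nu_hi - width
    for _ in range(n_bisect):                          # bisect for ||dc(nu)||_M = h
        nu = 0.5 * (nu_lo + nu_hi)
        if step(nu)[1] > h:
            nu_hi = nu
        else:
            nu_lo = nu
    return step(nu_lo)[0], nu_lo

# Hypothetical test problem: indefinite H, positive definite metric M
rng = np.random.default_rng(8)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
H = Q @ np.diag([-1.0, 0.5, 1.0, 2.0]) @ Q.T
B = rng.normal(size=(4, 4))
M = np.eye(4) + 0.3 * B @ B.T / 4
g = rng.normal(size=4)

dc, nu = trust_region_step(H, g, M, h=0.3)
print(nu, np.sqrt(dc @ M @ dc))     # nu below the lowest eigenvalue, step length h
```

Keeping ν below the lowest eigenvalue makes $\mathbf{H} - \nu\mathbf{M}$ positive definite, so the step always points downhill on the quadratic model.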


Fig. 1.18 The step length $\|\Delta\mathbf{c}\|_S$ as a function of the multiplier ν in the third DSM step (trust radius h = 0.34; ν from −5 to 7.5).<br />

Fig. 1.19 The step length $\|\Delta\mathbf{c}\|_S$ as a function of the multiplier ν in the fourth DSM step (trust radius h = 0.44; ν from −5 to 7.5).<br />

Each of the trust region steps requires the construction of the gradient g and the Hessian H in the density subspace, and the solution of the level-shifted Newton equations, Eq. (1.66). Since $E^{\mathrm{DSM}}$ is a local model of the true energy function $E_{\mathrm{SCF}}$, it resembles $E_{\mathrm{SCF}}$ only in a small region about the initial point $\mathbf{c}^{(0)}$. The DSM iterations are therefore terminated if the total step length after p iterations, $\|\mathbf{c}^{(p)} - \mathbf{c}^{(0)}\|_S$, exceeds some preset value k. If a minimum of $E^{\mathrm{DSM}}$ is found inside the trust region $\|\mathbf{c}^{(p)} - \mathbf{c}^{(0)}\|_S < k$, the step $\mathbf{c}^* - \mathbf{c}^{(0)}$ to the minimum is taken and the iterations are terminated. This is the typical situation.<br />

When the trust region minimization has terminated, an improved density matrix $\tilde{\mathbf{D}}$ can be constructed. However, to avoid the expensive calculation of the Fock matrix from $\tilde{\mathbf{D}}$, we instead use the averaged density matrix from Eq. (1.45) and exploit the fact that, for Hartree-Fock, the Fock matrix is linear in the density, such that $\mathbf{F}(\bar{\mathbf{D}})$ is simply the averaged Fock matrix of Eq. (1.54). For DFT this is an approximation, but evaluating the correct Kohn-Sham matrix typically gives insignificant improvements. The improved Fock and density matrices then enter the TRRH step as $\mathbf{F}_0$ and $\mathbf{D}_0$, respectively.<br />

By construction, $E^{\mathrm{DSM}}(\mathbf{c})$ is lowered at each iteration of the trust region minimization. Since $E^{\mathrm{DSM}}$ is a local model of the true energy $E_{\mathrm{SCF}}$, the lowering of $E^{\mathrm{DSM}}$ will also lead to a lowering of $E_{\mathrm{SCF}}$, provided the total step is sufficiently short to stay in the local region.<br />

1.4.2.4 Line Search TRDSM<br />

As in the TRRH step, the averaged density matrix $\bar{\mathbf{D}}$ may also be determined by a line search, and we denote this line search algorithm TRDSM-LS. Here, the line search is made in the direction defined by the first step $\mathbf{c}^{(1)}$ of the TRDSM algorithm, that is, the step at the expansion point $\mathbf{D}_0$. As in the TRRH step, such a line search is guaranteed to reduce the energy. The first step is scaled by a parameter α,<br />

$$ \Delta\mathbf{c}^{\mathrm{tot}} = \alpha \cdot \mathbf{c}^{(1)} , \qquad (1.67) $$<br />

and a search is made in $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$ to find the step $\Delta\mathbf{c}^{\mathrm{tot}}$ that leads to the largest decrease in energy.<br />

$E_{\mathrm{SCF}}(\alpha)$ is found by evaluating the averaged density of Eq. (1.45) for the coefficients $\mathbf{c}^{(0)} + \Delta\mathbf{c}^{\mathrm{tot}}$, purifying it as in Eqs. (1.32) and (1.33), and inserting it in the energy expression of Eq. (1.1). Then $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}(\alpha)$ can be found as<br />

$$ \Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}(\alpha) = E_{\mathrm{SCF}}(\alpha) - E_{\mathrm{SCF}}(\mathbf{D}_0) . \qquad (1.68) $$<br />

Figs. 1.20 and 1.21 illustrate the search in α, again for iteration seven of the HF and LDA calculations on the zinc complex in Fig. 1.3. For α = 0, no step is taken and hence no energy decrease is seen. For the marked choice of α, the optimal step length is obtained.<br />

Fig. 1.20 Decrease in HF energy as a function of the step length α (α from 0 to 20).<br />

Fig. 1.21 Decrease in LDA energy as a function of the step length α (α from 0 to 20).<br />

1.4.2.5 The Missing Term<br />

In the construction of the TRDSM energy model, Eq. (1.55), the term of second order in the idempotency correction $\mathbf{D}_\delta$ was neglected from Eq. (1.50), since this term requires a new Fock evaluation $\mathbf{F}(\mathbf{D}_\delta)$, which would increase the cost of the scheme considerably. This section is concerned with this neglected term and with how a part of it can be described without evaluating a new Fock matrix, leading to an improved energy model for TRDSM at little extra cost. The actual effect of this improvement to the energy model is then discussed through a case study. This section is only concerned with Hartree-Fock theory and examples, but the same could be done for DFT, although the improvement should be less significant, since for DFT terms of order $\|\mathbf{D}_+\|^3$ are also neglected. These are of the same size as the neglected term quadratic in $\mathbf{D}_\delta$. These errors are discussed in Section 1.5.2.<br />

Since, for Hartree-Fock, the only term neglected in the DSM energy model Eq. (1.55) is the one quadratic in $\mathbf{D}_\delta$, and since the only term quadratic in the density is $\operatorname{Tr}\mathbf{D}\mathbf{G}(\mathbf{D})$, the HF energy for the density $\tilde{\mathbf{D}}$ can be written as<br />

$$ E_{\mathrm{HF}}(\tilde{\mathbf{D}}) = E(\bar{\mathbf{D}}) + 2\operatorname{Tr}\mathbf{D}_\delta\bar{\mathbf{F}} + \operatorname{Tr}\mathbf{D}_\delta\mathbf{G}(\mathbf{D}_\delta) , \qquad (1.69) $$<br />

where $E(\bar{\mathbf{D}})$ is given in Eq. (1.56). Even though a new Fock matrix $\mathbf{h} + \mathbf{G}(\mathbf{D}_\delta)$ would have to be evaluated to describe the last term exactly, a part of the term can be described in the subspace of the previous densities.<br />

As exploited in the level-shift scheme of Section 1.4.1.5, a density or density difference, in this case $D_\delta$, can be divided into a part $D_\delta^{\parallel}$ that can be described in the subspace of the previous densities and an unknown part $D_\delta^{\perp}$ orthogonal to this space,

$$
D_\delta = D_\delta^{\parallel} + D_\delta^{\perp}. \tag{1.70}
$$

$D_\delta^{\parallel}$ is expanded in the previous densities $D_i$ as

$$
D_\delta^{\parallel} = \sum_{i=0}^{n} \omega_i D_i, \tag{1.71}
$$

where the expansion coefficients $\omega_i$ are determined in a least-squares manner

$$
\omega_i = \sum_{j=0}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\,D_j S D_\delta S, \qquad M_{ij} = \mathrm{Tr}\,D_i S D_j S. \tag{1.72}
$$
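The least-squares expansion in Eqs. (1.71)-(1.72) amounts to a projection onto the span of the stored densities in the inner product $\mathrm{Tr}\,A S B S$. The NumPy sketch below is illustrative only: the function name `subspace_projection` is hypothetical, random symmetric matrices stand in for real density matrices, and explicit traces are used for clarity rather than efficiency.

```python
import numpy as np


def subspace_projection(densities, target, S):
    """Least-squares expansion of `target` in the stored `densities`, Eqs. (1.71)-(1.72).

    Returns the coefficients omega and the in-subspace part sum_i omega_i D_i.
    """
    # Metric M_ij = Tr(D_i S D_j S): a Gram matrix, positive definite for independent D_i
    M = np.array([[np.trace(Di @ S @ Dj @ S) for Dj in densities] for Di in densities])
    # Right-hand side b_j = Tr(D_j S target S)
    b = np.array([np.trace(Dj @ S @ target @ S) for Dj in densities])
    omega = np.linalg.solve(M, b)
    parallel = sum(w * Di for w, Di in zip(omega, densities))
    return omega, parallel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nbas = 5
    A = rng.standard_normal((nbas, nbas))
    S = np.eye(nbas) + 0.05 * A @ A.T  # toy overlap matrix (symmetric positive definite)
    stored = [0.5 * (B + B.T) for B in rng.standard_normal((3, nbas, nbas))]
    B = rng.standard_normal((nbas, nbas))
    d_delta = 0.5 * (B + B.T)
    omega, d_par = subspace_projection(stored, d_delta, S)
    print("omega =", omega)
```

By construction the remainder $D_\delta - D_\delta^{\parallel}$ has zero overlap $\mathrm{Tr}\,D_j S (\cdot) S$ with every stored density, which is what "orthogonal to the space" means here.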

Inserting Eq. (1.70) for $D_\delta$ in Eq. (1.69), an improved DSM energy model can be written

$$
E^{\mathrm{DSM}}_{\mathrm{imp}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\,D_\delta \bar{F} + \mathrm{Tr}\left(2D_\delta - D_\delta^{\parallel}\right) G\!\left(D_\delta^{\parallel}\right), \tag{1.73}
$$

where only previous density and Fock matrices enter. In deriving Eq. (1.73), the relation

$$
\mathrm{Tr}\,A\,G(B) = \mathrm{Tr}\,B\,G(A) \tag{1.74}
$$

for symmetric matrices $A$ and $B$ is used, and the term $\mathrm{Tr}\,D_\delta^{\perp} G(D_\delta^{\perp})$ is neglected. A second-order Taylor expansion of the improved DSM energy can then be made as in Eq. (1.59), and a trust-region minimization carried out.
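The step from Eq. (1.69) to Eq. (1.73) uses only the bilinearity of $\mathrm{Tr}\,A\,G(B)$ and the symmetry in Eq. (1.74): the exact quadratic term equals the improved-model term plus the neglected $\mathrm{Tr}\,D_\delta^{\perp} G(D_\delta^{\perp})$. A quick numerical check makes this bookkeeping explicit. The operator built by `make_toy_G` is a hypothetical stand-in (a random symmetric superoperator acting on the flattened matrix), not a real two-electron operator, and the split into parallel and perpendicular parts is arbitrary here, since the identity holds for any symmetric split.

```python
import numpy as np


def make_toy_G(nbas, seed=0):
    """Hypothetical linear operator G(D) with Tr A G(B) = Tr B G(A) for symmetric A, B."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((nbas * nbas, nbas * nbas))
    G_super = R + R.T  # symmetric superoperator acting on vec(D)
    return lambda D: (G_super @ D.reshape(-1)).reshape(nbas, nbas)


def quadratic_terms(d_delta, d_par, G):
    """Return (exact, improved-model, neglected) pieces of Tr D_delta G(D_delta)."""
    d_perp = d_delta - d_par
    exact = np.trace(d_delta @ G(d_delta))
    improved = np.trace((2.0 * d_delta - d_par) @ G(d_par))  # last term of Eq. (1.73)
    neglected = np.trace(d_perp @ G(d_perp))  # term dropped in Eq. (1.73)
    return exact, improved, neglected
```

If $D_\delta$ lies entirely in the subspace, the neglected term vanishes and the improved model reproduces the quadratic term exactly.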

To study the improvement to the energy function, two TRSCF calculations are carried out on the cadmium complex in Fig. 1.6 in the STO-3G basis with an H1-core start guess. Their convergence profiles are displayed in Fig. 1.22. The calculation denoted "Improved TRDSM" is a TRSCF calculation identical to the one denoted "TRSCF", except that the improved energy model of Eq. (1.73) is used for TRDSM instead of the one in Eq. (1.55). To illustrate the impact of the improvement in a single TRDSM step, a line search like the one in Fig. 1.20 is made in iteration 7 of the TRSCF calculation in Fig. 1.22. In addition to the change in SCF energy as a function of the step length α, the DSM energy of Eq. (1.55) and the improved DSM energy of Eq. (1.73) are evaluated for the different choices of α, and their energy changes are found as well.

Fig. 1.22 Convergence (error in energy / $E_h$ versus iteration) for the cadmium complex in Fig. 1.6, both for TRSCF with no improvements ("TRSCF") and for TRSCF where $E^{\mathrm{DSM}}_{\mathrm{imp}}$ is used in TRDSM ("Improved TRDSM").

Fig. 1.23 TRDSM line search for iteration 7 in the TRSCF optimization of Fig. 1.22. For different α in Eq. (1.67), the changes $\Delta E^{\mathrm{DSM}}_{\mathrm{HF}}$, $\Delta E^{\mathrm{DSM}}$ and $\Delta E^{\mathrm{DSM}}_{\mathrm{imp}}$ (in $E_h$) of $E^{\mathrm{DSM}}_{\mathrm{HF}}$, $E^{\mathrm{DSM}}$ and $E^{\mathrm{DSM}}_{\mathrm{imp}}$ relative to $E_{\mathrm{HF}}(D_0)$ are shown.

Fig. 1.23 shows that the improved DSM energy describes the HF energy better than the standard DSM energy does, as expected. As the step moves away from the expansion point, the part of the energy that cannot be described in the old densities grows, and both DSM energy models become poor.

The improvement presented in this section adds complexity to the TRDSM algorithm, although its computational cost is insignificant. As seen in Fig. 1.22 and Fig. 1.23, the resulting improvements to the TRSCF calculation are minor, and the overall gain does not justify the extra complexity added to the TRDSM algorithm.

1.4.3 Energy Minimization Exploiting the Density Subspace

Section 1.3.1 describes the different approaches that have been taken to avoid the diagonalization in the Roothaan-Hall step. The standard diagonalization of the Fock matrix may be replaced to improve the convergence properties, the scaling of the algorithm, or both. With the aim of improving both, this section presents a newly developed scheme in which an energy minimization replaces the standard diagonalization in the SCF optimization.

When the RH energy model is minimized, the density subspace information used with great success in TRDSM is ignored. The novel idea is therefore to exploit the valuable information stored in the subspace of the previous densities to construct an improved RH energy model, and to minimize this model instead of the RH model. This makes the TRDSM step redundant, since a density subspace minimization is now included in the minimization of the RH energy model.

The Hessian update methods (refs. 40, 53), in which an approximate Hessian is updated in each iteration and an approximate Newton step is taken, exploit some of the same ideas. However, they are all based on approximate second-order energy expansions in the orbital rotation parameters and therefore do not include the third- and higher-order terms contained in the RH energy.

The following subsections describe the improved RH energy model and its minimization. The SCF convergence for a test case is then shown, where the new energy minimization approach is compared with standard DIIS and the TRSCF schemes. As the scheme has not yet been extended to DFT, this section considers only HF theory and calculations.

1.4.3.1 The Augmented RH Energy Model

If the Hartree-Fock energy, Eq. (1.1), is expanded through second order around some reference density $D_0$,

$$
E_{\mathrm{HF}}(D) = E_{\mathrm{HF}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0) + \mathrm{Tr}\,(D - D_0)\,G(D - D_0), \tag{1.75}
$$

the first two terms are recognized as $E_{\mathrm{RH}}(D)$ from Eq. (1.22) plus the zeroth-order terms $E_{\mathrm{HF}}(D_0)$ and $-E_{\mathrm{RH}}(D_0)$:

$$
E_{\mathrm{HF}}(D) = E_{\mathrm{RH}}(D) + E_{\mathrm{HF}}(D_0) - E_{\mathrm{RH}}(D_0) + \mathrm{Tr}\,(D - D_0)\,G(D - D_0). \tag{1.76}
$$

In a standard RH step, the energy function to minimize is the RH energy, which neglects the last term. This term contains the Hessian information and is too expensive to evaluate. Since Hessian information is very valuable in an optimization, the scheme presented in this section replaces the diagonalization in the RH step by a minimization of an augmented RH (ARH) energy model, in which as much Hessian information as possible is included without directly evaluating new Fock matrices. This is done by exploiting the information contained in the density and Fock matrices of the previous iterations.

As before, a density or density difference, in this case $\Delta = D - D_0$, can be split into a part $\Delta^{\parallel}$ that can be described in the subspace of the $n + 1$ previous densities and an unknown part $\Delta^{\perp}$ orthogonal to this space,

$$
D - D_0 = \Delta = \Delta^{\parallel} + \Delta^{\perp}. \tag{1.77}
$$

$\Delta^{\parallel}$ is expanded in the previous densities $D_i$ as

$$
\Delta^{\parallel} = \sum_{i=0}^{n} \omega_i D_i, \tag{1.78}
$$

where $n$ is the number of previously stored densities and the expansion coefficients $\omega_i$ are determined in a least-squares manner

$$
\omega_i = \sum_{j=0}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\,D_j S \Delta S, \qquad M_{ij} = \mathrm{Tr}\,D_i S D_j S. \tag{1.79}
$$

Inserting Eq. (1.77) in the last term of Eq. (1.76) and neglecting the term $\mathrm{Tr}\,\Delta^{\perp} G(\Delta^{\perp})$, the augmented Roothaan-Hall energy model can be written as

$$
E_{\mathrm{ARH}}(D) = E_{\mathrm{RH}}(D) + E_{\mathrm{HF}}(D_0) - E_{\mathrm{RH}}(D_0) + \mathrm{Tr}\left(2\Delta - \Delta^{\parallel}\right) G\!\left(\Delta^{\parallel}\right), \tag{1.80}
$$

where $G(\Delta^{\parallel})$ is evaluated as a linear combination of previous Fock matrices

$$
G\!\left(\Delta^{\parallel}\right) = \sum_{i=1}^{n} \omega_i\, G(D_i) = \sum_{i=1}^{n} \omega_i \left(F(D_i) - h\right). \tag{1.81}
$$

The energy model $E_{\mathrm{ARH}}$ has no intrinsic restrictions on how different the densities spanning the subspace may be, which is one of its benefits compared to the TRSCF scheme. In the TRDSM energy model, the purification implicit in the DSM energy makes no sense if the densities are too different, in particular if they have different electron configurations. In ARH, configuration shifts can be handled without problems, and whereas old, obsolete densities pollute the DSM energy model, they simply disappear from the ARH energy model, since their weights $\omega_i$ diminish.

We expect faster convergence for ARH than for TRSCF, mainly because the RH and DSM steps are merged into one energy model with the correct gradient (not just in the subspace) and an approximate Hessian that is improved in each iteration using information from the previous density and Fock matrices.
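A toy model shows what Eqs. (1.76) and (1.80)-(1.81) buy: if the step $\Delta$ lies entirely in the subspace, $E_{\mathrm{ARH}}$ reproduces the HF energy exactly, and otherwise the two differ by exactly the neglected $\mathrm{Tr}\,\Delta^{\perp}G(\Delta^{\perp})$. The sketch assumes a closed-shell HF-like energy $2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G(D)$ with a hypothetical random linear $G$ (no nuclear term) and random symmetric matrices in place of densities; the function names are illustrative. Only differences of $E_{\mathrm{RH}}$ enter, so it is taken as $2\,\mathrm{Tr}\,F_0 D$.

```python
import numpy as np


def s_coefficients(stored, delta, S):
    """Least-squares coefficients of Eq. (1.79) (S-metric projection onto the stored densities)."""
    M = np.array([[np.trace(Di @ S @ Dj @ S) for Dj in stored] for Di in stored])
    b = np.array([np.trace(Dj @ S @ delta @ S) for Dj in stored])
    return np.linalg.solve(M, b)


def arh_energy(D, D0, F0, h, stored, stored_fock, S, e_hf_d0):
    """Augmented RH energy of Eq. (1.80) for a toy model with F(D) = h + G(D).

    stored_fock[i] is F(D_i); G(Delta_par) is assembled from these via Eq. (1.81),
    so no new Fock matrix is built for Delta_par.
    """
    delta = D - D0
    omega = s_coefficients(stored, delta, S)
    delta_par = sum(w * Di for w, Di in zip(omega, stored))
    g_par = sum(w * (Fi - h) for w, Fi in zip(omega, stored_fock))  # Eq. (1.81)
    e_rh = lambda X: 2.0 * np.trace(F0 @ X)  # only E_RH differences matter
    return e_rh(D) + e_hf_d0 - e_rh(D0) + np.trace((2.0 * delta - delta_par) @ g_par)
```

In a real implementation the stored Fock matrices are already available from earlier iterations, which is why Eq. (1.81) costs no new Fock build.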

1.4.3.2 The Augmented RH Optimization

The density for which the ARH energy model should be optimized can be expanded in the antisymmetric matrix $X$,

$$
D(X) = \exp(-XS)\, D_0^{(i)} \exp(SX) = D_0^{(i)} + \left[D_0^{(i)}, X\right]_S + \tfrac{1}{2}\left[\left[D_0^{(i)}, X\right]_S, X\right]_S + \cdots, \tag{1.82}
$$

where $D_0^{(i)}$ is the reference density from which the step $X$ is taken and $[A, B]_S = ASB - BSA$. Optimizing the ARH energy is thus a nonlinear problem, and an iterative scheme must be applied.
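A small NumPy check of Eq. (1.82): for an antisymmetric $X$, the transformation keeps $D$ symmetric, idempotent in the $S$ metric ($DSD = D$) and with unchanged $\mathrm{Tr}\,DS$, while the expansion truncated after the double commutator has an error that shrinks by roughly a factor of 8 when $X$ is halved. The helper names and the hand-written scaling-and-squaring exponential are illustrative assumptions; `scipy.linalg.expm` would be the usual choice.

```python
import numpy as np


def expm_taylor(A, terms=30):
    """exp(A) by scaling and squaring with a truncated Taylor series (adequate for small matrices)."""
    norm = np.linalg.norm(A, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_scaled = A / 2.0 ** squarings
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A_scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def comm_s(A, B, S):
    """S-commutator [A, B]_S = A S B - B S A."""
    return A @ S @ B - B @ S @ A


def rotate_density(D0, X, S):
    """D(X) = exp(-XS) D0 exp(SX), Eq. (1.82)."""
    return expm_taylor(-X @ S) @ D0 @ expm_taylor(S @ X)


def bch_second_order(D0, X, S):
    """Eq. (1.82) truncated after the double commutator."""
    first = comm_s(D0, X, S)
    return D0 + first + 0.5 * comm_s(first, X, S)
```

The cubic error of the truncated form is one reason the ARH energy has to be optimized iteratively rather than in a single quadratic step.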

A Newton-Raphson (NR) optimization of the ARH energy is therefore carried out, where each step is found by minimizing a second-order approximation $E^{\mathrm{ARH}}_{(2)}$ of the ARH energy with the preconditioned conjugate gradient (PCG) method. Excluding constant terms, the second-order approximation of the ARH energy can be written as

$$
\begin{aligned}
E^{\mathrm{ARH}}_{(2)}(X) ={}& 2\,\mathrm{Tr}\,F_0 \left[D_0^{(i)}, X\right]_S + \mathrm{Tr}\,F_0 \left[\left[D_0^{(i)}, X\right]_S, X\right]_S \\
&+ 2\,\mathrm{Tr}\left(D_0^{(i)} - D_0\right) \sum_{i=1}^{n} \left(\omega_i^{(1)} + \omega_i^{(2)}\right) G(D_i) \\
&+ 2\,\mathrm{Tr}\left[D_0^{(i)}, X\right]_S \sum_{i=1}^{n} \left(\omega_i^{(0)} + \omega_i^{(1)}\right) G(D_i) + \mathrm{Tr}\left[\left[D_0^{(i)}, X\right]_S, X\right]_S \sum_{i=1}^{n} \omega_i^{(0)} G(D_i) \\
&- \sum_{i,j=1}^{n} \left[2\,\omega_j^{(0)}\left(\omega_i^{(1)} + \omega_i^{(2)}\right) + \omega_i^{(1)} \omega_j^{(1)}\right] \mathrm{Tr}\,D_i\, G(D_j),
\end{aligned} \tag{1.83}
$$

where

$$
\begin{aligned}
\omega_i^{(0)} &= \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\,D_j S \left(D_0^{(i)} - D_0\right) S, \\
\omega_i^{(1)} &= \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\,D_j S \left[D_0^{(i)}, X\right]_S S, \\
\omega_i^{(2)} &= \tfrac{1}{2} \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\,D_j S \left[\left[D_0^{(i)}, X\right]_S, X\right]_S S.
\end{aligned} \tag{1.84}
$$

If the summations are arranged in the most favorable way, the number of matrix multiplications is limited and independent of the subspace size. Only the update of the metric $M$ requires a number of matrix multiplications that grows linearly with the subspace size.

From the derivative $\partial E^{\mathrm{ARH}}_{(2)} / \partial X$, the problem to be solved by PCG is set up for the current reference density $D_0^{(i)}$, where $i$ denotes the Newton-Raphson step number. Throughout the NR optimization, $D_0$ and $F_0$ are the density and Fock matrices from the previous SCF iteration. The NR step $X$ found by PCG is used to evaluate a new density from Eq. (1.82). If the new density is similar to the previous one, the Newton-Raphson optimization has converged; if not, the new density is used as the reference density $D_0^{(i)}$ in the next step.

The final density matrix resulting from the NR optimization is then used to evaluate a new Fock matrix, and in this way the SCF iterative procedure is established. The SCF scheme for the described algorithm is illustrated in Fig. 1.24.
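Each Newton-Raphson step in Fig. 1.24 requires the solution of a linear system, which is solved with preconditioned conjugate gradients. The generic PCG loop below is a sketch, not the thesis implementation: `apply_hessian` is a hypothetical callback standing in for the Hessian-vector product of $E^{\mathrm{ARH}}_{(2)}$ (not derived here), and the diagonal (Jacobi) preconditioner is an assumption, since the text notes that a good preconditioner remains to be developed.

```python
import numpy as np


def pcg(apply_hessian, rhs, precond_diag, tol=1e-10, max_iter=200):
    """Solve H x = rhs for symmetric positive definite H with a diagonal preconditioner."""
    x = np.zeros_like(rhs)
    r = rhs - apply_hessian(x)  # residual
    z = r / precond_diag  # preconditioned residual
    p = z.copy()  # search direction
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Hp = apply_hessian(p)
        alpha = rz / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        z = r / precond_diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

For a Newton step one would call `pcg(apply_hessian, -gradient, diag)`; passing the Hessian as a callback means only Hessian-vector products are needed, which is what makes this route attractive compared with building the Hessian explicitly.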

[Flow diagram: start from $D_0^{(0)}$ and evaluate $F(D_n^{(0)})$; minimize $E^{\mathrm{ARH}}_{(2)}(X)$ around $D_n^{(i)}$ by PCG to obtain $D_n^{(i+1)}(X)$; if $D_n^{(i+1)} \not\approx D_n^{(i)}$, set $i = i + 1$ and repeat the minimization; otherwise set $D_{n+1}^{(0)} = D_n^{(i+1)}$; if $D_{n+1}^{(0)} \not\approx D_n^{(0)}$, set $n = n + 1$ and evaluate a new Fock matrix; otherwise the converged density $D_{\mathrm{conv}}$ is obtained.]

Fig. 1.24 Flow diagram of the SCF optimization with the diagonalization of the Fock matrix replaced by a minimization of the ARH energy. The light blue box encloses the Newton-Raphson optimization of $E_{\mathrm{ARH}}$.

1.4.3.3 Applications

SCF calculations have been carried out using the ARH scheme. Fig. 1.25 shows the convergence of HF/STO-3G calculations on CrC with a bond distance of 2.00 Å. Results are given for the augmented RH scheme, for DIIS, and for TRSCF with the C-shift and $d_{\mathrm{orth}}$-shift schemes, respectively. In the first iterations of the ARH optimization, a limit is put on the norm $\lVert X\rVert_S$ to avoid density changes that go beyond the region well described by the energy model.

The ARH scheme is clearly superior for this test case, even compared with the convergence improvements for TRSCF obtained with the $d_{\mathrm{orth}}$-shift scheme; in the local region, ARH is almost one iteration ahead of ‘TRSCF/$d_{\mathrm{orth}}$-shift’. The standard DIIS approach does not converge at all for this case.

Fig. 1.25 HF/STO-3G calculations on CrC using different approaches: DIIS, TRSCF with the standard C-shift, TRSCF with the $d_{\mathrm{orth}}$-shift, and ARH (error in energy / $E_h$ versus iteration).

Fig. 1.26 Details from the ARH optimization in Fig. 1.25: the part of the density change that can be described in the subspace of the previous densities (versus iteration).

To illustrate how information is gradually obtained from the previous densities in ARH, the part $\Delta D^{\parallel}$ of the density change $\Delta D = D_{n+1} - D_n$ in each iteration that can be described in the previous densities is found as in Eqs. (1.78)-(1.79), and the ratio $\lVert \Delta D^{\parallel} \rVert_S / \lVert \Delta D \rVert_S$ is depicted in Fig. 1.26. The description of $\Delta D$ improves during the first five iterations until a significant part of the Hessian is described; then a qualified step is taken to another region, and the new density is therefore not well described in the previous densities. This step is followed by a decrease of two orders of magnitude in the SCF energy error. The same pattern is repeated after two additional iterations.
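The ratio plotted in Fig. 1.26 can be computed directly from the stored densities. The sketch below assumes the norm $\lVert A\rVert_S = (\mathrm{Tr}\,ASAS)^{1/2}$, which matches the metric of Eq. (1.79) but is an assumption about the exact norm used; the function names are hypothetical. Because the least-squares fit is an orthogonal projection in this inner product, the ratio always lies between 0 and 1.

```python
import numpy as np


def s_norm(A, S):
    """||A||_S = sqrt(Tr A S A S), assumed here to be the norm used for Fig. 1.26."""
    return np.sqrt(np.trace(A @ S @ A @ S))


def described_fraction(stored, dD, S):
    """Ratio ||dD_par||_S / ||dD||_S with dD_par from the least-squares fit of Eqs. (1.78)-(1.79)."""
    M = np.array([[np.trace(Di @ S @ Dj @ S) for Dj in stored] for Di in stored])
    b = np.array([np.trace(Dj @ S @ dD @ S) for Dj in stored])
    omega = np.linalg.solve(M, b)
    dD_par = sum(w * Di for w, Di in zip(omega, stored))
    return s_norm(dD_par, S) / s_norm(dD, S)
```

A value near 1 means the step is almost entirely predictable from earlier iterations; the dips in Fig. 1.26 correspond to steps into regions the subspace does not yet cover.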

Even though only preliminary results are given in this section, the ARH energy minimization seems promising: it takes the best of the RH and DSM energy models and improves the convergence compared to TRSCF, which already showed convergence rates as good as or better than DIIS. It could be expected that this scheme has the ability to converge in the fewest SCF iterations overall. The future success of ARH depends on the development of effective ways of solving the nonlinear equations in $X$, e.g., by setting up a good preconditioner.

1.5 The Quality of the Energy Models for HF and DFT

Having considered the theory behind the TRRH and TRDSM steps in Sections 1.4.1 and 1.4.2 without being concerned with the approximations introduced in the energy functions, this section takes a closer look at the errors in the energy models compared to the SCF energy. The SCF optimization of Hartree-Fock and Kohn-Sham DFT energies is similar; the only difference lies in the energy expressions to be optimized. The approximations in the energy models will therefore also differ between HF and DFT. While Section 1.2 described HF and DFT theory in a generic manner, this section focuses on the differences and ignores the general elements already stated in Section 1.2.

To make the differences between the HF and DFT energy expressions clear, we now study them separately:

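$$
E_{\mathrm{HF}} = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G_{\mathrm{HF}}(D) + h_{\mathrm{nuc}}, \tag{1.85}
$$

$$
E_{\mathrm{DFT}} = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G_{\mathrm{DFT}}(D) + h_{\mathrm{nuc}} + E_{\mathrm{XC}}(D), \tag{1.86}
$$

where

$$
\left[G_{\mathrm{HF}}(D)\right]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{1.87}
$$

$$
\left[G_{\mathrm{DFT}}(D)\right]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}. \tag{1.88}
$$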
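Eqs. (1.87)-(1.88) translate directly into two tensor contractions, which a short `numpy.einsum` sketch makes concrete. The dense four-index array `g` in chemists' notation is an assumption for illustration (production codes never store it like this); `gamma = 1` gives $G_{\mathrm{HF}}$, `gamma = 0` a pure functional, and a hybrid uses its exact-exchange fraction (e.g. 0.2 for B3LYP).

```python
import numpy as np


def two_electron_matrix(g, D, gamma=1.0):
    """[G(D)]_{mu nu} = 2 sum g_{mu nu rho sigma} D_{rho sigma} - gamma sum g_{mu sigma rho nu} D_{rho sigma}.

    gamma = 1 gives G_HF of Eq. (1.87); other values give G_DFT of Eq. (1.88).
    """
    coulomb = np.einsum("mnrs,rs->mn", g, D)  # Coulomb-type contraction
    exchange = np.einsum("msrn,rs->mn", g, D)  # exchange-type contraction
    return 2.0 * coulomb - gamma * exchange
```

With the eightfold permutational symmetry of real integrals, $G(D)$ is symmetric for symmetric $D$ and satisfies the relation $\mathrm{Tr}\,A\,G(B) = \mathrm{Tr}\,B\,G(A)$ used in Eq. (1.74), for any value of $\gamma$.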

The second term in Eqs. (1.87) and (1.88) is the contribution from exact exchange, with $\gamma = 0$ in pure DFT (e.g., LDA) and $\gamma \neq 0$ in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(D)$ in Eq. (1.86) is a functional of the electronic density. In the local-density approximation (LDA), the exchange-correlation energy is local in the density, whereas in the generalized gradient approximation (GGA) it also depends locally on the squared density gradient, and may thus be expressed as

$$
E_{\mathrm{XC}}(D) = \int f\big(\rho(x), \zeta(x)\big)\, dx. \tag{1.89}
$$

Here the electron density $\rho(x)$ and its squared gradient norm $\zeta(x)$ are given by

$$
\rho(x) = \chi^{T}(x)\, D\, \chi(x), \qquad \zeta(x) = \nabla\rho(x) \cdot \nabla\rho(x), \tag{1.90}
$$

where $\chi(x)$ is a column vector containing the AOs. Note that the exchange-correlation energy density $f(\rho(x), \zeta(x))$ in Eq. (1.89) is a nonlinear (and non-quadratic) function of $\rho(x)$ and $\zeta(x)$. In the following, we rely on an expansion of $E_{\mathrm{XC}}(D)$ around some reference density matrix $D_0$,

$$
E_{\mathrm{XC}}(D) = E_{\mathrm{XC}}(D_0) + (D - D_0)^{T} E^{(1)}_{\mathrm{XC}} + \tfrac{1}{2} (D - D_0)^{T} E^{(2)}_{\mathrm{XC}} (D - D_0) + \cdots, \tag{1.91}
$$

where the derivatives $E^{(n)}_{\mathrm{XC}}$ are evaluated at $D = D_0$, and where for convenience a vector-matrix notation is used for $D$, $E^{(1)}_{\mathrm{XC}}$ and $E^{(2)}_{\mathrm{XC}}$. The precise form of $E_{\mathrm{XC}}$ depends on the DFT functional chosen for the calculation.
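The practical consequence of a non-quadratic $f$ in Eq. (1.89) is that truncating Eq. (1.91) after second order leaves a third-order error, whereas a quadratic energy (like the HF two-electron term) is reproduced exactly. A scalar sketch shows this for the Dirac (LDA) exchange energy density $f(\rho) = -C_x \rho^{4/3}$ with $C_x = \tfrac{3}{4}(3/\pi)^{1/3}$, chosen here only as an illustrative $f$; the helper names are hypothetical and the quadratic comparison function is arbitrary.

```python
import math

C_X = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)  # Dirac exchange constant (spin-unpolarized)


def f_lda_x(rho):
    return -C_X * rho ** (4.0 / 3.0)


def df_lda_x(rho):
    return -(4.0 / 3.0) * C_X * rho ** (1.0 / 3.0)


def d2f_lda_x(rho):
    return -(4.0 / 9.0) * C_X * rho ** (-2.0 / 3.0)


def second_order_error(f, df, d2f, rho0, d):
    """Error of the second-order Taylor model of f around rho0 for a step d."""
    model = f(rho0) + df(rho0) * d + 0.5 * d2f(rho0) * d * d
    return f(rho0 + d) - model


if __name__ == "__main__":
    for d in (0.02, 0.01, 0.005):
        print(d, second_order_error(f_lda_x, df_lda_x, d2f_lda_x, 0.5, d))
```

Halving the step reduces the LDA error by about a factor of 8 (third-order behavior), while for a quadratic function the error is zero up to rounding.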

Convergence is often more problematic to obtain for DFT than for HF, mainly for two reasons. First, the HOMO-LUMO gap $\Delta\varepsilon_{ai}$ is smaller for DFT than for HF, and a determinant with well-separated occupied and virtual parts has better convergence properties than one with many close-lying states (refs. 54, 55). Second, since the exchange-correlation energy is nonlinear and non-quadratic in the density, the higher-order terms in the density, which are not present in Hartree-Fock theory, introduce additional approximations into the SCF scheme for DFT. In this section, these differences and their consequences for the convergence properties are discussed for the TRSCF algorithm. It is assumed that if the energy models employed in TRSCF were of the same quality for HF and DFT, that is, had errors of the same order relative to the true SCF energy, then the convergence properties would also be of the same quality.

The study is mainly performed in the MO basis with a block-diagonal Fock matrix as in Eq. (1.10) and the reference density matrix

$$
D_0^{\mathrm{MO}} = \begin{pmatrix} 2\delta_{ij} & 0 \\ 0 & 0 \end{pmatrix}. \tag{1.92}
$$

It is also exploited that any valid density matrix $D$ may be expressed in terms of a valid reference density matrix $D_0$ as

$$
D^{\mathrm{MO}}(K) = \exp(-K)\, D_0^{\mathrm{MO}} \exp(K), \tag{1.93}
$$

and can thus be expanded in orders of $K$ through the BCH expansion (ref. 46)

$$
D^{\mathrm{MO}}(K) = D_0^{\mathrm{MO}} + \left[D_0^{\mathrm{MO}}, K\right] + \tfrac{1}{2}\left[\left[D_0^{\mathrm{MO}}, K\right], K\right] + O(K^3). \tag{1.94}
$$

The antisymmetric rotation matrix may be written in the form

$$
K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}, \tag{1.95}
$$

where $\kappa$ holds the orbital rotation parameters. The diagonal blocks, representing rotations among the occupied MOs and among the virtual MOs, are zero since the density matrix in Eq. (1.8) is invariant to such rotations.
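Eqs. (1.92)-(1.95) can be checked numerically. The sketch below builds $K$ from a hypothetical random $\kappa$ block, verifies that $D(K)$ remains a valid closed-shell density ($D^2 = 2D$, trace $2N_{\mathrm{occ}}$), and shows that a purely occupied-occupied rotation leaves $D_0^{\mathrm{MO}}$ unchanged, which is why the diagonal blocks of $K$ may be set to zero. The truncated Taylor series for the exponential is a simplification adequate only for the modest rotations used here.

```python
import numpy as np


def expm_series(A, terms=40):
    """Truncated Taylor series for exp(A); adequate only for the modest ||A|| used here."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result


def rotation_matrix(kappa):
    """Antisymmetric K of Eq. (1.95) built from the virtual-occupied block kappa (nvir x nocc)."""
    nvir, nocc = kappa.shape
    K = np.zeros((nocc + nvir, nocc + nvir))
    K[:nocc, nocc:] = -kappa.T
    K[nocc:, :nocc] = kappa
    return K


def rotate_mo_density(D0, K):
    """D(K) = exp(-K) D0 exp(K), Eq. (1.93)."""
    return expm_series(-K) @ D0 @ expm_series(K)
```

Because $K$ is antisymmetric, $\exp(K)$ is orthogonal, so the eigenvalues 2 and 0 of $D_0^{\mathrm{MO}}$ are preserved by every rotation.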

In the following subsections, the RH energy model, Eq. (1.22), and the DSM energy model, Eq. (1.55), are analyzed separately with respect to the differences between HF and DFT.

1.5.1 The Quality of the TRRH Energy Model

To compare the RH energy model to the SCF energy, both are expanded about a reference density matrix $D_0$ (neglecting the possible difference between $F_0$ and $F(D_0)$ noted in Section 1.4):

$$
E_{\mathrm{RH}}(D) = E_{\mathrm{RH}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0), \tag{1.96}
$$

$$
\begin{aligned}
E_{\mathrm{SCF}}(D) ={}& E_{\mathrm{SCF}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0) + \mathrm{Tr}\,(D - D_0)\,G(D - D_0) \\
&+ E_{\mathrm{XC}}(D) - E_{\mathrm{XC}}(D_0) - \mathrm{Tr}\,(D - D_0)\,E^{(1)}_{\mathrm{XC}}(D_0),
\end{aligned} \tag{1.97}
$$

where the last three terms of Eq. (1.97) are present only in DFT. The two expansions have the same first-order term $2\,\mathrm{Tr}\,F(D_0)(D - D_0)$ and thus the same first derivative with respect to the orbital rotation parameters $\kappa_{ai}$ of Eq. (1.95):

$$
\left[E^{(1)}_{\mathrm{RH}}\right]_{ai} = \left.\frac{\partial E_{\mathrm{RH}}(\kappa)}{\partial \kappa_{ai}}\right|_{\kappa = 0} = -4F_{ai}, \tag{1.98}
$$

$$
\left[E^{(1)}_{\mathrm{SCF}}\right]_{ai} = \left.\frac{\partial E_{\mathrm{SCF}}(\kappa)}{\partial \kappa_{ai}}\right|_{\kappa = 0} = -4F_{ai}. \tag{1.99}
$$

The expressions are found by replacing $D$ in Eqs. (1.96) and (1.97) with $D^{\mathrm{MO}}$ from Eq. (1.94) and differentiating with respect to $\kappa_{ai}$.

Consequently, all higher-order terms in $\kappa$ arising from $2\,\mathrm{Tr}\,F(D_0)(D - D_0)$ are also shared by the SCF and RH energies, whereas terms of second and higher order arising from the last term(s) in Eq. (1.97) are neglected in the RH energy model. To study the differences, the second derivatives in $\kappa$ are found in the same way as the first derivatives:

$$
\left[E^{(2)}_{\mathrm{RH}}\right]_{aibj} = \left.\frac{\partial^2 E_{\mathrm{RH}}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}}\right|_{\kappa = 0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i), \tag{1.100}
$$

$$
\left[E^{(2)}_{\mathrm{SCF}}\right]_{aibj} = \left.\frac{\partial^2 E_{\mathrm{SCF}}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}}\right|_{\kappa = 0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i) + W_{aibj}, \tag{1.101}
$$

where

$$
W^{\mathrm{HF}}_{aibj} = 16\, g_{aibj} - 4\left(g_{abij} + g_{ajib}\right), \tag{1.102}
$$

$$
W^{\mathrm{DFT}}_{aibj} = 16\, g_{aibj} - 4\gamma\left(g_{abij} + g_{ajib}\right) + \left[E^{(2)}_{\mathrm{XC}}(\kappa)\right]_{aibj}. \tag{1.103}
$$

$E^{(2)}_{\mathrm{XC}}(\kappa)$ is the second derivative of the term $E_{\mathrm{XC}}$ expanded in the orbital rotation parameters $\kappa$. The error in the RH energy model can then be said to depend partly on the size of $W$ and partly on the size of the third- and higher-order contributions from the nonlinear terms in Eq. (1.97) that are not included in Eq. (1.96). This general consideration applies to DFT as well as HF, but with different impact. As seen in Eqs. (1.102) and (1.103), the definition of $W$ differs between the two approaches and even depends on which DFT functional is chosen. Furthermore, since the HOMO-LUMO gap $\Delta\varepsilon_{ai} = \varepsilon_a - \varepsilon_i$ is typically smaller in DFT, the term $4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i)$ carries a different weight in Eq. (1.101) depending on the method. The third- and higher-order contributions in Eq. (1.97) would also be expected to differ between HF and DFT, since for DFT both $\mathrm{Tr}\,(D - D_0)\,G(D - D_0)$ and $E_{\mathrm{XC}}(D)$ contribute, whereas HF contains only the $\mathrm{Tr}\,(D - D_0)\,G(D - D_0)$ term. At the beginning of the optimization, where large steps are taken, the third- and higher-order contributions are the potential source of error. Near convergence this should be less of an issue, and in this region the size of the lowest Hessian eigenvalues should be the decisive source of error.

HF and LDA calculations have been carried out, and in each iteration the part of the SCF energy change arising from the RH step, $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$, has been found, as well as the change in the RH energy model, $\Delta E_{\mathrm{RH}}$.

Fig. 1.27 The ratio $\Delta E_{\mathrm{RH}} / \Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ versus iteration for HF and LDA calculations on the cadmium complex in Fig. 1.6 in the STO-3G basis set.

Fig. 1.28 The ratio $\Delta E_{\mathrm{RH}} / \Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ versus iteration for HF and LDA calculations on the zinc complex in Fig. 1.3 in the 6-31G basis set.

The change in the RH energy model is found as

$$
\Delta E_{\mathrm{RH}} = 2\,\mathrm{Tr}\,F\left(D_{n+1} - D_0^{\mathrm{idem}}\right), \tag{1.104}
$$

where $D_0^{\mathrm{idem}}$ is the reference density matrix, typically a $\bar{D}$ from the previous TRDSM step purified as in Eqs. (1.32)-(1.33), and $D_{n+1}$ is the new density found from diagonalization of the Fock matrix. In the C-shift scheme, the criterion in Eq. (1.31) ensures that the occupied and virtual orbitals do not mix; the Hessian, Eq. (1.100), is thus positive and the RH energy decreases. The SCF energy change is found as

$$
\Delta E^{\mathrm{RH}}_{\mathrm{SCF}} = E_{\mathrm{SCF}}(D_{n+1}) - E_{\mathrm{SCF}}\left(D_0^{\mathrm{idem}}\right). \tag{1.105}
$$

The ratio between Eq. (1.104) and Eq. (1.105) contains information about the quality of the RH energy model. If the errors are negligible, the ratio is close to 1. If the ratio is larger than 1, the RH energy model exaggerates the energy decrease, and if it is between 0 and 1, the model underestimates the decrease. If it is negative, the SCF energy increases even though the RH energy model predicts a decrease.
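The interpretation of the ratio can be written down as a small helper, which also makes the sign convention explicit: both energy changes are negative when the model works. The function name and the tolerance for "close to 1" are illustrative assumptions, not values from the text.

```python
def rh_model_quality(delta_e_rh, delta_e_scf, tol=0.05):
    """Interpret dE_RH / dE_SCF^RH following the discussion of Eqs. (1.104)-(1.105).

    Assumes delta_e_rh < 0 (the C-shift criterion guarantees a predicted decrease).
    The tolerance for "close to 1" is an illustrative choice, not taken from the text.
    """
    if delta_e_rh >= 0:
        raise ValueError("expected a predicted RH energy decrease (delta_e_rh < 0)")
    ratio = delta_e_rh / delta_e_scf
    if ratio < 0:
        verdict = "SCF energy increased despite predicted decrease"
    elif abs(ratio - 1.0) <= tol:
        verdict = "accurate"
    elif ratio > 1.0:
        verdict = "overestimates decrease"
    else:
        verdict = "underestimates decrease"
    return ratio, verdict
```

For example, a predicted decrease of 2 mE_h with an actual decrease of 1 mE_h gives a ratio of 2, i.e. the model exaggerates the decrease.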

For two test cases, the ratio $\Delta E_{\mathrm{RH}} / \Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is displayed in Fig. 1.27 and Fig. 1.28, respectively. The RH energy model is clearly better in general for HF than for DFT; in particular, negative values are seen for the LDA ratios. The errors in the RH energy model for the LDA calculations get worse as convergence is approached, so the significant source of error would be expected to be the neglected term $W$ in the Hessian rather than the higher-order terms. Since the lowest Hessian eigenvalue should control the optimization locally, this hypothesis is examined by evaluating the lowest Hessian eigenvalue for both the RH energy model and the SCF energy according to Eq. (1.100) and Eq. (1.101), respectively, at convergence of the two test cases. The results are compared in Table 1-4.

Table 1-4 The lowest Hessian eigenvalues for the RH energy model and SCF energy at convergence of the calculations in Fig. 1.27 and Fig. 1.28. The deviation is found as $100\% \cdot \left(\left[E^{(2)}_{\mathrm{RH}}\right]_{\min} - \left[E^{(2)}_{\mathrm{SCF}}\right]_{\min}\right) / \left[E^{(2)}_{\mathrm{SCF}}\right]_{\min}$.

| | cadmium complex, HF | cadmium complex, LDA | zinc complex, HF | zinc complex, LDA |
|---|---|---|---|---|
| $\left[E^{(2)}_{\mathrm{SCF}}\right]_{\min}$ | 0.557 | 0.017 | 1.000 | 0.290 |
| $\left[E^{(2)}_{\mathrm{RH}}\right]_{\min}$ | 1.112 | 0.014 | 1.621 | 0.281 |
| Deviation | 100% | -21% | 62% | -2% |
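The deviation defined in the caption of Table 1-4 is simple arithmetic, so it can be recomputed from the tabulated eigenvalues. Doing so reproduces the HF columns (100% and 62%), while the small LDA eigenvalues give about -18% and -3% instead of the printed -21% and -2%; presumably the printed deviations were computed from unrounded eigenvalues. The dictionary layout below is only an illustrative container for the table values.

```python
# Lowest Hessian eigenvalues from Table 1-4, stored as (SCF, RH)
TABLE_1_4 = {
    ("cadmium", "HF"): (0.557, 1.112),
    ("cadmium", "LDA"): (0.017, 0.014),
    ("zinc", "HF"): (1.000, 1.621),
    ("zinc", "LDA"): (0.290, 0.281),
}


def deviation_percent(rh_min, scf_min):
    """100% * (RH - SCF) / SCF, the deviation defined in the caption of Table 1-4."""
    return 100.0 * (rh_min - scf_min) / scf_min


if __name__ == "__main__":
    for key, (scf, rh) in TABLE_1_4.items():
        print(key, round(deviation_percent(rh, scf)))
```

With three-decimal inputs of order 0.01, a rounding change in the last digit alone shifts the LDA cadmium deviation by several percentage points, so only the sign and rough size of those entries are meaningful.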

As expected, the lowest Hessian eigenvalue for the RH energy model, which is determined by the HOMO-LUMO gap, is much smaller for LDA than for HF. Surprisingly, however, the Hessian prediction of the RH energy model is much better for LDA than for HF. Of course, this is only the lowest eigenvalue, and the corresponding eigenvector has not been studied. We do know that the size of the orbital rotation parameters $\kappa_{ai}$ decreases during the optimization and should be very small at convergence, where only small adjustments to the density are made. It is thus difficult to imagine that terms of third and higher order in $\kappa$ are the reason for the larger errors in the RH energy model for LDA compared to HF.

This matter is not understood at present and will be investigated further. The importance of the higher-order terms should be examined directly to understand how they affect the errors, and the Hessian should be studied more carefully, including information about the eigenvectors. Nevertheless, it can be concluded from Fig. 1.27 and Fig. 1.28 that the RH energy model is poorer for LDA than for HF optimizations.

1.5.2 The Quality of the TRDSM Energy Model

The TRDSM energy model of Section 1.4.2.2 is formulated in a general manner and is as applicable to DFT as to HF. Still, the model will be poorer for DFT than for HF because of the general exchange-correlation term in the DFT energy.

For the DSM energy model there are, in general, four possible sources of error:

1. The purified density $\tilde{D}$ still has an idempotency error.
2. The term $\tfrac{1}{2} D_\delta^{T} E_0^{[2]} D_\delta$ in $E(\tilde{D})$, Eq. (1.50), is neglected.
3. $E(\bar{D})$, Eq. (1.50), is truncated after second order.
4. $E_0^{(2)} D_+$ in Eq. (1.50) is approximated by $2F_+$.

Let us take a closer look at the errors one by one. Ref. 39 gives a general order analysis of the purified density $\tilde{D}$ used in the parameterization of the DSM energy; the results are summarized in Table 1-5.

Table 1-5. Comparison of the properties of the unpurified density $\bar{D}$ and the purified density $\tilde{D}$. $c$ denotes the density expansion coefficients and $\kappa$ the orbital rotation parameters that change $D_0$ into another density $D_i$ in the subspace.

| | $\bar{D}$ | $\tilde{D}$ |
|---|---|---|
| Differences | $D_+ = \bar{D} - D_0 = O(c\lVert\kappa\rVert)$ | $D_\delta = \tilde{D} - \bar{D} = O(c\lVert\kappa\rVert^2)$ |
| Idempotency error | $\bar{D}S\bar{D} - \bar{D} = O(c\lVert\kappa\rVert^2)$ | $\tilde{D}S\tilde{D} - \tilde{D} = O(c^2\lVert\kappa\rVert^4)$ |
| Trace error | $\mathrm{Tr}\,\bar{D}S - N/2 = 0$ | $\mathrm{Tr}\,\tilde{D}S - N/2 = O(c^2\lVert\kappa\rVert^4)$ |

In the $\tilde{D}$ column, the order of the idempotency correction $D_\delta$ and of the idempotency error of $\tilde{D}$ are given. These are the same for DFT and HF: the idempotency error is of order $c^2\|\kappa\|^4$, and since $D_\delta$ is of order $c\|\kappa\|^2$, the error from neglecting the term that is second order in $D_\delta$ is of order $c^2\|\kappa\|^4$ as well.
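To make the order argument concrete, the Python sketch below assumes that the purification of Eqs. (1.32) and (1.33) is the McWeeny form $\tilde{D} = 3\bar{D}S\bar{D} - 2\bar{D}S\bar{D}S\bar{D}$ and works in an orthonormal basis ($S = I$), where the purification acts on each eigenvalue $\lambda$ of $\bar{D}$ separately. It shows numerically that one purification turns an idempotency error of order $\epsilon$ into one of order $3\epsilon^2$, i.e. it roughly squares the error, which is the mechanism behind the step from $O(c\|\kappa\|^2)$ to $O(c^2\|\kappa\|^4)$ in Table 1-5. The function names are illustrative only.

```python
def mcweeny(lam):
    """McWeeny purification polynomial 3x^2 - 2x^3 applied to one eigenvalue (assumed form)."""
    return 3.0 * lam ** 2 - 2.0 * lam ** 3


def idempotency_error(lam):
    """For S = I: eigenvalue of D^2 - D belonging to the eigenvalue lam of D."""
    return lam ** 2 - lam


def purification_errors(eps):
    """Idempotency error before and after one purification of the eigenvalue 1 - eps."""
    lam = 1.0 - eps  # slightly non-idempotent "occupied" eigenvalue
    return idempotency_error(lam), idempotency_error(mcweeny(lam))


for eps in (1e-2, 1e-3, 1e-4):
    before, after = purification_errors(eps)
    # after / before**2 approaches -3: the error is squared (times a constant)
    print(f"eps={eps:.0e}  before={before:.2e}  after={after:.2e}  "
          f"after/before**2={after / before ** 2:.3f}")
```

The same squaring holds for eigenvalues close to zero, since $f(\epsilon) = 3\epsilon^2 - 2\epsilon^3$.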

The third possible source of error is the truncation of the energy $E(\tilde{D})$ after second order in the density. Since the Hartree-Fock energy is quadratic in the density, this truncation introduces no error for HF, but for DFT there is an error of order $\|D_+\|^3$. Since $D_+$ is of order $c\|\kappa\|$ (first column of Table 1-5), this is an error of order $c^3\|\kappa\|^3$. Also, since the HF energy is quadratic in the density, its third derivative $E_0^{(3)}$ vanishes, so the Taylor expansion used to obtain $E_0^{(2)} D_+ = 2F_+$ terminates for HF, whereas for DFT terms of order $\|D_+\|^2$ are neglected. Since $E_0^{(2)} D_+$ is multiplied by $D_+$ in the energy function, Eq. (1.50), this gives an error for DFT of order $\|D_+\|^3$, or, as before, $c^3\|\kappa\|^3$. The sizes of the introduced errors are summarized in Table 1-6.

Table 1-6. Comparison of the errors introduced in the DSM energy model for HF and DFT, respectively.

Source of error | Error in HF | Error in DFT
1. Idempotency error, $\tilde{D}S\tilde{D} - \tilde{D}$ | $O(c^2\kappa^4)$ | $O(c^2\kappa^4)$
2. Neglected term, $\frac{1}{2} D_\delta^{T} E_0^{[2]} D_\delta$ | $O(c^2\kappa^4)$ | $O(c^2\kappa^4)$
3. Truncation of $E(\tilde{D})$ | 0 | $O(c^3\kappa^3)$
4. Approximation of $E_0^{(2)} D_+$ | 0 | $O(c^3\kappa^3)$

Depending on the relative sizes of $c$ and $\|\kappa\|$, the additional DFT error, $O(c^3\|\kappa\|^3)$, may thus be of the same order as, or of lower order in $\|\kappa\|$ than, the $O(c^2\|\kappa\|^4)$ error shared with HF. To examine whether the DSM energy is a poorer model for DFT than for HF, a number of calculations have been carried out, and the sizes of $\|D_\delta\|$ and $\|D_+\|$ for the DSM step in each iteration have been examined. Since $D_\delta$ is of order $c\|\kappa\|^2$ and $D_+$ is of order $c\|\kappa\|$, the sizes of $\|D_\delta\|^2$ and $\|D_+\|^3$ indicate whether the error in the energy model is controlled by the $O(c^2\|\kappa\|^4)$ or the $O(c^3\|\kappa\|^3)$ error. All test cases showed similar behavior; results from HF and LDA calculations on the cadmium complex in Fig. 1.6 with an STO-3G basis and an H1-core start guess are displayed in Fig. 1.29 and Fig. 1.30.
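Ignoring the unknown prefactors of the order estimates, the comparison can be made explicit. The ratio of the DFT-specific error to the error common to HF and DFT, and its proxy in terms of the two monitored norms, are (a heuristic order comparison, not a bound):

$$
\frac{O(c^3\|\kappa\|^3)}{O(c^2\|\kappa\|^4)} \sim \frac{c}{\|\kappa\|},
\qquad
\frac{\|D_+\|^3}{\|D_\delta\|^2} \sim \frac{(c\|\kappa\|)^3}{(c\|\kappa\|^2)^2} = \frac{c}{\|\kappa\|}.
$$

A ratio $\|D_+\|^3/\|D_\delta\|^2 \ll 1$ therefore points to the $O(c^2\|\kappa\|^4)$ error as the controlling one, and a ratio $\gg 1$ to the $O(c^3\|\kappa\|^3)$ error.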

[Figure: logarithmic plots versus iteration of $\|D_+\|_S^4$, $\|D_+\|_S^3$, $2\|D_\delta\|_S^2$ (HF panel) or $\|D_\delta\|_S^2$ (LDA panel), and the DSM energy-model error $E_{\mathrm{SCF}}^{\mathrm{DSM}} - E^{\mathrm{DSM}}$.]

Fig. 1.29 HF/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model.

Fig. 1.30 LDA/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model.

The SCF energy at the end of a DSM step, $E_{\mathrm{SCF}}^{\mathrm{DSM}}$, is found by purifying the resulting $\bar{D}$ by Eqs. (1.32) and (1.33) and evaluating the SCF energy, Eq. (1.1), for this density. The DSM energy $E^{\mathrm{DSM}}$, Eq. (1.55), is also evaluated, and the error of the DSM energy model is then found as $E_{\mathrm{SCF}}^{\mathrm{DSM}} - E^{\mathrm{DSM}}$.

For the HF calculation this error is expected to be of the size of $\|D_\delta\|^2$, and Fig. 1.29 shows that this is indeed the case: $2\|D_\delta\|^2$ fits the error remarkably well. It is also seen that, if the error in the DSM energy for HF were expressed in terms of the density differences $D_+$, it would scale as the third rather than the fourth power of $\|D_+\|$. For the DFT calculation, the interesting point was whether $\|D_+\|^3$ is the controlling error. Fig. 1.30 shows that, although the fit is not as obvious as for HF, $\|D_\delta\|^2$ appears to be the dominant error here as well. Expressed in terms of $D_+$, the error again scales as the third rather than the fourth power, as expected for DFT.

In conclusion, the dominant error in the DSM energy for both HF and DFT appears to be $\|D_\delta\|^2$, that is, the square of the idempotency correction. For comparison, the EDIIS model 37 by Kudin, Scuseria, and Cancès corresponds to $E(\bar{D})$ in Eq. (1.55) and thus has an error of order $\|D_\delta\|$ relative to the SCF energy.

1.6 Convergence for Problems with Several Stationary Points<br />

The HF equation is nonlinear and can therefore in principle have several solutions. Several minima may exist, and even though the global minimum is typically preferred, no optimization method can guarantee finding it. Furthermore, without knowledge of the whole energy surface, it cannot be tested whether a minimum found is a local or the global minimum. Depending on the start guess and the optimization approach, an optimization can converge to different stationary points. Further, it is necessary to decide in which subspace of orbital rotations the desired solution should be found, since a solution representing a stable stationary point in one subspace is not necessarily stable in another.

Orbital rotations can be divided into real and complex rotations, and each of these can be further divided into singlet and triplet rotations. Each of those can again be divided into rotations within the different point-group symmetries. Generally, we do not consider complex rotations, and we only optimize in the real space. Further, when optimizing a closed-shell wave function, only the totally symmetric part of the singlet rotations is considered. A stationary point in the subspace of real, totally symmetric, singlet rotations can be shown through elementary arguments to be a stationary point for all types of rotations. However, a stationary point can be a maximum, a saddle point or a minimum. One way to determine whether the stationary point is also a minimum is to evaluate the Hessian eigenvalues within the subspace in which the solution should be stable. If a negative Hessian eigenvalue is found in the subspace of singlet rotations, the stationary point is said to have a singlet instability, and if a negative Hessian eigenvalue is found in the subspace of triplet rotations, it is said to have a triplet instability 54,56. Triplet instabilities are connected to breaking the symmetry between α and β orbitals. If a triplet instability is found, a minimum with a lower energy than the current stationary point can be found if the α and β parts are allowed to differ, typically leading to a solution that is not an eigenfunction of $\hat{S}^2$. Hence, the lower minimum could be found by an unrestricted HF (UHF) optimization. A singlet instability found in the totally symmetric subspace indicates that the current stationary point is a saddle point and that a minimum with lower energy exists within the subspace. If a singlet instability is found outside the totally symmetric subspace, orbitals of different symmetries should be mixed to decrease the energy further, changing the symmetry of the resulting wave function.

The aufbau ordering rule assumes that occupying the orbitals of lowest energy also leads to the lowest Hartree-Fock energy. Unlike for UHF 57, this cannot be proven to hold in general for restricted HF. Thus, when the aufbau ordering is imposed on an optimization, there is a risk that a lower energy exists for a non-aufbau occupation. However, in a study by Dardenne et al. 58 in which different ordering schemes were tested, the minimum was found to be an aufbau solution in all cases; the aufbau ordering was broken only for saddle points. In our schemes we always apply the aufbau ordering rule, but if the RH step is level shifted to the end of the optimization, this can force convergence to a non-aufbau solution.


1.6.1 Walking Away from Unstable Stationary Points<br />

As concluded in the previous section, the Hessian eigenvalues should be tested to make sure that the optimized state is stable. This is expensive, so it is only done when the problem is expected to have several stationary points. Depending on the desired solution, only the relevant part of the Hessian is checked. So far we have only considered singlet instabilities, but tests for triplet instabilities are currently being implemented as well.

The check for singlet instabilities is made on the converged wave function by finding the lowest eigenvalue of the Hessian in the real singlet subspace. If the lowest Hessian eigenvalue is positive, the solution is stable with respect to singlet rotations; if it is negative, we are at a saddle point, and a minimum with a lower energy exists within the subspace. In our SCF program we have implemented the possibility to test the singlet Hessian and, in case of a negative lowest eigenvalue, to follow the corresponding direction downhill and away from the saddle point. The scheme and some examples of its use are described in the following.

1.6.1.1 Theory<br />

When the SCF optimization has converged, the optimized orbitals, described by their expansion coefficients $C_{\mathrm{opt}}$, are used to evaluate the lowest Hessian eigenvalues and the corresponding eigenvectors by an iterative subspace method. If the lowest Hessian eigenvalue $\varepsilon_{\min}$ is positive, the optimization has converged to a minimum. If, on the other hand, the eigenvalue is negative, we know for sure that a lower stationary point exists.
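The iterative subspace method itself is not specified here. As a much simpler stand-in, the Python sketch below finds the lowest eigenvalue of a small symmetric model Hessian by shifted power iteration, which, like subspace methods, needs only matrix-vector products. The shift $\sigma$ is a Gershgorin upper bound, so the dominant eigenvector of $\sigma I - H$ is the lowest eigenvector of $H$. The function names, the start vector and the tolerance in `stability` are hypothetical choices, and power iteration converges far more slowly than a Davidson-type method when eigenvalues lie close together.

```python
import math


def matvec(H, v):
    """H v; in an SCF code this would be a Hessian-vector (linear transformation) product."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def lowest_hessian_eigenpair(H, iterations=500):
    """Lowest eigenvalue/eigenvector of symmetric H by shifted power iteration."""
    n = len(H)
    sigma = max(sum(abs(h) for h in row) for row in H)  # Gershgorin upper bound
    v = [1.0 + 0.1 * i for i in range(n)]               # hypothetical start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        Hv = matvec(H, v)
        w = [sigma * x - y for x, y in zip(v, Hv)]      # (sigma I - H) v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                                  # v already an eigenvector of sigma*I - H
            break
        v = [x / norm for x in w]
    eps_min = sum(x * y for x, y in zip(v, matvec(H, v)))  # Rayleigh quotient
    return eps_min, v


def stability(eps_min, tol=1e-8):
    """Classify the stationary point from the sign of the lowest Hessian eigenvalue."""
    if eps_min > tol:
        return "minimum"
    if eps_min < -tol:
        return "saddle point"
    return "inconclusive"


H = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, -0.5]]  # toy Hessian with one negative mode
eps_min, x = lowest_hessian_eigenpair(H)
print(eps_min, stability(eps_min))
```

For a saddle point, the returned eigenvector plays the role of the downhill direction $x$ used in Eq. (1.106).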

We would then like to take a step downhill in the direction $x$ corresponding to the negative eigenvalue $\varepsilon_{\min}$,

$$E_{\mathrm{SCF}}^{(2)}\, x = \varepsilon_{\min}\, x. \tag{1.106}$$

This can be accomplished by a unitary transformation of the optimized expansion coefficients $C_{\mathrm{opt}}$, with $x$ as the orbital rotation parameters defining the direction $X_{\mathrm{dir}}$ of the step,

$$X_{\mathrm{dir}} = \begin{pmatrix} 0 & -x_{ai}^{T} \\ x_{ai} & 0 \end{pmatrix}. \tag{1.107}$$

The step length is controlled by a parameter $\alpha$:

$$U_\alpha = \exp(-\alpha X_{\mathrm{dir}}), \tag{1.108}$$

$$C'_{\mathrm{opt}}(\alpha) = C_{\mathrm{opt}}\, U_\alpha. \tag{1.109}$$

A line search is then carried out for $\alpha > 0$ to find the lowest SCF energy in the direction $X_{\mathrm{dir}}$. This is of course expensive, since every point in the line search requires an evaluation of the Fock matrix for the new coefficients $C'_{\mathrm{opt}}$. When the SCF energy minimum in the direction $X_{\mathrm{dir}}$ has been found, the corresponding coefficients are used as the initial orbitals for a new SCF optimization, which will hopefully proceed further downhill to a minimum. In problematic cases, e.g. with a very flat saddle point close to the minimum, we have found it convenient to continue the optimization with the line search scheme TRSCF-LS (the combination of TRRH-LS and TRDSM-LS described in Sections 1.4.1.4 and 1.4.2.4) to ensure a continued decrease in the energy.
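The procedure of Eqs. (1.106) to (1.109) can be sketched for the smallest possible case, one occupied and one virtual orbital, where $X_{\mathrm{dir}}$ is a 2×2 antisymmetric matrix. The Python sketch below is a toy: the energy is a hypothetical model $c^T h c$ of the occupied orbital rather than an SCF energy, and the fixed-step scan in $\alpha$ is an assumed line-search strategy (no particular strategy is specified here). Each call to the energy function stands in for one Fock-matrix evaluation. The matrix exponential uses a truncated Taylor series with scaling and squaring, which is adequate only for small, well-scaled matrices.

```python
import math


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(A, terms=20, squarings=6):
    """exp(A) by a truncated Taylor series with scaling and squaring."""
    n = len(A)
    scale = 2.0 ** squarings
    As = [[a / scale for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k for t in row] for row in matmul(term, As)]  # As^k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def rotated_orbitals(C_opt, X_dir, alpha):
    """C'_opt(alpha) = C_opt exp(-alpha X_dir), cf. Eqs. (1.108)-(1.109)."""
    return matmul(C_opt, expm([[-alpha * x for x in row] for row in X_dir]))


def downhill_line_search(C_opt, X_dir, energy, step=0.1, max_steps=200):
    """Increase alpha in fixed steps while the energy keeps decreasing (assumed strategy)."""
    best_alpha, best_energy = 0.0, energy(C_opt)
    for k in range(1, max_steps + 1):
        alpha = k * step
        e = energy(rotated_orbitals(C_opt, X_dir, alpha))  # one "Fock build"
        if e >= best_energy:
            break
        best_alpha, best_energy = alpha, e
    return best_alpha, best_energy


H_TOY = [[1.0, 0.0], [0.0, -1.0]]  # hypothetical one-electron matrix


def toy_energy(C):
    """Toy energy c^T h c of the occupied orbital (first column of C)."""
    c = [row[0] for row in C]
    return sum(c[i] * H_TOY[i][j] * c[j] for i in range(2) for j in range(2))


C_opt = [[1.0, 0.0], [0.0, 1.0]]  # occupied orbital = upper eigenvector of H_TOY
x = 1.0                           # downhill eigenvector (single occupied-virtual pair)
X_dir = [[0.0, -x], [x, 0.0]]     # Eq. (1.107) for the 2x2 case
alpha, e = downhill_line_search(C_opt, X_dir, toy_energy)
print(alpha, e)
```

Here the toy energy along the path is $\cos 2\alpha$, so the scan stops just past $\alpha = \pi/2$, where the occupied orbital has rotated onto the lower eigenvector; the resulting orbitals would then serve as the start guess for a new SCF optimization.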

1.6.1.2 Examples<br />

In Fig. 1.31 and Fig. 1.32 two examples of problems with several stationary points are given.<br />

[Figure: error in energy ($E_h$, logarithmic scale) versus iteration for TRSCF with d_orth-shift, TRSCF with C-shift, and the line search.]

Fig. 1.31 HF calculations on the rhodium complex.

[Figure: error in energy ($E_h$, logarithmic scale) versus iteration for TRSCF and the line search; the plateaus at three stationary points are marked (1), (2) and (3).]

Fig. 1.32 HF/STO-3G calculation on CrC.

The first example is an HF optimization on the rhodium complex shown in Fig. 1.33, using the AhlrichsVDZ basis 59 combined with STO-3G on rhodium. For this example DIIS diverges, but the TRSCF scheme with C-shift converges smoothly in 38 iterations. However, when the Hessian is inspected, the lowest eigenvalue is found to be negative, and a search in $\alpha$ is carried out in the direction corresponding to the negative eigenvalue. This is illustrated by the line-search curve in Fig. 1.31. Since each evaluation of a step length $\alpha$ requires an evaluation of the Fock matrix, it is fair to display each line-search step as an iteration on the SCF iteration scale. When a minimum is found in this direction, the corresponding orbitals are used as a start guess for a new TRSCF optimization, which now converges smoothly to a new and lower stationary point that is found to be a minimum. When the d_orth-shift scheme is applied in the TRRH steps instead of the C-shift scheme, convergence to the minimum is obtained without problems, as seen from Fig. 1.31. This illustrates that the stationary point found in an SCF optimization depends not only on the start guess but also on the optimization procedure.

Fig. 1.33 Rhodium complex.


The second example is an HF/STO-3G optimization of CrC with a bond distance of 2.00 Å. The example is also used in Fig. 1.13 and Fig. 1.25, but there without discussing the stability of the converged state. Also in this case DIIS diverges, whereas TRSCF converges smoothly in 12-13 iterations to a stationary point that is found to have singlet instabilities. As in the first example, a line search is carried out in the downhill direction, and a new TRSCF optimization is started from the resulting orbitals. This time the second optimization has more problems than in the rhodium example, but it finally converges to a minimum. Whereas in the rhodium case only one plateau, corresponding to the saddle point, could be seen, in this case three plateaus can be found, marked by numbers in Fig. 1.32. The first is the saddle point that TRSCF converges to, at $E_{\mathrm{SCF}} = -1068.77014939\ E_h$ and with a lowest Hessian eigenvalue of -0.624. TRSCF itself recognizes the second and third stationary points as saddle points and manages to move away from them. If a DIIS optimization is carried out with a Hückel start guess, it converges to the second stationary point, which has $E_{\mathrm{SCF}} = -1069.21761813\ E_h$ and a lowest Hessian eigenvalue of -0.038, again demonstrating that different stationary points can be found depending on the optimization procedure and start guess. It is thus necessary to check the Hessian of the result to know for sure that a minimum has been found; in this case the final minimum has $E_{\mathrm{SCF}} = -1069.30090709\ E_h$ and a lowest Hessian eigenvalue of 0.043. CrC is well known for having a complicated electronic energy surface and has been the subject of several theoretical studies 60.

The scheme for testing for singlet instabilities and walking away from unstable stationary points could be integrated more efficiently into the optimization than is done here. As seen from Fig. 1.31 and Fig. 1.32, the optimizations are completely converged before the Hessian check is made, so many iterations are spent refining an unwanted result. The check could be made at an earlier stage, saving a number of iterations. The steps taken in the line search could also be optimized so that fewer steps are needed to find the minimum. Nevertheless, it is convenient to be able to continue an optimization until a minimum is found.

1.7 Scaling<br />

As mentioned in the introduction, it is now possible to apply ab initio quantum chemical methods, in particular HF and DFT, to large molecular systems of interest for biology and nanoscience. This is due both to developments in integral screening and in algorithms for building the Fock matrix, and to approaches that avoid diagonalization and exploit sparsity in the matrices. Since the TRSCF scheme has properties that would be of great advantage for SCF calculations on large and complex molecules, it is crucial that the scheme can be formulated in a linear or near-linear scaling manner. We have not been concerned with building the Fock matrix, and any state-of-the-art linear or near-linear scaling approach could be used as the Fock builder for our scheme. The steps to consider are thus the Roothaan-Hall step TRRH, which evaluates a new density matrix, and the density subspace minimization TRDSM, which improves convergence. The scaling of these steps is discussed in the following subsections.

1.7.1 Scaling of TRRH<br />

The TRRH scheme with C-shift described in Section 1.4.1.2 requires the diagonalization of a level-shifted Fock matrix and knowledge of the occupied molecular orbital coefficients. Like a matrix multiplication, the diagonalization scales as $N^3$, where $N$ is the dimension of the problem, in this case the number of basis functions. However, diagonalization is less efficient and cannot be nearly as well optimized as matrix multiplication, so the prefactor is much larger for diagonalization than for matrix multiplication. Also, matrix multiplication can exploit sparsity and scale linearly in the number of non-zero elements, whereas sparsity is not as easily exploited in diagonalization. Furthermore, the molecular orbitals given by the eigenvectors of the Fock matrix are inherently delocalized, so there is no sparsity to exploit.

To obtain a linear-scaling TRRH step it is thus necessary to avoid diagonalizations completely, together with any reference to the MO basis. This can be done in our SCF program, a local version of DALTON 38,49, by combining the d_orth-shift scheme described in Section 1.4.1.5 with the trace purification (TP) described in Section 1.4.1.6.
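The Python sketch below illustrates how a density matrix can be obtained from a Fock matrix using matrix multiplications only, which is what makes sparsity exploitable. It uses Niklasson's TC2 trace-correcting purification as a stand-in; the TP variant of Section 1.4.1.6 may differ in its polynomials and convergence test. The sketch assumes an orthonormal basis, a Fock matrix that already contains any level shift, Gershgorin estimates of the spectral bounds, and `n_occ` $= N/2$ doubly occupied orbitals; dense lists are used for brevity, and all names are hypothetical.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def spectral_bounds(F):
    """Gershgorin lower and upper bounds on the eigenvalues of symmetric F."""
    n = len(F)
    radii = [sum(abs(F[i][j]) for j in range(n) if j != i) for i in range(n)]
    return (min(F[i][i] - radii[i] for i in range(n)),
            max(F[i][i] + radii[i] for i in range(n)))


def tc2_density(F, n_occ, tol=1e-10, max_iter=100):
    """Idempotent density with trace n_occ from F by TC2 purification (stand-in for TP)."""
    n = len(F)
    lo, hi = spectral_bounds(F)
    # Map the spectrum of F into [0, 1], reversed so that low orbital energies go to ~1.
    X = [[((hi if i == j else 0.0) - F[i][j]) / (hi - lo) for j in range(n)]
         for i in range(n)]
    for iteration in range(1, max_iter + 1):
        X2 = matmul(X, X)
        tr_x, tr_x2 = trace(X), trace(X2)
        if tr_x - tr_x2 < tol:        # Tr(X - X^2) = sum of lam(1 - lam) >= 0
            return X, iteration
        if tr_x > n_occ:              # too many electrons: X^2 lowers the eigenvalues
            X = X2
        else:                         # too few electrons: 2X - X^2 raises them
            X = [[2.0 * a - b for a, b in zip(ra, rb)] for ra, rb in zip(X, X2)]
    return X, max_iter


F = [[0.0, 0.2, 0.0], [0.2, 1.0, 0.0], [0.0, 0.0, -1.0]]  # toy orthonormal-basis Fock matrix
D, n_iter = tc2_density(F, n_occ=2)
print(n_iter, round(trace(D), 8))
```

In a sparse implementation, `matmul` would be replaced by a sparse (e.g. blocked) multiplication; nothing else in the loop refers to the MO basis.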

The trace purification scheme replaces the diagonalization of the level-shifted Fock matrix and makes it possible to exploit sparsity in the matrices. A sparse blocked matrix storage scheme has been implemented for this purpose. In this scheme the rows and columns of the matrices are permuted such that basis functions on close-lying atoms are collected in blocks, making it possible to exploit the locality of the basis functions. Based on a drop tolerance for the size of matrix elements, blocks that are effectively zero can be identified and neglected, saving both storage and computing time. A library has been developed for handling matrix operations on this type of matrix and for controlling the truncation error arising from the neglected elements 49.
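A minimal Python sketch of a blocked sparse storage with a drop tolerance follows. It is hypothetical and is not the interface of the library of ref. 49: it assumes that the basis functions are already ordered so that those on nearby atoms are adjacent, that the matrix dimension is a multiple of the block size, and it measures blocks by their Frobenius norm. Only stored blocks take part in the multiplication, and result blocks below the tolerance are dropped again.

```python
import math


def frobenius(block):
    """Frobenius norm of a dense block stored as a list of rows."""
    return math.sqrt(sum(x * x for row in block for x in row))


def dense_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def to_blocked(A, bs, drop_tol):
    """Keep only the bs x bs blocks of A whose Frobenius norm exceeds drop_tol."""
    n = len(A)
    assert n % bs == 0, "sketch assumes the dimension is a multiple of the block size"
    blocks = {}
    for I in range(n // bs):
        for J in range(n // bs):
            blk = [row[J * bs:(J + 1) * bs] for row in A[I * bs:(I + 1) * bs]]
            if frobenius(blk) > drop_tol:
                blocks[(I, J)] = blk
    return blocks


def blocked_mul(A_blocks, B_blocks, drop_tol):
    """C = A B using stored blocks only; result blocks below drop_tol are discarded."""
    B_by_row = {}
    for (K, J), b in B_blocks.items():
        B_by_row.setdefault(K, []).append((J, b))
    C = {}
    for (I, K), a in A_blocks.items():
        for J, b in B_by_row.get(K, []):
            p = dense_mul(a, b)
            if (I, J) in C:
                C[(I, J)] = [[x + y for x, y in zip(rc, rp)] for rc, rp in zip(C[(I, J)], p)]
            else:
                C[(I, J)] = p
    return {key: blk for key, blk in C.items() if frobenius(blk) > drop_tol}


def to_dense(blocks, n, bs):
    """Expand stored blocks back to a dense matrix (dropped blocks become zero)."""
    A = [[0.0] * n for _ in range(n)]
    for (I, J), blk in blocks.items():
        for i, row in enumerate(blk):
            for j, x in enumerate(row):
                A[I * bs + i][J * bs + j] = x
    return A


# Toy banded matrix: tridiagonal 6x6 with 2x2 blocks, so the two corner blocks are zero.
n, bs = 6, 2
A = [[1.0 if i == j else (0.5 if abs(i - j) == 1 else 0.0) for j in range(n)] for i in range(n)]
A_blocks = to_blocked(A, bs, drop_tol=0.0)
print(len(A_blocks), "of", (n // bs) ** 2, "blocks stored")
```

For large, local systems the number of stored blocks grows linearly with system size, which is what gives the near-linear scaling of the multiplications.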

Calculations have been carried out on glycine chains of different lengths in the 4-31G basis set on a 3.4 GHz Xeon/Nocona machine with EM64T architecture and the MKL BLAS+LAPACK library. Timings were taken in the third iteration of the SCF optimization, measuring the CPU time spent in the TRRH step for full matrices with diagonalization of the level-shifted Fock matrix (Diag./full) and for sparse blocked matrices with the TP scheme (TP/sparse). The results are shown in Fig. 1.34. In both the full and the sparse case the d_orth-shift scheme is applied.


[Figure: CPU time (min) of one TRRH step versus the number of basis functions for Diag./full and TP/sparse.]

Fig. 1.34 Timings of a TRRH step in the case of diagonalization of full matrices (Diag./full) and in the case of trace purification of sparse blocked matrices (TP/sparse).

The crossover occurs already at around 1500 basis functions, and the diagonalization scheme clearly becomes too time-consuming as the number of basis functions increases further. Of course, this is a linear molecule, as seen from Fig. 1.35, and the crossover will occur later for more three-dimensional molecules. The TP method does not scale exactly linearly, because the transformation to the orthogonal basis gives rise to a quadratic term, but the prefactor of the quadratic term is very small. It should be noted that the dynamic level-shift scheme typically requires 5-10 diagonalizations or trace purifications to find the optimal level shift in the first couple of iterations; as the timings are from the third iteration, several diagonalizations or purifications, not just one, are included in the timings in Fig. 1.34. Currently a full trace purification optimization (30-70 purification iterations) is carried out for each level shift tested. It is straightforward to optimize this process so that the purification is converged less tightly for the level shifts that are tested and rejected than for the final optimal level shift.

Fig. 1.35 Glycine chain.<br />

To conclude, the scaling of the TRRH scheme with C-shift is dominated by the diagonalization, and sparsity cannot be exploited. Still, with a good Fock builder it runs efficiently up to a couple of thousand basis functions, but at some point the diagonalizations become too time-consuming. For larger systems the purification scheme combined with the d_orth-shift scheme can be used with blocked sparse matrices, resulting in near-linear scaling.


1.7.2 Scaling of TRDSM<br />

For the density subspace minimization, a set of linear equations, Eq. (1.66), is solved in each DSM step, but only in the dimension of the subspace, which is much smaller than the number of basis functions. Its cost is therefore insignificant compared to the matrix additions and multiplications needed to set up the DSM gradient $g$ and Hessian $H$ for the linear equations. For TRDSM it is thus only the number of matrix multiplications that determines the scaling. Nothing has to be changed to exploit sparsity in the matrices, and linear scaling is obtained automatically once the number of non-zero elements in the matrices scales linearly. For full matrices the scaling is formally $N^3$, where $N$ is the number of basis functions, but, as mentioned in the previous subsection, this is not the problem it is for diagonalization, since matrix multiplications can be carried out at close to peak performance on computers. However, the number of matrix multiplications should be kept to a minimum, as it affects the prefactor.

The number of matrix multiplications depends on the dimension of the subspace, since the number of gradient and Hessian elements grows with the size of the subspace; but even though the Hessian is set up explicitly, the number of matrix multiplications only scales linearly with the dimension of the subspace. The expressions for the DSM gradient and Hessian are found in 0, and it is seen that if only the matrices $FD_i$, $SD_i$, $FD_iS$ and $DSD_i$ are evaluated, all the terms of a Hessian element can be expressed as the trace of a product of two known matrices or their transposes. As the operation $\mathrm{Tr}\,AB$ scales quadratically instead of cubically, the overall scaling of TRDSM is $nN^3$ for full matrices, where $n$ is the dimension of the subspace and $N$ the dimension of the problem. For sparse matrices both the matrix multiplications and $\mathrm{Tr}\,AB$ scale linearly, but since $n^2$ traces are evaluated, the overall scaling is $n^2N$. However, the trace operations have a very small prefactor.
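The point that a Hessian element needs only traces of products of already available matrices can be sketched in Python as follows. $\mathrm{Tr}(AB) = \sum_{ij} A_{ij}B_{ji}$ costs $O(N^2)$ operations, whereas forming $AB$ costs $O(N^3)$. The helper names are hypothetical, and `trace_matrix` only mimics the pattern of $n^2$ trace evaluations over $n$ precomputed matrices; it does not reproduce the actual DSM Hessian expressions.

```python
def trace_of_product(A, B):
    """Tr(AB) = sum_ij A_ij B_ji: O(N^2) work, the product AB is never formed."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def trace_of_product_transposed(A, B):
    """Tr(A B^T) = sum_ij A_ij B_ij, also O(N^2)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def trace_matrix(left, right):
    """M[i][j] = Tr(left[i] right[j]) from n precomputed matrices in each list.

    Hypothetical illustration: n^2 traces at O(N^2) each, no further matrix products.
    """
    return [[trace_of_product(L, R) for R in right] for L in left]


A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 2, 1]]
print(trace_of_product(A, B))  # 16, the same as the trace of the explicit product AB
```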

In the TRSCF scheme with C-shift the diagonalizations are thus the dominating operations. However, since both the TRRH and the TRDSM steps can be carried out without any reference to the MO basis and with matrix multiplications as the most expensive operations, the TRSCF scheme can be made near-linear scaling and has what it takes to be applied to really large molecular systems. Getting all the parts to work together is still work in progress, so unfortunately no large-scale TRSCF calculations appear in this thesis, and no benchmarks in which sparsity in the matrices is exploited for TRDSM can be presented, but the whole framework is in place.

1.8 Applications<br />

In this section, numerical examples are given to illustrate the convergence characteristics of the TRSCF and ARH methods. Comparisons are made with DIIS, the TRSCF-LS method, and the globally convergent trust-region minimization method (GTR) of Francisco et al. 26.


In Section 1.8.1 a set of small molecules used by Francisco et al. to illustrate the convergence characteristics of GTR is considered. Next, in Section 1.8.2, the convergence of calculations on three metal complexes is discussed for the DIIS, TRSCF and TRSCF-LS methods.

1.8.1 Calculations on Small Molecules<br />

As an alternative to the RH diagonalization, Francisco et al. have developed an energy minimization method (GTR), in which an energy model is minimized by a trust-region method. They have proven that it is a globally convergent algorithm, that is, no matter the starting point, the iterates converge towards a stationary point. The best results are obtained when GTR is combined with DIIS, letting DIIS accelerate the convergence. To examine the convergence characteristics of TRSCF and ARH compared to GTR, calculations have been carried out attempting to reproduce the conditions given in the paper by Francisco et al. Thus HF calculations have been carried out with a maximum of 10 previous density matrices in the density subspace minimizations, and convergence is obtained when the difference between two consecutive energies is smaller than $10^{-9}\ E_h$. The results are given in Table 1-7; the numbers in the first four algorithm columns were obtained with our SCF program, whereas the results in the last two columns are copied from the GTR paper.

Table 1-7. Number of iterations in HF calculations performed by each algorithm for some test problems. The geometries of the molecules and the results in the last two columns are taken from the paper by Francisco et al. 26; GTR+DIIS is their globally convergent trust-region algorithm with DIIS acceleration.

Molecule | Basis | Start guess | DIIS | TRSCF C-shift | TRSCF d_orth-shift | ARH | DIIS (ref. 26) | GTR+DIIS (ref. 26)
H₂O | STO-3G | H1-core | 7 | 7 | 7 | 6 | 5 | 5
H₂O | 6-31G | H1-core | 10 | 9 | 8 | 8 | 8 | 8
NH₃ | STO-3G | H1-core | 7 | 8 | 7 | 6 | 7 | 7
NH₃ | 6-31G | H1-core | 9 | 9 | 8 | 8 | 7 | 7
CO | STO-3G | H1-core | 12 | 9 | 9 | 9 | 11 | 10
CO | STO-3G | Hückel | 8 | 8 | 8 | - | 7 | 7
CO(Dist)* | STO-3G | H1-core | 39(a) | 9 | 8 | 8 | 117(b) | 10
CO(Dist)* | STO-3G | Hückel | 35 | 10 | 8 | - | 85 | 15
CO(Dist)* | 6-31G | H1-core | 24(a) | 13 | 10 | 9 | 27(b) | 115
CO(Dist)* | 6-31G | Hückel | 21(a) | 10 | 10 | - | 36(b) | 59
Cr₂ | STO-3G | H1-core | 34(a) | 14(a) | 10(a) | 12(a) | 13 | 38
CrC | STO-3G | H1-core | 29(a) | 13(a) | 11(a) | 10(a) | (X) | 29

* Distorted geometry: bond length doubled compared to CO.
(a) Negative Hessian eigenvalue.
(b) Converged to a higher energy than some of the other algorithms.
(X) No convergence in 5001 iterations.

Let us first consider the results obtained with our SCF program. Comparing the TRSCF results (both C-shift and d_orth-shift) to the DIIS results, it is clear that the TRSCF method is not only an improvement when DIIS cannot converge; also for small, simple examples the convergence of TRSCF is as good as or better than that of DIIS. It is also observed that in five instances DIIS converges to a stationary point that is not a minimum, whereas this happens in only two instances (Cr₂ and CrC) for TRSCF. This suggests that TRSCF has a lower tendency than DIIS to converge to saddle points. Comparing the results obtained for TRSCF with the C-shift and the d_orth-shift schemes, only minor differences are seen for these small examples, but in all cases the d_orth-shift scheme gives faster or similar convergence compared to the C-shift scheme. With the ARH method the convergence is in several cases improved further compared to the TRSCF/d_orth-shift scheme. It is only a matter of saving a single iteration in some of the examples (and for Cr₂ ARH needs two iterations more), but the general tendency is an improvement. As the ARH algorithm is still in the implementation phase, no results can currently be obtained with the Hückel start guess.

Comparing now the results from our SCF program with those from the GTR paper, the obvious peculiarity is the discrepancy between the DIIS results obtained by Francisco et al. and by us. A plain DIIS optimization should be completely reproducible, but the iteration counts differ by up to two iterations out of seven (e.g. 7 versus 5 iterations for H₂O/STO-3G). We cannot explain these differences, and they make it more difficult to compare our results with theirs. Furthermore, they do not appear to have checked the Hessian eigenvalues at convergence; a result is marked in their table only if a lower energy was found with another start guess or optimization method, so we cannot know for sure whether a given number of iterations corresponds to convergence to a minimum. For Cr₂ and CrC it is very difficult to find the minimum, and several saddle points exist to which convergence can be obtained (see Section 1.6). It is thus an open question whether the GTR+DIIS calculations for Cr₂ and CrC actually converge to a minimum or, as for the TRSCF methods, to a saddle point.

In the examples where GTR+DIIS gives an improvement compared to their DIIS results, TRSCF and ARH also give significant improvements compared to our DIIS results. For the distorted CO example, TRSCF and ARH show better convergence than GTR+DIIS (8-13 versus 10-115 iterations), even though the results cannot be compared strictly directly. For all examples TRSCF and ARH converge in 6-14 iterations, whereas GTR+DIIS uses between 5 and 115. However, as discussed in Section 1.4.1.3, DIIS does not perform well when the gradient and the energy are not correlated, as is often the case in the global region when using TRRH, and this could very well be the case for GTR as well. TRRH should be combined with a density subspace minimization method in the energy (e.g. TRDSM), and the same probably applies to GTR. We would thus suggest an implementation of TRDSM in connection with GTR.

In conclusion, the TRSCF and ARH methods have been shown to have very good convergence properties. They generally improve on DIIS, and for the more problematic examples they also improve on GTR+DIIS.


1.8.2 Calculations on Metal Complexes

In ref. 39 and throughout this part of the thesis, three transition-metal complexes have been used as examples, namely the molecules in Figs. 1.3, 1.6 and 1.33. In this section, HF and LDA calculations on these complexes are reported for DIIS, TRSCF and TRSCF-LS. All calculations use an H1-core start guess, and at most 10 matrices define the subspace in the density subspace minimization. This differs from the examples in ref. 39, where the subspace dimension never exceeded eight. Furthermore, the TRSCF calculations in ref. 39 used the C-shift scheme, whereas the calculations reported here use the $d_{\mathrm{orth}}$-shift scheme.

TRSCF-LS is the TRSCF line-search method. It combines the TRRH-LS and TRDSM-LS steps described in Sections 1.4.1.4 and 1.4.2.4 into an expensive but highly robust method that identifies the lowest SCF energy by a line search in each step. The convergence of the optimizations is shown in Fig. 1.36. The basis sets are as follows:

- Cadmium complex: STO-3G.
- Rhodium complex: the AhlrichsVDZ basis set⁵⁹ for all atoms except rhodium, which is described by the STO-3G basis.
- Zinc complex: 6-31G.

The convergence of the TRSCF and TRSCF-LS methods is comparable for all cases in Fig. 1.36, and in general TRSCF converges in fewer iterations than TRSCF-LS. As mentioned, the line-search method TRSCF-LS is much more expensive than TRSCF. The only reason to apply it instead of TRSCF is a very difficult case in which convergence cannot be obtained in any other way.

The convergence behavior of DIIS is somewhat more erratic than that of the TRSCF methods. DIIS makes no use of Hessian information and therefore cannot reliably predict which directions will reduce the total energy. With DIIS, both the HF calculation on the rhodium complex and the LDA calculation on the zinc complex diverge. The erratic behavior is seen mainly in the global region; in the local region, DIIS converges as well as TRSCF.
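The sketch below shows a generic Pulay DIIS extrapolation step. It makes concrete what "no Hessian information" means, but it is not the implementation used in our SCF program.

Assumptions:

- The basis is orthonormal, so the error vector is the commutator $FD - DF$. In the AO basis it would be $FDS - SDF$.
- The function name `diis_extrapolate` is hypothetical.

The coefficients minimize the norm of the combined error vector subject to summing to one. Only gradient-like information enters, so nothing guarantees that the extrapolated Fock matrix lowers the energy.

```python
import numpy as np


def diis_extrapolate(focks, densities):
    """Pulay DIIS step (sketch): combine stored Fock matrices with coefficients
    c_i minimising || sum_i c_i e_i || subject to sum_i c_i = 1.

    Assumes an orthonormal basis, so the error vector is e_i = F_i D_i - D_i F_i.
    """
    errors = [f @ d - d @ f for f, d in zip(focks, densities)]
    n = len(errors)
    # Bordered system [B 1; 1^T 0] [c; lambda] = [0; 1] with B_ij = <e_i, e_j>.
    b = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            b[i, j] = np.vdot(errors[i], errors[j]).real
    b[:n, n] = 1.0
    b[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    coeffs = np.linalg.solve(b, rhs)[:n]
    # Extrapolated Fock matrix: the same linear combination of the stored matrices.
    f_new = sum(c * f for c, f in zip(coeffs, focks))
    return f_new, coeffs
```

Two Fock matrices whose error vectors cancel each other get equal weights, because that combination has zero error norm.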


[Fig. 1.36 plot: error in energy / $E_h$ (logarithmic scale, $10^{-8}$ to $10^{2}$) versus iteration for DIIS, TRSCF and TRSCF-LS; HF in the left column, LDA in the right column; iterations 0 to 20 in panel A and 0 to 40 in panels B and C.]

Fig. 1.36 Convergence of HF and LDA calculations on (A) the cadmium complex from Fig. 1.6, (B) the rhodium complex from Fig. 1.33, and (C) the zinc complex from Fig. 1.3.

For the examples presented in both this and the previous subsection, TRSCF converges as well as or better than DIIS. For problems where DIIS diverges, the TRSCF methods still converge. TRSCF thus appears to have the properties of a good black-box optimization algorithm.


1.9 Conclusion

In this part of the thesis, the trust region SCF (TRSCF) algorithm has been presented as a means of improving SCF convergence compared to methods in common use today, such as DIIS. In the TRSCF method, both the Roothaan-Hall (RH) step and the density-subspace minimization (DSM) step are replaced by optimizations of local models of the Hartree-Fock/Kohn-Sham energy $E_{\mathrm{SCF}}$. These local models have the same gradient as $E_{\mathrm{SCF}}$ but an approximate Hessian. The TRSCF steps are restricted to the trust region of these models, that is, to the region where they approximate $E_{\mathrm{SCF}}$ well. With this restriction, smooth and fast convergence can be obtained.
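The trust-region idea can be illustrated with a generic update rule for the trust radius. The rule compares the energy change predicted by the local model with the actual change.

This is a minimal sketch under stated assumptions:

- The thresholds 0.25 and 0.75, the scaling factors and the cap `h_max` are common textbook choices, not the parameters used in TRSCF.
- The function name `update_trust_radius` is hypothetical.

```python
def update_trust_radius(h, actual_change, predicted_change,
                        low=0.25, high=0.75, shrink=0.5, grow=2.0, h_max=1.0):
    """Generic trust-region radius update (sketch).

    ratio = actual / predicted energy change; both are negative for a
    successful minimisation step. Returns (new_radius, accept_step).
    """
    if predicted_change >= 0.0:
        # The model predicts no decrease: treat the step as a failure.
        return shrink * h, False
    ratio = actual_change / predicted_change
    if ratio < 0.0:
        # The energy went up: reject the step and shrink the trust region.
        return shrink * h, False
    if ratio < low:
        # Poor agreement between model and energy: keep the step, shrink the region.
        return shrink * h, True
    if ratio > high:
        # Good agreement: allow larger steps, capped at h_max.
        return min(grow * h, h_max), True
    return h, True
```

A step whose energy change matches the model prediction enlarges the trust region. A step that raises the energy is rejected and halves it.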

The development of SCF optimization algorithms over the years has been reviewed. The fundamental schemes used in TRSCF to improve convergence have been around for several years: DIIS is in fact a subspace minimization of the gradient norm, and level shifts have been used to improve or force convergence since 1973. However, the level shift has previously been chosen by trial and error as a constant parameter. We instead advocate a dynamic level-shift scheme in which the level shift controls the density change in the RH step. The level shift is optimized in each iteration so that the density may change by up to the trust radius of the RH energy model, hence the name trust region Roothaan-Hall (TRRH) for our RH scheme.

The density subspace minimization has also been improved compared to previous methods. An accurate energy model is constructed in the iterative subspace, with only minor approximations relative to the SCF energy. The trust region minimization of this model therefore corresponds closely to a minimization of $E_{\mathrm{SCF}}$ in the iterative subspace, and each trust region DSM (TRDSM) step decreases the energy. In combination, the TRRH and TRDSM steps make up a successful scheme with a high convergence rate, without compromising the control of the density change in each step.
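A dynamically level-shifted RH step can be sketched as follows. The sketch rests on several assumptions of ours, not the thesis's:

- The basis is orthonormal ($S = 1$).
- The shift is applied to the occupied space by diagonalizing $F - \mu D$. For $S \neq 1$, the analogous matrix would be $F - \mu SDS$.
- The density change is measured by the Frobenius norm $\|D_{\mathrm{new}} - D\|$, rather than by the C-shift or $d_{\mathrm{orth}}$ measures.
- $\mu$ is located by plain bisection, assuming the step length decreases monotonically with $\mu$.
- All function names are hypothetical.

```python
import numpy as np


def density_from_fock(fock, n_occ):
    """Aufbau density: projector onto the n_occ lowest eigenvectors (S = 1 assumed)."""
    _, vecs = np.linalg.eigh(fock)
    c_occ = vecs[:, :n_occ]
    return c_occ @ c_occ.T


def level_shifted_rh_step(fock, dens, n_occ, trust_radius, mu_max=1.0e3, tol=1.0e-8):
    """Return (new density, shift mu) with ||D_new - D|| <= trust_radius (sketch).

    Diagonalizes F - mu*D: a positive mu lowers the current occupied space and
    thereby damps the orbital rotation. mu is located by bisection.
    """
    def step_length(mu):
        return np.linalg.norm(density_from_fock(fock - mu * dens, n_occ) - dens)

    if step_length(0.0) <= trust_radius:
        # The unshifted RH step already lies inside the trust region.
        return density_from_fock(fock, n_occ), 0.0
    lo, hi = 0.0, mu_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if step_length(mid) > trust_radius:
            lo = mid   # step still too long: a larger shift is needed
        else:
            hi = mid   # step inside the trust region: try a smaller shift
    return density_from_fock(fock - hi * dens, n_occ), hi
```

For a two-orbital model with one occupied orbital, the unshifted step has length of about 0.54. A trust radius of 0.2 is then met with a positive shift, and the returned density is still an idempotent projector.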

Compared to refs. 38 and 39, an alternative level-shift scheme for the TRRH step (the $d_{\mathrm{orth}}$-shift) has been presented. Rather than controlling the density change through the overlap of the individual orbitals, it controls the amount of new information added to the density subspace. The $d_{\mathrm{orth}}$-shift scheme therefore contains no reference to the MO basis and can be used together with alternatives to diagonalization. It is also found to give faster convergence. The former level-shift scheme is too restrictive, because it also restricts changes that are already well known from the density subspace.
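One way to quantify "new information added to the density subspace" is to project the density change onto the previous densities and measure only the orthogonal remainder.

The sketch below is an illustration of the idea, not the thesis's definition of $d_{\mathrm{orth}}$, whose exact metric may differ. It assumes:

- The Frobenius inner product in an orthonormal basis.
- A subspace spanned by the differences between the previous densities and the current one.
- A hypothetical function name.

```python
import numpy as np


def orthogonal_step_norm(d_new, d_current, previous_densities):
    """Norm of the part of (d_new - d_current) orthogonal to the span of the
    differences (d_i - d_current) of previous densities (Frobenius metric; sketch)."""
    step = (d_new - d_current).ravel()
    if not previous_densities:
        return np.linalg.norm(step)
    # Columns span the subspace of changes already contained in previous densities.
    basis = np.column_stack([(d - d_current).ravel() for d in previous_densities])
    # Least-squares projection of the step onto that subspace.
    coeffs, *_ = np.linalg.lstsq(basis, step, rcond=None)
    residual = step - basis @ coeffs
    return np.linalg.norm(residual)
```

A step that stays within the span of earlier densities contributes nothing to this measure. A step in a genuinely new direction counts in full.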

For TRDSM, an improved energy model has been developed that recovers part of the term neglected in the DSM energy model relative to the SCF energy. However, the effect of this improvement is found to be rather small compared to the extra complexity it adds to the algorithm.


An energy minimization algorithm that replaces the standard RH diagonalization in the SCF optimization has been presented as well. The novel idea is to exploit the valuable information stored in the subspace of previous densities to construct an improved RH energy model, the augmented Roothaan-Hall (ARH) model, and to minimize this model instead of the RH model. This makes the TRDSM step redundant, since a density subspace minimization is now included in the minimization of the ARH energy model.

We expect a faster convergence rate for ARH than for TRSCF, mainly because the RH and DSM steps are merged into one energy model. This model has the correct gradient (not only in the subspace) and an approximate Hessian that is improved in each iteration using information from the previous density and Fock matrices. The preliminary results from the ARH energy minimization seem promising. They show improved convergence compared to TRSCF, which already converged as fast as or faster than DIIS.

The errors of the TRRH and TRDSM energy models relative to the SCF energy have been studied. Since the DFT and HF energy expressions differ, the errors in the energy models may differ between the two methods. The DSM energy model is found to have an error of the same order, $\|D_\delta\|^2$, for both HF and DFT, where $D_\delta$ is the idempotency correction imposed on the averaged density.

For the RH energy model, inspection of test cases shows that the errors are larger for LDA than for HF, especially as convergence is approached. The error has two sources: the error in the RH Hessian relative to the SCF Hessian, and the third- and higher-order contributions from the nonlinear terms in the SCF energy, which are not included in the RH energy model. Further tests indicate that the Hessian is better described in LDA than in HF, so the Hessian error is unlikely to explain the difference. The LDA errors are largest close to convergence, where the steps are small and third- and higher-order contributions should be small as well, so these terms also seem unlikely to cause the difference. Why the errors are larger for LDA than for HF thus remains unanswered, and it will be investigated further.

The stability of stationary points has been discussed. A method for testing whether a stationary point is stable, and for walking away from it if it is not, has been described and applied to examples. Such a method is very valuable, since without it a minimum could not have been found for these examples.
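The principle behind the stability test can be illustrated on a toy two-parameter energy function instead of the SCF energy. At a stationary point, the lowest Hessian eigenvalue decides whether the point is a minimum. If that eigenvalue is negative, the corresponding eigenvector gives a direction in which the energy decreases.

The function, step size, descent scheme and helper names below are illustrative assumptions, not the procedure used in our SCF program.

```python
import numpy as np


def energy(p):
    x, y = p
    return x**2 - y**2 + 0.25 * y**4   # saddle at (0, 0), minima at (0, +-sqrt(2))


def gradient(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y + y**3])


def hessian(p):
    x, y = p
    return np.array([[2.0, 0.0], [0.0, -2.0 + 3.0 * y**2]])


def escape_and_minimize(p, step=0.1, n_iter=200, lr=0.1):
    """If p is an unstable stationary point, step along the Hessian eigenvector
    with the lowest (negative) eigenvalue, then continue by steepest descent."""
    p = np.asarray(p, dtype=float)
    evals, evecs = np.linalg.eigh(hessian(p))
    if evals[0] < 0.0:
        # Either sign lowers the energy to second order, since the gradient vanishes.
        p = p + step * evecs[:, 0]
    for _ in range(n_iter):
        p = p - lr * gradient(p)
    return p
```

Started exactly at the saddle point (0, 0), plain gradient descent would never move. The eigenvector step leads to one of the two minima at $(0, \pm\sqrt{2})$, where both Hessian eigenvalues are positive.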

The scaling of TRSCF has also been considered. An alternative to diagonalization has been implemented in our SCF program: instead of diagonalizing the Fock matrix, the trace purification scheme of Palser and Manolopoulos¹⁹ and, later, Niklasson⁴⁸ is used. In combination with the $d_{\mathrm{orth}}$-shift scheme, the purification scheme makes the TRRH step near-linearly scaling. The trace purification scheme is linear scaling in an orthogonal basis. However, the optimization scheme is formulated in the nonorthogonal AO basis, so a transformation to an orthogonal basis is needed, and this transformation scales as $N^2$ with a small prefactor.

Timings for the TRRH step with diagonalization and with purification are given. They show that trace purification is a major improvement over diagonalization when more than a couple of thousand basis functions are needed. The TRDSM step is based on matrix multiplications and additions, so by construction it becomes linearly scaling when sparsity in the matrices is exploited.
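Trace-correcting purification can be sketched in a few lines. The sketch follows the second-order trace-correcting recursion (often called TC2) associated with Niklasson's work, which may differ in detail from the variant implemented in our program.

Assumptions and limitations:

- Dense NumPy matrices are used, so the sketch shows the algorithm but not the linear scaling, which requires sparse matrix algebra.
- The nonorthogonal AO basis is handled by a Cholesky factorization $S = LL^T$, which is one common choice of orthogonalization.
- The function names are hypothetical.

```python
import numpy as np


def gershgorin_bounds(a):
    """Lower and upper bounds on the eigenvalues of the symmetric matrix a."""
    radius = np.sum(np.abs(a), axis=1) - np.abs(np.diag(a))
    return np.min(np.diag(a) - radius), np.max(np.diag(a) + radius)


def tc2_purify(f_orth, n_occ, tol=1.0e-12, max_iter=100):
    """Projector onto the n_occ lowest eigenvectors of f_orth, obtained by
    trace-correcting purification without any diagonalization."""
    e_min, e_max = gershgorin_bounds(f_orth)
    n = f_orth.shape[0]
    # Map the spectrum into [0, 1]; the lowest Fock eigenvalues end up near 1.
    x = (e_max * np.eye(n) - f_orth) / (e_max - e_min)
    for _ in range(max_iter):
        x2 = x @ x
        if np.trace(x) > n_occ:
            x_new = x2            # lowers the trace
        else:
            x_new = 2.0 * x - x2  # raises the trace
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def ao_density_by_purification(f_ao, s_ao, n_occ):
    """Transform F to an orthonormal basis with S = L L^T, purify, transform back."""
    l = np.linalg.cholesky(s_ao)
    l_inv = np.linalg.inv(l)
    f_orth = l_inv @ f_ao @ l_inv.T
    p = tc2_purify(f_orth, n_occ)
    # AO density D = L^-T P L^-1 satisfies D S D = D and Tr(D S) = n_occ.
    return l_inv.T @ p @ l_inv
```

Only matrix products and traces occur inside the purification loop. With sparse matrices this is what makes the approach attractive compared to diagonalization.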

As illustrated by the examples throughout this part of the thesis and in the applications section, significant improvements in SCF convergence have been obtained. For both the TRSCF and the ARH examples, convergence is as good as or better than with DIIS. For problems where DIIS diverges, the TRSCF and ARH methods still converge. The globally convergent trust region method of Francisco et al.²⁶ is found to be better only for the simplest examples; for the rest, the TRSCF and ARH methods are superior.

The future success of the TRSCF method depends on two things: a well-optimized implementation of the alternative to diagonalization combined with the dynamic level-shift scheme, and efficient exploitation of sparsity. Together these would let TRSCF compete with the linear-scaling SCF programs used today. The future success of the ARH method depends on finding efficient ways of solving the nonlinear equations that correspond to minimizing the energy model. For this purpose, different preconditioners will be tested.

To conclude, some adjustments should still be made to improve the algorithms, but the framework is in place. Each of the SCF optimization algorithms presented in this thesis constitutes a black-box optimization scheme for HF and DFT. Each requires no user adjustment and has led to fast and stable convergence for both the simple and the problematic systems studied so far. We are thus convinced that TRSCF and ARH are built to handle the optimization problems of the future.



Part 2

Atomic Orbital Based Response Theory

2.1 Introduction

The first part of this thesis was concerned with the optimization of the one-electron density matrix for Hartree-Fock (HF) and density-functional theory (DFT). From such an optimized density, response theory yields information about excited states and about how the system responds to a perturbation (e.g., an external electric field). Response theory and the derivation of molecular properties are the subject of this part of the thesis.

Response theory provides a rigorous approach for calculating molecular properties. As for the SCF optimization algorithms, the theory has usually been formulated in the molecular orbital (MO) basis. This basis is inherently delocalized, so the matrices involved are not sparse. A reformulation in the local atomic orbital (AO) basis is thus necessary to obtain linear-scaling algorithms and to permit property calculations on large systems. Such a reformulation, employing an exponential parameterization of the density matrix, is given in a paper by Larsen et al.⁶¹

Besides locality, the AO formulation of the response functions has a number of advantages over the MO formulation. The response equations and molecular property expressions are simpler in the AO basis, because the matrices involved (e.g., the Fock and property matrices) enter the equations in the basis in which they are originally evaluated. Unlike in the MO formulation, no transformation between bases is necessary. The AO formulation is particularly convenient for perturbation-dependent basis sets. In the MO formulation, a set of perturbation-dependent orthonormal molecular orbitals must be introduced. These orbitals have no physical content and thus add artificial complexity to the problem. To exemplify the benefits of the AO formulation, the expression for the excited-state geometrical gradient is derived in Section 2.4.


In the conventional MO formulation, the number operators are redundant and can be eliminated. In the AO basis, however, the number operators are not redundant and must be included. Because of this, the proof that the solutions of the response equations are paired cannot be carried over directly from the MO basis to the AO basis. It is therefore necessary to study the impact of the number operators on the solver for the AO response equations. This is done in Section 2.2, using second quantization to formulate the AO-based response equations.

The remainder of this part is organized as follows:

- Section 2.3 discusses implementation issues connected to solving the AO response equations.
- Section 2.5 gives a couple of simple examples in which the AO response solver is used to find ground- and excited-state properties.
- Section 2.6 summarizes the results of this part of the thesis.

2.2 AO Based Response Equations in Second Quantization

In this section, the linear response equations are derived for Hartree-Fock theory; with minor technical changes, they apply to DFT as well. The quadratic and higher-order response equations could equally well be derived in this formulation, but this is not necessary to arrive at the basic conclusions.

2.2.1 The Parameterization

Consider a set of atomic orbitals $\chi_\mu$ with the real and symmetric metric $S$. The creation and annihilation operators for the atomic orbitals fulfil the anticommutation relation

$$
\left[a^\dagger_\mu, a_\nu\right]_+ = S_{\nu\mu}. \tag{2.1}
$$

We will consider the exponential operator

$$
\hat{T} = \exp(i\hat\kappa), \tag{2.2}
$$

where $\hat\kappa$ is a Hermitian one-electron operator

$$
\hat\kappa = \sum_{\mu\nu}\kappa_{\mu\nu}\,a^\dagger_\mu a_\nu, \tag{2.3}
$$

$$
\kappa^\dagger = \kappa. \tag{2.4}
$$

To examine the action of $\exp(i\hat\kappa)$, we consider the transformed creation operators

$$
\tilde a^\dagger_\mu = \exp(i\hat\kappa)\,a^\dagger_\mu\exp(-i\hat\kappa). \tag{2.5}
$$

With the transformed annihilation operators $\tilde a_\nu$ defined analogously, the transformed operators satisfy the same anticommutation relations as the untransformed operators:

$$
\begin{aligned}
\left[\tilde a^\dagger_\mu, \tilde a_\nu\right]_+ &= \left[\exp(i\hat\kappa)\,a^\dagger_\mu\exp(-i\hat\kappa),\ \exp(i\hat\kappa)\,a_\nu\exp(-i\hat\kappa)\right]_+\\
&= \exp(i\hat\kappa)\left[a^\dagger_\mu, a_\nu\right]_+\exp(-i\hat\kappa) = S_{\nu\mu}.
\end{aligned} \tag{2.6}
$$

The exponential operators of Eq. (2.2) therefore constitute the manifold of operators that conserve the general metric $S$. In the special case $S = 1$, the exponential operator reduces to the standard exponential operator of the second-quantization formalism used in MO-based methods.⁴⁶

Using the Baker-Campbell-Hausdorff expansion⁴⁶ and the anticommutation relation of Eq. (2.1), we get

$$
\begin{aligned}
\tilde a^\dagger_\mu &= a^\dagger_\mu + i\left[\hat\kappa, a^\dagger_\mu\right] - \frac{1}{2}\left[\hat\kappa,\left[\hat\kappa, a^\dagger_\mu\right]\right] + \cdots\\
&= a^\dagger_\mu + i\sum_\nu(\kappa S)_{\nu\mu}\,a^\dagger_\nu - \frac{1}{2}\sum_\nu\left[(\kappa S)^2\right]_{\nu\mu}a^\dagger_\nu + \cdots\\
&= \sum_\nu\left[\exp(i\kappa S)\right]_{\nu\mu}a^\dagger_\nu.
\end{aligned} \tag{2.7}
$$
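Eq. (2.7) shows that the creation operators transform with the matrix $U = \exp(i\kappa S)$. Metric conservation then means $U^\dagger S U = S$, which follows from Eq. (2.14) with $B = S$.

The short numerical check below is an illustration, not part of the derivation. It assumes:

- A random symmetric positive definite $S$.
- A random Hermitian $\kappa$.
- SciPy's `expm` for the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 5

# Real symmetric positive definite AO metric S (random, for illustration only).
m = rng.standard_normal((n, n))
s = m @ m.T + n * np.eye(n)

# Random Hermitian kappa, Eq. (2.4).
k = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
kappa = 0.5 * (k + k.conj().T)

# Creation operators transform with U = exp(i kappa S), Eq. (2.7).
u = expm(1j * kappa @ s)

# Overlap of the transformed orbitals, U^dagger S U, compared with S.
print(np.allclose(u.conj().T @ s @ u, s))
```

Note that $U$ is not unitary in the ordinary sense. It is unitary with respect to the metric $S$, which is exactly the property that Eq. (2.6) expresses in operator form.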

To further investigate the properties of this exponential transformation, we next consider the transformation of a single-determinant state $|0\rangle$ with $\exp(i\hat\kappa)$:

$$
|\tilde 0\rangle = \exp(i\hat\kappa)\,|0\rangle. \tag{2.8}
$$

The properties of $|\tilde 0\rangle$ may be obtained by comparing the expectation values of transformed creation-annihilation operators

$$
\tilde\Delta_{\mu\nu} = \langle\tilde 0|a^\dagger_\mu a_\nu|\tilde 0\rangle = \langle 0|\exp(-i\hat\kappa)\,a^\dagger_\mu\exp(i\hat\kappa)\exp(-i\hat\kappa)\,a_\nu\exp(i\hat\kappa)|0\rangle \tag{2.9}
$$

with the expectation values of the untransformed operators

$$
\Delta_{\mu\nu} = \langle 0|a^\dagger_\mu a_\nu|0\rangle. \tag{2.10}
$$

To rewrite Eq. (2.9) in terms of Eq. (2.10), we use Eq. (2.7) to express the transformed creation and annihilation operators in terms of the untransformed operators. The first relation is Eq. (2.7) with $\kappa$ replaced by $-\kappa$. The second follows from the first by Hermitian conjugation, using $S^T = S$ and $\kappa^\dagger = \kappa$:

$$
\begin{aligned}
\exp(-i\hat\kappa)\,a^\dagger_\mu\exp(i\hat\kappa) &= \sum_\rho\left[\exp(-i\kappa S)\right]_{\rho\mu}a^\dagger_\rho,\\
\exp(-i\hat\kappa)\,a_\nu\exp(i\hat\kappa) &= \sum_\rho\left[\exp(iS\kappa)\right]_{\nu\rho}a_\rho.
\end{aligned} \tag{2.11}
$$

Substituting these expressions into Eq. (2.9) gives

$$
\tilde\Delta = \exp(-iS\kappa^T)\,\Delta\,\exp(i\kappa^T S). \tag{2.12}
$$

In Appendix B, it is shown that if $|0\rangle$ is a single-determinant wave function, then $\Delta$ fulfils Eqs. (B-7), which correspond to the symmetry, trace, and idempotency conditions for the one-electron density. We will now show that if $\Delta$ fulfils these equations, then so does $\tilde\Delta$. The Hermiticity of $\tilde\Delta$ follows from the Hermiticity of $S$ and $\kappa$ and will not be shown explicitly here. The trace relation is shown as follows:


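$$
\begin{aligned}
\operatorname{Tr}\tilde\Delta S^{-1} &= \operatorname{Tr}\Delta\exp(i\kappa^T S)\,S^{-1}\exp(-iS\kappa^T)\,SS^{-1}\\
&= \operatorname{Tr}\Delta\exp(i\kappa^T S)\exp(-i\kappa^T S)\,S^{-1}\\
&= \operatorname{Tr}\Delta S^{-1},
\end{aligned} \tag{2.13}
$$

where we have used the cyclic invariance of the trace and the relation

$$
B\exp(A)\,B^{-1} = \exp(BAB^{-1}). \tag{2.14}
$$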

The same relation may be used to show the idempotency relation:

$$
\begin{aligned}
\tilde\Delta S^{-1}\tilde\Delta &= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S)\,S^{-1}\exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S)\\
&= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S)\exp(-i\kappa^T S)\,S^{-1}\Delta\exp(i\kappa^T S)\\
&= \exp(-iS\kappa^T)\,\Delta S^{-1}\Delta\exp(i\kappa^T S)\\
&= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S) = \tilde\Delta.
\end{aligned} \tag{2.15}
$$

We can therefore conclude that $\tilde\Delta$ fulfils Eqs. (B-7), so $\exp(i\hat\kappa)|0\rangle$ is a legitimate normalized single-determinant wave function. It can be shown that all matrices fulfilling Eqs. (B-7) can be obtained by an appropriate choice of $\kappa$, so the transformation of Eq. (2.8) is a complete parameterization.
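The invariance shown in Eqs. (2.13) and (2.15) is easy to check numerically.

The sketch below assumes $\Delta = SCC^TS$ for real orbital coefficients $C$ with $C^TSC = 1$. This is our choice of a valid single-determinant density for the illustration, and it satisfies $\operatorname{Tr}\Delta S^{-1} = N$ and $\Delta S^{-1}\Delta = \Delta$. The sketch then applies Eq. (2.12) with a random Hermitian $\kappa$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, n_occ = 6, 2

# Random AO metric and S-orthonormal occupied coefficients (C^T S C = 1).
m = rng.standard_normal((n, n))
s = m @ m.T + n * np.eye(n)
c = rng.standard_normal((n, n_occ))
l = np.linalg.cholesky(c.T @ s @ c)
c = c @ np.linalg.inv(l).T

delta = s @ c @ c.T @ s  # <0| a+_mu a_nu |0> for this determinant
s_inv = np.linalg.inv(s)

# Random Hermitian kappa and the transformed matrix of Eq. (2.12).
k = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
kappa = 0.5 * (k + k.conj().T)
delta_t = expm(-1j * s @ kappa.T) @ delta @ expm(1j * kappa.T @ s)

hermitian = np.allclose(delta_t, delta_t.conj().T)
trace_kept = np.isclose(np.trace(delta_t @ s_inv), np.trace(delta @ s_inv))
idempotent = np.allclose(delta_t @ s_inv @ delta_t, delta_t)
print(hermitian, trace_kept, idempotent)
```

The transformed matrix is complex for a complex $\kappa$. It nevertheless stays Hermitian, keeps the electron count, and remains idempotent in the $S^{-1}$ metric.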

2.2.2 The Linear Response Function

We will now use the parameterization of Eq. (2.8) for an arbitrary single-determinant wave function to describe a Hartree-Fock wave function in an external, time-dependent field. The parameters in $\kappa$ become time-dependent, and in the following we develop equations for obtaining them. The time-dependent Hamiltonian can be written as

$$
H = H_0 + V_t, \tag{2.16}
$$

where $H_0$ is the Hamiltonian of the unperturbed system and $V_t$ is a first-order perturbation. The perturbation is turned on adiabatically, and $V_t$ can be expressed as

$$
V_t = \int_{-\infty}^{\infty} d\omega\, V_\omega\exp\big((-i\omega+\epsilon)t\big), \tag{2.17}
$$

where $\epsilon$ is a positive infinitesimal that ensures $V_t \to 0$ as $t \to -\infty$. The perturbation is required to be Hermitian, so we have the relation

$$
V_\omega^\dagger = V_{-\omega}. \tag{2.18}
$$

To determine the linear response function, we begin by considering the time dependence of the expectation value $\langle\tilde 0|A|\tilde 0\rangle$ of a one-electron operator $A$. To obtain the linear response, we need only expand the wave function $|\tilde 0\rangle$ of Eq. (2.8) to first order in the external perturbation:

$$
\hat\kappa_t = \hat\kappa_t^{(1)} + \hat\kappa_t^{(2)} + \cdots. \tag{2.19}
$$


The zero-order contribution, $\hat\kappa_t^{(0)}$, vanishes because the unperturbed wave function $|0\rangle$ is assumed to be optimized for the zero-order Hamiltonian. The Brillouin conditions in the AO basis therefore hold:

$$
\frac{\partial}{\partial\kappa_{\mu\nu}}\langle\tilde 0|H_0|\tilde 0\rangle\Big|_{\kappa=0} = i\,\langle 0|\left[H_0, a^\dagger_\mu a_\nu\right]|0\rangle = 0. \tag{2.20}
$$

Substituting the expansion of $\hat\kappa_t$ into Eq. (2.8) gives, to first order,

$$
\langle\tilde 0|A|\tilde 0\rangle = \langle 0|A|0\rangle - i\,\langle 0|\left[\hat\kappa_t^{(1)}, A\right]|0\rangle. \tag{2.21}
$$

Since the response functions are defined in the frequency rather than the time domain, we formulate the wave function corrections in frequency space. By analogy with Eq. (2.17), we write

$$
\hat\kappa_t^{(1)} = \int_{-\infty}^{\infty} d\omega\,\hat\kappa_\omega^{(1)}\exp\big((-i\omega+\epsilon)t\big). \tag{2.22}
$$

Inserting Eq. (2.22) into Eq. (2.21), we obtain

$$
\langle\tilde 0|A|\tilde 0\rangle = \langle 0|A|0\rangle - i\int_{-\infty}^{\infty} d\omega\,\langle 0|\left[\hat\kappa_\omega^{(1)}, A\right]|0\rangle\exp\big((-i\omega+\epsilon)t\big). \tag{2.23}
$$

Comparing Eq. (2.23) with the formal expansion of an expectation value in terms of response functions,

$$
\langle\tilde 0|A|\tilde 0\rangle = \langle 0|A|0\rangle + \int_{-\infty}^{\infty} d\omega\,\langle\langle A; V_\omega\rangle\rangle_\omega\exp\big((-i\omega+\epsilon)t\big) + \cdots, \tag{2.24}
$$

we may identify the linear response function as

$$
\langle\langle A; V_\omega\rangle\rangle_\omega = -i\,\langle 0|\left[\hat\kappa_\omega^{(1)}, A\right]|0\rangle. \tag{2.25}
$$

2.2.3 The Time Development of the Reference State

Before the explicit time-dependent equations for the time-dependent parameters of $\kappa$ are set up, it is convenient to rewrite $\hat\kappa$ of Eq. (2.3) as

$$
\hat\kappa = \sum_{\mu>\nu}\left(\kappa_{\mu\nu}\,a^\dagger_\mu a_\nu + \kappa^*_{\mu\nu}\,a^\dagger_\nu a_\mu\right) + \sum_\mu\kappa_{\mu\mu}\,a^\dagger_\mu a_\mu, \tag{2.26}
$$

which follows from the Hermiticity of $\hat\kappa$. The operators of $\hat\kappa$ may be collected in a vector (here in row form)

$$
\Lambda = \left(Q^\dagger\ \ D^\dagger\ \ Q\right), \tag{2.27}
$$

where the three classes of operators are defined as

$$
Q^\dagger_m = a^\dagger_\mu a_\nu,\ \mu>\nu; \qquad D^\dagger_m = a^\dagger_\mu a_\mu; \qquad Q_m = a^\dagger_\nu a_\mu,\ \mu>\nu. \tag{2.28}
$$


The parameters of $\kappa$ may similarly be arranged in a vector

$$
\alpha^{(i)} = \begin{pmatrix}\kappa^{(i)}_{\mu\nu},\ \mu>\nu\\ \kappa^{(i)}_{\mu\mu}\\ \kappa^{(i)*}_{\mu\nu},\ \mu>\nu\end{pmatrix}, \tag{2.29}
$$

such that

$$
\hat\kappa^{(i)} = \sum_m\alpha^{(i)}_m\Lambda_m. \tag{2.30}
$$

Here the index $m$ on $\Lambda$ runs over all three classes of operators listed in Eq. (2.28).

In Eqs. (2.27) and (2.28), the single excitation operators $a^\dagger_\mu a_\nu$ have been divided into two sets: atomic orbital excitations, corresponding to $\mu > \nu$, and atomic orbital deexcitations, corresponding to $\mu < \nu$. The atomic orbital excitations and deexcitations have the same formal properties, so this division has no physical content. However, it will prove important when the paired structure of the response equations is investigated in Section 2.2.5. Note that the number operators $a^\dagger_\mu a_\mu$ cannot be excluded in the atomic orbital representation, whereas they are redundant in the standard molecular orbital formulation.
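The bookkeeping of Eqs. (2.26)-(2.30) can be made concrete by representing the operator $\sum_{\mu\nu}c_{\mu\nu}a^\dagger_\mu a_\nu$ by its coefficient matrix $c$.

The sketch below packs a Hermitian $\kappa$ into $\alpha$ in the order $Q^\dagger$, $D$, $Q$, and rebuilds $\kappa$ from $\sum_m\alpha_m\Lambda_m$. The function names are hypothetical. With $n$ AOs, the vector has $n(n-1)/2 + n + n(n-1)/2 = n^2$ elements, because the $n$ number operators are kept.

```python
import numpy as np


def operator_index_lists(n):
    """(p, q) labels of Lambda_m = a+_p a_q for Q^dagger_m, D_m and Q_m, Eq. (2.28)."""
    lower = [(mu, nu) for mu in range(n) for nu in range(n) if mu > nu]
    q_dag = lower                             # a+_mu a_nu, mu > nu
    diag = [(mu, mu) for mu in range(n)]      # a+_mu a_mu (number operators)
    q = [(nu, mu) for (mu, nu) in lower]      # a+_nu a_mu, mu > nu
    return q_dag + diag + q


def pack_kappa(kappa):
    """alpha of Eq. (2.29): kappa_{mu nu} (mu > nu), kappa_{mu mu}, kappa*_{mu nu} (mu > nu)."""
    n = kappa.shape[0]
    lower = [(mu, nu) for mu in range(n) for nu in range(n) if mu > nu]
    return np.concatenate([
        [kappa[mu, nu] for mu, nu in lower],
        [kappa[mu, mu] for mu in range(n)],
        [np.conj(kappa[mu, nu]) for mu, nu in lower],
    ])


def unpack_alpha(alpha, n):
    """Coefficient matrix of sum_m alpha_m Lambda_m, Eq. (2.30)."""
    kappa = np.zeros((n, n), dtype=complex)
    for a, (p, q) in zip(alpha, operator_index_lists(n)):
        kappa[p, q] += a   # Lambda_m = a+_p a_q contributes alpha_m at (p, q)
    return kappa
```

The round trip reproduces $\kappa$ exactly only because the $Q$ coefficients are the complex conjugates of the $Q^\dagger$ coefficients, which is Hermiticity in the form of Eq. (2.26).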

In the presence of the time-dependent perturbation, we introduce the time-transformed operator basis

$$
\tilde\Lambda^\dagger = \begin{pmatrix}\tilde Q\\ \tilde D\\ \tilde Q^\dagger\end{pmatrix}, \tag{2.31}
$$

where

$$
\tilde Q_m = \exp(i\hat\kappa)\,Q_m\exp(-i\hat\kappa), \tag{2.32}
$$

and similarly for $\tilde Q^\dagger_m$ and $\tilde D_m$.

The time evolution of $|\tilde 0\rangle$ may now be determined using Ehrenfest's theorem for the transformed operators $\tilde\Lambda^\dagger$ of Eq. (2.31):

$$
\frac{d}{dt}\langle\tilde 0|\tilde\Lambda^\dagger|\tilde 0\rangle - \Big\langle\tilde 0\Big|\frac{\partial\tilde\Lambda^\dagger}{\partial t}\Big|\tilde 0\Big\rangle = -i\,\langle\tilde 0|\left[\tilde\Lambda^\dagger, H_0 + V_t\right]|\tilde 0\rangle. \tag{2.33}
$$

2.2.4 The First-order Equation

We now expand Eq. (2.33) in orders of the external perturbation, restricting ourselves to terms that are linear in the amplitudes. Inserting Eq. (2.19) into Eq. (2.33) and collecting the terms linear in the perturbation, we obtain the first-order time-dependent equation


$$
i\,\langle 0|\left[\Lambda^\dagger, \dot{\hat\kappa}_t^{(1)}\right]|0\rangle = -i\,\langle 0|\left[\Lambda^\dagger, V_t\right]|0\rangle + \langle 0|\left[\Lambda^\dagger,\left[H_0, \hat\kappa_t^{(1)}\right]\right]|0\rangle. \tag{2.34}
$$

To solve the time-dependent equation, Eq. (2.34), we insert the frequency expansions of the wave function correction, Eq. (2.22), and of the external perturbation, Eq. (2.17). The time derivative brings down the factor $-i\omega+\epsilon$, and the infinitesimal $\epsilon$ is neglected in this prefactor:

$$
\begin{aligned}
&\int_{-\infty}^{\infty} d\omega\,\exp\big((-i\omega+\epsilon)t\big)\Big(\omega\,\langle 0|\left[\Lambda^\dagger, \hat\kappa_\omega^{(1)}\right]|0\rangle - \langle 0|\left[\Lambda^\dagger,\left[H_0, \hat\kappa_\omega^{(1)}\right]\right]|0\rangle\Big)\\
&\qquad = \int_{-\infty}^{\infty} d\omega\,\exp\big((-i\omega+\epsilon)t\big)\Big(-i\,\langle 0|\left[\Lambda^\dagger, V_\omega\right]|0\rangle\Big).
\end{aligned} \tag{2.35}
$$

Since this holds for all $t$, the integrands must be equal, and the first-order response equation is found as

$$
\omega\,\langle 0|\left[\Lambda^\dagger, \hat\kappa_\omega^{(1)}\right]|0\rangle - \langle 0|\left[\Lambda^\dagger,\left[H_0, \hat\kappa_\omega^{(1)}\right]\right]|0\rangle = -i\,\langle 0|\left[\Lambda^\dagger, V_\omega\right]|0\rangle. \tag{2.36}
$$

The equation may be written in terms of the matrices

$$
E^{[2]}_{mn} = \langle 0|\left[\Lambda^\dagger_m,\left[H_0, \Lambda_n\right]\right]|0\rangle, \tag{2.37}
$$

$$
S^{[2]}_{mn} = \langle 0|\left[\Lambda^\dagger_m, \Lambda_n\right]|0\rangle, \tag{2.38}
$$

and the vector

$$
V^{[1]}_{\omega,m} = \langle 0|\left[\Lambda^\dagger_m, V_\omega\right]|0\rangle. \tag{2.39}
$$

Using Eqs. (2.37)-(2.39) and (2.29)-(2.30), we now write the first-order response equations, Eq. (2.36), in the form

$$
\left(E^{[2]} - \omega S^{[2]}\right)\alpha^{(1)} = i\,V^{[1]}_\omega, \tag{2.40}
$$

where $E^{[2]}$ and $S^{[2]}$ may be viewed as generalized electronic Hessian and overlap matrices.⁶¹,⁶² The matrix elements $E^{[2]}_{mn}$ and $S^{[2]}_{mn}$ of Eqs. (2.37) and (2.38) can be expressed as matrix multiplications and additions of the density, Fock and overlap matrices.⁶¹

The linear response function is obtained by inserting the first-order correction from Eq. (2.40) into the expression for the linear response function, Eq. (2.25). Renaming the perturbation operator $V_\omega$ to $B$ and introducing

$$
A^{[1]}_m = -\langle 0|\left[\Lambda_m, A\right]|0\rangle, \qquad B^{[1]}_m = \langle 0|\left[\Lambda^\dagger_m, B\right]|0\rangle, \tag{2.41}
$$

we obtain

$$
\langle\langle A; B\rangle\rangle_\omega = -A^{[1]}\left(E^{[2]} - \omega S^{[2]}\right)^{-1}B^{[1]}. \tag{2.42}
$$

The linear response function may thus be calculated by solving one set of linear equations at each frequency. To be more explicit, we denote the solution vector of the linear response equation by

$$
N^B(\omega) = \left(E^{[2]} - \omega S^{[2]}\right)^{-1}B^{[1]}. \tag{2.43}
$$


The linear response function in Eq. (2.42) can then be obtained as

$$
\langle\langle A; B\rangle\rangle_\omega = -A^{[1]}N^B(\omega). \tag{2.44}
$$
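Eqs. (2.42)-(2.44) translate directly into code: one linear solve per frequency gives $N^B(\omega)$, and a dot product with $A^{[1]}$ gives the response function.

The sketch uses a hypothetical two-dimensional model in MO-like form, $E^{[2]} = \begin{pmatrix}a & b\\ b & a\end{pmatrix}$ and $S^{[2]} = \operatorname{diag}(1,-1)$. For this model the poles lie at $\omega = \pm\sqrt{a^2 - b^2}$. Its assumptions:

- The numbers and the gradient vector are illustrative.
- A dense solve stands in for the iterative solvers needed for large AO problems.

```python
import numpy as np


def linear_response(e2, s2, a1, b1, omega):
    """<<A;B>>_omega = -A^[1] N^B(omega) with (E^[2] - omega S^[2]) N^B = B^[1]."""
    n_b = np.linalg.solve(e2 - omega * s2, b1)   # Eq. (2.43)
    return -a1 @ n_b                             # Eq. (2.44)


# Hypothetical 2x2 model: one excitation/deexcitation pair.
a, b = 1.0, 0.2
e2 = np.array([[a, b], [b, a]])
s2 = np.diag([1.0, -1.0])
g = np.array([0.3, -0.3])   # illustrative property gradient, used for both A and B

excitation = np.sqrt(a**2 - b**2)   # pole of the response function
for omega in (0.0, 0.5, 0.9 * excitation, 0.99 * excitation):
    print(f"omega = {omega:.4f}  <<A;B>> = {linear_response(e2, s2, g, g, omega):.6f}")
```

For this model the response function equals $-2g_1^2(a+b)/(a^2 - b^2 - \omega^2)$. It grows without bound as $\omega$ approaches the excitation energy, which is why the excitation energies are identified as poles.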

2.2.5 Pairing

The excitation energies are identified as the poles of the linear response function of Eq. (2.42) and are therefore solutions to the generalized eigenvalue problem

$$
E^{[2]}X = \omega S^{[2]}X. \tag{2.45}
$$

In the MO formulation of response theory, it has been shown that the excitation energies are paired,⁶³ so that if $\omega_i$ is an eigenvalue of Eq. (2.45), then so is $-\omega_i$. It is important to understand how pairing appears in the AO basis. This structural feature is exploited when the equations are solved iteratively, as is necessary for large problems; this is further discussed in Section 2.3. The MO proof of pairing cannot be transferred directly to the AO formulation because of the diagonal operators $D_m$, so this section gives the proof in the AO formulation.

The structure of $E^{[2]}$ and $S^{[2]}$ in the AO formulation is analyzed to examine the pairing structure. Dividing $\Lambda$ into the three classes of Eq. (2.28), the matrix $E^{[2]}$ may be written as

$$
E^{[2]} = \begin{pmatrix}
\langle 0|[Q,[H_0,Q^\dagger]]|0\rangle & \langle 0|[Q,[H_0,D]]|0\rangle & \langle 0|[Q,[H_0,Q]]|0\rangle\\
\langle 0|[D,[H_0,Q^\dagger]]|0\rangle & \langle 0|[D,[H_0,D]]|0\rangle & \langle 0|[D,[H_0,Q]]|0\rangle\\
\langle 0|[Q^\dagger,[H_0,Q^\dagger]]|0\rangle & \langle 0|[Q^\dagger,[H_0,D]]|0\rangle & \langle 0|[Q^\dagger,[H_0,Q]]|0\rangle
\end{pmatrix}. \tag{2.46}
$$

If we assume for simplicity that all orbitals and integrals for the unperturbed system are real, the elements of, for example, the block $\langle 0|[Q^\dagger,[H_0,Q]]|0\rangle$ are trivially rewritten as

$$
\begin{aligned}
\langle 0|\left[Q^\dagger_m,\left[H_0,Q_n\right]\right]|0\rangle &= \langle 0|\left[Q_m,\left[H_0,Q^\dagger_n\right]\right]|0\rangle^*\\
&= \langle 0|\left[Q_m,\left[H_0,Q^\dagger_n\right]\right]|0\rangle.
\end{aligned} \tag{2.47}
$$

The nine blocks in Eq. (2.46) can then all be written in terms of the four matrices

$$
\begin{aligned}
A_{mn} &= \langle 0|\left[Q_m,\left[H_0,Q^\dagger_n\right]\right]|0\rangle,\\
B_{mn} &= \langle 0|\left[Q_m,\left[H_0,Q_n\right]\right]|0\rangle,\\
F_{mn} &= \langle 0|\left[Q_m,\left[H_0,D_n\right]\right]|0\rangle,\\
G_{mn} &= \langle 0|\left[D_m,\left[H_0,D_n\right]\right]|0\rangle,
\end{aligned} \tag{2.48}
$$

and we obtain

$$
E^{[2]} = \begin{pmatrix} A & F & B\\ F^T & G & F^T\\ B & F & A \end{pmatrix}. \tag{2.49}
$$


The matrix $S^{[2]}$ may in a similar way be written as

$$S^{[2]} = \begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^{T} & 0 & -\Omega^{T} \\ -\Delta & -\Omega & -\Sigma \end{pmatrix}, \tag{2.50}$$

where

$$\Sigma_{mn} = \langle 0|[Q_m,Q_n^\dagger]|0\rangle, \qquad \Delta_{mn} = \langle 0|[Q_m,Q_n]|0\rangle, \qquad \Omega_{mn} = \langle 0|[Q_m,D_n]|0\rangle. \tag{2.51}$$

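Note that the block containing two diagonal operators vanishes. With $D_m = a_\mu^\dagger a_\mu$ and $D_n = a_\nu^\dagger a_\nu$,

$$\langle 0|[D_m,D_n]|0\rangle = \langle 0|[a_\mu^\dagger a_\mu, a_\nu^\dagger a_\nu]|0\rangle = S_{\mu\nu}\langle 0|a_\mu^\dagger a_\nu|0\rangle - S_{\nu\mu}\langle 0|a_\nu^\dagger a_\mu|0\rangle = 0. \tag{2.52}$$

The two terms cancel because the overlap matrix is symmetric and, for real orbitals, $\langle 0|a_\mu^\dagger a_\nu|0\rangle = \langle 0|a_\nu^\dagger a_\mu|0\rangle$.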

To illustrate how the pairing is obtained in the AO formulation, we assume that the vector

$$X = \begin{pmatrix} Z \\ U \\ Y \end{pmatrix} \tag{2.53}$$

is an eigenvector of Eq. (2.45) with eigenvalue $\omega$:

$$\begin{pmatrix} A & F & B \\ F^{T} & G & F^{T} \\ B & F & A \end{pmatrix} \begin{pmatrix} Z \\ U \\ Y \end{pmatrix} = \omega \begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^{T} & 0 & -\Omega^{T} \\ -\Delta & -\Omega & -\Sigma \end{pmatrix} \begin{pmatrix} Z \\ U \\ Y \end{pmatrix}. \tag{2.54}$$

Multiplying out the blocks of Eq. (2.54) gives three sets of equations

$$\begin{aligned}
AZ + FU + BY &= \omega\,(\Sigma Z + \Omega U + \Delta Y), \\
F^{T}Z + GU + F^{T}Y &= \omega\,(\Omega^{T}Z - \Omega^{T}Y), \\
BZ + FU + AY &= \omega\,(-\Delta Z - \Omega U - \Sigma Y).
\end{aligned} \tag{2.55}$$

We will now prove that the paired vector

$$X^{P} = \begin{pmatrix} Y \\ U \\ Z \end{pmatrix} \tag{2.56}$$

is an eigenvector of Eq. (2.45) with eigenvalue $-\omega$:

$$\begin{pmatrix} A & F & B \\ F^{T} & G & F^{T} \\ B & F & A \end{pmatrix} \begin{pmatrix} Y \\ U \\ Z \end{pmatrix} = -\omega \begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^{T} & 0 & -\Omega^{T} \\ -\Delta & -\Omega & -\Sigma \end{pmatrix} \begin{pmatrix} Y \\ U \\ Z \end{pmatrix}. \tag{2.57}$$

Multiplying out the blocks of Eq. (2.57) leads to the three sets of equations


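$$\begin{aligned}
AY + FU + BZ &= -\omega\,(\Sigma Y + \Omega U + \Delta Z), \\
F^{T}Y + GU + F^{T}Z &= -\omega\,(\Omega^{T}Y - \Omega^{T}Z), \\
BY + FU + AZ &= -\omega\,(-\Delta Y - \Omega U - \Sigma Z).
\end{aligned} \tag{2.58}$$

These are identical to Eqs. (2.55). Once the minus sign is absorbed into the parentheses, the first, second, and third equations of (2.58) are the third, second, and first equations of (2.55), respectively. It is thus concluded that if $X$ is an eigenvector of Eq. (2.45) with eigenvalue $\omega$, then $X^{P}$ is also an eigenvector, with eigenvalue $-\omega$.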
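The pairing argument can also be phrased as a permutation symmetry. Let $J$ be the permutation that exchanges the $Z$ and $Y$ blocks of a vector, so that $X^{P} = JX$. The block patterns of Eqs. (2.49) and (2.50) then give $JE^{[2]}J = E^{[2]}$ and $JS^{[2]}J = -S^{[2]}$. Hence $E^{[2]}(JX) = -\omega S^{[2]}(JX)$, and $\det(E^{[2]} - \omega S^{[2]})$ is an even function of $\omega$.

The short Python sketch below checks both identities in exact rational arithmetic for randomly filled blocks. It is our own illustration, not part of the thesis code. The block sizes, the random entries, and all function names are arbitrary choices, and no symmetry of the individual blocks is assumed.

```python
from fractions import Fraction
import random

def rand_block(rows, cols, rng):
    """Random integer-valued block stored as Fractions (exact arithmetic)."""
    return [[Fraction(rng.randint(-5, 5)) for _ in range(cols)] for _ in range(rows)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def neg(M):
    return [[-x for x in row] for row in M]

def assemble(grid):
    """Glue a 3x3 grid of blocks into one dense matrix."""
    out = []
    for block_row in grid:
        for r in range(len(block_row[0])):
            out.append([x for blk in block_row for x in blk[r]])
    return out

def build_E_S(nq, nd, rng):
    """E[2] and S[2] with the block patterns of Eqs. (2.49) and (2.50)."""
    A, B, Sig, Dlt = (rand_block(nq, nq, rng) for _ in range(4))
    F, Om = rand_block(nq, nd, rng), rand_block(nq, nd, rng)
    G = rand_block(nd, nd, rng)
    zero = [[Fraction(0)] * nd for _ in range(nd)]
    E = assemble([[A, F, B], [transpose(F), G, transpose(F)], [B, F, A]])
    S = assemble([[Sig, Om, Dlt],
                  [transpose(Om), zero, neg(transpose(Om))],
                  [neg(Dlt), neg(Om), neg(Sig)]])
    return E, S

def pairing_permutation(nq, nd):
    """Index map for J: X = (Z, U, Y) -> X^P = (Y, U, Z), Eq. (2.56)."""
    return list(range(nq + nd, 2 * nq + nd)) + list(range(nq, nq + nd)) + list(range(nq))

def conjugate(M, p):
    """J M J for the permutation J given by the involution p."""
    return [[M[p[i]][p[j]] for j in range(len(p))] for i in range(len(p))]

def det(M):
    """Exact determinant by Gaussian elimination over Fractions."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def pencil(E, S, w):
    """E - w S."""
    return [[e - w * s for e, s in zip(er, sr)] for er, sr in zip(E, S)]

rng = random.Random(7)
nq, nd = 2, 2
E, S = build_E_S(nq, nd, rng)
J = pairing_permutation(nq, nd)
print(conjugate(E, J) == E, conjugate(S, J) == neg(S))   # True True
w = Fraction(3, 4)
print(det(pencil(E, S, w)) == det(pencil(E, S, -w)))    # True
```

Only the block pattern enters the check, so it passes for any block contents. This mirrors the fact that Eqs. (2.58) follow from Eqs. (2.55) by rearrangement alone.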

2.3 Solving the Response Equations<br />

For large systems, the response equations

$$\left(E^{[2]} - \omega S^{[2]}\right) N^{B}(\omega) = B^{[1]} \tag{2.59}$$

are best solved using iterative algorithms. These algorithms rely on the ability to set up linear transformations. Expressions for $E^{[2]}b$ and $S^{[2]}b$, where $b$ is a trial vector, have previously been derived:$^{61}$

$$\sigma = E^{[2]} b, \tag{2.60}$$

$$\rho = S^{[2]} b. \tag{2.61}$$

In each iteration, the response equations are set up and solved in a reduced space. For a reduced space consisting of $k$ trial vectors, the equations can be written as

$$\left(E^{[2]}_{\mathrm{RED}} - \omega S^{[2]}_{\mathrm{RED}}\right) X^{\mathrm{RED}} = B^{[1]}_{\mathrm{RED}}, \tag{2.62}$$

where the reduced matrices are found as

$$\left[E^{[2]}_{\mathrm{RED}}\right]_{ij} = b_i^{T} E^{[2]} b_j = b_i^{T}\sigma_j, \qquad \left[S^{[2]}_{\mathrm{RED}}\right]_{ij} = b_i^{T} S^{[2]} b_j = b_i^{T}\rho_j, \qquad \left[B^{[1]}_{\mathrm{RED}}\right]_{i} = b_i^{T} B^{[1]}. \tag{2.63}$$

Normally, when this type of iterative procedure is used, the reduced space is extended with one new trial vector in each iteration. Because of the pairing described in the previous section, however, the linear transformations of $E^{[2]}$ and $S^{[2]}$ on a trial vector can be reused. Taking $E^{[2]}b$ as the example,

$$E^{[2]} b = \begin{pmatrix} A & F & B \\ F^{T} & G & F^{T} \\ B & F & A \end{pmatrix} \begin{pmatrix} Z \\ U \\ Y \end{pmatrix} = \begin{pmatrix} AZ + FU + BY \\ F^{T}Z + GU + F^{T}Y \\ BZ + FU + AY \end{pmatrix} = \sigma, \tag{2.64}$$

the transformation of the paired trial vector is obtained directly as well:

$$E^{[2]} b^{P} = \begin{pmatrix} A & F & B \\ F^{T} & G & F^{T} \\ B & F & A \end{pmatrix} \begin{pmatrix} Y \\ U \\ Z \end{pmatrix} = \begin{pmatrix} AY + FU + BZ \\ F^{T}Y + GU + F^{T}Z \\ BY + FU + AZ \end{pmatrix} = \sigma^{P}. \tag{2.65}$$


The reduced space is therefore extended with both vectors without additional cost. Furthermore,<br />

when a trial vector and its paired counterpart are simultaneously added to the reduced space, the<br />

paired structure of the response equations is preserved. With this structure preserved, the<br />

eigenvalues in the reduced space will also be real and paired, and the lowest eigenvalue will<br />

monotonically decrease towards the converged value as the reduced space is increased. 64<br />
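Here is a minimal sketch of how Eq. (2.65) can be used in practice. Once $\sigma = E^{[2]}b$ and $\rho = S^{[2]}b$ are available, the transformed paired vector follows from swapping the $Z$ and $Y$ blocks.

- For $E^{[2]}$ this is Eq. (2.65) itself.
- For $S^{[2]}$, the block pattern of Eq. (2.50) adds an overall sign, $S^{[2]}b^{P} = -J\rho$, where $J$ exchanges the $Z$ and $Y$ blocks. This follows from $JS^{[2]}J = -S^{[2]}$ but is not written out in the text.

The toy 5×5 matrices (two $Q$-type and one $D$-type parameter) and the function names are hypothetical. The code compares the shortcut with explicit matrix-vector products.

```python
def swap_zy(v, nq, nd):
    """(Z, U, Y) -> (Y, U, Z): the paired vector of Eq. (2.56)."""
    return v[nq + nd:] + v[nq:nq + nd] + v[:nq]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def paired_transforms(sigma, rho, nq, nd):
    """E[2] b^P and S[2] b^P from sigma and rho, without new matrix-vector products."""
    sigma_p = swap_zy(sigma, nq, nd)               # Eq. (2.65)
    rho_p = [-x for x in swap_zy(rho, nq, nd)]     # extra sign from the -Delta, -Omega, -Sigma row of Eq. (2.50)
    return sigma_p, rho_p

# Toy matrices with the patterns of Eqs. (2.49)-(2.50), ordered (Z1, Z2, U, Y1, Y2).
E = [[4.0, 0.5, 0.2, 0.3, 0.1],
     [0.5, 5.0, 0.1, 0.1, 0.2],
     [0.2, 0.1, 3.0, 0.2, 0.1],
     [0.3, 0.1, 0.2, 4.0, 0.5],
     [0.1, 0.2, 0.1, 0.5, 5.0]]
S = [[1.0, 0.0, 0.1, 0.0, 0.1],
     [0.0, 1.0, 0.05, -0.1, 0.0],
     [0.1, 0.05, 0.0, -0.1, -0.05],
     [0.0, -0.1, -0.1, -1.0, 0.0],
     [0.1, 0.0, -0.05, 0.0, -1.0]]

b = [0.3, -1.2, 0.7, 2.0, 0.4]
sigma, rho = matvec(E, b), matvec(S, b)
sigma_p, rho_p = paired_transforms(sigma, rho, 2, 1)
bp = swap_zy(b, 2, 1)
# Both differences agree with explicit products up to rounding error.
print(max(abs(x - y) for x, y in zip(sigma_p, matvec(E, bp))))
print(max(abs(x - y) for x, y in zip(rho_p, matvec(S, bp))))
```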

The solution vector in the reduced space, $X^{\mathrm{RED}}$, gives the solution vector in the full space as an expansion in the trial vectors

$$N^{B} = \sum_{i=1}^{k} X^{\mathrm{RED}}_{i}\, b_i. \tag{2.66}$$

The residual can then be found as

$$R_k = \left(E^{[2]} - \omega S^{[2]}\right) N^{B} - B^{[1]} = \sum_{i=1}^{k} X^{\mathrm{RED}}_{i}\left(\sigma_i - \omega\rho_i\right) - B^{[1]}. \tag{2.67}$$

If the norm of the residual is smaller than some specified tolerance, the iterative procedure is ended and the converged solution vector has been found:

$$N^{B}(\omega) = N^{B}. \tag{2.68}$$

If the residual is too large, a new trial vector may be generated from the residual, preferably with a preconditioner $A$ to speed up the convergence:

$$b_{k+1} = A^{-1} R_k. \tag{2.69}$$

The reduced space is then extended with $b_{k+1}$ and $b_{k+2} = b^{P}_{k+1}$, and Eq. (2.62) is set up and solved again. This establishes the iterative procedure.
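The iterative procedure of Eqs. (2.59)-(2.69) can be sketched for a small dense toy problem. This is an illustration under stated assumptions, not the thesis solver:

- $E^{[2]}$ and $S^{[2]}$ are stored as explicit matrices. An AO code would instead form $\sigma$ and $\rho$ from matrix products.
- The preconditioner is simply the diagonal of $E^{[2]} - \omega S^{[2]}$, not the MO preconditioner of Section 2.3.1.
- Each new vector and its paired partner are Gram-Schmidt orthonormalized against the current space. The text does not discuss this step.

Orthonormalizing a pair against a space that is already closed under pairing keeps the enlarged space closed under pairing, even though the stored vectors are then no longer exact pairs. All names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gauss_solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def swap_zy(v, nq, nd):
    """Paired vector (Z, U, Y) -> (Y, U, Z), Eq. (2.56)."""
    return v[nq + nd:] + v[nq:nq + nd] + v[:nq]

def solve_response(E, S, B1, omega, nq, nd, tol=1e-10, max_iter=50):
    """Reduced-space solution of (E - omega S) N = B1 following Eqs. (2.59)-(2.69)."""
    n = len(B1)
    diag = [E[i][i] - omega * S[i][i] for i in range(n)]   # assumed diagonal preconditioner
    basis, sigmas, rhos = [], [], []
    new = [r / d for r, d in zip(B1, diag)]
    for _ in range(max_iter):
        for cand in (new, swap_zy(new, nq, nd)):           # b_{k+1} and its paired vector
            for b in basis:                                 # Gram-Schmidt (our choice)
                c = dot(b, cand)
                cand = [x - c * y for x, y in zip(cand, b)]
            norm = dot(cand, cand) ** 0.5
            if norm > 1e-12:                                # skip linearly dependent vectors
                b = [x / norm for x in cand]
                basis.append(b)
                sigmas.append(matvec(E, b))                 # Eq. (2.60)
                rhos.append(matvec(S, b))                   # Eq. (2.61)
        k = len(basis)
        red = [[dot(basis[i], sigmas[j]) - omega * dot(basis[i], rhos[j])
                for j in range(k)] for i in range(k)]       # Eq. (2.63)
        x_red = gauss_solve(red, [dot(b, B1) for b in basis])          # Eq. (2.62)
        N = [sum(x_red[i] * basis[i][m] for i in range(k)) for m in range(n)]   # Eq. (2.66)
        R = [sum(x_red[i] * (sigmas[i][m] - omega * rhos[i][m]) for i in range(k)) - B1[m]
             for m in range(n)]                             # Eq. (2.67)
        if dot(R, R) ** 0.5 < tol:
            return N                                        # Eq. (2.68)
        new = [r / d for r, d in zip(R, diag)]              # Eq. (2.69)
    raise RuntimeError("response solver did not converge")

# Toy matrices with the block patterns of Eqs. (2.49)-(2.50), ordered (Z1, Z2, U, Y1, Y2).
E = [[4.0, 0.5, 0.2, 0.3, 0.1], [0.5, 5.0, 0.1, 0.1, 0.2], [0.2, 0.1, 3.0, 0.2, 0.1],
     [0.3, 0.1, 0.2, 4.0, 0.5], [0.1, 0.2, 0.1, 0.5, 5.0]]
S = [[1.0, 0.0, 0.1, 0.0, 0.1], [0.0, 1.0, 0.05, -0.1, 0.0], [0.1, 0.05, 0.0, -0.1, -0.05],
     [0.0, -0.1, -0.1, -1.0, 0.0], [0.1, 0.0, -0.05, 0.0, -1.0]]
B1 = [1.0, 0.0, 0.5, 0.0, -1.0]
N = solve_response(E, S, B1, 0.05, nq=2, nd=1)
print(N)
```

In the toy example, $\omega$ is small enough that $E^{[2]} - \omega S^{[2]}$ stays positive definite. The reduced matrices are therefore always nonsingular, and the loop terminates at the latest when the trial vectors span the full space.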

2.3.1 Preconditioning<br />

As mentioned above, the residual found in each iteration should be preconditioned to obtain an effective solver. As a consequence of the strict AO formulation, the electronic Hessian lacks the diagonal dominance it has in the MO basis. This makes preconditioning a challenge, and so far this problem has not been solved in our SCF response solver. Instead, a transformation is made to the MO basis, where the preconditioning is carried out in the usual way using the orbital eigenvalue differences,

$$\left[b^{\mathrm{MO}}_{k+1}\right]_{ai} = \frac{\left[C^{T} R_k C\right]_{ai}}{\varepsilon_a - \varepsilon_i}, \tag{2.70}$$


where $C$ holds the MO expansion coefficients and $\varepsilon$ the orbital energies of the reference state. The index $a$ refers to virtual orbitals and $i$ to occupied orbitals. The resulting vector is then back-transformed to the AO basis:

$$b_{k+1} = C\, b^{\mathrm{MO}}_{k+1}\, C^{T}. \tag{2.71}$$
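The MO-basis preconditioning step of Eqs. (2.70)-(2.71) can be sketched as follows. The assumptions are all ours:

- Occupied orbitals are ordered first.
- The ia elements are divided by the same difference $\varepsilon_a - \varepsilon_i$ as the ai elements written in Eq. (2.70).
- The occupied-occupied and virtual-virtual elements, which correspond to redundant rotations, are set to zero.

The toy example uses an orthonormal basis in which the AOs coincide with the MOs, so the back-transformation is trivial. The function name is hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def precondition_mo(R, C, eps, nocc):
    """Sketch of Eqs. (2.70)-(2.71); occupied orbitals are assumed to come first."""
    R_mo = matmul(matmul(transpose(C), R), C)       # C^T R_k C
    n = len(eps)
    b_mo = [[0.0] * n for _ in range(n)]            # redundant occ-occ and virt-virt parts stay zero
    for a in range(nocc, n):
        for i in range(nocc):
            gap = eps[a] - eps[i]
            b_mo[a][i] = R_mo[a][i] / gap           # ai elements, Eq. (2.70)
            b_mo[i][a] = R_mo[i][a] / gap           # ia elements: same denominator (assumption)
    return matmul(matmul(C, b_mo), transpose(C))    # back to the AO basis, Eq. (2.71)

C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # toy: AOs identical to the MOs
eps = [-1.0, -0.5, 0.25]                                   # two occupied, one virtual orbital
R = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(precondition_mo(R, C, eps, nocc=2))
```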

An AO alternative to this preconditioner should be found, since the reference to the MO basis introduces dense matrix intermediates. Moreover, the MO preconditioner requires at least one diagonalization at the end of the optimization of the reference state to obtain the MO coefficients and orbital energies.

2.3.2 Projections<br />

In the MO basis, the orbital rotations within the occupied space and within the virtual space are redundant. The response equations in the MO formulation are therefore simply set up in the non-redundant occupied-virtual space to avoid linear dependencies. In the AO basis no such separation exists, and the equations are set up in the full space. To avoid redundancies in the AO formulation, projections onto the non-redundant space should be made. In the exponential parameterization of the density matrix used in our AO formulation of the response functions, the projector$^{23}$

$$\mathcal{P} = P \otimes Q^{T} + Q \otimes P^{T}, \qquad \mathcal{P}(X)_{\mu\nu} = \sum_{\rho\sigma} \mathcal{P}_{\mu\nu,\rho\sigma} X_{\rho\sigma} = \left(P X Q^{T} + Q X P^{T}\right)_{\mu\nu}, \tag{2.72}$$

where

$$P = DS, \qquad Q = 1 - DS, \tag{2.73}$$

projects onto the non-redundant parameter space. It can be shown that all new trial vectors $b$ and linear transformations $\sigma$ and $\rho$ should be projected onto the non-redundant space in the following manner:

$$b_{k+1} = \mathcal{P}(b_{k+1}), \qquad \sigma_{k+1} = \mathcal{P}^{T}(\sigma_{k+1}), \qquad \rho_{k+1} = \mathcal{P}^{T}(\rho_{k+1}). \tag{2.74}$$

When solving the response equations as described in the beginning of this section, the vectors<br />

projected as in Eq. (2.74) are used.<br />
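The projections of Eqs. (2.72)-(2.74) can be written as two small functions. The sketch assumes the reading of Eq. (2.72) with $P = DS$ and $Q = 1 - DS$, and it represents trial vectors and transformed vectors as square matrices. The names are hypothetical. Because $DSD = D$ makes $P$ and $Q$ idempotent and mutually annihilating ($PQ = QP = 0$), applying either projector twice gives the same result as applying it once.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def redundancy_projectors(D, S):
    """Return X -> P X Q^T + Q X P^T and its transpose X -> P^T X Q + Q^T X P,
    with P = DS and Q = 1 - DS as in Eqs. (2.72)-(2.73)."""
    n = len(D)
    P = matmul(D, S)
    Q = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    PT, QT = transpose(P), transpose(Q)

    def project(X):        # applied to new trial vectors b
        return madd(matmul(matmul(P, X), QT), matmul(matmul(Q, X), PT))

    def project_T(X):      # applied to sigma and rho, Eq. (2.74)
        return madd(matmul(matmul(PT, X), Q), matmul(matmul(QT, X), P))

    return project, project_T

# Orthonormal toy basis: two occupied orbitals followed by one virtual orbital.
S = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
D = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
proj, proj_T = redundancy_projectors(D, S)
print(proj([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # [[0, 0, 3], [0, 0, 6], [7, 8, 0]]
```

In an orthonormal basis with the occupied orbitals first, the projector keeps exactly the occupied-virtual and virtual-occupied blocks, as the printed result shows.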


2.4 The Excited State Gradient<br />

In this section the expression for the geometrical gradient of the singlet excited state is derived, to<br />

illustrate how expressions for properties can straightforwardly be derived in the AO response<br />

framework.<br />

As for the derivations in Section 2.2, we assume that the ground-state wave function is optimized at the point $x_0$ on the potential surface where the excited-state gradient is evaluated. The variational condition is thus fulfilled at that point,

$$FDS - SDF = 0, \tag{2.75}$$

and the ground-state energy at $x_0$ is obtained as

$$E_0 = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G(D) + h_{\mathrm{nuc}}, \tag{2.76}$$

where $h$ is the one-electron Hamiltonian matrix in the AO basis, $h_{\mathrm{nuc}}$ is the nuclear-nuclear repulsion, $G$ holds the two-electron AO integrals, and the Fock matrix is given by $F = h + G(D)$.

As mentioned previously, the excitation energy for the excitation from the ground state $0$ to the excited state $f$ can be found from the poles of the linear response function for the optimized ground state.$^{62}$ It is thus an eigenvalue of the linear response generalized eigenvalue equation, Eq. (2.45),

$$\left(E^{[2]} - \omega_f S^{[2]}\right) b^{f} = 0, \tag{2.77}$$

where

$$\omega_f = E_f - E_0 \tag{2.78}$$

is the electronic excitation energy and $b^{f}$ is the normalized eigenvector.$^{61,62}$ The excitation energy can then be obtained from Eq. (2.77) as

$$\omega_f = b^{f\dagger} E^{[2]} b^{f}, \tag{2.79}$$

assuming that the eigenvectors $b^{f}$ satisfy the normalization condition

$$b^{f\dagger} S^{[2]} b^{f} = 1. \tag{2.80}$$
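A two-dimensional toy problem illustrates Eqs. (2.79)-(2.80). Take $E^{[2]} = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix}$ and $S^{[2]} = \mathrm{diag}(1,-1)$. This is a paired problem with no $U$ block, and the numbers are arbitrary. The eigenvalues are $\omega = \pm\sqrt{\alpha^2 - \beta^2}$.

- Normalizing the positive-frequency eigenvector to $b^{f\dagger}S^{[2]}b^{f} = 1$ makes $b^{f\dagger}E^{[2]}b^{f}$ equal to $\omega_f$.
- Its paired vector has $S^{[2]}$-norm $-1$.

More generally, when $E^{[2]}$ is positive definite, $\omega = b^{\dagger}E^{[2]}b / b^{\dagger}S^{[2]}b$ has the sign of $b^{\dagger}S^{[2]}b$. The normalization in Eq. (2.80) can therefore only be imposed on the positive-frequency member of each pair. The variable names in the sketch are ours.

```python
import math

def quad(M, v):
    """v^T M v for a small dense matrix."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

alpha, beta = 2.0, 1.0                       # arbitrary toy values with alpha > |beta|
E2 = [[alpha, beta], [beta, alpha]]          # toy E[2]
S2 = [[1.0, 0.0], [0.0, -1.0]]               # toy S[2]

omega = math.sqrt(alpha ** 2 - beta ** 2)    # positive root of det(E2 - omega*S2) = 0
t = (omega - alpha) / beta                   # first row of (E2 - omega*S2) v = 0 gives y = t*x
x = 1.0 / math.sqrt(1.0 - t * t)             # enforce v^T S2 v = x^2 - y^2 = 1, Eq. (2.80)
v = [x, t * x]
v_paired = [v[1], v[0]]                      # eigenvector for -omega

print(quad(S2, v), quad(E2, v), omega)       # S-norm 1, and v^T E2 v equals omega, Eq. (2.79)
print(quad(S2, v_paired), quad(E2, v_paired))  # S-norm -1, while v^T E2 v is still omega
```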

Since we are interested in the molecular gradient for the excited state, f , the energy of the excited<br />

state should be defined at arbitrary points on the potential surface.<br />

2.4.1 Construction of the Lagrangian<br />

The analytic expression for the excited-state gradient is found using the Lagrangian technique.$^{65}$ We construct the Lagrangian for the excited-state energy $E_f = E_0 + \omega_f$ using a matrix-vector notation,

$$L^{f} = E_0 + b^{f\dagger} E^{[2]} b^{f} - \omega\left(b^{f\dagger} S^{[2]} b^{f} - 1\right) - X^{\dagger}\left(FDS - SDF\right). \tag{2.81}$$


The variational condition on the ground state, Eq. (2.75), and the normalization condition on the eigenvectors, Eq. (2.80), are included as constraints. They are multiplied by the Lagrange multipliers $X$ and $\omega$, respectively.

We then require the Lagrangian to be variational in all parameters:

$$\frac{\partial L^{f}}{\partial X} = SDF - FDS = 0, \tag{2.82}$$

$$\frac{\partial L^{f}}{\partial \omega} = b^{f\dagger} S^{[2]} b^{f} - 1 = 0, \tag{2.83}$$

$$\frac{\partial L^{f}}{\partial b^{f\dagger}} = E^{[2]} b^{f} - \omega S^{[2]} b^{f} = 0, \tag{2.84}$$

$$\frac{\partial L^{f}}{\partial b^{f}} = b^{f\dagger} E^{[2]} - \omega\, b^{f\dagger} S^{[2]} = 0, \tag{2.85}$$

$$\frac{\partial L^{f}}{\partial X_m} = \frac{\partial E_0}{\partial X_m} + \frac{\partial\left(b^{f\dagger} E^{[2]} b^{f}\right)}{\partial X_m} - \omega\,\frac{\partial\left(b^{f\dagger} S^{[2]} b^{f}\right)}{\partial X_m} - \sum_n X_n \frac{\partial\left(FDS - SDF\right)_n}{\partial X_m} = 0, \tag{2.86}$$

where $X_m$ are the orbital rotation parameters. Owing to the $2n+1$ rule, and since the gradient is a first-order property, we only need to solve the above equations to zeroth order. Eqs. (2.82)-(2.85) are therefore already satisfied, and the multiplier $\omega$ is determined as the eigenvalue of the linear response equations, i.e. it corresponds to the excitation energy. It is then only necessary to determine the Lagrange multipliers $X$ such that Eq. (2.86) is also fulfilled.

2.4.2 The Lagrange Multipliers<br />

To evaluate the terms in Eq. (2.86), the asymmetric Baker-Campbell-Hausdorff (BCH) expansion$^{46}$ of the exponentially parameterized density is applied,

$$D(X) = \exp(-XS)\, D\, \exp(SX) = D + [D, X]_S + \cdots, \tag{2.87}$$

where

$$[A, B]_S = ASB - BSA. \tag{2.88}$$
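The first-order content of Eq. (2.87) is easy to check numerically. The sketch below is plain Python with a truncated Taylor series for the matrix exponential. The 2×2 overlap, density, and $X$ are arbitrary toy values chosen so that $DSD = D$.

The sketch does two things:

- It compares a finite difference of $D(tX)$ with $[D, X]_S$.
- It confirms that the exponential parameterization keeps $D(X)\,S\,D(X) = D(X)$. This holds because $\exp(SX)\,S\,\exp(-XS) = S$.

Function names are our own.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lincomb(A, B, c=1.0):
    """A + c*B."""
    return [[a + c * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(A, c):
    return [[c * a for a in row] for row in A]

def expm(A, terms=30):
    """Matrix exponential from a truncated Taylor series (adequate for small ||A||)."""
    n = len(A)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in term]
    for k in range(1, terms):
        term = scale(matmul(term, A), 1.0 / k)   # A^k / k!
        result = lincomb(result, term)
    return result

def density(X, D, S):
    """D(X) = exp(-XS) D exp(SX), Eq. (2.87)."""
    return matmul(matmul(expm(scale(matmul(X, S), -1.0)), D), expm(matmul(S, X)))

def s_commutator(A, B, S):
    """[A, B]_S = ASB - BSA, Eq. (2.88)."""
    return lincomb(matmul(matmul(A, S), B), matmul(matmul(B, S), A), -1.0)

S = [[1.0, 0.5], [0.5, 1.0]]      # non-orthogonal toy overlap
D = [[1.0, 0.0], [0.0, 0.0]]      # satisfies D S D = D
X = [[0.1, 0.3], [-0.2, 0.4]]     # arbitrary rotation parameters
t = 1e-4
finite_diff = scale(lincomb(density(scale(X, t), D, S), D, -1.0), 1.0 / t)
first_order = s_commutator(D, X, S)
err = max(abs(a - b) for ra, rb in zip(finite_diff, first_order) for a, b in zip(ra, rb))
print(err)   # small, of order t: the first-order term of Eq. (2.87) is [D, X]_S
```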

Since the derivatives are evaluated at the expansion point, only terms of first order in $X$ are non-zero. The last term in Eq. (2.86) is found to be equal to$^{61}$

$$E^{[2]} X = F\,[X, D]_S\, S - S\,[X, D]_S\, F + G([X, D]_S)\, DS - SD\, G([X, D]_S). \tag{2.89}$$

We can thus find $X$ by solving the set of linear equations

$$E^{[2]} X = \frac{\partial E_0}{\partial X} + \frac{\partial\left(b^{f\dagger} E^{[2]} b^{f}\right)}{\partial X} - \omega\,\frac{\partial\left(b^{f\dagger} S^{[2]} b^{f}\right)}{\partial X}. \tag{2.90}$$

From the matrix expressions for $b^{f\dagger}E^{[2]}b^{f}$ and $b^{f\dagger}S^{[2]}b^{f}$,$^{61}$


$$b^{f\dagger} E^{[2]} b^{f} = -\mathrm{Tr}\, F\left[[b^{f}, D]_S, b^{f\dagger}\right]_S - \mathrm{Tr}\, G\!\left([b^{f}, D]_S\right)[D, b^{f\dagger}]_S, \tag{2.91}$$

$$b^{f\dagger} S^{[2]} b^{f} = \mathrm{Tr}\, b^{f\dagger} S\, [D, b^{f}]_S\, S, \tag{2.92}$$

and the relations for the two-electron integrals

$$G(A^{T}) = G(A)^{T}, \tag{2.93}$$

$$\mathrm{Tr}\, A\, G(B) = \mathrm{Tr}\, B\, G(A), \tag{2.94}$$

the terms on the right-hand side of Eq. (2.90) are found as

$$\frac{\partial E_0}{\partial X} = 0, \tag{2.95}$$

$$-\omega\,\frac{\partial\left(b^{f\dagger} S^{[2]} b^{f}\right)}{\partial X} = -2\omega\left[SDS\,[b^{f}, b^{f\dagger}]_S\, S\right]^{A}, \tag{2.96}$$

$$\frac{\partial\left(b^{f\dagger} E^{[2]} b^{f}\right)}{\partial X} = A\,DS - SD\,A, \tag{2.97}$$

where

$$A = \left(S b^{f\dagger} F b^{f} S - S b^{f\dagger} S b^{f} F\right) - \left(F b^{f} S b^{f\dagger} S - S b^{f} F b^{f\dagger} S\right) + G\!\left(\left[b^{f}, [D, b^{f\dagger}]_S\right]_S\right) + 2\left(S b^{f\dagger} G([b^{f}, D]_S) - G([b^{f}, D]_S)\, b^{f\dagger} S\right) \tag{2.98}$$

and

$$[M]^{A} = \tfrac{1}{2}M - \tfrac{1}{2}M^{\dagger}, \tag{2.99}$$

$$[M]^{S} = \tfrac{1}{2}M + \tfrac{1}{2}M^{\dagger}. \tag{2.100}$$

Eq. (2.95) is straightforward, since the variational condition, Eq. (2.75), is fulfilled at the expansion point.

2.4.3 The Geometrical Gradient<br />

The excited-state geometrical gradient should be expressed in terms of two sets of quantities:

- the first derivatives of the one- and two-electron integral matrices and the overlap matrix, $h^{x}$, $G^{x}$, and $S^{x}$;
- the density, Fock, and overlap matrices at the expansion point $x_0$.

The notation $A^{x}$ denotes the geometrical first derivative of $A$. In ref. 66 it was found that the first derivative of the density, $D^{x}(X)$, is given by the first derivative of the reference density matrix, $D^{x}$. From the idempotency condition for $D$, this is found to be

$$D^{x} = -D S^{x} D. \tag{2.101}$$
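As a brief consistency check (our addition, not taken from ref. 66), differentiate the idempotency condition $DSD = D$ with respect to $x$ and insert Eq. (2.101). Using $DSD = D$ in the second step reproduces $D^{x}$:

$$\begin{aligned}
D^{x}SD + DS^{x}D + DSD^{x} &= -DS^{x}(DSD) + DS^{x}D - (DSD)S^{x}D \\
&= -DS^{x}D + DS^{x}D - DS^{x}D = -DS^{x}D = D^{x}.
\end{aligned}$$

This only shows that Eq. (2.101) is consistent with the differentiated condition. It does not by itself fix $D^{x}$ uniquely.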

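The first-order geometrical derivative is given by

$$\frac{dE_f}{dx} = \frac{dL^{f}}{dx} = \frac{dE_0}{dx} + \frac{\partial\left(b^{f\dagger} E^{[2]} b^{f}\right)}{\partial x} - \omega\,\frac{\partial\left(b^{f\dagger} S^{[2]} b^{f}\right)}{\partial x} - X\,\frac{\partial\left(FDS - SDF\right)}{\partial x}. \tag{2.102}$$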


The first term is simply the geometrical gradient of the ground state. In ref. 66 this was shown to be

$$E_0^{x} = 2\,\mathrm{Tr}\,D h^{x} + \mathrm{Tr}\,D\,G^{x}(D) + \mathrm{Tr}\,D^{x} F + h^{x}_{\mathrm{nuc}}. \tag{2.103}$$

The other terms are found as the derivatives of the matrix expressions in Eqs. (2.91) and (2.92):

$$\begin{aligned}
\frac{\partial\left(b^{f\dagger} E^{[2]} b^{f}\right)}{\partial x} = {}& -\mathrm{Tr}\left(F^{x} + G(D^{x})\right)\left[[b^{f}, D]_S, b^{f\dagger}\right]_S - \mathrm{Tr}\, F\left[[b^{f}, D^{x}]_S, b^{f\dagger}\right]_S \\
& - \mathrm{Tr}\, F\left[[b^{f}, D]_{S^{x}}, b^{f\dagger}\right]_S - \mathrm{Tr}\, F\left[[b^{f}, D]_S, b^{f\dagger}\right]_{S^{x}} \\
& - \mathrm{Tr}\, G^{x}\!\left([b^{f}, D]_S\right)[D, b^{f\dagger}]_S - 2\,\mathrm{Tr}\, G\!\left([b^{f}, D]_S\right)\left([D^{x}, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^{x}}\right),
\end{aligned} \tag{2.104}$$

$$-\omega\,\frac{\partial\left(b^{f\dagger} S^{[2]} b^{f}\right)}{\partial x} = -\omega\,\mathrm{Tr}\, b^{f\dagger} S^{x} [D, b^{f}]_S\, S - \omega\,\mathrm{Tr}\, b^{f\dagger} S\left([D^{x}, b^{f}]_S\, S + [D, b^{f}]_{S^{x}}\, S + [D, b^{f}]_S\, S^{x}\right), \tag{2.105}$$

$$-X\,\frac{\partial\left(FDS - SDF\right)}{\partial x} = -2X\left[F^{x} DS + G(D^{x})\, DS + F D^{x} S + F D S^{x}\right]^{A}, \tag{2.106}$$

where $F^{x} = h^{x} + G^{x}(D)$. Collecting the various terms, we obtain

$$\begin{aligned}
\frac{\partial E_f}{\partial x} = {}& \mathrm{Tr}\left(2D - \left[[b^{f}, D]_S, b^{f\dagger}\right]_S - [D, X]_S\right) h^{x} - \mathrm{Tr}\,[D, b^{f\dagger}]_S\, G^{x}\!\left([b^{f}, D]_S\right) \\
& + \mathrm{Tr}\left(D - \left[[b^{f}, D]_S, b^{f\dagger}\right]_S - [D, X]_S\right) G^{x}(D) + h^{x}_{\mathrm{nuc}} \\
& - \mathrm{Tr}\, D^{x}\, G\!\left(\left[[b^{f}, D]_S, b^{f\dagger}\right]_S\right) - 2\,\mathrm{Tr}\left([D^{x}, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^{x}}\right) G\!\left([b^{f}, D]_S\right) \\
& - \mathrm{Tr}\, D^{x}\, G\!\left([D, X]_S\right) - \mathrm{Tr}\left([D^{x}, X]_S + [D, X]_{S^{x}}\right) F \\
& - \mathrm{Tr}\left(\left[[b^{f}, D^{x}]_S, b^{f\dagger}\right]_S + \left[[b^{f}, D]_{S^{x}}, b^{f\dagger}\right]_S + \left[[b^{f}, D]_S, b^{f\dagger}\right]_{S^{x}}\right) F \\
& + \omega_f\,\mathrm{Tr}\, b^{f\dagger} S\left([b^{f}, D^{x}]_S\, S + [b^{f}, D]_{S^{x}}\, S + [b^{f}, D]_S\, S^{x}\right) + \omega_f\,\mathrm{Tr}\, b^{f\dagger} S^{x} [b^{f}, D]_S\, S,
\end{aligned} \tag{2.107}$$

where $G\!\left(\left[[b^{f}, D]_S, b^{f\dagger}\right]_S\right)$, $G([D, X]_S)$, $G([b^{f}, D]_S)$, and $F$ can be evaluated once, whereas $G^{x}(D)$, $G^{x}([b^{f}, D]_S)$, $h^{x}$, and $h^{x}_{\mathrm{nuc}}$ have to be evaluated for each geometrical perturbation.

Note that no two-electron integrals appear explicitly. To obtain the best performance, e.g. in linear-scaling codes, no reference should be made to four-index integrals.

2.4.4 The First-order Excited State Properties<br />

The expression for the first-order one-electron excited-state properties for perturbation-independent basis sets is obtained from the expression for the excited-state gradient by omitting all two-electron derivative terms, as well as all terms involving the derivative of the overlap matrix:


$$\langle f|h^{x}|f\rangle = 2\,\mathrm{Tr}\,D h^{x} - \mathrm{Tr}\left(\left[[b^{f}, D]_S, b^{f\dagger}\right]_S + [D, X]_S\right) h^{x} + h^{x}_{\mathrm{nuc}}. \tag{2.108}$$

The first and last terms in Eq. (2.108) correspond to the ground-state first-order property, as seen from Eq. (2.103).

2.5 Test Calculations<br />

To illustrate the possibilities of an AO response solver in connection with our SCF optimization program, test calculations have been carried out on problematic cases from the first part of the thesis. The lowest excitation energy and the average polarizability have been computed for the zinc complex in Fig. 1.3 and the rhodium complex in Fig. 1.33. The polarizability is given both in the static limit and for a field of frequency ω = 0.03 a.u. The levels of theory chosen are those at which DIIS could not optimize the reference state:

- LDA/6-31G for the zinc complex;
- HF/AhlrichsVDZ, with STO-3G on the rhodium atom, for the rhodium complex.

Table 2-1 Ground state properties obtained with our AO response solver. All numbers are in a.u.

| System | Level of theory | Average polarizability, static | Average polarizability, ω = 0.03 | Excitation energy |
|---|---|---|---|---|
| Rhodium complex | HF/AhlrichsVDZ | 170.598 | 173.349 | 0.0938 |
| Zinc complex | LDA/6-31G | 161.406 | 162.517 | 0.0713 |

The basis sets applied in the test calculations are not adequate for serious polarizability calculations, and the numbers only demonstrate the potential of the AO response solver in combination with the SCF optimization algorithms described in Part 1. Once the solver is fully implemented in the AO basis, we will be able to obtain molecular properties of large, complex molecules routinely.

The implementation of the excited-state gradient is a work in progress. So far, we have implemented the calculation of first-order one-electron properties of the excited state for perturbation-independent basis sets, as described in Section 2.4.4. The excited-state dipole moment of the rhodium complex from above has been found to be

$\mu = 5.960$ a.u.

Again, it should be noted that the basis set is insufficient for this type of calculation. The result only demonstrates that the calculation can be done.


2.6 Conclusion<br />

The atomic orbital (AO) based response equations have been derived using the second-quantization framework, with particular attention to the proof of pairing. Since the diagonal elements in κ are not redundant in the AO basis, the proof given in the MO basis cannot be applied directly. It is nevertheless shown that pairing also holds in the AO basis.

An AO response solver has been implemented along the lines of the solver in the MO basis, with a few exceptions. The lack of diagonal dominance in the electronic Hessian in the AO basis makes preconditioning a difficult task. Optimally, the AO solver should be implemented in a linear-scaling manner with only matrix multiplications and additions, and without reference to the MO basis. Currently, however, a transformation is made to the MO basis, where the preconditioning is carried out, followed by a transformation back to the AO basis. The redundant orbital rotations, which are simply left out of the MO equations, are removed in the AO formulation using projection operators.

The response equations and molecular property expressions are simpler in the AO formulation than in the MO formulation. To demonstrate how easily expressions for properties can be derived in the AO response framework, the expression for the geometrical gradient of the singlet excited state has been derived.

Test calculations illustrate the possibilities of the AO optimization methods presented in Part 1 combined with the AO response solver presented in this part of the thesis. They are given for cases where DIIS diverged when optimizing the reference state. The average polarizability and the lowest excitation energy are reported, as well as the excited-state dipole moment for one of the examples.

The derivation and implementation of the various molecular properties are more straightforward in the AO formulation than in the MO formulation, as exemplified by the excited-state geometrical gradient. In particular, the derivation of higher derivatives of molecular properties is simplified, and it will thus be natural to extend our response program in this direction. However, the problems related to preconditioning of the AO solver must be solved before calculations of molecular properties of large and complex molecules can be carried out in a truly linear-scaling framework.



Part 3<br />

Benchmarking for Radicals<br />

3.1 Introduction<br />

To corroborate the reliability of ab initio quantum chemical predictions of molecular properties, it is<br />

important to investigate and describe strengths and weaknesses of the many-electron models<br />

through systematic benchmark studies on different kinds of molecules.<br />

For open-shell molecules, benchmarks have been reported that compare the accuracy of molecular properties computed by various many-electron models for open- and closed-shell molecules. In a study of the atomization energies of 11 small molecules,$^{67}$ no significant difference was found in the performance of the CCSDT model for closed- and open-shell molecules. However, another study$^{68}$ found that even though the CCSD(T) model performs convincingly for closed-shell molecules, its performance for open-shell molecules is less impressive.

In this part of the thesis full configuration interaction (FCI) benchmarks of molecular properties for<br />

the small open-shell molecules CN and CCH are presented. In the FCI model, all Slater<br />

determinants arising from distributing the electrons in the given one-electron basis with correct<br />

symmetry and spin-projection are included. Errors due to truncation of the many-electron basis are<br />

thus eliminated in an FCI calculation and it provides important benchmarks for other many-electron<br />

models. For open-shell molecules, the number of FCI benchmarks is limited and the work presented<br />

in this part of the thesis is an attempt to improve on this situation. We thus hope our results will<br />

serve as valuable benchmarks for further analysis of open-shell methods.<br />

3.2 Computational Methods<br />

All calculations have been carried out with the quantum chemical program package LUCIA,$^{69}$ using integrals and Hartree-Fock (HF) orbitals obtained from the DALTON$^{70}$ program. The calculations are based on an ROHF reference wave function, but no spin adaptation is imposed in the CI and CC calculations.

All FCI calculations have been carried out in Dunning's cc-pVDZ$^{71}$ basis set. Since the number of determinants in the FCI model increases exponentially with the number of basis functions and electrons, it is currently not feasible to carry out FCI calculations on CN and CCH in the cc-pVTZ basis.

Additional calculations complement the FCI results:

- The cc-pVDZ basis does not provide accurate geometries and energetics.$^{46}$ We therefore also obtain the equilibrium geometry, harmonic frequency, and dissociation energy of CN in the cc-pVTZ$^{71}$ basis set from coupled cluster calculations including up to quadruple excitations.
- FCI and CC calculations up to the quadruples level have been carried out on CN and CN$^-$ in the aug-cc-pVDZ basis set without the diffuse d-functions (aug´-cc-pVDZ) to obtain the vertical electron affinity of CN.
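To give a rough sense of the scale behind this statement, the sketch below counts Slater determinants with fixed $M_S$ for CN as the product of the numbers of $\alpha$ and $\beta$ strings. The assumptions are ours and only indicative:

- All 13 electrons are correlated (7 $\alpha$ and 6 $\beta$ for the doublet).
- Spherical basis functions are used, giving 14 functions per atom in cc-pVDZ and 30 in cc-pVTZ.
- No point-group symmetry is used to reduce the problem.

The counts are therefore not those of the actual LUCIA calculations.

```python
from math import comb

def n_determinants(n_orb, n_alpha, n_beta):
    """Number of Slater determinants with fixed M_S (no spatial symmetry)."""
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)

# Assumptions (ours): all 13 electrons of CN correlated (7 alpha, 6 beta),
# spherical basis functions, no point-group blocking.
N_DZ = 2 * 14   # cc-pVDZ [3s2p1d] on C and on N
N_TZ = 2 * 30   # cc-pVTZ [4s3p2d1f] on C and on N
dz = n_determinants(N_DZ, 7, 6)
tz = n_determinants(N_TZ, 7, 6)
print(f"cc-pVDZ: {dz:.2e}, cc-pVTZ: {tz:.2e}, ratio: {tz / dz:.1e}")
```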

We investigate two ways of defining the excitation-level in CC. The typical approach is to let the<br />

excitation level identify the allowed number of orbital excitations, denoted CC(orb). If instead the<br />

excitation level is taken to identify the spin-orbital excitation level, selected excitations, which<br />

involve spin-flipping and other internal excitations, are excluded from the calculation for open-shell<br />

molecules. This scheme will be referred to as CC(spin-orb). The difference between the two<br />

definitions of the excitation level is illustrated in Fig. 3.1. The CI calculations will all be carried out<br />

with orbital excitations.<br />

[Fig. 3.1 panel labels: "Double orbital excitation"; "Triple spin-orbital excitation"]

Fig. 3.1 An excitation which would be included in a CCSD(orb) calculation, but not in a CCSD(spin-orb) calculation.

In the following, SD, SDT, SDTQ, SDTQ5, SDTQ56, and SDTQ567 denote excitation spaces that include up to 2, 3, 4, 5, 6, and 7 excitations, respectively, from the occupied spin-orbitals.


3.3 Numerical Results<br />

The section is organized as follows:

- First, the convergence of the CC and CI hierarchies for the open-shell molecule CN is studied.
- Next, the potential curve for CN is obtained from CCSD, CCSDT, CCSDTQ, and FCI calculations at various internuclear distances.
- Section 3.3.3 presents the equilibrium geometries, harmonic frequencies, and dissociation energies obtained for CN.
- Section 3.3.4 determines the vertical electron affinity of CN.
- Finally, Section 3.3.5 presents a minor benchmark study in which the equilibrium geometry of the interstellar radical CCH is determined at the FCI level.

3.3.1 Convergence of CC and CI Hierarchies<br />

This section studies the convergence of the CC and CI hierarchies. For CN, calculations have been carried out at the experimental equilibrium distance$^{72}$ $r_{\mathrm{exp}} = 1.1718$ Å at the levels CCSD through CCSDTQ56, using both the orbital-excitation and the spin-orbital-excitation approaches. In addition, calculations have been carried out at the levels CISD through CISDTQ567 and at the FCI level. In all calculations the cc-pVDZ basis set is used. The results are shown in Fig. 3.2.

[Fig. 3.2 plot: $E_{\mathrm{dev}}/E_h$ (logarithmic axis, $10^{-6}$ to $10^{-1}$) versus excitation level (SD, SDT, SDTQ, SDTQ5, SDTQ56, SDTQ567) for CI, CC(spin-orb), and CC(orb)]

Fig. 3.2 $E_{\mathrm{dev}}$ for CC with spin-orbital and orbital excitation levels and for CI with orbital excitation levels. $E_{\mathrm{dev}} = E - E_{\mathrm{FCI}}$.

The first thing to note is the similarity of the two CC curves. Clearly, the spin-orbital excitation restriction does not affect the accuracy significantly: the deviations are smaller for CC(orb) in all cases, but the difference is negligible.

Comparing the CI curve with the CC curves, two trends are obvious: the CC hierarchy converges both more smoothly and faster than the CI hierarchy. The CC energy obtained using up to n-fold excitations is roughly as accurate as the CI energy using up to (n+1)-fold excitations. Both phenomena are explained by the inclusion of disconnected clusters in the CC wave function. At a given excitation level, the CC wave function includes all the configurations of the CI wave function at the same level, plus some higher excitations arising from disconnected clusters. Consequently, it describes dynamical correlation better than CI and is thus closer to the FCI solution at a given level.

The form of the curves can be predicted by describing the convergence of the CI and CC hierarchies in terms of orders of Møller-Plesset perturbation theory (MPPT).$^{73}$ Because disconnected products of excitations are also included in the CC ansatz, the MPPT order of the CC error increases at every step of the hierarchy:

- Going from an odd to an even excitation level, the order of the error in the energy increases by two orders of MPPT for both methods, so the curves are parallel.
- Going from an even to an odd excitation level, the CC error increases by one order, whereas the order of the CI error remains unchanged, giving a greater slope for the CC curve.

This explains the parallel behavior going from odd to even excitation levels and the smoother convergence of the CC hierarchy compared to the CI hierarchy. However, the stepwise convergence predicted by MPPT, which should be pronounced for CI and noticeable for CC, is not apparent. The reason could be that CN is not strictly mono-configurational.

The convergence patterns for CI and CC are very similar to those previously reported for N$_2$. 74 The open-shell nature of CN therefore does not appear to slow the convergence of the CI and CC hierarchies relative to closed-shell cases.<br />

3.3.2 The Potential Curve for CN<br />

The potential curve for CN was determined from single-point calculations at the FCI level in the cc-pVDZ basis set. Close to equilibrium, the energies were converged to $10^{-9}\,E_\mathrm{h}$, making it possible to determine accurate spectroscopic constants. The result is displayed in Fig. 3.3.<br />

[Figure: $E_\mathrm{FCI}$ / $E_\mathrm{h}$ versus $R$ / Å]<br />

Fig. 3.3 The potential curve for CN obtained from FCI/cc-pVDZ calculations.<br />

[Figure: $E_\mathrm{dev}$ / $E_\mathrm{h}$ versus $R$ / Å for CCSD, CCSDT and CCSDTQ]<br />

Fig. 3.4 $E_\mathrm{dev}$ for the CC potential curves, $E_\mathrm{dev}(R) = E(R) - E_\mathrm{FCI}(R)$.<br />

The potential curve was also computed with the CCSD(orb), CCSDT(orb) and CCSDTQ(orb) methods in the cc-pVDZ basis set. Since the weight of the reference HF determinant decreases as the internuclear distance increases, we examined the HF coefficients from the FCI calculations and found that single-reference CC calculations are not meaningful beyond $R = 1.8$ Å, where the weight of the reference has already dropped to 0.57. Fig. 3.4 displays the deviations of the CC potential curves from the FCI curve: at each internuclear distance, the FCI energy has been subtracted from the CC energy.<br />

The decreasing weight of the reference determinant with increasing internuclear distance is reflected in the quality of the CC wave functions. Electron correlation in the wave function partially compensates for the lack of a single dominant configuration, and the higher the excitation level, the better the compensation. This is illustrated by the slopes of the curves in Fig. 3.4, which decrease with increasing excitation level. Furthermore, the deviation energy is nearly linear in $R$, with a slightly positive curvature around the equilibrium geometry.<br />

3.3.3 Spectroscopic Constants and Atomization Energy for CN<br />

The equilibrium geometry and harmonic frequency for CN were found from single-point<br />

calculations using quartic interpolation. The atomization energy was found at the experimental<br />

equilibrium distance. The results are displayed in Table 3-1.<br />

Table 3-1 Equilibrium geometry, harmonic frequency, and atomization energy for CN.<br />

Method                 Basis      $R_\mathrm{eq}$ / Å   $\omega_e$ / cm$^{-1}$   $D_e$ / kJ mol$^{-1}$<br />
CCSD(spin-orb)         cc-pVDZ    1.1855    2114    629.2<br />
CCSD(orb)              cc-pVDZ    1.1860    2111    631.6<br />
CCSDT(spin-orb)        cc-pVDZ    1.1944    2046    662.9<br />
CCSDT(orb)             cc-pVDZ    1.1946    2043    663.0<br />
CCSDTQ(spin-orb)       cc-pVDZ    1.1964    2026    666.4<br />
CCSDTQ(orb)            cc-pVDZ    1.1964    2025    666.5<br />
FCI                    cc-pVDZ    1.1969    2020    667.0<br />
CCSD(spin-orb)         cc-pVTZ    1.1688    2136    674.2<br />
CCSDT(spin-orb)        cc-pVTZ    1.1783    2067    714.4<br />
CCSDTQ(spin-orb)       cc-pVTZ    1.1804    2045    718.5<br />
Experimental (ref. 72) ---        1.1718    2069    ---<br />
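The thesis does not spell out the interpolation formulas. The sketch below shows one way to extract $R_\mathrm{eq}$ and $\omega_e$ from single-point energies, assuming a least-squares quartic in $R$ through at least five points, a Newton refinement of $dE/dR = 0$, and the harmonic relation $\omega_e = (2\pi c)^{-1}\sqrt{k/\mu}$. The function name, the atomic masses (12.0 u for C, 14.003074 u for N) and the synthetic test curve are illustrative assumptions, not data from Table 3-1.<br />

```python
import numpy as np

HARTREE_J = 4.3597447222071e-18   # J per hartree (CODATA 2018)
AMU_KG = 1.66053906660e-27        # kg per unified atomic mass unit (CODATA 2018)
ANGSTROM_M = 1.0e-10              # m per Angstrom
C_CM_S = 2.99792458e10            # speed of light in cm/s


def spectroscopic_constants(r, e, m1, m2):
    """Quartic interpolation of E(R); returns (R_eq in Angstrom, omega_e in cm^-1).

    r: bond lengths in Angstrom (at least five points), e: energies in hartree,
    m1, m2: atomic masses in u.
    """
    r = np.asarray(r, dtype=float)
    e = np.asarray(e, dtype=float)
    r0 = r.mean()
    # Fit in shifted variables so the least-squares problem stays well conditioned.
    quartic = np.poly1d(np.polyfit(r - r0, e - e.min(), 4))
    grad, curv = quartic.deriv(1), quartic.deriv(2)
    # Newton iterations on dE/dR = 0, started from the lowest sampled point.
    x = r[np.argmin(e)] - r0
    for _ in range(50):
        step = grad(x) / curv(x)
        x -= step
        if abs(step) < 1e-12:
            break
    k = curv(x) * HARTREE_J / ANGSTROM_M**2          # force constant in N/m
    mu = m1 * m2 / (m1 + m2) * AMU_KG                # reduced mass in kg
    omega_e = np.sqrt(k / mu) / (2.0 * np.pi * C_CM_S)
    return x + r0, omega_e


if __name__ == "__main__":
    # Synthetic harmonic curve (not thesis data) with R_e = 1.17 A and omega_e = 2000 cm^-1.
    m_c, m_n = 12.0, 14.003074
    mu = m_c * m_n / (m_c + m_n) * AMU_KG
    k_au = mu * (2.0 * np.pi * C_CM_S * 2000.0) ** 2 * ANGSTROM_M**2 / HARTREE_J
    r = 1.17 + 0.01 * np.arange(-3, 4)
    e = -92.4 + 0.5 * k_au * (r - 1.17) ** 2
    print(spectroscopic_constants(r, e, m_c, m_n))
```

Fitting in shifted variables ($R - \bar{R}$ and $E - E_\mathrm{min}$) keeps the least-squares problem well conditioned, which matters when sub-m$E_\mathrm{h}$ energy differences sit on top of a total energy of about $-92\,E_\mathrm{h}$.<br />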

As mentioned in Section 3.2, FCI calculations are not feasible in the cc-pVTZ basis. Still, the convergence of the CC hierarchy can be estimated from the changes in the constants between successive levels. Since the difference in accuracy between the CC(orb) and CC(spin-orb) models is negligible compared with their deviation from FCI, only the CC(spin-orb) results are discussed from now on, and only CC(spin-orb) calculations were carried out in the cc-pVTZ basis.<br />

The deviation curves for the coupled cluster energies (see Fig. 3.4) are increasing functions of $R$; the coupled cluster equilibrium bond lengths are therefore shorter than the FCI bond length. Furthermore, the positive curvature of the deviation curves around the equilibrium makes the coupled cluster harmonic frequencies higher than the FCI frequency.<br />
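A first-order Taylor expansion around the FCI minimum makes this argument explicit (a standard sketch, not reproduced from the thesis). Writing $E_\mathrm{CC}(R) = E_\mathrm{FCI}(R) + E_\mathrm{dev}(R)$ and $k_\mathrm{FCI} = E''_\mathrm{FCI}(R^\mathrm{FCI}_\mathrm{eq})$, the condition $E'_\mathrm{CC}(R^\mathrm{CC}_\mathrm{eq}) = 0$ gives<br />

$$R_\mathrm{eq}^\mathrm{CC} - R_\mathrm{eq}^\mathrm{FCI} \approx -\frac{E_\mathrm{dev}'(R_\mathrm{eq}^\mathrm{FCI})}{k_\mathrm{FCI} + E_\mathrm{dev}''(R_\mathrm{eq}^\mathrm{FCI})}, \qquad k_\mathrm{CC} \approx k_\mathrm{FCI} + E_\mathrm{dev}''(R_\mathrm{eq}^\mathrm{FCI}), \qquad \omega_e = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}}.$$

An increasing deviation curve ($E'_\mathrm{dev} > 0$) therefore shortens the bond, and a positive curvature ($E''_\mathrm{dev} > 0$) raises $k$ and hence $\omega_e$. The expression for $k_\mathrm{CC}$ neglects terms proportional to the small bond-length shift; the largest of these, $E'''_\mathrm{FCI}\,(R^\mathrm{CC}_\mathrm{eq} - R^\mathrm{FCI}_\mathrm{eq})$, is positive for a Morse-like curve ($E'''_\mathrm{FCI} < 0$ combined with a negative shift), so it reinforces the same trend.<br />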


As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies; the cc-pVTZ values are clearly closer to the experimental data than the cc-pVDZ values.<br />

CCSD is insufficient for predicting equilibrium properties: it differs from the FCI values by about 0.01 Å in the bond length, 90 cm$^{-1}$ in the harmonic frequency, and 35 kJ/mol in the atomization energy. The errors in $R_\mathrm{eq}$ and $\omega_e$ are reduced by a factor of about four going to the CCSDT level and about five going from the CCSDT to the CCSDTQ level. The error in the atomization energy is reduced by a factor of about nine going to the CCSDT level and about eight going from the CCSDT to the CCSDTQ level. However, while the CCSDTQ equilibrium bond length is only 0.0005 Å from the FCI value, the CCSDTQ harmonic frequency is still about 5 cm$^{-1}$ too high.<br />

Compared with experiment, both the equilibrium geometry and the harmonic frequency are apparently better reproduced by CCSDT than by CCSDTQ. This is due to a favorable cancellation of errors in CCSDT calculations with small basis sets. By extrapolation to the larger aug-cc-pVQZ basis, 67,75 we obtain an equilibrium distance of 1.1759 Å and a harmonic frequency of 2060 cm$^{-1}$ at the CCSDTQ level.<br />

3.3.4 The Vertical Electron Affinity of CN<br />

Calculations on CN$^-$ and CN were carried out in the aug′-cc-pVDZ basis at the experimental equilibrium geometry of CN. The FCI calculation on CN$^-$ is one of the largest FCI calculations carried out so far, comprising about 20 billion Slater determinants. The resulting vertical electron affinities (EA) are displayed in Table 3-2. Again, only CC(spin-orb) calculations were carried out because of the small difference in performance between CC(spin-orb) and CC(orb).<br />

Table 3-2 The vertical electron affinity of CN.<br />

Method              Basis          EA / $E_\mathrm{h}$   $\mathrm{EA} - \mathrm{EA}_\mathrm{FCI}$ / $E_\mathrm{h}$<br />
CCSD(spin-orb)      aug′-cc-pVDZ   0.13025   0.00063<br />
CCSDT(spin-orb)     aug′-cc-pVDZ   0.12977   0.00014<br />
CCSDTQ(spin-orb)    aug′-cc-pVDZ   0.12966   0.00003<br />
FCI                 aug′-cc-pVDZ   0.12962   ---<br />

The convergence is remarkable: already at the CCSD level the error is only 0.5% of the FCI value, at the CCSDT level it is 0.1%, and at the CCSDTQ level 0.02%. This excellent convergence is due to a cancellation of errors. The deviations of the individual energies are always roughly an order of magnitude larger than the deviation of the affinity, 75 but the errors cancel when the CN and CN$^-$ energies are subtracted. It is also noteworthy that the convergence is from above. This is because the CC hierarchy converges faster for CN$^-$ than for CN: the CC error is smaller for CN$^-$, so $\mathrm{EA} = E(\mathrm{CN}) - E(\mathrm{CN}^-)$ is overestimated. This seems surprising since CN$^-$ contains one more electron than CN, but it could be explained by CN$^-$ being more strongly dominated by a single configuration than CN.<br />
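The relative errors quoted above follow directly from Table 3-2. A small sketch; it uses the tabulated deviations rather than differences of the rounded EA values, which can disagree in the last digit (for CCSDT, $0.12977 - 0.12962 = 0.00015$, whereas the table lists 0.00014).<br />

```python
# Table 3-2 (aug'-cc-pVDZ, E_h): FCI electron affinity and tabulated deviations EA - EA_FCI
EA_FCI = 0.12962
DEVIATION = {"CCSD": 0.00063, "CCSDT": 0.00014, "CCSDTQ": 0.00003}


def relative_error_percent(dev, reference=EA_FCI):
    """Deviation from FCI as a percentage of the FCI electron affinity."""
    return 100.0 * dev / reference


for model, dev in DEVIATION.items():
    # dev > 0 means EA_CC > EA_FCI, i.e. the hierarchy approaches FCI from above.
    print(f"{model:<7} {1e3 * dev:.2f} mE_h  {relative_error_percent(dev):.3f} %  from above: {dev > 0}")
```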

3.3.5 The Equilibrium Geometry of CCH<br />

The equilibrium geometry of CCH obtained from FCI/cc-pVDZ calculations is used in ref. 76 to calibrate coupled cluster calculations in larger basis sets, assuming that the FCI correction is independent of the basis set.<br />

To optimize the two variables $R$(CC) and $R$(CH), the CCH radical is assumed to be linear. Starting from an initial geometry, the CC and CH bonds are distorted in steps of $\delta = 0.01$ Å, giving a grid of single-point calculations around the equilibrium geometry with $R$(CC) on one axis and $R$(CH) on the other. The initial geometry is taken from a CCSDT/cc-pVDZ study, 76 namely $R^{\mathrm{CCSDT}}(\mathrm{CC}) = 1.23448$ Å and $R^{\mathrm{CCSDT}}(\mathrm{CH}) = 1.07924$ Å. The resulting potential energy surface is shown in Fig. 3.5.<br />

[Figure: $E_\mathrm{FCI}$ / $E_\mathrm{h}$ as a function of $R$(C-C) / Å and $R$(C-H) / Å]<br />

Fig. 3.5 The potential energy surface of CCH.<br />

Using finite-difference expressions with errors of order $\delta^4$, the gradient and Hessian are computed at the initial geometry, and a Newton step is taken to obtain an improved estimate of the equilibrium geometry. The FCI equilibrium geometry is thus found as<br />

$$R^{\mathrm{FCI}} = R^{\mathrm{CCSDT}} - H^{-1}G, \qquad (3.1)$$

where $G$ is the gradient, $H$ the Hessian, and $R^{\mathrm{CCSDT}}$ the CCSDT geometry.<br />
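A minimal sketch of this procedure, assuming the 5×5 grid of Fig. 3.5 (offsets of $-2\delta, \dots, 2\delta$ in each bond length) and standard fourth-order central-difference stencils; the thesis does not state which finite-difference formulas were used. The mixed derivative is taken as the tensor product of two first-derivative stencils, which is also accurate to $O(\delta^4)$. The names `fd_gradient_hessian` and `newton_step` are hypothetical, and `energy` stands in for a single-point FCI calculation.<br />

```python
import numpy as np

# Fourth-order central-difference weights for the offsets -2, -1, 0, +1, +2 (units of delta)
W1 = np.array([1.0, -8.0, 0.0, 8.0, -1.0]) / 12.0       # first derivative
W2 = np.array([-1.0, 16.0, -30.0, 16.0, -1.0]) / 12.0   # second derivative


def fd_gradient_hessian(grid, delta):
    """Gradient and Hessian at the centre of a 5x5 energy grid.

    grid[i, j] = E(x0 + (i - 2) * delta, y0 + (j - 2) * delta); every stencil has an O(delta**4) error.
    """
    g = np.array([W1 @ grid[:, 2], W1 @ grid[2, :]]) / delta
    hxx = W2 @ grid[:, 2] / delta**2
    hyy = W2 @ grid[2, :] / delta**2
    hxy = W1 @ grid @ W1 / delta**2  # tensor product of two first-derivative stencils
    return g, np.array([[hxx, hxy], [hxy, hyy]])


def newton_step(r0, energy, delta=0.01):
    """One Newton step R = R0 - H^-1 G as in Eq. (3.1); energy(r_cc, r_ch) is a single-point stand-in."""
    offsets = delta * np.arange(-2, 3)
    grid = np.array([[energy(r0[0] + dx, r0[1] + dy) for dy in offsets] for dx in offsets])
    g, h = fd_gradient_hessian(grid, delta)
    return r0 - np.linalg.solve(h, g), g, h
```

For example, `newton_step(np.array([1.23448, 1.07924]), energy)` would take the step from the CCSDT starting geometry. Because all stencils are exact for polynomials up to fourth degree, a purely quadratic test surface is recovered to rounding error, which is a convenient sanity check.<br />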

The equilibrium geometry at the FCI level is found to be<br />

$R^{\mathrm{FCI}}(\mathrm{CC}) = 1.2367$ Å and $R^{\mathrm{FCI}}(\mathrm{CH}) = 1.0802$ Å.<br />

The error in the resulting geometry is the sum of the error from the finite-difference approximations and the error from the Newton step. The gradient and Hessian carry an error of $O(\delta^4)$; with $\delta = 0.01$ Å, this corresponds to an error of the order of $10^{-8}$ Å. The Newton step has an error of $O\big((H^{-1}G)^2\big)$; here $H^{-1}G$ is of the order of $10^{-3}$ Å, so this error is of the order of $10^{-6}$ Å. The total error is thus of the order of $10^{-6}$ Å.<br />
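The $O\big((H^{-1}G)^2\big)$ estimate for the Newton step can be checked on a one-dimensional model. The sketch below uses a Morse curve $V(x) = D_e(1 - e^{-ax})^2$ with its minimum at $x = 0$ as a stand-in for the true surface; the range parameter $a = 2.5$ Å$^{-1}$ is an assumed value of roughly the size expected for a diatomic such as CN. For a Morse curve, the residual after one Newton step from $x_0$ is approximately $1.5\,a\,x_0^2$, so a step of $10^{-3}$ Å leaves an error of a few times $10^{-6}$ Å, consistent with the estimate above.<br />

```python
import math


def newton_residual_morse(x0, a=2.5):
    """Distance to the minimum after one Newton step x1 = x0 - V'(x0)/V''(x0).

    V(x) = D_e * (1 - exp(-a*x))**2 has its minimum at x = 0; D_e cancels in V'/V''.
    a (1/Angstrom) is an assumed range parameter, x0 is the initial offset in Angstrom.
    """
    e = math.exp(-a * x0)
    one_minus_e = -math.expm1(-a * x0)          # 1 - exp(-a*x0) without cancellation
    v1 = 2.0 * a * e * one_minus_e              # V'(x0) / D_e
    v2 = 2.0 * a * a * e * (2.0 * e - 1.0)      # V''(x0) / D_e
    return abs(x0 - v1 / v2)


for x0 in (1e-2, 1e-3, 1e-4):
    res = newton_residual_morse(x0)
    # Leading-order prediction for a Morse curve: |x1| = 1.5 * a * x0**2
    print(f"x0 = {x0:.0e} A: residual = {res:.2e} A, ratio to 1.5*a*x0^2 = {res / (1.5 * 2.5 * x0**2):.3f}")
```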

The gradient at the FCI equilibrium geometry was determined in the same way, from single-point calculations at the FCI geometry and at geometries displaced from it in steps of 0.01 Å, using the same finite-difference expressions. The gradient is found to be<br />

$$G^{\mathrm{FCI}} = \left[\,-1.8593\cdot 10^{-5}\ \frac{E_\mathrm{h}}{\text{Å}}\ ;\ 3.0661\cdot 10^{-5}\ \frac{E_\mathrm{h}}{\text{Å}}\,\right], \qquad (3.2)$$

thus verifying the correctness of the FCI geometry.<br />

Since the CCSDT geometry is $R^{\mathrm{CCSDT}}(\mathrm{CC}) = 1.23448$ Å and $R^{\mathrm{CCSDT}}(\mathrm{CH}) = 1.07924$ Å, the error due to the truncation of the many-electron basis in CCSDT is of the order of $10^{-3}$ Å (0.0022 Å for $R$(CC) and 0.0010 Å for $R$(CH)). This is similar to the results obtained for CN, and it suggests that the quadruples correction to the equilibrium geometry is of the order of 0.001-0.002 Å.<br />

3.4 Conclusion<br />

Full configuration interaction (FCI) and coupled cluster (CC) calculations have been carried out on CN in the cc-pVDZ and aug′-cc-pVDZ basis sets, supplemented by CC calculations in the cc-pVTZ basis set. The equilibrium bond distance, harmonic frequency, atomization energy, and vertical electron affinity have been evaluated at the various levels of theory.<br />

As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies, and CCSD is insufficient for predicting equilibrium properties. Compared with experiment, CCSDT apparently gives a better equilibrium geometry and harmonic frequency than CCSDTQ. This is due to a favorable cancellation of errors in CCSDT calculations with small basis sets. The vertical electron affinities also benefit from a cancellation of errors, and already at the CCSD level the error relative to the FCI value is less than 1 m$E_\mathrm{h}$.<br />

The convergence patterns of the CI and CC hierarchies were studied for CN and found to be similar to those previously reported for N$_2$. 74 Thus, the open-shell nature of CN does not appear to slow the convergence of the CI and CC hierarchies relative to closed-shell cases.<br />


For a number of the CC calculations, the excitation levels have been defined by spin-orbital excitations instead of orbital excitations. Certain internal excitations are thereby omitted, but this does not affect the accuracy in any significant way. For a given excitation level, the energies obtained in the orbital formalism are in all cases closer to the FCI energy than those obtained in the spin-orbital formalism, but the difference is negligible.<br />

The equilibrium geometry of CCH has been determined at the FCI level in the cc-pVDZ basis set: $R^{\mathrm{FCI}}(\mathrm{CC}) = 1.2367$ Å and $R^{\mathrm{FCI}}(\mathrm{CH}) = 1.0802$ Å. The correction to the initial CCSDT geometry is of the order of $10^{-3}$ Å, the same order as the FCI correction to the CCSDT equilibrium geometry of CN.<br />



Summary<br />

Developments in computer hardware and linear-scaling algorithms over the last decade have made it possible to carry out ab initio quantum chemical calculations on biomolecules with hundreds of amino acids and on large molecules relevant for nanoscience. Quantum chemical calculations are thus becoming a widespread tool in several scientific disciplines. It is therefore important that the algorithms work as black boxes, so that users outside quantum chemistry need not be concerned with the details of the calculations. In particular, Hartree-Fock (HF) and density functional theory (DFT) methods are employed for calculations on large systems, as they offer a good compromise between relatively low computational cost and reasonable accuracy. HF and DFT have been a fundamental part of quantum chemistry for many years, and increasing computer resources make calculations on molecules of ever increasing size and complexity possible. The conventional algorithms for optimizing the one-electron density in HF and DFT are therefore continually tested for stability and general performance, and occasionally they break down. In these cases the calculation either takes unacceptably long to complete or yields no result at all.<br />

We have improved on this situation. The first part of this thesis presents algorithms that significantly improve the optimization in HF and DFT. The optimization has become more efficient, and cases in which the optimization broke down with conventional algorithms now converge without problems. Furthermore, the presented algorithms have no problem-specific parameters and can thus be used as black boxes.<br />

Once the one-electron density has been optimized, molecular properties such as polarizabilities and excitation energies can be calculated, often by means of response theory. The second part of this thesis presents an atomic orbital (AO) based formulation of response theory that allows linear-scaling calculations of molecular properties. Furthermore, expressions for molecular properties are simpler to derive in the AO formulation than in the commonly used molecular orbital formulation. To illustrate the benefits, the expression for the geometrical gradient of an excited state is derived in the AO formulation.<br />

To confirm the reliability of quantum chemical predictions of molecular properties, it is important to investigate and describe the strengths and weaknesses of the quantum chemical models employed. The full configuration interaction (FCI) model is exact within a given basis set of atomic orbitals, so it is of great value to be able to compare results from approximate models with FCI results. The third part of this thesis presents FCI results for two open-shell molecules, CN and CCH. These are compared with results from the approximate models used today for calculations that require an accuracy comparable to experiment.<br />



Danish Summary (Dansk Resumé)<br />

Developments over the past decade in computer hardware and linear-scaling algorithms have made it possible to carry out ab initio quantum chemical calculations on biomolecules with hundreds of amino acids and on large molecules relevant to nanotechnology. Quantum chemical calculations are therefore developing into a widely used tool in several branches of natural science. It is therefore important that the algorithms function as so-called black boxes, so that users outside quantum chemistry need not worry about the details of the calculation. In particular, the Hartree-Fock (HF) and density functional theory (DFT) methods are used for calculations on large systems, since they represent a good compromise between reasonable accuracy of the results and relatively short computation time. HF and DFT are methods that have been used in quantum chemistry for many years, and as ever greater computer resources become available, they are used for calculations on ever larger and more complex molecules. The algorithms used today for optimizing the one-electron density in HF and DFT are therefore continually tested for stability and efficiency, and at times they break down. In these cases the calculation either takes an unacceptably long time or fails to deliver a result.<br />

We have improved this situation. The first part of the thesis presents algorithms that significantly improve the optimization in HF and DFT. The optimization has become more efficient, and cases in which the optimization previously broke down can now be carried out without problems. Moreover, the presented algorithms have no problem-specific parameters and can therefore be regarded as black boxes.<br />

Once the one-electron density has been optimized, molecular properties such as polarizabilities and excitation energies can be calculated. Response theory is often used for this purpose. The second part of the thesis presents an atomic orbital formulation of response theory that enables linear scaling of the property calculations. Furthermore, deriving expressions for molecular properties has become simpler in the atomic orbital formulation than in the molecular orbital formulation otherwise typically used. To illustrate the advantages, the expression for the geometrical gradient of the excited state is derived in the atomic orbital formulation.<br />

To confirm the reliability of quantum chemical predictions of molecular properties, it is important to investigate and describe the strengths and weaknesses of the quantum chemical models employed. Full configuration interaction (FCI) is an exact model within a given set of atomic orbital basis functions. It is therefore valuable to be able to compare results from approximate models with FCI results. In the third part of the thesis, FCI results are presented for two open-shell molecules, CN and CCH. These results are compared with results from approximate models that are used today to deliver quantum chemical calculations with an accuracy that in some cases exceeds the experimental one.<br />



Appendix A<br />

The Derivatives of the DSM Energy<br />

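The first and second derivatives of the DSM energy model with respect to $c$ are found by recalling that<br />

$$E^{\mathrm{DSM}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\,\bar{F}D_\delta, \qquad (A\text{-}1)$$

$$E(\bar{D}) = E(D_0) + 2\,\mathrm{Tr}\,D_+F_0 + \mathrm{Tr}\,D_+F_+, \qquad (A\text{-}2)$$

$$D_+ = \sum_{i=1}^{n} c_i\,(D_i - D_0), \qquad (A\text{-}3)$$

and<br />

$$D_\delta = 3\bar{D}S\bar{D} - 2\bar{D}S\bar{D}S\bar{D} - \bar{D}. \qquad (A\text{-}4)$$

Here $\bar{D} = D_0 + D_+$ and $\bar{F} = F_0 + F_+$ with $F_+ = \sum_{i=1}^{n} c_i(F_i - F_0)$. For brevity we write $D_{x0} \equiv D_x - D_0 = \partial\bar{D}/\partial c_x$ and $F_{x0} \equiv F_x - F_0 = \partial\bar{F}/\partial c_x$. The two terms in Eq. (A-1) are evaluated one by one:<br />

$$\frac{\partial E(\bar{D})}{\partial c_x} = \mathrm{Tr}\,D_xF_0 - \mathrm{Tr}\,D_0F_x + \mathrm{Tr}\,D_x\bar{F} + \mathrm{Tr}\,\bar{D}F_x - \mathrm{Tr}\,D_0\bar{F} - \mathrm{Tr}\,\bar{D}F_0 \qquad (A\text{-}5)$$

and<br />

$$\frac{\partial\,(2\,\mathrm{Tr}\,\bar{F}D_\delta)}{\partial c_x} = 2\,\mathrm{Tr}\,\frac{\partial\bar{F}}{\partial c_x}D_\delta + 2\,\mathrm{Tr}\,\bar{F}\frac{\partial D_\delta}{\partial c_x} = 2\,\mathrm{Tr}\,F_{x0}D_\delta + 2\,\mathrm{Tr}\,\bar{F}\frac{\partial D_\delta}{\partial c_x}, \qquad (A\text{-}6)$$

where<br />

$$\frac{\partial D_\delta}{\partial c_x} = 3\bar{D}SD_{x0} + 3D_{x0}S\bar{D} - 2\bar{D}S\bar{D}SD_{x0} - 2\bar{D}SD_{x0}S\bar{D} - 2D_{x0}S\bar{D}S\bar{D} - D_{x0}. \qquad (A\text{-}7)$$

The second derivatives are found in the same manner:<br />

$$\frac{\partial^2 E(\bar{D})}{\partial c_x\,\partial c_y} = 2\,\mathrm{Tr}\,D_0F_0 + \mathrm{Tr}\,D_xF_y + \mathrm{Tr}\,D_yF_x - \mathrm{Tr}\,D_0F_x - \mathrm{Tr}\,D_xF_0 - \mathrm{Tr}\,D_yF_0 - \mathrm{Tr}\,D_0F_y, \qquad (A\text{-}8)$$

$$\frac{\partial^2\,(2\,\mathrm{Tr}\,\bar{F}D_\delta)}{\partial c_x\,\partial c_y} = 2\,\mathrm{Tr}\,F_{x0}\frac{\partial D_\delta}{\partial c_y} + 2\,\mathrm{Tr}\,F_{y0}\frac{\partial D_\delta}{\partial c_x} + 2\,\mathrm{Tr}\,\bar{F}\frac{\partial^2 D_\delta}{\partial c_x\,\partial c_y}, \qquad (A\text{-}9)$$

where<br />

$$\frac{\partial^2 D_\delta}{\partial c_x\,\partial c_y} = 3D_{y0}SD_{x0} + 3D_{x0}SD_{y0} - 2\bar{D}SD_{y0}SD_{x0} - 2D_{y0}S\bar{D}SD_{x0} - 2\bar{D}SD_{x0}SD_{y0} - 2D_{y0}SD_{x0}S\bar{D} - 2D_{x0}SD_{y0}S\bar{D} - 2D_{x0}S\bar{D}SD_{y0}. \qquad (A\text{-}10)$$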
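The expressions above can be checked numerically. The sketch below builds a toy model with random symmetric matrices and a Fock map $F(D) = h + KDK$ that is linear in $D$ (so that Eq. (A-2) is exact and $\bar{F} = F(\bar{D})$), and compares the analytic gradient, Eqs. (A-5) to (A-7), and Hessian, Eqs. (A-8) to (A-10), with central finite differences. The matrix sizes, the random matrices and the form of the two-electron map are assumptions chosen only for testing; they do not represent a real Hartree-Fock calculation.<br />

```python
import numpy as np

rng = np.random.default_rng(7)
N, n = 6, 3          # toy AO dimension and number of previous densities D_1..D_n
tr = np.trace


def sym(a):
    return 0.5 * (a + a.T)


S = np.eye(N) + 0.1 * sym(rng.normal(size=(N, N)))   # toy overlap matrix
h = sym(rng.normal(size=(N, N)))                       # toy one-electron matrix
K = sym(rng.normal(size=(N, N)))                       # toy two-electron map G(D) = K D K


def fock(d):
    """Toy Fock matrix F(D) = h + G(D), linear in D as in Hartree-Fock."""
    return h + K @ d @ K


def energy(d):
    """Toy energy E(D) = 2 Tr hD + Tr D G(D); it is exactly quadratic, so Eq. (A-2) holds."""
    return 2 * tr(h @ d) + tr(d @ K @ d @ K)


D = [0.5 * sym(rng.normal(size=(N, N))) for _ in range(n + 1)]  # D[0] = D_0, D[i] = D_i
F = [fock(d) for d in D]


def dbar(c):
    return D[0] + sum(ci * (D[i + 1] - D[0]) for i, ci in enumerate(c))   # Eq. (A-3)


def d_delta(db):
    return 3 * db @ S @ db - 2 * db @ S @ db @ S @ db - db                 # Eq. (A-4)


def d_delta_deriv(db, a):
    """Eq. (A-7) with a = dDbar/dc_x = D_x0."""
    return (3 * db @ S @ a + 3 * a @ S @ db - 2 * db @ S @ db @ S @ a
            - 2 * db @ S @ a @ S @ db - 2 * a @ S @ db @ S @ db - a)


def e_dsm(c):
    db = dbar(c)
    return energy(db) + 2 * tr(fock(db) @ d_delta(db))                      # Eq. (A-1)


def grad_dsm(c):
    db = dbar(c); fb = fock(db); dd = d_delta(db)
    g = np.empty(len(c))
    for x in range(len(c)):
        dx, fx = D[x + 1], F[x + 1]
        g_e = (tr(dx @ F[0]) - tr(D[0] @ fx) + tr(dx @ fb)
               + tr(db @ fx) - tr(D[0] @ fb) - tr(db @ F[0]))                # Eq. (A-5)
        g[x] = g_e + 2 * tr((fx - F[0]) @ dd) + 2 * tr(fb @ d_delta_deriv(db, dx - D[0]))  # Eq. (A-6)
    return g


def hess_dsm(c):
    db = dbar(c); fb = fock(db); m = len(c)
    a = [D[i + 1] - D[0] for i in range(m)]   # D_x0
    f = [F[i + 1] - F[0] for i in range(m)]   # F_x0
    hm = np.empty((m, m))
    for x in range(m):
        for y in range(m):
            dx, dy, fx, fy = D[x + 1], D[y + 1], F[x + 1], F[y + 1]
            h_e = (2 * tr(D[0] @ F[0]) + tr(dx @ fy) + tr(dy @ fx) - tr(D[0] @ fx)
                   - tr(dx @ F[0]) - tr(dy @ F[0]) - tr(D[0] @ fy))          # Eq. (A-8)
            ax, ay = a[x], a[y]
            d2 = (3 * ay @ S @ ax + 3 * ax @ S @ ay - 2 * db @ S @ ay @ S @ ax
                  - 2 * ay @ S @ db @ S @ ax - 2 * db @ S @ ax @ S @ ay
                  - 2 * ay @ S @ ax @ S @ db - 2 * ax @ S @ ay @ S @ db
                  - 2 * ax @ S @ db @ S @ ay)                                # Eq. (A-10)
            hm[x, y] = (h_e + 2 * tr(f[x] @ d_delta_deriv(db, ay))
                        + 2 * tr(f[y] @ d_delta_deriv(db, ax)) + 2 * tr(fb @ d2))  # Eq. (A-9)
    return hm


def central_diff(fun, c, step=1e-5):
    """Central finite-difference derivative of a scalar- or vector-valued function."""
    cols = []
    for x in range(len(c)):
        e = np.zeros(len(c)); e[x] = step
        cols.append((fun(c + e) - fun(c - e)) / (2 * step))
    return np.array(cols)


c = rng.normal(scale=0.3, size=n)
print("gradient error:", np.max(np.abs(grad_dsm(c) - central_diff(e_dsm, c))))
print("Hessian error: ", np.max(np.abs(hess_dsm(c) - central_diff(grad_dsm, c))))
```

Both printed errors should be several orders of magnitude smaller than the gradient and Hessian elements themselves; a sign or index error in any single term shows up as a discrepancy comparable to the elements.<br />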



Appendix B<br />

The Density Matrix in the Atomic Orbital Basis<br />

In this appendix we briefly review the density matrix in the atomic orbital basis and derive the most important relations. For convenience, consider a single-determinant wave function with $n$ occupied molecular orbitals. The expectation value of a one-electron operator may then be written as a sum over the occupied spin orbitals<br />

$$\langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii}. \qquad (B\text{-}1)$$

Explicitly introducing the MO-AO transformation matrix $C$ allows us to write the expectation value as<br />

$$\langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\left(\sum_{i=1}^{n} C_{\mu i}^{*}C_{\nu i}\right) = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}D_{\mu\nu}, \qquad (B\text{-}2)$$

where $N$ is the number of AO basis functions and we have introduced $D$ as<br />

$$D_{\mu\nu} = \sum_{i=1}^{n} C_{\mu i}^{*}C_{\nu i}. \qquad (B\text{-}3)$$

It is of interest to study the relation between $D$ and the expectation values $\Delta$ of Eq. (2.10). To accomplish this, we consider the second-quantization expression for $\langle 0|\hat{h}|0\rangle$ in the nonorthogonal atomic orbital basis. According to ref. 46, one obtains<br />

$$\langle 0|\hat{h}|0\rangle = \sum_{\mu,\nu=1}^{N} (S^{-1}hS^{-1})_{\mu\nu}\,\langle 0|a_\mu^\dagger a_\nu|0\rangle = \sum_{\mu,\nu=1}^{N} (S^{-1}hS^{-1})_{\mu\nu}\,\Delta_{\mu\nu} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\,(S^{-1}\Delta S^{-1})_{\mu\nu}. \qquad (B\text{-}4)$$

By comparing Eqs. (B-4) and (B-2) we have the identification<br />

$$D = S^{-1}\Delta S^{-1}. \qquad (B\text{-}5)$$



Thus, the density element $D_{\mu\nu}$ is identical to the matrix element $\Delta_{\mu\nu}$ only in an orthonormal basis. Although it could be argued that it would be appropriate to call $\Delta$ the one-electron density matrix in the AO basis, we will be consistent with the standard literature and call $D$ the density matrix in the AO basis and $\Delta$ the matrix of expectation values of creation-annihilation operators. From the properties of the one-electron density matrix<br />

$$D^\dagger = D, \qquad \mathrm{Tr}\,DS = N_{\mathrm{elec}}, \qquad DSD = D, \qquad (B\text{-}6)$$

one straightforwardly obtains the following relations for $\Delta$:<br />

$$\Delta^\dagger = \Delta, \qquad \mathrm{Tr}\,\Delta S^{-1} = N_{\mathrm{elec}}, \qquad \Delta S^{-1}\Delta = \Delta. \qquad (B\text{-}7)$$

Although Eqs. (B-6) and (B-7) are formally equivalent, the equations for the standard AO density matrix $D$ are somewhat simpler to use, as they contain the metric $S$, whereas the equations for $\Delta$ involve the inverse metric $S^{-1}$. It should be noted that Eqs. (B-7) are necessary and sufficient conditions: all three equations are fulfilled if and only if $|0\rangle$ is a normalized single-determinant wave function.<br />
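The relations of this appendix are easy to verify numerically. The sketch below assumes real orbitals (so $C^* = C$), a random symmetric positive definite overlap matrix and random one-electron integrals; the sizes are arbitrary. It constructs $S$-orthonormal MO coefficients, forms $D$ from Eq. (B-3) and $\Delta = SDS$ from Eq. (B-5), and checks Eqs. (B-2), (B-4), (B-6) and (B-7).<br />

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 7, 3   # toy sizes: AO basis functions, occupied (real) spin orbitals

A = rng.normal(size=(N, N))
S = A @ A.T / N + np.eye(N)                        # toy nonorthogonal metric (symmetric positive definite)
h = rng.normal(size=(N, N)); h = 0.5 * (h + h.T)   # toy one-electron integrals h_{mu nu}

# MO coefficients with C^T S C = 1, built as C = S^{-1/2} Q with Q orthogonal
w, V = np.linalg.eigh(S)
C = V @ np.diag(w ** -0.5) @ V.T @ np.linalg.qr(rng.normal(size=(N, N)))[0]
C_occ = C[:, :n]

D = C_occ @ C_occ.T            # Eq. (B-3) for real orbitals
S_inv = np.linalg.inv(S)
Delta = S @ D @ S              # Eq. (B-5) solved for Delta

expect_mo = np.trace(C_occ.T @ h @ C_occ)             # Eq. (B-1): sum of h_ii
expect_ao = np.sum(h * D)                             # Eq. (B-2): sum of h_{mu nu} D_{mu nu}
expect_delta = np.sum((S_inv @ h @ S_inv) * Delta)    # Eq. (B-4)

checks = {
    "B-2": np.isclose(expect_mo, expect_ao),
    "B-4": np.isclose(expect_mo, expect_delta),
    "B-6": np.allclose(D, D.T) and np.isclose(np.trace(D @ S), n) and np.allclose(D @ S @ D, D),
    "B-7": (np.allclose(Delta, Delta.T) and np.isclose(np.trace(Delta @ S_inv), n)
            and np.allclose(Delta @ S_inv @ Delta, Delta)),
}
print(checks)
```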



Acknowledgements<br />

A number of people have made my four years of PhD study a pleasant and interesting experience, and I could not have done it without them. First of all I would like to thank Jeppe Olsen and Poul Jørgensen for guidance and support through the years; they are a fantastic team. I am grateful to the whole theoretical chemistry group for nice lunch breaks and cake meetings, and I would like to thank in particular Ove Christiansen for his career advice and Andreas Hesselman for sharing some of his latest work with me. And Stinne: how I managed to get through the days before Stinne joined the group is a mystery. It quickly turned out that we have much the same attitude towards life, and we have shared many a wholehearted opinion of life as such and of our work situation in particular.<br />

I would like to thank Pawel Salek for being good company during the development and debugging of Fortran90 code of the finest quality, and for always being willing to help with any problems I might have. Special thanks go to Sonia Coriani and her husband Asger Halkier, who took very good care of me during my visits to Trieste (even though I still haven’t tasted her mum’s lasagna).<br />

At a number of conferences, winter schools and summer schools, a group of mainly Scandinavian people made my trips an extra pleasant experience. They were always ready for some boozing and all sorts of crazy ideas. In particular I should mention Patzke-guy, a gentleman disguised as a theoretician; Pekka, the lizard king; Ulf, the sweet Swede; crazy Mikael; Ola; Tommy; and all the others. I have spent some really fine hours with you guys, and I hope to see you all again, maybe for a salmari or two – no miksi ei (well, why not).<br />

I also had the pleasure of attending a summer school with some of the students from the Copenhagen group: Marianne, Anders, Jacob and Thorsten. Anders and Jacob later became connected to the Aarhus group and have always been up for a nice chat and disgusting body noises to cheer up a grey day at work.<br />

I would like to thank Birgit Schiøtt for pleasant collegiality in connection with teaching and for coffee and talks in her office. I look forward to our collaboration on my next project.<br />

I am grateful to the girl-gang; Louise, Trine, Cindie, and Rikke for keeping the connection to Århus<br />

and for gossip, lunch dates and girl nights.<br />

I would also like to thank my parents for raising me as a good girl who always did her homework, otherwise I would never have gotten this far, and last but not least a big thanks goes to Kristoffer for putting up with me and being considerate and caring when needed.<br />



References<br />

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).<br />

2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).<br />

3 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).<br />

4 J. Koutecky and V. Bonacic, J. Chem. Phys. 55, 2408 (1971); T. Claxton and W. Smith, Theor. Chim. Acta 22, 399 (1971); W. A. Lathan, L. A. Curtiss, W. J. Hehre et al., Progress in Physical Organic Chemistry. (Wiley, New York, 1974).<br />

5 D. H. Sleeman, Theor. Chim. Acta 11, 135 (1968).<br />

6 J. C. Slater, J. B. Mann, T. M. Wilson et al., Phys. Rev. 184, 672 (1969); A. D. Rabuck and G. E. Scuseria, J. Chem. Phys. 110, 695 (1999); B. I. Dunlap, Phys. Rev. A 29, 2902 (1984).<br />

7 R. McWeeny, Proc. R. Soc. London, Ser. A 235, 496 (1956).<br />

8 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).<br />

9 R. Fletcher and C. M. Reeves, Comput. J. 7, 149 (1964).<br />

10 I. H. Hillier and V. R. Saunders, Proc. R. Soc. London, Ser. A 320, 161 (1970).<br />

11 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976).<br />

12 R. N. Camp and H. F. King, J. Chem. Phys. 75, 268 (1981).<br />

13 R. E. Stanton, J. Chem. Phys. 75, 3426 (1981).<br />

14 W. R. Wessel, J. Chem. Phys. 47, 3253 (1967); Douady, Ellinger, Subra et al., J. Chem. Phys.<br />

72, 1452 (1980).<br />

15 G. B. Bacskay, Chem. Phys. 61, 385 (1981).<br />

16 R. Shepard, I. Shavitt, and J. Simons, J. Chem. Phys. 76, 543 (1982).<br />

17 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984); H. J. Aa. Jensen and H.<br />

Ågren, Chem. Phys. Lett. 110, 140 (1984).<br />

18 X. Li, J. M. Millam, G. E. Scuseria et al., J. Chem. Phys. 119, 7651 (2003); E. Hernández, M.<br />

J. Gillan, and C. M. Goringe, Phys. Rev. B 53, 7147 (1996); J. M. Millam and G. E. Scuseria, J.<br />

Chem. Phys. 106, 5569 (1997); M. Challacombe, J. Chem. Phys. 110, 2332 (1999).<br />

19 A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B 58, 12704 (1998).<br />

20 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997).<br />

21 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994); M. S. Daw, Phys. Rev. B 47,<br />

10895 (1993); X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).<br />

22 G. Galli and M. Parrinello, Phys. Rev. Lett. 69, 3547 (1992); F. Mauri, G. Galli, and R. Car,<br />

Phys. Rev. B 47, 9973 (1993); W. Kohn, Chem. Phys. Lett. 208, 167 (1993); P. Ordejon, D.<br />

Drabold, M. Grunbach et al., Phys. Rev. B 48, 14646 (1993).<br />

23 T. Helgaker, H. Larsen, J. Olsen et al., Chem. Phys. Lett. 327, 397 (2000).<br />

24 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000).<br />

25 J. VandeVondele and J. Hutter, J. Chem. Phys. 118, 4365 (2003).<br />

26 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 10863 (2004).<br />

27 D. R. Hartree, The calculation of atomic structures. (John Wiley and Sons, Inc., New York,<br />

1957).<br />

28 E. Isaacson and H. B. Keller, Analysis of numerical methods. (Wiley, New York, 1966); C. C. J.<br />

Roothaan and P. S. Bagus, Methods in Computational Physics. (Academic, New York, 1963).<br />

29 N. W. Winter and T. H. Dunning Jr., Chem. Phys. Lett. 8, 169 (1971).<br />



30 W. B. Neilsen, Chem. Phys. Lett. 18, 225 (1973).<br />

31 M. C. Zerner and M. Hehenberger, Chem. Phys. Lett. 62, 550 (1979).<br />

32 G. Karlström, Chem. Phys. Lett. 67, 348 (1979).<br />

33 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); P. Pulay, J. Comput. Chem. 3, 556 (1982).<br />

34 H. Sellers, Int. J. Quant. Chem. 45, 31 (1993).<br />

35 I. Hyla-Krispin, J. Demuynck, A. Strich et al., J. Chem. Phys. 75, 3954 (1981).<br />

36 E. Cancès and C. Le Bris, Int. J. Quant. Chem. 79, 82 (2000).<br />

37 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).<br />

38 L. Thøgersen, J. Olsen, D. Yeager et al., J. Chem. Phys. 121, 16 (2004).<br />

39 L. Thøgersen, J. Olsen, A. Köhn et al., J. Chem. Phys. 123, 074103 (2005).<br />

40 A. P. Rendell, Chem. Phys. Lett. 229, 204 (1994).<br />

41 H. Sellers, Chem. Phys. Lett. 180, 461 (1991); C. Kollmar, Int. J. Quant. Chem. 62, 617 (1997).<br />

42 V. R. Saunders and I. H. Hillier, Int. J. Quant. Chem. 7, 699 (1973).<br />

43 S. P. Bhattacharyya, Chem. Phys. Lett. 56, 395 (1978).<br />

44 R. Carbó, J. A. Hernández, and F. Sanz, Chem. Phys. Lett. 47, 581 (1977).<br />

45 E. Cancès and C. Le Bris, Math. Model. Num. Anal. 34, 749 (2000).<br />

46 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory. (Wiley,<br />

Chichester, 2000).<br />

47 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999).<br />

48 A. M. N. Niklasson, Phys. Rev. B 66, 155115 (2002).<br />

49 E. Rubensson, Master's Thesis, Royal Institute of Technology (KTH), Stockholm, 2005.<br />

50 G. W. Stewart, Introduction to Matrix Computations. (Academic Press, inc., New York, 1973).<br />

51 J. W. Demmel, Applied Numerical Linear Algebra. (SIAM, 1997).<br />

52 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).<br />

53 G. Chaban, M. W. Schmidt, and M. S. Gordon, Theor. Chem. Acc. 97, 88 (1997); T. H. Fischer and J. E. Almlöf, J. Phys. Chem. 96, 9768 (1992).<br />

54 R. E. Stanton, J. Chem. Phys. 75, 5416 (1981).<br />

55 M. A. Natiello and G. E. Scuseria, Int. J. Quant. Chem. 26, 1039 (1984).<br />

56 J. Čížek and J. Paldus, J. Chem. Phys. 47, 3976 (1967); H. Fukutome, Int. J. Quant. Chem. 20, 955 (1981); D. J. Thouless, Nucl. Phys. 21, 225 (1960).<br />

57 V. Bach, E. H. Lieb, M. Loss et al., Phys. Rev. Lett. 72, 2981 (1994); P.-L. Lions, Comm. Math. Phys. 109, 33 (1987).<br />

58 L. E. Dardenne, N. Makiuchi, L. A. C. Malbouisson et al., Int. J. Quant. Chem. 76, 600 (2000).<br />

59 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).<br />

60 A. Kalemos, T. H. Dunning Jr., and A. Mavridis, J. Chem. Phys. 123, 014302 (2005); R. G. A. R. Maclagan and G. E. Scuseria, J. Chem. Phys. 106, 1491 (1997); I. Shim and K. A. Gingerich, Int. J. Quant. Chem. S23, 409 (1989).<br />

61 H. Larsen, P. Jørgensen, J. Olsen et al., J. Chem. Phys. 113, 8908 (2000).<br />

62 J. Olsen and P. Jørgensen, in Modern Electronic Structure Theory, Part II, edited by D. R. Yarkony (World Scientific, Singapore, 1995).<br />

63 J. Olsen and P. Jørgensen, J. Chem. Phys. 82, 3235 (1985).<br />

64 J. Olsen, H. J. Aa. Jensen, and P. Jørgensen, J. Comp. Phys. 74, 265 (1988).<br />



65 T. Helgaker and P. Jørgensen, Theor. Chim. Acta 75, 111 (1989); T. Helgaker and P. Jørgensen, in Advances in Quantum Chemistry (Academic Press, 1988), Vol. 19; T. Helgaker and P. Jørgensen, in Methods in Computational Molecular Physics, edited by S. Wilson and G. H. F. Diercksen (Plenum Press, New York, 1992).<br />

66 H. Larsen, T. Helgaker, P. Jørgensen et al., J. Chem. Phys. 115, 10344 (2001).<br />

67 D. Feller and J. A. Sordo, J. Chem. Phys. 113, 485 (2000).<br />

68 D. Sherrill, E. F. C. Byrd, and M. Head-Gordon, J. Phys. Chem. A 105, 9736 (2001).<br />

69 J. Olsen, LUCIA, a quantum chemical program package.<br />

70 T. Helgaker, H. J. Aa. Jensen, P. Jørgensen et al., DALTON, an electronic structure program (1997).<br />

71 T. H. Dunning Jr., J. Chem. Phys. 90, 1007 (1989).<br />

72 K. P. Huber and G. Herzberg, Molecular Spectra and Molecular Structure IV. Constants of Diatomic Molecules. (Van Nostrand, New York, 1979).<br />

73 W. Kutzelnigg, Theor. Chim. Acta 80, 349 (1991).<br />

74 J. W. Krogh and J. Olsen, Chem. Phys. Lett. 344, 578 (2001).<br />

75 L. Thøgersen and J. Olsen, Chem. Phys. Lett. 393, 36 (2004).<br />

76 P. G. Szalay, L. Thøgersen, J. Olsen et al., J. Phys. Chem. A 108, 3030 (2004).<br />



Part 1<br />

The Trust-region Self-consistent Field Method:<br />

Towards a Black Box optimization in Hartree-Fock and Kohn-Sham Theories,<br />

L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker,<br />

J. Chem. Phys. 121, 16 (2004)


JOURNAL OF CHEMICAL PHYSICS VOLUME 121, NUMBER 1 1 JULY 2004<br />

The trust-region self-consistent field method: Towards a black-box<br />

optimization in Hartree–Fock and Kohn–Sham theories<br />

Lea Thøgersen, Jeppe Olsen, Danny Yeager, a) and Poul Jørgensen<br />

Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark<br />

Paweł Sałek<br />

Laboratory of Theoretical Chemistry, The Royal Institute of Technology,<br />

Teknikringen 30, Stockholm SE-10044, Sweden<br />

Trygve Helgaker<br />

Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway<br />

Received 17 February 2004; accepted 5 April 2004<br />

The trust-region self-consistent field (TRSCF) method is presented for optimizing the total energy \(E_{\rm SCF}\) of Hartree–Fock theory and Kohn–Sham density-functional theory. In the TRSCF method, two steps have been improved: the Fock/Kohn–Sham matrix diagonalization step that yields a new density matrix, and the step that determines the optimal density matrix in the subspace of the density matrices of the preceding diagonalization steps. The improvements follow from the recognition that local models of \(E_{\rm SCF}\) may be introduced by carrying out a Taylor expansion of the energy about the current density matrix. At the point of expansion, the local models have the same gradient as \(E_{\rm SCF}\) but only an approximate Hessian. The local models are therefore valid only in a restricted region, the trust region, and steps can only be taken with confidence within this region. By restricting the steps of the TRSCF model to be inside the trust region, a monotonic and significant reduction of the total energy is ensured in each iteration of the TRSCF method. Examples are given where the TRSCF method converges monotonically and smoothly, but where the standard DIIS method diverges. © 2004 American Institute of Physics. DOI: 10.1063/1.1755673<br />

I. INTRODUCTION<br />

The steady progress in computer technology and quantum-chemical methodology has widened the range of users of quantum-chemical software packages to include a vast number of practicing, experimental chemists. Routinely, such users perform Hartree–Fock (HF) calculations and Kohn–Sham (KS) density-functional theory (DFT) calculations for molecules of a size and complexity that, a decade ago, were beyond reach even for the most advanced research codes. This development calls for further advances in the automation of the self-consistent field (SCF) procedure used to optimize the HF and DFT energies, so as to ensure that convergence may be reached in a routine manner even for very complex molecules.<br />

In the original formulation, the SCF procedure consists of a sequence of Roothaan–Hall (RH) iterations.<sup>1,2</sup> At each iteration, a Fock/KS matrix is first constructed from the current approximation to the one-electron density matrix. It is then diagonalized to yield an improved set of orbitals and orbital energies, and thus an improved density matrix. In the subsequent iteration, this improved density matrix is used to construct a new Fock/KS matrix, thereby establishing the iteration procedure. However, such a sequence of RH iterations converges only in simple cases. To improve the convergence, each RH iteration may be extended with a second step in addition to the diagonalization step. In this step, the best density matrix is generated in the subspace of the density matrices of the current and preceding RH iterations. In the next RH iteration, this averaged density matrix, rather than the pure density matrix obtained in the last diagonalization, is used to construct the new Fock/KS matrix.<br />

a) On leave. Permanent address: Department of Chemistry, Texas A&M University, P.O. Box 30012, College Station, Texas 77842-3012.<br />

In this paper, we make improvements both to the RH diagonalization step and to the density-subspace optimization step of the SCF scheme. Our approach follows from the recognition that, in both steps, we may construct local models of the SCF energy function \(E_{\rm SCF}\) by a Taylor expansion of the energy about the current density matrix. At the point of expansion, however, these models have an exact gradient but only an approximate Hessian. They are therefore valid only in a restricted region about the current approximation to the density matrix: the trust region. Therefore, when these local models are used in the course of the SCF optimization, it is essential that they are used only to generate steps within their trust region. Only in this manner can it be ensured that the SCF energy is systematically and sufficiently lowered at each iteration.<br />

In the RH diagonalization part of the SCF optimization, the improvements are obtained by introducing an energy function \(E_{\rm RH}\) that corresponds to the sum of the occupied orbital energies.<sup>3</sup> An unconstrained minimization of \(E_{\rm RH}\) results in the same solution (i.e., density matrix) as obtained by a diagonalization of the Fock/KS matrix. However, at the point of expansion, the RH energy function \(E_{\rm RH}\) has only the gradient in common with the true SCF energy \(E_{\rm SCF}\). A global minimization of \(E_{\rm RH}\) may therefore lead to steps that are too long to be trusted. We therefore introduce a trust region where \(E_{\rm RH}\) is a good approximation to \(E_{\rm SCF}\). If a global minimization of \(E_{\rm RH}\) leads to a step outside the trust region, then the step to the minimum of \(E_{\rm RH}\) on the boundary of the trust region is taken instead. This step is found by a level-shifting technique, where the occupied molecular orbital energies are effectively shifted by some constant to increase the gap between the occupied and virtual molecular orbitals. Level shifting has previously been used to improve the convergence of the simple RH sequence of iterations. An essential feature of our implementation is that the level shift is adjusted so that the step goes to the boundary of the trust region, recognizing that only in this manner does a lowering of \(E_{\rm RH}\) result in a lowering of \(E_{\rm SCF}\). For this reason, the resulting method is called the trust-region RH (TRRH) method.<br />

0021-9606/2004/121(1)/16/12/$22.00 16<br />

© 2004 American Institute of Physics<br />

Downloaded 25 Jun 2004 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp

The optimization of the density matrix in the subspace of the density matrices of the preceding RH iterations has a long history. Early on, it was recognized that a simple averaging of the density matrices of the last few RH iterations significantly improves the convergence of the RH scheme. This simple density-matrix averaging technique was later rationalized and systematized in the direct inversion in the iterative subspace (DIIS) method of Pulay.<sup>4</sup> In the DIIS method, an improved density matrix is obtained as a linear combination of the previous density matrices by minimizing the norm of the corresponding linear combination of gradients. The DIIS method significantly speeds up the local convergence. Convergence can often be obtained to ground states of rather complex molecules, including molecules with a small gap between the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) and with a large number of close-lying electronic states.<br />
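A minimal sketch of the DIIS subspace step just described follows. It is not the authors' code. The coefficients minimize the squared norm of the combined error (gradient) vector under the constraint that they sum to one. Setting the Lagrangian gradient to zero gives a small bordered linear system. The error vectors are assumed here to be plain lists of floats; a common choice in practice is the flattened orbital-gradient matrix FDS − SDF. The names `solve` and `diis_coefficients` are illustrative.<br />

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular DIIS system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def diis_coefficients(errors):
    """Coefficients c minimizing |sum_i c_i e_i|^2 subject to sum_i c_i = 1.

    Stationarity of the Lagrangian gives the bordered system
        [B    1] [c ]   [0]
        [1^T  0] [nu] = [1],   with B_ij = e_i . e_j.
    """
    n = len(errors)
    b = [[sum(x * y for x, y in zip(ei, ej)) for ej in errors] for ei in errors]
    a = [row + [1.0] for row in b] + [[1.0] * n + [0.0]]
    rhs = [0.0] * n + [1.0]
    return solve(a, rhs)[:n]  # drop the multiplier nu
```

For the error vectors [1, 0] and [0, 2], the sketch returns approximately [0.8, 0.2], so the iterate with the smaller error receives the larger weight.<br />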

Several attempts have been made to modify the DIIS algorithm so as to improve its global convergence behavior. Recently, Kudin, Scuseria, and Cancès proposed the energy DIIS (EDIIS) method, in which the DIIS gradient-norm minimization is replaced by a minimization of an approximate energy function.<sup>5</sup> In EDIIS, the variational parameters are the linear expansion coefficients of the density matrices from the previous RH iterations. They may only take on values that give densities in the convex set, that is, densities with occupation numbers between 0 and 1. As the EDIIS method is based on the minimization of an approximate energy function, it may have some advantages in the global region. However, it is worrying that a convex solution often cannot be obtained and that the observed local convergence of the EDIIS method is slower than that of the standard DIIS method.<br />

In the DIIS and EDIIS methods, an improved density matrix is obtained as a linear combination of the density matrices from the preceding RH diagonalization steps. Consequently, the averaged density matrix is not idempotent, as required in HF and KS theories. The deviation from idempotency may be reduced by using a purified density matrix such as the one suggested by McWeeny.<sup>6</sup> This has been done for the SCF energy minimization by several workers, including Nunes and Vanderbilt<sup>7</sup> and Daniels and Scuseria.<sup>8</sup> It has also been done for the calculation of geometrical derivatives by Ochsenfeld and co-workers.<sup>9</sup> It may also be done for the EDIIS energy function. The energy function then has the same gradient as \(E_{\rm SCF}\), but it also contains terms that cannot be obtained from the densities and Fock/KS matrices of the previous RH iterations. Neglecting these terms, we arrive at the density-subspace minimization (DSM) algorithm proposed in this paper. At the point of expansion, the DSM energy function \(E_{\rm DSM}\) thus has the same gradient as the true energy function \(E_{\rm SCF}\) but only an approximate Hessian. Again, a trust region may be introduced and only steps within this region are taken, ensuring that any lowering of \(E_{\rm DSM}\) also corresponds to a lowering of \(E_{\rm SCF}\). The resulting method is called the trust-region DSM (TRDSM) method.<br />

In the next section, we first describe the standard optimization of the SCF energy function in a density-matrix formulation. The TRRH method is then discussed in Sec. II A and the TRDSM method in Sec. II B. In Sec. III, we give some numerical examples to demonstrate the performance of the resulting trust-region SCF (TRSCF) method. The last section contains some concluding remarks.<br />

II. THEORY<br />

For a closed-shell system with \(N/2\) electron pairs, the Hartree–Fock (HF) energy, excluding the nuclear–nuclear repulsion energy, is given by<sup>3</sup><br />

\[ E_{\rm SCF}(D) = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G(D), \tag{1} \]<br />

where \(D\) is the one-electron density matrix in the atomic-orbital (AO) basis, \(h\) is the one-electron Hamiltonian matrix, and \(G(D)\) is defined as<br />

\[ G_{\mu\nu}(D) = \sum_{\rho\sigma} \left(2g_{\mu\nu\rho\sigma} - g_{\mu\sigma\rho\nu}\right) D_{\rho\sigma}, \tag{2} \]<br />

where \(g_{\mu\nu\rho\sigma}\) is a two-electron integral in the AO basis. For the energy in Eq. 1 to be a valid approximation to the true HF energy, the density matrix \(D\) must satisfy the symmetry, trace, and idempotency conditions:<br />

\[ D^T = D, \tag{3} \]<br />

\[ \mathrm{Tr}\,DS = N/2, \tag{4} \]<br />

\[ DSD = D. \tag{5} \]<br />

Similar conditions apply in Kohn–Sham (KS) theory, but the energy function of Eq. 1 must then be modified by including the exchange-correlation term and by scaling, or completely removing, the exchange term in Eq. 2.<br />
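A minimal sketch of Eqs. 1 and 2 in pure Python. It assumes that the density, the one-electron matrix, and the two-electron integrals g[μ][ν][ρ][σ] are stored as nested lists. It loops over all index quadruples for clarity; a real code would exploit the permutational symmetry of the integrals. The function names and the numerical values used below are illustrative, not taken from the paper.<br />

```python
def build_g(d, g):
    """G_{mu nu}(D) = sum_{rho sigma} (2 g[mu][nu][rho][sigma] - g[mu][sigma][rho][nu]) D_{rho sigma}  (Eq. 2)."""
    n = len(d)
    return [[sum((2.0 * g[mu][nu][rho][sig] - g[mu][sig][rho][nu]) * d[rho][sig]
                 for rho in range(n) for sig in range(n))
             for nu in range(n)]
            for mu in range(n)]


def trace_of_product(a, b):
    """Tr(AB) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def scf_energy(d, h, g):
    """Closed-shell E_SCF(D) = 2 Tr(hD) + Tr(D G(D))  (Eq. 1), nuclear repulsion excluded."""
    return 2.0 * trace_of_product(h, d) + trace_of_product(d, build_g(d, g))
```

Take a single doubly occupied orbital in one basis function with D = [[1]], h = [[-1.0]], and g = 0.5. Then `scf_energy` returns 2(−1.0) + 0.5 = −1.5, the familiar 2h + J for one electron pair.<br />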

The traditional approach to the optimization of the HF energy is an iterative one. From the current approximation to the density matrix \(D_n\) in iteration \(n\), a Fock matrix is built,<br />

\[ F(D_n) = h + G(D_n), \tag{6} \]<br />

and, following the Roothaan–Hall (RH) procedure, the Fock matrix is diagonalized,<br />

\[ F(D_n)\,C_{\rm occ} = S\,C_{\rm occ}\,\varepsilon, \tag{7} \]<br />

where \(S\) is the overlap matrix in the AO basis. This yields a set of occupied molecular orbitals (MOs), from which a new approximation to the density matrix is obtained as<br />

\[ D_{n+1} = C_{\rm occ}\,C_{\rm occ}^T. \tag{8} \]<br />

The iteration procedure is established using \(D_{n+1}\) as the current density in Eq. 6. The final solution to the minimization problem is obtained when \(D_n\) and \(D_{n+1}\) are the same. This self-consistent field (SCF) procedure may also be used in KS theory. The only differences are the addition of the exchange-correlation potential and the scaling of the exchange contribution in the Fock matrix, which together yield the KS matrix.<br />
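A minimal sketch of the plain RH fixed-point iteration of Eqs. 6–8 follows. It assumes an orthonormal basis (S = I), two basis functions, and one doubly occupied orbital (N/2 = 1), so that the diagonalization in Eq. 7 has a closed form. The one-electron matrix `H_TOY` and the two-electron term inside `fock` are hypothetical toy choices and do not describe a real molecule. Convergence is declared when D<sub>n</sub> and D<sub>n+1</sub> agree, as stated above.<br />

```python
import math

H_TOY = [[-1.0, 0.2], [0.2, 0.0]]  # hypothetical one-electron matrix h


def lowest_eigvec(a, b, c):
    """Normalized eigenvector of the symmetric matrix [[a, b], [b, c]] for the lower eigenvalue."""
    lam = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if abs(b) > 1e-14:
        x, y = b, lam - a          # solves (a - lam) x + b y = 0
    else:
        x, y = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = math.hypot(x, y)
    return x / norm, y / norm


def fock(d):
    """Toy F(D) = h + G(D) with the illustrative, linear G(D) = diag(0.25 * D_11, 0)  (Eq. 6)."""
    return [[H_TOY[0][0] + 0.25 * d[0][0], H_TOY[0][1]],
            [H_TOY[1][0], H_TOY[1][1]]]


def rh_iterations(d, max_iter=50, tol=1e-10):
    """Plain Roothaan-Hall iterations: build F(D_n), diagonalize, occupy the lowest MO."""
    for n in range(1, max_iter + 1):
        f = fock(d)                                          # Eq. 6
        x, y = lowest_eigvec(f[0][0], f[0][1], f[1][1])      # Eq. 7 with S = I
        d_new = [[x * x, x * y], [x * y, y * y]]             # Eq. 8: D = C_occ C_occ^T
        change = max(abs(d_new[i][j] - d[i][j]) for i in range(2) for j in range(2))
        d = d_new
        if change < tol:                                     # D_{n+1} equals D_n
            return d, n
    raise RuntimeError("RH iterations did not converge")
```

Starting from D = diag(1, 0), this toy problem converges in a handful of iterations. The next paragraph explains why plain RH iterations often fail for realistic systems.<br />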

The pure RH iterations presented above often do not converge. A powerful method for handling this divergence is to construct the Fock matrix not from the density matrix \(D_n\) but from an average of all previous density matrices:<br />

\[ \bar D_n = \sum_{i=1}^{n} c_i D_i. \tag{9} \]<br />

The averaged density matrix \(\bar D_n\) is then used in place of the pure density matrix \(D_n\) in Eq. 6 to obtain the Fock matrix \(F(\bar D_n)\) as<br />

\[ F(\bar D_n) = \sum_{i=1}^{n} c_i\,F(D_i), \tag{10} \]<br />

and the iteration procedure is established. In the course of the TRSCF iterations, the following matrices are set up in the order indicated: \(D_1, F(D_1), D_2, F(D_2), \bar D_2, F(\bar D_2), D_3, F(D_3), \bar D_3, F(\bar D_3), \ldots\). Among these, \(D_1, F(D_1), D_2, F(D_2), D_3, F(D_3), \ldots\) are saved during the iteration procedure.<br />
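The step from Eq. 9 to Eq. 10 uses the linearity of \(G(D)\) in HF theory together with coefficients that sum to one, the normalization written out later as Eq. 38. A short derivation follows. In KS theory, the exchange-correlation contribution is not linear in \(D\), so the same combination is then only an approximation to \(F(\bar D_n)\).<br />

```latex
F(\bar D_n) = h + G\Big(\sum_{i=1}^{n} c_i D_i\Big)
            = h + \sum_{i=1}^{n} c_i\, G(D_i)
            = \sum_{i=1}^{n} c_i \big[h + G(D_i)\big]
            = \sum_{i=1}^{n} c_i\, F(D_i),
\qquad \text{using } \sum_{i=1}^{n} c_i = 1 .
```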

In the following, we describe improvements to the SCF diagonalization and density-subspace optimization steps. In Sec. II A, we describe how the trust-region RH (TRRH) method is used to generate new density matrices by a modification of the traditional RH method (Eqs. 7 and 8). Next, in Sec. II B, we introduce the trust-region density-subspace minimization (TRDSM) method for calculating the averaged density matrix of Eq. 9. Throughout, we use the indices \(i, j, k, l\) for occupied MOs and the indices \(a, b, c, d\) for virtual MOs.<br />

A. The trust-region Roothaan–Hall method<br />

As discussed in Ref. 3, the traditional RH method may be viewed as a minimization of the sum of the orbital energies of the occupied MOs,<br />

\[ E_{\rm RH} = 2\sum_i \varepsilon_i = 2\,\mathrm{Tr}\,F(\bar D)\,D, \tag{11} \]<br />

subject to orthonormality constraints on the occupied MOs \(\phi_i\):<br />

\[ \langle \phi_i | \phi_j \rangle = \delta_{ij}. \tag{12} \]<br />

Here \(\bar D\) is the current approximation to the HF/KS density matrix, usually obtained as a linear combination of the previous densities according to Eq. 9. The density matrix \(D\) to be optimized in Eq. 11, by contrast, is related to the occupied MOs resulting from the diagonalization of \(F(\bar D)\) as<br />

\[ D = C_{\rm occ}\,C_{\rm occ}^T. \tag{13} \]<br />

To see this, consider the constrained minimization of \(E_{\rm RH}\) in Eq. 11 expressed in terms of the Lagrangian<br />

\[ L = 2\,\mathrm{Tr}\,F(\bar D)\,D - 2\,\mathrm{Tr}\,\varepsilon^T\left(C_{\rm occ}^T S\,C_{\rm occ} - I_{N/2}\right), \tag{14} \]<br />

where the multipliers \(\varepsilon_{ij}\) ensure orthonormality among the occupied MOs. Minimization of this Lagrangian leads to the standard RH equations:<br />

\[ F(\bar D)\,C_{\rm occ} = S\,C_{\rm occ}\,\varepsilon. \tag{15} \]<br />

However, \(E_{\rm RH}\) of Eq. 11 is only a crude model of the true energy \(E_{\rm SCF}\): only its gradient is correct at \(\bar D\), and only if \(\bar D\) is idempotent. A global minimization of \(E_{\rm RH}\) according to Eq. 15 may therefore easily lead to steps that are too long to be trusted, because they lie outside the region where \(E_{\rm RH}\) is a good approximation to \(E_{\rm SCF}\). Steps outside the trust region often do not lead to a reduction of the total energy \(E_{\rm SCF}\).<br />

1. The level-shifted Roothaan–Hall equations<br />

To avoid overly long steps, an additional constraint is imposed on the optimization of Eq. 11: the new density matrix \(D\) in Eq. 13 must not differ too much from the old matrix \(\bar D\). This condition is conveniently expressed in terms of the overlap between the density matrices in the \(S\)-metric norm,<br />

\[ \langle D|\bar D\rangle_S = \mathrm{Tr}\,DS\bar DS = a\sqrt{\tfrac{N}{2}\,\mathrm{Tr}\,\bar DS\bar DS}, \tag{16} \]<br />

where \(\mathrm{Tr}\,\bar DS\bar DS \neq N/2\) in general, since \(\bar D\) is not necessarily idempotent. Note that, for \(D\) equal to an idempotent \(\bar D\), \(a\) is equal to one. For \(a\) sufficiently close to one, the step will therefore stay in the local region. In practice, we define "sufficiently close to one" by the parameter \(a_{\min} = 0.975\).<br />
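The overlap of Eq. 16 and the acceptance test against \(a_{\min}\) can be sketched as follows. The code uses the normalized form \(a = \langle D|\bar D\rangle_S / \sqrt{\langle D|D\rangle_S \langle\bar D|\bar D\rangle_S}\), which reduces to Eq. 16 when \(\langle D|D\rangle_S = N/2\). Matrices are nested lists. The example densities assume S = I and N/2 = 1; they and the function names are illustrative.<br />

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def s_inner(x, y, s):
    """<X|Y>_S = Tr(X S Y S)."""
    p = matmul(matmul(matmul(x, s), y), s)
    return sum(p[i][i] for i in range(len(p)))


def density_overlap(d, dbar, s):
    """a = <D|Dbar>_S / sqrt(<D|D>_S <Dbar|Dbar>_S); equals 1 when D = Dbar."""
    return s_inner(d, dbar, s) / math.sqrt(s_inner(d, d, s) * s_inner(dbar, dbar, s))


def step_is_local(d, dbar, s, a_min=0.975):
    """Accept the new density only if it stays close to the current one (a >= a_min)."""
    return density_overlap(d, dbar, s) >= a_min
```

For example, with D = diag(1, 0) and \(\bar D\) = diag(0.9, 0.1), the overlap is a = 0.9/√0.82 ≈ 0.994, so the step would be accepted.<br />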

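Introducing an undetermined multiplier \(\mu\) associated with this new constraint, we obtain the following Lagrangian:<br />

\[ L = 2\,\mathrm{Tr}\,F(\bar D)\,D - 2\mu\left[\mathrm{Tr}\,S\bar DSD - a\sqrt{\tfrac{N}{2}\,\mathrm{Tr}\,\bar DS\bar DS}\right] - 2\,\mathrm{Tr}\,\varepsilon^T\left(C_{\rm occ}^T S\,C_{\rm occ} - I_{N/2}\right). \tag{17} \]<br />

Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to zero, we arrive at the level-shifted RH equations<br />

\[ \left[F(\bar D) - \mu S\bar DS\right] C_{\rm occ} = S\,C_{\rm occ}\,\varepsilon. \tag{18} \]<br />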

To interpret the level-shift term, we note that \(\bar DS\) projects out the component of \(C_{\rm occ}\) that is occupied in \(\bar D\) (assuming idempotent \(\bar D\)); see Ref. 3. The level shift therefore acts only on the occupied part of \(F(\bar D)\). It shifts all the occupied orbital energies down and increases the gap between the occupied and virtual MOs, in particular the HOMO-LUMO gap.<br />
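The claim that the shift acts only on the occupied part can be checked directly. Take an idempotent \(\bar D = \bar C_{\rm occ}\bar C_{\rm occ}^T\) with \(\bar C_{\rm occ}^T S\,\bar C_{\rm occ} = I\). Consider an orbital \(c_{\rm o} = \bar C_{\rm occ}x\) that is occupied in \(\bar D\), and a virtual orbital \(c_{\rm v}\) with \(\bar C_{\rm occ}^T S\,c_{\rm v} = 0\). The labels \(c_{\rm o}\), \(c_{\rm v}\), and \(x\) are introduced only for this check.<br />

```latex
S\bar D S\, c_{\rm o} = S\bar C_{\rm occ}\big(\bar C_{\rm occ}^T S \bar C_{\rm occ}\big)x = S c_{\rm o},
\qquad
S\bar D S\, c_{\rm v} = S\bar C_{\rm occ}\big(\bar C_{\rm occ}^T S c_{\rm v}\big) = 0, \\
c_{\rm o}^T\big[F(\bar D) - \mu S\bar D S\big]c_{\rm o} = c_{\rm o}^T F(\bar D)\,c_{\rm o} - \mu\,c_{\rm o}^T S c_{\rm o}, \\
c_{\rm v}^T\big[F(\bar D) - \mu S\bar D S\big]c_{\rm v} = c_{\rm v}^T F(\bar D)\,c_{\rm v},
\qquad
c_{\rm v}^T\big[F(\bar D) - \mu S\bar D S\big]c_{\rm o} = c_{\rm v}^T F(\bar D)\,c_{\rm o}.
```

For \(S\)-normalized occupied orbitals, the occupied diagonal elements drop by exactly \(\mu\). The virtual block and the occupied-virtual coupling are unchanged, which is why the occupied-virtual gap opens up as \(\mu\) grows.<br />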

Since the SCF energy \(E_{\rm SCF}\) is invariant with respect to an orthogonal transformation among the occupied MOs, Eq. 18 may be transformed to the canonical basis:<br />

\[ \left[F(\bar D) - \mu S\bar DS\right] C_{\rm occ}(\mu) = S\,C_{\rm occ}(\mu)\,\varepsilon(\mu), \tag{19} \]<br />

where the diagonal matrix \(\varepsilon(\mu)\) contains the orbital energies.<br />

2. Choice of the RH level-shift parameter<br />

The density matrix generated from the restricted RH solution (Eq. 19) depends on the level-shift parameter \(\mu\):<br />

\[ D(\mu) = C_{\rm occ}(\mu)\,C_{\rm occ}(\mu)^T. \tag{20} \]<br />

To see how \(\mu\) is determined, we consider the determination of \(\mu\) in the fourth iteration of the rhodium-complex calculation described in Sec. III. In Fig. 1(a), we have plotted the HOMO-LUMO gap as a function of \(\mu\),<br />

\[ \Delta\varepsilon_{ai}(\mu) = \varepsilon_a^{\rm LUMO}(\mu) - \varepsilon_i^{\rm HOMO}(\mu), \tag{21} \]<br />

where \(\varepsilon_i^{\rm HOMO}(\mu)\) and \(\varepsilon_a^{\rm LUMO}(\mu)\) are the HOMO and LUMO orbital energies, respectively. In Fig. 1(b), we have plotted the overlap between the old and new density matrices, given by<br />

\[ a(\mu) = \frac{\langle D(\mu)|\bar D\rangle_S}{\sqrt{\langle D(\mu)|D(\mu)\rangle_S\,\langle \bar D|\bar D\rangle_S}}, \tag{22} \]<br />

where \(\langle D(\mu)|D(\mu)\rangle_S\) is equal to \(N/2\). For sufficiently large \(\mu\), the HOMO-LUMO gap (Eq. 21) is linear in \(\mu\). This linearity of \(\Delta\varepsilon_{ai}(\mu)\) for large \(\mu\) arises from the dependence of the orbital energies on \(\mu\) in Eq. 19, where \(\mu\) is effectively subtracted from the occupied orbital energies. The MOs \(\bar C_{\rm occ}\) occupied in \(\bar D\) satisfy the generalized eigenvalue equations<br />

\[ S\bar DS\,\bar C_{\rm occ} = S\,\bar C_{\rm occ}\,\boldsymbol{\eta}, \tag{23} \]<br />

where \(\boldsymbol{\eta}\) is the diagonal matrix of eigenvalues. These MOs become identical to the MOs \(C_{\rm occ}(\mu)\) obtained from Eq. 19 when \(\mu\) tends to infinity. The corresponding density is denoted<br />

\[ D(\infty) = \bar C_{\rm occ}\,\bar C_{\rm occ}^T \tag{24} \]<br />

and represents a purified \(\bar D\). In the linear regime of \(\Delta\varepsilon_{ai}(\mu)\), there is a continuous development of the occupied MOs from those occupied in \(\bar D\). As \(\mu\) decreases and we enter the nonlinear regime at \(\mu_{\min}\), the MOs in Eq. 20 no longer correspond to those in Eq. 23. Comparing plots (a) and (b) in Fig. 1, we note that the region \(a(\mu) > a_{\min}\) in Fig. 1(b) corresponds roughly to the region \(\mu > \mu_{\min}\) in Fig. 1(a).<br />

FIG. 1. For the fourth iteration of the rhodium calculation described in Sec. III, we have displayed as a function of the level-shift parameter \(\mu\): (a) the HOMO-LUMO gap \(\Delta\varepsilon_{ai}\), where \(\mu_{\min}\) is the smallest accepted level shift; (b) the overlap \(a\) between the old and new density matrices, where \(\mu_{\rm opt}\) is the optimal level shift; and (c) the change in the model energy \(\Delta E_{\rm RH}\) and the actual energy \(\Delta E^{\rm RH}_{\rm SCF}\).<br />

As we insist on a controlled, continuous development of the MOs from those occupied in \(\bar D\), the level-shift parameter should be restricted to the linear regime \(\mu \ge \mu_{\min}\). To determine the optimal level-shift parameter \(\mu_{\rm opt}\), we therefore first establish the onset of linearity \(\mu_{\min}\) by linear extrapolation from two Fock/KS matrix diagonalizations. These give the two \(\Delta\varepsilon_{ai}\) values marked by crosses and the linearly extrapolated \(\mu_{\min}\) value marked with an arrow. Next, since a small \(\mu\) in the linear interval corresponds to a large step, we investigate whether \(\mu_{\min}\) is acceptable by checking whether \(a(\mu_{\min}) > a_{\min}\). If this step is too long, we backtrack by increasing \(\mu\) using an inexact line search until an acceptable value \(\mu_{\rm opt}\) with \(a(\mu_{\rm opt}) > a_{\min}\) is found, which requires a few additional Fock/KS matrix diagonalizations. In Fig. 1(b), the accepted \(\mu_{\rm opt}\) is marked with an arrow.<br />
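The backtracking on \(\mu\) can be illustrated with a toy version, which is not the authors' procedure. The sketch assumes S = I, two basis functions, one occupied orbital, an idempotent \(\bar D\) = diag(1, 0), and a hypothetical matrix F(\(\bar D\)). Each trial diagonalizes the level-shifted matrix F(\(\bar D\)) − μS\(\bar D\)S of Eq. 19 in closed form, evaluates a(μ) as in Eq. 22, and raises μ from a given `mu_min` until a(μ) ≥ a<sub>min</sub>. The growing increment is a crude stand-in for the inexact line search. The extrapolation that determines μ<sub>min</sub> from the HOMO-LUMO gap is not reproduced; μ<sub>min</sub> is simply an input.<br />

```python
import math


def lowest_eigvec(a, b, c):
    """Normalized eigenvector of [[a, b], [b, c]] belonging to the lower eigenvalue."""
    lam = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if abs(b) > 1e-14:
        x, y = b, lam - a
    else:
        x, y = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = math.hypot(x, y)
    return x / norm, y / norm


def overlap(f, mu):
    """a(mu) for Dbar = diag(1, 0), S = I: D(mu) = v v^T from F - mu*Dbar, so a = Tr(D Dbar) = v_1^2."""
    x, _ = lowest_eigvec(f[0][0] - mu, f[0][1], f[1][1])  # shift only the occupied diagonal
    return x * x


def choose_level_shift(f, mu_min=0.0, a_min=0.975, step=0.5):
    """Increase mu from mu_min until the new density overlaps Dbar by at least a_min."""
    mu = mu_min
    while overlap(f, mu) < a_min:
        mu += step
        step *= 2.0  # expand the trial increment (stand-in for an inexact line search)
    return mu
```

With the hypothetical F(\(\bar D\)) = [[0, 0.5], [0.5, 0.2]], the unshifted step gives a(0) ≈ 0.60. The sketch returns μ = 3.5, the first trial value with a(μ) ≥ 0.975.<br />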

For a better understanding of this step, consider the Hessian of the \(E_{\rm RH}\) energy function:<br />

\[ A^{\rm RH}_{ai,bj} = \delta_{ij}\delta_{ab}\,(\varepsilon_a - \varepsilon_i). \tag{25} \]<br />

By restricting the level-shift parameter to \(\mu \ge \mu_{\min}\), where \(\varepsilon_a^{\rm LUMO}(\mu) - \varepsilon_i^{\rm HOMO}(\mu) > 0\), we ensure that the effective Hessian is positive definite and that the model energy function \(E_{\rm RH}\) is reduced. We note that the Hessian of the true energy function \(E_{\rm SCF}\) is given by the more complicated expression<br />

\[ A^{\rm SCF}_{ai,bj} = \delta_{ij}\delta_{ab}\,(\varepsilon_a - \varepsilon_i) + 4g_{aibj} - g_{abij} - g_{ajib}. \tag{26} \]<br />

Often, the orbital-energy difference dominates the Hessian. In such cases, we expect the above step to reduce the SCF energy \(E_{\rm SCF}\) as well as the model function \(E_{\rm RH}\). In any case, when a sufficiently large level shift is added in Eq. 19, the Hessian structure of Eq. 25 becomes similar to that of the true energy function \(E_{\rm SCF}\) in Eq. 26. The steps generated from \(E_{\rm RH}\) with such level shifts will therefore have essentially the same direction as the ones generated from \(E_{\rm SCF}\).<br />

By construction, the \(E_{\rm RH}\) energy function is lowered when \(\mu\) is chosen according to the above prescription:<br />

\[ \Delta E_{\rm RH} = 2\,\mathrm{Tr}\,F(\bar D)\,(D - \bar D) < 0. \tag{27} \]<br />

Since \(E_{\rm RH}\) is only a local model of the true energy function \(E_{\rm SCF}\), the associated change in the true energy,<br />

\[ \Delta E^{\rm RH}_{\rm SCF} = E_{\rm SCF}(D) - E_{\rm SCF}(\bar D), \tag{28} \]<br />

may be either negative or positive, depending on how well \(E_{\rm RH}\) represents \(E_{\rm SCF}\) for the chosen step. However, for sufficiently small steps, \(\Delta E^{\rm RH}_{\rm SCF} < 0\), since the model function then represents the true energy well.<br />

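Let us consider the relationship between the true lowering \(\Delta E^{\rm RH}_{\rm SCF}\) and the lowering \(\Delta E_{\rm RH}\) predicted by the model function. We introduce the (presumably small) differential density matrix<br />

\[ \Delta = D - \bar D \tag{29} \]<br />

and use the identity \(\mathrm{Tr}\,A\,G(B) = \mathrm{Tr}\,B\,G(A)\), valid for symmetric matrices \(A\) and \(B\). We then find that the change in the true energy (Eq. 28) may be written in the form<br />

\[ \Delta E^{\rm RH}_{\rm SCF} = 2\,\mathrm{Tr}\,h(D - \bar D) + \mathrm{Tr}\,(\bar D + \Delta)\,G(\bar D + \Delta) - \mathrm{Tr}\,\bar D\,G(\bar D) = 2\,\mathrm{Tr}\,h\Delta + 2\,\mathrm{Tr}\,\Delta\,G(\bar D) + \mathrm{Tr}\,\Delta\,G(\Delta), \tag{30} \]<br />

which shows that the changes in the true energy and in the model energy are related as<br />

\[ \Delta E^{\rm RH}_{\rm SCF} = \Delta E_{\rm RH} + \mathrm{Tr}\,\Delta\,G(\Delta). \tag{31} \]<br />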

If the last term (which is second order in \(\Delta\)) is negligible, the energy lowering \(\Delta E_{\rm RH}\) predicted by the local model becomes equal to \(\Delta E^{\rm RH}_{\rm SCF}\). However, the correction term is positive (strictly positive in the absence of exchange). Its presence in Eq. 31 therefore shows that, for sufficiently large steps, a lowering of the model function may not lead to a lowering of the total energy. To avoid such steps, it would be useful to have an alternative prediction of \(\Delta E^{\rm RH}_{\rm SCF}\) that is less expensive than the calculation of \(\mathrm{Tr}\,\Delta\,G(\Delta)\) itself. Section II A 3 is concerned with this problem.<br />
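Equation 31 can be checked numerically with any \(G\) that is linear in \(D\) and satisfies \(\mathrm{Tr}\,A\,G(B) = \mathrm{Tr}\,B\,G(A)\). The sketch below uses a hypothetical rank-one, Coulomb-like toy operator G(D) = V Tr(VD) with a symmetric matrix V. For this operator the correction \(\mathrm{Tr}\,\Delta\,G(\Delta) = [\mathrm{Tr}(V\Delta)]^2\) is non-negative, mirroring the exchange-free case mentioned above. The matrices `H`, `V`, \(\bar D\), and D are arbitrary illustrative values with S = I.<br />

```python
V = [[0.6, 0.1], [0.1, 0.3]]    # hypothetical symmetric "interaction" matrix
H = [[-1.0, 0.2], [0.2, -0.4]]  # hypothetical one-electron matrix h


def tr(a, b):
    """Tr(AB) for 2x2 nested-list matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def add(a, b, scale=1.0):
    """Return A + scale * B."""
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]


def g_op(d):
    """Toy G(D) = V * Tr(V D): linear in D, with Tr A G(B) = Tr B G(A)."""
    t = tr(V, d)
    return [[V[i][j] * t for j in range(2)] for i in range(2)]


def energy(d):
    """E(D) = 2 Tr(hD) + Tr(D G(D)), Eq. 1 with the toy G."""
    return 2.0 * tr(H, d) + tr(d, g_op(d))


def check_eq31(dbar, d):
    """Return (true change, Eq. 28; model change, Eq. 27; correction Tr Delta G(Delta))."""
    delta = add(d, dbar, -1.0)           # Eq. 29: Delta = D - Dbar
    fbar = add(H, g_op(dbar))            # F(Dbar) = h + G(Dbar)
    de_true = energy(d) - energy(dbar)   # Eq. 28
    de_model = 2.0 * tr(fbar, delta)     # Eq. 27
    correction = tr(delta, g_op(delta))  # second-order term of Eq. 31
    return de_true, de_model, correction
```

For any pair of symmetric densities, `de_true` equals `de_model + correction` to rounding error, which is the content of Eq. 31.<br />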

To demonstrate the efficiency of the chosen level shift \(\mu_{\rm opt}\) in the global region of an SCF optimization, we have plotted \(\Delta E^{\rm RH}_{\rm SCF}\) and \(\Delta E_{\rm RH}\) as a function of \(\mu\) in Fig. 1(c) for the fourth iteration of the rhodium-complex calculation. The energy gain \(\Delta E^{\rm RH}_{\rm SCF}\) is close to optimal for the level shift \(\mu_{\rm opt}\). Increasing \(\mu\) gives a smaller energy gain. Decreasing \(\mu\) gives only a slight increase in the energy gain, and for \(\mu \lesssim 4.5\), \(\Delta E^{\rm RH}_{\rm SCF}\) is actually positive. Note also that for \(\mu < \mu_{\rm opt}\), \(\Delta E_{\rm RH}\) and \(\Delta E^{\rm RH}_{\rm SCF}\) start to differ, indicating that the importance of \(\mathrm{Tr}\,\Delta\,G(\Delta)\) increases. The step representing an RH iteration (where \(\mu = 0\)) is far too long to be trusted and results in a significant increase of the total energy.<br />

3. Prediction of the energy close to the minimum<br />

To develop a better prediction of \(\Delta E^{\rm RH}_{\rm SCF}\) than \(\Delta E_{\rm RH}\), we note that only one part of Eq. 31 cannot easily be evaluated from known Fock matrices. This is the second-order contribution from the part of \(\Delta\) that does not belong to the linear space spanned by the previous density matrices \(D_i\). To see this, we decompose the current density matrix \(D\) into two parts,<br />

\[ D = D_\parallel + D_\perp, \tag{32} \]<br />

where \(D_\parallel\) belongs to the linear space spanned by the previous density matrices and \(D_\perp\) belongs to its orthogonal complement. We then expand \(D_\parallel\) in the following manner:<br />

\[ D_\parallel(\mu) = \sum_{i=1}^{n} c_i(\mu)\,D_i, \tag{33} \]<br />

where the expansion coefficients \(c_i(\mu)\) are determined in a least-squares manner:<br />

\[ c_i(\mu) = \sum_{j=1}^{n} \left(M^{-1}\right)_{ij}\,\mathrm{Tr}\,D_j S\,D(\mu)\,S, \qquad M_{ij} = \mathrm{Tr}\,D_i S D_j S. \tag{34} \]<br />
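A sketch of Eqs. 32–34 follows. The coefficients solve the normal equations M c = b with \(M_{ij} = \mathrm{Tr}\,D_i S D_j S\) and \(b_j = \mathrm{Tr}\,D_j S D S\). Then \(D_\parallel = \sum_i c_i D_i\), and \(D_\perp = D - D_\parallel\) is orthogonal to every stored \(D_i\) in the \(S\) metric of Eq. 34. The small Gaussian-elimination helper keeps the example dependency-free. The function names and test densities are illustrative assumptions.<br />

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def s_inner(x, y, s):
    """<X|Y>_S = Tr(X S Y S), the metric used in Eq. 34."""
    p = matmul(matmul(matmul(x, s), y), s)
    return sum(p[i][i] for i in range(len(p)))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def split_density(d, previous, s):
    """Return (c, D_par, D_perp) with D = D_par + D_perp (Eq. 32), c from Eq. 34."""
    m = [[s_inner(di, dj, s) for dj in previous] for di in previous]  # M_ij
    b = [s_inner(dj, d, s) for dj in previous]                        # Tr D_j S D S
    c = solve(m, b)
    n = len(d)
    d_par = [[sum(ci * di[p][q] for ci, di in zip(c, previous)) for q in range(n)] for p in range(n)]
    d_perp = [[d[p][q] - d_par[p][q] for q in range(n)] for p in range(n)]
    return c, d_par, d_perp
```

If D already lies in the span of the stored densities, D<sub>⊥</sub> vanishes and the second-order term of Eq. 35 is zero.<br />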

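The change in the SCF energy associated with the change of density matrix from \(\bar D\) to \(D\) may be expressed as<br />

\[ \Delta E^{\rm RH}_{\rm SCF} = E_{\rm SCF}(D_\parallel) - E_{\rm SCF}(\bar D) + 2\,\mathrm{Tr}\,D_\perp F(D_\parallel) + \mathrm{Tr}\,D_\perp G(D_\perp). \tag{35} \]<br />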
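Equation 35 follows from Eq. 1 by inserting \(D = D_\parallel + D_\perp\). The derivation uses the linearity of \(G\) and the identity \(\mathrm{Tr}\,A\,G(B) = \mathrm{Tr}\,B\,G(A)\) to combine the two cross terms. The last equality for \(F(D_\parallel)\) assumes HF theory, where \(G\) is linear. It shows why only the term quadratic in \(D_\perp\) needs new Fock/KS builds.<br />

```latex
E_{\rm SCF}(D_\parallel + D_\perp)
  = 2\,\mathrm{Tr}\,h D_\parallel + 2\,\mathrm{Tr}\,h D_\perp
  + \mathrm{Tr}\,D_\parallel G(D_\parallel) + 2\,\mathrm{Tr}\,D_\perp G(D_\parallel)
  + \mathrm{Tr}\,D_\perp G(D_\perp) \\
  = E_{\rm SCF}(D_\parallel) + 2\,\mathrm{Tr}\,D_\perp F(D_\parallel)
  + \mathrm{Tr}\,D_\perp G(D_\perp),
\qquad
F(D_\parallel) = h + G(D_\parallel) = h + \sum_{i=1}^{n} c_i \big[F(D_i) - h\big].
```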

Ignoring the small term quadratic in \(D_\perp\), we may now predict the change in the SCF energy at little cost from the expression<br />

\[ \Delta E^{\rm P}_{\rm SCF} = E_{\rm SCF}(D_\parallel) - E_{\rm SCF}(\bar D) + 2\,\mathrm{Tr}\,D_\perp F(D_\parallel), \tag{36} \]<br />

using only the density matrices and Fock/KS matrices of the previous iterations. This formula can give an accurate estimate of \(\Delta E^{\rm RH}_{\rm SCF}\), particularly in the later parts of the iteration sequence, where the space spanned by the densities of the preceding RH iterations is large. In the following, we shall see how this prediction may be used to determine the level shift when \(\mu_{\min} \le 0\) and \(a(0) > a_{\min}\).<br />

To illustrate how \(\Delta E^{\rm P}_{\rm SCF}\) is used to find the level-shift parameter, consider its determination in the ninth iteration of the rhodium-complex calculation of Sec. III. The plot of the HOMO-LUMO gap in Fig. 2(a) shows that the allowed level-shift interval is \(\mu \ge 0\). In Fig. 2(b), we have plotted the overlap \(a(\mu)\) as a function of \(\mu\). Since \(a(0) > a_{\min}\), we should, according to the discussion in Sec. II A 2, use \(\mu_{\rm opt} = 0\) to determine the step. In short, considerations based on the HOMO-LUMO gap and on the overlap with the averaged density matrix indicate that the next density matrix should be determined from the standard, unshifted RH equations.<br />

However, from the nine density matrices of the previous RH iterations, we can use \(\Delta E^{\rm P}_{\rm SCF}(\mu)\) to predict the change \(\Delta E^{\rm RH}_{\rm SCF}(\mu)\) more accurately than with \(\Delta E_{\rm RH}(\mu)\). Indeed, Fig. 2(c) shows that \(\Delta E^{\rm P}_{\rm SCF}(\mu)\) provides a good global representation of \(\Delta E^{\rm RH}_{\rm SCF}(\mu)\), with a minimum close to the minimum of \(\Delta E^{\rm RH}_{\rm SCF}(\mu)\). By contrast, the local model \(\Delta E_{\rm RH}(\mu)\)<br />

gives a minimum at \(\mu = 0\). Clearly, \(\mu = 0\) should be avoided in the calculation, since it would lead to an increase in the SCF energy. Instead, the value of the level-shift parameter that corresponds to the minimum of \(\Delta E^{\rm P}_{\rm SCF}\) (denoted by \(\mu_{\rm opt}\)) is chosen for the calculation of the next density matrix.<br />

This procedure may be summarized as follows. If \(\mu_{\min} \le 0\) and \(a(0) > a_{\min}\), then we calculate the predicted energies \(\Delta E^{\rm P}_{\rm SCF}(0)\) and \(\Delta E^{\rm P}_{\rm SCF}(\mu)\) with \(\mu > 0\). If \(\Delta E^{\rm P}_{\rm SCF}(0) < \Delta E^{\rm P}_{\rm SCF}(\mu)\), then we use \(D(0)\). Otherwise, we estimate the minimum \(\mu_{\rm opt}\) of \(\Delta E^{\rm P}_{\rm SCF}(\mu)\) by an inexact line search and use the density matrix \(D(\mu_{\rm opt})\) at this minimum.<br />

FIG. 2. For the ninth iteration of the rhodium calculation described in Sec. III, we have displayed as a function of the level-shift parameter \(\mu\): (a) the HOMO-LUMO gap \(\Delta\varepsilon_{ai}\), where \(\mu_{\min} \le 0\); (b) the overlap \(a\) between the old and new density matrices, where \(a_{\min}\) is the smallest accepted overlap; and (c) the change in the model energy \(\Delta E_{\rm RH}\), the actual energy \(\Delta E^{\rm RH}_{\rm SCF}\), and the predicted energy \(\Delta E^{\rm P}_{\rm SCF}\). \(\mu_{\rm opt}\) is found at the minimum of \(\Delta E^{\rm P}_{\rm SCF}(\mu)\).<br />

B. Density-subspace minimization<br />

1. The DSM energy function<br />

Let us assume that we have carried out \(n\) RH iterations and that we have kept all previous density matrices \(D_i\) and the corresponding Fock matrices \(F_i\). We would now like to construct an optimal density as a linear combination of the densities from these iterations according to Eq. 9,<br />

\[ \bar D = \sum_{i=1}^{n} c_i D_i. \tag{37} \]<br />

Ideally, this averaged density should also fulfill the conditions of Eqs. 3–5. The symmetry condition (Eq. 3) is trivially satisfied, since the averaged density of Eq. 37 is a linear combination of symmetric density matrices. The trace condition (Eq. 4) is also easily taken care of by imposing the restriction<br />

\[ \sum_{i=1}^{n} c_i = 1 \tag{38} \]<br />

on the expansion coefficients:<br />

\[ \mathrm{Tr}\,\bar DS = \sum_{i=1}^{n} c_i\,\mathrm{Tr}\,D_i S = \frac{N}{2}. \tag{39} \]<br />

By contrast, the idempotency condition (Eq. 5) cannot be imposed on the averaged density matrix. However, the idempotency may be significantly improved if, instead of working with \(\bar D\), we work with the purified density matrix<sup>6</sup><br />

\[ \tilde D = 3\bar DS\bar D - 2\bar DS\bar DS\bar D, \tag{40} \]<br />

as proposed by Nunes and Vanderbilt.<sup>7</sup> The electronic energy may be expressed in terms of the purified average density matrix as<br />

\[ E(\tilde D) = 2\,\mathrm{Tr}\,h\tilde D + \mathrm{Tr}\,\tilde D\,G(\tilde D). \tag{41} \]<br />

We note that the purified density is correct to first order in<br />

the expansion coefficients c i and that E(D˜ ) thus contains<br />

errors through second order in c i . To determine the best<br />

average density matrix Eq. 37, we shall minimize Eq. 41<br />

with respect to the expansion coefficients c i subject to the<br />

condition Eq. 38.<br />

One problem we encounter when minimizing Eq. 41 is<br />

that new Fock matrices F(D˜ ) need to be evaluated. To avoid<br />

this problem, we shall use an approximate form of Eq. 41.<br />

Since the purified density matrix D˜ is close to the original<br />

density matrix D¯, we can write it as<br />

D˜ D¯,<br />

42<br />

where is the correction term. Inserting Eq. 42 into Eq.<br />

41, we obtain<br />

E2 TrhD¯Tr D¯GD¯2 Trh<br />

2 TrGD¯Tr G.<br />

43<br />

Since is small, we may ignore the term quadratic in and<br />

arrive at the density-subspace minimization DSM energy<br />

function<br />

E DSM c2 TrhD¯Tr D¯GD¯2 Trh2 TrGD¯<br />

ED¯2 TrFD¯D˜ D¯.<br />

44<br />
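The practical point of Eqs. (44)–(46) is that $E_{\mathrm{DSM}}(c)$ can be evaluated from stored quantities only. The NumPy sketch below illustrates this under several assumptions. The one-electron matrix `h`, the linear two-electron operator `G(D) = K D K` and the orthonormal basis ($S = 1$) are all toy choices. `G` was picked only because it makes $\mathrm{Tr}\,D_i G(D_j)$ symmetric in $i$ and $j$, as the real Coulomb and exchange terms are. The function names are hypothetical and are not taken from the paper or from DALTON. The sketch makes two checks:

- it compares the stored-matrix evaluation with a direct evaluation of Eq. (44);
- it confirms that the purification of Eq. (40) reduces the idempotency error of an averaged density.

```python
import numpy as np

def purify(D, S):
    """Purified density of Eq. (40): 3 DSD - 2 DSDSD."""
    DS = D @ S
    return 3.0 * DS @ D - 2.0 * DS @ DS @ D

def dsm_energy(c, energies, fockmats, densities, S):
    """E_DSM(c) of Eq. (44) from stored E_SCF(D_i), F_i and D_i only.

    Assumes sum(c) == 1 (Eq. 38), so that F(Dbar) = sum_i c_i F_i.
    """
    c = np.asarray(c, dtype=float)
    n = len(c)
    # Eq. (45): EDIIS-type expression for E(Dbar); no new Fock build needed
    e_bar = float(c @ np.asarray(energies))
    for i in range(n):
        for j in range(n):
            dF = fockmats[i] - fockmats[j]
            dD = densities[i] - densities[j]
            e_bar -= 0.5 * c[i] * c[j] * np.trace(dF @ dD)
    # Eq. (46): the double, triple and quadruple sums collapse to traces
    # with Fbar = sum_i c_i F_i and Dbar = sum_i c_i D_i
    D_bar = sum(ci * Di for ci, Di in zip(c, densities))
    F_bar = sum(ci * Fi for ci, Fi in zip(c, fockmats))
    return e_bar + 2.0 * np.trace(F_bar @ (purify(D_bar, S) - D_bar))

# --- toy model (assumption, for illustration only) ---
rng = np.random.default_rng(0)
nbas, nocc = 4, 2
S = np.eye(nbas)                                   # orthonormal basis
A = rng.standard_normal((nbas, nbas)); h = A + A.T
B = rng.standard_normal((nbas, nbas)); K = 0.3 * (B + B.T)

def G(D):                  # hypothetical linear "two-electron" operator
    return K @ D @ K

def fock(D):
    return h + G(D)

def energy(D):             # E = 2 Tr hD + Tr D G(D), as in Eq. (41)
    return 2.0 * np.trace(h @ D) + np.trace(D @ G(D))

def idempotency_error(D):
    return np.linalg.norm(D @ S @ D - D)

# three nearby idempotent densities D_i = C_i C_i^T with Tr D_i S = nocc
X0 = rng.standard_normal((nbas, nocc))
densities = []
for _ in range(3):
    C, _r = np.linalg.qr(X0 + 0.05 * rng.standard_normal((nbas, nocc)))
    densities.append(C @ C.T)
energies = [energy(D) for D in densities]
fockmats = [fock(D) for D in densities]

c = np.array([0.2, 0.3, 0.5])                      # sums to 1, Eq. (38)
D_bar = sum(ci * Di for ci, Di in zip(c, densities))
e_dsm = dsm_energy(c, energies, fockmats, densities, S)
# direct evaluation of Eq. (44), which needs the new Fock matrix F(Dbar)
e_direct = energy(D_bar) + 2.0 * np.trace(fock(D_bar) @ (purify(D_bar, S) - D_bar))
```

Because $\sum_i c_i = 1$, the triple and quadruple sums of Eq. (46) reduce to traces involving $\bar{F} = \sum_i c_i F_i$ and $\bar{D}$, which is how the sketch evaluates them. Writing out the explicit index sums gives the same numbers at a higher cost.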

Since $\delta$ is first order in the expansion coefficients $c_i$, the DSM energy differs from the true energy to second and higher orders in $c_i$. The first contribution to the DSM energy function may, for example, be evaluated using the energy expression of the EDIIS algorithm (Ref. 5),

$$E(\bar{D}) = \sum_i c_i E_{\mathrm{SCF}}(D_i) - \frac{1}{2}\sum_{ij} c_i c_j\,\mathrm{Tr}\,(F_i - F_j)(D_i - D_j). \tag{45}$$

Using Eq. (40), we find that the second contribution may be evaluated as

$$2\,\mathrm{Tr}\,F(\bar{D})(\tilde{D} - \bar{D}) = -2\sum_{ij} c_i c_j\,\mathrm{Tr}\,F_i D_j + 6\sum_{ijk} c_i c_j c_k\,\mathrm{Tr}\,F_i D_j S D_k - 4\sum_{ijkl} c_i c_j c_k c_l\,\mathrm{Tr}\,F_i D_j S D_k S D_l. \tag{46}$$

All contributions to the DSM energy function are therefore easily calculated from the previous density and Fock/KS matrices.

2. The trust-region DSM minimization

We minimize the DSM energy function by the trust-region method (Ref. 12). We thus consider the second-order Taylor expansion of the DSM energy in Eq. (44) about $c_0$. Introducing the step vector

$$\Delta c = c - c_0, \tag{47}$$

we obtain

$$E^{(2)}_{\mathrm{DSM}}(\Delta c) = E_0 + \Delta c^{T} g + \frac{1}{2}\Delta c^{T} H \Delta c, \tag{48}$$

where the energy, gradient, and Hessian at the expansion point are given by

$$E_0 = E(c_0), \qquad g = \left.\frac{\partial E(c)}{\partial c}\right|_{c=c_0}, \qquad H = \left.\frac{\partial^2 E(c)}{\partial c^2}\right|_{c=c_0}. \tag{49}$$

As starting point $c_0$, we choose the density matrix with the lowest energy $E_{\mathrm{SCF}}(D_i)$, usually from the last RH iteration. The trace condition Eq. (38) implies

$$\sum_{i=1}^{n} \Delta c_i = 0. \tag{50}$$

We also introduce a trust region of radius $h$ for $E^{(2)}_{\mathrm{DSM}}(\Delta c)$ and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length $h$ in the S-metric norm of Eq. (34),

$$\|\Delta c\|_S^2 = \sum_{ij} \Delta c_i M_{ij} \Delta c_j = h^2. \tag{51}$$

Introducing the undetermined multipliers $\nu$ and $\mu$ for the trace and step-size constraints, we arrive at the following Lagrangian for minimization on the boundary of the trust region:

$$L(\Delta c, \nu, \mu) = E_0 + \Delta c^{T} g + \frac{1}{2}\Delta c^{T} H \Delta c - \nu\,\Delta c^{T}\mathbf{1} - \frac{1}{2}\mu\left(\Delta c^{T} M \Delta c - h^2\right), \tag{52}$$

where $\mathbf{1}$ is a column vector with elements equal to 1. Differentiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations

$$\frac{\partial L}{\partial \Delta c} = g + H\Delta c - \mu M\Delta c - \nu\mathbf{1} = 0, \tag{53}$$

$$\frac{\partial L}{\partial \nu} = -\Delta c^{T}\mathbf{1} = 0, \tag{54}$$

$$\frac{\partial L}{\partial \mu} = -\frac{1}{2}\left(\Delta c^{T} M \Delta c - h^2\right) = 0. \tag{55}$$

The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:

$$\begin{pmatrix} H - \mu M & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix}\begin{pmatrix} \Delta c \\ -\nu \end{pmatrix} = -\begin{pmatrix} g \\ 0 \end{pmatrix}, \tag{56}$$

where the multiplier $\mu$ is iteratively adjusted until the step is to the boundary of the trust region, Eq. (55). The step-length restriction may be lifted by setting $\mu = 0$, as needed for steps inside the trust region.

To understand the behavior of the step-length function, we consider first the generalized eigenvalue problem

$$\begin{pmatrix} H & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix}\begin{pmatrix} v \\ \omega \end{pmatrix} = \lambda\begin{pmatrix} M & \mathbf{0} \\ \mathbf{0}^{T} & \epsilon \end{pmatrix}\begin{pmatrix} v \\ \omega \end{pmatrix}, \tag{57}$$

where $\mathbf{0}$ is a column vector with zero elements, $\epsilon$ is a small positive constant, and the eigenvector is normalized such that

$$v^{T}v + \omega^{2} = 1. \tag{58}$$

We first note that, for a finite $\lambda$, $v \neq 0$. Next, carrying out the block multiplications in Eq. (57), we obtain

$$Hv + \omega\mathbf{1} = \lambda M v, \tag{59}$$

$$\mathbf{1}^{T}v = \lambda\epsilon\omega, \tag{60}$$

which upon elimination of $\omega$ from the first equation yields the relation

$$\lambda\epsilon H v + (\mathbf{1}^{T}v)\mathbf{1} = \lambda^{2}\epsilon M v. \tag{61}$$

Since $(\mathbf{1}^{T}v)\mathbf{1}$ is finite, we conclude that, as $\epsilon$ tends to zero, the eigenvalue $\lambda$ tends to either plus or minus infinity, $\lambda \propto \pm\epsilon^{-1/2}$. Next, substituting these values of $\lambda$ into Eq. (60), we find that $v$ tends to the zero vector with elements proportional to $\epsilon^{1/2}$ and that $\omega$, because of the normalization Eq. (58), tends to 1. In short, the eigenvalue problem Eq. (57) with $\epsilon \to 0$ has two eigenvalues $\lambda \to \pm\infty$, whose eigenvectors have zero elements except for the last element, which is equal to 1. Finally, invoking the Hylleraas–Undheim interlace theorem (Refs. 10 and 11), we conclude that the remaining $n-1$ finite eigenvalues of Eq. (57) bisect the $n$ eigenvalues of the reduced eigenvalue problem

$$Hv = \sigma M v. \tag{62}$$
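The limiting behavior derived above can be checked numerically. The NumPy sketch below is an illustration under assumptions: the diagonal Hessian, the unit metric, the value of $\epsilon$ and the helper names are arbitrary choices, not data from the paper. It solves the augmented problem of Eq. (57) by a Cholesky reduction of the block-diagonal right-hand matrix and compares the result with the reduced problem of Eq. (62).

```python
import numpy as np

def generalized_eigvalsh(A, B):
    """Ascending eigenvalues of A x = lam B x, A symmetric, B positive definite."""
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)

def augmented_matrices(H, M, eps):
    """Left- and right-hand matrices of the augmented problem, Eq. (57)."""
    n = H.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = H
    A[:n, n] = 1.0          # column of ones
    A[n, :n] = 1.0          # row of ones; A[n, n] stays 0
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = M
    B[n, n] = eps           # small positive constant epsilon
    return A, B

H = np.diag([2.0, 3.0, 5.0])    # illustrative DSM Hessian
M = np.eye(3)                   # illustrative S metric
eps = 1e-8

lam = generalized_eigvalsh(*augmented_matrices(H, M, eps))   # n + 1 = 4 values
sigma = generalized_eigvalsh(H, M)                           # reduced problem, Eq. (62)
```

For this choice, two eigenvalues grow like $\pm\sqrt{3/\epsilon}$. The two finite eigenvalues approach the roots of $\sum_i 1/(\lambda - h_i) = 0$, that is, of $3\lambda^2 - 20\lambda + 31 = 0$ (about 2.451 and 4.215). These lie between the reduced eigenvalues 2, 3 and 5, as the interlace theorem requires.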


Let us now consider the step length $\|\Delta c(\mu)\|_S$ as a function of $\mu$. In the diagonal representation of the augmented matrices of Eq. (57), we may write the linear equations Eq. (56) in the following uncoupled form:

$$(h_i - \mu m_i)\,\xi_i = -\gamma_i, \qquad i = 1, 2, 3, \ldots, n+1. \tag{63}$$

Here, $h_i$ and $m_i$ are the diagonal elements of the Hessian and metric matrices, respectively, of the generalized eigenvalue problem Eq. (57), whereas $\xi_i$ and $\gamma_i$, respectively, are the corresponding elements of the solution and gradient vectors of Eq. (56). Since the last element of the gradient vector in Eq. (56) is zero, the gradient vector has no contributions from the eigenvectors with infinite eigenvalues,

$$\gamma_1 = \gamma_{n+1} = 0, \qquad \lambda_1 = -\infty, \qquad \lambda_{n+1} = +\infty, \tag{64}$$

assuming that the eigenvalues are sorted in increasing order, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{n+1}$. In the diagonal representation, therefore, we may write the squared step norm in the form

$$\|\Delta c(\mu)\|_S^{2} = \sum_{i=2}^{n} \frac{m_i\,\gamma_i^{2}}{(h_i - \mu m_i)^{2}}. \tag{65}$$

From this expression, we note that the step function consists of $n$ branches separated by $n-1$ asymptotes at the finite eigenvalues $\lambda_i = h_i/m_i$. Moreover, it increases monotonically from zero to infinity as $\mu$ increases from minus infinity and approaches the lowest finite eigenvalue $\lambda_2$. Therefore, there is always one and only one $\mu < \lambda_2$ that gives rise to a step of length $h$. As shown by Fletcher (Ref. 12), this value of $\mu$ corresponds to the global minimum on the boundary of the trust region.

In practice, we cannot easily determine the eigenvalues $\lambda_i$ of the augmented eigenvalue problem Eq. (57). Instead, we determine the eigenvalues $\sigma_i$ of the reduced problem Eq. (62) and restrict our search for $\mu$ to the smaller monotonic interval $\mu < \sigma_1$. Since $\sigma_1 \le \lambda_2$, it is possible that no solution exists in this reduced interval. Mostly, however, this restriction is mild since the two eigenvalues are usually close. If no solution is found, we choose instead the slightly shorter step obtained with $\mu = \sigma_1$.
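The search for $\mu$ described above can be sketched in NumPy as follows, assuming that the gradient $g$, Hessian $H$ and metric $M$ of Eqs. (48)–(51) are already available as arrays. The text specifies only three things:

- the search interval $\mu \le \min(\sigma_1, 0)$;
- the target step length $h$;
- the $\mu = 0$ shortcut for steps inside the trust region.

The bracketing-plus-bisection strategy, the tolerances and the function names are our own illustrative choices.

```python
import numpy as np

def constrained_step(g, H, M, mu):
    """Solve the level-shifted, trace-constrained equations of Eq. (56)."""
    n = len(g)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = H - mu * M
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.concatenate([-g, [0.0]])
    sol = np.linalg.solve(A, rhs)
    return sol[:n]                    # sol[n] holds -nu, the trace multiplier

def s_norm(dc, M):
    """Step length in the S metric, Eq. (51)."""
    return float(np.sqrt(dc @ M @ dc))

def lowest_reduced_eigenvalue(H, M):
    """sigma_1 of the reduced problem H v = sigma M v, Eq. (62)."""
    Linv = np.linalg.inv(np.linalg.cholesky(M))
    return float(np.linalg.eigvalsh(Linv @ H @ Linv.T)[0])

def trust_region_dsm_step(g, H, M, h, tol=1e-12, max_iter=200):
    """Return (step, mu) with the step inside or on the trust region of radius h."""
    sigma1 = lowest_reduced_eigenvalue(H, M)
    if sigma1 > 0.0:
        dc = constrained_step(g, H, M, 0.0)
        if s_norm(dc, M) <= h:
            return dc, 0.0            # Newton step already inside the region
    upper = min(sigma1, 0.0)          # search only mu < min(sigma_1, 0)
    width = 1.0
    lower = upper - width
    # the step length grows monotonically with mu below sigma_1, so move
    # left until the step is no longer than h
    while s_norm(constrained_step(g, H, M, lower), M) > h:
        width *= 2.0
        lower = upper - width
    for _ in range(max_iter):         # bisection for ||dc(mu)||_S = h
        mid = 0.5 * (lower + upper)
        if s_norm(constrained_step(g, H, M, mid), M) > h:
            upper = mid
        else:
            lower = mid
        if upper - lower < tol:
            break
    return constrained_step(g, H, M, lower), lower

H = np.diag([2.0, 3.0, 5.0])          # illustrative DSM Hessian
M = np.eye(3)                         # illustrative S metric
g = np.array([1.0, -2.0, 0.5])        # illustrative DSM gradient
dc, mu = trust_region_dsm_step(g, H, M, h=0.1)
dc_in, mu_in = trust_region_dsm_step(1e-4 * g, H, M, h=0.1)
```

With these numbers, the trace-constrained Newton step has a length of about 0.84, which is outside the trust region of radius 0.1, so a negative $\mu$ is returned. Scaling the gradient down by $10^{-4}$ puts the Newton step inside the region, and $\mu = 0$ is returned. If the step just below $\sigma_1$ is still shorter than $h$, the bisection ends next to $\sigma_1$. This mirrors the fallback described in the text.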

To illustrate how the level-shift parameter $\mu$ in Eq. (56) is determined, we consider the first [Fig. 3(a)] and third [Fig. 3(b)] DSM steps in the eighth iteration of the rhodium-complex calculation in Sec. III. We have plotted the step-length function $\|\Delta c(\mu)\|_S$ as a function of $\mu$. The plots consist of a series of branches between asymptotes where $\mu$ makes the matrix on the left-hand side of Eq. (56) singular. The lowest eigenvalue $\sigma_1$ is marked with a vertical dashed line in Figs. 3(a) and 3(b). For minimization, the level-shift parameter is chosen in the interval $\mu \le \min(\sigma_1, 0)$, where $\sigma_1$ is the lowest eigenvalue of Eq. (62). The proper $\mu$ value is found where the step-length function crosses the line representing the trust radius $h$, as marked with a cross in Fig. 3(a). If the step that minimizes $E^{(2)}_{\mathrm{DSM}}$ is inside the trust region, $\mu = 0$ is chosen, as marked with a cross in Fig. 3(b).

The trust region is updated during the iterative procedure.

FIG. 3. The step-length function $\|\Delta c(\mu)\|_S$ is plotted as a function of $\mu$ for the (a) first and (b) third DSM step in the eighth iteration of the rhodium calculation described in Sec. III. The trust radius $h$ is represented by a horizontal line. The proper $\mu$ value is marked with a cross.

3. Global optimization of the DSM function

The optimization of the $E_{\mathrm{DSM}}$ energy is carried out in the usual manner, requiring several trust-region steps. Each step involves the construction of the gradient $g$ and the Hessian $H$ and the solution of the modified level-shifted Newton equations, Eq. (56). After $p$ iterations, the density is calculated from the coefficients

$$c_p = c^{(0)} + \sum_{i=1}^{p} \Delta c_i. \tag{66}$$

However, since $E_{\mathrm{DSM}}$ itself is a rather crude model of the true energy function $E_{\mathrm{SCF}}$, it resembles $E_{\mathrm{SCF}}$ only in a small region about the initial point $c^{(0)}$. The DSM iterations are therefore terminated when the total step length $\|c_p - c^{(0)}\|$ exceeds some preset value $k$. If a minimum of $E_{\mathrm{DSM}}$ is found inside the trust region $\|c_p - c^{(0)}\| \le k$, then the step to the minimum is taken and the iterations are terminated. This is often the case.

Occasionally, the iterations start where the lowest eigenvalue of the Hessian in Eq. (62) is negative. In the course of the iterations, the Hessian can become positive definite and a minimum is reached. In a few cases, however, a negative Hessian eigenvalue may persist, changing little from iteration to iteration. In our experience, a step along the eigenvector corresponding to the negative eigenvalue cannot be trusted. This direction is therefore projected out from the step, and the DSM function is minimized in the orthogonal subspace.

As an illustration, consider the first DSM step of the tenth SCF iteration of the rhodium-complex calculation in Sec. III. In Fig. 4, we have, for comparison, plotted the step-length functions with the negative component kept and projected out. The level shifts resulting from the two situations are marked with crosses in Fig. 4. The level shift used in the DSM optimization is, in this particular case, $\mu = 0$.

FIG. 4. The step-length function $\|\Delta c(\mu)\|_S$ is plotted as a function of $\mu$ with the direction corresponding to the negative Hessian eigenvalue kept (solid line) and projected out (dashed line), respectively. The $\mu$ values resulting from the two situations are marked with crosses.

When the trust-region minimization is terminated, a new RH iteration is initiated by constructing a new density and associated Fock matrix

$$\bar{D} = \sum_{i=1}^{n} c_i D_i, \qquad \bar{F} = \sum_{i=1}^{n} c_i F(D_i), \tag{67}$$

where we have used the fact that the Fock matrix is linear in the density. By construction, $E_{\mathrm{DSM}}(c)$ is lowered at each iteration of the trust-region minimization. The total energy lowering at the $p$th iteration is given by

$$\Delta E_{\mathrm{DSM}} = E_{\mathrm{DSM}}(c_p) - E_{\mathrm{DSM}}(c^{(0)}). \tag{68}$$

Since $E_{\mathrm{DSM}}$ is a local model of the true energy $E_{\mathrm{SCF}}$, the lowering of $E_{\mathrm{DSM}}$ will also lead to a lowering of $E_{\mathrm{SCF}}$, provided the total step is sufficiently short to be in the local region.

4. Relationship to the DIIS method

The optimal density has previously been determined using the DIIS scheme of Pulay (Ref. 4). In the DIIS method, the improved density matrix is obtained as a linear combination of the previous density matrices. The expansion coefficients are determined by minimizing the norm of the error vector, using the gradients of the previous iterations as error vectors. To highlight the difference between TRDSM and DIIS, we give below an alternative derivation of the DIIS algorithm.

In an SCF calculation, the electronic gradient with the averaged density matrix $\bar{D}$ in Eq. (37) may be expressed in the form (Ref. 3)

$$g(\bar{D}) = 4\left[\bar{D}SF(\bar{D}) - F(\bar{D})S\bar{D}\right]. \tag{69}$$

To determine the best linear combination of densities $D_i$, we minimize the squared norm of the gradient,

$$\|g(\bar{D})\|^{2} = \mathrm{Tr}\,g(\bar{D})^{T}g(\bar{D}) = 16\,\mathrm{Tr}\,X^{T}X, \qquad X = \bar{D}SF(\bar{D}) - F(\bar{D})S\bar{D}. \tag{70}$$

FIG. 5. The convergence of calculations on the rhodium complex using the AhlrichsVDZ basis (Ref. 16) combined with STO-3G for Rh. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF methods as a function of the iteration number. Furthermore, results are given where DIIS is applied after nine TRSCF iterations.

Inserting the expansion Eq. (37) and using $F(\bar{D}) = \sum_j c_j F(D_j)$, we obtain a quartic polynomial in $c_i$,

$$\|g(\bar{D})\|^{2} = \mathrm{Tr}\,Y^{T}Y, \qquad Y = \sum_i c_i\,g(D_i) + 4\sum_{i,j} c_i c_j\Big(D_i S\,[F(D_j) - F(D_i)] - [F(D_j) - F(D_i)]\,S D_i\Big). \tag{71}$$

To simplify this expression, we neglect all cubic and quartic terms:

$$\|g(\bar{D})\|^{2}_{\mathrm{app}} = \sum_{i,j} c_i c_j\,\mathrm{Tr}\,g(D_i)^{T}g(D_j). \tag{72}$$

Optimization of Eq. (72) subject to the constraint Eq. (38) gives the DIIS expression of the expansion coefficients in Eq. (37).
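For comparison with the DSM step, here is a minimal NumPy sketch of the DIIS step as derived above. The gradients of Eq. (69) serve as error vectors. Minimizing Eq. (72) under the constraint Eq. (38) then reduces to a small bordered linear system, with the products $\mathrm{Tr}\,g(D_i)^{T}g(D_j)$ computed as Frobenius inner products. The function names and the test matrices are illustrative assumptions. Unlike the DSM step, nothing here uses energy or Hessian information.

```python
import numpy as np

def scf_gradient(D, F, S):
    """Electronic gradient of Eq. (69): 4 (D S F - F S D)."""
    return 4.0 * (D @ S @ F - F @ S @ D)

def diis_coefficients(error_vectors):
    """Minimize sum_ij c_i c_j Tr e_i^T e_j subject to sum_i c_i = 1 (Eqs. 72, 38)."""
    n = len(error_vectors)
    B = np.empty((n, n))
    for i, ei in enumerate(error_vectors):
        for j, ej in enumerate(error_vectors):
            B[i, j] = np.sum(ei * ej)      # Frobenius inner product = Tr e_i^T e_j
    # stationarity of the Lagrangian: B c - lam * 1 = 0, together with 1^T c = 1
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = B
    A[:n, n] = -1.0
    A[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    return np.linalg.solve(A, rhs)[:n]

# a density that occupies the lowest orbital of F commutes with F: zero gradient
S = np.eye(3)
F = np.diag([-1.0, 0.5, 2.0])
D = np.diag([1.0, 0.0, 0.0])
g_conv = scf_gradient(D, F, S)

# two opposite error vectors are cancelled by equal weights
e1 = np.array([[0.0, 1.0], [-1.0, 0.0]])
c_pair = diis_coefficients([e1, -e1])

# random error vectors: the optimal combination is never worse than any single one
rng = np.random.default_rng(1)
errs = [rng.standard_normal((3, 3)) for _ in range(3)]
c_rand = diis_coefficients(errs)
combined = sum(ci * ei for ci, ei in zip(c_rand, errs))
```

Each single previous iterate ($c = e_k$) satisfies Eq. (38). The DIIS combination therefore never has a larger error norm than the best single error vector. It can, however, still point in a poor direction when the Hessian is indefinite, as the rhodium example below shows.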

III. APPLICATIONS

In this section, we examine the convergence characteristics of the TRSCF algorithm. First, we consider a rhodium-complex optimization as an example of a difficult case; next, as a simpler case, we consider a calculation on H₂O with the OH bond lengths stretched to double length. For comparison, we also give the convergence characteristics of the DIIS algorithm (Ref. 4) and the quadratically convergent restricted-step Hartree–Fock (QRHF) method (Refs. 13 and 14). All calculations are carried out using a local version of the DALTON program package (Ref. 17).

A. The rhodium complex calculation

In Fig. 5, we have plotted the error in the energy at each iteration of TRSCF, DIIS, and QRHF optimizations of the rhodium complex with the geometry specified in Table I, using the AhlrichsVDZ basis (Ref. 16) combined with STO-3G on Rh. The starting orbitals have been obtained from diagonalizing the one-electron Hamiltonian.

Clearly, the QRHF and DIIS methods do not work in this case. In particular, the DIIS method is unable to handle the global part of the optimization, where the initially indefinite Hessian changes its structure and becomes positive definite. Since the DIIS method relies solely on gradient information, it does not see the negative eigenvalues and produces steps that may or may not be in the right direction, leading to divergence. Moreover, in this DIIS calculation, no level shifts have been applied in the RH part of the optimization, again leading to steps in the wrong direction. In short, the DIIS method cannot be used for optimizations as complex as the rhodium calculation. However, if the DIIS method is started after the SCF local region has been reached by the TRSCF algorithm, then the DIIS algorithm converges nicely since the Hessian has the correct structure. In Fig. 5, we have also plotted the errors in a calculation where the DIIS method is started after nine TRSCF iterations. It then converges in roughly the same manner as the pure TRSCF method.

TABLE I. Geometry of the rhodium complex.

| Atom | x | y | z |
|------|---|---|---|
| Cl | 2.783200 | 0.000000 | 0.000000 |
| C | 0.000000 | 1.750000 | 0.000000 |
| C | 0.000000 | 1.750000 | 0.000000 |
| C | 2.510000 | 1.247077 | 0.000000 |
| C | 2.510000 | 1.247077 | 0.000000 |
| C | 3.960000 | 1.247077 | 0.000000 |
| C | 3.960005 | 1.247074 | 0.000000 |
| C | 4.685005 | 0.008663 | 0.000000 |
| C | 6.585566 | 1.381712 | 0.000000 |
| C | 7.224161 | 0.912908 | 0.000000 |
| H | 1.802335 | 2.074803 | 0.000000 |
| H | 1.965500 | 2.190178 | 0.000000 |
| H | 4.323007 | 2.273792 | 0.000000 |
| H | 4.504500 | 2.190178 | 0.000000 |
| H | 6.215281 | 1.889842 | 0.889165 |
| H | 6.215281 | 1.889842 | 0.889165 |
| H | 7.169607 | 1.539271 | 0.889165 |
| H | 7.169607 | 1.539271 | 0.889165 |
| H | 7.674455 | 1.397244 | 0.000000 |
| H | 8.164527 | 0.363696 | 0.000000 |
| N | 1.790000 | 0.000000 | 0.000000 |
| N | 6.124978 | 0.017359 | 0.000000 |
| O | 0.122018 | 3.144673 | 0.000000 |
| O | 0.122018 | 3.144673 | 0.000000 |
| Rh | 0.0000000 | 0.000000 | 0.000000 |

In the QRHF calculation, the total energy decreases slowly and monotonically during the iteration procedure. However, the resulting energy lowering is much too slow to be of any practical value. Thus, after 14 iterations, the energy has decreased by only 37 $E_h$, which is insignificant compared with the 237 $E_h$ needed for convergence.

To understand the difference between the QRHF and TRSCF optimizations, let us recall the main features of the two methods. Since the QRHF method is based on a local quadratic model of $E_{\mathrm{SCF}}$, the QRHF orbital rotations are correct to first order. However, no global information about $E_{\mathrm{SCF}}$ is available, and only small steps can be trusted in the optimization. When QRHF steps are taken to the boundary of the trust region, level-shifted Newton equations are solved with the Hessian of Eq. (26). By contrast, in the TRSCF method, the RH optimization is based on the local energy function $E_{\mathrm{RH}}$, which has the same gradient as $E_{\mathrm{SCF}}$ but a slightly different Hessian (compare Eqs. (25) and (26)). More importantly, $E_{\mathrm{RH}}$ shares some global features with $E_{\mathrm{SCF}}$. In the RH diagonalization step, a global optimization is carried out for $E_{\mathrm{RH}}$. When an RH step is taken to the boundary of the trust region of $E_{\mathrm{RH}}$, a level-shifted Fock eigenvalue equation is solved. In this equation, the level-shift parameter effectively introduces a shift in the Hessian of $E_{\mathrm{RH}}$, Eq. (25). Because the Hessians of $E_{\mathrm{SCF}}$ and $E_{\mathrm{RH}}$ are similar, the QRHF and RH methods take steps in very similar directions for sufficiently large level shifts. The essential difference is the global character of the RH steps and the local character of the QRHF steps. It is this local character of the QRHF steps that prevents the QRHF method from being efficient for systems as difficult as the rhodium complex.

Let us now consider the individual TRSCF iterations as listed in Table II. The optimization begins with orbitals that diagonalize the one-electron Hamiltonian, giving a start energy of −5466.530 208 964 75 $E_h$. In Table II, the SCF energy lowering $\Delta E_{\mathrm{SCF}}$ is divided into two contributions, one from the RH step and one from the DSM step. Recalling from Eq. (24) that $\tilde{D}_n$ is the purified $\bar{D}_n$, the quantity

$$\Delta E^{\mathrm{DSM}}_{\mathrm{SCF},n+1} = E_{\mathrm{SCF}}(\tilde{D}_n) - E_{\mathrm{SCF}}(D_n) \tag{73}$$

becomes a realistic measure of the energy change in the DSM part of the iteration. Similarly,

$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF},n+1} = E_{\mathrm{SCF}}(D_{n+1}) - E_{\mathrm{SCF}}(\tilde{D}_n) \tag{74}$$

becomes a realistic measure of the change in the RH part. Clearly, the sum of Eqs. (73) and (74) is equal to the total change $\Delta E_{\mathrm{SCF}}$. These exact energy changes should be compared with the energy changes in the local models, $\Delta E_{\mathrm{RH}}$ and $\Delta E_{\mathrm{DSM}}$, given in Eqs. (27) and (68), respectively, also listed in the table. Note that, to obtain $E_{\mathrm{SCF}}(\tilde{D}_n)$, we must carry out an additional energy calculation, which is here done only for the purpose of this analysis.

For the DSM method, we have also indicated in Table II how the trust-region optimization was terminated ($\mathrm{Exit}_{\mathrm{DSM}}$):

- M indicates that a minimum was determined in the full space;
- PM indicates that a minimum was obtained in the reduced space, with the direction corresponding to the negative Hessian eigenvalue projected out;
- L indicates that the iterations were terminated because the maximum step length $k$ was reached.

For the RH steps, we have also listed the level-shift parameter $\mu_{\mathrm{opt}}$ and the corresponding overlap $a(\mu_{\mathrm{opt}})$ of Eq. (22).

The TRSCF iterations converge linearly, with a reduction in the error of about a factor of 2–4 at each iteration. Moreover, the energy lowerings of the local models $\Delta E_{\mathrm{RH}}$ and $\Delta E_{\mathrm{DSM}}$ are in good agreement with the actual SCF energy changes, in the local as well as in the global part of the optimization. Both the predicted and the actual energy changes are negative in all iterations. In the global region, $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is usually significantly larger in magnitude than $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$, whereas, in the local region, they have similar sizes.

Except for three iterations in the global part of the SCF optimization, the DSM trust-region method finds a minimum within the step-length limit $k$. In the intermediate region, we encounter components of the step vector that cannot be trusted and have been projected out as described in Sec. II B 3. The DSM iterations then reach a minimum in the orthogonal subspace.


TABLE II. Convergence details for the TRSCF calculation on the rhodium complex using the AhlrichsVDZ basis combined with STO-3G on Rh. Energies given in atomic units.

| It. | $-\Delta E_{\mathrm{SCF}}$ | $-\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$ | $-\Delta E_{\mathrm{DSM}}$ | $-\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ | $-\Delta E_{\mathrm{RH}}$ | $\mu^{\mathrm{RH}}_{\mathrm{opt}}$ | $a(\mu^{\mathrm{RH}}_{\mathrm{opt}})$ | Exit$_{\mathrm{DSM}}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 18.94647615033 | 0.00000000009 | 0.00000000000 | 18.94647615024 | 19.21320649447 | 17.47 | 0.99382 | |
| 2 | 45.45858825211 | 8.95768890498 | 7.10309975657 | 36.50089934714 | 38.75977508968 | 14.44 | 0.98630 | M |
| 3 | 59.81037380731 | 12.93651600370 | 8.85502694483 | 46.87385780361 | 51.53623100635 | 11.68 | 0.97940 | M |
| 4 | 63.34486220663 | 24.25263285599 | 21.63716388564 | 39.09222935064 | 48.71127100240 | 7.28 | 0.97288 | L |
| 5 | 30.22875461345 | 12.81783382045 | 12.23686585427 | 17.41092079300 | 21.38161936631 | 2.63 | 0.97384 | L |
| 6 | 11.56061105704 | 5.64904464510 | 4.74940263974 | 5.91156641194 | 7.60366893231 | 0.90 | 0.97552 | L |
| 7 | 4.61334906659 | 1.90220393646 | 1.51155035145 | 2.71114513013 | 3.30373325651 | 0.24 | 0.97792 | M |
| 8 | 2.16270415323 | 0.44637212140 | 0.44849600108 | 1.71633203184 | 1.49977814394 | 0.07 | 0.97876 | M |
| 9 | 0.60805181167 | 0.29078332276 | 0.21298647367 | 0.31726848890 | 0.60770324492 | 1.30 | 0.99823 | M |
| 10 | 0.16667264229 | 0.00294157325 | 0.00194422453 | 0.16373106904 | 0.22325882198 | 0.70 | 0.99934 | PM |
| 11 | 0.05893002647 | 0.00782290321 | 0.00662821837 | 0.05110712327 | 0.03977595787 | 0.00 | 0.99955 | PM |
| 12 | 0.01821537974 | 0.00935849099 | 0.00823957093 | 0.00885688875 | 0.00980424864 | 0.00 | 0.99989 | PM |
| 13 | 0.00829012952 | 0.00417695835 | 0.00382848541 | 0.00411317118 | 0.00413942925 | 0.00 | 0.99995 | PM |
| 14 | 0.00336772651 | 0.00246626574 | 0.00222734467 | 0.00090146077 | 0.00176102559 | 0.00 | 0.99998 | PM |
| 15 | 0.00144190516 | 0.00106346997 | 0.00091468267 | 0.00037843519 | 0.00066804948 | 0.00 | 1.00000 | PM |
| 16 | 0.00049317801 | 0.00040627140 | 0.00039284830 | 0.00008690661 | 0.00013209160 | 0.00 | 1.00000 | PM |
| 17 | 0.00005633666 | 0.00003203569 | 0.00002863768 | 0.00002430097 | 0.00003124073 | 0.00 | 1.00000 | PM |
| 18 | 0.00001495119 | 0.00000990523 | 0.00000917530 | 0.00000504595 | 0.00000926762 | 0.00 | 1.00000 | PM |
| 19 | 0.00000549749 | 0.00000312992 | 0.00000277915 | 0.00000236757 | 0.00000276315 | 0.00 | 1.00000 | M |
| 20 | 0.00000196603 | 0.00000126150 | 0.00000121565 | 0.00000070454 | 0.00000067573 | 0.00 | 1.00000 | M |
| 21 | 0.00000038264 | 0.00000022841 | 0.00000020736 | 0.00000015423 | 0.00000016335 | 0.00 | 1.00000 | M |
| 22 | 0.00000008720 | 0.00000004496 | 0.00000004404 | 0.00000004225 | 0.00000004536 | 0.00 | 1.00000 | M |
| 23 | 0.00000002788 | 0.00000001171 | 0.00000001049 | 0.00000001617 | 0.00000001603 | 0.00 | 1.00000 | M |
| 24 | 0.00000001286 | 0.00000000813 | 0.00000000800 | 0.00000000472 | 0.00000000514 | 0.00 | 1.00000 | M |
| 25 | 0.00000000294 | 0.00000000131 | 0.00000000127 | 0.00000000163 | 0.00000000186 | 0.00 | 1.00000 | M |
| 26 | 0.00000000119 | 0.00000000073 | 0.00000000072 | 0.00000000045 | 0.00000000056 | 0.00 | 1.00000 | M |
| 27 | 0.00000000035 | 0.00000000019 | 0.00000000019 | 0.00000000016 | 0.00000000022 | 0.00 | 1.00000 | M |
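As a worked check of Eqs. (73) and (74), the DSM and RH contributions listed in Table II should add up to the total energy lowering of each iteration. The short Python sketch below does this arithmetic exactly, using decimal numbers, for a handful of rows copied from the table. The choice of rows is arbitrary.

```python
from decimal import Decimal

# (iteration, -dE_SCF, -dE_SCF^DSM, -dE_SCF^RH), copied from Table II, in E_h
rows = [
    (1, "18.94647615033", "0.00000000009", "18.94647615024"),
    (2, "45.45858825211", "8.95768890498", "36.50089934714"),
    (3, "59.81037380731", "12.93651600370", "46.87385780361"),
    (9, "0.60805181167", "0.29078332276", "0.31726848890"),
    (27, "0.00000000035", "0.00000000019", "0.00000000016"),
]

# Eq. (73) + Eq. (74) should reproduce the total change, up to rounding
mismatches = {}
for it, total, dsm, rh in rows:
    mismatches[it] = Decimal(dsm) + Decimal(rh) - Decimal(total)

for it, diff in mismatches.items():
    print(f"iteration {it:2d}: DSM + RH - total = {diff}")
```

For these rows, the differences are at most one unit in the last printed digit ($10^{-11}\,E_h$). The decomposition therefore holds to the printed precision.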

In the beginning of the SCF optimization, large level shifts are applied in the RH diagonalization to ensure a continuous development of the MOs. Thus, in the first few iterations, the overlap constant $a(\mu_{\mathrm{opt}})$ is significantly larger than the minimum accepted overlap of 0.975. However, the level-shift parameter decreases during the subsequent SCF iterations until, in the local region, no level shift is required and conventional RH iterations are carried out. To summarize, the TRSCF method gives a monotonic and significant energy lowering both in the RH and in the DSM part of the optimization.

B. The water calculation

To demonstrate the performance of the TRSCF method in a simpler case, we consider optimizations of H₂O with the OH bonds stretched to twice the equilibrium value (195.10 pm). In Figs. 6(a) and 6(b), we have plotted the errors in the energy during TRSCF, DIIS, and QRHF optimizations in the cc-pVDZ basis (Ref. 15). In Fig. 6(a), the initial orbitals are the Hückel orbitals as implemented in the DALTON program. With these initial orbitals, the TRSCF and DIIS methods converge in a very similar manner to within a threshold of $10^{-10}$ in ten iterations. In this case, therefore, gradient information is sufficient for convergence. The QRHF method needs fewer iterations than the TRSCF and DIIS methods, but this is of no practical value. In each QRHF step, solving the Newton equations requires about as many new Fock matrices as the TRSCF and DIIS methods need to find the optimized Hartree–Fock wave function.

FIG. 6. The convergence of calculations on water with stretched bonds using the cc-pVDZ basis and (a) a Hückel start guess and (b) a one-electron Hamiltonian start guess. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF methods as a function of the iteration number.


In Fig. 6(b), we have plotted the error of the energy in H₂O optimizations starting with the orbitals that diagonalize the one-electron Hamiltonian. In this case, convergence to $10^{-10}$ is reached in 13 iterations with the TRSCF method and in 18 iterations with the DIIS method. The main reason for the better performance of the TRSCF algorithm is that, in the global region, it gives a significant energy lowering in each step, whereas the DIIS algorithm shows a much less systematic behavior.

IV. CONCLUSION

A conventional SCF optimization consists of a sequence of iterations, each of which has two steps:

- a Roothaan–Hall (RH) diagonalization step, where a Fock/KS matrix is diagonalized to obtain an improved density matrix;
- an averaging step, where the optimal density matrix is determined in the subspace of the density matrices of the previous RH diagonalization steps.

In this paper, we have introduced a trust-region SCF (TRSCF) algorithm, where improvements have been made to both the diagonalization and the averaging steps. In both steps, local energy model functions are constructed that have the same gradient as the true energy function $E_{\mathrm{SCF}}$ but approximate Hessians. Since these energy functions are only local models, trust regions are introduced as regions where they represent a good approximation to $E_{\mathrm{SCF}}$, and only steps inside these trust regions are allowed.

For the density-subspace minimization step, an energy function is constructed and minimized with respect to the coefficients of the linear combination of the previous density matrices. Its functional form is based on a purified averaged density matrix that is idempotent to first order. An advantage of this model compared with EDIIS is the built-in density purification, which helps to avoid problems arising from non-idempotency. In addition, information about the Hessian is extracted and used, leading to a monotonic and stable convergence.

The RH diagonalization step corresponds to a minimization of an energy function $E_{\mathrm{RH}}$ that represents the sum of the orbital energies of the occupied MOs. Since this very simple energy function is only a local model of $E_{\mathrm{SCF}}$, large steps cannot be trusted. To generate steps to the boundary of the trust region, level-shifted RH equations are solved, where the level shifts are determined in a systematic and general manner, leading to a decrease in the model energy at each iteration. If sufficiently small steps are taken, a similar decrease is obtained in the SCF energy.

In the TRSCF algorithm, a few diagonalizations are required in each SCF iteration to obtain solutions of the level-shifted RH equations in order to determine the optimal density matrix. The number of diagonalizations may be reduced in the local SCF region by solving the RH equations with zero level shift, with little consequence for the convergence. In the local SCF region, one may also safely use the DIIS algorithm if desired.

The advantages of the TRSCF algorithm are demonstrated by calculations on a rhodium complex and on a water molecule with stretched bonds. In the rhodium-complex optimization, the TRSCF algorithm converges monotonically and fast, with a significant decrease in the energy in both the RH part and the DSM part at each iteration. By contrast, convergence is not obtained with the DIIS method for this complex. For the simpler water molecule, the TRSCF and DIIS methods behave more similarly, the TRSCF method converging slightly faster than the DIIS method when the initial orbitals are obtained by diagonalizing the one-electron Hamiltonian. With the Hückel guess, convergence for water is obtained in essentially the same number of steps with the TRSCF and DIIS methods. In short, it appears that the TRSCF algorithm, with its use of local energy model functions to obtain significant reductions in $E_{\mathrm{SCF}}$ in each iteration, constitutes a significant step towards a black-box optimization of SCF wave functions.

ACKNOWLEDGMENTS

This work has been supported by the Danish Natural Research Council (Grant No. 21-02-0467) and the Carlsbergfondet. We also acknowledge support from the Danish Center for Scientific Computing (DCSC). D.Y. acknowledges support from the Robert A. Welch Foundation (Grant No. A-770).

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).

2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).

3 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic-Structure Theory (Wiley, Chichester, 2000).

4 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).

5 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).

6 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).

7 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994).

8 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000).

9 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997).

10 E. A. Hylleraas and B. Undheim, Z. Phys. 65, 759 (1930).

11 J. K. L. MacDonald, Phys. Rev. 43, 830 (1933).

12 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).

13 G. B. Bacskay, Chem. Phys. 61, 385 (1981).

14 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984).

15 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).

16 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).

17 DALTON, a molecular electronic structure program, Release 1.2 (2001), written by T. Helgaker, H. J. Aa. Jensen, P. Jørgensen et al., http://www.kjemi.uio.no/software/dalton.

Downloaded 25 Jun 2004 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp


Part 1

The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory,

L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker,

J. Chem. Phys. 123, 074103 (2005)


THE JOURNAL OF CHEMICAL PHYSICS 123, 074103 (2005)

The trust-region self-consistent field method in Kohn–Sham density-functional theory

Lea Thøgersen,a) Jeppe Olsen, Andreas Köhn, and Poul Jørgensen

Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark

Paweł Sałek

Laboratory of Theoretical Chemistry, The Royal Institute of Technology, Roslagstullbacken 15, Stockholm, S-10691 Sweden

Trygve Helgaker

Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway

(Received 20 May 2005; accepted 7 June 2005; published online 22 August 2005)

The trust-region self-consistent field (TRSCF) method is extended to the optimization of the Kohn–Sham energy. In the TRSCF method, both the Roothaan–Hall step and the density-subspace minimization step are replaced by trust-region optimizations of local approximations to the Kohn–Sham energy, leading to a controlled, monotonic convergence towards the optimized energy. Previously, the TRSCF method was developed for optimization of the Hartree–Fock energy, which is a simple quadratic function of the density matrix. However, since the Kohn–Sham energy is a nonquadratic function of the density matrix, the local energy functions must be generalized for use with the Kohn–Sham model. Such a generalization, which contains the Hartree–Fock model as a special case, is presented here. For comparison, a rederivation of the popular direct inversion in the iterative subspace (DIIS) algorithm is performed, demonstrating that the DIIS method may be viewed as a quasi-Newton method, which explains its fast local convergence. In the global region, the convergence behavior of DIIS is less predictable. The related energy DIIS technique is also discussed and shown to be inappropriate for the optimization of the Kohn–Sham energy. © 2005 American Institute of Physics. [DOI: 10.1063/1.1989311]

I. INTRODUCTION

Computational methods rigorously based on the laws of quantum mechanics are becoming an ever more important component of scientific and technological progress in many branches of natural science, including biochemistry and materials science. Quantum-chemical codes, in particular, are today routinely used to perform calculations on molecules containing hundreds of atoms. Furthermore, with the advent of density-functional theory (DFT) methods, molecules with more complex electronic structure, and larger parts of potential-energy surfaces, may be calculated than with the Hartree–Fock method. Most of these calculations are performed by nonspecialists, not trained in quantum chemistry or in numerical simulations. An important challenge is thus to develop quantum-chemical techniques that allow the user to focus on the physical and chemical interpretation of the results by eliminating, or at least minimizing, the need to understand the details of the numerical algorithms.

A central numerical task of Hartree–Fock wave-function theory and Kohn–Sham DFT is the minimization of the electronic energy function with respect to the density matrix of a single-determinant reference wave function. In its original formulation, the self-consistent field (SCF) method for optimizing Hartree–Fock and Kohn–Sham energies $E_{\mathrm{SCF}}$ consists of a sequence of Roothaan–Hall iterations (Refs. 1 and 2). At each iteration, the Fock/Kohn–Sham matrix is first constructed from the current approximate atomic-orbital (AO) density matrix; next, an improved AO density matrix is generated from the molecular orbitals (MOs) obtained by diagonalization of this Fock/Kohn–Sham matrix.

a) Electronic mail: lea@chem.au.dk

Unfortunately, this simple SCF scheme converges only in simple cases. To improve its convergence, the optimization is modified by constructing the Fock/Kohn–Sham matrix not directly from the AO density matrix of the last diagonalization but rather from an averaged density matrix, calculated in the subspace of the density matrices of the current and previous iterations. In practice, the averaged AO density matrix is calculated by the direct inversion in the iterative subspace (DIIS) method of Pulay (Ref. 3), nowadays implemented in most electronic-structure programs. In the DIIS method, the averaged density matrix is a linear combination of density matrices, where the expansion coefficients are obtained by minimizing the norm of the corresponding linear combination of the gradients.
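The DIIS coefficient problem described above, minimizing $\|\sum_i c_i \mathbf{g}_i\|$ subject to $\sum_i c_i = 1$, reduces to a small linear system with one Lagrange multiplier. The sketch below is a minimal illustration, assuming the gradients are supplied as NumPy arrays (in Pulay's scheme the commutator $FDS - SDF$ is commonly used) and using random toy data; the function name is hypothetical. The same coefficients would then weight the density matrices, $\bar{D} = \sum_i c_i D_i$.

```python
import numpy as np


def diis_coefficients(gradients):
    """Pulay DIIS: minimize ||sum_i c_i g_i||^2 subject to sum_i c_i = 1."""
    m = len(gradients)
    B = np.empty((m + 1, m + 1))
    for i, gi in enumerate(gradients):
        for j, gj in enumerate(gradients):
            B[i, j] = np.vdot(gi, gj)      # Frobenius inner product <g_i, g_j>
    B[m, :m] = B[:m, m] = 1.0               # border for the constraint sum_i c_i = 1
    B[m, m] = 0.0
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    solution = np.linalg.solve(B, rhs)
    return solution[:m]                     # drop the Lagrange multiplier


rng = np.random.default_rng(2)
grads = [rng.standard_normal((4, 4)) for _ in range(3)]   # toy gradients
c = diis_coefficients(grads)
g_avg = sum(ci * gi for ci, gi in zip(c, grads))
print("coefficients:", np.round(c, 4), "sum =", c.sum())
print("||averaged gradient|| =", np.linalg.norm(g_avg))
```

Because each single gradient corresponds to a feasible choice of coefficients, the optimized combination can never have a larger norm than the smallest individual gradient.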

Over the years, several attempts have been made to improve upon the DIIS method. In particular, Kudin et al. have proposed the energy DIIS (EDIIS) method (Ref. 4), where the gradient-norm minimization is replaced by a minimization of an approximation to the true energy function $E_{\mathrm{SCF}}$, with the expansion coefficients of the averaged density matrix used as variational parameters. For the special case of two density matrices, such an approach was first developed by Karlström (Ref. 5).

0021-9606/2005/1237/074103/17/$22.50<br />

123, 074103-1<br />

© 2005 American Institute of Physics<br />

Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp



Recently, we introduced the trust-region self-consistent field (TRSCF) method (Ref. 6) for SCF density-matrix optimizations. In the TRSCF method, the diagonalization step [trust-region Roothaan–Hall (TRRH)] and the density-optimization step [trust-region density-subspace minimization (TRDSM)] are realized as minimizations of local energy model functions of $E_{\mathrm{SCF}}$. The local energy functions are expanded about the current AO density matrix and have the same gradients as the true energy $E_{\mathrm{SCF}}$ but approximate Hessians. In the course of the SCF optimization, each step is restricted to be within the trust region of the current model, that is, within the region where the model accurately represents the true energy function. In TRDSM, the step length is controlled through a standard trust-region optimization (Ref. 7), and in TRRH, the step length is controlled through a level shift (Ref. 8). In this manner, a reliable and systematic lowering of $E_{\mathrm{SCF}}$ is ensured at each iteration.

In the first implementation of the TRSCF method, the focus was on the optimization of the Hartree–Fock energy. In this paper, the focus is on the optimization of the Kohn–Sham energy. In Kohn–Sham theory, the energy difference between the highest occupied MO and the lowest unoccupied MO (the HOMO-LUMO gap) is usually much smaller than in Hartree–Fock theory, making the optimization more difficult. Here, we investigate the consequences of this smaller HOMO-LUMO gap for the global and local convergence characteristics of the Roothaan–Hall optimization step. In Hartree–Fock theory, the energy function is quadratic in the density matrix, whereas, in Kohn–Sham theory, it becomes a nonquadratic function because of the exchange-correlation contribution to the energy. In our previous implementation of the TRSCF method, the model function used to determine the averaged density matrix was specially designed for Hartree–Fock theory, assuming that the energy depends quadratically on the density matrix. For Kohn–Sham theory, the model function must be generalized. Such a generalization is presented here.

In this paper, the DIIS algorithm is also rederived to better understand when it can safely be applied. In particular, we find that the DIIS method may be viewed as a quasi-Newton method in the local region, which explains its fast local convergence. The convergence characteristics of the DIIS method in the global region are less predictable.

Recently, and along the same lines as our TRRH method, Francisco et al. introduced globally convergent trust-region methods for SCF (Ref. 9), where the standard fixed-point Roothaan–Hall step is replaced by a trust-region optimization of a model energy function. Any acceleration scheme, such as DIIS, EDIIS, or the TRDSM method, can then be combined with this method.

After an introduction to the SCF problem in Sec. II, we examine the Roothaan–Hall scheme in Sec. III. In particular, we identify the model energy function that is effectively being optimized in the diagonalization step, demonstrating how convergence can be improved by level shifting. In Sec. IV, we consider the density-matrix averaging step. We establish the model energy function in terms of the weights of the density matrices and perform an order analysis of the resulting scheme, demonstrating that it represents a balanced approximation; next, we compare our local energy function with the EDIIS function, showing that the latter misses a term that is necessary for calculating the correct gradient. After a brief discussion of configuration shifts in Sec. V, we present in Sec. VI a rederivation of the DIIS algorithm, establishing its equivalence with the quasi-Newton method in the local region. Section VII contains some convergence examples for DFT calculations, using the TRSCF algorithm and some of its alternatives. Finally, Sec. VIII contains some concluding remarks.

II. THE KOHN–SHAM ENERGY AND THE ROOTHAAN–HALL METHOD

For a closed-shell system with $N/2$ electron pairs, the Kohn–Sham energy, excluding the nuclear–nuclear repulsion contribution, is given by (Ref. 10)

$$E_{\mathrm{KS}}(D) = 2\,\mathrm{Tr}(hD) + \mathrm{Tr}[D\,G(D)] + E_{\mathrm{XC}}(D). \tag{1}$$

Here $D$ is the scaled one-electron density matrix in the AO basis, $D = \frac{1}{2}D^{\mathrm{AO}}$; $h$ is the one-electron Hamiltonian matrix in this basis; and the elements of $G(D)$ are given by

$$G_{\mu\nu}(D) = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\sigma\rho} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\sigma\rho}, \tag{2}$$

where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term in Eq. (2) represents the Coulomb contribution and the second term the contribution from exact exchange, with $\gamma = 1$ in Hartree–Fock theory, $\gamma = 0$ in pure DFT, and $0 < \gamma < 1$ in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(D)$ in Eq. (1) is a functional of the electron density. In the local-density approximation (LDA), the exchange-correlation energy is local in the density, whereas, in the generalized gradient approximation (GGA), it also depends locally on the squared density gradient, that is, it may be expressed as

$$E_{\mathrm{XC}}(D) = \int f\big(\rho(\mathbf{x}), \zeta(\mathbf{x})\big)\,d\mathbf{x}. \tag{3}$$

Here the electron density $\rho(\mathbf{x})$ and its squared gradient norm $\zeta(\mathbf{x})$ are given by

$$\rho(\mathbf{x}) = \chi^{T}(\mathbf{x})\,D\,\chi(\mathbf{x}), \tag{4a}$$

$$\zeta(\mathbf{x}) = \nabla\rho(\mathbf{x}) \cdot \nabla\rho(\mathbf{x}), \tag{4b}$$

where $\chi(\mathbf{x})$ is a column vector containing the AOs. We note that the exchange-correlation energy density $f(\rho(\mathbf{x}), \zeta(\mathbf{x}))$ in Eq. (3) is a nonlinear and nonquadratic function of $\rho(\mathbf{x})$ and $\zeta(\mathbf{x})$. In the following, we shall therefore rely on an expansion of $E_{\mathrm{XC}}(D)$ around some reference density matrix $D_0$,

$$E_{\mathrm{XC}}(D) = E_{\mathrm{XC}}(D_0) + (D - D_0)^{T} E_{\mathrm{XC}}^{(1)} + \frac{1}{2}(D - D_0)^{T} E_{\mathrm{XC}}^{(2)} (D - D_0) + \cdots, \tag{5}$$

where the derivatives $E_{\mathrm{XC}}^{(n)}$ have been evaluated at $D = D_0$ and where, for convenience, we have used a vector–matrix notation for $D$, $E_{\mathrm{XC}}^{(1)}$, and $E_{\mathrm{XC}}^{(2)}$.

The first derivative of $E_{\mathrm{KS}}(D)$ with respect to the density matrix $D$ is then given by

$$E_{\mathrm{KS}}^{(1)}(D) = \frac{\partial E_{\mathrm{KS}}(D)}{\partial D} = 2F(D), \tag{6}$$

where we have introduced the Fock/Kohn–Sham matrix,

$$F(D) = h + G(D) + \frac{1}{2}E_{\mathrm{XC}}^{(1)}(D). \tag{7}$$
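Equations (6) and (7) can be checked numerically. The sketch below treats only the Hartree–Fock limit ($\gamma = 1$, $E_{\mathrm{XC}} = 0$), an assumption made to avoid an integration grid. It builds random two-electron integrals with the usual eightfold permutational symmetry and compares a central finite difference of $E(D)$ along a symmetric direction $\Delta$ with $2\,\mathrm{Tr}[F(D)\Delta]$. All matrices are random toy values and the helper names are hypothetical.

```python
import numpy as np


def random_eri(n, rng):
    """Random real 'two-electron integrals' with 8-fold permutational symmetry."""
    g = rng.standard_normal((n, n, n, n))
    g = g + g.transpose(1, 0, 2, 3)      # symmetric in the first index pair
    g = g + g.transpose(0, 1, 3, 2)      # symmetric in the second index pair
    return g + g.transpose(2, 3, 0, 1)   # symmetric under exchange of the pairs


def G_matrix(g, D, gamma):
    """Eq. (2): Coulomb term minus gamma times exact exchange."""
    coulomb = np.einsum("pqrs,sr->pq", g, D)
    exchange = np.einsum("psrq,sr->pq", g, D)
    return 2.0 * coulomb - gamma * exchange


def energy(h, g, D, gamma=1.0):
    """Eq. (1) without E_XC (Hartree-Fock limit assumed)."""
    return 2.0 * np.trace(h @ D) + np.trace(D @ G_matrix(g, D, gamma))


def fock(h, g, D, gamma=1.0):
    """Eq. (7) without the E_XC contribution."""
    return h + G_matrix(g, D, gamma)


rng = np.random.default_rng(6)
n = 4
g = random_eri(n, rng)
A = rng.standard_normal((n, n))
h = A + A.T                                    # toy one-electron matrix
B = rng.standard_normal((n, n))
D = 0.5 * (B + B.T)                            # toy symmetric density
P = rng.standard_normal((n, n))
Delta = 0.5 * (P + P.T)                        # symmetric perturbation direction

t = 1e-4
fd = (energy(h, g, D + t * Delta) - energy(h, g, D - t * Delta)) / (2 * t)
analytic = 2.0 * np.trace(fock(h, g, D) @ Delta)
print(f"finite difference {fd:.8f}   2 Tr F Delta {analytic:.8f}")
```

Because the Hartree–Fock energy is quadratic in $D$, the central difference is exact apart from rounding.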

We note that, for the energy in Eq. (1) to be a valid Kohn–Sham energy, the density matrix $D$ must satisfy the symmetry, trace, and idempotency conditions

$$D^{T} = D, \tag{8a}$$

$$\mathrm{Tr}(DS) = \frac{N}{2}, \tag{8b}$$

$$DSD = D, \tag{8c}$$

where $S$ is the AO overlap matrix. Therefore, we cannot carry out a free minimization of the total energy in Eq. (1) but must restrict ourselves to those changes in the density matrix that comply with these requirements.

The Kohn–Sham energy $E_{\mathrm{KS}}$ is traditionally optimized self-consistently by fixed-point iterations. From the current approximation $D_0$ to the density matrix, the Kohn–Sham matrix $F(D_0)$ is calculated from Eq. (7), followed by the solution of the Roothaan–Hall generalized eigenvalue equations (Refs. 1 and 2)

$$F(D_0)\,C_{\mathrm{occ}} = S\,C_{\mathrm{occ}}\,\varepsilon, \tag{9}$$

where $C_{\mathrm{occ}}$ is the set of occupied MOs and $\varepsilon$ is a diagonal matrix containing the associated eigenvalues (orbital energies). An improved density matrix is next calculated from the occupied MOs as

$$D = C_{\mathrm{occ}}C_{\mathrm{occ}}^{T}, \tag{10}$$

and the Roothaan–Hall fixed-point iteration is established by constructing the Kohn–Sham matrix $F(D)$ from this density matrix, followed by diagonalization according to Eq. (9). Note that, since

$$C_{\mathrm{occ}}UU^{T}C_{\mathrm{occ}}^{T} = C_{\mathrm{occ}}C_{\mathrm{occ}}^{T}, \tag{11}$$

where $U$ is unitary, the Kohn–Sham density matrix in Eq. (10), and hence the energy, is invariant to unitary transformations among the occupied MOs.
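A minimal sketch of one Roothaan–Hall step [Eqs. (9) and (10)], assuming a toy symmetric Fock/Kohn–Sham matrix and a random positive-definite overlap matrix. The generalized eigenvalue problem is reduced to a standard one by Löwdin orthogonalization with $X = S^{-1/2}$. This is one common choice that the text does not prescribe, and the function name is hypothetical. The script also checks the conditions of Eq. (8) and the invariance expressed by Eq. (11).

```python
import numpy as np


def solve_roothaan_hall(F, S):
    """Solve F C = S C eps (Eq. 9) via X = S^(-1/2); columns of C are S-orthonormal."""
    s, V = np.linalg.eigh(S)
    X = V @ np.diag(s ** -0.5) @ V.T
    eps, C_prime = np.linalg.eigh(X @ F @ X)     # eigenvalues in ascending order
    return eps, X @ C_prime


rng = np.random.default_rng(3)
n, nocc = 6, 2                                   # nocc = N/2 electron pairs
M = rng.standard_normal((n, n))
S = M @ M.T / n + np.eye(n)                      # toy positive-definite overlap
B = rng.standard_normal((n, n))
F = B + B.T                                      # toy symmetric Fock/KS matrix

eps, C = solve_roothaan_hall(F, S)
C_occ = C[:, :nocc]                              # aufbau: lowest nocc orbitals
D = C_occ @ C_occ.T                              # Eq. (10)

# Eq. (11): an orthogonal mixing of the occupied MOs leaves D unchanged.
U, _ = np.linalg.qr(rng.standard_normal((nocc, nocc)))
D_rot = (C_occ @ U) @ (C_occ @ U).T

print("Tr DS =", np.trace(D @ S))                          # Eq. (8b): N/2
print("||DSD - D|| =", np.linalg.norm(D @ S @ D - D))      # Eq. (8c)
print("||D_rot - D|| =", np.linalg.norm(D_rot - D))
```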

The naive Roothaan–Hall fixed-point iteration outlined above converges only in simple cases. To improve upon this scheme, the new Kohn–Sham matrix is usually not calculated directly from the density matrix obtained by diagonalization of the previous Kohn–Sham matrix, but rather from the density matrix obtained by diagonalizing a linear combination of the current and $n$ previous Kohn–Sham matrices,

$$\bar{F} = F_0 + \sum_{i=0}^{n} c_i F(D_i). \tag{12}$$

Typically, the coefficients $c_i$ are obtained by the DIIS method as the weights of an improved density matrix,

$$\bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i. \tag{13}$$

Upon diagonalization of $\bar{F}$ according to Eq. (9), the new density matrix is obtained from Eq. (10), thereby establishing the iterations. In general, the averaged density matrix in Eq. (13) is not idempotent and therefore does not represent a valid density matrix; moreover, since the Kohn–Sham matrix (unlike the Fock matrix) is nonlinear in the density matrix, the averaged Kohn–Sham matrix in Eq. (12) is different from $F(\bar{D})$. For these reasons, we cannot associate the averaged Kohn–Sham matrix in Eq. (12) uniquely with a valid Kohn–Sham matrix. Usually, this does not matter much, since the subsequent diagonalization of the Kohn–Sham matrix nevertheless produces a valid density matrix according to Eq. (10). In the following, we shall disregard the complications arising from the use of the averaged Kohn–Sham matrix in Eq. (12), noting that the errors introduced by this approach may easily be corrected for, if necessary.

In the remainder of this paper, we discuss the TRSCF method, which differs from the traditional SCF scheme by the consistent use of trust-region techniques for optimization control, both in the Roothaan–Hall diagonalization step in Eq. (9) and in the construction of the averaged density matrix in Eq. (13). In particular, the traditional Roothaan–Hall eigenvalue problem is replaced by a level-shifted eigenvalue problem, where the level shift is determined from trust-region considerations, resulting in the TRRH step. Similarly, the averaged density matrix is determined by a TRDSM technique rather than by the traditional DIIS method. As we shall see, the combined use of the TRRH and TRDSM schemes in the TRSCF method leads to a highly efficient and robust SCF scheme, characterized, in its most robust implementation, by a monotonic convergence towards the optimized Kohn–Sham energy.

III. TRUST-REGION ROOTHAAN–HALL OPTIMIZATION

A. The trust-region Roothaan–Hall method

We begin by noting that the solution of the traditional Roothaan–Hall eigenvalue problem in Eq. (9) may be regarded as the minimization of the sum of the energies of the occupied MOs (Ref. 11),

$$E_{\mathrm{RH}}(D) = 2\sum_i \varepsilon_i = 2\,\mathrm{Tr}(F_0 D), \tag{14}$$

subject to the MO orthonormality constraints

$$C_{\mathrm{occ}}^{T} S C_{\mathrm{occ}} = I_{N/2}, \tag{15}$$

where $F_0$ is typically obtained as a weighted sum of Kohn–Sham matrices, such as $\bar{F}$ in Eq. (12). Since Eq. (14) represents a crude model of the true Kohn–Sham energy, with the same first-order term but different zero- and second-order terms (as discussed in Sec. III B), it has a rather small trust radius. A global minimization of $E_{\mathrm{RH}}(D)$, as accomplished by the solution of the Roothaan–Hall eigenvalue problem in Eq. (9), may therefore easily lead to steps that are longer than the trust radius and hence unreliable. To avoid such steps, we shall impose on the optimization of Eq. (14) the constraint that the new density matrix $D$ does not differ much from the old matrix $D_0$, that is, the $S$-norm of the density difference should be equal to a small number $\delta$,

$$\|D - D_0\|_S^2 = \mathrm{Tr}[(D - D_0)S(D - D_0)S] = -2\,\mathrm{Tr}(D_0SDS) + N = \delta. \tag{16}$$

The optimization of Eq. (14) subject to the constraints in Eqs. (15) and (16) may be carried out by introducing the Lagrangian

$$L = 2\,\mathrm{Tr}(F_0D) - 2\mu\left[\mathrm{Tr}(DSD_0S) - \tfrac{1}{2}(N - \delta)\right] - 2\,\mathrm{Tr}\left[\Lambda\left(C_{\mathrm{occ}}^{T}SC_{\mathrm{occ}} - I_{N/2}\right)\right], \tag{17}$$

where $\mu$ is the undetermined multiplier associated with the constraint in Eq. (16), whereas the symmetric matrix $\Lambda$ contains the multipliers associated with the MO orthonormality constraints. Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to zero, we arrive at the level-shifted Roothaan–Hall equations,

$$(F_0 - \mu SD_0S)\,\tilde{C}_{\mathrm{occ}} = S\tilde{C}_{\mathrm{occ}}\Lambda. \tag{18}$$

Since the density matrix in Eq. (10) is invariant to unitary transformations among the occupied MOs in $\tilde{C}_{\mathrm{occ}}$, we may transform this eigenvalue problem to the canonical basis,

$$(F_0 - \mu SD_0S)\,C_{\mathrm{occ}} = SC_{\mathrm{occ}}\,\varepsilon, \tag{19}$$

where the diagonal matrix $\varepsilon$ contains the orbital energies. Note that, since $D_0S$ projects onto the part of $C_{\mathrm{occ}}$ that is occupied in $D_0$ (see Ref. 11), the level-shift parameter $\mu$ shifts only the energies of the occupied MOs. Therefore, the role of $\mu$ is to modify the difference between the energies of the occupied and virtual MOs, in particular, the HOMO-LUMO gap.

Clearly, the success of the TRRH method will depend on our ability to make a judicious choice of the level-shift parameter $\mu$ in Eq. (19). In our standard TRRH implementation, we determine $\mu$ by requiring that $D$ does not differ much from $D_0$ in the sense of Eq. (16), thereby ensuring a continuous and controlled development of the density matrix from the initial guess to the converged one. In the following sections, we discuss how $\mu$ is determined in this standard implementation.
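The standard TRRH choice of $\mu$ can be sketched as a one-dimensional root search: solve the level-shifted problem of Eq. (19) for trial values of $\mu$ and adjust $\mu$ until the step length of Eq. (16) equals a prescribed $\delta$. The sketch below uses plain bisection. That is an assumption, not necessarily the authors' procedure, and it presumes that $\|D(\mu) - D_0\|_S^2$ is continuous in $\mu$ and falls below $\delta$ for large $\mu$ (where $D(\mu) \to D_0$). Random toy matrices stand in for $F_0$ and $D_0$, and the helper names are hypothetical. The script also verifies the identity on the right-hand side of Eq. (16).

```python
import numpy as np


def rh_density(F, S, nocc):
    """Aufbau density C_occ C_occ^T from F C = S C eps (Loewdin orthogonalization)."""
    s, V = np.linalg.eigh(S)
    X = V @ np.diag(s ** -0.5) @ V.T
    _, C_prime = np.linalg.eigh(X @ F @ X)
    C_occ = (X @ C_prime)[:, :nocc]
    return C_occ @ C_occ.T


def s_norm_sq(D, D0, S):
    """Left-hand side of Eq. (16): Tr[(D - D0) S (D - D0) S]."""
    dS = (D - D0) @ S
    return np.trace(dS @ dS)


def trrh_step(F0, D0, S, nocc, delta, iters=100):
    """Pick the level shift mu of Eq. (19) so that ||D(mu) - D0||_S^2 ~ delta."""
    def density(mu):
        return rh_density(F0 - mu * S @ D0 @ S, S, nocc)

    if s_norm_sq(density(0.0), D0, S) <= delta:    # unshifted RH step is short enough
        return 0.0, density(0.0)
    lo, hi = 0.0, 1.0
    while s_norm_sq(density(hi), D0, S) > delta:   # large mu pulls D back to D0
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):                         # invariant: f(lo) > delta >= f(hi)
        mid = 0.5 * (lo + hi)
        if s_norm_sq(density(mid), D0, S) > delta:
            lo = mid
        else:
            hi = mid
    return hi, density(hi)


rng = np.random.default_rng(4)
n, nocc = 8, 3
M = rng.standard_normal((n, n))
S = M @ M.T / n + np.eye(n)                        # toy overlap matrix
A = rng.standard_normal((n, n))
H = A + A.T                                        # toy initial-guess matrix
D0 = rh_density(H, S, nocc)                        # current density matrix
B = rng.standard_normal((n, n))
F0 = H + 0.5 * (B + B.T)                           # toy Fock/KS matrix F_0
delta = 0.2 * s_norm_sq(rh_density(F0, S, nocc), D0, S)

mu, D = trrh_step(F0, D0, S, nocc, delta)
N = 2 * nocc                                       # number of electrons
print(f"mu = {mu:.4f}, step = {s_norm_sq(D, D0, S):.6f}, delta = {delta:.6f}")
print("Eq. (16) identity:", -2 * np.trace(D0 @ S @ D @ S) + N)
```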

In view of the relative crudeness of the $E_{\mathrm{RH}}(D)$ model, a more robust approach consists of performing a line search along the path defined by $\mu$ to obtain the minimum of the Kohn–Sham energy $E_{\mathrm{KS}}(D)$. Strictly speaking, this optimization is not a line search but rather a one-parameter optimization. One-parameter optimizations have previously been used by Seeger and Pople (Ref. 12) to stabilize the convergence of the RH procedure.

For $\mu \to \infty$, Eq. (19) becomes equivalent to solving the eigenvalue equation

$$SD_0S\,C_{\mathrm{occ}}^{0} = SC_{\mathrm{occ}}^{0}\,\eta, \tag{20}$$

where $\eta$ has eigenvalue 1 for the set of orbitals that are occupied in $D_0$ and eigenvalue 0 for the set of virtual orbitals. Equation (20) thus effectively divides the molecular orbitals into a set that is occupied and a set that is unoccupied, where the density $D_0$ is obtained from the occupied set,

$$D_0 = C_{\mathrm{occ}}^{0}\left(C_{\mathrm{occ}}^{0}\right)^{T}. \tag{21}$$

Since $F_0$ is (up to a factor of 2, see Eq. (6)) the gradient of $E_{\mathrm{KS}}$ at $D_0$, the step from Eq. (19) for large $\mu$ is in the steepest-descent direction and will therefore give a decrease in the Kohn–Sham energy relative to the energy at $D_0$. However, this TRRH line-search (TRRH-LS) algorithm is more expensive than the standard method, since it requires the repeated construction of the Kohn–Sham matrix at each SCF iteration.

B. Comparison of the Roothaan–Hall and Kohn–Sham energy functions

To better understand our strategy for determining the level-shift parameter in Kohn–Sham energy optimizations, we here examine the Roothaan–Hall model energy of Eq. (14) in more detail, comparing it with the true Kohn–Sham energy of Eq. (1). Expanding the Kohn–Sham and Roothaan–Hall energies about the reference density matrix $D_0$ and neglecting the differences between $F_0$ and $F(D_0)$ noted in Sec. II, we obtain

$$E_{\mathrm{KS}}(D) = E_{\mathrm{KS}}(D_0) + 2\,\mathrm{Tr}[F(D_0)(D - D_0)] + \mathrm{Tr}[(D - D_0)\,G(D - D_0)] + E_{\mathrm{XC}}(D) - E_{\mathrm{XC}}(D_0) - \mathrm{Tr}[(D - D_0)\,E_{\mathrm{XC}}^{(1)}(D_0)], \tag{22}$$

$$E_{\mathrm{RH}}(D) = E_{\mathrm{RH}}(D_0) + 2\,\mathrm{Tr}[F(D_0)(D - D_0)]. \tag{23}$$
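In the Hartree–Fock limit ($E_{\mathrm{XC}} = 0$), Eq. (22) is exact and differs from the Roothaan–Hall change of Eq. (23) only by the quadratic term $\mathrm{Tr}[(D - D_0)G(D - D_0)]$. The sketch below checks this numerically, assuming random toy integrals with eightfold symmetry, an orthonormal basis ($S = I$), and $\gamma = 1$; the helper names are illustrative, not taken from any program.

```python
import numpy as np


def random_eri(n, rng):
    """Random real integrals with 8-fold permutational symmetry (toy data)."""
    g = rng.standard_normal((n, n, n, n))
    g = g + g.transpose(1, 0, 2, 3)
    g = g + g.transpose(0, 1, 3, 2)
    return g + g.transpose(2, 3, 0, 1)


def G_matrix(g, D, gamma=1.0):
    """Eq. (2): Coulomb minus gamma times exact exchange."""
    return (2.0 * np.einsum("pqrs,sr->pq", g, D)
            - gamma * np.einsum("psrq,sr->pq", g, D))


def e_hf(h, g, D):
    """Eq. (1) with E_XC = 0."""
    return 2.0 * np.trace(h @ D) + np.trace(D @ G_matrix(g, D))


def random_density(n, nocc, rng):
    """Idempotent D = C_occ C_occ^T in an orthonormal basis (S = I assumed)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q[:, :nocc] @ Q[:, :nocc].T


rng = np.random.default_rng(7)
n, nocc = 5, 2
g = random_eri(n, rng)
A = rng.standard_normal((n, n))
h = A + A.T
D0, D = random_density(n, nocc, rng), random_density(n, nocc, rng)
dD = D - D0
F0 = h + G_matrix(g, D0)                       # Eq. (7) without E_XC

linear = 2.0 * np.trace(F0 @ dD)               # shared first-order term, Eq. (23)
quadratic = np.trace(dD @ G_matrix(g, dD))     # extra term present in Eq. (22)
exact_change = e_hf(h, g, D) - e_hf(h, g, D0)
print(f"exact {exact_change:.10f}  RH model {linear:.10f}  "
      f"RH model + quadratic {linear + quadratic:.10f}")
```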

These expansions have the same first-order term $2\,\mathrm{Tr}[F(D_0)(D - D_0)]$ but different zero- and second-order terms. In an orthonormal MO basis, we may express any valid density matrix $D$ in terms of the reference density matrix $D_0$ as

$$D(K) = \exp(-K)\,D_0\,\exp(K), \tag{24}$$

where the antisymmetric rotation matrix may be written in the form

$$K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}. \tag{25}$$

The diagonal blocks, representing rotations among the occupied MOs and among the virtual MOs, are zero since the density matrix in Eq. (10) is invariant to such rotations [see Eq. (11)]. In terms of $K$, the first-order term common to the Roothaan–Hall and Kohn–Sham energies may be written as

$$2\,\mathrm{Tr}[F(D_0)(D - D_0)] = 2\,\mathrm{Tr}\{F(D_0)[\exp(-K)\,D_0\exp(K) - D_0]\} \tag{26}$$

and thus contains a series of higher-order terms in $K$ shared by both energies. If these shared higher-order terms are larger than the higher-order terms that occur only in the Kohn–Sham energy in Eq. (22), then the energy changes predicted by the Roothaan–Hall function in Eq. (23) will be a good approximation to the changes in the Kohn–Sham energy, even for large rotations $K$.
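Equations (24) and (25) can be illustrated in a few lines. The sketch assumes an orthonormal MO basis ordered with the occupied orbitals first, so that $D_0 = \mathrm{diag}(1,\dots,1,0,\dots,0)$ and $\kappa$ is the virtual–occupied block. It evaluates $\exp(K)$ through the Hermitian matrix $iK$ instead of a library matrix exponential, and checks that $D(K)$ still satisfies Eq. (8) (with $S = I$) for a finite, not infinitesimal, rotation. Helper names are hypothetical.

```python
import numpy as np


def expm_antisymmetric(K):
    """exp(K) for real antisymmetric K, using that iK is Hermitian."""
    w, V = np.linalg.eigh(1j * K)                 # iK = V diag(w) V^H
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real


def rotation_matrix(kappa):
    """K of Eq. (25); kappa holds kappa_ai (virtual x occupied), occupied first."""
    nvir, nocc = kappa.shape
    K = np.zeros((nocc + nvir, nocc + nvir))
    K[nocc:, :nocc] = kappa
    K[:nocc, nocc:] = -kappa.T
    return K


def rotated_density(kappa, D0):
    """D(K) = exp(-K) D0 exp(K), Eq. (24); exp(-K) = exp(K)^T for antisymmetric K."""
    U = expm_antisymmetric(rotation_matrix(kappa))
    return U.T @ D0 @ U


rng = np.random.default_rng(8)
nocc, nvir = 2, 3
D0 = np.diag([1.0] * nocc + [0.0] * nvir)
kappa = 0.5 * rng.standard_normal((nvir, nocc))  # a finite rotation
D = rotated_density(kappa, D0)

print("symmetric:", np.allclose(D, D.T))
print("idempotent:", np.allclose(D @ D, D))
print("trace:", np.trace(D))                      # stays equal to nocc
```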

Let us now compare the derivatives of the Roothaan–Hall and Kohn–Sham energies with respect to the orbital-rotation parameters $\kappa_{ai}$ (in this paper, $i$, $j$, $k$, and $l$ denote occupied indices and $a$, $b$, $c$, and $d$ denote virtual indices). As already established, the two energy functions have the same gradients,

$$E_{\mathrm{KS},ai}^{(1)} = \left.\frac{\partial E_{\mathrm{KS}}}{\partial \kappa_{ai}}\right|_{\kappa=0} = -4F_{ai}, \tag{27a}$$

$$E_{\mathrm{RH},ai}^{(1)} = \left.\frac{\partial E_{\mathrm{RH}}}{\partial \kappa_{ai}}\right|_{\kappa=0} = -4F_{ai}. \tag{27b}$$

The Hessians are most conveniently expressed in a basis where the occupied–occupied and virtual–virtual blocks of the Kohn–Sham matrix are diagonal,

$$F_{ab} = \delta_{ab}\,\varepsilon_a, \tag{28a}$$

$$F_{ij} = \delta_{ij}\,\varepsilon_i. \tag{28b}$$

Since, at convergence, where $F$ is fully diagonal, the diagonal elements $\varepsilon_a$ and $\varepsilon_i$ become the orbital energies, we shall refer to them as pseudo-orbital energies or sometimes just orbital energies. In this basis, the Hessians of the two energy functions become

$$E_{\mathrm{KS},aibj}^{(2)} = \left.\frac{\partial^2 E_{\mathrm{KS}}}{\partial\kappa_{ai}\,\partial\kappa_{bj}}\right|_{\kappa=0} = 4\,\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i) + M_{aibj}, \tag{29a}$$

$$E_{\mathrm{RH},aibj}^{(2)} = \left.\frac{\partial^2 E_{\mathrm{RH}}}{\partial\kappa_{ai}\,\partial\kappa_{bj}}\right|_{\kappa=0} = 4\,\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i), \tag{29b}$$

where

$$M_{aibj} = 16\,g_{aibj} - 4\gamma\,(g_{abij} + g_{ajib}) + E_{\mathrm{XC}}^{(2)}(D)_{aibj}. \tag{30}$$

Clearly, the Roothaan–Hall Hessian in Eq. (29b) is positive definite whenever the energies of the occupied orbitals are lower than the energies of the virtual orbitals, that is, whenever the HOMO-LUMO gap is positive. Furthermore, if the differences $\varepsilon_a - \varepsilon_i$ in the Hessians are large compared to $M_{aibj}$ in Eq. (30), then $E_{\mathrm{RH}}^{(2)}$ is a good approximation to $E_{\mathrm{KS}}^{(2)}$.
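Equations (27b) and (29b) can be verified numerically for the Roothaan–Hall function $E_{\mathrm{RH}}(\kappa) = 2\,\mathrm{Tr}[F\,D(K)]$ with $F$ held fixed. The sketch assumes an orthonormal MO basis (occupied orbitals first) and a random symmetric $F$ whose occupied–occupied and virtual–virtual blocks are first diagonalized as in Eq. (28). It then compares central finite differences with $-4F_{ai}$ and $4(\varepsilon_a - \varepsilon_i)$. The Kohn–Sham Hessian, with its $M_{aibj}$ term, is not attempted, and all names are illustrative.

```python
import numpy as np


def expm_antisymmetric(K):
    """exp(K) for real antisymmetric K, using that iK is Hermitian."""
    w, V = np.linalg.eigh(1j * K)
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real


def rotation_matrix(kappa):
    """K of Eq. (25), occupied orbitals first; kappa is virtual x occupied."""
    nvir, nocc = kappa.shape
    K = np.zeros((nocc + nvir, nocc + nvir))
    K[nocc:, :nocc] = kappa
    K[:nocc, nocc:] = -kappa.T
    return K


def e_rh(kappa, F, D0):
    """E_RH(kappa) = 2 Tr[F exp(-K) D0 exp(K)] with F held fixed."""
    U = expm_antisymmetric(rotation_matrix(kappa))
    return 2.0 * np.trace(F @ U.T @ D0 @ U)


rng = np.random.default_rng(9)
nocc, nvir = 2, 3
n = nocc + nvir
D0 = np.diag([1.0] * nocc + [0.0] * nvir)
A = rng.standard_normal((n, n))
F = A + A.T
# Eq. (28): diagonalize only the occupied-occupied and virtual-virtual blocks.
_, Vo = np.linalg.eigh(F[:nocc, :nocc])
_, Vv = np.linalg.eigh(F[nocc:, nocc:])
T = np.zeros((n, n))
T[:nocc, :nocc], T[nocc:, nocc:] = Vo, Vv
F = T.T @ F @ T          # D0 is unchanged by this block-diagonal rotation

t = 1e-3
e0 = e_rh(np.zeros((nvir, nocc)), F, D0)
grad_err = hess_err = 0.0
for a in range(nvir):
    for i in range(nocc):
        step = np.zeros((nvir, nocc))
        step[a, i] = t
        ep, em = e_rh(step, F, D0), e_rh(-step, F, D0)
        grad = (ep - em) / (2 * t)
        hess = (ep - 2 * e0 + em) / t**2
        grad_err = max(grad_err, abs(grad + 4.0 * F[nocc + a, i]))        # Eq. (27b)
        eps_a, eps_i = F[nocc + a, nocc + a], F[i, i]
        hess_err = max(hess_err, abs(hess - 4.0 * (eps_a - eps_i)))      # Eq. (29b)
print(f"max gradient error {grad_err:.1e}, max diagonal Hessian error {hess_err:.1e}")
```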

C. Quadratically convergent trust-region optimization

To minimize the Roothaan–Hall energy in Eq. (14), consider the second-order expansion in the orbital-rotation parameters $\kappa$,

$$E_{\mathrm{RH}}^{[2]}(\kappa) = E_{\mathrm{RH}} + \kappa^{T}E_{\mathrm{RH}}^{(1)} + \frac{1}{2}\kappa^{T}E_{\mathrm{RH}}^{(2)}\kappa. \tag{31}$$

The unconstrained Newton step is obtained by setting the gradient equal to zero,

$$\frac{\partial E_{\mathrm{RH}}^{[2]}(\kappa)}{\partial\kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)}\kappa = 0. \tag{32}$$

Solution of these equations yields the Newton step, with its fast second-order convergence in the local region. In the global region, far away from the true minimum, it is not reasonable to accept large steps, since the expansion in Eq. (31) is a valid approximation to $E_{\mathrm{RH}}(D)$ only for $\|\kappa\| \le h$, where $h$ is the trust radius. Furthermore, if $E_{\mathrm{RH}}^{(2)}$ is indefinite, the Newton step in Eq. (32) may not reduce the energy. Therefore, if the Hessian is not positive definite or if the Newton step is too long, we instead solve a modified set of equations, where we minimize Eq. (31) subject to the constraint $\|\kappa\| = h$. To accomplish this, we introduce an undetermined multiplier $\mu$ and set up the Lagrangian

$$L(\kappa, \mu) = E_{\mathrm{RH}}^{[2]}(\kappa) + \frac{1}{2}\mu\left(\kappa^{T}\kappa - h^{2}\right), \tag{33}$$

whose stationary points are determined from the equation

$$\frac{\partial L(\kappa, \mu)}{\partial\kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)}\kappa + \mu\kappa = 0, \tag{34}$$

leading to the level-shifted Newton step

$$\kappa(\mu) = -\left(E_{\mathrm{RH}}^{(2)} + \mu I\right)^{-1}E_{\mathrm{RH}}^{(1)}. \tag{35}$$

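The multiplier $\mu$ is chosen such that $\|\kappa\| = h$ and such that the energy change predicted by $E_{\mathrm{RH}}^{[2]}$ is negative. Consider the first- and second-order changes of the Roothaan–Hall energy,

$$E_{\mathrm{RH}}^{[1]}(\kappa) - E_{\mathrm{RH}} = \kappa^{T}E_{\mathrm{RH}}^{(1)} = -\kappa^{T}\left(E_{\mathrm{RH}}^{(2)} + \mu I\right)\kappa, \tag{36a}$$

$$E_{\mathrm{RH}}^{[2]}(\kappa) - E_{\mathrm{RH}} = \kappa^{T}E_{\mathrm{RH}}^{(1)} + \frac{1}{2}\kappa^{T}E_{\mathrm{RH}}^{(2)}\kappa = -\frac{1}{2}\kappa^{T}\left(E_{\mathrm{RH}}^{(2)} + \mu I\right)\kappa - \frac{1}{2}\mu\,\kappa^{T}\kappa. \tag{36b}$$

If $E_{\mathrm{RH}}^{(2)}$ is positive definite, both changes are negative for $\mu \ge 0$; if $E_{\mathrm{RH}}^{(2)}$ is indefinite, they are negative for $\mu > -\lambda_1$, where $\lambda_1$ is the lowest (negative) eigenvalue of the Hessian, which by Eq. (29b) is determined by the HOMO-LUMO gap. In general, therefore, we choose $\mu$ such that $\mu \ge 0$ and $\mu > -\lambda_1$. As discussed in Ref. 6, it is always possible to find a level-shift parameter that satisfies this requirement.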
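The choice of $\mu$ in Eqs. (35) and (36) can be sketched for a generic symmetric Hessian. The sketch assumes a random indefinite toy Hessian and gradient and finds $\mu$ by bisection on $\mu > \max(0, -\lambda_1)$, where $\|\kappa(\mu)\|^2 = \sum_k (\mathbf{v}_k^{T}E^{(1)})^2/(\lambda_k + \mu)^2$ decreases monotonically ($\lambda_k$, $\mathbf{v}_k$ being the Hessian eigenpairs). It ignores the so-called hard case, in which the gradient has no component along the lowest eigenvector. It then confirms that both predicted changes in Eq. (36) are negative. The function name is hypothetical.

```python
import numpy as np


def level_shifted_newton(g, H, h, iters=200):
    """kappa(mu) = -(H + mu I)^{-1} g, Eq. (35), with ||kappa|| <= h, mu >= 0, mu > -lambda_1."""
    n = len(g)
    lam1 = np.linalg.eigvalsh(H)[0]                       # lowest Hessian eigenvalue

    def step(mu):
        return -np.linalg.solve(H + mu * np.eye(n), g)

    if lam1 > 0 and np.linalg.norm(step(0.0)) <= h:
        return 0.0, step(0.0)                             # plain Newton step is acceptable
    lo = max(0.0, -lam1)                                  # ||kappa|| is too long just above lo
    width = 1.0
    while np.linalg.norm(step(lo + width)) > h:          # bracket the solution
        width *= 2.0
    hi = lo + width
    for _ in range(iters):                                # ||kappa(mu)|| decreases with mu
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > h:
            lo = mid
        else:
            hi = mid
    return hi, step(hi)


rng = np.random.default_rng(10)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
H = Q @ np.diag([-1.0, 0.5, 2.0, 3.0]) @ Q.T            # indefinite toy Hessian
g = rng.standard_normal(4)                                # toy gradient
h = 0.3                                                   # trust radius

mu, kappa = level_shifted_newton(g, H, h)
dE1 = kappa @ g                                           # Eq. (36a)
dE2 = kappa @ g + 0.5 * kappa @ H @ kappa                 # Eq. (36b)
print(f"mu = {mu:.4f}, |kappa| = {np.linalg.norm(kappa):.6f}")
print(f"first-order change {dE1:.4f}, second-order change {dE2:.4f}")
```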

D. The quadratically convergent SCF method

It is possible to optimize the Hartree–Fock and Kohn–Sham energies in Eq. (1) directly, without invoking the Roothaan–Hall energy function in Eq. (14). In the second-order trust-region Newton method, the optimization then consists of a sequence of level-shifted Newton iterations. At each iteration, the linear equation in Eq. (35) is solved, with $E_{\mathrm{RH}}^{(1)}$ and $E_{\mathrm{RH}}^{(2)}$ replaced by $E_{\mathrm{KS}}^{(1)}$ and $E_{\mathrm{KS}}^{(2)}$, respectively. The resulting optimization scheme is known as the quadratically convergent SCF (QC-SCF) method (Refs. 13 and 14). The method is quadratically convergent in the local region and has a dynamic update of the trust region, as discussed by Fletcher (Ref. 7).

E. The level-shift parameter in the TRRH method

1. The global region

A TRRH diagonalization step determined with $\mu = 0$ in Eq. (19) corresponds to the global minimum of $E_{\mathrm{RH}}(D)$. Therefore, when we impose the constraint in Eq. (16) on the difference between the old and new density matrices, the step-size control is applied to a global optimization of $E_{\mathrm{RH}}(D)$. By contrast, in the quadratically convergent trust-region optimization of $E_{\mathrm{RH}}$ in Eq. (35), step-size control is applied to a local model of $E_{\mathrm{RH}}$, that is, to the optimization of the second-order Taylor expansion of the energy, $E^{(2)}_{\mathrm{RH}}$ in Eq. (31), inside the trust region.

Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp

074103-6 Thøgersen et al. J. Chem. Phys. 123, 074103 2005

In the quadratically convergent trust-region method, we direct the step towards the minimum by choosing the level-shift parameter $\mu$ in Eq. (35) such that the lowest diagonal element of the Hessian, $\varepsilon^{\mathrm{LUMO}}_{a} - \varepsilon^{\mathrm{HOMO}}_{i} + \mu$, becomes positive. Alternatively, in the Kohn–Sham diagonalization step in Eq. (19), we may ensure positive definiteness by monitoring the dependence of the pseudo-orbital energies on the level-shift parameter $\mu$ in Eq. (19) and adjusting it such that the HOMO-LUMO gap,

$$\Delta\varepsilon_{ai} = \varepsilon^{\mathrm{LUMO}}_{a} - \varepsilon^{\mathrm{HOMO}}_{i}, \tag{37}$$

becomes positive. The configuration that defines the HOMO-LUMO gap is identified from the eigenvalues of Eq. (20) that are equal to one. Insisting on a smooth development of the MOs from those that are occupied in $D_0$ to those that are obtained by diagonalizing Eq. (19), we restrict $\mu$ to the interval $\mu_{\min} \le \mu < \infty$, where $\mu_{\min}$ is the smallest value for which the HOMO-LUMO gap is positive. In addition, the step must be constrained such that Eq. (16) is fulfilled. In passing, we note that the reference density matrix $D_0$ may not always be idempotent. For example, it may be $\bar D$ of Eq. (13), in which case its eigenvalues are not exactly 1. In such cases, the matrix

$$\bar D^{0}_{\mathrm{idem}} = C^{0}_{\mathrm{occ}}\left(C^{0}_{\mathrm{occ}}\right)^{T}, \tag{38}$$

constructed from the eigenvectors of Eq. (20) with $D_0$ replaced by $\bar D$, represents a purification of $\bar D$.
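The purification of Eq. (38) can be sketched as follows. The sketch assumes an orthonormal basis ($S = I$), so that the eigenvalue problem of Eq. (20), which is not reproduced in this section, reduces to an ordinary symmetric eigenvalue problem for $\bar D$. The helper name and the Cayley-transform construction of a nearby test density are illustrative assumptions.

```python
import numpy as np


def idempotent_from_eigenvectors(D_bar, n_occ):
    """Return C_occ C_occ^T (cf. Eq. 38) and the n_occ largest occupation numbers.

    Assumes S = I, so that the eigenvectors of D_bar play the role of those of Eq. (20).
    """
    occupations, C = np.linalg.eigh(D_bar)   # eigenvalues in ascending order
    C_occ = C[:, -n_occ:]                    # eigenvectors with the largest occupations
    return C_occ @ C_occ.T, occupations[-n_occ:]


# Build a non-idempotent average of two nearby idempotent densities.
rng = np.random.default_rng(0)
n, n_occ = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D0 = Q[:, :n_occ] @ Q[:, :n_occ].T
A = 0.1 * rng.standard_normal((n, n))
K = A - A.T                                                # antisymmetric generator
U = np.linalg.solve(np.eye(n) - K / 2, np.eye(n) + K / 2)  # Cayley transform: orthogonal
D1 = U.T @ D0 @ U
D_bar = 0.6 * D0 + 0.4 * D1

D_idem, occ = idempotent_from_eigenvectors(D_bar, n_occ)
print(occ)                                      # occupations of the retained eigenvectors
print(np.linalg.norm(D_idem @ D_idem - D_idem))  # at rounding-error level
```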

The constraint on the change in the AO density in Eq. (16) refers to a change that may arise not only from small changes in many MOs but also from large changes in a few MOs, or even in a single MO. In the TRRH algorithm, we shall require that the changes in the individual MOs are all small. Expanding the MO $\phi^{\mathrm{new}}_i$, obtained by diagonalization of Eq. (19), in the old MOs, we obtain

$$\phi^{\mathrm{new}}_i = \sum_{j}\phi^{\mathrm{old}}_j\langle\phi^{\mathrm{old}}_j|\phi^{\mathrm{new}}_i\rangle + \sum_{a}\phi^{\mathrm{old}}_a\langle\phi^{\mathrm{old}}_a|\phi^{\mathrm{new}}_i\rangle, \tag{39}$$

where the first summation is over the occupied MOs and the second over the virtual MOs. The squared norm of the projection of $\phi^{\mathrm{new}}_i$ onto the MO space associated with $D_0$ is therefore

$$a^{\mathrm{orb}}_i = \sum_{j}\langle\phi^{\mathrm{old}}_j|\phi^{\mathrm{new}}_i\rangle^{2}. \tag{40}$$

To ensure small individual MO changes at each iteration (to within a unitary transformation of the occupied MOs), we shall therefore require

$$a^{\mathrm{orb}}_{\min} = \min_{i} a^{\mathrm{orb}}_i \ge A^{\mathrm{orb}}_{\min}, \tag{41}$$

where $A^{\mathrm{orb}}_{\min}$ is close to 1. This constraint also ensures that the HOMO-LUMO gap in Eq. (37) stays positive.
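Eqs. (40) and (41) reduce to a few matrix products once the MOs are stored as AO coefficient columns. The sketch below assumes S-orthonormal coefficients ($C^{T} S C = I$). The function name and the randomly generated overlap matrix and MOs are hypothetical test data. Mixing 2% of a virtual orbital into one occupied orbital gives $a^{\mathrm{orb}}_{\min} = 0.98$, the threshold used later in the zinc complex example.

```python
import numpy as np


def orbital_overlaps(C_old_occ, C_new_occ, S):
    """Return a_i^orb of Eq. (40) for each new occupied MO and a_min^orb of Eq. (41).

    Columns are AO coefficients of S-orthonormal occupied MOs (C^T S C = I).
    """
    overlap = C_old_occ.T @ S @ C_new_occ   # <phi_j^old | phi_i^new>
    a_orb = np.sum(overlap**2, axis=0)      # sum over the old occupied MOs j
    return a_orb, a_orb.min()


# Hypothetical S-orthonormal MOs in a non-orthogonal AO basis.
rng = np.random.default_rng(9)
n, n_occ = 5, 2
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)                 # symmetric positive definite overlap
w, V = np.linalg.eigh(S)
C = V @ np.diag(w**-0.5) @ V.T @ np.linalg.qr(rng.standard_normal((n, n)))[0]
C_old_occ = C[:, :n_occ]

# Mix 2% of the first virtual MO into the second occupied MO.
t = np.arcsin(np.sqrt(0.02))
C_new_occ = C_old_occ.copy()
C_new_occ[:, 1] = np.cos(t) * C[:, 1] + np.sin(t) * C[:, n_occ]
a_orb, a_min = orbital_overlaps(C_old_occ, C_new_occ, S)
print(a_orb, a_min)                         # a_min equals 0.98 up to rounding
```

A rotation that only mixes occupied MOs leaves every $a^{\mathrm{orb}}_i$ equal to 1, which is why the criterion holds "to within a unitary transformation of the occupied MOs".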

The Hessians of $E_{\mathrm{RH}}$ and $E_{\mathrm{KS}}$ in Eq. (29) both contain the orbital-energy difference term, while the Hessian of $E_{\mathrm{KS}}$ also contains the terms $M_{aibj}$ of Eq. (30). When $\mu$ is large compared with the $M_{aibj}$ terms, the step generated by the level-shifted diagonalization in Eq. (19) is of the same quality as that generated by a quadratically convergent trust-region optimization of $E_{\mathrm{KS}}$. However, since the step-size control in Eq. (22) is imposed on the global optimization, the quality of the step may be further improved relative to that obtained in a QC-SCF optimization of the Kohn–Sham energy. When the level shift is determined in the global region such that $a^{\mathrm{orb}}_{\min} \ge A^{\mathrm{orb}}_{\min}$, we often see not just this one orbital but many for which $a^{\mathrm{orb}}_i \approx A^{\mathrm{orb}}_{\min}$. In this way, a large number of orbitals change significantly.

2. The local region

To investigate the local convergence of the TRRH algorithm in Eq. (19), we first note that, in the local region near convergence, the gradient in Eq. (6), and thus the blocks $F_{ov}$ and $F_{vo}$ between the occupied and virtual orbitals in the Kohn–Sham matrix in the representation of Eq. (28),

$$F = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix}, \tag{42}$$

are small (see Eq. (27)). Writing the unitary transformation of $F$ generated by $K$ in Eq. (25) as

$$\exp(K)\,F\,\exp(-K) = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix} + \begin{pmatrix} -\kappa^{T}F_{vo} - F_{ov}\kappa & \varepsilon_o\kappa^{T} - \kappa^{T}\varepsilon_v \\ \kappa\varepsilon_o - \varepsilon_v\kappa & \kappa F_{ov} + F_{vo}\kappa^{T} \end{pmatrix} + \mathcal{O}(\kappa^{2}), \tag{43}$$

we find that, to first order, the block diagonalization of the Kohn–Sham matrix may be accomplished by solving the following set of linear equations:

$$F_{vo} + \kappa\varepsilon_o - \varepsilon_v\kappa = 0. \tag{44}$$

Since these equations are identical to the Newton equation in Eq. (32), we conclude that, in the local region where the higher-order terms in $\kappa$ may be neglected, the block diagonalization of the Kohn–Sham matrix is equivalent to the solution of the equation

$$\kappa = -\left(\mathbf{E}^{[2]}_{\mathrm{RH}}\right)^{-1}\mathbf{E}^{[1]}_{\mathrm{RH}}. \tag{45}$$
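The first-order block diagonalization of Eqs. (43) and (44) can be checked numerically. With diagonal $\varepsilon_o$ and $\varepsilon_v$, Eq. (44) has the elementwise solution $\kappa_{ai} = F_{ai}/(\varepsilon_a - \varepsilon_i)$. Rotating $F$ with that $\kappa$ should leave only a residual occupied-virtual coupling of higher order. The sketch below assumes $K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}$, which is the sign convention that reproduces Eq. (44). The orbital energies and couplings are hypothetical test data.

```python
import numpy as np


def expm_antisym(K):
    """exp(K) for a real antisymmetric matrix K, via the Hermitian matrix iK."""
    w, V = np.linalg.eigh(1j * K)
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real


def first_order_kappa(F, n_occ):
    """Solve Eq. (44), F_vo + kappa eps_o - eps_v kappa = 0, for diagonal eps_o and eps_v."""
    eps = np.diag(F)
    eps_o, eps_v = eps[:n_occ], eps[n_occ:]
    return F[n_occ:, :n_occ] / (eps_v[:, None] - eps_o[None, :])  # kappa_ai = F_ai / (eps_a - eps_i)


def rotate(F, kappa):
    """exp(K) F exp(-K) with K = [[0, -kappa^T], [kappa, 0]], as in Eq. (43)."""
    n_vir, n_occ = kappa.shape
    K = np.zeros((n_occ + n_vir, n_occ + n_vir))
    K[n_occ:, :n_occ] = kappa
    K[:n_occ, n_occ:] = -kappa.T
    U = expm_antisym(K)
    return U @ F @ U.T          # exp(-K) = exp(K)^T because K is antisymmetric


rng = np.random.default_rng(8)
n_occ, n_vir = 2, 3
F = np.diag([-1.0, -0.7, 0.3, 0.6, 1.1])     # hypothetical orbital energies eps_o, eps_v
F_vo = 1e-2 * rng.standard_normal((n_vir, n_occ))
F[n_occ:, :n_occ] = F_vo                     # small occupied-virtual coupling (local region)
F[:n_occ, n_occ:] = F_vo.T

F_rot = rotate(F, first_order_kappa(F, n_occ))
print(np.linalg.norm(F[n_occ:, :n_occ]), np.linalg.norm(F_rot[n_occ:, :n_occ]))
```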

Let these equations determine the step $\kappa_n$ of iteration $n$, and expand the Kohn–Sham gradient at iteration $n+1$ about the iteration point $n$,

$$\mathbf{E}^{[1]}_{\mathrm{KS},n+1} = \mathbf{E}^{[1]}_{\mathrm{KS},n} + \mathbf{E}^{[2]}_{\mathrm{KS},n}\,\kappa_n + \mathcal{O}(\kappa_n^{2}) = \mathbf{E}^{[1]}_{\mathrm{KS},n} - \mathbf{E}^{[2]}_{\mathrm{KS},n}\left(\mathbf{E}^{[2]}_{\mathrm{RH},n}\right)^{-1}\mathbf{E}^{[1]}_{\mathrm{RH},n} + \mathcal{O}(\kappa_n^{2}). \tag{46}$$

Using Eqs. (27) and (29), we then obtain

$$\mathbf{E}^{[1]}_{\mathrm{KS},n+1} = \mathbf{E}^{[1]}_{\mathrm{KS},n} - \left(\mathbf{E}^{[2]}_{\mathrm{RH},n} + M_n\right)\left(\mathbf{E}^{[2]}_{\mathrm{RH},n}\right)^{-1}\mathbf{E}^{[1]}_{\mathrm{KS},n} = -M_n\left(\mathbf{E}^{[2]}_{\mathrm{RH},n}\right)^{-1}\mathbf{E}^{[1]}_{\mathrm{KS},n}, \tag{47}$$

having neglected terms of order $\kappa_n^{2}$. Therefore, if $M_n(\mathbf{E}^{[2]}_{\mathrm{RH},n})^{-1}$ has eigenvalues larger than 1 in magnitude, a simple TRRH sequence will diverge. This is a particular problem in Kohn–Sham theory, where the HOMO-LUMO gap (the lowest eigenvalue of $\mathbf{E}^{[2]}_{\mathrm{RH}}$) is often small compared with the contribution from $M$. To improve the local convergence, we may increase the HOMO-LUMO gap by level shifting, thereby reducing the magnitude of the eigenvalues of $M_n(\mathbf{E}^{[2]}_{\mathrm{RH},n})^{-1}$. We note that, when the simple TRRH sequence diverges, the TRSCF algorithm may still converge, as TRRH mainly serves to provide a new density and TRDSM then optimizes the combination of the various densities.

FIG. 1. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}$, (b) the minimum overlap $a^{\mathrm{orb}}_{\min}$ of the new occupied orbitals with the previous set of occupied orbitals, and (c) the changes in the model energy $\Delta E_{\mathrm{RH}}$ (full line) and the Kohn–Sham energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step of the seventh iteration of the zinc complex calculation shown in Fig. 5.
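The divergence criterion of Eq. (47) and the effect of level shifting can be illustrated with a small linear model. We assume a level-shifted step $\kappa_n = -(\mathbf{E}^{[2]}_{\mathrm{RH}} + \mu I)^{-1}\mathbf{E}^{[1]}_{\mathrm{KS},n}$ and keep the first-order assumptions used above. The gradient then propagates as $\mathbf{E}^{[1]}_{\mathrm{KS},n+1} = (\mu I - M)(\mathbf{E}^{[2]}_{\mathrm{RH}} + \mu I)^{-1}\mathbf{E}^{[1]}_{\mathrm{KS},n}$, which reduces to Eq. (47) for $\mu = 0$. This shifted generalization is our own illustrative extension. The Hessian and $M$ below are hypothetical $2\times2$ examples with a small HOMO-LUMO gap.

```python
import numpy as np


def iteration_matrix(H_rh, M, mu):
    """Map g_n -> g_{n+1} for the linearized step kappa_n = -(H_rh + mu I)^{-1} g_n.

    With H_ks = H_rh + M (Eq. 29) and equal gradients at the expansion point (Eq. 27),
    g_{n+1} = g_n + H_ks kappa_n = (mu I - M)(H_rh + mu I)^{-1} g_n; mu = 0 gives Eq. (47).
    """
    I = np.eye(H_rh.shape[0])
    return (mu * I - M) @ np.linalg.inv(H_rh + mu * I)


def spectral_radius(T):
    return np.max(np.abs(np.linalg.eigvals(T)))


H_rh = np.diag([0.1, 1.0])                # hypothetical RH Hessian with a small gap, 0.1
M = np.array([[0.3, 0.05], [0.05, 0.2]])  # hypothetical M_aibj contribution
g = np.array([1e-3, 1e-3])
for mu in (0.0, 0.5):
    T = iteration_matrix(H_rh, M, mu)
    g_n = g.copy()
    for _ in range(10):
        g_n = T @ g_n                     # ten simple TRRH iterations in the linear model
    print(mu, spectral_radius(T), np.linalg.norm(g_n))
```

In this model the unshifted sequence has a spectral radius above 1 and the gradient grows. A shift of $\mu = 0.5$ brings the radius below 1.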

F. Examples of the trust-region Roothaan–Hall algorithm

To illustrate how the TRRH algorithm is employed in the different parts of a Kohn–Sham energy optimization, we here consider how the level-shift parameter is determined in two iterations of the zinc complex calculation depicted in Sec. VII, Fig. 5. We first consider iteration 7, which is in the global region of the optimization, and then proceed to iteration 22 as an example of a step in the local region.

In Figs. 1(a) and 1(b), we have plotted the HOMO-LUMO gap $\Delta\varepsilon_{ai}$ of Eq. (37) and the overlap parameter $a^{\mathrm{orb}}_{\min}$ of Eq. (41), respectively, as functions of the level-shift parameter $\mu$. The corresponding changes in the Kohn–Sham energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line) and in the Roothaan–Hall model energy $\Delta E_{\mathrm{RH}}$ (full line) of Eqs. (22) and (23) are plotted in Fig. 1(c). We note that the change in the Kohn–Sham energy has been calculated as

$$\Delta E^{\mathrm{RH}}_{\mathrm{KS}} = E_{\mathrm{KS}}(D) - E_{\mathrm{KS}}\left(\bar D^{0}_{\mathrm{idem}}\right), \tag{48}$$

where $D$ and $\bar D^{0}_{\mathrm{idem}}$ are the density matrices calculated from the solutions to the eigenvalue problems in Eqs. (19) and (20), respectively.

FIG. 2. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}$, (b) the minimum overlap $a^{\mathrm{orb}}_{\min}$ of the new occupied orbitals with the previous set of occupied orbitals, and (c) the changes in the model energy $\Delta E_{\mathrm{RH}}$ (full line) and the Kohn–Sham energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step of the 22nd iteration of the zinc complex calculation shown in Fig. 5.

In Fig. 1(a), we see that, in iteration 7, $\Delta\varepsilon_{ai}$ is linear for $\mu > 2.2$, as the density matrix changes smoothly with decreasing $\mu$ from that of Eq. (20) to that obtained by applying the Aufbau principle to the solution of Eq. (19). For $\mu < 2.2$, the occupied and virtual orbitals defined by the previous density interchange. The value $\mu = 5.078$ used in this iteration was chosen from the requirement $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min} = 0.98$ in Eq. (41), restricting the new orbital component to 0.02. Figure 1(c) shows that an even lower energy would have been obtained by reducing the level shift to about 2.4. It would, however, be very difficult to identify this optimal value of $\mu$ without constructing additional Kohn–Sham matrices, since the Roothaan–Hall model energy is not accurate for small $\mu$. In short, identifying $\mu$ from the overlap requirement $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$ appears to be a good and secure way to control the step sizes in the optimization.
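Determining $\mu$ from the overlap requirement can be sketched as a one-dimensional root search. Eq. (19) is not reproduced in this section, so the sketch assumes that, in an orthonormal basis, the level-shifted diagonalization takes the form $(F - \mu D_0)C = C\varepsilon$. Under that assumption, large shifts keep the new occupied orbitals inside the old occupied space. The bisection also assumes that $a^{\mathrm{orb}}_{\min}$ grows with $\mu$, which the abrupt configuration shifts discussed in Sec. V can violate. The matrices and function names are hypothetical.

```python
import numpy as np


def a_min_orb(F, C_old_occ, mu):
    """a_min^orb of Eq. (41) for the Aufbau-occupied MOs of F - mu * D0 (S = I assumed)."""
    n_occ = C_old_occ.shape[1]
    D0 = C_old_occ @ C_old_occ.T
    _, C = np.linalg.eigh(F - mu * D0)         # ascending pseudo-orbital energies
    overlap = C_old_occ.T @ C[:, :n_occ]       # <phi_j^old | phi_i^new>
    return np.sum(overlap**2, axis=0).min()


def level_shift_from_overlap(F, C_old_occ, A_min=0.98, tol=1e-8):
    """Level shift with a_min^orb(mu) >= A_min, bisected down towards a_min^orb = A_min."""
    if a_min_orb(F, C_old_occ, 0.0) >= A_min:
        return 0.0                             # unconstrained Roothaan-Hall step is acceptable
    lo, hi = 0.0, 1.0
    while a_min_orb(F, C_old_occ, hi) < A_min:  # large shifts keep the old occupied space
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a_min_orb(F, C_old_occ, mid) >= A_min:
            hi = mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(7)
n, n_occ = 8, 3
B = rng.standard_normal((n, n))
F = 0.5 * (B + B.T)                            # hypothetical Kohn-Sham matrix
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
C_old_occ = Q[:, :n_occ]                       # hypothetical previous occupied MOs
mu = level_shift_from_overlap(F, C_old_occ)
print(mu, a_min_orb(F, C_old_occ, mu))
```

Each trial $\mu$ costs only a diagonalization of the same matrix. No new Kohn–Sham matrix is built, which is what makes the overlap criterion cheap compared with searching for the energy minimum in Fig. 1(c).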

Figures 2(a)–2(c) are equivalent to Figs. 1(a)–1(c), but for iteration 22 in the local part of the optimization. Notably, the linear regime of $\Delta\varepsilon_{ai}$ in Fig. 2(a) now extends to include $\mu = 0$, which corresponds to an unconstrained Roothaan–Hall step. Also, since $a^{\mathrm{orb}}_{\min} = 1.0000$ for $\mu = 0$, we can no longer determine the level shift from the overlap criterion $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$. From Fig. 2(c), we see that $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line) takes on its minimum value at $\mu = 1.3$. For smaller $\mu$, the energy increases, giving a total increase of $6.0\times10^{-5}\,E_h$ for $\mu = 0$.

The TRRH energy increase in the local part of the SCF optimization is particularly prominent for the DFT calculations. In the Hartree–Fock calculations, the TRRH model energy describes the SCF energy equally well in the local and global regions of the optimization. To avoid the increase in energy, we could add a constant minimum level shift, but this may in some cases slow down the convergence. Typically, the increase in the Kohn–Sham energy in the TRRH steps in the local region of the optimization is compensated by a larger energy decrease in the TRDSM step, ensuring an overall decrease in the Kohn–Sham energy in the iteration.

In Table I, we have listed the values of several parameters characterizing the TRRH steps in the TRSCF iterations of the zinc complex calculation. In the first 17 iterations, the constraint $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min}$ is active and determines the level-shift parameter. Note that, in the global region, $\Delta E_{\mathrm{RH}}$ is a reasonably good approximation to $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$. After iteration 17, the local region of the Kohn–Sham energy optimization is approached, and $\Delta E_{\mathrm{RH}}$ is no longer a good approximation to $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$. In this region, the Kohn–Sham energy increases, and it is the TRDSM algorithm that ensures the convergence of the calculation (see Sec. VII, Table IV).

TABLE I. Convergence details for the TRRH steps in the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.

| Iteration | $\mu_{\mathrm{RH}}$ | $a^{\mathrm{orb}}_{\min}$ | $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ | $\Delta E_{\mathrm{RH}}$ |
|---|---|---|---|---|
| 2 | 22.57 | 0.994 | −8.366 865 | −8.411 913 |
| 3 | 26.71 | 0.980 | −20.122 850 | −20.895 267 |
| 4 | 30.54 | 0.980 | −31.041 569 | −35.286 269 |
| 5 | 19.21 | 0.980 | −27.278 985 | −31.363 274 |
| 6 | 10.31 | 0.980 | −15.101 958 | −18.277 717 |
| 7 | 5.07 | 0.980 | −10.675 155 | −13.082 691 |
| 8 | 2.96 | 0.980 | −6.749 189 | −7.197 438 |
| 9 | 2.18 | 0.981 | −3.181 254 | −4.589 630 |
| 10 | 4.68 | 0.980 | 0.394 694 | −3.712 621 |
| 11 | 1.40 | 0.980 | −1.676 644 | −2.885 580 |
| 12 | 1.40 | 0.980 | −1.743 634 | −1.775 556 |
| 13 | 0.93 | 0.980 | −0.402 427 | −0.843 260 |
| 14 | 0.78 | 0.980 | −0.376 675 | −0.622 386 |
| 15 | 0.54 | 0.981 | −0.211 002 | −0.227 722 |
| 16 | 0.15 | 0.982 | 0.029 066 | −0.199 268 |
| 17 | 0.07 | 0.980 | 0.010 452 | −0.068 243 |
| 18 | 0.00 | 0.991 | 0.043 376 | −0.037 071 |
| 19 | 0.00 | 0.997 | 0.012 644 | −0.009 493 |
| 20 | 0.00 | 0.999 | 0.001 104 | −0.000 931 |
| 21 | 0.00 | 0.999 | 0.000 352 | −0.000 249 |
| 22 | 0.00 | 0.999 | 0.000 059 | −0.000 049 |
| 23 | 0.00 | 0.999 | 0.000 010 | −0.000 006 |
| 24 | 0.00 | 1.000 | 0.000 000 | −0.000 000 |

IV. TRUST-REGION DENSITY-SUBSPACE MINIMIZATION IN DFT

After a sequence of Roothaan–Hall iterations, we have determined a set of density matrices $D_i$ and a corresponding set of Kohn–Sham matrices $F_i = F(D_i)$. The question then arises as to how to make the best use of the information contained in these collected density and Kohn–Sham matrices.

A. Parametrization of the DSM density matrix

Taking $D_0$ as the reference density matrix, we write the improved density matrix as a linear combination of the current and previous density matrices,

$$\bar D = D_0 + \sum_{i=0}^{n} c_i D_i, \tag{49}$$

which, ideally, should satisfy the symmetry, trace, and idempotency conditions in Eq. (8) of a valid Kohn–Sham density matrix. Whereas the symmetry condition in Eq. (8a) is trivially satisfied for any such linear combination, the trace condition in Eq. (8b) holds only for combinations that satisfy the restriction

$$\sum_{i=0}^{n} c_i = 0, \tag{50}$$

leading to a set of $n+1$ constrained parameters $c_i$ with $0 \le i \le n$. Alternatively, an unconstrained set of $n$ parameters $c_i$ with $1 \le i \le n$ can be used, with $c_0$ defined so that the trace condition is fulfilled,

$$c_0 = -\sum_{i=1}^{n} c_i. \tag{51}$$

In terms of these independent parameters, the density matrix $\bar D$ becomes

$$\bar D = D_0 + D_+, \tag{52}$$

where we have introduced the notations

$$D_+ = \sum_{i=1}^{n} c_i D_{i0}, \tag{53a}$$

$$D_{i0} = D_i - D_0. \tag{53b}$$
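The independent parametrization of Eqs. (51)–(53) guarantees the trace condition for any choice of the $n$ coefficients. The sketch below checks this in an orthonormal basis ($S = I$), where $\mathrm{Tr}(DS) = N/2$ becomes $\mathrm{Tr}\,D = n_{\mathrm{occ}}$. The randomly generated densities and the function name are hypothetical.

```python
import numpy as np


def averaged_density(c, densities):
    """D_bar = D0 + sum_{i>=1} c_i (D_i - D0), Eqs. (52)-(53a).

    `densities` is [D0, D1, ..., Dn] and `c` holds the n independent
    coefficients c_1..c_n; c_0 = -sum(c) (Eq. 51) is implied.
    """
    D0 = densities[0]
    D_plus = sum(ci * (Di - D0) for ci, Di in zip(c, densities[1:]))
    return D0 + D_plus


rng = np.random.default_rng(1)
n, n_occ = 6, 2


def random_density():
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q[:, :n_occ] @ Q[:, :n_occ].T   # idempotent, trace n_occ (S = I)


densities = [random_density() for _ in range(3)]
c = [0.7, -0.4]
D_bar = averaged_density(c, densities)
print(np.trace(D_bar))                     # n_occ (up to rounding) for any choice of c
```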

Unlike the symmetry and trace conditions in Eqs. (8a) and (8b), the idempotency condition in Eq. (8c) is in general not fulfilled for linear combinations of $D_i$. Still, for any averaged density matrix $\bar D$ in Eq. (52) that does not fulfill the idempotency condition, we may generate a purified density matrix with a smaller idempotency error by the transformation (Ref. 15)

$$\tilde D = 3\bar D S\bar D - 2\bar D S\bar D S\bar D. \tag{54}$$

The purification of the density matrix has previously been used in connection with the minimization of energy functions (Refs. 16–19).
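The quadratic convergence of the McWeeny purification in Eq. (54) is easy to observe numerically. The sketch applies Eq. (54) repeatedly to an averaged density $\bar D = D_0 + 0.3\,(D_1 - D_0)$ built from two nearby idempotent densities. It assumes an orthonormal basis, $S = I$, for simplicity. The functions accept a general $S$. The test data are hypothetical.

```python
import numpy as np


def mcweeny(D, S):
    """One McWeeny purification step, Eq. (54): 3 DSD - 2 DSDSD."""
    DS = D @ S
    return 3.0 * DS @ D - 2.0 * DS @ DS @ D


def idempotency_error(D, S):
    """Frobenius norm of DSD - D."""
    return np.linalg.norm(D @ S @ D - D)


rng = np.random.default_rng(2)
n, n_occ = 6, 2
S = np.eye(n)                                # orthonormal basis assumed for simplicity
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D0 = Q[:, :n_occ] @ Q[:, :n_occ].T
A = 0.05 * rng.standard_normal((n, n))
K = A - A.T
U = np.linalg.solve(np.eye(n) - K / 2, np.eye(n) + K / 2)   # orthogonal (Cayley transform)
D1 = U.T @ D0 @ U
D = D0 + 0.3 * (D1 - D0)                     # averaged density with c = 0.3

errors = [idempotency_error(D, S)]
for _ in range(6):
    D = mcweeny(D, S)
    errors.append(idempotency_error(D, S))
print(errors)
```

Near an eigenvalue of 0 or 1, one step maps a deviation $x$ to roughly $3x^2$. The error is therefore squared at each iteration until it reaches rounding level.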

Introducing the idempotency correction,

$$\delta D = \tilde D - \bar D, \tag{55}$$

we may then write the purified averaged density matrix in the form

$$\tilde D = D_0 + D_+ + \delta D. \tag{56}$$

In the following, we shall analyze the relative magnitudes of the terms $D_+$ and $\delta D$ entering Eq. (56).

B. Order analysis of the purified averaged density matrix

For simplicity, we shall work in the orthonormal MO basis that diagonalizes the reference density matrix,

$$D_0 = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \tag{57}$$

and consider the case with only one additional density matrix $D_1$. According to Eq. (24), an antisymmetric matrix $K$ of the form in Eq. (25) exists such that

$$D_1 = \exp(-K)\,D_0\,\exp(K) = D_0 + \begin{pmatrix} -\kappa^{T}\kappa & -\kappa^{T} \\ -\kappa & \kappa\kappa^{T} \end{pmatrix} + \mathcal{O}(\kappa^{3}), \tag{58}$$

giving rise to the following averaged density matrix:

$$\bar D = D_0 + cD_{10} = D_0 + \begin{pmatrix} -c\,\kappa^{T}\kappa & -c\,\kappa^{T} \\ -c\,\kappa & c\,\kappa\kappa^{T} \end{pmatrix} + \mathcal{O}(c\kappa^{3}). \tag{59}$$

The idempotency error of $\bar D$ is given by

$$\bar D\bar D - \bar D = (c^{2} - c)\begin{pmatrix} \kappa^{T}\kappa & 0 \\ 0 & \kappa\kappa^{T} \end{pmatrix} + \mathcal{O}(c\kappa^{4}), \tag{60}$$

showing that $\bar D$ is idempotent only to first order in $\kappa$. To reduce the idempotency error, we subject $\bar D$ to the purification in Eq. (54), obtaining

$$\tilde D = 3\bar D^{2} - 2\bar D^{3} = D_0 + \begin{pmatrix} -c^{2}\kappa^{T}\kappa & -c\,\kappa^{T} \\ -c\,\kappa & c^{2}\kappa\kappa^{T} \end{pmatrix} + \mathcal{O}(c\kappa^{3}). \tag{61}$$

Finally, comparing Eqs. (59) and (61), we obtain

$$\tilde D = \bar D + \mathcal{O}(c\kappa^{2}), \tag{62}$$

demonstrating that the impure and purified averaged density matrices differ by terms proportional to $c\kappa^{2}$. Since the McWeeny purification in Eq. (54) converges quadratically, we conclude that the idempotency error of $\tilde D$ in Eq. (62) is proportional to $c^{2}\kappa^{4}$.
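The orders in Eqs. (60)–(62) can be verified numerically by doubling $\kappa$ and watching how each quantity scales. The sketch follows Eqs. (57)–(61) literally in an orthonormal basis with $c = 0.3$. It assumes $K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}$ for the generator of Eq. (25) and uses a random, hypothetical direction for $\kappa$.

```python
import numpy as np


def expm_antisym(K):
    """exp(K) for real antisymmetric K, using the Hermitian matrix iK."""
    w, V = np.linalg.eigh(1j * K)
    return (V @ np.diag(np.exp(-1j * w)) @ V.conj().T).real


def order_analysis(t, kappa_dir, n_occ, c=0.3):
    """Norms of D_bar^2 - D_bar, D_tilde^2 - D_tilde, and D_tilde - D_bar for kappa = t * kappa_dir."""
    n_vir = kappa_dir.shape[0]
    n = n_occ + n_vir
    D0 = np.zeros((n, n))
    D0[:n_occ, :n_occ] = np.eye(n_occ)                        # Eq. (57)
    K = np.zeros((n, n))
    K[n_occ:, :n_occ] = t * kappa_dir                         # K = [[0, -kappa^T], [kappa, 0]]
    K[:n_occ, n_occ:] = -t * kappa_dir.T
    U = expm_antisym(K)
    D1 = U.T @ D0 @ U                                         # exp(-K) D0 exp(K), Eq. (58)
    D_bar = D0 + c * (D1 - D0)                                # Eq. (59)
    D_tilde = 3 * D_bar @ D_bar - 2 * D_bar @ D_bar @ D_bar   # Eq. (61)
    norm = np.linalg.norm
    return np.array([norm(D_bar @ D_bar - D_bar),
                     norm(D_tilde @ D_tilde - D_tilde),
                     norm(D_tilde - D_bar)])


rng = np.random.default_rng(3)
kappa_dir = rng.standard_normal((3, 2))   # hypothetical (virtual x occupied) direction
ratios = order_analysis(0.02, kappa_dir, 2) / order_analysis(0.01, kappa_dir, 2)
print(ratios)   # doubling kappa: about 4 (kappa^2), 16 (kappa^4), and 4 (kappa^2)
```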

In a more general analysis, we would not assume an orthonormal basis, and we would also include several density matrices $D_i = \exp(-K_i)\,D_0\,\exp(K_i)$. The essential result is then that we may write Eq. (56) as

$$\tilde D = D_0 + \sum_{i=1}^{n} c_i D_{i0} + \mathcal{O}\Big(\sum_{i=1}^{n} c_i D_{i0}^{2}\Big), \tag{63}$$

where we have used the fact that $D_{i0}$ is proportional to $\kappa_i$. We conclude that, while $D_+$ is linear in $c_i$ and $D_{i0}$, the idempotency correction $\delta D$ to $\bar D$ is linear in $c_i$ but quadratic in $D_{i0}$. The conclusions to be derived from this analysis are summarized in Table II.

TABLE II. Comparison of the properties of the unpurified density $\bar D$ and the purified density $\tilde D$.

| Property | $\bar D$ | $\tilde D$ |
|---|---|---|
| Differences | $\bar D - D_0 = \mathcal{O}(c\kappa)$ | $\tilde D - \bar D = \mathcal{O}(c\kappa^{2})$ |
| Idempotency error | $\bar D S\bar D - \bar D = \mathcal{O}(c\kappa^{2})$ | $\tilde D S\tilde D - \tilde D = \mathcal{O}(c^{2}\kappa^{4})$ |
| Trace error | $\mathrm{Tr}(\bar D S) - N/2 = 0$ | $\mathrm{Tr}(\tilde D S) - N/2 = \mathcal{O}(c^{2}\kappa^{4})$ |

C. Construction of the DSM energy function

Having established a useful parametrization of the averaged density matrix in Eq. (52) and having considered its purification in Eq. (54), let us now consider how to determine the best set of coefficients $c_i$. Expanding the energy for the purified averaged density matrix in Eq. (56) around the reference density matrix $D_0$, we obtain to second order

$$E(\tilde D) = E(D_0) + (D_+ + \delta D)^{T}E^{(1)}_0 + \tfrac{1}{2}(D_+ + \delta D)^{T}E^{(2)}_0(D_+ + \delta D). \tag{64}$$

To evaluate the terms containing $E^{(1)}_0$ and $E^{(2)}_0$, we make the identifications

$$E^{(1)}_0 = 2F_0, \tag{65}$$

$$E^{(2)}_0 D_+ = 2F_+ + \mathcal{O}(D_+^{2}), \tag{66}$$

which follow from Eq. (6) and from the second-order Taylor expansion of $E^{(1)}$ about $D_0$, and where we have generalized the notation in Eq. (53a) to the Kohn–Sham matrix, $F_+ = \sum_{i=1}^{n} c_i F_{i0}$. Ignoring the terms quadratic in $\delta D$ in Eq. (64) and quadratic in $D_+$ in Eq. (66), we then obtain for the DSM energy

$$E_{\mathrm{DSM}}(c) = E(D_0) + 2\,\mathrm{Tr}(D_+F_0) + \mathrm{Tr}(D_+F_+) + 2\,\mathrm{Tr}(\delta D\,F_0) + 2\,\mathrm{Tr}(\delta D\,F_+). \tag{67}$$

Finally, for a more compact notation, we introduce the weighted Kohn–Sham matrix

$$\bar F = F_0 + F_+ = F_0 + \sum_{i=1}^{n} c_i F_{i0}, \tag{68}$$

and find that the DSM energy may be written in the form

$$E_{\mathrm{DSM}}(c) = E(\bar D) + 2\,\mathrm{Tr}(\delta D\,\bar F), \tag{69}$$

where the first term is quadratic in the expansion coefficients $c_i$,

$$E(\bar D) = E(D_0) + 2\,\mathrm{Tr}(D_+F_0) + \mathrm{Tr}(D_+F_+), \tag{70}$$

and the second, idempotency-correction term is quartic in these coefficients:

$$2\,\mathrm{Tr}(\delta D\,\bar F) = \mathrm{Tr}\left[\left(6\bar D S\bar D - 4\bar D S\bar D S\bar D - 2\bar D\right)\bar F\right]. \tag{71}$$

The derivatives of $E_{\mathrm{DSM}}(c)$ are straightforwardly obtained by inserting the expansions of $\bar F$ and $\bar D$ in the independent parameter representation.
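Eqs. (68)–(71) translate directly into code once the densities and Kohn–Sham matrices of previous iterations are stored. The sketch uses a hypothetical Hartree–Fock-like toy energy in an orthonormal basis: $E(D) = 2\,\mathrm{Tr}(hD) + \mathrm{Tr}[D\,G(D)]$, with a made-up linear, symmetric two-electron map $G$, so that $F(D) = h + G(D)$. For such a quadratic energy, Eq. (66) is exact, and the only term dropped from Eq. (64) is $\mathrm{Tr}[\delta D\,G(\delta D)]$. That gives a sharp check of the implementation. A real Kohn–Sham functional is not quadratic, so this is an illustration, not a validation of the DFT case.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_occ = 6, 2


def sym(A):
    return 0.5 * (A + A.T)


# Hypothetical toy model: G(D) = sum_k alpha_k Tr(M_k D) M_k is linear and symmetric.
h = sym(rng.standard_normal((n, n)))
Ms = [sym(rng.standard_normal((n, n))) for _ in range(3)]
alphas = [0.5, 0.3, 0.2]


def G(D):
    return sum(a * np.trace(M @ D) * M for a, M in zip(alphas, Ms))


def energy(D):
    return 2 * np.trace(h @ D) + np.trace(D @ G(D))


def fock(D):
    return h + G(D)                                                   # E^(1) = 2F, Eq. (65)


def dsm_energy(c, Ds, Fs, E0):
    """E_DSM(c) of Eqs. (68)-(71) with S = I; also returns E(D_bar), D_bar, and deltaD."""
    D0, F0 = Ds[0], Fs[0]
    D_plus = sum(ci * (Di - D0) for ci, Di in zip(c, Ds[1:]))
    F_plus = sum(ci * (Fi - F0) for ci, Fi in zip(c, Fs[1:]))
    D_bar, F_bar = D0 + D_plus, F0 + F_plus                           # Eqs. (52), (68)
    E_bar = E0 + 2 * np.trace(D_plus @ F0) + np.trace(D_plus @ F_plus)  # Eq. (70)
    dD = 3 * D_bar @ D_bar - 2 * D_bar @ D_bar @ D_bar - D_bar        # Eqs. (54)-(55)
    return E_bar + 2 * np.trace(dD @ F_bar), E_bar, D_bar, dD         # Eq. (69)


def nearby_density(D0, scale=0.05):
    A = scale * rng.standard_normal((n, n))
    K = A - A.T
    U = np.linalg.solve(np.eye(n) - K / 2, np.eye(n) + K / 2)         # orthogonal
    return U.T @ D0 @ U


Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D0 = Q[:, :n_occ] @ Q[:, :n_occ].T
Ds = [D0, nearby_density(D0), nearby_density(D0)]
Fs = [fock(D) for D in Ds]
E_dsm, E_bar, D_bar, dD = dsm_energy([0.3, -0.1], Ds, Fs, energy(D0))
D_tilde = D_bar + dD
# For this quadratic model the only term dropped from Eq. (64) is Tr(dD G(dD)).
print(energy(D_tilde) - E_dsm, np.trace(dD @ G(dD)))
```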


TABLE III. Convergence details for the TRDSM steps in the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.

| Iteration | $\lVert\delta D\rVert_S^{2}$ | $\lVert D_+\rVert_S^{2}$ | $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$ | $\Delta E_{\mathrm{DSM}}$ |
|---|---|---|---|---|
| 3 | 1.612 753 | 6.129 310 | −48.255 717 | −49.742 656 |
| 4 | 1.488 082 | 12.140 844 | −105.996 850 | −111.554 301 |
| 5 | 0.206 716 | 1.594 214 | −43.136 482 | −41.110 879 |
| 6 | 1.504 099 | 3.162 679 | −26.390 457 | −26.511 025 |
| 7 | 0.096 714 | 1.468 925 | −14.755 377 | −14.499 582 |
| 8 | 0.110 282 | 1.525 848 | −7.711 220 | −7.278 600 |
| 9 | 0.086 759 | 1.569 113 | −5.289 340 | −5.165 696 |
| 10 | 0.423 825 | 1.614 867 | −2.684 359 | −3.500 173 |
| 11 | 0.196 628 | 1.002 744 | −1.053 899 | −1.126 867 |
| 12 | 0.111 409 | 0.867 238 | −1.054 903 | −0.936 180 |
| 13 | 0.093 520 | 0.729 574 | −0.658 907 | −0.621 180 |
| 14 | 0.054 596 | 0.324 338 | −0.293 889 | −0.238 992 |
| 15 | 0.045 721 | 0.201 434 | −0.213 251 | −0.170 060 |
| 16 | 0.026 474 | 0.242 928 | −0.104 012 | −0.096 482 |
| 17 | 0.011 746 | 0.071 203 | −0.100 694 | −0.093 602 |
| 18 | 0.001 512 | 0.022 758 | −0.043 180 | −0.042 748 |
| 19 | 0.000 687 | 0.040 675 | −0.057 441 | −0.056 819 |
| 20 | 0.000 122 | 0.011 897 | −0.016 501 | −0.016 416 |
| 21 | 0.000 025 | 0.001 164 | −0.001 471 | −0.001 453 |
| 22 | 0.000 001 | 0.000 308 | −0.000 428 | −0.000 427 |
| 23 | 0.000 000 | 0.000 050 | −0.000 076 | −0.000 076 |
| 24 | 0.000 000 | 0.000 009 | −0.000 012 | −0.000 012 |
| 25 | 0.000 000 | 0.000 000 | −0.000 000 | −0.000 000 |

D. Optimization of the DSM energy

The energy function $E_{\mathrm{DSM}}(c)$ in Eq. (69) provides an excellent approximation to the exact Kohn–Sham energy $E_{\mathrm{KS}}(c)$ about $D_0$, with an error cubic in $D_+$. It can be optimized by the trust-region method, as described in Ref. 6, yielding an improved density matrix $\tilde D$, from which the Kohn–Sham matrix of the next TRRH iteration is constructed. However, to avoid the expensive calculation of the Kohn–Sham matrix from $\tilde D$, our TRDSM implementation instead uses the averaged Kohn–Sham matrix $\bar F$ of Eq. (68).

As in the TRRH step in Sec. III A, the averaged density matrix $\bar D$ may also be determined by a line search. Here, the line search is made in the direction defined by the first step of the TRDSM algorithm, that is, the step at the expansion point $D_0$. As in the TRRH step, such a line search is guaranteed to reduce the Kohn–Sham energy. We denote this line search algorithm TRDSM-LS.

In the DSM scheme, we assume that the idempotency correction $\delta D = \tilde D - \bar D$ is small relative to $D_+ = \bar D - D_0$, both when discarding the terms quadratic in $\delta D$ in Eq. (64) and when constructing the Kohn–Sham matrix from $\bar D$ rather than from $\tilde D$ in the subsequent Roothaan–Hall iteration. As is seen from Eq. (63), this assumption holds if the old density matrices $D_i$ are similar to $D_0$. Formally, therefore, we should include in the TRDSM only density matrices that are similar to $D_0$. In particular, if the orbital occupations change in the course of the Roothaan–Hall iterations, we should discard all density matrices that represent the old occupations.

To demonstrate the validity of the assumption that $\delta D$ is small compared with $D_+$, we have listed in Table III the quantities $\lVert\delta D\rVert_S^{2} = \mathrm{Tr}(\delta D\,S\,\delta D\,S)$ and $\lVert D_+\rVert_S^{2} = \mathrm{Tr}(D_+SD_+S)$ at each iteration of the zinc complex calculation of Sec. VII. From Fig. 3, where the ratio $\lVert\delta D\rVert_S^{2}/\lVert D_+\rVert_S^{2}$ is plotted, we see that, apart from iteration 6, this ratio is always smaller than 0.3 and that it rapidly converges to zero in the local region. The neglect of the terms that are quadratic in $\delta D$ in the TRDSM method is thus well justified. In Table III, we have also listed the model energy change $\Delta E_{\mathrm{DSM}}$ and the actual energy change $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$. The latter is obtained as the difference between the Kohn–Sham energies calculated from the idempotent $\bar D$ obtained as in Eq. (38) and from $D_0$: $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}} = E_{\mathrm{KS}}(\bar D^{0}_{\mathrm{idem}}) - E_{\mathrm{KS}}(D_0)$. Clearly, $E_{\mathrm{DSM}}(c)$ is an extremely good representation of $E_{\mathrm{KS}}(c)$ for the step sizes taken by the TRDSM algorithm. This is as expected, since $E_{\mathrm{DSM}}(c)$ and $E_{\mathrm{KS}}(c)$ differ in terms that are cubic in $D_+$.

FIG. 3. The ratio between the norms of the idempotency correction to the density, $\lVert\delta D\rVert_S^{2} = \lVert\tilde D - \bar D\rVert_S^{2}$, and the density change, $\lVert D_+\rVert_S^{2} = \lVert\bar D - D_0\rVert_S^{2}$, in the TRDSM steps of the zinc complex calculation shown in Fig. 5.

E. Comparison of the DSM and EDIIS energies

Neglecting the idempotency correction in the DSM energy in Eq. (69), we are left with $E(\bar D)$. In Hartree–Fock theory, this remaining term may be expressed in several equivalent ways. First, it may be written as the energy of the weighted density matrix,

$$E_{\mathrm{HF}}(\bar D) = 2\,\mathrm{Tr}(h\bar D) + \mathrm{Tr}\left(\bar D\,G(\bar D)\right), \tag{72}$$

where the weighted density matrix is defined as (note the difference from Eq. (49))

$$\bar D = \sum_{i=0}^{n} d_i D_i, \qquad \sum_{i=0}^{n} d_i = 1. \tag{73}$$

In their development of the EDIIS method, Kudin et al. (Ref. 4) suggested the alternative form

$$E_{\mathrm{EDIIS}}(\bar D) = \sum_{i=0}^{n} d_i E_{\mathrm{SCF}}(D_i) - \frac{1}{2}\sum_{i,j=0}^{n} d_i d_j\,\mathrm{Tr}\left(F_{ij}D_{ij}\right), \tag{74}$$

with $F_{ij} = F_i - F_j$ and $D_{ij} = D_i - D_j$, and where $E_{\mathrm{SCF}}(D)$ may be the Hartree–Fock energy or the Kohn–Sham energy. In Hartree–Fock theory, Eqs. (70), (72), and (74) are equivalent, since the Fock matrix is linear in the density matrix. By contrast, in DFT, where the Kohn–Sham matrix contains terms that are nonlinear in the density matrix, these expressions are not equivalent. Below, we discuss some of the consequences of their nonequivalence in DFT.
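The Hartree–Fock equivalence of Eqs. (72) and (74) can be checked with a toy model in which the Fock matrix is linear in the density. The model below is hypothetical and uses an orthonormal basis: $F(D) = h + G(D)$ with a made-up linear, symmetric map $G$. With weights that sum to one, as in Eq. (73), the EDIIS form reproduces the energy of the weighted density exactly. The three densities need not be close to each other.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_occ = 5, 2


def sym(A):
    return 0.5 * (A + A.T)


# Hypothetical Hartree-Fock-like model: F(D) = h + G(D) is linear in D.
h = sym(rng.standard_normal((n, n)))
Ms = [sym(rng.standard_normal((n, n))) for _ in range(3)]
alphas = [0.5, 0.3, 0.2]


def G(D):
    return sum(a * np.trace(M @ D) * M for a, M in zip(alphas, Ms))


def energy(D):
    return 2 * np.trace(h @ D) + np.trace(D @ G(D))   # Eq. (72)


def fock(D):
    return h + G(D)


def ediis_energy(d, Ds):
    """EDIIS energy of Eq. (74) with F_ij = F_i - F_j and D_ij = D_i - D_j; sum(d) must be 1."""
    Fs = [fock(D) for D in Ds]
    E = sum(di * energy(Di) for di, Di in zip(d, Ds))
    for i in range(len(Ds)):
        for j in range(len(Ds)):
            E -= 0.5 * d[i] * d[j] * np.trace((Fs[i] - Fs[j]) @ (Ds[i] - Ds[j]))
    return E


def random_density():
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q[:, :n_occ] @ Q[:, :n_occ].T


Ds = [random_density() for _ in range(3)]
d = [0.5, 0.3, 0.2]                                   # weights of Eq. (73)
D_bar = sum(di * Di for di, Di in zip(d, Ds))
print(ediis_energy(d, Ds), energy(D_bar))             # equal (up to rounding) for a linear Fock matrix
```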


Eliminating $d_0 = 1 - \sum_{i=1}^{n} d_i$ from Eq. (74), we may express the EDIIS energy in the independent representation of Eqs. (52) and (53),

$$E_{\mathrm{EDIIS}}(\bar D) = E_{\mathrm{SCF}}(D_0) + \sum_{i=1}^{n} d_i\left[E_{\mathrm{SCF}}(D_i) - E_{\mathrm{SCF}}(D_0)\right] - \sum_{i=1}^{n} d_i\,\mathrm{Tr}\left(F_{i0}D_{i0}\right) + \sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\left(F_{j0}D_{j0}\right) - \frac{1}{2}\sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\left(F_{ij}D_{ij}\right). \tag{75}$$

Comparing this expression with $E(\bar D)$ of Eq. (70), we find that they have the same values at the expansion point $D_0$, but that their first derivatives there differ, since

$$\frac{\partial E(\bar D)}{\partial c_k} = 2\,\mathrm{Tr}\left(F_0D_{k0}\right), \tag{76a}$$

$$\frac{\partial E_{\mathrm{EDIIS}}(\bar D)}{\partial d_k} = E_{\mathrm{SCF}}(D_k) - E_{\mathrm{SCF}}(D_0) - \mathrm{Tr}\left(F_{k0}D_{k0}\right). \tag{76b}$$

In Hartree–Fock theory, it is easy to see that Eqs. (76a) and (76b) are identical.
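The difference between Eqs. (76a) and (76b) can be made concrete with the same kind of hypothetical toy model, extended by a made-up quartic term $\beta\sum_{\mu\nu}D_{\mu\nu}^{4}$ that makes the Fock matrix nonlinear in $D$, as in DFT. The Fock matrix is defined through Eq. (65), $F = \tfrac{1}{2}\,\partial E/\partial D$. For $\beta = 0$ the two expressions agree. For $\beta \neq 0$ a short expansion shows that they differ by $-\beta\sum_{\mu\nu}(D_{k0})^{3}_{\mu\nu}(D_0 + D_k)_{\mu\nu}$, which is in general nonzero.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_occ = 5, 2


def sym(A):
    return 0.5 * (A + A.T)


h = sym(rng.standard_normal((n, n)))
Ms = [sym(rng.standard_normal((n, n))) for _ in range(3)]
alphas = [0.5, 0.3, 0.2]


def G(D):
    return sum(a * np.trace(M @ D) * M for a, M in zip(alphas, Ms))


def energy(D, beta):
    # Hypothetical toy: Hartree-Fock-like part plus a quartic "DFT-like" term beta * sum(D_ij^4).
    return 2 * np.trace(h @ D) + np.trace(D @ G(D)) + beta * np.sum(D**4)


def fock(D, beta):
    # F = (1/2) dE/dD, cf. Eq. (65); the quartic term gives 2 * beta * D**3 (elementwise).
    return h + G(D) + 2 * beta * D**3


def grad_76a(D0, Dk, beta):
    return 2 * np.trace(fock(D0, beta) @ (Dk - D0))


def grad_76b(D0, Dk, beta):
    Fk0 = fock(Dk, beta) - fock(D0, beta)
    return energy(Dk, beta) - energy(D0, beta) - np.trace(Fk0 @ (Dk - D0))


def random_density():
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q[:, :n_occ] @ Q[:, :n_occ].T


D0, Dk = random_density(), random_density()
for beta in (0.0, 0.1):          # beta = 0: F is linear in D (Hartree-Fock-like)
    print(beta, grad_76a(D0, Dk, beta), grad_76b(D0, Dk, beta))
```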

The DSM gradient is

$$\frac{\partial E_{\mathrm{DSM}}(c)}{\partial c_k} = \frac{\partial E(\bar D)}{\partial c_k} + 2\,\frac{\partial\,\mathrm{Tr}\left(\delta D\,\bar F\right)}{\partial c_k}. \tag{77}$$

Since $E_{\mathrm{DSM}}$ is equal to $E_{\mathrm{KS}}$ to first order, we have at the expansion point

$$\frac{\partial E_{\mathrm{DSM}}(c)}{\partial c_k} = \frac{\partial E_{\mathrm{KS}}}{\partial c_k}. \tag{78}$$

The EDIIS gradient at the expansion point is thus not equal to the KS gradient, since the last, nonzero term in Eq. (77) (the term resulting from the idempotency correction) is missing. Furthermore, in DFT the correct gradient in the DSM can be obtained only if Eq. (76a), and not Eq. (76b), is used. It is thus incorrect to use Eq. (76b) in DFT, even though Eqs. (76a) and (76b) are equivalent in Hartree–Fock theory.

V. CONFIGURATION SHIFT IN THE TRSCF ALGORITHM

Since the TRSCF method has been designed for a smooth and controlled convergence of the density matrix, it does not allow for the abrupt changes in the orbitals associated with configuration shifts. Nevertheless, it may sometimes be advantageous to allow such shifts, as illustrated in Fig. 4, where we compare two cadmium complex calculations (see Sec. VII for details). The “no-shift” optimization proceeds carefully, allowing only small changes in the density matrix at each iteration, whereas the “do-shift” optimization is more daring, accepting abrupt configuration shifts that reduce the total energy.

In Fig. 4(a), we have plotted the error in the energy at each iteration of the two optimizations. The first 13 iterations are identical; the optimizations are in the global region, and the level shift is determined from the requirement $a^{\mathrm{orb}}_{\min} = A^{\mathrm{orb}}_{\min} = 0.98$. In iteration 14, the two optimizations differ. To understand the reasons for these differences, we have plotted $a^{\mathrm{orb}}_{\min}$ in Fig. 4(b) and $\Delta E_{\mathrm{RH}}$ (full line) and $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line) in Fig. 4(c) as functions of $\mu$. At $\mu = 0.25$, there is an abrupt shift in $a^{\mathrm{orb}}_{\min}$ from 0.99 to 0.00, representing a configuration shift in which the LUMO for $\mu > 0.25$ becomes the HOMO for $\mu < 0.25$. From Fig. 4(c), we see that this shift lowers the Kohn–Sham total energy. Because of the abrupt change in $a^{\mathrm{orb}}_{\min}$ at $\mu = 0.25$, we are unable to find a level shift for which $a^{\mathrm{orb}}_{\min} = 0.98$. In the no-shift calculation, $\mu$ is chosen larger than 0.25, whereas in the do-shift calculation, the undamped Roothaan–Hall step is taken with $\mu = 0$.

FIG. 4. The TRSCF cadmium complex calculation described in Sec. VII. (a) The convergence without and with abrupt configuration shift. (b) and (c) contain details of the TRRH step in iteration 14: (b) the minimum overlap $a^{\mathrm{orb}}_{\min}$ of the new occupied orbitals with the previous set of occupied orbitals and (c) the changes in the model energy $\Delta E_{\mathrm{RH}}$ (full line) and the actual energy $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ (dashed line), all as functions of the level-shift parameter $\mu$.

As the DSM energy model assumes small changes in the<br />

density matrix, the density matrices of all previous iterations<br />

are discarded in iteration 14 of the do-shift calculation, and a<br />

rapid convergence to the optimized state is seen from that<br />

point. In the no-shift calculation, an a orb min profile similar to<br />

that of iteration 14 is obtained in the next few iterations. In<br />

these iterations, the lowest Hessian eigenvalue is −0.95 a.u.<br />

and the optimization proceeds towards a stationary point.<br />

Finally, in iteration 22, the TRSCF algorithm identifies this<br />

stationary point as a saddle point, moves out of this region,<br />

and converges rapidly to the same minimum as the do-shift<br />

optimization.<br />

Downloaded 23 Aug 2005 to 130.225.22.254. Redistribution subject to AIP license or copyright, see http://jcp.aip.org/jcp/copyright.jsp

074103-12 Thøgersen et al. J. Chem. Phys. 123, 074103 2005<br />

As this example illustrates, it is important to recognize and accept a favorable configuration shift. A configuration shift may be recognized when an $a^{\mathrm{orb}}_{\min}$ profile has an abrupt change where $a^{\mathrm{orb}}_{\min}$ is close to 1 on the right-hand side (larger $\mu$) and close to 0 on the left-hand side (smaller $\mu$). To maintain the high degree of control characteristic of the TRSCF method, the energy of the new configuration is checked before the shift is accepted, at the cost of an additional Kohn–Sham matrix build. As seen from Fig. 4a, this check is well worth the effort, saving more than ten iterations, and it is therefore an integrated part of our TRSCF implementation.<br />
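The recognition rule above can be sketched in a few lines. This is a minimal illustration, not the TRSCF implementation: it assumes that the TRRH step has already been evaluated on a sorted grid of level shifts $\mu$, and the thresholds `high` and `low` that stand in for "close to 1" and "close to 0" are hypothetical choices. The energy check of the new configuration, which costs an extra Kohn–Sham matrix build, is left to the caller.

```python
from typing import Optional, Sequence


def find_configuration_shift(
    mus: Sequence[float],
    a_min: Sequence[float],
    high: float = 0.9,  # hypothetical threshold for "close to 1"
    low: float = 0.1,   # hypothetical threshold for "close to 0"
) -> Optional[float]:
    """Return the level shift at which a_min jumps from ~0 (left) to ~1 (right).

    mus must be sorted in increasing order; a_min[k] is the minimum orbital
    overlap obtained with level shift mus[k]. Returns None if no jump is found.
    """
    for k in range(1, len(mus)):
        left, right = a_min[k - 1], a_min[k]
        if left <= low and right >= high:
            # The occupied configuration changes somewhere in (mus[k-1], mus[k]].
            return mus[k]
    return None


# Illustrative profile loosely modelled on iteration 14 in Fig. 4b (made-up values).
mus = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50]
a_min = [0.00, 0.00, 0.00, 0.00, 0.00, 0.99, 0.99, 0.995, 0.998]
print(find_configuration_shift(mus, a_min))  # 0.25
```

A grid scan is used only for clarity; in practice the profile would come from whatever $\mu$ values the TRRH step already samples.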

VI. THE DIIS METHOD VIEWED AS A QUASI-NEWTON METHOD<br />

Since its introduction by Pulay in 1980, the DIIS method<br />

has been extensively and successfully used to accelerate the<br />

convergence of SCF optimizations. We here present a rederivation<br />

of the DIIS method to demonstrate that, in the iterative<br />

subspace of density matrices, it is equivalent to a quasi-<br />

Newton method. From this observation, we conclude that, in<br />

the local region of the SCF optimization, the DIIS steps can be used safely and will lead to fast convergence. We also discuss the convergence of the DIIS algorithm in the global region, which is much less predictable.<br />

We assume that, in the course of the SCF optimization, we have determined a set of $n+1$ AO density matrices $D_0, D_1, D_2, \ldots, D_n$ and the associated Kohn–Sham or Fock matrices $F(D_0), F(D_1), F(D_2), \ldots, F(D_n)$. Since, following Ref. 11, the electronic gradient $g(D)$ is given by<br />

$$g(D) = 4\left[ S D F(D) - F(D) D S \right], \tag{79}$$<br />

we also have available the corresponding gradients $g(D_0), g(D_1), g(D_2), \ldots, g(D_n)$. We now wish to determine a corrected density matrix,<br />

$$\bar{D} = D_0 + \sum_{i=1}^{n} c_i D_{i0}, \qquad D_{i0} = D_i - D_0, \tag{80}$$<br />

that minimizes the norm of the gradient $g(\bar{D})$. For this purpose, we follow Ref. 11 and parameterize the density matrix in terms of an antisymmetric matrix $X = -X^{T}$ and the current density matrix $D_0$ as<br />

$$D(X) = \exp(-XS)\, D_0 \exp(SX). \tag{81}$$<br />

With each old density matrix $D_i$, we now associate an antisymmetric matrix $X_i$ such that<br />

$$D_i = \exp(-X_i S)\, D_0 \exp(S X_i) = D_0 + [D_0, X_i]_S + O(X_i^2), \tag{82}$$<br />

where $[A, B]_S = ASB - BSA$ denotes the S-commutator.<br />

Introducing the averaged antisymmetric matrix<br />

$$\bar{X} = \sum_{i=1}^{n} c_i X_i, \tag{83}$$<br />

we obtain<br />

$$D(\bar{X}) = D_0 + \sum_{i=1}^{n} c_i [D_0, X_i]_S + O(\bar{X}^2), \tag{84}$$<br />

where we have used the S-commutator expansion of $D(\bar{X})$ analogous to Eq. (82). Our task is hence to determine $\bar{X}$ in Eq. (83) such that $D(\bar{X})$ minimizes the gradient norm $\|g(D(\bar{X}))\|$. In passing, we note that, whereas $\bar{D}$ is not in general idempotent and therefore not a valid density matrix, $D(\bar{X})$ is a valid, idempotent density matrix for all choices of $c_i$.<br />
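The two ingredients of this derivation, the gradient of Eq. (79) and the exponential parametrization of Eq. (81), can be checked numerically on toy matrices. The sketch below is only an illustration under stated assumptions: the 6×6 overlap and Fock matrices are made up (not data from this work), the matrix exponential is a hand-written scaling-and-squaring Taylor series, and the S-commutator is taken as $[A,B]_S = ASB - BSA$, the form implied by the first-order expansion in Eq. (82). It shows that $g(D_0)$ vanishes at an aufbau density, that $D(\bar{X})$ stays idempotent ($DSD = D$) while the linear combination $\bar{D}$ of Eq. (80) does not, and that the commutator expansion has an $O(X^2)$ error.

```python
import numpy as np


def expm(A, terms=30):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    k = int(np.ceil(np.log2(norm))) + 1 if norm > 1.0 else 0
    A = A / 2.0 ** k
    result = term = np.eye(A.shape[0])
    for j in range(1, terms):
        term = term @ A / j
        result = result + term
    for _ in range(k):
        result = result @ result
    return result


def gradient(S, D, F):
    """Electronic gradient of Eq. (79): g(D) = 4 (S D F - F D S)."""
    return 4.0 * (S @ D @ F - F @ D @ S)


def density(X, S, D0):
    """Exponential parametrization of Eq. (81): D(X) = exp(-XS) D0 exp(SX)."""
    return expm(-X @ S) @ D0 @ expm(S @ X)


def s_commutator(A, B, S):
    """S-commutator [A, B]_S = A S B - B S A, as in the expansion of Eq. (82)."""
    return A @ S @ B - B @ S @ A


def idempotency_error(D, S):
    """Frobenius norm of D S D - D; zero for a valid density matrix."""
    return np.linalg.norm(D @ S @ D - D)


# Toy matrices (assumed, not from this work): 6 basis functions, 2 occupied MOs.
n, nocc = 6, 2
S = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))
F = np.diag([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]) + 0.05 * (np.ones((n, n)) - np.eye(n))

s, U = np.linalg.eigh(S)
S_inv_half = U @ np.diag(s ** -0.5) @ U.T
eps, Cp = np.linalg.eigh(S_inv_half @ F @ S_inv_half)
C = S_inv_half @ Cp                            # solves F C = S C eps
D0 = C[:, :nocc] @ C[:, :nocc].T               # aufbau density, so g(D0) = 0

rng = np.random.default_rng(1)
Xs = [0.1 * (M - M.T) for M in rng.standard_normal((2, n, n))]  # antisymmetric X_i
c = [0.5, 0.5]
D_bar = D0 + sum(ci * (density(Xi, S, D0) - D0) for ci, Xi in zip(c, Xs))  # Eq. (80)
X_bar = sum(ci * Xi for ci, Xi in zip(c, Xs))                               # Eq. (83)
D_X_bar = density(X_bar, S, D0)

print("||g(D0)||          :", np.linalg.norm(gradient(S, D0, F)))
print("idempotency, D_bar :", idempotency_error(D_bar, S))
print("idempotency, D(X)  :", idempotency_error(D_X_bar, S))

# The first-order expansion in Eq. (82) has an O(X^2) error:
# shrinking X tenfold should shrink the error roughly a hundredfold.
errs = [np.linalg.norm(density(t * Xs[0], S, D0) - D0 - s_commutator(D0, t * Xs[0], S))
        for t in (1.0, 0.1)]
print("error ratio        :", errs[0] / errs[1])
```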

Expanding the gradient in Eq. (79) about the current density matrix $D_0$, we obtain<br />

$$g(D(\bar{X})) = g(D_0) + H(D_0)\,\bar{X} + O(\bar{X}^2), \tag{85}$$<br />

where $H(D)$ is the Jacobian matrix. Neglecting the higher-order terms, our task is therefore to minimize the norm of the gradient,<br />

$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i\, H(D_0) X_i, \tag{86}$$<br />

with respect to the elements of $c$. For an estimate of $H(D_0) X_i$, we truncate the expansion<br />

$$g(D_i) = g(D_0) + H(D_0) X_i + O(X_i^2) \tag{87}$$<br />

and obtain the quasi-Newton condition,<br />

$$g(D_i) - g(D_0) = H(D_0) X_i. \tag{88}$$<br />

Inserting this condition into Eq. (86), we obtain<br />

$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i \left[ g(D_i) - g(D_0) \right] = \sum_{i=0}^{n} c_i\, g(D_i), \tag{89}$$<br />

where we have introduced the parameter $c_0 = 1 - \sum_{i=1}^{n} c_i$. The minimization of $\|g(c)\|$ may therefore be carried out as a least-squares minimization of $g(c)$ in Eq. (89) subject to the constraint<br />

$$\sum_{i=0}^{n} c_i = 1. \tag{90}$$<br />

If we consider $g(D_i)$ as an error vector for the density matrix $D_i$, this procedure becomes identical to the DIIS method. From Eq. (86), we also see that DIIS may be viewed as a minimization of the residual of the Newton equation in the subspace of the density-matrix differences $D_i - D_0$, $i = 1, \ldots, n$, where the quasi-Newton condition is used to set up the subspace equations. Since the quasi-Newton steps are reliable only in the local region of the optimization, where the electronic Hessian is positive definite, we conclude that the DIIS method can be used safely only in this region.<br />
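The constrained least-squares problem of Eqs. (89) and (90) is usually solved with a Lagrange multiplier, which leads to a small bordered linear system. The sketch below uses that textbook formulation with flattened error vectors. The toy quadratic model (a fixed symmetric positive definite matrix standing in for $H(D_0)$, random step vectors $x_i$) is an assumption chosen so that the quasi-Newton condition of Eq. (88) holds exactly; in that setting, the DIIS coefficients $c_1, \ldots, c_n$ coincide with a direct least-squares minimization of Eq. (86), which is the equivalence derived above.

```python
import numpy as np


def diis_coefficients(errors):
    """Minimize || sum_i c_i e_i || subject to sum_i c_i = 1 (Eqs. 89-90).

    errors -- sequence of error vectors/matrices, here the gradients g(D_i).
    Setting the derivative of the Lagrangian to zero gives the bordered
    system [[B, -1], [-1^T, 0]] [c, lam] = [0, -1] with B_ij = <e_i, e_j>.
    """
    m = len(errors)
    A = np.zeros((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            A[i, j] = np.vdot(errors[i], errors[j])  # Frobenius inner product
    A[:m, m] = -1.0
    A[m, :m] = -1.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    return np.linalg.solve(A, rhs)[:m]


# Toy quadratic model (assumed): g(x) = g0 + H x, so Eq. (88) holds exactly.
rng = np.random.default_rng(2)
dim, n = 8, 3
M = rng.standard_normal((dim, dim))
H = M @ M.T + dim * np.eye(dim)                # SPD stand-in for H(D0)
g0 = rng.standard_normal(dim)
steps = [0.1 * rng.standard_normal(dim) for _ in range(n)]  # x_1, ..., x_n
grads = [g0] + [g0 + H @ x for x in steps]                   # g(D_0), ..., g(D_n)

c = diis_coefficients(grads)                   # c_0, ..., c_n

# Direct least-squares minimization of Eq. (86) over c_1, ..., c_n.
HX = np.column_stack([H @ x for x in steps])
c_direct, *_ = np.linalg.lstsq(HX, -g0, rcond=None)

print(c.sum())                                 # 1 up to rounding
print(np.allclose(c[1:], c_direct))            # True
```

Outside such a quadratic model, Eq. (88) is only approximate, which is why the equivalence, and with it the reliability of DIIS, is restricted to the local region.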

In the DIIS method, the optimal combination of the density matrices is obtained by carrying out a least-squares minimization of the gradient norm subject to the constraint in Eq. (90). However, since a small gradient norm in the global region does not necessarily imply a low Kohn–Sham energy, the DIIS convergence may be unpredictable. Furthermore, we may encounter regions where the gradient norms are similar but the energies differ. The DIIS method may then diverge, not being able to identify the density matrix of lowest energy, as illustrated in Sec. VII.<br />

VII. APPLICATIONS<br />

In this section, we give numerical examples to illustrate the convergence characteristics of the Kohn–Sham TRSCF calculations, comparing with the DIIS and QC-SCF calculations. Comparisons are also made with the TRSCF-LS technique, where the TRRH-LS and TRDSM-LS line-search methods of Secs. III A and IV D are combined to set up an expensive but highly robust method, in which the lowest Kohn–Sham energy is identified by a line search at each step. In Sec. VII A, we discuss the calculations on the zinc complex in Fig. 5, where Zn²⁺ is complexed with ethylenediamine-N,N′-disuccinic acid (EDDS), an isomer of ethylenediamine tetra-acetic acid (EDTA). Next, in Sec. VII B, we consider the calculations on five different systems. All calculations have been carried out with a local version of the DALTON program package (Ref. 20). Unless otherwise indicated, the starting orbitals have been obtained by diagonalization of the one-electron Hamiltonian.<br />

FIG. 5. The convergence of different algorithms in an LDA/6-31G computation with core Hamiltonian start guess for the zinc complex depicted in the lower left corner. The algorithms are QC-SCF, DIIS, TRSCF, and TRSCF-LS.<br />

A. Calculations on the zinc complex<br />

In Fig. 5, we have plotted the error in the Kohn–Sham<br />

energy at each iteration of LDA/6-31G calculations on the<br />

zinc complex. The standard TRSCF method performs<br />

almost as well as the very smooth but much more expensive<br />

TRSCF-LS method, giving a somewhat higher energy<br />

between iterations 13 and 22. By contrast, the DIIS method<br />

shows no sign of converging; after 100 iterations, the<br />

Kohn–Sham gradient norm is still about 20. Whereas the<br />

smooth TRSCF convergence arises because Hessian information<br />

is used to ensure downhill TRRH and TRDSM steps<br />

at each iteration, no such information is employed in the<br />

DIIS method. Finally, the QC-SCF method converges<br />

but exceedingly slowly; even after 90 iterations it has not<br />

reached the quadratically convergent local region! The difficulties<br />

experienced with the QC-SCF method illustrate<br />

clearly that the use of Hessian information by itself is no<br />

guarantee of fast convergence.<br />

More details about the TRSCF zinc complex calculation<br />

are given in Tables I–V and in Figs. 1–3 and 6, partly discussed<br />

in Secs. III F and IV D. In Table IV, we have listed the changes in the Kohn–Sham energy generated separately in the TRRH ($\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$) and TRDSM ($\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$) steps at each SCF iteration, and likewise the norms of the changes in the<br />

TABLE IV. Convergence details for the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.<br />

Columns: iteration, $\Delta E_{\mathrm{KS}}$, $\Delta E^{\mathrm{DSM}}_{\mathrm{KS}}$, $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$, $\|\bar{D}_n - D_n\|_S^2$ (DSM step), $\|D_{n+1} - \bar{D}_n\|_S^2$ (RH step).<br />

2 −8.366 865 0.000 000 −8.366 865 0.000 000 0.197 607<br />

3 −68.378 567 −48.255 717 −20.122 850 6.129 310 1.141 536<br />

4 −137.038 420 −105.996 850 −31.041 569 12.140 844 1.265 250<br />

5 −70.415 468 −43.136 482 −27.278 985 1.594 214 1.031 844<br />

6 −41.492 416 −26.390 457 −15.101 958 3.162 679 1.467 802<br />

7 −25.430 533 −14.755 377 −10.675 155 1.468 925 1.364 944<br />

8 −14.460 409 −7.711 220 −6.749 189 1.525 848 1.249 827<br />

9 −8.470 594 −5.289 340 −3.181 254 1.569 113 1.040 337<br />

10 −2.289 664 −2.684 359 0.394 694 1.614 867 0.817 844<br />

11 −2.730 543 −1.053 899 −1.676 644 1.002 744 1.060 298<br />

12 −2.798 537 −1.054 903 −1.743 634 0.867 238 0.632 009<br />

13 −1.061 335 −0.658 907 −0.402 427 0.729 574 0.410 434<br />

14 −0.670 565 −0.293 889 −0.376 675 0.324 338 0.351 715<br />

15 −0.424 253 −0.213 251 −0.211 002 0.201 434 0.203 170<br />

16 −0.074 945 −0.104 012 0.029 066 0.242 928 0.302 723<br />

17 −0.090 241 −0.100 694 0.010 452 0.071 203 0.175 917<br />

18 0.000 195 −0.043 180 0.043 376 0.022 758 0.126 709<br />

19 −0.044 797 −0.057 441 0.012 644 0.047 885 0.032 787<br />

20 −0.015 396 −0.016 501 0.001 104 0.011 897 0.002 976<br />

21 −0.001 118 −0.001 471 0.000 352 0.001 164 0.000 668<br />

22 −0.000 368 −0.000 428 0.000 059 0.000 308 0.000 111<br />

23 −0.000 066 −0.000 076 0.000 010 0.000 050 0.000 019<br />

24 −0.000 011 −0.000 012 0.000 000 0.000 009 0.000 001<br />

25 −0.000 000 −0.000 000 0.000 000 0.000 000 0.000 000<br />


TABLE V. The density of each iteration compared to the optimized one.<br />

Columns: iteration, $\|D_{\mathrm{conv}} - D_n\|_S^2$, $a^{\mathrm{orb}}_{\min}(\mathrm{conv}, n)$.<br />

2 66.952 673 0.0965<br />

3 65.174 713 0.0955<br />

4 56.502 973 0.0927<br />

5 51.210 143 0.1017<br />

6 48.482 773 0.1411<br />

7 42.682 641 0.1394<br />

8 35.617 332 0.1992<br />

9 26.551 913 0.3183<br />

10 18.298 431 0.4094<br />

11 14.152 342 0.4983<br />

12 9.767 169 0.6927<br />

13 6.184 621 0.6859<br />

14 3.844 299 0.9187<br />

15 2.240 436 0.9194<br />

16 1.018 810 0.9771<br />

17 0.200 374 0.9952<br />

18 0.064 181 0.9984<br />

19 0.043 906 0.9967<br />

20 0.011 531 0.9996<br />

21 0.001 092 0.9999<br />

22 0.000 309 0.9999<br />

23 0.000 053 0.9999<br />

24 0.000 009 0.9999<br />

25 0.000 000 0.9999<br />

density matrix in the TRRH ($\|D_{n+1} - \bar{D}_n\|_S^2$) and TRDSM ($\|\bar{D}_n - D_n\|_S^2$) steps. Remarkably, the TRDSM step consistently reduces the energy more than the TRRH step. Indeed, after iteration 15, each TRRH step increases rather than decreases the energy. Apparently, in the local region, the role of the TRRH step is reduced to that of improving the variational space of the subsequent TRDSM step. From the table, we also see that the largest changes in the density matrix are generated by the TRDSM step rather than by the TRRH step.<br />

For the TRRH and TRDSM steps, we have at each iteration determined the overlap $a^{\mathrm{orb}}_i$, defined in Eq. (40), of each newly generated occupied orbital with the previous occupied orbitals. In Fig. 6, the number of orbitals at each iteration with $a^{\mathrm{orb}}_i < 0.98$ (i.e., with large rotations) is illustrated in a bar chart. As we require $a^{\mathrm{orb}}_i \geq 0.98$ in the Roothaan–Hall steps, the TRRH bars simply count the orbitals whose overlap lies at this limit of 0.98. In the TRDSM step, however, no such restriction is imposed, and a large number of orbitals with $a^{\mathrm{orb}}_i < 0.98$ are observed. Indeed, in the first few DSM steps, overlaps as small as 0.76 occur, leading to far larger changes than those accepted in the Roothaan–Hall step and emphasizing the important role played by the TRDSM step in achieving orbital reorganizations in a controlled manner.<br />

FIG. 6. The number of occupied orbitals in the TRRH and TRDSM steps with an overlap less than 0.98 to the previous set of occupied orbitals for each step in the SCF iteration.<br />
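The quantity counted in Fig. 6 can be sketched as follows. Eq. (40) is not reproduced in this section, so the code assumes that $a^{\mathrm{orb}}_i$ is the squared norm of the projection of a new occupied orbital onto the space of the previous occupied orbitals, $a^{\mathrm{orb}}_i = \sum_j \langle \phi^{\mathrm{old}}_j | \phi^{\mathrm{new}}_i \rangle^2$, with MO coefficient columns that are orthonormal in the overlap metric $S$; the minimum over $i$ then plays the role of $a^{\mathrm{orb}}_{\min}$. The function names and the toy orbitals are illustrative only.

```python
import numpy as np


def orbital_overlaps(C_new, C_old, S):
    """a_i = sum_j <phi_j^old | phi_i^new>^2 for each new occupied orbital i.

    Assumed form of Eq. (40). Columns of C_new and C_old are S-orthonormal
    occupied MO coefficient vectors.
    """
    T = C_old.T @ S @ C_new        # T[j, i] = <phi_j^old | phi_i^new>
    return np.sum(T ** 2, axis=0)


def count_large_rotations(C_new, C_old, S, threshold=0.98):
    """Number of new occupied orbitals with a_i < threshold (as counted in Fig. 6)."""
    return int(np.sum(orbital_overlaps(C_new, C_old, S) < threshold))


S = np.eye(4)                      # orthonormal toy basis
C_old = np.eye(4)[:, :2]           # two occupied orbitals
theta = 0.3                        # rotate occupied orbital 2 into a virtual one
C_new = C_old.copy()
C_new[:, 1] = np.cos(theta) * np.eye(4)[:, 1] + np.sin(theta) * np.eye(4)[:, 2]

a = orbital_overlaps(C_new, C_old, S)
print(a, a.min())                                   # a_min = cos(theta)**2
print(count_large_rotations(C_new, C_old, S))       # 1

# A rotation *within* the occupied space leaves every a_i equal to 1.
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
print(count_large_rotations(C_old @ R, C_old, S))   # 0
```

Because $a^{\mathrm{orb}}_i$ depends only on the occupied space, it is insensitive to occupied-occupied rotations and measures exactly the occupied-virtual mixing that the 0.98 limit is meant to control.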

In Table V, we have listed the norm of the difference between the current density matrix $D_n$ at each iteration and the final converged density matrix $D_{\mathrm{conv}}$; we have also listed $a^{\mathrm{orb}}_{\min}(\mathrm{conv}, n)$, the smallest overlap, in the sense of Eq. (41), of the current occupied orbitals with the converged ones. Clearly, very large changes occur in the density matrix and the orbitals in the course of the optimization, in particular during the first 17 iterations; in the remaining iterations, only small adjustments are made. In spite of their size, these changes to the orbitals have been accomplished in a controlled and reliable manner.<br />

In Fig. 7, we have plotted the errors for the same LDA/6-31G optimization as in Fig. 5, but with the starting orbitals<br />

obtained from a Hückel calculation rather than from the diagonalization<br />

of the one-electron Hamiltonian. Convergence<br />

is now faster, with the TRSCF-LS and TRSCF methods<br />

behaving in the same smooth manner as before. More<br />

importantly, with this improved starting guess, the DIIS<br />

method converges in almost the same number of iterations<br />

as the TRSCF method, although less smoothly.<br />

Finally, in Fig. 8, we have the same plot as in Fig. 7, but in the STO-3G rather than the 6-31G basis (still with a Hückel guess). Somewhat surprisingly, convergence is more difficult in this smaller basis. Indeed, after 100 iterations, the DIIS method has not yet converged, with a Kohn–Sham gradient norm as large as 10. The standard TRSCF method still converges, but now in a less smooth manner than the TRSCF-LS method. As mentioned in Sec. III E 2, when the HOMO-LUMO gap is particularly small, it may sometimes be necessary to enforce a minimum TRRH level shift to achieve convergence. Indeed, in the TRSCF optimization in Fig. 8, we require $\mu \geq 0.1$ throughout the calculation.<br />
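A minimal sketch of a level-shifted Roothaan–Hall diagonalization with an enforced minimum shift is given below. It assumes the shifted matrix $F - \mu S D S$, which lowers the occupied levels relative to the virtual ones for $\mu > 0$ and reduces to the undamped step at $\mu = 0$. The helper names, the Löwdin route to the generalized eigenproblem, and the toy matrices are illustrative assumptions rather than the DALTON implementation; the bound of 0.1 mirrors the minimum quoted for the Fig. 8 calculation. For a density built from eigenvectors of $F$, the HOMO–LUMO gap widens by exactly $\mu$, which is the damping that a lower bound on $\mu$ is meant to guarantee.

```python
import numpy as np

MU_MIN = 0.1  # minimum level shift, mirroring the value quoted for Fig. 8


def loewdin(S):
    """Return S^(-1/2) for a symmetric positive definite overlap matrix."""
    s, U = np.linalg.eigh(S)
    return U @ np.diag(s ** -0.5) @ U.T


def shifted_roothaan_hall(F, S, D, mu, mu_min=MU_MIN):
    """Solve (F - mu S D S) C = S C eps with mu clamped to mu >= mu_min."""
    mu = max(mu, mu_min)
    X = loewdin(S)
    eps, Cp = np.linalg.eigh(X @ (F - mu * S @ D @ S) @ X)
    return mu, eps, X @ Cp


# Toy matrices (assumed): 6 basis functions, 2 occupied orbitals.
n, nocc = 6, 2
S = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))
F = np.diag([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]) + 0.05 * (np.ones((n, n)) - np.eye(n))

# Unshifted reference solution and its aufbau density.
_, eps0, C0 = shifted_roothaan_hall(F, S, np.zeros((n, n)), 0.0, mu_min=0.0)
D = C0[:, :nocc] @ C0[:, :nocc].T

mu, eps, _ = shifted_roothaan_hall(F, S, D, mu=0.0)  # request 0, get MU_MIN
gap0 = eps0[nocc] - eps0[nocc - 1]
gap = eps[nocc] - eps[nocc - 1]
print(mu, gap - gap0)  # mu is clamped to 0.1; the gap grows by about 0.1
```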

B. Calculations on a variety of molecules<br />

In Fig. 9, we have plotted the errors in the energy at each<br />

SCF iteration, for a variety of molecules at the LDA level of<br />

theory: the zinc complex from Fig. 5 in the 6-31G basis set; the rhodium complex from Ref. 6 in the Ahlrichs-VDZ basis (Ref. 21) with STO-3G on the rhodium atom; a cadmium complex with an imidazole ring in the STO-3G basis; the CH₃CHO molecule in the cc-pVTZ basis (Ref. 22); and the H₂O molecule in the cc-pVTZ basis.<br />

For the TRSCF-LS method, convergence is smooth for<br />

all systems, as expected. Likewise, in the TRSCF calculations<br />

with no restrictions enforced on the TRRH level-shift<br />

parameter, convergence is still good although not as smooth<br />

as in the TRSCF-LS calculations. The behavior of the DIIS method is somewhat more erratic, in particular in the global region; in the local region, it converges as well as the TRSCF method. These observations are in agreement with our discussion in Sec. VI. The DIIS zinc complex calculation does not converge, as discussed above.<br />

FIG. 7. The convergence of different algorithms in an LDA/6-31G computation with Hückel start guess for the zinc complex in Fig. 5. The algorithms are DIIS, TRSCF, and TRSCF-LS.<br />

In Fig. 9, we have also included the results from the<br />

DIIS-TRRH optimizations. These calculations differ from<br />

the DIIS calculations in that we have used a level-shift parameter<br />

in the Roothaan–Hall diagonalization step; alternatively,<br />

DIIS-TRRH may be viewed as different from TRSCF<br />

in that we have replaced the TRDSM steps by DIIS steps.<br />

Somewhat surprisingly, only the water calculation converges<br />

with the DIIS-TRRH method. To understand this behavior,<br />

we note that, in the global region, the TRRH method typically<br />

produces gradients that do not change much, even<br />

though large changes may occur in the energy. In such cases,<br />

the DIIS method may stall, not being able to identify a good<br />

combination of density matrices.<br />

This behavior is illustrated in Table VI, where we have<br />

listed the gradient norm and Kohn–Sham energy of the first<br />

six iterations of the cadmium complex calculation in Fig. 9.<br />

The TRSCF and DIIS-TRRH gradients stay almost the same<br />

during these iterations, stalling the DIIS-TRRH optimization<br />

but not the TRSCF optimization, whose energy decreases in<br />

each iteration. In the pure DIIS optimization, by contrast, the<br />

gradient changes significantly from iteration to iteration; at<br />

the same time, the energy decreases at each iteration except<br />

the fifth, where the gradient norm also increases. Eventually, DIIS enters the local region with its rapid rate of convergence, although we note, in the DIIS panel in Fig. 9, a sudden, large increase in the energy for the cadmium complex calculation in iterations 10 and 11. However, these changes are accompanied by large increases in the gradient norm, allowing DIIS to recover safely.<br />

FIG. 9. The convergence in LDA calculations for a variety of molecules using the TRSCF-LS, TRSCF, DIIS, and DIIS-TRRH approaches, respectively. The molecules are a zinc complex, a rhodium complex, a cadmium complex, CH₃CHO, and H₂O.<br />

FIG. 8. The convergence of different algorithms in an LDA/STO-3G computation with Hückel start guess for the zinc complex in Fig. 5. The algorithms are DIIS, TRSCF, and TRSCF-LS.<br />

VIII. CONCLUSIONS<br />

In this paper, the trust-region SCF (TRSCF) algorithm introduced in Ref. 6 has been further developed to make it applicable to the optimization of the Kohn–Sham energy. In the TRSCF method, both the Roothaan–Hall step and the density-subspace minimization (DSM) step are replaced by optimizations of local energy models of the Hartree–Fock/Kohn–Sham energy $E_{\mathrm{SCF}}$. These local models have the same gradient as the true energy $E_{\mathrm{SCF}}$ but an approximate Hessian. By restricting the steps of the TRSCF algorithm to the trust region of these local models, that is, to the region where the local models approximate $E_{\mathrm{SCF}}$ well, smooth and fast convergence to the optimized energy may be obtained.<br />
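The trust-region idea summarized above rests on comparing the energy change predicted by a local model with the actual change. A generic textbook version of that test (see, e.g., Ref. 7) is sketched below. The thresholds 0.25 and 0.75 and the shrink/grow factors are conventional choices assumed for illustration, not the values used in TRSCF, and the caller is assumed to supply both energy changes (for example $\Delta E^{\mathrm{RH}}$ and $\Delta E^{\mathrm{RH}}_{\mathrm{KS}}$ for a TRRH step).

```python
from typing import Tuple


def trust_region_update(actual: float, predicted: float, radius: float,
                        low: float = 0.25, high: float = 0.75,
                        shrink: float = 0.5, grow: float = 2.0) -> Tuple[bool, float, float]:
    """Accept or reject a step and update the trust radius.

    actual    -- true energy change E(new) - E(old) for the step
    predicted -- energy change predicted by the local model (must be < 0)
    Returns (accepted, ratio, new_radius).
    """
    if predicted >= 0.0:
        raise ValueError("a trust-region step must be predicted to lower the energy")
    ratio = actual / predicted       # close to 1 when the model is reliable
    accepted = ratio > 0.0           # accept only if the energy actually went down
    if ratio < low:
        radius *= shrink             # model poor: restrict the next step
    elif ratio > high:
        radius *= grow               # model good: allow a larger step
    return accepted, ratio, radius


print(trust_region_update(-0.9, -1.0, 0.5))   # (True, 0.9, 1.0)
print(trust_region_update(+0.1, -1.0, 0.5))   # (False, -0.1, 0.25)
print(trust_region_update(-0.5, -1.0, 0.5))   # (True, 0.5, 0.5)
```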


TABLE VI. The Kohn–Sham energy $E_{\mathrm{KS}}$ and the gradient norm $\|g\|$, with $g = 4(SDF - FDS)$, in the first six iterations of the cadmium complex calculations seen in Fig. 9.<br />

Columns: iteration, then $E_{\mathrm{KS}}$ and $\|g\|$ for DIIS, DIIS-TRRH, and TRSCF, respectively.<br />

1 −5597.0 7.8 −5597.0 7.8 −5597.0 7.8<br />

2 −5502.3 14.9 −5598.4 7.2 −5598.3 7.1<br />

3 −5602.1 9.7 −5600.3 8.5 −5603.7 9.3<br />

4 −5628.5 2.1 −5599.9 7.7 −5611.1 9.1<br />

5 −5627.4 3.5 −5599.9 7.8 −5616.8 7.7<br />

6 −5628.8 0.8 −5600.2 8.1 −5622.7 7.5<br />

Final status: DIIS converged; DIIS-TRRH did not converge; TRSCF converged.<br />

In the previous implementation of the TRSCF algorithm, the focus was on the optimization of the Hartree–Fock energy. As the Kohn–Sham energy is nonquadratic in the density matrix, the local DSM energy model has been generalized and is now expanded about the current density matrix $D_0$ in the subspace of the density matrices $D_i$ of the previous iterations. To satisfy the idempotency condition, the energy model function is parametrized in terms of a purified averaged density matrix. The local energy function is correct to second order in $D_i - D_0$ and can be set up solely in terms of the density matrices and Kohn–Sham matrices of the previous iterations. In Hartree–Fock theory, the new local energy model is identical to the one previously used in TRSCF optimizations.<br />
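The purification is not spelled out in this section. A standard choice is McWeeny purification (McWeeny appears as Ref. 15 in the list below), which in a nonorthogonal basis maps $D$ to $3DSD - 2DSDSD$. The sketch assumes that form and applies it to the average of two nearby idempotent densities, a stand-in for the averaged density matrix of the DSM model; the toy matrices are made up. The idempotency error $\|DSD - D\|$ drops sharply, because the polynomial $3x^2 - 2x^3$ pulls eigenvalues of $DS$ that lie near 0 and 1 towards 0 and 1.

```python
import numpy as np


def mcweeny_purify(D, S):
    """One McWeeny step in a nonorthogonal basis: D -> 3 DSD - 2 DSDSD."""
    DS = D @ S
    return 3.0 * DS @ D - 2.0 * DS @ DS @ D


def idempotency_error(D, S):
    """Frobenius norm of DSD - D; zero for a valid (idempotent) density."""
    return np.linalg.norm(D @ S @ D - D)


n, nocc, theta = 6, 2, 0.2
S = 0.9 * np.eye(n) + 0.1 * np.ones((n, n))      # toy SPD overlap matrix
s, U = np.linalg.eigh(S)
V = U @ np.diag(s ** -0.5) @ U.T                 # columns are S-orthonormal

R = np.eye(n)                                    # rotate orbital 1 into orbital 2
R[1, 1] = R[2, 2] = np.cos(theta)
R[2, 1], R[1, 2] = np.sin(theta), -np.sin(theta)

C1 = V[:, :nocc]
C2 = (V @ R)[:, :nocc]
D_avg = 0.5 * (C1 @ C1.T + C2 @ C2.T)            # averaged, not idempotent

D_pur = mcweeny_purify(D_avg, S)
print(idempotency_error(D_avg, S), idempotency_error(D_pur, S))
```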

The EDIIS function is discussed in the context of the proposed model. In Hartree–Fock theory, the EDIIS function is obtained from our proposed energy function by neglecting terms that result from the purification of the density matrix; the EDIIS function therefore does not reproduce the Hartree–Fock gradient at the expansion point. In DFT, the EDIIS function is inappropriate for other reasons as well.<br />

A rederivation of the original DIIS algorithm is also performed<br />

to understand when it can safely be applied. In particular,<br />

it is shown that the DIIS method may be viewed as a<br />

quasi-Newton method, thus explaining its fast local convergence.<br />

In the global region, its behavior is less predictable,<br />

although we note that its gradient-norm minimization mechanism<br />

usually allows it to recover safely from sudden, large<br />

increases in the total energy brought on by the Roothaan–<br />

Hall iterations.<br />

The TRSCF scheme is tested both in a computationally<br />

demanding, robust line-search implementation TRSCF-LS,<br />

and in our standard implementation, where only the Fock/<br />

Kohn–Sham matrices of previous iterations are used. Our<br />

test calculations indicate not only that the TRSCF-LS<br />

method is a highly stable and robust method, but also that the<br />

standard TRSCF implementation converges rapidly in most<br />

cases, with little degradation relative to the TRSCF-LS<br />

scheme.<br />

Relative to these schemes, the DIIS method is somewhat<br />

more erratic since it makes no use of Hessian information<br />

and therefore cannot predict reliably what directions will reduce<br />

the total energy. For example, in situations where the<br />

energy changes in the course of the iterations but the gradient<br />

does not, the DIIS algorithm is unable to identify the density<br />

matrix with the lowest energy and may diverge. Nevertheless,<br />

the DIIS method handles most optimizations amazingly<br />

well, which is particularly impressive in view of its very<br />

simplicity; never have so few lines of code done so much<br />

good for so many calculations. In general, however, it is<br />

outperformed by the TRSCF method, which introduces Hessian<br />

information at little extra cost, and is well founded in<br />

the global as well as local regions of the optimization.<br />

The current formulation of TRSCF requires a few diagonalizations<br />

in each TRRH step, and to obtain linear scaling<br />

these diagonalizations should be avoided. An even more efficient<br />

algorithm may be obtained if the Roothaan–Hall and<br />

DSM steps are integrated in such a manner that the information<br />

from the previous density matrices is used directly in<br />

the Roothaan–Hall optimization step. Work along these lines<br />

is in progress.<br />

ACKNOWLEDGMENTS<br />

We thank Peter Taylor, Ditte Jørgensen, and Stephan<br />

Sauer for providing some of the test examples. This work has<br />

been supported by the Danish Natural Research Council. We<br />

also acknowledge support from the Danish Center for Scientific<br />

Computing (DCSC).<br />

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).<br />

2 G. G. Hall, Proc. R. Soc. London A 205, 541 (1951).<br />

3 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).<br />

4 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).<br />

5 G. Karlström, Chem. Phys. Lett. 67, 348 (1979).<br />

6 L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker, J. Chem. Phys. 121, 16 (2004).<br />

7 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).<br />

8 V. R. Saunders and I. H. Hillier, Int. J. Quantum Chem. 7, 699 (1973).<br />

9 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 22 (2004).<br />

10 W. Koch and M. C. Holthausen, A Chemist’s Guide to Density Functional Theory (Wiley-VCH, Weinheim, 2000).<br />

11 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic-Structure Theory (Wiley & Sons, Ltd., Chichester, 2000).<br />


12 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976).<br />

13 G. B. Bacskay, Chem. Phys. 61, 385 (1981); J. Phys. France 35, 639 (1982).<br />

14 P. Jørgensen, P. Swanstrøm, and D. Yeager, J. Chem. Phys. 78, 347 (1983).<br />

15 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).<br />

16 X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).<br />

17 J. M. Millam and G. E. Scuseria, J. Chem. Phys. 106, 5569 (1997).<br />

18 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 339 (1997).<br />

19 X. Li, J. M. Millam, G. E. Scuseria, M. J. Frisch, and H. B. Schlegel, J. Chem. Phys. 119, 7651 (2003).<br />

20 T. Helgaker, H. J. Jensen, P. Jørgensen et al., DALTON, a molecular electronic structure program, Release 2.0 (2004); http://www.kjemi.uio.no/software/dalton<br />

21 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).<br />

22 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).<br />



Part 3<br />

A Coupled Cluster and Full Configuration Interaction Study of CN and CN⁻,<br />

L. Thøgersen and J. Olsen,<br />

Chem. Phys. Lett. 393, 36 (2004)


Chemical Physics Letters 393 (2004) 36–43<br />

www.elsevier.com/locate/cplett<br />

A coupled cluster and full configuration interaction<br />

study of CN and CN⁻<br />

Lea Thøgersen, Jeppe Olsen *<br />

Department of Chemistry, Theoretical Chemistry, University of Aarhus, DK-8000 Aarhus, Denmark<br />

Received 30 April 2004; in final form 27 May 2004<br />

Abstract<br />

Full configuration interaction (FCI) and coupled cluster (CC) calculations are carried out for the CN radical and CN⁻ using the cc-pVDZ and an augmented cc-pVDZ basis set. In addition, CC calculations including up to quadruple excitations are carried out using the cc-pVTZ basis. At the FCI level, the equilibrium distance is 1.1969 Å, the harmonic frequency is 2020.1 cm⁻¹, the electronic contribution to the atomization energy is 667 kJ/mol and the vertical electron affinity is 0.12962 $E_h$. The contributions from quadruple and quintuple excitations to the harmonic frequency are found to be 20 and 5 cm⁻¹, respectively. The quadruple excitations give a contribution of 4 kJ/mol to the atomization energy and 0.00013 $E_h$ to the vertical electron affinity. None of the calculations indicate that the convergence of the CC hierarchy is slower for open-shell than for closed-shell systems.<br />

© 2004 Elsevier B.V. All rights reserved.<br />

* Corresponding author. Fax: +45-861-961-99.<br />

E-mail address: jeppe@chem.au.dk (J. Olsen).<br />

1. Introduction<br />

The last decade has witnessed significant improvements in the reliability of ab initio quantum chemical predictions of spectroscopic and thermochemical data. For closed-shell molecules, equilibrium geometries [1], harmonic frequencies [2] and reaction enthalpies [3,4] may often be calculated with an accuracy that is equal to or better than the experimental accuracy. Of central importance for this progress has been the development of hierarchies of basis sets [5] and of coupled cluster (CC) methods [6–8]. The CC method most often used for accurate calculations is the CCSD(T) method [9], which augments the CC method including single and double excitations (CCSD) [10] with a perturbative estimate of triples contributions. For closed-shell molecules, the CCSD(T) method often exaggerates the contributions from triple excitations [11]. As the triple and quadruple corrections usually have the same sign, the overestimated triples contribution partly accounts for the neglected quadruples, and CCSD(T) often gives results that are better than those of the CC method including all single, double, and triple excitations (CCSDT). The CCSD(T) method therefore often provides results in surprisingly good agreement with the much more expensive CC method including up to quadruple excitations (CCSDTQ) [12]. Using triple-ζ basis sets, the CCSD(T) method is especially accurate for properties like internuclear distances and frequencies, as the remaining basis-set errors and correlation errors are here usually of opposite sign [1].<br />

For open-shell molecules, CC methods with and without spin adaptation have been developed [7,13], and the accuracy of CC calculations often matches the accuracy obtained for closed-shell molecules. In a study of the atomization energies of 11 small molecules [2], Feller and Sordo did not observe any systematic difference between the accuracies obtained for closed- and open-shell molecules when the CCSDT method is used. The performance of methods including perturbative estimates of triple excitations, such as the CCSD(T) method, is less convincing for open-shell molecules. In a systematic study of the performance of the CCSD(T) method for the calculation of spectroscopic constants for 33 small radicals [14], it was observed that the CCSD(T) method did not provide constants that were significantly more accurate than those obtained with the CCSD method.<br />

0009-2614/$ - see front matter © 2004 Elsevier B.V. All rights reserved.<br />

doi:10.1016/j.cplett.2004.06.001

L. Thøgersen, J. Olsen / Chemical Physics Letters 393 (2004) 36–43 37<br />

Several workers have suggested other methods combining CCSD with a perturbative treatment of triple excitations, but these alternative corrections do not systematically perform better than the CCSD(T) method [15].<br />

The Schrödinger equation within the Born–Oppenheimer approximation may be solved in a given one-electron basis set using full configuration interaction (FCI) calculations. In an FCI calculation, the wave function includes all Slater determinants with the correct spin, symmetry, and number of electrons. For a given basis set, FCI calculations eliminate the error due to truncation of the many-electron basis and therefore provide important benchmarks for approximate orbital-based methods. As the number of determinants in the FCI expansion increases exponentially with the number of basis functions and electrons, FCI calculations may only be carried out for small molecules using basis sets of double- or triple-ζ quality. For small closed-shell molecules, a number of FCI calculations have been published [16,17], and these have given additional insight into the accuracy of standard correlation methods.
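As a rough illustration of this combinatorial growth, the sketch below counts determinants as C(n, N_α)·C(n, N_β) for N_α α and N_β β electrons in n spatial orbitals. The orbital and electron counts are our assumptions, not numbers stated in the paper: 14 spherical functions per atom in cc-pVDZ, 18 per atom in aug′-cc-pVDZ (aug-cc-pVDZ without the diffuse d shell), two frozen core orbitals, and determinants with M_S = 1/2 for CN and M_S = 0 for CN⁻. Point-group symmetry is only crudely approximated by dividing by the four irreducible representations of C2v.

```python
from math import comb


def n_determinants(n_orb: int, n_alpha: int, n_beta: int) -> int:
    """Slater determinants with fixed numbers of alpha and beta electrons
    distributed over n_orb spatial orbitals (no point-group symmetry)."""
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)


# Assumed active spaces (1s(C) and 1s(N) frozen, as in the calculations):
#   CN  / cc-pVDZ:      2 x 14 functions - 2 frozen = 26 orbitals, 9 electrons (5 alpha, 4 beta)
#   CN- / aug'-cc-pVDZ: 2 x 18 functions - 2 frozen = 34 orbitals, 10 electrons (5 alpha, 5 beta)
radical = n_determinants(26, 5, 4)
anion = n_determinants(34, 5, 5)

# Crude symmetry estimate: C2v splits the determinants into four blocks of similar size.
for label, n in (("CN  cc-pVDZ", radical), ("CN- aug'-cc-pVDZ", anion)):
    print(f"{label}: {n:.3e} determinants, roughly {n / 4:.1e} per irrep")
```

This gives about 9.8 × 10⁸ determinants for the radical and about 7.7 × 10¹⁰ for the anion; a quarter of the latter, roughly 1.9 × 10¹⁰, is of the same order as the "about 20 billion" determinants quoted for the anion in Section 3.5. Adding eight orbitals and one electron multiplies the count by almost 80, which is why FCI is confined to double- or triple-ζ basis sets.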

For open-shell molecules, the number of FCI calculations is more limited. Except for a recent FCI investigation of the geometry of the CCH radical [18], no FCI calculations have been published for open-shell molecules with eight or more valence electrons using a correlation-consistent basis set [5]. The present study fills this gap by providing an FCI benchmark for the open-shell molecule CN using the cc-pVDZ basis [5]. This molecule is sufficiently small to allow FCI calculations at numerous geometries, so that the FCI equilibrium bond length, harmonic frequency, and dissociation energy, as well as the complete potential curve, can be determined. We will furthermore study the convergence of the CC energy as a function of the excitation level to see whether an open-shell molecule exhibits the same convergence pattern as previously determined for closed-shell molecules [19–23]. The vertical electron affinity will also be examined using CC and FCI calculations.

As the cc-pVDZ basis does not provide accurate geometries or energetics [8], we will obtain the equilibrium geometry, harmonic frequency, and dissociation energy using the cc-pVTZ basis set [5] and CC calculations including up to quadruple excitations. We hope that the data obtained here will assist in the analysis of the accuracy of various open-shell perturbation and CC methods, especially the methods supplementing CCSD with perturbative estimates of triple excitations.

2. Computational methods<br />

The FCI and CC calculations were carried out using the LUCIA program [24]. The algorithms for performing configuration interaction calculations are based on extensive modifications of the algorithms originally published in [25]. The CC code allows arbitrary excitation levels out of a single closed-shell or high-spin open-shell determinant. In contrast to the initial general CC codes [19], the present codes [26] exhibit the same scaling as the standard spin–orbital codes using explicitly coded contractions. Another set of general CC codes with the correct scaling has been developed by Kállay and coworkers [20,21], and a less efficient general CC code has been developed by Hirata and Bartlett [22].

All calculations kept the lowest two σ orbitals, corresponding to 1s(C) and 1s(N), doubly occupied. The open-shell configuration interaction and CC calculations used orbitals from restricted Hartree–Fock calculations. No spin adaptation was used in the open-shell CC calculations. The integrals and HF orbitals were obtained using the DALTON program [27].

In the following, the different spaces of determinants or excitations are denoted SD, SDT, SDTQ, SDTQ5, SDTQ56, and SDTQ567 for the spaces including up to 2-, 3-, 4-, 5-, 6-, and 7-fold excitations from the occupied spin–orbitals. For open-shell molecules, an alternative way of classifying excitations is to consider changes in orbital occupations instead of spin–orbital occupations [28]. All CI calculations in the following are based on changes of orbital occupations, whereas we will discuss CC calculations based on both classifications of excitations. Excitation spaces based on changes of spin–orbital occupations will be denoted (spin–orb), whereas spaces based on changes of orbital occupations will be denoted (orb). Thus, the CCSD(spin–orb) excitation space contains all single and double spin–orbital excitations.
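The difference between the two classifications can be sketched in a few lines of Python. Here the orbital-occupation level is taken, as an illustrative assumption rather than the exact definition of [28], to be the number of electrons removed from spatial orbitals relative to the reference occupations, Σ_p max(0, n_p(ref) − n_p), while the spin–orbital level counts vacated reference spin–orbitals. The set-of-(orbital, spin) representation and the function names are hypothetical.

```python
from collections import Counter

SpinOrbital = tuple[int, str]  # (spatial orbital index, "a" for alpha or "b" for beta)


def spin_orbital_level(ref: set[SpinOrbital], det: set[SpinOrbital]) -> int:
    """Number of reference spin-orbitals that are empty in det."""
    return len(ref - det)


def orbital_level(ref: set[SpinOrbital], det: set[SpinOrbital]) -> int:
    """Electrons removed from spatial orbitals, ignoring spin labels."""
    n_ref = Counter(p for p, _ in ref)
    n_det = Counter(p for p, _ in det)
    return sum(max(0, n_ref[p] - n_det[p]) for p in n_ref)


# Toy high-spin reference: orbital 0 doubly occupied, orbital 1 singly occupied (alpha).
ref = {(0, "a"), (0, "b"), (1, "a")}
# Move 0b -> 1b and 1a -> 2a: two spin-orbitals change, but orbital 1 stays singly occupied.
det = {(0, "a"), (1, "b"), (2, "a")}

print(spin_orbital_level(ref, det))  # 2
print(orbital_level(ref, det))       # 1
```

Because an orbital cannot lose more electrons than it has vacated spin–orbitals, the orbital level never exceeds the spin–orbital level under this definition, so an (orb) space of a given rank contains the (spin–orb) space of the same rank. This is consistent with the slightly lower CC(orb) energies reported in Table 1.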

FCI, CI, and CC calculations were carried out using the cc-pVDZ basis. To examine the contributions from quadruple excitations in a larger basis, CCSD, CCSDT, and CCSDTQ calculations were performed with the cc-pVTZ basis. For calculations of the electron affinity, the aug-cc-pVDZ basis set [29] without diffuse d functions was used for CN and CN⁻. This basis is in the following called the aug′-cc-pVDZ basis.

3. Results<br />

3.1. Convergence of CC and CI at the experimental equilibrium geometry

At the experimental equilibrium distance (1.1718 Å) [30], the FCI wave function and energy were obtained with an energy convergence threshold of 10⁻⁹ E_h. The FCI energy was obtained as −92.493262415 E_h. At the same internuclear distance, single-reference CI and CC energies were obtained with excitation levels from 2 to 7. In Table 1, we give the deviations of the CI, CC(orb), and CC(spin–orb) energies from the FCI energy. Fig. 1 is a single-logarithmic plot of these deviations.

The coupled-cluster energies using orbital occupations to define the excitation level are slightly below the energies using the smaller spaces based on spin–orbital occupations. However, the differences between the two choices are not significant compared to the deviations from the FCI energy. Up to CCSDTQ5, the differences between the two forms constitute at most 10% of the deviation from the FCI energy. For the CCSDTQ56 expansions, the large relative difference between the two deviations in Table 1 is caused by rounding. Including an additional digit, the CCSDTQ56 deviations are 0.0000015 and 0.0000013 E_h for the spin–orbital and orbital-based divisions, respectively.

Table 1
Deviations of single-reference CI and CC energies (E_h) from the FCI energy for CN
Largest exc. level | E_CI − E_FCI | E_CC(orb) − E_FCI | E_CC(spin–orb) − E_FCI
2 | 0.038240 | 0.015534 | 0.016517
3 | 0.022604 | 0.001563 | 0.001637
4 | 0.002391 | 0.000207 | 0.000230
5 | 0.000583 | 0.000019 | 0.000021
6 | 0.000031 | 0.000001 | 0.000002
7 | 0.000002 | – | –

[Figure: deviation from FCI energy (log scale, 10⁻⁶ to 10⁻¹) versus excitation level (2–6); curves: Coupled Cluster(spin-orb), Coupled cluster(orb), Configuration Interaction.]
Fig. 1. The deviations (E_h) of CI and CC energies from the FCI energy as a function of excitation level for CN using the cc-pVDZ basis set.

The CI curves exhibit the behavior predicted by perturbation theory [31]: the even-order excitations give significantly larger reductions in the deviations than the odd-order excitations. For CC expansions, perturbation theory also predicts that adding even-order excitations gives larger reductions in the deviations than adding odd-order excitations [8,31]. This is not observed in Fig. 1, as the deviations of the CC energies nearly form straight lines. Comparing the convergence of the CI and CC hierarchies, it is observed that the CCSDT deviation is slightly smaller than the CISDTQ error, and that the CC energy obtained using up to n-fold excitations is at least as accurate as the CI energy using up to (n + 1)-fold excitations, but less accurate than the CI energy using up to (n + 2)-fold excitations. To obtain an accuracy of 1 mE_h or better, one must include up to quadruple excitations in the CC expansion and up to quintuple excitations in the CI expansion.
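The alternation in the CI hierarchy and the nearly uniform CC convergence can be quantified from Table 1 by the factor by which the deviation shrinks when one excitation level is added. This is a small worked computation on the tabulated values; the helper name is ours.

```python
# Deviations from the FCI energy (E_h) at R = 1.1718 Angstrom, copied from Table 1.
levels = [2, 3, 4, 5, 6]
ci = [0.038240, 0.022604, 0.002391, 0.000583, 0.000031]
cc_orb = [0.015534, 0.001563, 0.000207, 0.000019, 0.000001]


def reduction_factors(dev):
    """Ratio by which the deviation shrinks when one excitation level is added."""
    return [prev / nxt for prev, nxt in zip(dev, dev[1:])]


for label, dev in (("CI", ci), ("CC(orb)", cc_orb)):
    steps = ", ".join(f"{n}->{n + 1}: {f:.1f}"
                      for n, f in zip(levels, reduction_factors(dev)))
    print(f"{label:8s} {steps}")
```

For CI, adding quadruples (factor of about 9.5) or hextuples (about 19) pays off far more than adding triples (about 1.7) or quintuples (about 4.1), whereas the CC(orb) factors stay between roughly 7.5 and 11 up to CCSDTQ5. The last CC ratio is unreliable because the CCSDTQ56 entry is rounded to 0.000001 E_h.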

The convergence patterns for CI and CC discussed above are very similar to those previously reported for N₂ [23]. The similarity between the convergence of CN and N₂ is more than qualitative. If one combines the deviation curve for N₂ [23] with the present deviation curve for CN in a single figure, the two deviation curves are virtually identical. For example, the deviations of the CCSDT energies are 0.00156 E_h and 0.00163 E_h for CN and N₂, respectively, and for a given excitation level the deviations for CN and N₂ differ by at most 10%.



From the above comparisons, it may be concluded that the open-shell nature of CN does not lead to slower convergence of the CC hierarchy than previously observed for N₂. However, it should be noted that the convergence of the CC hierarchy for N₂ is rather slow compared to the convergence observed for, e.g., H₂O [19] and F₂.

3.2. The potential curve for CN<br />

FCI calculations were carried out at a number of internuclear distances. To obtain accurate spectroscopic constants, the energies were converged to 10⁻⁹ E_h for internuclear distances close to the experimental value. For the remaining geometries, the energy was converged to 10⁻⁶ E_h. The obtained FCI energies are listed in Table 2. The FCI potential curve is shown in Fig. 2.

Table 2
FCI energies (E_h) as a function of internuclear distance (Å) for CN using the cc-pVDZ basis
R (Å) | E (E_h)
0.9 | −92.169732
1.0 | −92.384032
1.0918 | −92.469313943
1.1318 | −92.485677652
1.1518 | −92.490432414
1.1569 | −92.491327096
1.1718 | −92.493262415
1.1769 | −92.493704946
1.1869 | −92.494267979
1.1918 | −92.494402963
1.1919 | −92.494404785
1.1969 | −92.494449358
1.2019 | −92.494404774
1.2069 | −92.494274026
1.2118 | −92.494065103
1.2169 | −92.493765608
1.2369 | −92.491837833
1.2518 | −92.489691918
1.30 | −92.479361
1.40 | −92.447147
1.50 | −92.408657
1.60 | −92.370048
1.7577 | −92.316388
2.05065 | −92.255688
2.3436 | −92.241450
2.9295 | −92.240346
3.5154 | −92.239697

To associate the various internuclear distances with a degree of bond breaking, it is useful to examine the coefficient of the Hartree–Fock determinant in the FCI wave function. Around the equilibrium geometry, the weight of the HF determinant is about 0.92. Increasing the internuclear distance leads to a steady lowering of this weight, and at 1.3 and 1.8 Å the weights are 0.79 and 0.57, respectively. From 1.8 to 2.0 Å the weight drops sharply, so that it is 0.25 at 2.0 Å and less than 0.04 at 2.5 Å. We may therefore say that the bond is half broken at 1.8 Å and broken at 2.5 Å.

In addition to the FCI calculations, CCSD(orb), CCSDT(orb), and CCSDTQ(orb) calculations were performed at the various internuclear distances up to 1.8 Å. Although it is possible to converge the CC equations at larger distances, we find these of less interest because of the breakdown of the single-reference approximation. In Fig. 3, we plot the deviations of the CCSDT and CCSDTQ energies from the FCI energy, and in Table 3 we list the non-parallelity error (NPE), i.e., the difference between the largest and smallest deviation from the FCI energy.
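The NPE definition amounts to the spread of the deviation curve over the sampled geometries. A minimal sketch with made-up energies (the numbers below are illustrative only and are not taken from Fig. 3):

```python
def non_parallelity_error(e_approx, e_fci):
    """NPE: largest minus smallest deviation E_approx - E_FCI over the sampled geometries."""
    if len(e_approx) != len(e_fci) or len(e_fci) == 0:
        raise ValueError("energy lists must be non-empty and of equal length")
    deviations = [a - f for a, f in zip(e_approx, e_fci)]
    return max(deviations) - min(deviations)


# Hypothetical energies (E_h) at three geometries; NOT data from the paper.
e_fci = [-92.40, -92.50, -92.30]
e_cc = [-92.398, -92.497, -92.295]   # deviations 0.002, 0.003, 0.005

print(f"NPE = {non_parallelity_error(e_cc, e_fci):.6f} E_h")
# A curve shifted by a constant has an NPE of essentially zero (up to floating-point
# rounding): only the shape of the deviation curve matters, not its offset.
print(non_parallelity_error([e + 0.01 for e in e_fci], e_fci))
```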

At the equilibrium distance, both deviation curves in Fig. 3 have a positive curvature. For internuclear distances larger than the equilibrium distance, both the CCSDT and CCSDTQ deviation curves are nearly linear functions of the internuclear distance. In fact, the slope of the CCSDT deviation is smaller at larger internuclear distances than at the equilibrium distance. The analogous CCSDT and CCSDTQ curves for the nitrogen molecule exhibit maxima at an internuclear distance of around 1.5 Å (3 a₀) [23].

[Figure: FCI energy (−92.50 to −92.15 E_h) versus internuclear distance (0.5–4 Å).]
Fig. 2. The FCI potential curve for CN using the cc-pVDZ basis. The energies are in hartrees and the internuclear distances are in Å.

[Figure: deviation from FCI energy (0 to 0.007 E_h) versus internuclear distance (0.9–1.8 Å); curves: CCSDT, CCSDTQ.]
Fig. 3. The difference between the CCSDT and CCSDTQ energies and the FCI energy for CN using the cc-pVDZ basis. The energies are in hartrees and the internuclear distances are in Å.

Table 3
Non-parallelity error (NPE) (E_h) for CCSD, CCSDT, and CCSDTQ
Method | NPE
CCSD | 0.042326
CCSDT | 0.006355
CCSDTQ | 0.001742

3.3. Spectroscopic constants for CN

Equilibrium geometries and harmonic frequencies were obtained for the CCSD, CCSDT, CCSDTQ, and FCI methods using quartic interpolation of the energies. The harmonic frequency for a given method was evaluated at the equilibrium geometry of that method. In Table 4, we list the obtained equilibrium distances and frequencies. In addition, the table contains the CCSD, CCSDT, and CCSDTQ results for the cc-pVTZ basis.
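A sketch of how the equilibrium distance and harmonic frequency follow from a quartic interpolation, using the FCI/cc-pVDZ energies of Table 2. The paper does not state which points entered its fit; the window 1.1518–1.2518 Å, the least-squares quartic, the ¹²C¹⁴N isotope masses, and the CODATA constants are our assumptions. The frequency follows from ω_e = (2πc)⁻¹ √(k/μ), with k = d²E/dR² at the minimum and μ the reduced mass.

```python
import numpy as np

# FCI/cc-pVDZ points from Table 2 near the minimum (R in Angstrom, E in hartree).
R = np.array([1.1518, 1.1569, 1.1718, 1.1769, 1.1869, 1.1918, 1.1919,
              1.1969, 1.2019, 1.2069, 1.2118, 1.2169, 1.2369, 1.2518])
E = np.array([-92.490432414, -92.491327096, -92.493262415, -92.493704946,
              -92.494267979, -92.494402963, -92.494404785, -92.494449358,
              -92.494404774, -92.494274026, -92.494065103, -92.493765608,
              -92.491837833, -92.489691918])

HARTREE_J = 4.3597447222071e-18    # 1 E_h in J
AMU_KG = 1.66053906660e-27         # atomic mass unit in kg
C_CM_PER_S = 2.99792458e10         # speed of light in cm/s
M_C12, M_N14 = 12.0, 14.0030740044  # isotope masses in u (assumed 12C14N)


def quartic_constants(r, e, m1, m2):
    """Least-squares quartic fit of E(R); returns (R_e in Angstrom, omega_e in cm^-1)."""
    r_ref = r[np.argmin(e)]                       # expand around the lowest sampled point
    coeffs = np.polyfit(r - r_ref, e - e.min(), 4)
    stationary = np.roots(np.polyder(coeffs))     # zeros of dE/dR
    real = stationary[np.abs(stationary.imag) < 1e-10].real
    x_min = real[np.argmin(np.abs(real))]         # stationary point nearest the data minimum
    k_au = np.polyval(np.polyder(coeffs, 2), x_min)  # force constant in E_h / Angstrom^2
    k_si = k_au * HARTREE_J / 1e-20                  # convert to N/m
    mu = m1 * m2 / (m1 + m2) * AMU_KG
    omega = np.sqrt(k_si / mu) / (2.0 * np.pi * C_CM_PER_S)
    return r_ref + x_min, omega


r_e, omega_e = quartic_constants(R, E, M_C12, M_N14)
print(f"R_e = {r_e:.4f} A, omega_e = {omega_e:.1f} cm-1")
```

With these assumptions the fit should land close to the FCI entries of Table 4 (1.1969 Å, 2020.1 cm⁻¹); small differences from the published numbers can come from the choice of fitting window.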

We will first discuss the results obtained using the cc-pVDZ basis. The CC calculations using orbital-based excitation spaces are slightly more accurate than those using spin–orbital-based excitation spaces, but the differences are small compared to the size of the deviations. We will therefore discuss only the spin–orbital-based excitation spaces. Since the deviation curves for the CC energies are increasing functions of the internuclear distance (positive slope at the FCI minimum), the CC equilibrium distances are necessarily shorter than the FCI equilibrium distance. The causes of the errors of the harmonic frequencies will be discussed in detail below. At the CCSD level, the distance is 0.01 Å shorter than the FCI value and the harmonic frequency is about 90 cm⁻¹ larger than the FCI value, stressing the inaccuracy of this method for predicting equilibrium properties. The errors are significantly reduced by the CCSDT method, with errors of 0.0025 Å and 26 cm⁻¹ for the equilibrium distance and frequency, respectively. The errors are further reduced by about a factor of five by using the CCSDTQ instead of the CCSDT method. At the CCSDTQ level, the equilibrium distance is only 0.0005 Å smaller than the FCI value, but the frequency is 6 cm⁻¹ too large. The deviations of the various CC methods obtained here for CN are very similar to those previously obtained for N₂. For example, the contribution from connected quadruple excitations to the harmonic frequency of N₂ has been reported to be 20 cm⁻¹ [23,32], comparable to the change of about 20 cm⁻¹ from CCSDT to CCSDTQ found here for CN.

Table 4
Equilibrium distance (Å) and harmonic frequency (cm⁻¹) for CN
Method | Basis | R_eq (Å) | ω_e (cm⁻¹)
CCSD(orb) | cc-pVDZ | 1.1860 | 2111
CCSDT(orb) | cc-pVDZ | 1.1946 | 2043
CCSDTQ(orb) | cc-pVDZ | 1.1964 | 2025
CCSD(spin–orb) | cc-pVDZ | 1.1855 | 2114
CCSDT(spin–orb) | cc-pVDZ | 1.1944 | 2046
CCSDTQ(spin–orb) | cc-pVDZ | 1.1964 | 2026
FCI | cc-pVDZ | 1.1969 | 2020.1
CCSD(spin–orb) | cc-pVTZ | 1.1688 | 2136
CCSDT(spin–orb) | cc-pVTZ | 1.1783 | 2067
CCSDTQ(spin–orb) | cc-pVTZ | 1.1804 | 2045
Expt. | | 1.1718 | 2069

It is currently not feasible to obtain FCI energies for CN in the cc-pVTZ basis with sufficient accuracy to determine the frequency to within 1 cm⁻¹. One can instead estimate the convergence by examining the changes in the constants through the CC hierarchy. It is seen from Table 4 that the changes between the CCSDT and CCSDTQ results are very similar in the cc-pVDZ and cc-pVTZ basis sets. In both basis sets, the quadruple excitations increase the distance by about 0.002 Å and reduce the harmonic frequency by about 20 cm⁻¹. This suggests that it may be feasible to obtain the quadruple corrections to these constants in rather small basis sets. It should be noted that although the quadruple corrections to the properties are nearly constant, the quadruple corrections to the raw energies are very different in the two basis sets.
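The quadruple corrections and the composite estimate discussed in this section are simple differences and sums of Table 4 entries. The sketch assumes that the corrections are additive and transferable between basis sets, which the comparison above suggests but does not prove; the dictionary layout and function name are ours.

```python
# Spin-orbital CC results from Table 4: (R_eq in Angstrom, omega_e in cm^-1).
table4 = {
    ("CCSDT", "cc-pVDZ"): (1.1944, 2046),
    ("CCSDTQ", "cc-pVDZ"): (1.1964, 2026),
    ("CCSDT", "cc-pVTZ"): (1.1783, 2067),
    ("CCSDTQ", "cc-pVTZ"): (1.1804, 2045),
}


def quadruples_correction(basis):
    """Change in (R_eq, omega_e) on going from CCSDT to CCSDTQ in one basis set."""
    (r_t, w_t), (r_q, w_q) = table4[("CCSDT", basis)], table4[("CCSDTQ", basis)]
    return r_q - r_t, w_q - w_t


for basis in ("cc-pVDZ", "cc-pVTZ"):
    dr, dw = quadruples_correction(basis)
    print(f"{basis}: dR = {dr:+.4f} A, d(omega) = {dw:+d} cm-1")

# Additivity assumption: CCSDT/aug-cc-pVQZ of Feller and Sordo [2] plus the cc-pVTZ correction.
dr, dw = quadruples_correction("cc-pVTZ")
print(f"estimate: R_eq = {1.1739 + dr:.4f} A, omega_e = {2082 + dw} cm-1")
```

Using the cc-pVTZ correction (+0.0021 Å, −22 cm⁻¹) gives 1.1760 Å and 2060 cm⁻¹; the cc-pVDZ distance correction of +0.0020 Å instead gives 1.1759 Å.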

The errors of the harmonic frequencies arise from two sources. First, the positive curvatures of the CC deviation curves around the equilibrium geometries lead to CC frequencies that are larger than the FCI frequency. Second, as the third derivative of the energy with respect to the distance is in general large and negative, the somewhat shorter internuclear distances obtained with the CC methods than with FCI also lead to frequencies that are too large. These two sources of error may be separated in the cc-pVDZ basis by evaluating the CC frequencies at the FCI equilibrium geometry. For the orbital-based methods, one then obtains the frequencies 2035, 2027, and 2022 cm⁻¹ for the CCSD, CCSDT, and CCSDTQ methods, respectively. Whereas the CCSDT frequency evaluated at the optimized CCSDT distance deviates from the FCI frequency by 23 cm⁻¹, the CCSDT frequency evaluated at the FCI geometry deviates by only 7 cm⁻¹. Although the errors connected with the positive curvatures of the deviation curves do not vanish, the major errors of the frequencies seem to arise from the errors of the equilibrium distances.
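The two error sources can be made explicit with a first-order Taylor expansion of the FCI force constant. This is a sketch of the argument in the text, not a derivation given in the paper; Δ(R) = E_CC(R) − E_FCI(R) denotes the deviation curve of Fig. 3 and k = d²E/dR² the force constant:

```latex
% Delta(R) = E_CC(R) - E_FCI(R); primes denote derivatives with respect to R.
k_{\mathrm{CC}}\bigl(R_e^{\mathrm{CC}}\bigr)
  = E_{\mathrm{FCI}}''\bigl(R_e^{\mathrm{CC}}\bigr) + \Delta''\bigl(R_e^{\mathrm{CC}}\bigr)
  \approx \underbrace{E_{\mathrm{FCI}}''\bigl(R_e^{\mathrm{FCI}}\bigr)}_{k_{\mathrm{FCI}}}
  + \underbrace{E_{\mathrm{FCI}}'''\bigl(R_e^{\mathrm{FCI}}\bigr)\,
      \bigl(R_e^{\mathrm{CC}} - R_e^{\mathrm{FCI}}\bigr)}_{>0\ \text{(both factors negative)}}
  + \underbrace{\Delta''\bigl(R_e^{\mathrm{CC}}\bigr)}_{>0},
\qquad
\omega_e = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}} .
```

The middle term is the geometry effect, which is removed by evaluating the CC frequency at the FCI distance; the last term is the curvature effect, which remains (7 cm⁻¹ at the CCSDT level).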

The experimental values for the equilibrium distance and the harmonic frequency are 1.1718 Å and 2069 cm⁻¹, respectively [30]. Comparing the results obtained using the cc-pVTZ basis to the experimental values, it is observed that the CCSDT results are in better agreement with experiment than the CCSDTQ results. A better estimate of the importance of the quadruple corrections may be obtained using CCSDT results for large basis sets. Feller and Sordo [2] have calculated the CCSDT spectroscopic constants for CN using the aug-cc-pVQZ basis and obtained an equilibrium distance of 1.1739 Å and a harmonic frequency of 2082 cm⁻¹. Adding our quadruple correction to these CCSDT results gives an equilibrium distance of 1.1759 Å and a harmonic frequency of 2060 cm⁻¹. To obtain spectroscopic constants that are significantly more accurate than the CCSDT results, other corrections, most importantly core-correlation contributions, must be included together with the quadruple excitations.

3.4. Atomization energy<br />

It has previously been reported that quadruple and even quintuple excitations may be important for obtaining atomization energies with high accuracy [3,4,12]. In Table 5, we list the atomization energies obtained using the CCSD, CCSDT, CCSDTQ, and FCI approaches with the cc-pVDZ basis and the CCSD, CCSDT, and CCSDTQ approaches with the cc-pVTZ basis set. All molecular calculations were carried out at the experimental equilibrium distance.

Again, there are no significant differences between the results obtained using the CC(orb) and CC(spin–orb) approaches. The two approaches differ by only 0.1 kJ/mol at the CCSDT and CCSDTQ levels.

The quadruple excitations change the atomization energy by about 4 kJ/mol with both the cc-pVDZ and the cc-pVTZ basis sets. These results are in agreement with previous calculations of the contributions from connected quadruple excitations [4]. From the difference between the CCSDTQ and FCI atomization energies, it is seen that quintuple and higher excitations contribute 0.5 kJ/mol to the atomization energy. These contributions from quadruple and quintuple excitations are very similar to the results previously reported for N₂ [3]. The contribution from higher excitations to the atomization energy of CN has previously been studied by Feller and Sordo [2]. They obtained a significantly smaller contribution from quadruple excitations, 0.3 kcal/mol or 1.2 kJ/mol.

There are several experimental measurements of the atomization energy, and Feller and Sordo [2] quote values in the range 745–762 kJ/mol for the experimental electronic contribution. Adding our estimate of the quadruple correction to the CCSDT-limit estimate of 748 kJ/mol of Feller and Sordo gives a value of 752 kJ/mol for the electronic atomization energy of CN.

Table 5
The electronic contribution to the dissociation energy (kJ/mol) for CN
Method | Basis | D_e (kJ/mol)
CCSD(orb) | cc-pVDZ | 631.6
CCSDT(orb) | cc-pVDZ | 663.0
CCSDTQ(orb) | cc-pVDZ | 666.5
CCSD(spin–orb) | cc-pVDZ | 629.2
CCSDT(spin–orb) | cc-pVDZ | 662.9
CCSDTQ(spin–orb) | cc-pVDZ | 666.4
FCI | cc-pVDZ | 667.0
CCSD(spin–orb) | cc-pVTZ | 674.2
CCSDT(spin–orb) | cc-pVTZ | 714.4
CCSDTQ(spin–orb) | cc-pVTZ | 718.5
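The increments quoted in this section and the composite estimate are differences and sums of Table 5 entries. The sketch uses the orbital-based cc-pVDZ values (where FCI is available) and the spin–orbital cc-pVTZ values; rounding the cc-pVTZ quadruple correction to 4 kJ/mol before adding it to the 748 kJ/mol CCSDT-limit estimate of Feller and Sordo is our assumption about how the 752 kJ/mol figure arises.

```python
# Electronic atomization energies D_e (kJ/mol) from Table 5.
# cc-pVDZ: orbital-based CC (FCI available); cc-pVTZ: spin-orbital CC.
d_e = {
    ("CCSDT(orb)", "cc-pVDZ"): 663.0,
    ("CCSDTQ(orb)", "cc-pVDZ"): 666.5,
    ("FCI", "cc-pVDZ"): 667.0,
    ("CCSDT(spin-orb)", "cc-pVTZ"): 714.4,
    ("CCSDTQ(spin-orb)", "cc-pVTZ"): 718.5,
}

dq_dz = d_e[("CCSDTQ(orb)", "cc-pVDZ")] - d_e[("CCSDT(orb)", "cc-pVDZ")]
dq_tz = d_e[("CCSDTQ(spin-orb)", "cc-pVTZ")] - d_e[("CCSDT(spin-orb)", "cc-pVTZ")]
d_beyond_q = d_e[("FCI", "cc-pVDZ")] - d_e[("CCSDTQ(orb)", "cc-pVDZ")]

print(f"quadruples: {dq_dz:.1f} kJ/mol (cc-pVDZ), {dq_tz:.1f} kJ/mol (cc-pVTZ)")
print(f"beyond quadruples (cc-pVDZ): {d_beyond_q:.1f} kJ/mol")

# Additivity assumption: estimated CCSDT limit plus the rounded cc-pVTZ quadruples correction.
ccsdt_limit = 748.0
print(f"estimated D_e: {ccsdt_limit + round(dq_tz):.0f} kJ/mol")
```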

3.5. The vertical electron affinity<br />

An FCI calculation for the CN anion using the aug′-cc-pVDZ basis was carried out at the experimental equilibrium geometry of the radical. The FCI calculation contains about 20 billion Slater determinants, and sparsity of the CI vectors was used only to reduce disk storage, not computation time. This is one of the largest FCI calculations we have carried out so far. The FCI energy for the anion was obtained as −92.627391(2) E_h. Combining this energy with the FCI energy of −92.497766 E_h for the radical in the same basis set gives an FCI value of 0.12962 E_h for the vertical electron affinity. CC calculations using spin–orbital occupations to restrict the excitations were also carried out for the radical and the anion in the aug′-cc-pVDZ basis, and the resulting electron affinities are given in Table 6.

As the differences between CC calculations using orbital and spin–orbital restrictions have already been shown to be small, no orbital-restricted calculations were carried out. Already at the CCSD level, the calculated electron affinity differs from the FCI affinity by less than 1 mE_h, and at the CCSDT level it differs from the FCI result by only 0.14 mE_h. The deviations of the CC energies from the FCI energies for the radical and the anion are also listed in Table 6. The high accuracy of the CC affinities is caused by cancellation of the errors of the radical and the anion: the deviation of the affinity is roughly an order of magnitude smaller than the deviations of the individual energies. It is also interesting that the electron affinity converges from above, i.e., the CC affinities are larger than the FCI affinity. As seen from the other columns of Table 6, the CC expansion converges slightly faster for the anion than for the radical.

The faster convergence for the anion may seem surprising, as the anion contains one more electron than the radical, but it is probably caused by CN being slightly more multiconfigurational than the anion. The electron affinity of CN calculated using CC methods in large basis sets has been the subject of several recent studies [33,34]. These studies also found small contributions to the electron affinity from triple excitations.
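The error cancellation can be checked directly: the error in a CC affinity is the difference between the radical and anion deviations listed in Table 6. A short sketch using the energies quoted in this section and the tabulated deviations:

```python
# FCI energies (E_h) in the aug'-cc-pVDZ basis at the radical's experimental geometry.
e_radical_fci = -92.497766
e_anion_fci = -92.627391
ea_fci = e_radical_fci - e_anion_fci      # vertical EA = E(CN) - E(CN-)
print(f"EA(FCI) = {ea_fci:.6f} E_h")

# Table 6 deviations E_CC - E_FCI (E_h): (radical CN, anion CN-).
table6 = {
    "CCSD": (0.01529, 0.01466),
    "CCSDT": (0.00154, 0.00140),
    "CCSDTQ": (0.00020, 0.00016),
}

# EA_CC - EA_FCI = [E_CC(CN) - E_FCI(CN)] - [E_CC(CN-) - E_FCI(CN-)]:
# most of the two deviations cancel in the difference.
ea_errors = {m: rad - an for m, (rad, an) in table6.items()}
for method, err in ea_errors.items():
    print(f"{method:7s} EA error {err:.5f} E_h vs radical error {table6[method][0]:.5f} E_h")
```

The differences reproduce the EA − EA(FCI) column of Table 6 for CCSD and CCSDT (0.00063 and 0.00014 E_h); for CCSDTQ the rounded table entries give 0.00004 rather than the tabulated 0.00003, presumably because of rounding of the underlying energies. All errors are positive, i.e., the CC affinities approach the FCI value from above.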

4. Conclusion<br />

Full configuration interaction calculations using the cc-pVDZ basis and CC calculations using the cc-pVDZ and cc-pVTZ basis sets have been carried out for the CN radical at various geometries. Single-reference configuration interaction calculations were also carried out using the cc-pVDZ basis at the experimental internuclear distance. At the CCSDT level, the energy differs from the FCI energy by about 1.6 mE_h, and at the CCSDTQ level by 0.2 mE_h. The CC energies converge toward the FCI energy in an approximately linear fashion on a logarithmic scale, with the deviation decreasing by about a factor of 10 for each added excitation level. This is in contrast to an analysis based on perturbation theory, which predicts that adding even orders gives larger decreases in the deviations than adding odd orders. The observed convergence for CN in the cc-pVDZ basis is very similar to the convergence previously reported for N₂, indicating that the open-shell nature of CN does not affect the convergence. A comparison of the FCI and CC energies at various internuclear distances reveals that the deviations of the CC approaches do not appear suddenly at large internuclear distances. Instead, the deviations are nearly linear functions of the internuclear distance.

At the FCI level, the equilibrium distance and harmonic frequency are obtained as 1.1969 Å and 2020.1 cm⁻¹, respectively. The CCSDT and CCSDTQ frequencies are 25 and 5 cm⁻¹ above the FCI value, respectively. The quadruple corrections to both the equilibrium distance and the harmonic frequency were found to be nearly identical in the cc-pVDZ and cc-pVTZ basis sets. The major errors of the CC frequencies come from the errors of the distances at which they are evaluated.

For the electronic contribution to the atomization energy, a value of 667.0 kJ/mol is obtained at the FCI level using the cc-pVDZ basis set. The CCSDT and CCSDTQ atomization energies are 4 and 0.5 kJ/mol below the FCI atomization energy, respectively. The quadruple contributions in the cc-pVDZ and cc-pVTZ basis sets are determined as 3.5 and 4.1 kJ/mol, respectively, indicating that a reliable estimate of quadruple contributions may be obtained using rather small basis sets.

Table 6
The vertical electron affinity (E_h) of CN calculated in the aug′-cc-pVDZ basis
Method | EA | EA − EA(FCI) | E(CN) − E_FCI(CN) | E(CN⁻) − E_FCI(CN⁻)
CCSD(spin–orb) | 0.13025 | 0.00063 | 0.01529 | 0.01466
CCSDT(spin–orb) | 0.12977 | 0.00014 | 0.00154 | 0.00140
CCSDTQ(spin–orb) | 0.12966 | 0.00003 | 0.00020 | 0.00016
FCI | 0.12962 | | |

The FCI vertical electron affinity in the aug′-cc-pVDZ basis is 0.12962 E_h. Owing to extensive cancellation of errors, the FCI affinity is accurately reproduced at the CCSD and CCSDT levels, with a contribution from quadruple and higher excitations of 0.00014 E_h. The CC hierarchy approaches the FCI affinity from above, as the deviations for the anion are slightly smaller than those for the radical.

Acknowledgements<br />

This work has been supported by the Danish Research Council (Grant No. 9901973). The calculations were carried out at the Centre for Supercomputing at the University of Aarhus (CSCAA). The support from the Danish Centre for Supercomputing (DCSC) is gratefully acknowledged.

References<br />

[1] F. Pawlowski, P. Jørgensen, J. Olsen, F. Hegelund, T. Helgaker,<br />

J. Gauss, K.L. Bak, J.F. Stanton, J. Chem. Phys. 116 (2002) 6482.<br />

[2] D. Feller, J.A. Sordo, J. Chem. Phys. 113 (2000) 485.<br />

[3] T. Helgaker, W. Klopper, A. Halkier, K.L. Bak, P. Jørgensen, J. Olsen, in: J. Cioslowski (Ed.), Understanding Chemical Reactivity, vol. 22, Kluwer, Dordrecht, 2001, p. 1.

[4] A.D. Boese, M. Oren, O. Atasoylu, J.M.L. Martin, M. Kállay, J. Gauss, J. Chem. Phys. 120 (2004) 4129.

[5] T.H. Dunning Jr., J. Chem. Phys. 90 (1989) 1007.<br />

[6] R.J. Bartlett, in: D.R. Yarkony (Ed.), Modern Electronic Structure Theory, Part I, World Scientific, Singapore, 1995, p. 1047.

[7] J. Paldus, X. Li, Adv. Chem. Phys. 110 (1999) 1.<br />

[8] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure<br />

Theory, Wiley, 2000.<br />

[9] K. Raghavachari, G.W. Trucks, J.A. Pople, M. Head-Gordon,<br />

Chem. Phys. Lett. 157 (1989) 479.<br />

[10] G.D. Purvis, R.J. Bartlett, J. Chem. Phys. 76 (1982) 1910.<br />

[11] K.L. Bak, P. Jørgensen, J. Olsen, T. Helgaker, W. Klopper, J. Chem. Phys. 112 (2000) 9229.

[12] T.A. Ruden, T.U. Helgaker, P. Jørgensen, J. Olsen, Chem. Phys.<br />

Lett. 371 (2003) 62.<br />

[13] P.G. Szalay, J. Gauss, J. Chem. Phys. 107 (1997) 9028.<br />

[14] E.F.C. Byrd, C.D. Sherrill, M. Head-Gordon, J. Phys. Chem. A 105 (2001) 9736.

[15] S.R. Gwaltney, M. Head-Gordon, J. Chem. Phys. 115 (2001)<br />

2014.<br />

[16] J. Olsen, P. Jørgensen, H. Koch, A. Balkova, R.J. Bartlett,<br />

J. Chem. Phys. 104 (1996) 8007.<br />

[17] H. Larsen, J. Olsen, P. Jørgensen, O. Christiansen, J. Chem. Phys.<br />

113 (2000) 6677.<br />

[18] P.G. Szalay, L.S. Thøgersen, J. Olsen, M. Kállay, J. Gauss, J. Phys. Chem. A 108 (2004) 3030.

[19] J. Olsen, J. Chem. Phys. 113 (2000) 7140.<br />

[20] M. Kállay, P.R. Surján, J. Chem. Phys. 113 (2000) 1359.

[21] M. Kállay, P.R. Surján, J. Chem. Phys. 115 (2001) 2945.

[22] S. Hirata, R.J. Bartlett, Chem. Phys. Lett. 321 (2000) 216.<br />

[23] J.W. Krogh, J. Olsen, Chem. Phys. Lett. 344 (2001) 578.<br />

[24] LUCIA, a general CI and CC code written by J. Olsen, University of Aarhus, with contributions from H. Larsen and M. Fülscher.

[25] J. Olsen, B.O. Roos, P. Jørgensen, H.J.Aa. Jensen, J. Chem. Phys.<br />

89 (1988) 2185.<br />

[26] J. Olsen, unpublished.<br />

[27] T. Helgaker et al., DALTON, an ab initio electronic structure program, Release 1.2, 2001, see http://www.kjemi.uio.no/software/dalton/dalton.html.

[28] X. Li, J. Paldus, J. Chem. Phys. 101 (1994) 8812.<br />

[29] R.A. Kendall, T.H. Dunning, R.J. Harrison, J. Chem. Phys. 96<br />

(1992) 6796.<br />

[30] K.P. Huber, G. Herzberg, Molecular Spectra and Molecular<br />

Structure V. Constants of Diatomic Molecules, Van Nostrand<br />

Reinhold, New York, 1979.<br />

[31] W. Kutzelnigg, Theoret. Chim. Acta. 80 (1991) 349.<br />

[32] S.A. Kucharski, J.D. Watts, R.J. Bartlett, Chem. Phys. Lett. 302<br />

(1999) 295.<br />

[33] P. Neogrády, M. Medveď, I. Černušák, M. Urban, Mol. Phys. 100 (2002) 541.

[34] J.A. Sordo, J. Chem. Phys. 114 (2001) 1974.


Part 3<br />

Equilibrium Geometry of the Ethynyl (CCH) Radical,<br />

P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss,<br />

J. Phys. Chem. A 108, 3030 (2004).


3030 J. Phys. Chem. A 2004, 108, 3030-3034<br />

Equilibrium Geometry of the Ethynyl (CCH) Radical †<br />

Péter G. Szalay,‡ Lea S. Thøgersen,§ Jeppe Olsen,§ Mihály Kállay,∥ and Jürgen Gauss*,∥<br />

Department of Theoretical Chemistry, Eötvös Loránd University, H-1518 Budapest, P.O. Box 32, Hungary,<br />

Department of Chemistry, Aarhus University, DK-8000 Aarhus C, Denmark, and Institut für Physikalische<br />

Chemie, Universität Mainz, D-55099 Mainz, Germany<br />

Received: September 27, 2003; In Final Form: January 15, 2004<br />

The equilibrium geometry of the ethynyl (CCH) radical has been obtained using the results of high-level<br />

quantum chemical calculations and the available experimental data. In a purely quantum chemical approach,<br />

the best theoretical estimates (1.208 Å for r_CC and 1.061-1.063 Å for r_CH) have been obtained from CCSD(T),<br />

CCSDT, MR-AQCC, and full CI calculations with basis sets up to core-polarized pentuple-zeta quality.<br />

In a mixed theoretical-experimental approach, empirical equilibrium geometrical parameters (1.207 Å for<br />

r_CC and 1.069 Å for r_CH) have been obtained from a least-squares fit to the experimental rotational constants<br />

of four isotopomers of CCH, after correcting these constants for vibrational effects using computed vibration-rotation interaction<br />

constants. These geometrical parameters lead to a consistent picture, with remaining discrepancies<br />

between theory and experiment of 0.001 Å for the CC and 0.006-0.008 Å for the CH distances, respectively.<br />

The corresponding r_s and r_0 geometries are shown not to be representative of the true equilibrium structure<br />

of CCH.<br />

I. Introduction<br />

Considerable effort has been devoted to the determination<br />

of the structure of the ethynyl (CCH) radical in its ²Σ⁺ electronic<br />

ground state from both the experimental 1 and the theoretical side. 2-7<br />

To date, experimental ground-state rotational constants<br />

(B_0) have been determined for four isotopomers of CCH.<br />

For CCH, a value of 43 674.528 94(115) MHz has been reported<br />

by Müller et al. 8 in agreement with earlier measured values. 9-11<br />

For ¹³CCH and C¹³CH, values of 42 077.462(1) and 42 631.382(1)<br />

MHz have been obtained by McCarthy et al. 12 in excellent<br />

agreement with a previous report of Bogey et al. 1,13 Finally,<br />

for the deuterated form CCD, a value of 36 068.0310(96) MHz<br />

has been reported by Bogey et al. 14<br />

On the basis of the available experimental rotational constants,<br />

Bogey et al. 1 determined a so-called substitution (r_s) structure.<br />

However, the obtained bond distances are not in satisfactory<br />

agreement with corresponding calculated equilibrium values; 2-7<br />

in particular, the CH distance was unusually short (1.046 Å vs<br />

calculated values of 1.062-1.070 Å). As already<br />

pointed out by Bogey et al., 14 the observed discrepancy is<br />

probably due to the large amplitude bending motion in CCH<br />

which is not adequately accounted for in the substitution<br />

approach 15 that provides the r s structure. Thus, determination<br />

of the true equilibrium geometry is necessary to get a reliable<br />

picture of the structure of the ethynyl radical.<br />

Although the available rotational constants form a solid basis<br />

for the experimental determination of the r_0 and r_s geometries,<br />

there is not enough experimental information<br />

available to determine the equilibrium geometry. In particular,<br />

the vibrational contributions to the rotational constants, which<br />

in principle can be determined via the complete set of vibration-rotation<br />

interaction constants, 16 cannot be obtained from the<br />

available experimental data.<br />

† Part of the special issue “Fritz Schaefer Festschrift”.<br />

‡ Eötvös Loránd University.<br />

§ Aarhus University.<br />

∥ Universität Mainz.<br />

As suggested long ago by Pulay et al. 17 and more<br />

recently by others, 18,19 quantum chemical calculations can be<br />

used to provide the missing information. With computed<br />

vibration-rotation interaction constants (α_r), it is possible to<br />

correct experimental rotational constants for vibrational effects<br />

and to obtain the corresponding equilibrium values<br />

$$B_e = B_0 + \frac{1}{2}\sum_r \alpha_r \qquad (1)$$<br />

with the sum running over all vibrational degrees of freedom.<br />
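The short Python sketch below illustrates Eq. (1). The function names are hypothetical and are not taken from any program used in this work. The α_r values themselves are not tabulated here, so the worked number instead combines the experimental B_0 of CCH quoted above with the ROHF-CCSD(T)/cc-pCVQZ correction ∆B = B_e - B_0 = ½Σα_r listed in Table 4.

```python
def equilibrium_rotational_constant(b0_mhz, alphas_mhz):
    """Eq. (1): B_e = B_0 + (1/2) * sum_r alpha_r.

    alphas_mhz must hold one vibration-rotation interaction constant per
    vibrational degree of freedom, so a doubly degenerate bending mode of a
    linear molecule appears twice in the list.
    """
    return b0_mhz + 0.5 * sum(alphas_mhz)


def equilibrium_from_delta_b(b0_mhz, delta_b_mhz):
    """Same correction when Delta B = B_e - B_0 = (1/2) sum alpha_r is given."""
    return b0_mhz + delta_b_mhz


# Experimental B_0 of CCH (ref 8) and the ROHF-CCSD(T)/cc-pCVQZ vibrational
# correction from Table 4, both in MHz.
b0_cch = 43674.52894
delta_b_cch = 495.37
be_cch = equilibrium_from_delta_b(b0_cch, delta_b_cch)
print(f"B_e(CCH) = {be_cch:.2f} MHz ({100 * delta_b_cch / b0_cch:.2f}% above B_0)")
```

For a linear triatomic such as CCH, the sum runs over four vibrational degrees of freedom: two stretches and the doubly degenerate bend.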

The accuracy of such a mixed experimental-theoretical (or<br />

empirical) procedure for the determination of equilibrium<br />

geometries has recently been investigated by Pawlowski et al. 20<br />

for a set of 18 closed-shell molecules. It was concluded in this<br />

study that errors in the determined empirical bond lengths are<br />

below 0.001 Å, if the vibrational corrections to the rotational<br />

constants are calculated at a sufficiently high level such as the<br />

coupled-cluster singles and doubles (CCSD) level 21 augmented<br />

by a perturbative treatment of triple excitations (CCSD(T)) 22<br />

together with the cc-pVQZ set from Dunning’s correlation-consistent<br />

basis-set hierarchy. 23 Although it is not clear whether<br />

the same accuracy can be achieved for open-shell systems, this<br />

combined experimental-theoretical procedure opens an interesting<br />

possibility for the determination of a reliable equilibrium<br />

geometry for CCH.<br />

Alternatively, accurate equilibrium geometries can be obtained<br />

via a purely theoretical approach. Such an approach can and<br />

should take advantage of existing hierarchies of methods for<br />

the treatment of electron correlation and establish basis-set<br />

convergence by using basis-set sequences such as, for example,<br />

the correlation-consistent sets developed by Dunning and coworkers.<br />

23,24 As has been shown by Helgaker et al. 25 and more<br />

recently also by Bak et al. 26 such a procedure can lead to an<br />

accuracy of 0.002-0.003 Å in bond distances if CCSD(T)<br />

calculations together with sufficiently large basis sets are carried<br />

out. Again, this conclusion is mainly valid for closed-shell<br />

molecules and needs to be checked for open-shell systems, for<br />

which some further complications are expected. 27,28 Concerning<br />

the use of multireference methods, a recent study on more than<br />

60 electronic (closed- and open-shell) states of various diatomic<br />

molecules found that approaches such as the<br />

multireference-averaged quadratic coupled-cluster (MR-AQCC)<br />

method 29,30 provide bond distances with an accuracy close to<br />

0.001 Å. As multireference methods together with a careful<br />

selection of the reference space offer a well-balanced treatment<br />

for both open- and closed-shell molecules, such calculations<br />

should be considered useful complements to single-reference-based<br />

CC calculations.<br />

10.1021/jp036885t CCC: $27.50 © 2004 American Chemical Society<br />

Published on Web 02/17/2004

Equilibrium Geometry of Ethynyl Radical J. Phys. Chem. A, Vol. 108, No. 15, 2004 3031<br />

The aim of the present paper is to provide an accurate<br />

equilibrium geometry for the electronic ground state of the<br />

ethynyl radical by using both procedures outlined above. The<br />

accuracy and reliability of the theoretically determined values<br />

will be carefully investigated via benchmark calculations up to<br />

the full configuration interaction (FCI) level. Calculated vibrational<br />

corrections to the rotational constants are used to derive<br />

equilibrium geometrical parameters from the available experimental<br />

rotational constants. The accuracy achieved is judged<br />

by a comparison of the results obtained with the two procedures.<br />

II. Computational Methods<br />

Theoretical determinations of the equilibrium geometry of<br />

CCH have been carried out using various coupled-cluster (CC)<br />

approaches and, to investigate possible multireference effects,<br />

the multireference configuration interaction (MR-CI) and multireference-averaged<br />

quadratic coupled-cluster (MR-AQCC)<br />

methods.<br />

Using the CC ansatz, calculations have been performed at<br />

two levels beyond the coupled-cluster singles and doubles<br />

(CCSD) 21 approximation, namely, at the CCSD(T) level which<br />

includes connected triple excitations perturbatively on top of a<br />

CCSD calculation 22,31 and at the CCSDT level 32-34 which<br />

includes a full treatment of triple excitations. Both unrestricted<br />

Hartree-Fock (UHF) and restricted open-shell Hartree-Fock<br />

(ROHF) reference functions have been used in the CC calculations.<br />

The MR-AQCC method can be considered an approximately<br />

“extensive” version of the MR-CISD (multireference configuration<br />

interaction with single and double excitations) method.<br />

MR-AQCC and MR-CISD calculations have been carried out<br />

with different reference (active) spaces. The n_e factor in the<br />

MR-AQCC calculations was chosen to be 9. That is, only the nine valence<br />

electrons of CCH are counted, and the four carbon 1s core electrons are not considered in the size-extensivity correction<br />

(for details, see ref 30).<br />

The hierarchy of correlation-consistent basis sets cc-pVXZ 23<br />

and cc-pCVXZ 24 has been used with X = D, T, Q, and 5.<br />

Since the size of CCH renders FCI calculations with small<br />

basis sets possible, FCI calculations (with a restricted open-shell<br />

HF reference) have been carried out to determine the geometry of<br />

CCH employing the cc-pVDZ basis set. These benchmark<br />

results are used to calibrate the corresponding CC and MR-AQCC<br />

results.<br />

Geometry optimizations have been carried out with analytically<br />

evaluated gradients in the case of the CCSD(T) 31,35-37 and<br />

MR-AQCC calculations, 38,39 while in all other cases the<br />

equilibrium geometry has been determined using purely numerical<br />

methods.<br />

The vibration-rotation interaction constants which are needed<br />

to subtract the vibrational contribution from the experimental<br />

rotational constants have been obtained at the UHF-CCSD(T)<br />

and ROHF-CCSD(T) levels using cc-pVTZ, cc-pCVTZ, cc-pVQZ,<br />

and cc-pCVQZ basis sets 23,24 at the geometry optimized<br />

at the same level. The required quantities (for the relevant<br />

computational expressions, see, for example, ref 16) have been<br />

determined using analytic derivative techniques, that is, the<br />

harmonic force field was determined using either analytic<br />

gradients (ROHF-CCSD(T)) 31 or analytic second derivatives<br />

(UHF-CCSD(T)), 40,41 and the cubic force field has been<br />

subsequently determined via numerical differentiation as described<br />

in refs 19 and 42. In addition, to check the reliability<br />

of the obtained force fields, UHF-CCSDT calculations of the<br />

vibration-rotation interaction constants (within the frozen-core<br />

approximation) have been carried out employing our recently<br />

implemented general CC analytic second derivatives. 43<br />

CC calculations have been performed with the Austin-Mainz<br />

version of the ACES II program system. 44 The COLUMBUS<br />

suite of programs 39,45 was used for the MR-AQCC and the<br />

LUCIA code 46 for the FCI calculations. The CCSDT force field<br />

calculations have been carried out using the generalized CI/CC code<br />

developed by one of us 47-49 which has been interfaced to the<br />

ACES II program.<br />

III. Results and Discussion<br />

III.A. Choice of Reference Space in the Multireference<br />

Treatments. The ²Σ⁺ ground state of CCH has a dominant<br />

configuration of (1σ)²(2σ)²(3σ)²(4σ)²(1π)⁴(5σ)¹. An appropriate<br />

reference space for the description of this electronic state within<br />

a MR-AQCC treatment has to be selected in a careful manner.<br />

In the present work, four different reference spaces have been<br />

tested with respect to their performance for the equilibrium<br />

geometry of CCH. In particular, the convergence of the<br />

calculated geometrical parameters with increase of the reference<br />

space is investigated.<br />

The smallest reference space is of complete active space<br />

(CAS) type and denoted by “5 × 5”, indicating that five<br />

electrons are distributed within five orbitals, namely the open-shell<br />

5σ orbital and the two pairs of π and π* orbitals (1π and 2π). The<br />

next CAS reference space, denoted by “5 × 6”, considers in<br />

addition the virtual 6σ orbital, while the largest CAS space (“5<br />

× 8”) includes three virtual orbitals (6σ, 7σ, and 8σ). Finally,<br />

to investigate the effect of including further “active” electrons,<br />

the “5 × 6” space has been augmented by single and double<br />

excitations involving the 3σ and/or 4σ orbital (in the following<br />

denoted by “5 × 6 + 2d”). Note that in all considered cases,<br />

the orbitals have been taken from MCSCF calculations using<br />

the same space. All single and double excitations out of the<br />

reference configurations have been included in the correlation<br />

treatment within the MR-CISD and MR-AQCC calculations.<br />

As the focus of these initial calculations is just the convergence<br />

of the results with respect to the chosen reference space, the<br />

calculations have been performed with the cc-pVDZ and cc-pVTZ<br />

basis sets.<br />

TABLE 1: Comparison of Geometrical Parameters (in Å)<br />

for the ²Σ⁺ State of CCH with Respect to the Chosen<br />

Reference Space in the MR-CISD and MR-AQCC<br />

Treatments a<br />

reference space “5 × 5” “5 × 6” “5 × 8” “5 × 6 + 2d”<br />

r_CC<br />

MR-AQCC/cc-pVDZ (fc) 1.2369 1.2376 1.2379 1.2371<br />

MR-CISD/cc-pVTZ (ae) 1.2093 1.2102 1.2102 1.2123<br />

MR-AQCC/cc-pVTZ (ae) 1.2121 1.2129 1.2131 1.2126<br />

r_CH<br />

MR-AQCC/cc-pVDZ (fc) 1.0794 1.0797 1.0807 1.0799<br />

MR-CISD/cc-pVTZ (ae) 1.0546 1.0548 1.0552 1.0558<br />

MR-AQCC/cc-pVTZ (ae) 1.0573 1.0575 1.0580 1.0580<br />

a fc = frozen-core calculations, ae = all-electron calculations.



TABLE 2: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH as Obtained at the CCSD(T), CCSDT, and<br />

MR-AQCC Levels Using Different Basis Sets a<br />

columns (first for r_CC, then for r_CH): UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT, ROHF-CCSDT, MR-AQCC b<br />

cc-pVDZ (fc) 1.2318 1.2353 1.2352 1.2354 1.2376 1.0797 1.0801 1.0801 1.0802 1.0797<br />

cc-pVTZ (fc) 1.2120 1.2153 1.2150 1.2151 1.2173 1.0643 1.0646 1.0645 1.0645 1.0638<br />

cc-pVQZ (fc) 1.2081 1.2113 1.2110 1.2110 1.2133 1.0642 1.0645 1.0644 1.0644 1.0635<br />

cc-pV5Z (fc) 1.2072 1.2104 1.2098 1.2123 1.0639 1.0642 1.0642 1.0632<br />

cc-pCVTZ (ae) 1.2087 1.2119 1.2132 1.0642 1.0645 1.0627<br />

cc-pCVQZ (ae) 1.2052 1.2083 1.2096 1.0630 1.0632 1.0613<br />

cc-pCV5Z (ae) 1.2043 1.2074 1.0626 1.0629<br />

a fc = frozen-core calculations, ae = all-electron calculations. b 5 × 6 reference space.<br />

The corresponding results are compiled in Table 1. The most<br />

significant observation is that there is a faster convergence of<br />

the bond distance with increase of the reference space in the<br />

MR-AQCC than in the MR-CISD calculations, as the MR-<br />

AQCC results seem to be much less sensitive to the choice of<br />

reference space. While the optimized bond distances obtained<br />

with the two methods are very close when the largest reference<br />

space (5 × 6 + 2d) is used, there are noticeable differences for<br />

the smaller reference spaces. For these, the MR-AQCC results<br />

are much closer to the “5 × 6 + 2d” values than the<br />

corresponding MR-CISD results. In particular, the inclusion of<br />

additional electrons in the reference space seems to be less<br />

important when using the MR-AQCC ansatz. The results in<br />

Table 1 thus indicate that a “5 × 6” active space<br />

is a safe and economical choice for large-scale<br />

MR-AQCC calculations on the ²Σ⁺ state of CCH. The remaining<br />

error due to higher excitations is estimated to be about 0.001-<br />

0.002 Å.<br />
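The statement that MR-AQCC is less sensitive to the reference space can be checked directly against the Table 1 entries. The short Python sketch below uses illustrative helper names only. For each method/basis row, it computes the spread over the four reference spaces and the deviation of the three smaller spaces from the “5 × 6 + 2d” result.

```python
# Table 1 bond lengths (Å); columns are the reference spaces
# "5 x 5", "5 x 6", "5 x 8" and "5 x 6 + 2d", in that order.
TABLE1 = {
    ("r_CC", "MR-AQCC/cc-pVDZ (fc)"): [1.2369, 1.2376, 1.2379, 1.2371],
    ("r_CC", "MR-CISD/cc-pVTZ (ae)"): [1.2093, 1.2102, 1.2102, 1.2123],
    ("r_CC", "MR-AQCC/cc-pVTZ (ae)"): [1.2121, 1.2129, 1.2131, 1.2126],
    ("r_CH", "MR-AQCC/cc-pVDZ (fc)"): [1.0794, 1.0797, 1.0807, 1.0799],
    ("r_CH", "MR-CISD/cc-pVTZ (ae)"): [1.0546, 1.0548, 1.0552, 1.0558],
    ("r_CH", "MR-AQCC/cc-pVTZ (ae)"): [1.0573, 1.0575, 1.0580, 1.0580],
}


def spread(values):
    """Largest minus smallest value over the four reference spaces."""
    return round(max(values) - min(values), 4)


def deviation_from_largest_space(values):
    """Signed deviation of the three smaller spaces from "5 x 6 + 2d"."""
    ref = values[-1]
    return [round(v - ref, 4) for v in values[:-1]]


for (param, method), values in TABLE1.items():
    print(f"{param:5s} {method:22s} spread = {spread(values):.4f} Å, "
          f"deviation from 5x6+2d = {deviation_from_largest_space(values)}")
```

At the cc-pVTZ level, the r_CC spread is 0.0030 Å for MR-CISD but only 0.0010 Å for MR-AQCC, in line with the conclusion drawn above.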

III.B. Comparison of MR-AQCC and CC Results. In Table<br />

2 the CC and CH bond lengths obtained at CCSD(T), CCSDT,<br />

and MR-AQCC levels using different basis sets are compared.<br />

Focusing first on the coupled-cluster results, it is observed<br />

that, independent of the chosen basis set, the CC distances<br />

obtained at the UHF-CCSD(T) level are about 0.003 Å shorter<br />

than the corresponding CCSDT values, while the corresponding<br />

ROHF-CCSD(T) bond lengths are essentially identical to both<br />

the UHF- and ROHF-CCSDT values. This unexpected difference<br />

between the UHF and ROHF results is investigated in a<br />

forthcoming article 28 where the failure of UHF-CCSD(T) is<br />

traced back to a rapid change of the underlying UHF wave<br />

function at certain bond distances. It will be shown in ref 28<br />

that this breakdown of the UHF-CCSD(T) approach occurs for<br />

the ethynyl radical at distances close to the equilibrium<br />

geometry, and thus, the UHF-CCSD(T) results must be considered<br />

unreliable. Interestingly, the full CCSDT approach seems<br />

to be able to recover from these deficiencies of the underlying<br />

UHF reference functions and provides results which are essentially<br />

independent of the chosen reference functions.<br />

For the CC distances, the differences between ROHF-CCSD(T)<br />

and CCSDT are essentially negligible. When considering<br />

in addition the MR-AQCC calculations (obtained with the<br />

“5 × 6” reference), we note that the MR-AQCC value for the CC<br />

distance is even longer than the corresponding CCSDT value<br />

(by about 0.002 Å). It is essentially impossible at this point to<br />

decide whether the CCSDT or the MR-AQCC results should<br />

be considered more accurate. 50 The good agreement between<br />

ROHF-CCSD(T) and CCSDT also suggests that ROHF-CCSD(T) can<br />

be safely used with the larger basis sets where CCSDT is not<br />

practical.<br />

For the CH distance, all considered approaches yield essentially<br />

the same result.<br />

TABLE 3: Comparison of Geometrical Parameters (in Å)<br />

for the ²Σ⁺ State of CCH at the CCSD(T), CCSDT, and<br />

MR-AQCC Levels with Corresponding FCI Calculations a<br />

III.C. Comparison with Full Configuration Interaction<br />

Results. To judge the accuracy of MR-AQCC and CCSDT,<br />

benchmark calculations at the FCI level using the cc-pVDZ basis<br />

have been performed. The corresponding results are summarized<br />

in Table 3. As these results show, the CH bond distances<br />

obtained by any approach are in excellent agreement (differences<br />

do not exceed 0.0005 Å), while for the CC bond distance the<br />

FCI result falls between the corresponding CCSDT and MR-<br />

AQCC values. This means that in comparison with FCI the<br />

CCSDT value is about 0.001 Å too short, while MR-AQCC is<br />

about 0.001 Å too long. Both methods thus exhibit errors which<br />

are acceptable for our purpose.<br />
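The signed errors relative to FCI can be read off Table 3 directly. The minimal Python sketch below does this. The dictionary layout and helper name are illustrative, and all numbers are the cc-pVDZ frozen-core values from Table 3.

```python
# cc-pVDZ frozen-core bond lengths (Å) from Table 3, as (r_CC, r_CH).
TABLE3 = {
    "ROHF-CCSD(T)": (1.2353, 1.0801),
    "UHF-CCSD(T)": (1.2318, 1.0797),
    "UHF-CCSDT": (1.2352, 1.0801),
    "ROHF-CCSDT": (1.2354, 1.0802),
    "MR-AQCC": (1.2376, 1.0797),
    "FCI": (1.2367, 1.0802),
}


def errors_vs_fci(table):
    """Signed error (method - FCI) for r_CC and r_CH, rounded to 0.1 mÅ."""
    fci_cc, fci_ch = table["FCI"]
    return {
        name: (round(cc - fci_cc, 4), round(ch - fci_ch, 4))
        for name, (cc, ch) in table.items()
        if name != "FCI"
    }


for name, (d_cc, d_ch) in errors_vs_fci(TABLE3).items():
    print(f"{name:13s} dr_CC = {d_cc:+.4f} Å   dr_CH = {d_ch:+.4f} Å")
```

This gives -0.0013 Å (ROHF-CCSDT) and +0.0009 Å (MR-AQCC) for r_CC, with every r_CH error within ±0.0005 Å. The same table also shows the larger UHF-CCSD(T) error of about -0.005 Å in r_CC.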

III.D. Basis-Set Convergence. After discussing the issue of<br />

electron correlation, we now turn to basis-set<br />

effects. Results obtained with both the cc-pVXZ and cc-pCVXZ<br />

sequences of basis sets are given in Table 2. In<br />

the frozen-core cc-pVXZ calculations, smooth convergence of the geometrical parameters<br />

is observed. When going from cc-pVDZ to cc-pV5Z, both<br />

bond distances are reduced, the CC distance by about 0.025 Å<br />

and the CH distance by about 0.016 Å. The differences between<br />

the cc-pVQZ and cc-pV5Z results, 0.001 and 0.0003<br />

Å, are already rather small, so that the cc-pV5Z results can be<br />

considered as nearly converged. However, the cc-pVXZ calculations<br />

do not incorporate core-correlation effects. To consider<br />

these properly, all-electron calculations using the core-valence<br />

correlating cc-pCVXZ sets have been carried out. As for the<br />

cc-pVXZ sequence, monotonic convergence is observed for the<br />

geometrical parameters within this basis-set sequence and the<br />

differences between quadruple- and pentuple-zeta results are<br />

again small. From the results, it is further seen that core<br />

correlation together with the additional consideration of core<br />

polarization functions reduces the CC bond distance by about<br />

0.003-0.004 Å, while the CH distance, as one might expect, is<br />

less affected and shortened by only 0.001-0.002 Å.<br />
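The basis-set trends discussed above can be reproduced from the ROHF-CCSD(T) columns of Table 2. The cc-pVXZ rows there are frozen-core and the cc-pCVXZ rows are all-electron. In the minimal Python sketch below, the helper names are illustrative. The “core effect” is simply cc-pCVXZ(ae) minus cc-pVXZ(fc), so it combines core correlation and the extra core-polarization functions, as in the text.

```python
# ROHF-CCSD(T) bond lengths (Å) from Table 2, as (r_CC, r_CH).
VALENCE_FC = {  # cc-pVXZ, frozen core
    "D": (1.2353, 1.0801), "T": (1.2153, 1.0646),
    "Q": (1.2113, 1.0645), "5": (1.2104, 1.0642),
}
CORE_AE = {  # cc-pCVXZ, all electrons correlated
    "T": (1.2119, 1.0645), "Q": (1.2083, 1.0632), "5": (1.2074, 1.0629),
}


def increments(series):
    """Change in (r_CC, r_CH) between consecutive cardinal numbers X."""
    keys = list(series)
    return {
        f"{a}->{b}": tuple(round(y - x, 4) for x, y in zip(series[a], series[b]))
        for a, b in zip(keys, keys[1:])
    }


def core_effect(valence, core):
    """cc-pCVXZ(ae) minus cc-pVXZ(fc) for each X present in both series."""
    return {
        x: tuple(round(c - v, 4) for v, c in zip(valence[x], core[x]))
        for x in core if x in valence
    }


total = tuple(round(b - a, 4) for a, b in zip(VALENCE_FC["D"], VALENCE_FC["5"]))
print("cc-pVDZ -> cc-pV5Z:", total)
print("increments:", increments(VALENCE_FC))
print("core effect:", core_effect(VALENCE_FC, CORE_AE))
```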

Unfortunately, because of program limitations, it was not<br />

possible to perform MR-AQCC calculations using the largest<br />

cc-pCV5Z basis. However, the rather systematic difference<br />

between the CCSD(T) and MR-AQCC results enables a<br />

method r_CC r_CH<br />

ROHF-CCSD(T) 1.2353 1.0801<br />

UHF-CCSD(T) 1.2318 1.0797<br />

UHF-CCSDT 1.2352 1.0801<br />

ROHF-CCSDT 1.2354 1.0802<br />

MR-AQCC b 1.2376 1.0797<br />

FCI 1.2367 1.0802<br />

a All calculations with cc-pVDZ and core orbitals frozen in the electron-correlation treatment. b 5 × 6 reference space.



TABLE 4: Calculated Vibrational Corrections ∆B = B_e - B_0<br />

(in MHz) to the Rotational Constants of Different<br />

Isotopomers of CCH from UHF- and ROHF-based CC<br />

Calculations<br />

isotopomer CCSD(T)/cc-pVTZ CCSD(T)/cc-pVQZ CCSD(T)/cc-pCVTZ CCSD(T)/cc-pCVQZ CCSDT(fc)/cc-pVTZ a<br />

UHF Reference Function<br />

CCH 368.27 334.70 379.67 355.74 583.64<br />

¹³CCH 355.08 322.65 366.09 342.76 564.26<br />

C¹³CH 366.25 333.16 377.31 353.24 580.54<br />

CCD 168.07 151.12 175.33 167.52 258.47<br />

ROHF Reference Function<br />

CCH 531.16 479.58 568.24 495.37<br />

¹³CCH 513.21 463.24 549.12 478.15<br />

C¹³CH 528.13 476.98 564.57 491.72<br />

CCD 237.85 214.59 257.20 230.11<br />

a fc = frozen-core calculation.<br />

prediction of the corresponding value based on MR-AQCC/cc-pCVQZ<br />

and ROHF-CCSD(T)/cc-pCV5Z calculations. As the<br />

use of the pentuple- instead of the quadruple-ζ set decreases<br />

CC and CH bond distances by about 0.0009 and 0.0004 Å,<br />

respectively, the estimated MR-AQCC/cc-pCV5Z values are<br />

about 1.2087 and 1.0609 Å.<br />

The influence of diffuse functions has been investigated at<br />

the UHF-CCSD(T) level. The changes amount<br />

to less than 0.0003 Å when going from cc-pCVQZ to aug-cc-pCVQZ.<br />

III.E. Best Theoretical Estimates. On the basis of the<br />

previous sections, we are now able to give a best theoretical<br />

estimate for the equilibrium geometry of CCH. There are two<br />

(almost) independent procedures: one uses the MR-AQCC data<br />

while the other uses the CC data. At the<br />

MR-AQCC level, the best directly calculated geometry has been<br />

obtained with the cc-pCVQZ basis set (r_e(CC) = 1.2096 Å and<br />

r_e(CH) = 1.0613 Å). This geometry should be “improved” by<br />

the FCI correction obtained at the cc-pVDZ level, that is, by<br />

-0.0009 and 0.0005 Å as well as corrected for the remaining<br />

basis-set effect, that is, by -0.0009 Å and -0.0004 Å, for CC<br />

and CH, respectively (see above). Assuming additivity of these<br />

corrections, this leads to final values of 1.2078 and 1.0614 Å<br />

for the CC and CH bond distance, respectively. A similar<br />

extrapolation procedure starting from the ROHF-CCSD(T)/cc-pCV5Z<br />

results (1.2074 and 1.0628 Å) and employing corrections<br />

due to full CCSDT (-0.0003 Å and -0.0001 Å) and FCI<br />

(0.0013 and 0.0000 Å) leads to a final estimate of 1.2084 and<br />

1.0627 Å for the two distances. The discrepancy of 0.001 to<br />

0.002 Å between the values obtained with these two extrapolation<br />

schemes gives an indication of the accuracy of our theoretical<br />

results.<br />
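Both extrapolation schemes assume that the individual corrections are additive. The Python sketch below, with illustrative helper names, recomputes them. The FCI and full-CCSDT increments are derived from Tables 2 and 3. The cc-pCVQZ to cc-pCV5Z basis-set shift (-0.0009, -0.0004 Å) and the ROHF-CCSD(T)/cc-pCV5Z starting point are taken as quoted in the text. Note that the text uses 1.0628 Å for r_CH at this starting point, whereas Table 2 lists 1.0629 Å.

```python
def diff(a, b):
    """Componentwise a - b for (r_CC, r_CH) pairs."""
    return tuple(x - y for x, y in zip(a, b))


def additive_estimate(base, *corrections):
    """Best estimate = base + sum of increments, assuming they are additive."""
    return tuple(
        round(b + sum(c[i] for c in corrections), 4) for i, b in enumerate(base)
    )


# (r_CC, r_CH) in Å from Tables 2 and 3 and the text of this section.
FCI_DZ, MRAQCC_DZ, ROHF_CCSDT_DZ = (1.2367, 1.0802), (1.2376, 1.0797), (1.2354, 1.0802)
ROHF_CCSDT_QZ, ROHF_CCSD_T_QZ = (1.2110, 1.0644), (1.2113, 1.0645)
BASIS_QZ_TO_5Z = (-0.0009, -0.0004)  # cc-pCVQZ -> cc-pCV5Z shift quoted in the text

# Route 1: MR-AQCC/cc-pCVQZ + FCI correction + remaining basis-set correction.
mraqcc_route = additive_estimate(
    (1.2096, 1.0613), diff(FCI_DZ, MRAQCC_DZ), BASIS_QZ_TO_5Z)

# Route 2: ROHF-CCSD(T)/cc-pCV5Z + full-CCSDT correction + FCI correction.
cc_route = additive_estimate(
    (1.2074, 1.0628), diff(ROHF_CCSDT_QZ, ROHF_CCSD_T_QZ), diff(FCI_DZ, ROHF_CCSDT_DZ))

print("MR-AQCC route:", mraqcc_route)
print("CC route:     ", cc_route)
print("difference:   ", tuple(round(abs(a - b), 4) for a, b in zip(mraqcc_route, cc_route)))
```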

It is worth noting that our best theoretical estimates<br />

are in excellent agreement with recent recommendations for the<br />

equilibrium geometry of CCH by Peterson and Dunning 7 based<br />

on CCSD(T) calculations. The corresponding values are 1.2076<br />

and 1.0619 Å.<br />

III.F. Analysis of Experimental Rotational Constants.<br />

After establishing a theoretical estimate for the equilibrium<br />

geometry of CCH, we now focus on the analysis of the<br />

experimental rotational constants using computed vibrational<br />

corrections. These corrections to B, that is, ∆B = B_e - B_0, have<br />

been obtained at the UHF- and ROHF-CCSD(T) level using<br />

the cc-pVXZ and cc-pCVXZ sets with X = T and Q. The<br />

calculated ∆B values are compiled in Table 4 and amount to<br />

about 150-590 MHz, that is, about 0.5 to 1.5% of the values<br />

of the corresponding rotational constants for the considered<br />

isotopomers, and thus are non-negligible. However, large discrepancies<br />

are seen between the vibrational corrections computed<br />

with UHF and ROHF reference functions. We thus<br />

decided to check the reliability of the CCSD(T) force fields<br />

via corresponding CCSDT calculations using the cc-pVTZ basis<br />

set. As is seen from Table 4, the CCSDT calculations suggest<br />

that the UHF-CCSD(T) force fields (as the corresponding<br />

geometries) should be considered unreliable and that only the<br />

ROHF-CCSD(T) approach yields vibrational corrections in good<br />

agreement with the CCSDT approach. On the basis of these<br />

calculations, we refrain from discussing the UHF-CCSD(T)<br />

results any further and solely discuss the corresponding ROHF-<br />

CCSD(T) results in the following.<br />

For the least-squares fit of the geometrical parameters to the<br />

rotational constants, the most recent B_0 values from refs 8, 12,<br />

and 14, as given in the Introduction, have been used together<br />

with the vibrational corrections compiled in Table 4. The<br />

resulting empirical equilibrium geometries are summarized in<br />

Table 5. According to the values reported there, an “empirical”<br />

equilibrium geometry of r_CC = 1.207 Å and r_CH = 1.069 Å can<br />

be given with 0.002 Å as a conservative error estimate 51 based<br />

on the convergence of the results.<br />
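The Python sketch below shows one way such a fit can be set up. It is not necessarily the procedure behind Table 5, and all names are hypothetical. It rests on these assumptions:

- CCH is treated as a rigid linear C-C-H chain.
- The isotopic masses are standard values not quoted in the paper.
- B = h/(8π²I) is used with the conversion factor 505379.07 MHz u Å².
- (r_CC, r_CH) are fitted by unweighted least squares in B, using Gauss-Newton iterations with a finite-difference Jacobian.

The input is the experimental B_0 from the Introduction plus the ROHF-CCSD(T)/cc-pCVQZ ∆B of Table 4.

```python
# Assumed standard isotopic masses (u); they are not quoted in the paper.
M_H, M_D, M_12C, M_13C = 1.00782503, 2.01410178, 12.0, 13.00335484
# Assumed conversion factor h / (8 pi^2) in MHz u Å^2, so that B = CONV / I.
CONV = 505379.07

# Masses along the molecular axis: terminal C, central C, H (or D).
ISOTOPOMERS = {
    "CCH": (M_12C, M_12C, M_H),
    "13CCH": (M_13C, M_12C, M_H),
    "C13CH": (M_12C, M_13C, M_H),
    "CCD": (M_12C, M_12C, M_D),
}


def rot_const(r_cc, r_ch, masses):
    """Rotational constant (MHz) of a rigid linear C-C-H chain, bonds in Å."""
    z = (0.0, r_cc, r_cc + r_ch)
    total = sum(masses)
    first = sum(m * zi for m, zi in zip(masses, z))
    second = sum(m * zi * zi for m, zi in zip(masses, z))
    inertia = second - first * first / total  # moment about the centre of mass
    return CONV / inertia


def fit_geometry(b_e, guess=(1.21, 1.06), steps=20, h=1e-6):
    """Unweighted least-squares fit of (r_CC, r_CH) to B_e values (Gauss-Newton)."""
    r = list(guess)
    names = list(b_e)
    for _ in range(steps):
        base = [rot_const(r[0], r[1], ISOTOPOMERS[n]) for n in names]
        res = [b_e[n] - b for n, b in zip(names, base)]
        # Forward-difference Jacobian dB/dr, one row per isotopomer.
        jac = []
        for n, b in zip(names, base):
            d_cc = (rot_const(r[0] + h, r[1], ISOTOPOMERS[n]) - b) / h
            d_ch = (rot_const(r[0], r[1] + h, ISOTOPOMERS[n]) - b) / h
            jac.append((d_cc, d_ch))
        # Normal equations (J^T J) dr = J^T res, solved by Cramer's rule.
        a11 = sum(j[0] * j[0] for j in jac)
        a12 = sum(j[0] * j[1] for j in jac)
        a22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * e for j, e in zip(jac, res))
        g2 = sum(j[1] * e for j, e in zip(jac, res))
        det = a11 * a22 - a12 * a12
        r[0] += (g1 * a22 - a12 * g2) / det
        r[1] += (a11 * g2 - a12 * g1) / det
    return tuple(r)


# Experimental B_0 (Introduction) plus ROHF-CCSD(T)/cc-pCVQZ Delta B (Table 4), MHz.
B0 = {"CCH": 43674.52894, "13CCH": 42077.462, "C13CH": 42631.382, "CCD": 36068.0310}
DELTA_B = {"CCH": 495.37, "13CCH": 478.15, "C13CH": 491.72, "CCD": 230.11}
b_e = {n: B0[n] + DELTA_B[n] for n in B0}
r_cc, r_ch = fit_geometry(b_e)
print(f"r_CC = {r_cc:.4f} Å, r_CH = {r_ch:.4f} Å")
```

The two bond lengths can be separated only because isotopic substitution moves the centre of mass. The CCD constant mainly fixes r_CH, and the ¹³C species constrain the carbon positions. Switching to another ∆B column of Table 4 shows how strongly the empirical r_CH depends on the computed vibrational corrections.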

A comparison of the empirical equilibrium geometry with<br />

our best theoretical estimates shows that the remaining discrepancies<br />

are in the range of 0.001 to 0.002 Å for the CC and<br />

0.006 to 0.008 Å for the CH distances. It appears that the<br />

empirical value for the CC distance is slightly shorter and the<br />

CH distance is longer than the corresponding theoretical values.<br />

While these discrepancies can possibly be traced back to<br />

remaining deficiencies in the theoretical treatment, another, and<br />

maybe more likely, possibility is that these differences point to<br />

so far unexplored limitations in the perturbational treatment of<br />

the vibrational corrections (note that there is a low-lying Π state<br />

which interacts with the electronic ground state through the<br />

bending motion).<br />

Nevertheless, the current study leads to a satisfactory agreement<br />

between theory and experiment and thus provides a<br />

consistent picture with respect to the equilibrium geometry.<br />

Concerning previous efforts to determine the geometry of<br />

CCH, we note that the r_s (as well as the r_0) structures are rather<br />

TABLE 5: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH Obtained from Theory and Experiment<br />

structure r_CC r_CH method ref<br />

r_e 1.2064 1.0678 from exptl B_0 with ∆B(ROHF-CCSD(T)/cc-pVTZ) this work<br />

r_e 1.2076 1.0657 from exptl B_0 with ∆B(ROHF-CCSD(T)/cc-pVQZ) this work<br />

r_e 1.2056 1.0689 from exptl B_0 with ∆B(ROHF-CCSD(T)/cc-pCVTZ) this work<br />

r_e 1.2075 1.0651 from exptl B_0 with ∆B(ROHF-CCSD(T)/cc-pCVQZ) this work<br />

r_e 1.2050 1.0703 from exptl B_0 with ∆B(UHF-CCSDT(fc)/cc-pVTZ) this work<br />

r_e 1.2078 1.0614 est from MR-AQCC this work<br />

r_e 1.2084 1.0627 est from CCSDT this work<br />

r_0 1.2193 1.0457 from exptl B_0 this work<br />

r_s 1.21652 1.04653 from exptl B_0 1<br />

r_e 1.2076 1.0619 est from CCSD(T) 7



different (compare Table 5). Both of them deviate by about<br />

0.005 Å in the CC and by about 0.015 Å in the CH distance<br />

from the equilibrium geometries obtained in this work. Apparently,<br />

contrary to what is often claimed, the substitution approach leading<br />

to the r_s structure is not able to eliminate vibrational effects in<br />

the case of CCH, and thus the r_s and r_0 structures turn out to be<br />

very similar. Our observation supports the speculation in ref 1<br />

that the significantly too short CH distance is due to insufficient<br />

account of vibrational effects, and in particular of the low-frequency<br />

bending motion, a well-known artifact of the substitution<br />

approach to molecular structures.<br />

IV. Conclusions<br />

Equilibrium geometrical parameters for the ²Σ⁺ state of the<br />

ethynyl radical have been obtained using two approaches. The<br />

first, purely theoretical procedure, based on extensive CC,<br />

MR-AQCC, and FCI calculations, yields values of 1.208 Å for the<br />

CC distance and 1.061-1.063 Å for the CH distance, while<br />

the second approach based on the analysis of experimental<br />

rotational constants using computed vibrational corrections<br />

provides values of 1.207 and 1.069 Å. The observed differences<br />

between the two approaches of 0.001-0.002 Å for CC and<br />

0.006-0.008 Å for CH are somewhat larger than expected.<br />

Among possible causes for this discrepancy, we consider<br />

limitations in the perturbational treatment of the vibrational<br />

corrections to the rotational constants. The r_s and r_0 geometries<br />

for CCH are, because of a missing or insufficient treatment of<br />

these corrections, far away from the true equilibrium geometry.<br />

Acknowledgment. The authors acknowledge fruitful discussions<br />

with Professor J. F. Stanton (University of Texas,<br />

Austin). This work has been supported by the Hungarian<br />

Scientific Research Foundation (OTKA, Grants T032980 and<br />

M042110), the Deutsche Forschungsgemeinschaft, the Fonds<br />

der Chemischen Industrie, and the Danish Centre for Supercomputing<br />

(DCSC). This research is part of an effort by a task<br />

group of the International Union of Pure and Applied Chemistry<br />

to determine structures, vibrational frequencies, and thermodynamic<br />

functions of free radicals of importance in atmospheric<br />

chemistry.<br />

References and Notes<br />

(1) Bogey, M.; Demuynck, C.; Destombes, J. L. Mol. Phys. 1989, 66,<br />

955.<br />

(2) Hillier, I. H.; Kendrick, J.; Guest, M. F. Mol. Phys. 1975, 30, 1133.<br />

(3) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1977,<br />

64, 167.<br />

(4) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1979,<br />

74, 124.<br />

(5) Fogarasi, G.; Boggs, J. E.; Pulay, P. Mol. Phys. 1983, 50, 139.<br />

(6) Kraemer, W. P.; Roos, B. O.; Bunker, P. R.; Jensen, P. J. Mol.<br />

Spectrosc. 1986, 120, 236.<br />

(7) Peterson, K. A.; Dunning, T. H. J. Chem. Phys. 1997, 106, 4119.<br />

(8) Müller, H.; Klaus, T.; Winnewisser, G. Astron. Astrophys. 2000,<br />

357, L65.<br />

(9) Sastry, K. V. L. N.; Helminger, P.; Charo, A.; Herbst, E.; Delucia,<br />

F. C. Astrophys. J. 1981, 251, L119.<br />

(10) Gottlieb, C. A.; Gottlieb, E. W.; Thaddeus, P. Astrophys. J. 1983,<br />

264, 740.<br />

(11) Saykally, R. J.; Veseth, L.; Evenson, K. M. J. Chem. Phys. 1984,<br />

80, 2247.<br />

(12) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P. J. Mol. Spectrosc.<br />

1995, 173, 303.<br />

(13) Note that there is a trivial misprint in ref 1 for the former value.<br />

Bogey, M. Université Lille, France. Private communication, 1999.<br />

(14) Bogey, M.; Demuynck, C.; Destombes, J. L. Astron. Astrophys.<br />

1985, 144, L15.<br />

(15) Costain, C. C. J. Chem. Phys. 1958, 82, 5053.<br />

(16) See, for example: Mills, I. M. In Molecular Spectroscopy: Modern<br />

Research; Rao, K. N., Matthews, C. W., Eds.; Academic: New York, 1972;<br />

p 115.<br />

(17) Pulay, P.; Meyer, W.; Boggs, J. E. J. Chem. Phys. 1978, 68, 5077.<br />

(18) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P.; Horn, M.;<br />

Botschwina, P. J. Chem. Phys. 1995, 103, 7820.<br />

(19) Stanton, J. F.; Lopreore, C. L.; Gauss, J. J. Chem. Phys. 1998, 108,<br />

7190.<br />

(20) Pawlowski, F.; Jørgensen, P.; Olsen, J.; Hegelund, F.; Helgaker,<br />

T.; Gauss, J.; Bak, K. L.; Stanton, J. F. J. Chem. Phys. 2002, 116, 6482.<br />

(21) Purvis, G. D.; Bartlett, R. J. J. Chem. Phys. 1982, 76, 1910.<br />

(22) Raghavachari, K.; Trucks, G. W.; Head-Gordon, M.; Pople, J. A.<br />

Chem. Phys. Lett. 1989, 157, 479.<br />

(23) Dunning, T. H. J. Chem. Phys. 1989, 90, 1007.<br />

(24) Woon, D. E.; Dunning, T. H. J. Chem. Phys. 1993, 99, 1914.<br />

(25) Helgaker, T.; Gauss, J.; Jørgensen, P.; Olsen, J. J. Chem. Phys.<br />

1997, 106, 6430.<br />

(26) Bak, K. L.; Gauss, J.; Jørgensen, P.; Olsen, J.; Helgaker, T.; Stanton,<br />

J. F. J. Chem. Phys. 2001, 114, 6548.<br />

(27) See, for example: Byrd, E. F. C.; Sherrill, C. D.; Head-Gordon,<br />

M. J. Phys. Chem. A 2001, 105, 9736.<br />

(28) Szalay, P. G.; Vazquez, J.; Stanton, J. F. Material to be submitted<br />

for publication.<br />

(29) Szalay, P. G.; Bartlett, R. J. Chem. Phys. Lett. 1993, 214, 481.<br />

(30) Szalay, P. G.; Bartlett, R. J. J. Chem. Phys. 1995, 103, 3600.<br />

(31) Watts, J. D.; Gauss, J.; Bartlett, R. J. J. Chem. Phys. 1993, 98,<br />

8718.<br />

(32) Noga, J.; Bartlett, R. J. J. Chem. Phys. 1987, 86, 7041.<br />

(33) Scuseria, G. E.; Schaefer, H. F. Chem. Phys. Lett. 1988, 152, 382.<br />

(34) Watts, J. D.; Bartlett, R. J. J. Chem. Phys. 1990, 93, 6104.<br />

(35) Gauss, J.; Stanton, J. F.; Bartlett, R. J. J. Chem. Phys. 1991, 95,<br />

2623.<br />

(36) Watts, J. D.; Gauss, J.; Bartlett, R. J. Chem. Phys. Lett. 1992, 200,<br />

1.<br />

(37) Gauss, J.; Lauderdale, W. J.; Stanton, J. F.; Watts, J. D.; Bartlett,<br />

R. J. Chem. Phys. Lett. 1991, 182, 207.<br />

(38) Shepard, R.; Lischka, H.; Szalay, P. G.; Kovar, T.; Ernzerhof, M.<br />

J. Chem. Phys. 1992, 96, 2085.<br />

(39) Lischka, H.; Shepard, R.; Pitzer, R. M.; Shavitt, I.; Dallos, M.;<br />

Müller, T.; Szalay, P. G.; Seth, M.; Kedziora, G.; Yabushita, S.; Zhang,<br />

Z. Phys. Chem. Chem. Phys. 2001, 3, 664.<br />

(40) Gauss, J.; Stanton, J. F. Chem. Phys. Lett. 1997, 276, 70.<br />

(41) Szalay, P. G.; Gauss, J.; Stanton, J. F. Theor. Chem. Acc. 1998,<br />

100, 5.<br />

(42) Stanton, J. F.; Gauss, J. Int. Rev. Phys. Chem. 2000, 19, 61.<br />

(43) Kállay, M.; Gauss, J. J. Chem. Phys., in press.<br />

(44) Stanton, J. F.; Gauss, J.; Watts, J. D.; Lauderdale, W. J.; Bartlett,<br />

R. J. Int. J. Quantum Chem. Symp. 1992, 26, 879.<br />

(45) Lischka, H.; Shepard, R.; Shavitt, I.; Brown, F. B.; Pitzer, R. M.;<br />

Ahlrichs, R.; Böhm, H.-J.; Chang, A. H. H.; Comeau, D. C.; Gdanitz, R.;<br />

Dachsel, H.; Dallos, M.; Erhard, C.; Ernzerhof, M.; Gawboy, G.; Höchtl,<br />

P.; Irle, S.; Kedziora, G.; Kovar, T.; Müller, T.; Parasuk, V.; Pepper, M.;<br />

Scharf, P.; Schiffer, H.; Schindler, M.; Schüler, M.; Stahlberg, E.; Szalay,<br />

P. G.; Zhao, J.-G. COLUMBUS, An ab Initio Electronic Structure Program,<br />

release 5.8, 2001.<br />

(46) Olsen, J. LUCIA, a Full CI, Restricted Active Space Program;<br />

Aarhus University: Denmark, with contributions from H. Larsen.<br />

(47) Kállay, M.; Surján, P. R. J. Chem. Phys. 2000, 113, 1359.<br />

(48) Kállay, M.; Surján, P. R. J. Chem. Phys. 2001, 115, 2945.<br />

(49) Kállay, M.; Gauss, J.; Szalay, P. G. J. Chem. Phys. 2003, 119, 2991.<br />

(50) It should be mentioned here that our MR-AQCC results are in<br />

excellent agreement with previous MR-CI calculations by Peterson and<br />

Dunning.<sup>7</sup> Their best MR-CI values, 1.2116 and 1.0643 Å (with a Davidson<br />

correction, a full-valence active space, a pV5Z basis for carbon, and a pVQZ basis<br />

for hydrogen), are of comparable quality to our MR-AQCC/pV5Z(fc) values of<br />

1.2123 and 1.0632 Å.<br />

(51) Note that the residuals in the least-squares fit were in all cases<br />

smaller than 1.5 MHz.