PhD Thesis 

Optimization of Densities in 

Hartree-Fock and Density-functional Theory 

Atomic Orbital Based Response Theory 

and 

Benchmarking for Radicals 

Lea Thøgersen 

Department of Chemistry 

University of Aarhus 

2005

"Experiments are the only means of knowledge at our disposal. 

The rest is poetry, imagination." 

Max Planck

Contents 

Preface
List of Publications
Part 1 Improving Self-consistent Field Convergence
  1.1 Introduction
  1.2 The Self-consistent Field Method
  1.3 A Survey of Methods for Improving SCF Convergence
    1.3.1 Energy Minimization
    1.3.2 Damping and Extrapolation
    1.3.3 Level Shifting
  1.4 Development of SCF Optimization Algorithms
    1.4.1 Dynamically Level Shifted Roothaan-Hall
      1.4.1.1 RH Step with Control of Density Change
      1.4.1.2 The Trust Region RH Level Shift
      1.4.1.3 DIIS and Dynamically Level Shifted RH
      1.4.1.4 Line Search TRRH
      1.4.1.5 Optimal Level Shift without MO Information
      1.4.1.6 The Trace Purification Scheme
    1.4.2 Density Subspace Minimization
      1.4.2.1 The Trust Region DSM Parameterization
      1.4.2.2 The Trust Region DSM Energy Function
      1.4.2.3 The Trust Region DSM Minimization
      1.4.2.4 Line Search TRDSM
      1.4.2.5 The Missing Term
    1.4.3 Energy Minimization Exploiting the Density Subspace
      1.4.3.1 The Augmented RH Energy Model
      1.4.3.2 The Augmented RH Optimization
      1.4.3.3 Applications
  1.5 The Quality of the Energy Models for HF and DFT
    1.5.1 The Quality of the TRRH Energy Model
    1.5.2 The Quality of the TRDSM Energy Model
  1.6 Convergence for Problems with Several Stationary Points
    1.6.1 Walking Away from Unstable Stationary Points
      1.6.1.1 Theory
      1.6.1.2 Examples
  1.7 Scaling
    1.7.1 Scaling of TRRH
    1.7.2 Scaling of TRDSM
  1.8 Applications
    1.8.1 Calculations on Small Molecules
    1.8.2 Calculations on Metal Complexes
  1.9 Conclusion
Part 2 Atomic Orbital Based Response Theory
  2.1 Introduction
  2.2 AO Based Response Equations in Second Quantization
    2.2.1 The Parameterization
    2.2.2 The Linear Response Function
    2.2.3 The Time Development of the Reference State
    2.2.4 The First-order Equation
    2.2.5 Pairing
  2.3 Solving the Response Equations
    2.3.1 Preconditioning
    2.3.2 Projections
  2.4 The Excited State Gradient
    2.4.1 Construction of the Lagrangian
    2.4.2 The Lagrange Multipliers
    2.4.3 The Geometrical Gradient
    2.4.4 The First-order Excited State Properties
  2.5 Test Calculations
  2.6 Conclusion
Part 3 Benchmarking for Radicals
  3.1 Introduction
  3.2 Computational Methods
  3.3 Numerical Results
    3.3.1 Convergence of CC and CI Hierarchies
    3.3.2 The Potential Curve for CN
    3.3.3 Spectroscopic Constants and Atomization Energy for CN
    3.3.4 The Vertical Electron Affinity of CN
    3.3.5 The Equilibrium Geometry of CCH
  3.4 Conclusion
Summary
Dansk Resumé
Appendix A
Appendix B
Acknowledgements
References

Preface 

The present PhD thesis is the outcome of four years of PhD studies at the Faculty of Science, 

University of Aarhus, Denmark. 

The thesis is divided into three distinct parts which can be read independently. Part 1 deals with the 

optimization of the one-electron density in Hartree-Fock and density functional theory, and Part 2

deals with atomic orbital based response theory for Hartree-Fock and density functional theory. Part

2 thus naturally follows after Part 1. In Part 3 benchmark results from FCI calculations on the 

radicals CN and CCH are given. 

The work presented in Part 1 has resulted in papers I – III as listed in the following List of

Publications, and the work presented in Part 3 has resulted in papers V – VI. The work presented in

Part 2 was initiated in the fall of 2004 and will result in paper IV. The development of improved

optimization algorithms for self-consistent field calculations is the subject on which I have spent

most of my time, and Part 1 therefore makes up the larger part of this thesis.

The work has been carried out under the supervision of and in collaboration with Dr. Jeppe Olsen 

and Professor Poul Jørgensen at the University of Aarhus. Some work was carried out during visits 

at The Royal Institute of Technology in Stockholm, Sweden, the University of Trieste, Italy and the 

University of Oslo, Norway. The following people have also contributed to the work presented in 

this thesis (see List of Publications): Paweł Sałek (The Royal Institute of Technology in 

Stockholm), Sonia Coriani (University of Trieste), Trygve Helgaker (University of Oslo), Stinne 

Høst (University of Aarhus), Danny Yeager (Texas A&M University), Andreas Köhn (University of 

Aarhus), Jürgen Gauss (University of Mainz), Péter Szalay (Eötvös Loránd University) and Mihály 

Kállay (University of Mainz). 

The outline of the thesis is as follows: Part 1 is based on the published papers I – II and the 

unpublished paper III, but can be read independently of the papers. Certain discussions in the papers 

I - II are left out of the thesis and only referred to, as they might as well be read in the papers. Other 

discussions not published in the papers are presented in this thesis, including the latest 

developments of the algorithms. Part 2 is simply paper IV in preparation. Part 3 is based on the 

published papers V – VI and is basically a short version of paper V combined with selected results 

from paper VI. This part can also be read independently of the papers.

List of Publications 

This thesis includes the following papers. Papers I, II, V and VI have already been published and

are attached to this thesis, whereas III and IV are in preparation.

Part 1 

I. The Trust-region Self-consistent Field Method: Towards a Black Box optimization in Hartree-Fock and Kohn-Sham Theories,

L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker, 

J. Chem. Phys. 121, 16 (2004) 

II. The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory, 

L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker, 

J. Chem. Phys. 123, 074103 (2005) 

III. Augmented Roothaan-Hall for converging Densities in Hartree-Fock and Density-functional 

Theory, 

S. Høst, L. Thøgersen, P. Jørgensen and J. Olsen 

Part 2 

IV. Atomic Orbital Based Response Theory, 

L. Thøgersen, P. Jørgensen, J. Olsen and S. Coriani 

Part 3 

V. A Coupled Cluster and Full Configuration Interaction Study of CN and CN⁻,

L. Thøgersen and J. Olsen, 

Chem. Phys. Lett. 393, 36 (2004) 

VI. Equilibrium Geometry of the Ethynyl (CCH) Radical, 

P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss, 

J. Phys. Chem. A 108, 3030 (2004). 

Part 1 

Improving Self-consistent Field Convergence 

1.1 Introduction 

The Hartree-Fock (HF) self-consistent field (SCF) method has been around in an orbital formulation 

since 1951, when it was introduced by Roothaan 1 and Hall 2 , but today it is as significant as ever.

Even though numerous higher correlated methods with superior accuracy have been developed 

since then, most of them still use the Hartree-Fock wave function as the reference function, and are 

thus still dependent on a functioning Hartree-Fock optimization. When Kohn and Sham 3 recognized 

in 1965 that the Roothaan-Hall SCF scheme had a lot to offer the density optimization in density 

functional theory (DFT), the DFT methods entered the chemical scene. Now it was in theory also 

possible to obtain results at the exact level from SCF calculations; if only the correct functional 

could be found. The developments in computer hardware and linear scaling SCF algorithms over 

the last decade have made it possible to carry out ab initio quantum chemical calculations on biomolecules 

with hundreds of amino acids and on large molecules relevant for nano-science. 

Quantum chemical calculations are thus evolving to become a widespread tool for use in several 

scientific branches. It is therefore important that the algorithms work as black-boxes, such that the 

user outside quantum chemistry does not have to be concerned with the details of the calculations. 

Since no scientific results, whether from the higher correlated calculations or from the large-scale

calculations, can be achieved if the SCF optimization does not converge, it is necessary to take an

interest in developing a sound, stable optimization scheme that can handle the complexity in the 

problems of the future. 

This part of my thesis is a contribution to the quest for a black-box SCF optimization algorithm with 

optimal convergence properties. In Section 1.2, the basic Hartree-Fock/Kohn-Sham theory and 

notation of this part of the thesis is stated, and in Section 1.3 the efforts through the years to 

improve the Roothaan-Hall SCF scheme are reviewed. Our contributions to the development of 

stable and physically sound SCF optimization schemes are presented in Section 1.4, and in Section

1.5 we study the quality of the schemes when applied for HF and DFT. Optimization of problems 

with several stationary points is discussed in Section 1.6, in Section 1.7 the scaling of the algorithms 

is accounted for, and Section 1.8 contains some convergence examples for HF and DFT calculations 

using the algorithms presented in Section 1.4. Finally, Section 1.9 contains concluding remarks

reviewing the results of this part of the thesis.

1.2 The Self-consistent Field Method 

In the following we consider a closed-shell system with N/2 electron pairs. The basic theory of the 

Hartree-Fock (HF) and the Kohn-Sham (KS) density optimizations will be described 

simultaneously, and the differences will be noted as they appear. Since we are interested in 

extending the algorithms presented to large scale calculations, a formulation without reference to 

the delocalized molecular orbitals (MOs) is essential, and thus the focus will be on the density in the 

atomic orbital (AO) basis rather than the MOs themselves. All through the thesis, SCF will be used 

as a general term for HF and KS-DFT methods since they have the SCF optimization scheme in 

common. The orbital index convention used in this thesis is i, j, k, l for occupied MOs, a, b, c, d for 

virtual MOs, p, q for MOs in general, and Greek letters µ, ν, ρ, σ for AOs. 

For closed-shell restricted Hartree-Fock or DFT, the electronic energy is given by 

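$$ E_{\mathrm{SCF}} = 2\,\mathrm{Tr}\,\mathbf{hD} + \mathrm{Tr}\,\mathbf{D}\mathbf{G}(\mathbf{D}) + h_{\mathrm{nuc}} + E_{\mathrm{XC}}(\mathbf{D}), \tag{1.1} $$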

where h is the one-electron Hamiltonian matrix in the AO basis, $h_{\mathrm{nuc}}$ is the nuclear-nuclear repulsion

contribution, and D is the (scaled) one-electron density matrix in the AO basis, $\mathbf{D} = \tfrac{1}{2}\mathbf{D}_{\mathrm{AO}}$, which

satisfies the symmetry, trace, and idempotency conditions,

$$ \mathbf{D}^{T} = \mathbf{D}, \qquad \mathrm{Tr}\,\mathbf{DS} = \frac{N}{2}, \qquad \mathbf{DSD} = \mathbf{D}, \tag{1.2} $$

of a valid one-electron density matrix. S is the AO overlap matrix. The elements of G(D) are given 

by 

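$$ G_{\mu\nu}(\mathbf{D}) = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{1.3} $$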

where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term in Eq. (1.3) represents the Coulomb

contribution, and the second term is the contribution from exact exchange, with γ = 1 in Hartree-Fock

theory, γ = 0 in pure DFT, and γ ≠ 0 in hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(\mathbf{D})$

in Eq. (1.1) is a nonlinear and non-quadratic functional of the electronic density. This term is only 

present in the energy expression for the DFT level of theory - the Hartree-Fock energy is expressed 

only by the first three terms of Eq. (1.1). The form of $E_{\mathrm{XC}}$ depends on the DFT functional chosen for

the calculation. 

The first derivative of the electronic energy with respect to the density is found as 

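$$ \mathbf{E}^{(1)}_{\mathrm{SCF}}(\mathbf{D}) = \frac{\partial E_{\mathrm{SCF}}(\mathbf{D})}{\partial \mathbf{D}} = 2\mathbf{F}(\mathbf{D}), \tag{1.4} $$

where

$$ \mathbf{F}(\mathbf{D}) = \mathbf{h} + \mathbf{G}(\mathbf{D}) + \tfrac{1}{2}\mathbf{E}^{(1)}_{\mathrm{XC}}(\mathbf{D}) \tag{1.5} $$

is the Kohn-Sham matrix in DFT and, if the last term is excluded, the Fock matrix in Hartree-Fock theory. From now on F(D) is simply referred to as the Fock matrix. $\mathbf{E}^{(1)}_{\mathrm{XC}}(\mathbf{D})$ is the first derivative of the term $E_{\mathrm{XC}}$ expanded in the density.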
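As a small numerical illustration of Eqs. (1.1), (1.3), (1.4) and (1.5), the following Python sketch builds G(D), the energy and the Fock matrix for the Hartree-Fock case (γ = 1 by default, no exchange-correlation term) and compares a finite-difference directional derivative of the energy with Tr[2F(D)Δ]. The model integrals, the helper names and the use of NumPy are assumptions made for this illustration only; they are not taken from the papers or from any production code.

```python
import numpy as np

def two_electron_matrix(g, d, gamma=1.0):
    """G(D) of Eq. (1.3): Coulomb term minus gamma times exact exchange."""
    coulomb = np.einsum("mnrs,rs->mn", g, d)    # sum_rs g[m,n,r,s] D[r,s]
    exchange = np.einsum("msrn,rs->mn", g, d)   # sum_rs g[m,s,r,n] D[r,s]
    return 2.0 * coulomb - gamma * exchange

def scf_energy(h, g, d, h_nuc=0.0, gamma=1.0):
    """Eq. (1.1) without the E_XC term (an assumption: Hartree-Fock-like case)."""
    return 2.0 * np.trace(h @ d) + np.trace(d @ two_electron_matrix(g, d, gamma)) + h_nuc

def fock_matrix(h, g, d, gamma=1.0):
    """Eq. (1.5) without the E_XC term."""
    return h + two_electron_matrix(g, d, gamma)

def toy_integrals(n=4, n_aux=3, seed=0):
    """Hypothetical model integrals with the index symmetry of real AO integrals."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(n, n))
    h = 0.5 * (h + h.T)
    factors = rng.normal(size=(n_aux, n, n))
    factors = 0.5 * (factors + factors.transpose(0, 2, 1))
    g = np.einsum("pmn,prs->mnrs", factors, factors)
    return h, g

def random_symmetric(n, seed):
    rng = np.random.default_rng(seed)
    m = rng.normal(size=(n, n))
    return 0.5 * (m + m.T)

h, g = toy_integrals()
d = random_symmetric(4, seed=1)        # trial matrix (not required to be idempotent here)
delta = random_symmetric(4, seed=2)    # symmetric direction of change

step = 1e-3
numerical = (scf_energy(h, g, d + step * delta)
             - scf_energy(h, g, d - step * delta)) / (2.0 * step)
analytic = np.trace(2.0 * fock_matrix(h, g, d) @ delta)   # Eq. (1.4) contracted with delta
```

Because the energy without the exchange-correlation term is quadratic in D, the central difference reproduces the analytic directional derivative up to rounding, so `numerical` and `analytic` should agree closely.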

The Fock matrix is by design an effective one-electron Hamiltonian which is itself dependent on the 

eigenfunctions. Optimizing the electronic energy is thus a nonlinear problem and an iterative 

scheme must be applied. In 1951 Roothaan and Hall suggested an iterative procedure 1,2 in which a 

set of molecular orbitals (MOs) is constructed in each step through a diagonalization of the current

Fock matrix, which in the AO formulation is written as 

$$ \mathbf{FC} = \mathbf{SC}\boldsymbol{\varepsilon}, \tag{1.6} $$

where S is the AO overlap matrix, ε is a diagonal matrix containing the orbital energies, and the 

eigenvectors C contain the MO coefficients. The MOs, $\phi_{p}$, are linear combinations of a finite set of

one-electron basis functions, $\chi_{\mu}$, with $C_{\mu p}$ as expansion coefficients

$$ \phi_{p} = \sum_{\mu} \chi_{\mu} C_{\mu p}. \tag{1.7} $$

For the closed-shell case the MOs can be divided into an occupied ($\phi_{\mathrm{occ}}$) and a virtual ($\phi_{\mathrm{virt}}$) part,

where the occupied MOs each contain two electrons and the virtual orbitals are empty. If the aufbau 

ordering rule is applied, the occupied MOs are chosen as those with the lowest eigenvalues. 

A new trial density D can then be constructed from the occupied orbitals as 

$$ \mathbf{D} = \mathbf{C}_{\mathrm{occ}} \mathbf{C}_{\mathrm{occ}}^{T}. \tag{1.8} $$

From this density a new Fock matrix can be evaluated from Eq. (1.5) and diagonalizing it according 

to Eq. (1.6) establishes the iterative procedure. The iterative cycle stops when self-consistency is 

obtained, that is, when the new density, energy or molecular orbitals do not change within some 

convergence threshold compared to the previous ones. 
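The iterative cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the calculations in this thesis: the model integrals are hypothetical and deliberately weakly interacting so that the plain fixed-point iteration converges, the generalized eigenvalue problem of Eq. (1.6) is solved by symmetric orthogonalization with S^(-1/2), the start guess is obtained by diagonalizing the one-electron Hamiltonian, and convergence is judged on the change in the density only.

```python
import numpy as np

def fock_matrix(h, g, d):
    """Hartree-Fock Fock matrix: Eq. (1.5) with gamma = 1 and no E_XC term."""
    coulomb = np.einsum("mnrs,rs->mn", g, d)
    exchange = np.einsum("msrn,rs->mn", g, d)
    return h + 2.0 * coulomb - exchange

def roothaan_hall_step(f, s, n_occ):
    """Solve FC = SC eps (Eq. 1.6) and build D from the occupied MOs (Eq. 1.8)."""
    s_eigval, s_eigvec = np.linalg.eigh(s)
    x = s_eigvec @ np.diag(s_eigval ** -0.5) @ s_eigvec.T   # S^(-1/2)
    _, c_ortho = np.linalg.eigh(x @ f @ x)                  # eigenvalues in ascending order
    c = x @ c_ortho                                         # MO coefficients, C^T S C = 1
    c_occ = c[:, :n_occ]                                    # aufbau: lowest n_occ orbitals
    return c_occ @ c_occ.T

def scf(h, g, s, n_occ, threshold=1e-8, max_iter=50):
    """Plain Roothaan-Hall iterations; returns (density, iterations used, converged flag)."""
    d = roothaan_hall_step(h, s, n_occ)                     # start guess from h alone
    for iteration in range(1, max_iter + 1):
        d_new = roothaan_hall_step(fock_matrix(h, g, d), s, n_occ)
        change = np.linalg.norm(d_new - d)
        d = d_new
        if change < threshold:
            return d, iteration, True
    return d, max_iter, False

# Hypothetical four-function model with two occupied orbitals (N/2 = 2).
n = 4
rng = np.random.default_rng(0)
h = np.diag([-2.0, -1.0, 0.5, 1.0])
s = np.eye(n) + 0.1 * (np.ones((n, n)) - np.eye(n))
factors = 0.05 * rng.normal(size=(3, n, n))
factors = 0.5 * (factors + factors.transpose(0, 2, 1))
g = np.einsum("pmn,prs->mnrs", factors, factors)

density, iterations, converged = scf(h, g, s, n_occ=2)
```

Every density produced by `roothaan_hall_step` satisfies the conditions of Eq. (1.2) by construction; whether the sequence of densities converges depends on the problem, which is the subject of the rest of this part.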

In an iterative scheme it is necessary to have a start guess. For the SCF case it should be a one-electron

density which fulfils Eq. (1.2), created directly or from a start guess of the molecular

orbitals as in Eq. (1.8). Different approaches are used; a simple and easily applicable possibility is 

to obtain the starting orbitals by diagonalization of the one-electron Hamiltonian (H1-core). This is 

the start guess most widely used in this thesis since it is always available. Another popular 

possibility is to create a semi-empirical start guess where the orbitals resulting from a semi-empirical

calculation (e.g. Hückel) on the molecule are fitted to the current basis.

The steps of the self-consistent field (SCF) scheme are summarized from the density point of view in Fig. 1.1: From a density matrix start guess a Fock matrix is constructed. From this Fock matrix a new density matrix can be found, and so an iteration procedure is established which continues until self-consistency. The step creating a new density from a Fock matrix will be referred to as the Roothaan-Hall (RH) step throughout this thesis, regardless of whether it is a diagonalization of the Fock matrix or some alternative scheme.

[Fig. 1.1 Flow diagram of the SCF scheme: D_0 → F(D_n) → D_{n+1}; if D_{n+1} ≈ D_n the result is D_conv, otherwise n = n+1 and the cycle is repeated.]

The purpose of an SCF optimization is typically to find the global minimum. Since the HF/KS equations are nonlinear, several stationary points might exist, and depending on the start guess and the optimization procedure, the converged result can represent a local minimum, the global minimum, or even a saddle point. By evaluating the lowest Hessian eigenvalue it can be determined whether the stationary point is a minimum or a saddle point, but no simple test can reveal whether a minimum is global or not. The use of the term “convergence” in this thesis will simply refer to the iterative development from the start guess to a self-consistent density with a gradient below the convergence threshold. The issues connected with problems where several stationary points can be found are discussed in Section 1.6.

Since Roothaan and Hall suggested the iterative diagonalization procedure as a means to solve the 

Hartree-Fock equations and Kohn and Sham suggested using the same scheme for optimizing the 

electron density for density functional theory 3 , the SCF methods have been used extensively in 

quantum chemistry. Unfortunately, it turned out that the simple fixed-point scheme sketched in Fig.

1.1 converges only in simple cases. Already around 1960 it was recognized that the method 

sometimes fails to converge and that divergent behavior in some cases is intrinsic 4,5 . 

1.3 A Survey of Methods for Improving SCF Convergence 

Numerous suggestions have been made to improve upon the convergence of Roothaan and Hall’s 

original scheme or to replace it with an alternative scheme. The suggestions can be crudely divided 

into three different categories: energy minimization, damping/extrapolation, and level shifting.

Furthermore, the different suggestions in these categories have been combined in various ways. The

latter two categories are modifications to the Roothaan-Hall scheme, whereas energy minimization

is a means of avoiding the iterative diagonalization scheme and instead using some optimization

scheme on an energy function.

To my knowledge these categories embrace all convergence improvements suggested over the 

years, except for the method of fractionally occupying orbitals around the Fermi level 6 which does 

not fit in any of the categories. As mentioned, the start guess has a great impact on the optimization.

A poor start guess with the wrong electron configuration can require many iterations to change to a

better electron configuration, and in some cases the proper electron configuration is never

found and the calculation diverges. In the methods using fractional occupations, a number of

orbitals around the Fermi level are allowed to have non-integral occupation. The non-integral 

occupations are determined from the Fermi-Dirac distribution which is a function of the 

temperature. The non-integral occupations are updated in each iteration, and corrected such that the 

total number of electrons is constant. During the optimization either the temperature is decreased to

T = 0 K or the number of orbitals allowed to have non-integral occupation is decreased, so that only

integer occupations remain at the end of the optimization. It is thus possible to optimize the electron

configuration in an effective manner in the beginning of the SCF optimization, and when the proper 

configuration has been found, the rest of the optimization has a better chance of convergence since 

the start guess in a way has been improved. 
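The occupation update described above can be illustrated with a short Python sketch. Several details are assumptions made only for this illustration and may differ from the scheme of Ref. 6: the Fermi-Dirac function is applied to all orbitals rather than to a window around the Fermi level, the chemical potential `mu` is found by bisection so that the occupations sum to the number of electron pairs, occupations are given per spatial orbital on a scale from 0 to 1, and the temperature enters as `kt` in the same energy units as the orbital energies.

```python
import math

def fermi_dirac(energy, mu, kt):
    """Occupation (between 0 and 1) of one spatial orbital at chemical potential mu."""
    x = (energy - mu) / kt
    if x > 50.0:     # occupation below 1e-21; also avoids overflow in exp
        return 0.0
    if x < -50.0:    # occupation indistinguishable from 1 in double precision
        return 1.0
    return 1.0 / (1.0 + math.exp(x))

def fractional_occupations(orbital_energies, n_pairs, kt, n_bisect=200):
    """Bisect on mu until the occupations sum to n_pairs (electron count is conserved)."""
    lo = min(orbital_energies) - 60.0 * kt   # every occupation is 0 at this mu
    hi = max(orbital_energies) + 60.0 * kt   # every occupation is 1 at this mu
    for _ in range(n_bisect):
        mu = 0.5 * (lo + hi)
        total = sum(fermi_dirac(e, mu, kt) for e in orbital_energies)
        if total < n_pairs:
            lo = mu
        else:
            hi = mu
    mu = 0.5 * (lo + hi)
    return [fermi_dirac(e, mu, kt) for e in orbital_energies]

# Hypothetical orbital energies with a near-degeneracy at the Fermi level.
energies = [-1.0, -0.5, -0.45, 0.3]
warm = fractional_occupations(energies, n_pairs=2, kt=0.05)   # smeared occupations
cold = fractional_occupations(energies, n_pairs=2, kt=1e-4)   # nearly integer occupations
```

At the higher temperature the two nearly degenerate orbitals share the second electron pair, while at the low temperature the occupations approach the integer aufbau values, mirroring the cooling strategy described in the text.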

In the following, the focus will be on the efforts to improve the convergence behavior of the SCF 

scheme through optimization algorithm development in the three categories listed above. Other

efforts bear as much significance and should also be acknowledged. In particular, the

generalizations of many well-functioning schemes to the unrestricted level of theory,

which has its own challenges, should be mentioned. The quest for an improved start guess is also

important. It is obvious that with an improved start guess, less is demanded from the optimization

method and thus some convergence problems inherent in the methods could be avoided. In the last

decade the effort in SCF scheme development has to a large extent been devoted to reducing the scaling

of the methods to allow calculations on larger molecules. Scaling is a very important subject and it

should not be ignored. Section 1.7 will therefore discuss the scaling of the algorithms presented in 

this thesis. Despite the importance of these three SCF-related subjects, the rest of this section will focus

almost solely on efforts to improve convergence through optimization algorithm development.

1.3.1 Energy Minimization 

One of the problems in the simple Roothaan-Hall procedure is the lack of guarantees for energy 

decrease in the iterative steps. This was pointed out by McWeeny, and he thus introduced a steepest 

descent procedure 7,8 as an energy minimization alternative to Roothaan and Hall’s repeated 

diagonalizations. Steepest descent optimizations have the benefit that a decrease in energy can be 

guaranteed for each step. McWeeny’s scheme suffers, however, from a slow convergence rate 5 as 

often seen for steepest descent methods. Fletcher and Reeves proposed the conjugate gradient

optimization method 9 instead, which is often more efficient than steepest descent and, for a quadratic

function with exact line searches, is guaranteed to converge in at most a number of steps equal to the dimension of the problem.
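The finite-termination property of the conjugate gradient method holds for a quadratic function with exact line searches, and it can be checked on a small example. The Python sketch below is only an illustration under that assumption: it minimizes f(x) = ½xᵀAx − bᵀx for a hypothetical symmetric positive definite 3 × 3 matrix, using Fletcher-Reeves directions, and records the gradient norm after each step. It is not an SCF implementation.

```python
def mat_vec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def dot(x, y):
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def conjugate_gradient(a, b, x0):
    """Minimize 0.5 x^T A x - b^T x; returns the final x and the gradient norm per step."""
    x = list(x0)
    grad = [ax_i - b_i for ax_i, b_i in zip(mat_vec(a, x), b)]   # gradient A x - b
    direction = [-g_i for g_i in grad]
    norms = []
    for _ in range(len(b)):                   # at most n steps for an n-dimensional problem
        if dot(grad, grad) == 0.0:            # already at the minimum
            break
        a_d = mat_vec(a, direction)
        alpha = -dot(grad, direction) / dot(direction, a_d)      # exact line search
        x = [x_i + alpha * d_i for x_i, d_i in zip(x, direction)]
        new_grad = [g_i + alpha * ad_i for g_i, ad_i in zip(grad, a_d)]
        beta = dot(new_grad, new_grad) / dot(grad, grad)         # Fletcher-Reeves formula
        direction = [-g_i + beta * d_i for g_i, d_i in zip(new_grad, direction)]
        grad = new_grad
        norms.append(dot(grad, grad) ** 0.5)
    return x, norms

a = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, norms = conjugate_gradient(a, b, [0.0, 0.0, 0.0])
```

After the first step the gradient is still far from zero, whereas after three steps it vanishes to rounding accuracy. For the non-quadratic SCF energy no such finite-step guarantee applies.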

A decade later Hillier and Saunders suggested an improvement to the McWeeny scheme called

energy-weighted steepest descent 10 , in which the coordinates in the orbital space are energy-weighted.

In 1976 this work was generalized by Seeger and Pople. They realized that another 

problem in the simple Roothaan procedure is the possibility for discontinuous changes in the 

orbitals which do not necessarily lower the energy. To ensure energy descent it is necessary to be 

able to follow such changes continuously, and methods like the steepest descent have the possibility 

to do so. Their procedure proceeds in small steps, where the new occupied trial orbitals are selected 

based on a criterion of overlap with the previous set. This technique ensures stability and avoids 

switching of orbital occupation. The step is found by a univariate search 11 in the energy, on a path 

that passes through the point corresponding to the next iteration step of the classical procedure. 

Their scheme can therefore also be seen as a polynomial interpolation along a path joining 

successive SCF cycles. Half a decade later, Camp and King followed the same strategy of a 

univariate cubic fit technique 12 , but with a different parameterization. Stanton also suggested a

similar approach 13 , but whereas the Seeger-Pople approach requires the evaluation of the Fock 

matrix at interior points on the interpolative path, Stanton’s scheme uses a cubic interpolation, 

where only the end point properties are needed, making it a less expensive method. 

Another way of improving the convergence properties is to evaluate the gradient and Hessian of the 

electronic energy analytically with respect to some variational parameter, and then optimize the 

energy through Newton-Raphson steps resulting in a quadratically convergent 14 scheme, at least in 

the region close to the optimized state where a second order approximation is reasonable. These 

methods are computationally very expensive since a four-index transformation is required to obtain

the Hessian information. In 1981 Bacskay proposed a quadratically convergent SCF (QC-SCF)

method 15 which avoids the four-index transformation while requiring four or five micro iterations

per step (in non-problematic cases), each of which is about as expensive computationally as 

building a Fock matrix. His method was inspired by single excitation configuration interaction

(SX-CI) and multi-configurational SCF (MC-SCF). A possible divergence of the scheme can be 

overcome by moderating the orbital update step by the augmented Hessian method 16 or trust radius 

techniques 17 . Even though it is still quite expensive, the method is also used today for cases with 

convergence problems, since a decrease in energy can be ensured step by step and it has quadratic 

convergence properties near the optimized state. 

Around 1995, interest in linear scaling SCF methods took off, since the development in

computer hardware had made calculations on large molecules possible. With newly developed

algorithms the evaluation of the Fock matrix, with the formal scaling of $N^4$ arising from the four-index

integrals, could now routinely be decreased to a near-linear scaling. The diagonalization with

an $N^3$ scaling in standard Roothaan-Hall was now the bottleneck. Inspiration was found in

tight-binding theory 18-20 , where a number of linear scaling approaches had been suggested earlier 21 . To

obtain linear scaling of the RH step it is necessary to avoid the diagonalization and to ensure 

sparsity in the matrices. This is a problem since the convenient canonical MO basis is inherently 

delocalized. Some of the well-known schemes were reformulated in localized MOs 22 , while others

developed strict AO formulations 20,23-25 . Most of the suggested linear scaling methods did not arise 

so much to improve convergence as to improve the scaling, and will therefore not be discussed in 

further detail. 

Very recently Francisco, Martínez and Martínez introduced their globally convergent trust region 

methods for SCF 26 , where the standard fixed-point Roothaan-Hall step is replaced by a trust region 

optimization of a model energy function. This algorithm has very nice features since it can be 

proved to be globally convergent, and the step sizes are controlled dynamically through a trust 

region update scheme. The convergence rate seems rather erratic, though: sometimes perfect and

sometimes hopeless. Only small test examples have been published so far, so time will tell.

1.3.2 Damping and Extrapolation 

In his SCF study of atoms, Hartree noted convergence difficulties and suggested a so-called 

damping scheme 27 as a modification to the iterative procedure. Instead of using the newly 

constructed density $\mathbf{D}_{n+1}$, which corresponds to a full step, a linear combination of the new density

matrix with the previous one is constructed 

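$$ \mathbf{D}^{\mathrm{damp}}_{n+1} = \mathbf{D}_{n} + \lambda\left(\mathbf{D}_{n+1} - \mathbf{D}_{n}\right) = \lambda\mathbf{D}_{n+1} + (1 - \lambda)\mathbf{D}_{n}, \tag{1.9} $$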

where λ, the damping factor, is a scalar chosen between zero and one. The iterative sequence is

then continued with $\mathbf{D}^{\mathrm{damp}}$ as the new density. Hartree found that this scheme could force

convergence in problematic cases. 
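
As a minimal sketch of the damping step in Eq. (1.9), the following Python fragment mixes two density matrices. The function name `damp_density` and the toy 2x2 matrices are assumptions made for illustration only; they are not taken from any SCF program.

```python
import numpy as np

def damp_density(d_old, d_new, lam):
    """Return the damped density of Eq. (1.9): D_old + lam * (D_new - D_old)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("Hartree damping expects 0 <= lam <= 1")
    return d_old + lam * (d_new - d_old)

# Toy 2x2 "densities" in an orthonormal basis (illustrative numbers only)
d_old = np.array([[1.0, 0.0], [0.0, 0.0]])
d_new = np.array([[0.0, 0.0], [0.0, 1.0]])
d_damp = damp_density(d_old, d_new, 0.25)   # a quarter of the full step
```

With λ = 1 the full step is recovered, and with λ = 0 the density is left unchanged; the trace is preserved for any λ because both input densities have the same trace.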

To get an idea of the effect of the damping factor, we consider a block-diagonal Fock matrix in the 

MO basis 

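$$F^{\mathrm{MO}} = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix} , \tag{1.10}$$

where ‘o’ denotes occupied, ‘v’ virtual and $[\varepsilon_o]_{ij} = \delta_{ij} \varepsilon_i$ and $[\varepsilon_v]_{ab} = \delta_{ab} \varepsilon_a$. The change in electronic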

energy from the first order variation of the occupied orbitals through first-order perturbation theory 

is then given as 

$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-F_{ai}^{2}}{\varepsilon_a - \varepsilon_i} . \tag{1.11}$$

If this first order term is negative and sufficiently small such that the higher order contributions are 

insignificant, then a decrease in the electronic energy is seen. If the MOs obey the aufbau principle, 

then all $\varepsilon_i < \varepsilon_a$ and it is clear that the term is negative as desired. The Hartree damping of Eq. (1.9)

roughly corresponds to multiplying the numerator of Eq. (1.11) by the factor λ, which is positive 

and less than one 


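$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-\lambda F_{ai}^{2}}{\varepsilon_a - \varepsilon_i} , \tag{1.12}$$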

thus giving the opportunity to obtain a negative first order change of arbitrarily small magnitude, 

making the higher order terms insignificant. Though this would seem promising, the aufbau 

principle is seldom obeyed all through the optimization. 

If λ could be freely chosen, the damping technique would lead to an extrapolation scheme in the 

densities. Since SCF generates an iterative sequence where each step only depends upon the 

preceding, it was natural to apply the mathematical extrapolation methods (e.g. the Aitken 

extrapolation 28 procedures) on SCF to improve in particular the convergence rate close to the 

minimum. When the individual MO expansion coefficients are chosen as the extrapolated 

parameters, as Winter and Dunning Jr. 29 suggested, unphysical results may be obtained, though they

can be corrected at the end of the calculation. Nielsen used instead the density matrix as the 

extrapolated parameter 30 and an eigenvalue extrapolation instead of the Aitken method. This led to a 

scheme more similar to Hartree damping, but with λ found within the eigenvalue extrapolation 

scheme. 



Different approaches have been taken to dynamically find the damping factor λ. Zerner and 

Hehenberger 31 found it based on an extrapolation of the Mulliken gross population. Karlström 32 

expressed the electronic energy in the damped density $E(D^{\mathrm{damp}})$ and used the first derivative with

respect to λ, to choose in each iteration the λ that minimized the electronic energy. 

None of these schemes were very successful in solving the convergence problems. They all had some

particular problematic cases they could handle better than the predecessors, but in general they did 

not catch on. Pulay then suggested in the early 1980s to use the norm of a linear combination of 

error vectors $e_i$ from the individual iterations, where the vanishing of the error vector is a necessary

and sufficient condition for SCF convergence. The norm is then optimized with respect to the

coefficients $c_i$

$$e(c) = \sum_{i=1}^{n} c_i e_i , \tag{1.13}$$

where n is the number of previous iterations, and the coefficients are restricted to add up to 1

$$\sum_{i=1}^{n} c_i = 1 . \tag{1.14}$$

The resulting coefficients are used to construct a favorable linear combination of the previous Fock 

matrices 

$$\bar{F} = \sum_{i=1}^{n} c_i F_i , \tag{1.15}$$

which is diagonalized to obtain a new density, and so the iterative procedure is reestablished. This 

was the first density subspace minimization scheme that deliberately exploited the information 

obtained in the previous iterations and he named the approach DIIS 33 for “Direct Inversion in the 

Iterative Subspace”. For the special case of two matrices, the DIIS density corresponds to the 

damped density of Eq. (1.9), but with no restrictions on λ. A decade later the DIIS algorithm was a 

standard option in most ab initio programs and had effectively solved a number of the convergence 

problems. The orbital rotation gradient was typically used as the error vector for wave function 

optimizations, and Sellers pointed out 34 that the DIIS algorithm exploits the second-order 

information contained in a set of gradients to obtain quadratic convergence behavior. Some 

numerical problems were seen though, where numerical instabilities appeared because of linear 

dependencies in the space of error vectors. Sellers introduced the C2-DIIS method 34 , which is 

similar to DIIS except the restriction is on the squares of the coefficients 

$$\sum_{i=1}^{n} c_i^{2} = 1 , \tag{1.16}$$



with a renormalization at the end. This gives an eigenvalue problem to be solved instead of the set 

of linear equations in normal DIIS, and thus singularities are more easily handled. However, one of 

the examples (Pd$_2$ in the Hyla-Kryspin basis set 35 ) given in ref. 34 , where DIIS supposedly diverges,

converges for our plain DIIS implementation to $10^{-7}$ in the energy in 14 iterations.
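
The constrained minimization behind Eqs. (1.13) to (1.15) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used for the results in this thesis: the function names `diis_coefficients` and `average_fock` are hypothetical, the Lagrange multiplier is handled through a bordered linear system, and a least-squares solve is used so that nearly linearly dependent error vectors do not abort the calculation.

```python
import numpy as np

def diis_coefficients(error_vectors):
    """Minimize ||sum_i c_i e_i||^2 subject to sum_i c_i = 1 (Eqs. 1.13 and 1.14)."""
    e = [np.ravel(v) for v in error_vectors]
    n = len(e)
    # Bordered system: B c + l * 1 = 0 and 1^T c = 1, with B_ij = e_i . e_j
    a = np.zeros((n + 1, n + 1))
    for i in range(n):
        for j in range(n):
            a[i, j] = e[i] @ e[j]
    a[:n, n] = 1.0
    a[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    # lstsq tolerates a (nearly) singular B from linearly dependent error vectors
    solution = np.linalg.lstsq(a, rhs, rcond=None)[0]
    return solution[:n]

def average_fock(fock_matrices, coeffs):
    """Form the averaged Fock matrix of Eq. (1.15)."""
    return sum(c * f for c, f in zip(coeffs, fock_matrices))
```

For two error vectors of equal length pointing in opposite directions, the optimal coefficients are both 1/2 and the combined error vanishes.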

Even though DIIS is successful, examples of divergence with no relation to numerical instabilities 

have been encountered over the years. In the year 2000 Cancès and Le Bris presented a damping 

algorithm named the Optimal Damping Algorithm 36 (ODA) that ensures a decrease in energy at each

iteration and converges toward a solution to the HF equations. In ODA the damping factor λ is 

found based on the minimum of the Hartree-Fock energy for the damped density in Eq. (1.9) 

$$E_{\mathrm{HF}}(D_{n+1}^{\mathrm{damp}}, \lambda) = E_{\mathrm{HF}}(D_n) + 2\lambda \, \mathrm{Tr}\, F(D_n)(D_{n+1} - D_n) + \lambda^{2} \, \mathrm{Tr}\, (D_{n+1} - D_n) G(D_{n+1} - D_n) + h_{\mathrm{nuc}} , \tag{1.17}$$

much like Karlström did it in 1979. The damping factor is thus optimized in each iteration, hence 

the name of the algorithm. 
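
Because Eq. (1.17) is quadratic in λ, the optimal damping factor on the interval [0, 1] follows in closed form. The sketch below assumes that the two traces have already been evaluated and are passed in as scalars: `slope` stands for 2 Tr F(D_n)(D_{n+1} − D_n) and `curvature` for Tr (D_{n+1} − D_n) G(D_{n+1} − D_n). The function name and this interface are hypothetical.

```python
def optimal_damping(slope, curvature):
    """Minimize E(lam) = E0 + slope*lam + curvature*lam**2 on the interval [0, 1]."""
    if curvature > 0.0:
        lam = -slope / (2.0 * curvature)      # stationary point of the parabola
        return min(1.0, max(0.0, lam))        # clip to the allowed interval
    # Concave or linear model: the minimum sits at one of the end points
    return 1.0 if slope + curvature < 0.0 else 0.0
```

A negative slope with a positive curvature gives an interior minimum; when the unconstrained minimum lies beyond λ = 1, the full step is taken.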

Recently Kudin, Scuseria, and Cancès proposed a method in which the gradient-norm minimization 

in DIIS is replaced by a minimization of an approximation to the true energy function and they

named it the energy DIIS (EDIIS) method 37 . Where the ODA used the energy expression of Eq. 

(1.17) to find the optimal λ, EDIIS uses an approximation of the Hartree-Fock energy for the 

averaged density 

$$\bar{D} = \sum_{i=1}^{n} c_i D_i , \tag{1.18}$$

$$E^{\mathrm{EDIIS}}(\bar{D}, c) = \sum_{i=1}^{n} c_i E_{\mathrm{SCF}}(D_i) - \frac{1}{2} \sum_{i,j=1}^{n} c_i c_j \, \mathrm{Tr}\big( (F_i - F_j) \cdot (D_i - D_j) \big) , \tag{1.19}$$

where the sum of the coefficients $c_i$ is still restricted to 1. They combine the scheme with DIIS, such

that the EDIIS optimized coefficients are used to construct the averaged Fock matrix if all 

coefficients fall between 0 and 1. If not, the coefficients from the DIIS scheme are used instead. The 

EDIIS scheme introduces some Hessian information not found in DIIS and thus improves 

convergence in cases where the start guess has a Hessian structure far from the optimized one. For 

non-problematic cases and near the optimized state EDIIS has a slower convergence rate than DIIS, 

but it has been demonstrated that EDIIS can converge cases where DIIS diverges. 

Recently, we suggested another subspace minimization algorithm along the same line as EDIIS, but 

with a smaller idempotency error in the energy model and the same orbital rotation gradient in the 

subspace as the SCF energy (the EDIIS energy model actually has a different gradient). We named 

it TRDSM 38 for trust region density subspace minimization since a trust region optimization is 



carried out of the energy model in the subspace of previous densities. In the second paper on 

TRDSM 39 , a comparison with the EDIIS and DIIS models can be found stating explicitly that the 

EDIIS energy model does not have the correct gradient and is wrong for other reasons as well at the 

DFT level of theory. 

Many of the energy minimization techniques can be combined with a damping or extrapolation 

scheme to improve the convergence. Typically, DIIS has been the choice 24,40,41 , but TRDSM could 

be used just as well. 

1.3.3 Level Shifting 

In 1973 Saunders and Hillier introduced the level shift concept 42 . They suggested adding a positive 

scalar µ to the diagonal of the virtual-virtual block of the Fock matrix in the MO basis, Eq. (1.10), 

before diagonalizing 

$$\left( F^{\mathrm{MO}} + \mu (I - D^{\mathrm{MO}}) \right) C = C \varepsilon , \tag{1.20}$$

where I is the identity matrix and $D^{\mathrm{MO}}$ is the scaled one-electron density matrix in the MO basis

with 1 in the diagonal of the occupied-occupied block and zeros for the rest.

To compare level shifting with the damping scheme of Hartree 27 , consider the first order variation in 

the energy change as in Eq. (1.11); the level shift µ then corresponds to adding a positive constant to 

the denominator 


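$$\Delta E_{\mathrm{SCF}}^{(1)} = 4 \sum_{a}^{\mathrm{virtual}} \sum_{i}^{\mathrm{occupied}} \frac{-F_{ai}^{2}}{\varepsilon_a - \varepsilon_i + \mu} . \tag{1.21}$$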

The level shift thus has, as the damping factor, the possibility to decrease the magnitude of the term. 

The problems with respect to the aufbau principle mentioned in connection with the damping can be 

overcome with the level shift. The level shift can separate the occupied orbitals from the virtuals 

and thereby ensure a positive denominator and an overall decrease in energy. As the level shift is 

increased towards infinity, the obtained decrease in energy will correspond to that of the steepest 

descent method as explained in Section 1.4.1.4, and thus the convergence will be slow. This 

connection between a large gap between the occupied and the virtual orbitals (HOMO-LUMO gap) 

and slow convergence was exploited by Bhattacharyya in 1978 to accelerate convergence for cases 

with large HOMO-LUMO gaps. His “reverse level shift” technique 43 uses a negative level shift 

instead of a positive, thus decreasing the gap and accelerating the convergence. 
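
The different roles of the damping factor and the level shift in the first-order estimate can be made concrete with a small numerical sketch. The function below simply evaluates the double sum of Eq. (1.11), with λ in the numerator as in Eq. (1.12) and µ in the denominator as in Eq. (1.21); its name, argument layout and the numbers used are assumptions made for illustration.

```python
import numpy as np

def first_order_energy_change(f_vo, eps_occ, eps_virt, lam=1.0, mu=0.0):
    """Evaluate 4 * sum_ai (-lam * F_ai**2) / (eps_a - eps_i + mu)."""
    # gaps[a, i] = eps_a - eps_i + mu for virtual a and occupied i
    gaps = eps_virt[:, None] - eps_occ[None, :] + mu
    return 4.0 * np.sum(-lam * f_vo**2 / gaps)

f_vo = np.array([[0.1]])                       # one virtual, one occupied MO
aufbau = first_order_energy_change(f_vo, np.array([-0.5]), np.array([0.5]))
broken = first_order_energy_change(f_vo, np.array([0.5]), np.array([-0.5]))
shifted = first_order_energy_change(f_vo, np.array([0.5]), np.array([-0.5]), mu=3.0)
```

In this toy case the term is negative when the aufbau ordering holds, positive when it is violated, and negative again once the level shift exceeds the inverted gap. Damping alone (λ < 1) only scales the term and cannot change its sign.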

In 1977, Carbó, Hernández and Sanz claimed unconditional convergence for an SCF process with a 

properly used level shift 44 , and two decades later, Cancès and Le Bris 45 made a formal proof that for 



any initial guess $D_0$, there exists a level shift $\mu_0 > 0$ such that for level shift parameters $\mu > \mu_0$, the

energy decreases at each step and converges towards a stationary value. 

The level shift technique is still routinely used for cases where the DIIS scheme has problems. The 

level shifts are typically found on a trial and error basis. Recently, we advocated the use of a level 

shift to control the changes introduced in the Roothaan-Hall step 38 , and we suggested a way of 

optimizing the level shift at each iteration based on physical arguments and without guesswork. The 

algorithm is based on the trust region philosophy in which a model energy function is optimized, 

but restricted with respect to the step length. We thus named the algorithm trust region Roothaan- 

Hall (TRRH), even though it is not a true trust region optimization scheme like e.g. the energy 

minimization of Francisco, Martínez, and Martínez 26 or our TRDSM scheme 38 . 

Level shifting can be combined with a damping or extrapolation scheme. When the TRRH approach 

is combined with the subspace minimization method TRDSM it seems to outperform DIIS in 

stability and to have a better or similar convergence rate, as will be illustrated in the following 

sections. Combining level shifting with DIIS can occasionally be a benefit, but typically DIIS and 

level shifting do not work well together, and in Section 1.4.1.3 we will try to justify this.

1.4 Development of SCF Optimization Algorithms 

Fig. 1.2 Flow diagram of the SCF scheme including the density subspace minimization step: $D_0$ → $F(D_n)$ → $\bar{D} = \sum_{i=1}^{n} c_i D_i$, min f(c) → $\bar{F} = \sum_{i=1}^{n} c_i F(D_i)$ → $D_{n+1}$ → test $D_{n+1} \approx D_n$ (no: n = n+1 and return to $F(D_n)$; yes: $D_{\mathrm{conv}}$).

The SCF scheme as it typically looks today is sketched in Fig. 1.2. Compared to Fig. 1.1, a step is inserted, illustrating a density subspace minimization, where some function f is minimized with respect to the coefficients $c_i$ which expand the previous densities $D_i$. The function f could be the gradient norm as in DIIS or some energy model approximating the SCF energy in the subspace of the previous densities as in EDIIS and TRDSM. In the Roothaan-Hall step, the averaged Fock matrix $\bar{F}$ found from the subspace minimization is then used instead of the most recent Fock matrix $F(D_n)$ to find a new trial density $D_{n+1}$. In general, the averaged density matrix $\bar{D}$ is not idempotent and therefore does not represent a valid density matrix; moreover, since the Kohn-Sham matrix (unlike the Fock matrix) is nonlinear in the density matrix, the averaged Kohn-Sham matrix $\bar{F}$ is different from $F(\bar{D})$. For these reasons, the averaged Fock matrix $\bar{F}$ cannot be associated uniquely with a valid Fock matrix. Usually, this does not matter much since the subsequent diagonalization of the Fock matrix nevertheless produces a valid density matrix


according to Eq. (1.8). The complications arising from the use of the averaged Fock matrix are

disregarded in the following, noting that the errors introduced by this approach may easily be 

corrected for, if necessary. 
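
The statement that an averaged density is not idempotent, while a density rebuilt by diagonalization is, can be checked on a toy example. The sketch assumes an orthonormal basis (S = I), one occupied orbital, and an arbitrary symmetric matrix `f_avg` standing in for an averaged Fock matrix; the helper name `idempotency_error` is hypothetical.

```python
import numpy as np

def idempotency_error(d, s):
    """Frobenius norm of D S D - D, which vanishes for a valid (scaled) density matrix."""
    return np.linalg.norm(d @ s @ d - d)

s = np.eye(2)                                   # orthonormal basis assumed for simplicity
d1 = np.array([[1.0, 0.0], [0.0, 0.0]])         # two valid one-orbital densities
d2 = np.array([[0.0, 0.0], [0.0, 1.0]])
d_avg = 0.5 * d1 + 0.5 * d2                     # averaged density, cf. Eq. (1.18)

# Density rebuilt from the lowest eigenvector of a symmetric, Fock-like matrix
f_avg = np.array([[-1.0, 0.2], [0.2, 0.5]])
_, c = np.linalg.eigh(f_avg)
d_new = np.outer(c[:, 0], c[:, 0])
```

Here `idempotency_error(d_avg, s)` is clearly nonzero, whereas `idempotency_error(d_new, s)` vanishes to machine precision.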

The rest of this part of the thesis will focus on the work we have done over the last couple of years 

to improve SCF convergence. We have made developments in all of the three categories of the 

previous section. The density subspace minimization scheme TRDSM and the level shift scheme in 

TRRH, both briefly described in the previous section, make up a total scheme we have named 

TRSCF, where each SCF iteration contains a TRDSM and a TRRH step. The first subsection will 

go into further detail on TRRH and will thus be concerned with our modifications to the Roothaan-Hall step in Fig.

1.2. The second subsection will likewise go into further detail on TRDSM and will describe the

scheme we apply in the density subspace minimization step. In the third subsection, a recently developed energy minimization

procedure will be presented. The procedure merges these two steps, integrating a subspace

minimization in the optimization of a new trial density.

This section will primarily take the Hartree-Fock point of view, acknowledging that with small 

adjustments and the word Fock replaced by Kohn-Sham, it would describe the DFT situation as 

well. In Section 1.5 the differences appearing when the algorithms are applied to the HF and DFT 

cases, respectively, will be discussed. 

1.4.1 Dynamically Level Shifted Roothaan-Hall 

The problems inherent to the RH diagonalization method are the discontinuous changes in the 

density and the lack of guarantees for energy decrease. To overcome these problems, we introduced 

in 2004 a means to restrict the RH step to the trust region of the RH energy model, with the purpose 

of both controlling the changes in the density and ensuring an energy decrease. Since then, the same 

ideas have been put forward by Francisco et al. 26 as well, suggesting a trust region optimization of

a RH energy model. 

In this section, our trust region Roothaan-Hall scheme and related subjects are discussed. In 

particular, we present two different schemes for dynamic level shifting and an alternative to 

diagonalization. 

1.4.1.1 RH Step with Control of Density Change 

The solution of the traditional Roothaan–Hall eigenvalue problem Eq. (1.6) may be regarded as the 

minimization of the sum of the energies of the occupied MOs 8,46 

$$E^{\mathrm{RH}}(D) = 2 \sum_{i} \varepsilon_i = 2 \, \mathrm{Tr}\, F_0 D \tag{1.22}$$

subject to MO orthonormality constraints

$$C_{\mathrm{occ}}^{T} S C_{\mathrm{occ}} = I_N , \tag{1.23}$$

where $F_0$ is typically obtained as a weighted sum of the previous Fock matrices such as $\bar{F}$ in Eq.

(1.15). Since Eq. (1.22) represents a crude model of the true Hartree-Fock energy (with the same

first-order term, but different zero- and second-order terms), it has a rather small trust radius. A

global minimization of $E^{\mathrm{RH}}(D)$, as accomplished by the solution of the Roothaan–Hall eigenvalue

problem Eq. (1.6), may therefore easily lead to steps that are longer than the trust radius and hence

unreliable. To avoid such steps, we shall impose on the optimization of Eq. (1.22) the constraint that

the new density matrix D does not differ much from the old $D_0$, that is, the S-norm of the density

difference should be equal to a small number ∆

$$\| D - D_0 \|_S^{2} = \mathrm{Tr}\, (D - D_0) S (D - D_0) S = -2 \, \mathrm{Tr}\, D_0 S D S + N = \Delta , \tag{1.24}$$

where N is the number of electrons (see Eq. (1.2)) and the S-norm used throughout this thesis is

defined as

$$\| A \|_S^{2} = \mathrm{Tr}\, A S A S \tag{1.25}$$

for symmetric A. The optimization of Eq. (1.22) subject to the constraints Eq. (1.23) and Eq. (1.24) 

may be carried out by introducing the Lagrangian 

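$$L = 2 \, \mathrm{Tr}\, F_0 D - 2\mu \left( \mathrm{Tr}\, D S D_0 S - \tfrac{1}{2} (N - \Delta) \right) - 2 \, \mathrm{Tr}\, \eta \left( C_{\mathrm{occ}}^{T} S C_{\mathrm{occ}} - I_N \right) , \tag{1.26}$$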

where µ is the undetermined multiplier associated with the constraint Eq. (1.24), whereas the 

symmetric matrix η contains the multipliers associated with the MO orthonormality constraints. 

Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to 

zero, we arrive at the level-shifted Roothaan–Hall equations: 

$$(F_0 - \mu S D_0 S) \, C_{\mathrm{occ}}(\mu) = S C_{\mathrm{occ}}(\mu) \, \lambda(\mu) . \tag{1.27}$$

Since the density matrix, Eq. (1.8), is invariant to unitary transformations among the occupied MOs

in $C_{\mathrm{occ}}(\mu)$, we may transform this eigenvalue problem to the canonical basis:

$$(F_0 - \mu S D_0 S) \, C_{\mathrm{occ}}(\mu) = S C_{\mathrm{occ}}(\mu) \, \varepsilon(\mu) , \tag{1.28}$$

where the diagonal matrix ε(µ) contains the orbital energies. Note that, since $D_0 S$ projects onto the

part of $C_{\mathrm{occ}}$ that is occupied in $D_0$ (see ref. 46 ), the level-shift parameter µ shifts only the energies of

the occupied MOs. Therefore, the role of µ is to modify the difference between the energies of the 

occupied and virtual MOs - in particular, the HOMO–LUMO gap. 

Clearly, the success of the trust region Roothaan–Hall (TRRH) method will depend on our ability to 

make a judicious choice of the level-shift parameter µ in Eq. (1.28). In our standard TRRH 

implementation, we determine µ by requiring that D(µ) does not differ much from $D_0$ in the sense of


Eq. (1.24), thereby ensuring a continuous and controlled development of the density matrix from the 

initial guess to the converged one. 
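
A level-shifted Roothaan-Hall step of the form of Eq. (1.28) can be sketched with NumPy alone. The sketch assumes a symmetric (Löwdin) orthogonalization to turn the generalized eigenvalue problem into a standard one, and it builds D(µ) from the `n_occ` lowest eigenvectors; the function names and the 2x2 toy matrices are hypothetical and only serve to show how D(µ) approaches D_0 as µ grows.

```python
import numpy as np

def level_shifted_rh_step(f0, d0, s, n_occ, mu):
    """Solve (F0 - mu S D0 S) C = S C eps and return D(mu) built from the n_occ lowest MOs."""
    w, u = np.linalg.eigh(s)
    x = u @ np.diag(w**-0.5) @ u.T            # S^(-1/2), symmetric orthogonalization
    shifted = f0 - mu * (s @ d0 @ s)          # occupied space of D0 is shifted down by mu
    eps, c_prime = np.linalg.eigh(x @ shifted @ x)
    c = x @ c_prime                           # back-transform so that C^T S C = I
    c_occ = c[:, :n_occ]
    return c_occ @ c_occ.T, eps

def s_norm_squared(a, s):
    """||A||_S^2 = Tr(A S A S), Eq. (1.25)."""
    return np.trace(a @ s @ a @ s)

# Toy problem: two basis functions, one occupied MO (illustrative numbers only)
s = np.array([[1.0, 0.2], [0.2, 1.0]])
f0 = np.array([[0.0, 0.5], [0.5, -1.0]])
d0 = np.array([[1.0, 0.0], [0.0, 0.0]])       # satisfies D0 S D0 = D0
for mu in (0.0, 1.0, 5.0):
    d_mu, _ = level_shifted_rh_step(f0, d0, s, 1, mu)
    print(mu, s_norm_squared(d_mu - d0, s))
```

The printed S-norm of the density change decreases as µ increases, which is the handle used to keep the step inside the trust region.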

1.4.1.2 The Trust Region RH Level Shift 

The constraint on the change in the AO density Eq. (1.24) refers to a change which may arise not 

only from small changes in many MOs but also from large changes in a few MOs or even in a 

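single MO. To obtain a high level of control, we shall require that the changes in the individual

MOs are all small. Expanding the MOs $\phi_i^{\mathrm{new}}$, obtained by diagonalization of Eq. (1.28), in the old

MOs, we obtain

$$\phi_i^{\mathrm{new}} = \sum_{j}^{\mathrm{occ}} \langle \phi_j^{\mathrm{old}} | \phi_i^{\mathrm{new}} \rangle \, \phi_j^{\mathrm{old}} + \sum_{a}^{\mathrm{virt}} \langle \phi_a^{\mathrm{old}} | \phi_i^{\mathrm{new}} \rangle \, \phi_a^{\mathrm{old}} , \tag{1.29}$$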

where the first summation is over the occupied MOs and the second over the virtual MOs. The

squared norm of the projection of $\phi_i^{\mathrm{new}}$ onto the MO space associated with $D_0$ is therefore

$$a_i^{\mathrm{orb}} = \sum_{j}^{\mathrm{occ}} \langle \phi_j^{\mathrm{old}} | \phi_i^{\mathrm{new}} \rangle^{2} . \tag{1.30}$$

To ensure small individual MO changes in each iteration (to within a unitary transformation of the 

occupied MOs), we shall therefore require 

$$a_{\min}^{\mathrm{orb}} = \min_{i} a_i^{\mathrm{orb}} \geq A_{\min}^{\mathrm{orb}} , \tag{1.31}$$

where $A_{\min}^{\mathrm{orb}}$ is close to one (0.98 or 0.975 in practice). This way of controlling the changes in the

density was also used by Seeger and Pople in their steepest descent method 11 . 
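
The overlap measure of Eqs. (1.30) and (1.31) is straightforward to evaluate from the occupied MO coefficients. The following sketch assumes that the old and new occupied coefficient matrices are available as columns in the AO basis; the function name `min_orbital_overlap` and the test orbitals are hypothetical.

```python
import numpy as np

def min_orbital_overlap(c_occ_old, c_occ_new, s):
    """Return a_min = min_i sum_j <phi_j^old | phi_i^new>^2 over the occupied MOs."""
    overlap = c_occ_old.T @ s @ c_occ_new     # rows: old MOs j, columns: new MOs i
    a_orb = np.sum(overlap**2, axis=0)        # Eq. (1.30) for every new MO i
    return a_orb.min()                        # Eq. (1.31)

# One occupied MO rotated by a small angle into a virtual MO (orthonormal basis)
theta = 0.1
s = np.eye(3)
c_old = np.array([[1.0], [0.0], [0.0]])
c_new = np.array([[np.cos(theta)], [np.sin(theta)], [0.0]])
a_min = min_orbital_overlap(c_old, c_new, s)
step_accepted = a_min >= 0.975                # threshold quoted for HF in the text
```

For this rotation a_min equals cos²θ, so the step would be accepted; a larger rotation would call for a larger level shift.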

To illustrate how this scheme is used in practice, detailed information from the TRRH step in iteration 7 of a HF/6-31G and an LDA/6-31G calculation on the zinc complex depicted in Fig. 1.3 is displayed in Fig. 1.4 and Fig. 1.5, respectively. The upper panels illustrate how a search for $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$ determines the optimal level shift µ for the TRRH step. The TRRH energy model is more accurate for HF than for DFT (see Section 1.5.1), and consequently larger changes can be handled in the TRRH step for HF than for DFT. $A_{\min}^{\mathrm{orb}}$ is thus set to 0.975 for HF and 0.98 for DFT. In the lower panels it is seen that the chosen level shifts avoid an increase in the energy which would have been the case if the Roothaan-Hall step was not level shifted (µ = 0). Notice also that an even lower energy would have been obtained by reducing the level shift, but then the restrictions on the overlap should be loosened, and this would result in energy increase in other iterations. In short, the identification of µ from the overlap requirement $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$ appears to be a good and secure way to control the step sizes in the optimization.

Fig. 1.3 Zn$^{2+}$ in complex with ethylenediamine-N,N'-disuccinic acid (EDDS).

Fig. 1.4 HF/6-31G, iteration 7. (A) The overlap $a_{\min}^{\mathrm{orb}}$ (with $A_{\min}^{\mathrm{orb}}$ = 0.975 marked) and (B) the changes in the HF energy $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (in a.u.) as a function of the level shift µ.

Fig. 1.5 LDA/6-31G, iteration 7. (A) The overlap $a_{\min}^{\mathrm{orb}}$ (with $A_{\min}^{\mathrm{orb}}$ = 0.98 marked) and (B) the changes in the LDA energy $\Delta E_{\mathrm{LDA}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (in a.u.) as a function of the level shift µ.

1.4.1.3 DIIS and Dynamically Level Shifted RH 

For accelerating the SCF convergence, DIIS is a simple and in general very successful scheme. We 

would expect to get an even better performance and improve the stability of the scheme if DIIS was 

combined with a dynamically level shifted RH step like TRRH instead of the standard RH with no 

control of the step. To investigate how a combination of DIIS and TRRH performs, we carried out a 

number of DIIS-TRRH optimizations. A typical example is seen in Fig. 1.7 and an extraordinary 

example is seen in Fig. 1.8. 

Fig. 1.6 Cd$^{2+}$ complexed with an imidazole ring.

Fig. 1.7 LDA/STO-3G calculations with a H1-core start guess on the cadmium complex in Fig. 1.6. The error in the energy (in $E_h$, logarithmic scale) is plotted against the iteration number for DIIS, DIIS-TRRH and TRSCF.

Fig. 1.8 LDA/STO-3G calculations with a Hückel start guess on the zinc complex in Fig. 1.3. Curves are shown for TRSCF, DIIS-TRRH and DIIS against the iteration number.

Somewhat surprisingly the calculations rarely converge with the DIIS-TRRH method. To 

understand this behavior, we note that, in the global region, the TRRH method typically produces 

gradients that do not change much, even though large changes may occur in the energy. In such 

cases, the DIIS method may stall, not being able to identify a good combination of density matrices. 

This behavior is illustrated in Table 1-1, where the gradient norm and Kohn–Sham energy of the 

first six iterations of the cadmium complex calculations in Fig. 1.7 are listed. 

Table 1-1. The gradient norm ||g|| = ||4(SDF-FDS)|| in the first six iterations of the cadmium complex calculations of Fig. 1.7.

       DIIS               DIIS-TRRH          TRSCF
It.    E_KS      ||g||    E_KS      ||g||    E_KS      ||g||
1      -5597.0    7.8     -5597.0    7.8     -5597.0    7.8
2      -5502.3   14.9     -5598.4    7.2     -5598.3    7.1
3      -5602.1    9.7     -5600.3    8.5     -5603.7    9.3
4      -5628.5    2.1     -5599.9    7.7     -5611.1    9.1
5      -5627.4    3.5     -5599.9    7.8     -5616.8    7.7
6      -5628.8    0.8     -5600.2    8.1     -5622.7    7.5
       conv               no conv            conv

The TRSCF and DIIS-TRRH gradients stay almost the same during these iterations, stalling the 

DIIS-TRRH optimization but not the TRSCF optimization, whose energy decreases in each 

iteration. In the pure DIIS optimization, by contrast, the gradient changes significantly from 

iteration to iteration; at the same time, the energy decreases at each iteration except the second and 

fifth, where also the gradient norms increase. Eventually, DIIS enters the local region with its rapid 

rate of convergence although we note a sudden, large increase in the energy in iterations 10 and 11. 

However, these changes are accompanied with large increases in the gradient norm, allowing DIIS 

to recover safely. 



In the example Fig. 1.8 standard DIIS diverges. TRSCF converges, but a minimum level shift of 0.1 

is used all through the calculation. When DIIS is combined with TRRH in this case, also using a 

minimum level shift of 0.1, it converges as well as TRSCF. Table 1-2 contains the gradient norm 

and Kohn-Sham energy of the first six iterations of the calculations in Fig. 1.8. 

Table 1-2. The gradient norm ||g|| = ||4(SDF-FDS)|| in the first six iterations of the zinc complex calculations of Fig. 1.8.

       DIIS                DIIS-TRRH           TRSCF
It.    E_KS       ||g||    E_KS       ||g||    E_KS       ||g||
1      -2826.95   11.6     -2826.95   11.6     -2826.95   11.6
2      -2745.49   24.0     -2830.11    3.3     -2830.06    3.4
3      -2809.38   13.6     -2831.04    1.6     -2831.11    1.5
4      -2819.16    9.7     -2831.44    0.8     -2831.42    1.1
5      -2776.74   15.4     -2831.34    1.5     -2831.40    1.5
6      -2826.55    7.0     -2831.41    1.5     -2831.47    0.9
       no conv             conv                conv

In this case the gradient norms for the TRSCF calculation change significantly and a decrease in 

gradient relates directly to a decrease in the energy, where in the first example there was no direct

connection between the gradient norm and the energy. The DIIS-TRRH calculation follows the 

same gradient behavior as TRSCF, just as in the first example, and they both converge. The DIIS 

gradient norm changes, but does not decrease as in the first example. There is still the connection 

between small gradients and low energies though, so why DIIS cannot find the proper directions in 

this case is not evident. 

In our experience DIIS should not be used in connection with a dynamic level shift scheme like 

TRRH, since for all but the simplest cases DIIS-TRRH diverged if DIIS converged. We 

encountered, however, the example in Fig. 1.8 where DIIS does not converge and DIIS-TRRH does, 

but it was the exception. 

1.4.1.4 Line Search TRRH 

In view of the relative crudeness of the $E^{\mathrm{RH}}(D)$ model, a more robust approach for choosing the

level shift µ than the one presented in Section 1.4.1.2 consists of performing a line search along the 

RH 

path defined by µ to obtain the minimum of the energy E SCF ( D ( µ )). Strictly speaking, this 

optimization is not a line search but rather a univariate search. A univariate search has previously 

been used by Seeger and Pople 11 to stabilize convergence of the RH procedure. 

For µ → ∞ Eq. (1.28) becomes equivalent to solving the eigenvalue equation 

$$S D_0 S C_{\mathrm{occ}}^{0} = S C_{\mathrm{occ}}^{0} \eta , \tag{1.32}$$

where η has eigenvalues 1 for the set of orbitals that are occupied in $D_0$ and eigenvalues 0 for the

set of virtual orbitals. Eq. (1.32) thus effectively divides the molecular orbitals into a set that is

occupied and a set that is unoccupied. If $D_0$ is idempotent, it can be reconstructed from the occupied

set of eigenvectors $C_{\mathrm{occ}}^{0}$. If $D_0$ is not idempotent, a purification of $D_0$ is obtained

$$D_0^{\mathrm{idem}} = C_{\mathrm{occ}}^{0} \left( C_{\mathrm{occ}}^{0} \right)^{T} . \tag{1.33}$$

Since $F_0$ is the gradient of $E(D_0)$, the step from Eq. (1.28) corresponding to a large µ is in the

steepest descent direction, and will therefore give a decrease in the Hartree-Fock energy compared

to the energy at $D_0$. Thus a µ exists for which the energy decreases and a line search can then find

the µ leading to the largest decrease in the energy. Using the same example as in Section 1.4.1.2,

Fig. 1.9 and Fig. 1.10 illustrate how the optimal µ is chosen for the line search TRRH (TRRH-LS)

algorithm. A simple search in the energy change for the RH step is carried out, where the energy

change is found as

$$\Delta E_{\mathrm{SCF}}^{\mathrm{RH}}(\mu) = E_{\mathrm{SCF}}\big( D(\mu) \big) - E_{\mathrm{SCF}}\big( D_0^{\mathrm{idem}} \big) , \tag{1.34}$$

and the µ leading to the largest decrease in energy is chosen as marked on the figures. 

Fig. 1.9 HF/6-31G, iteration 7. The changes in the HF energy $\Delta E_{\mathrm{HF}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (in a.u.) as a function of the level shift µ.

Fig. 1.10 LDA/6-31G, iteration 7. The changes in the LDA energy $\Delta E_{\mathrm{LDA}}^{\mathrm{RH}}$ and in the RH energy model $\Delta E^{\mathrm{RH}}$ (in a.u.) as a function of the level shift µ.

The TRRH-LS algorithm thus ensures an energy decrease in the RH step, but is of course much 

more expensive than the standard method, requiring the repeated construction of the Fock matrix for 

a single RH step. However, the first derivative $dE_{\mathrm{SCF}}/d\mu$ can be evaluated from the Fock matrix,

and a cubic spline interpolation can thus be made from only two points on the $\Delta E_{\mathrm{SCF}}^{\mathrm{RH}}$ curve.
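
The interpolation from two points with known values and slopes amounts to a cubic Hermite fit, whose minimum can be located analytically. The sketch below is a generic version of that idea and not the routine used in our program: the function name `cubic_minimum` is hypothetical, `x` plays the role of µ, and `f` and `g` stand for the energy change and its derivative at the two end points.

```python
import numpy as np

def cubic_minimum(x0, f0, g0, x1, f1, g1):
    """Minimizer on [x0, x1] of the cubic matching values f and slopes g at both ends."""
    h = x1 - x0
    # Cubic in t = (x - x0) / h: p(t) = f0 + c1 t + c2 t^2 + c3 t^3
    c1 = g0 * h
    c2 = 3.0 * (f1 - f0) - (2.0 * g0 + g1) * h
    c3 = 2.0 * (f0 - f1) + (g0 + g1) * h
    poly = lambda t: f0 + c1 * t + c2 * t**2 + c3 * t**3
    candidates = [0.0, 1.0]                   # end points are always candidates
    for root in np.roots([3.0 * c3, 2.0 * c2, c1]):   # stationary points of p
        if abs(root.imag) < 1e-12 and 0.0 < root.real < 1.0:
            candidates.append(root.real)
    t_best = min(candidates, key=poly)
    return x0 + h * t_best, poly(t_best)
```

As a check, feeding in the values and slopes of f(x) = (x − 1)² at x = 0 and x = 3 reproduces the parabola exactly and returns its minimum at x = 1.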

1.4.1.5 Optimal Level Shift without MO Information 

As seen from Eq. (1.29) the individual MOs are used to find a suitable level shift in the TRRH 

scheme. We are very much aware that this is the most important point to improve on in our scheme. To

obtain this MO information, the cubically scaling diagonalization of the Fock matrix is necessary, 



and furthermore the MO coefficient matrices C are inherently non-sparse. Several linear or near-linear

scaling alternatives to diagonalization have been suggested in the literature 18-20 . These

methods could be reformulated with a dynamical level shift scheme like ours if the scheme could do 

without the MO information, but it is not an easy task to find a good dynamic level shift scheme 

with a high level of control without the knowledge of the developments in the individual MOs. The 

search used to find the level shift in TRRH-LS is directly applicable since it is not dependent on the 

MO information; the problem is only the number of Fock evaluations. The Fock evaluation is still 

expensive even though algorithms which make the evaluation of the Fock matrix cheaper are 

continually developed. 

This section describes a very recently developed approach to find the optimal level shift in the 

TRRH step without the use of individual MOs or knowledge of the HOMO-LUMO gap. So far it 

has proven to be the most successful level shift scheme we have studied. The scheme is built on the

assumption that the TRRH step is taken in connection with a TRDSM step (or some other density 

subspace minimization method). In this case it can be exploited that TRDSM is a very good energy 

model (see Section 1.4.2.2) and can be trusted with the responsibility to find the best direction as 

long as not too much new information is introduced to the density subspace in each step. 

A new density, found by diagonalization of a level shifted Fock matrix or by some alternative, can

be split in a part $D^{\parallel}$ that can be described in the previous densities and a part $D^{\perp}$ with new

information orthogonal to the existing subspace

$$D(\mu) = D^{\parallel} + D^{\perp} . \tag{1.35}$$

$D^{\parallel}$ can be expanded in the previous densities as

$$D^{\parallel} = \sum_{i=1}^{n} \omega_i D_i , \tag{1.36}$$

where n is the number of previously stored densities $D_i$ and the expansion coefficients $\omega_i$ are

dependent on µ and determined in a least-squares manner

$$\omega_i(\mu) = \sum_{j=1}^{n} \left[ M^{-1} \right]_{ij} \mathrm{Tr}\, D_j S D(\mu) S , \qquad M_{ij} = \mathrm{Tr}\, D_i S D_j S . \tag{1.37}$$

It is obvious that when µ → ∞ then $D^{\perp} \to 0$ since the new density then approaches the initial

density $D_0$, see Eq. (1.32) and (1.33), which belongs to the set of previous densities. Thus, there is a

connection between $D^{\perp}$ and µ which we can exploit. If the ratio $d^{\mathrm{orth}}$ of the square norm $\| D^{\perp} \|_S^{2}$

relative to $\| D \|_S^{2}$ is small, only small changes to the density subspace are introduced;

$$d^{\mathrm{orth}} = \frac{\| D^{\perp} \|_S^{2}}{\| D \|_S^{2}} = \frac{\mathrm{Tr}\, D^{\perp} S D^{\perp} S}{\mathrm{Tr}\, D S D S} < \delta , \tag{1.38}$$

where δ is some small number and $D^{\perp}$ can be found as $D^{\perp} = D - D^{\parallel}$. To illustrate how this is used

in a dynamic level shift scheme, the examples from the previous sections are again seen in Fig. 1.11 

and Fig. 1.12. 
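The ratio in Eq. (1.38) can be sketched numerically as below. This is only an illustrative NumPy sketch, not the DALTON implementation: the function names `s_inner` and `d_orth` are hypothetical, dense matrices are assumed, and the stored densities are assumed linearly independent so that M can be inverted.

```python
import numpy as np

def s_inner(A, B, S):
    """Inner product in the S metric, Tr(A S B S)."""
    return float(np.trace(A @ S @ B @ S))

def d_orth(D_new, D_prev, S):
    """Return (d_orth, omega) for a trial density D_new.

    D_prev is the list of previously stored densities D_i.
    """
    # Metric M_ij = Tr(D_i S D_j S) and right-hand side Tr(D_j S D S), Eq. (1.37)
    M = np.array([[s_inner(Di, Dj, S) for Dj in D_prev] for Di in D_prev])
    b = np.array([s_inner(Dj, D_new, S) for Dj in D_prev])
    omega = np.linalg.solve(M, b)  # least-squares expansion coefficients
    # Part of D_new described by the subspace, Eq. (1.36), and the remainder
    D_par = sum(w * Di for w, Di in zip(omega, D_prev))
    D_perp = D_new - D_par
    # Ratio of squared S norms, Eq. (1.38)
    ratio = s_inner(D_perp, D_perp, S) / s_inner(D_new, D_new, S)
    return ratio, omega
```

In a level shift search one would evaluate `d_orth` for the density obtained at each trial µ and accept the smallest µ for which the ratio stays below δ; how the trial values of µ are generated is left open here.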

In the rest of the thesis the level shift scheme described in Section 1.4.1.2 will be referred to as the 

C-shift scheme since it involves the eigenvectors C from the diagonalization of the Fock matrix, 

and the level shift scheme described in this section will be referred to as the d orth -shift scheme. If 

nothing is mentioned about the level shift scheme, the C-shift is implied. 

Fig. 1.11 HF/6-31G iteration 7. (A) The ratio d orth and (B) the changes in the HF energy ∆E_HF^RH and in the RH energy model ∆E^RH as a function of the level shift µ. [Plot: panel A shows d orth (0.0 to 1.0) with δ = 0.08 marked; panel B shows ∆E / a.u. (-40.0 to 40.0); µ runs from 0 to 10.]

Fig. 1.12 LDA/6-31G iteration 7. (A) The ratio d orth and (B) the changes in the LDA energy ∆E_LDA^RH and in the RH energy model ∆E^RH as a function of the level shift µ. [Plot: panel A shows d orth (0.0 to 1.0) with δ = 0.03 marked; panel B shows ∆E / a.u. (-40.0 to 40.0); µ runs from 0 to 10.]

The upper panels now display the search made in d orth , and it is clearly seen that d orth → 0 for µ → ∞ 

as expected, and increases for µ → 0. As for the C-shift scheme we can allow larger changes in the 

HF method than in DFT, and thus δ is set to 0.08 for HF and 0.03 for DFT. The lower panels show

that this level shift avoids an increase in the energy just as the C-shift scheme does, but the level

shift chosen here is closer to the optimal line search level shift, and thus leads to a larger decrease in 

the energy than was the case for the C-shift scheme. 



In the C-shift scheme seen in Eq. (1.31) the changes introduced are controlled compared to the 

previous density, whereas in the d orth -shift scheme the changes are controlled compared to the 

subspace of all the previous densities. This scheme is thus less restrictive than the C-shift scheme, 

but it seems that the C-shift scheme is too restrictive, ignoring the stability gained from the 

subspace information. To compare the overall effect of the two level shift schemes on the SCF 

convergence, calculations are given in Fig. 1.13 and Fig. 1.14, for HF and LDA, respectively. The 

HF calculations are on CrC with bond distance 2.00Å in the STO-3G basis and the LDA 

calculations are on the zinc complex seen in Fig. 1.3 in the 6-31G basis, both cases for which DIIS 

diverges. The starting orbitals have been obtained by diagonalization of the one-electron 

Hamiltonian (H1-core start guess). 


Fig. 1.13 SCF convergence for HF/STO-3G calculations on CrC. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-08 against the iteration number (0 to 16); curves for TRSCF/d orth -shift, TRSCF/C-shift and DIIS.]

Fig. 1.14 SCF convergence for LDA/6-31G calculations on the zinc complex in Fig. 1.3. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-08 against the iteration number (0 to 32); curves for TRSCF/d orth -shift, TRSCF/C-shift and DIIS.]

The only difference in the “TRSCF/d orth -shift” and the “TRSCF/C-shift” optimizations is the way 

the level shift is found in the TRRH step. Since DIIS diverges, the examples display the stability of 

the TRSCF algorithm, and the ability of the two level shifting schemes to handle problematic cases. 

In all examples studied so far, both problematic and simple, the d orth -shift has proven as good as or 

better than the C-shift. The cost of the level shift search process is similar in the two schemes; the 

matrix M in Eq. (1.37) is updated in each iteration as a part of TRDSM and is then reused for the 

d orth -shift scheme in TRRH. 

In Table 1-3 the SCF energy change in each iteration is divided in the part of the change obtained

from the RH and DSM step, respectively, and it is seen how the RH step is now allowed to accept 

larger changes in the density, but still in a controlled manner, thus leading to larger decreases in the 

energy and improved convergence. 



Table 1-3. The SCF energy change for each RH and DSM step in the TRSCF calculations in Fig. 1.13.

           C-shift                    d orth -shift
It.    ∆E_HF^RH    ∆E_HF^DSM     ∆E_HF^RH    ∆E_HF^DSM
 2     -1.1768      0.0000       -1.3976      0.0000
 3     -1.8964     -3.8998       -4.1319     -4.5865
 4     -1.6764     -1.9603       -1.8021     -1.0448
 5     -0.3655     -1.7543       -0.2103     -0.1200
 6     -0.1881     -0.1624       -0.0111     -0.0463
 7     -0.0932     -0.1505       -0.0036     -0.0037
 8      0.0065     -0.0212       -0.0001     -0.0008
 9     -0.0039     -0.0154
10      0.0002     -0.0009

1.4.1.6 The Trace Purification Scheme 

The dynamic level shift scheme described in the previous section has no reference to the MO basis. 

This opens the possibility to replace the diagonalizations in the TRRH step with some alternative 

scheme without affecting the overall result. 

There have been many suggestions as to how the diagonalization can be replaced by a linear scaling 

algorithm 47 . The trace purification (TP) scheme 19,48 , however, is a simple and useful approach and it 

has thus been implemented in our SCF program in a local version of DALTON 38,49 . The trace 

purification scheme was originally formulated for tight binding theory by Palser and 

Manolopoulos 19 and later improved by Niklasson 48 , and is linear scaling when formulated in an 

orthogonal basis. The scheme uses the trace and idempotency properties of the density to iteratively 

find the new density from a suitable start guess constructed from the Fock matrix. 

Since the SCF optimization is formulated in the non-orthogonal AO basis to avoid the delocalized 

MO basis, it is necessary to transform the matrices to an orthogonal basis. This is done by a 

Cholesky decomposition 50 of the AO overlap matrix S

$$S = LL^{T}, \qquad (1.39)$$

where L then is used to transform the Fock matrix to an orthogonal basis

$$F^{\mathrm{orth}} = L^{-1} F L^{-T}. \qquad (1.40)$$

The density resulting from the trace purification scheme will also be in the orthogonal basis and should be transformed back as

$$D = L^{-T} D^{\mathrm{orth}} L^{-1}. \qquad (1.41)$$

Since the AO overlap matrix does not change during the optimization, the Cholesky decomposition 

and the inversion of L can be done once and for all in the beginning of the calculation. 
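The transformations in Eqs. (1.39)-(1.41) can be illustrated with a few lines of NumPy. This is a dense-matrix sketch under simplifying assumptions: the helper names `to_orthogonal_basis` and `to_ao_basis` are hypothetical, and the explicit inverse of L is formed only for brevity, which is not how a linear scaling code would proceed.

```python
import numpy as np

def to_orthogonal_basis(F, S):
    """Return (F_orth, L_inv) with S = L L^T and F_orth = L^-1 F L^-T."""
    L = np.linalg.cholesky(S)        # lower triangular factor, Eq. (1.39)
    L_inv = np.linalg.inv(L)         # computed once, S is fixed during the SCF
    F_orth = L_inv @ F @ L_inv.T     # Eq. (1.40)
    return F_orth, L_inv

def to_ao_basis(D_orth, L_inv):
    """Back-transform a density from the orthogonal basis, Eq. (1.41)."""
    return L_inv.T @ D_orth @ L_inv
```

A density that is idempotent in the orthogonal basis then satisfies DSD = D in the AO basis, and its trace with S equals the trace of the orthogonal-basis density.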



Fig. 1.15 Flow diagram for the trace purification (TP) scheme. N is the number of electrons. [Diagram: (1) estimate λ min and λ max for F orth ; (2) R 0 = (λ max I − F orth ) / (λ max − λ min ); (3) if Tr R n > N then R n+1 = R n 2 , otherwise R n+1 = 2R n − R n 2 ; (4) if |Tr R n+1 − N| < ε then D orth = R n+1 , otherwise n = n + 1 and step (3) is repeated.]

Fig. 1.16 The purifying polynomials x n+1 = x n 2 and x n+1 = 2x n − x n 2 used in the trace purification scheme, plotted for x n between 0 and 1. The orange line is the McWeeny purification polynomial x n+1 = 3x n 2 − 2x n 3 .

The trace purification is carried out by the Niklasson model with second order purification 

polynomials, and is schematized in Fig. 1.15. The initial density guess R 0 is obtained by 

normalizing the Fock matrix such that it only has eigenvalues between 0 and 1. To do this, the 

bounds for the Fock eigenvalues, λ min and λ max , must be found. They can be estimated using 

Gerschgorin’s theorem or the Lanczos algorithm for eigenvalues 51 with only a small extra 

computational cost. R is then iteratively purified, and the purification function applied in each 

iteration is chosen based on the trace of the matrix R, always keeping the direction towards the 

correct trace condition. The purification functions are sketched in Fig. 1.16 including the McWeeny

purification function 8 . One of the functions used in the scheme has a stationary point for x = 1 and

the other has a stationary point for x = 0; depending on the function chosen we thus go towards a

larger or smaller trace. When R fulfils the trace and/or idempotency conditions Eq. (1.2) of the one 

electron density within some threshold ε, the new density D orth = R has been found and the density 

to use in the next TRSCF iteration can be evaluated from Eq. (1.41). 

The number of purification iterations required to obtain a new density depends on the threshold ε. 

For the test calculations carried out so far, the threshold has been an error of 10 -7 in the trace, and 

the number of iterations ranges from 30 to 70 for a single RH step, with the typical number being 

closer to 30 than 70. Still, it is less expensive than the diagonalization as soon as more than a couple 



of thousand basis functions are needed. The scaling of the TRRH step in general and the trace 

purification scheme in particular is illustrated and discussed in Section 1.7.1. 
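The purification loop of Fig. 1.15 can be sketched as follows. This is an illustrative dense-matrix version only: the names `gershgorin_bounds` and `trace_purification` are hypothetical, the trace target is passed as `n_occ` (the value the trace of the orthogonal-basis density should take), and the stopping test checks both the trace and the idempotency error, which is one reading of the "and/or" criterion in the text and is an assumption made here.

```python
import numpy as np

def gershgorin_bounds(F):
    """Lower and upper bounds on the eigenvalues of a symmetric matrix."""
    radii = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    return np.min(np.diag(F) - radii), np.max(np.diag(F) + radii)

def trace_purification(F_orth, n_occ, eps=1e-7, max_iter=200):
    """Return (D_orth, iterations) using second order purification polynomials."""
    lam_min, lam_max = gershgorin_bounds(F_orth)
    # Start guess with eigenvalues in [0, 1]; low Fock eigenvalues map close to 1
    R = (lam_max * np.eye(len(F_orth)) - F_orth) / (lam_max - lam_min)
    for n_iter in range(max_iter):
        R2 = R @ R
        trace_error = np.trace(R) - n_occ
        idempotency_error = np.trace(R - R2)
        if abs(trace_error) < eps and abs(idempotency_error) < eps:
            return R, n_iter
        # x^2 lowers the trace, 2x - x^2 raises it
        R = R2 if trace_error > 0 else 2.0 * R - R2
    raise RuntimeError("trace purification did not converge")
```

Each pass costs one matrix multiplication, so the total cost is governed by the number of iterations and by how sparse R stays; sparsity is not exploited in this sketch.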

1.4.2 Density Subspace Minimization 

The DIIS scheme seems to have been the overall most successful of all the suggestions on how to 

improve SCF convergence described in Section 1.3. DIIS was the first scheme to take advantage of 

the information contained in the densities and Fock matrices of the previous iterations, and this 

made the difference. 

This is also exploited in the EDIIS scheme by Kudin et al. 37 in which an energy model is optimized

with respect to the linear combination of previous densities. The density subspace minimization 

presented in this section is an improvement to EDIIS with a smaller idempotency error in the 

density, the correct gradient compared to SCF, and thus better convergence properties in both the 

local and global region of the optimization. 

1.4.2.1 The Trust Region DSM Parameterization 

After a sequence of Roothaan-Hall iterations, we have determined a set of density matrices D i and a 

corresponding set of Fock matrices F i = F(D i ). An improved density $\bar{D}$ and Fock matrix $\bar{F}$ should now be found as a linear combination of the previous n + 1 stored matrices. Taking D 0 as the reference density matrix, the improved density matrix can be written

$$\bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i, \qquad (1.42)$$

which, ideally, should satisfy the symmetry, trace and idempotency conditions Eq. (1.2) of a valid 

one-electron density matrix. Whereas the symmetry condition is trivially satisfied for any such 

linear combination, the trace condition holds only for combinations that satisfy the constraint 

$$\sum_{i=0}^{n} c_i = 0, \qquad (1.43)$$

leading to a set of n + 1 constrained parameters c i with 0 ≤ i ≤ n. Alternatively, an unconstrained set 

of n parameters c i with 1 ≤ i ≤ n can be used, with c 0 defined so that the trace condition is fulfilled: 

$$c_0 = -\sum_{i=1}^{n} c_i. \qquad (1.44)$$

In terms of these independent parameters, the density matrix $\bar{D}$ becomes

$$\bar{D} = D_0 + D_+, \qquad (1.45)$$

where we have introduced the notation

$$D_+ = \sum_{i=1}^{n} c_i D_{i0}, \qquad D_{i0} = D_i - D_0. \qquad (1.46)$$

Unlike the symmetry and trace conditions in Eq. (1.2), the idempotency condition is in general not 

fulfilled for linear combinations of D i . Still, for any averaged density matrix $\bar{D}$ in Eq. (1.45) that does not fulfill the idempotency condition, we may generate a purified density matrix with a smaller idempotency error by the transformation 8

$$\tilde{D} = 3\bar{D}S\bar{D} - 2\bar{D}S\bar{D}S\bar{D}. \qquad (1.47)$$

Introducing the idempotency correction

$$D_\delta = \tilde{D} - \bar{D}, \qquad (1.48)$$

we may then write the purified averaged density matrix in the form

$$\tilde{D} = D_0 + D_+ + D_\delta. \qquad (1.49)$$
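The parameterization of Eqs. (1.45)-(1.47) can be made concrete with a small NumPy sketch. The helper names `averaged_density`, `mcweeny` and `idempotency_error` are hypothetical, and the Frobenius norm of DSD − D is used as an ad hoc measure of the idempotency error; the thesis does not prescribe this particular measure.

```python
import numpy as np

def averaged_density(D_list, c):
    """D_bar = D_0 + sum_i c_i (D_i - D_0) for i >= 1, Eqs. (1.45)-(1.46).

    D_list[0] is the reference density D_0, c holds c_1 ... c_n.
    """
    D0 = D_list[0]
    return D0 + sum(ci * (Di - D0) for ci, Di in zip(c, D_list[1:]))

def mcweeny(D_bar, S):
    """Purified density 3 D S D - 2 D S D S D, Eq. (1.47)."""
    DS = D_bar @ S
    return 3.0 * DS @ D_bar - 2.0 * DS @ DS @ D_bar

def idempotency_error(D, S):
    """Frobenius norm of D S D - D (zero for an idempotent density)."""
    return float(np.linalg.norm(D @ S @ D - D))
```

Because the coefficients multiply density differences, the trace condition is kept for any choice of c, while the purification step only reduces, and does not remove, the idempotency error.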

1.4.2.2 The Trust Region DSM Energy Function 

Having established a useful parameterization of the averaged density matrix Eq. (1.45) and having 

considered its purification Eq. (1.47), let us now consider how to determine the best set of 

coefficients c i . Expanding the energy in the purified averaged density matrix, Eq. (1.49), around the 

reference density matrix D 0 , we obtain to second order 

$$E_{\mathrm{SCF}(2)}(\tilde{D}) = E_{\mathrm{SCF}}(D_0) + (D_+ + D_\delta)^{T} E_0^{(1)} + \tfrac{1}{2}(D_+ + D_\delta)^{T} E_0^{(2)} (D_+ + D_\delta). \qquad (1.50)$$

To evaluate the terms containing $E_0^{(1)}$ and $E_0^{(2)}$ we make the identifications

$$E_0^{(1)} = 2F_0 \qquad (1.51)$$

$$E_0^{(2)} D_+ = 2F_+ + O(D_+^2), \qquad (1.52)$$

which follow from Eq. (1.4) and from the second-order Taylor expansion of $E_0^{(1)}$ about $D_0$. The notation Eq. (1.46) has now been generalized to the Fock matrix $F_+ = \sum_{i=1}^{n} c_i F_{i0}$. Ignoring the terms quadratic in $D_\delta$ in Eq. (1.50) and quadratic in $D_+$ in Eq. (1.52), we then obtain the DSM energy

$$E^{\mathrm{DSM}}(c) = E_{\mathrm{SCF}}(D_0) + 2\,\mathrm{Tr}\, D_+ F_0 + \mathrm{Tr}\, D_+ F_+ + 2\,\mathrm{Tr}\, D_\delta F_0 + 2\,\mathrm{Tr}\, D_\delta F_+. \qquad (1.53)$$

Finally, for a more compact notation, we introduce the weighted Fock matrix

$$\bar{F} = F_0 + F_+ = F_0 + \sum_{i=1}^{n} c_i F_{i0}, \qquad (1.54)$$

and find that the DSM energy may be written in the form

$$E^{\mathrm{DSM}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\, D_\delta \bar{F}, \qquad (1.55)$$

where the first term is quadratic in the expansion coefficients $c_i$

$$E(\bar{D}) = E_{\mathrm{SCF}}(D_0) + 2\,\mathrm{Tr}\, D_+ F_0 + \mathrm{Tr}\, D_+ F_+, \qquad (1.56)$$

and the second, idempotency-correction term is quartic in these coefficients:

$$2\,\mathrm{Tr}\, D_\delta \bar{F} = \mathrm{Tr}\left(6\bar{D}S\bar{D} - 4\bar{D}S\bar{D}S\bar{D} - 2\bar{D}\right)\bar{F}. \qquad (1.57)$$

The derivatives of E DSM (c) are straightforwardly obtained by inserting the expansions of $\bar{F}$ and $\bar{D}$, using the independent parameter representation.

The energy function E DSM (c) in Eq. (1.55) provides an excellent approximation to the exact SCF 

energy E SCF (c) about D 0 , with an error quadratic in D δ (see Section 1.5.2). The EDIIS energy model 

corresponds to the first term $E(\bar{D})$ in Eq. (1.55) and thus has an error linear in D δ .

1.4.2.3 The Trust Region DSM Minimization 

The DSM energy, Eq. (1.55), is minimized with respect to the independent parameters c i with 1 ≤ i 

≤ n. The vector containing the parameters is initialized to zero, c (0) = 0, such that $\bar{D}$ = D 0 , where D 0

is chosen as the density matrix with the lowest energy E SCF (D i ), usually the one from the latest 

TRRH step. The minimization is then carried out by the trust region method 52 , taking a number of 

steps from the initial parameters c (0) to the final optimized parameters c* as illustrated in Fig. 1.17. 

Fig. 1.17 Steps in the trust region minimization of the DSM energy. [Diagram: c (0) = 0 → c (1) → c (2) → c (3) → .... → c*.]

We thus consider in each step the second-order Taylor expansion of the DSM energy in Eq. (1.55). 

Introducing the step vector 

$$\Delta c = c^{(i+1)} - c^{(i)}, \qquad (1.58)$$

we obtain

$$E^{\mathrm{DSM}}_{(2)}(c^{(i)} + \Delta c) = E_0 + \Delta c^{T} g + \tfrac{1}{2}\Delta c^{T} H \Delta c, \qquad (1.59)$$

where the energy, gradient, and Hessian at the expansion point are given by

$$E_0 = E^{\mathrm{DSM}}(c^{(i)}), \qquad g = \left.\frac{\partial E^{\mathrm{DSM}}(c)}{\partial c}\right|_{c = c^{(i)}}, \qquad H = \left.\frac{\partial^2 E^{\mathrm{DSM}}(c)}{\partial c^2}\right|_{c = c^{(i)}}. \qquad (1.60)$$

We then introduce a trust region of radius h for $E^{\mathrm{DSM}}_{(2)}(c^{(i)} + \Delta c)$ and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length h in the S metric norm

$$\|\Delta c\|_S^2 = \sum_{i,j=1}^{n} \Delta c_i M_{ij} \Delta c_j = h^2. \qquad (1.61)$$

In the unconstrained formulation defined by Eq. (1.44), the metric M of Eq. (1.37), is found as 

$$M_{ij} = \mathrm{Tr}\, D_i S D_j S - \mathrm{Tr}\, D_i S D_0 S - \mathrm{Tr}\, D_0 S D_j S + \mathrm{Tr}\, D_0 S D_0 S, \qquad i, j \neq 0. \qquad (1.62)$$

Introducing the undetermined multiplier ν for the step-size constraint, we arrive at the following 

Lagrangian for minimization on the boundary of the trust region: 

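$$L(\Delta c, \nu) = E_0 + \Delta c^{T} g + \tfrac{1}{2}\Delta c^{T} H \Delta c - \tfrac{1}{2}\nu\left(\Delta c^{T} M \Delta c - h^2\right). \qquad (1.63)$$

Differentiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations

$$\frac{\partial L}{\partial \Delta c} = g + H\Delta c - \nu M \Delta c = 0 \qquad (1.64)$$

$$\frac{\partial L}{\partial \nu} = -\tfrac{1}{2}\left(\Delta c^{T} M \Delta c - h^2\right) = 0. \qquad (1.65)$$

The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:

$$(H - \nu M)\Delta c = -g, \qquad (1.66)$$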

where the multiplier ν is iteratively adjusted until the step is to the boundary of the trust region Eq. 

(1.65). The step length restriction may be lifted by setting ν = 0 as needed for steps inside the trust 

region. 

To illustrate how the level shift parameter ν in Eq. (1.66) is determined, we consider in Fig. 1.18 

and Fig. 1.19 the third and fourth DSM step respectively, in iteration five of the HF/STO-3G 

calculation on CrC seen in Fig. 1.13. The step length ||∆c|| S is plotted as a function of ν. The plots 

consist of branches between asymptotes where ν makes the matrix on the left hand side of Eq. 

(1.66) singular. This happens whenever ν equals one of the Hessian eigenvalues. The lowest 

eigenvalue ω 1 of the Hessian H is found, and the level shift parameter is chosen in the interval -∞ < 

ν < min(0,ω 1 ). The proper value is found where the step length function crosses the line representing the trust radius h, as marked in Fig. 1.18. If the step that minimizes $E^{\mathrm{DSM}}_{(2)}$ is inside the trust region, ν = 0 is chosen as is the case in Fig. 1.19. The trust region is updated during the

iterative procedure and therefore h is different in the two steps. 
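The search for ν in Eq. (1.66) can be sketched as a bisection on the branch below min(0, ω 1 ), where the step length grows monotonically with ν. The sketch below makes several assumptions that are not taken from the thesis: the function name `trust_region_step` is hypothetical, M is assumed positive definite, ω 1 is taken as the lowest eigenvalue of H in the metric M (which reduces to the lowest Hessian eigenvalue when M is the unit matrix), and plain bisection stands in for whatever iterative adjustment is actually used.

```python
import numpy as np

def trust_region_step(g, H, M, h, tol=1e-12, max_iter=200):
    """Solve (H - nu M) dc = -g with ||dc||_S <= h; return (dc, nu)."""
    def step(nu):
        return np.linalg.solve(H - nu * M, -g)

    def length(dc):
        return float(np.sqrt(dc @ M @ dc))  # S metric norm, Eq. (1.61)

    # Lowest eigenvalue of H in the metric M (assumption, see lead-in)
    L_inv = np.linalg.inv(np.linalg.cholesky(M))
    omega_1 = np.linalg.eigvalsh(L_inv @ H @ L_inv.T)[0]

    # Unrestricted step: accepted if H is positive definite and the step is inside
    if omega_1 > 0.0:
        dc = step(0.0)
        if length(dc) <= h:
            return dc, 0.0

    # Bracket nu on (-inf, min(0, omega_1)); the step shrinks as nu -> -inf
    hi = min(0.0, omega_1)
    width = 1.0
    while length(step(hi - width)) > h:
        width *= 2.0
    lo = hi - width
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if length(step(mid)) > h:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return step(lo), lo
```

The returned step never exceeds the trust radius, since the lower end of the bracket always corresponds to a step of length at most h.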

Fig. 1.18 The step length as a function of the multiplier ν in the third DSM step. [Plot: step length (0 to 3) against ν (-5 to 7.5); the trust radius h = 0.34 is marked.]

Fig. 1.19 The step length as a function of the multiplier ν in the fourth DSM step. [Plot: step length (0 to 3) against ν (-5 to 7.5); the trust radius h = 0.44 is marked.]

Each of the trust region steps requires the construction of the gradient g and the Hessian H in the

density subspace, and the solution of the level shifted Newton equations Eq. (1.66). Since E DSM is a 

local model of the true energy function E SCF , it resembles E SCF only in a small region about the 

initial point c (0) . The DSM iterations are therefore terminated if the total step length after p iterations 

||c (p) – c (0) || S exceeds some preset value k. If a minimum of E DSM is found inside the trust region ||c (p) 

– c (0) || S < k, then the step ||c* - c (0) || S to the minimum is taken and the iterations are terminated. This 

is the typical situation. 

When the trust region minimization has terminated, an improved density matrix $\tilde{D}$ can be constructed. However, to avoid the expensive calculation of the Fock matrix from $\tilde{D}$ we use instead the averaged density matrix $\bar{D}$ from Eq. (1.45) and exploit that the Fock matrix is linear in the density for Hartree-Fock such that $F(\bar{D})$ is simply the averaged Fock matrix $\bar{F}$ of Eq. (1.54). For DFT this is an approximation, but typically insignificant improvements are obtained by evaluating the correct Kohn-Sham matrix. The improved Fock matrix and density matrix then enter the TRRH step as F 0 and D 0 , respectively.

By construction E DSM (c) is lowered at each iteration of the trust region minimization. Since E DSM is 

a local model to the true energy E SCF , the lowering of E DSM will also lead to a lowering of E SCF 

provided the total step is sufficiently short and thus stays in the local region. 

1.4.2.4 Line Search TRDSM 

As in the TRRH step, the averaged density matrix D may also be determined by a line search and 

we denote this line search algorithm TRDSM-LS. Here, the line search is made in the direction 

defined by the first step c (1) of the TRDSM algorithm—that is, the step at the expansion point D 0 . 

As in the TRRH step, such a line search is guaranteed to reduce the energy. The first step is scaled 

by a parameter α, 

$$\Delta c^{\mathrm{tot}} = \alpha \cdot c^{(1)} \qquad (1.67)$$

and a search is made in $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$ to find the step $\Delta c^{\mathrm{tot}}$ that leads to the largest decrease in energy. $E_{\mathrm{SCF}}(\alpha)$ is found by evaluating the averaged density of Eq. (1.45) for the coefficients ($c_0 + \Delta c^{\mathrm{tot}}$), purifying it as in Eq. (1.32)–(1.33) and inserting it in the energy expression of Eq. (1.1). Then $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}(\alpha)$ can be found as

$$\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}(\alpha) = E_{\mathrm{SCF}}(\alpha) - E_{\mathrm{SCF}}(D_0). \qquad (1.68)$$

Fig. 1.20 and Fig. 1.21 illustrate the search in α, again for iteration seven of the HF and LDA 

calculations on the zinc complex in Fig. 1.3. For α = 0, no step is taken and hence no energy 

decrease is seen. For the marked choice of α, the optimal step length is obtained. 

Fig. 1.20 Decrease in HF energy as a function of the step length α. [Plot: energy change (0 to -35) against α (0 to 20).]

Fig. 1.21 Decrease in LDA energy as a function of the step length α. [Plot: energy change (0 to -25) against α (0 to 20).]

1.4.2.5 The Missing Term 

In the construction of the TRDSM energy model Eq. (1.55), the term of second order in the 

idempotency correction D δ was neglected from Eq. (1.50), since this term required a new Fock 

evaluation F(D δ ), which would increase the expenses of the scheme considerably. This section will 

be concerned with this neglected term and how a part of it can be described without the evaluation 

of a new Fock matrix, leading to an improved energy model for TRDSM at no considerable extra 

cost. The actual effect of this improvement to the energy model will then be discussed through a 

case study. This section will only be concerned with Hartree-Fock theory and examples, but it might 

equally well be done for DFT even though the improvement should be less significant since for 

DFT, also terms of order ||D + || 3 are neglected. These are of the same size as the neglected term 

quadratic in D δ . In Section 1.5.2 these errors are discussed. 

Since the only neglect in the DSM energy model Eq. (1.55) for Hartree-Fock is the term quadratic 

in $D_\delta$, and since the only term quadratic in the density is Tr DG(D), the HF energy for the density $\tilde{D}$ can be written as

$$E_{\mathrm{HF}}(\tilde{D}) = E(\bar{D}) + 2\,\mathrm{Tr}\, D_\delta \bar{F} + \mathrm{Tr}\, D_\delta G(D_\delta), \qquad (1.69)$$

where $E(\bar{D})$ is seen in Eq. (1.56). Even though a new Fock matrix h + G(D δ ) should be evaluated

to describe the last term exactly, a part of the term can be described in the subspace of the previous 

densities. 

As exploited in the level-shift scheme Section 1.4.1.5, a density or density difference, in this case $D_\delta$, can be divided in a part $D_\delta^{\parallel}$ that can be described in the subspace of the previous densities and an unknown part $D_\delta^{\perp}$ orthogonal to the space

$$D_\delta = D_\delta^{\parallel} + D_\delta^{\perp}. \qquad (1.70)$$

$D_\delta^{\parallel}$ is expanded in the previous densities $D_i$ as

$$D_\delta^{\parallel} = \sum_{i=0}^{n} \omega_i D_i, \qquad (1.71)$$

where the expansion coefficients $\omega_i$ are determined in a least-squares manner

$$\omega_i = \sum_{j=0}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\, D_j S D_\delta S, \qquad M_{ij} = \mathrm{Tr}\, D_i S D_j S. \qquad (1.72)$$

Inserting Eq. (1.70) for $D_\delta$ in Eq. (1.69), an improved DSM energy model can be written

$$E^{\mathrm{DSM}}_{\mathrm{imp}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\, D_\delta \bar{F} + \mathrm{Tr}\left(2D_\delta - D_\delta^{\parallel}\right) G(D_\delta^{\parallel}), \qquad (1.73)$$

where only previous density and Fock matrices enter. The relation

$$\mathrm{Tr}\, A\, G(B) = \mathrm{Tr}\, B\, G(A) \qquad (1.74)$$

for symmetric matrices A and B is used and the term $\mathrm{Tr}\, D_\delta^{\perp} G(D_\delta^{\perp})$ is neglected. A second order Taylor expansion of the improved DSM energy can then be made as in Eq. (1.59) and a trust region minimization carried out.

To study the improvement to the energy function, two TRSCF calculations are carried out on the 

cadmium complex seen in Fig. 1.6 in the STO-3G basis and with a H1-core start guess. The 

convergence profiles of the calculations are displayed in Fig. 1.22, the one denoted “Improved 

TRDSM” is a TRSCF calculation just as the one denoted “TRSCF” with the only difference that the 

improved energy model in Eq. (1.73) is used for TRDSM instead of the one in Eq. (1.55). To 

illustrate the impact of the improvement in a single TRDSM step, a line search like the one in Fig. 

1.20 is made in iteration 7 of the same TRSCF calculation as in Fig. 1.22. Apart from displaying the 

change in SCF energy as a function of the step length α, also the DSM energy of Eq. (1.55) and the 

improved DSM energy of Eq. (1.73) are evaluated for the different choices of α, and their energy 

changes found as well. 

Fig. 1.22 Convergence for the cadmium complex in Fig. 1.6, both for TRSCF with no improvements, and for TRSCF where E_imp^DSM is used in TRDSM. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-08 against the iteration number (0 to 20); curves for TRSCF and Improved TRDSM.]

Fig. 1.23 TRDSM line search for iteration 7 in the TRSCF optimization Fig. 1.22. For different α in Eq. (1.67), the changes in E_HF^DSM, E^DSM and E_imp^DSM compared to E HF (D 0 ) are found. [Plot: ∆E / E h (-4.0 to 1.0) against α (0 to 12); curves ∆E^DSM, ∆E_HF^DSM and ∆E_imp^DSM.]

It is seen in Fig. 1.23 that the improved DSM energy describes the HF energy better than the 

standard DSM energy does, just as expected. As the step moves away from the expansion point, the 

part of the energy which cannot be described in the old densities grows and both the DSM energy 

models become poor. 

The improvements presented in this section add complexity to the TRDSM algorithm, even though 

the computational cost is not significant. As seen in Fig. 1.22 and Fig. 1.23, the improvements to the 

TRSCF calculation are minor. The overall gain does not justify the extra complexity added to the 

TRDSM algorithm. 

1.4.3 Energy Minimization Exploiting the Density Subspace 

Section 1.3.1 describes how different approaches have been taken to avoid the diagonalization in 

the Roothaan-Hall step. Replacing the standard diagonalization of the Fock matrix can be done for 

the purpose of improving either the convergence properties or the scaling of the algorithm or for 

both reasons. With the purpose of improving both, a newly developed scheme is presented in this 

section, in which an energy minimization replaces the standard diagonalization in the SCF 

optimization. 

When the RH energy model is minimized, the density subspace information used with great success 

in TRDSM is ignored. The novel idea is thus to exploit the valuable information saved in the 

density subspace of the previous densities to construct an improved RH energy model and minimize 

this model instead of the RH model. This makes the TRDSM step redundant since a density 

subspace minimization now is included in the RH energy model minimization. 

The Hessian update methods 40,53 , in which an approximate Hessian is updated in each iteration and 

an approximate Newton step is taken, exploit some of the same ideas, but they are all based on 



approximate second order energy expansions in the orbital rotation parameters and therefore do not 

include the third and higher order terms included in the RH energy. 

In the following subsections the improved RH energy model and its minimization will be described. 

The SCF convergence of a test case is then displayed, in which the new energy minimization 

approach is compared to standard DIIS and the TRSCF schemes. As the scheme has not yet been 

extended to DFT, this section will only consider HF theory and calculations. 

1.4.3.1 The Augmented RH Energy Model

If the Hartree-Fock energy, Eq. (1.1), is expanded through second order around some reference 

density D 0 

$$E_{\mathrm{HF}}(D) = E_{\mathrm{HF}}(D_0) + 2\,\mathrm{Tr}\, F(D_0)(D - D_0) + \mathrm{Tr}\,(D - D_0)\, G(D - D_0), \qquad (1.75)$$

the first two terms are recognized as $E^{\mathrm{RH}}(D)$ from Eq. (1.22) plus the terms of zeroth order $E_{\mathrm{HF}}(D_0)$ and $-E^{\mathrm{RH}}(D_0)$

$$E_{\mathrm{HF}}(D) = E^{\mathrm{RH}}(D) + E_{\mathrm{HF}}(D_0) - E^{\mathrm{RH}}(D_0) + \mathrm{Tr}\,(D - D_0)\, G(D - D_0). \qquad (1.76)$$

In a standard RH step, the energy function to minimize is the RH energy, neglecting the last term 

which contains the Hessian information, because it is too expensive to evaluate. Since Hessian 

information is very valuable to an optimization, the scheme presented in this section will replace the 

diagonalization in the RH step by an energy minimization of an augmented RH (ARH) energy 

model, where as much Hessian information as possible is included without directly evaluating new 

Fock matrices. This is done by exploiting the information contained in the density and Fock 

matrices of the previous iterations. 

As previously exploited, a density or density difference, in this case ∆ = D – D 0 , can be split in a 

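part $\Delta^{\parallel}$ that can be described in the subspace of the n + 1 previous densities and an unknown part $\Delta^{\perp}$ orthogonal to the space

$$D - D_0 = \Delta = \Delta^{\parallel} + \Delta^{\perp}. \qquad (1.77)$$

$\Delta^{\parallel}$ is expanded in the previous densities $D_i$ as

$$\Delta^{\parallel} = \sum_{i=0}^{n} \omega_i D_i, \qquad (1.78)$$

where n is the number of previously stored densities and the expansion coefficients $\omega_i$ are determined in a least-squares manner

$$\omega_i = \sum_{j=0}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\, D_j S \Delta S, \qquad M_{ij} = \mathrm{Tr}\, D_i S D_j S. \qquad (1.79)$$

Inserting Eq. (1.77) in the last term of Eq. (1.76) and neglecting the term $\mathrm{Tr}\, \Delta^{\perp} G(\Delta^{\perp})$, the augmented Roothaan-Hall energy model can be written as

$$E^{\mathrm{ARH}}(D) = E^{\mathrm{RH}}(D) + E_{\mathrm{HF}}(D_0) - E^{\mathrm{RH}}(D_0) + \mathrm{Tr}\left(2\Delta - \Delta^{\parallel}\right) G(\Delta^{\parallel}), \qquad (1.80)$$

where $G(\Delta^{\parallel})$ is evaluated as a linear combination of previous Fock matrices

$$G(\Delta^{\parallel}) = \sum_{i=1}^{n} \omega_i\, G(D_i) = \sum_{i=1}^{n} \omega_i \left(F(D_i) - h\right). \qquad (1.81)$$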

i= 1 i= 

1 

The energy model E ARH has no intrinsic restrictions with respect to how different the densities 

spanning the subspace are allowed to be, and this is one of the benefits compared to the TRSCF 

scheme. For the TRDSM energy model, the purification implicit in the DSM energy makes no sense 

if the densities are too different, in particular if they have different electron configurations. In ARH, 

configuration shifts can be handled without problems, and whereas old, obsolete densities pollute 

the DSM energy model, they simply disappear from the ARH energy model, since their weights ω i 

diminish. 

We expect a faster convergence rate for ARH compared to TRSCF, mainly because the RH and 

DSM steps are merged to an energy model with correct gradient (not just in the subspace) and an 

approximate Hessian, which is improved in each iteration using the information from the previous 

density and Fock matrices. 

1.4.3.2 The Augmented RH Optimization 

The density for which the ARH energy model should be optimized can be expanded in the antisymmetric 

matrix X 

$$D(X) = \exp(-XS)\, D_0^{(i)} \exp(SX) = D_0^{(i)} + \left[D_0^{(i)}, X\right]_S + \tfrac{1}{2}\left[\left[D_0^{(i)}, X\right]_S, X\right]_S + \cdots, \qquad (1.82)$$

where $D_0^{(i)}$

is the reference density from which the step X is taken. Optimizing the ARH energy is 

thus a nonlinear problem and an iterative scheme should be applied. 

A Newton-Raphson (NR) optimization of the ARH energy is therefore carried out, and the steps are 

found by minimizing a second order approximation of the ARH energy, $E^{\mathrm{ARH}}_{(2)}$, by the preconditioned

conjugate gradient (PCG) method. The second order approximation of the ARH energy, where the 

constant terms are excluded, can be written as 

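$$\begin{aligned}
E^{\mathrm{ARH}}_{(2)}(X) ={}& 2\,\mathrm{Tr}\, F_0 \left[D_0^{(i)}, X\right]_S + \mathrm{Tr}\, F_0 \left[\left[D_0^{(i)}, X\right]_S, X\right]_S \\
&+ 2\,\mathrm{Tr}\left(D_0^{(i)} - D_0\right) \sum_{i=1}^{n} \left(\omega_i^{(1)} + \omega_i^{(2)}\right) G(D_i) \\
&+ 2\,\mathrm{Tr} \left[D_0^{(i)}, X\right]_S \sum_{i=1}^{n} \left(\omega_i^{(0)} + \omega_i^{(1)}\right) G(D_i) + \mathrm{Tr} \left[\left[D_0^{(i)}, X\right]_S, X\right]_S \sum_{i=1}^{n} \omega_i^{(0)} G(D_i) \\
&- \mathrm{Tr} \sum_{i,j=1}^{n} \left[2\omega_j^{(0)}\left(\omega_i^{(1)} + \omega_i^{(2)}\right) + \omega_i^{(1)}\omega_j^{(1)}\right] D_i\, G(D_j),
\end{aligned} \qquad (1.83)$$

where

$$\begin{aligned}
\omega_i^{(0)} &= \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\left(D_j S D_0^{(i)} S\right) \\
\omega_i^{(1)} &= \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\left(D_j S \left[D_0^{(i)}, X\right]_S S\right) \\
\omega_i^{(2)} &= \tfrac{1}{2} \sum_{j=1}^{n} \left[M^{-1}\right]_{ij} \mathrm{Tr}\left(D_j S \left[\left[D_0^{(i)}, X\right]_S, X\right]_S S\right).
\end{aligned} \qquad (1.84)$$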

If the summations are put in the most favorable way, the number of matrix multiplications is limited 

and independent of subspace size. Only the update of the metric M takes a number of matrix

multiplications that grows linearly with the subspace size.

From the derivative $\partial E^{\mathrm{ARH}}_{(2)} / \partial X$, the problem to be solved by PCG is set up for the current reference density $D_0^{(i)}$, where i denotes the Newton-Raphson step number. Through the whole NR optimization D 0 and F 0 are the density and Fock matrices from the previous SCF iteration. The NR step X found by PCG is used to evaluate a new density from Eq. (1.82) and if the new density is similar to the previous one, the Newton-Raphson optimization has converged; if not, the density is used as reference density $D_0^{(i)}$ in the next step.

The final density matrix resulting from the NR optimization is then used to evaluate a new Fock 

matrix, and so the SCF iterative procedure is established. The SCF scheme for the described 

algorithm is illustrated in Fig. 1.24. 
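The nesting of the two loops described above (an outer Newton-Raphson loop whose linear equations are solved by PCG) can be sketched on a toy problem. The code below is not the ARH implementation: the two-variable function, the Jacobi (diagonal) preconditioner and all names are assumptions chosen only to show the control flow.

```python
import math

# Toy stand-in for the ARH energy (assumption): f(x) = g(x0 + x1) + h(x0 - x1)
# with g(s) = exp(s) - s and h(d) = exp(d) - 2 d, minimized at s = 0, d = ln 2.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(m, v):
    return [dot(row, v) for row in m]

def gradient(x):
    s, d = x[0] + x[1], x[0] - x[1]
    gs, hd = math.exp(s) - 1.0, math.exp(d) - 2.0
    return [gs + hd, gs - hd]

def hessian(x):
    s, d = x[0] + x[1], x[0] - x[1]
    gss, hdd = math.exp(s), math.exp(d)
    return [[gss + hdd, gss - hdd], [gss - hdd, gss + hdd]]

def pcg(hess, rhs, precond_diag, tol=1e-12, max_iter=50):
    """Preconditioned conjugate gradient for hess * x = rhs (hess positive definite)."""
    x = [0.0] * len(rhs)
    r = rhs[:]
    z = [ri / di for ri, di in zip(r, precond_diag)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        hp = matvec(hess, p)
        alpha = rz / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        z = [ri / di for ri, di in zip(r, precond_diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def newton_raphson(x, tol=1e-10, max_steps=25):
    """Outer loop: each step minimizes the local second-order model by PCG."""
    for step in range(max_steps):
        g = gradient(x)
        if math.sqrt(dot(g, g)) < tol:
            return x, step
        h = hessian(x)
        diag = [h[i][i] for i in range(len(x))]
        dx = pcg(h, [-gi for gi in g], diag)
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_steps

x_min, n_steps = newton_raphson([0.0, 0.0])
print(x_min, n_steps)
```

In the scheme described in the text, the outer convergence test compares successive densities rather than a gradient norm; the gradient test here is a simplification.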



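Fig. 1.24 Flow diagram of the SCF optimization with the diagonalization of the Fock matrix replaced by a minimization of the ARH energy. The light blue box embraces the Newton-Raphson optimization of $E^{ARH}$. [Flow diagram: starting from $D_0^{(0)}$, the Fock matrix $F(D_n^{(0)})$ is built; $E^{ARH}_{(2)}(X)$ is minimized for $D_n^{(i)}$ by PCG, giving $D_n^{(i+1)}(X)$; if $D_n^{(i+1)} \approx D_n^{(i)}$ does not hold, i = i + 1 and the minimization is repeated; otherwise $D_{n+1}^{(0)} = D_n^{(i+1)}$; if $D_{n+1}^{(0)} \approx D_n^{(0)}$ does not hold, n = n + 1 and a new Fock matrix is built; otherwise $D_{conv}$ is obtained.]

1.4.3.3 Applications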

SCF calculations have been carried out using the ARH scheme. In Fig. 1.25 the convergence of HF/STO-3G calculations on CrC with a bond distance of 2.00 Å is displayed. Results are given for the augmented RH scheme, DIIS, and TRSCF with the C-shift and d_orth-shift schemes, respectively. For the first iterations in the ARH optimization a limit is put on the norm $\|X\|_S$ to avoid changes in the densities which go beyond the region that is well described by the energy model.

The ARH scheme is clearly superior for this test case, even with the convergence improvements for TRSCF obtained with the d_orth-shift scheme; ARH is almost an iteration ahead of 'TRSCF/d_orth-shift' in the local region. The standard DIIS approach does not converge at all for this case.



Fig. 1.25 HF/STO-3G calculations on CrC using different approaches. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-10 versus Iteration (1 to 9); curves: DIIS, TRSCF C-shift (std.), TRSCF d_orth-shift (new), ARH.]

Fig. 1.26 Details from the ARH optimization in Fig. 1.25: The part of the density change which can be described in the subspace of the previous densities. [Plot: vertical axis from 0.0 to 0.8 versus Iteration (1 to 9).]

To illustrate how information is gradually obtained from the previous densities in ARH, the part $\Delta D^{\parallel}$ of the density change $\Delta D = D_{n+1} - D_n$ in each iteration that can be described in the previous densities is found as in Eqs. (1.78)-(1.79), and the ratio $\|\Delta D^{\parallel}\|_S / \|\Delta D\|_S$ is depicted in Fig. 1.26. It is seen how the description of ∆D improves during the first five iterations until a significant part of the Hessian is described; then a qualified step is taken to another region, and the new density is therefore not well described in the previous densities. This step is followed by a significant decrease in SCF energy of two orders of magnitude. The same pattern is repeated after two additional iterations.

Even though only preliminary results are given in this section, the ARH energy minimization seems promising, taking the best of the RH and DSM energy models, and improving the convergence compared to TRSCF, which already showed convergence rates as good as or better than those of DIIS. It could be expected that this scheme is able to converge in the fewest SCF iterations overall. The future success of ARH depends on the development of effective ways of solving the nonlinear equations in X, e.g. by setting up a good preconditioner.

1.5 The Quality of the Energy Models for HF and DFT 

Having considered the theory behind the TRRH and TRDSM steps in Sections 1.4.1 and 1.4.2 

without being concerned with the approximations introduced in the energy functions, this section 

takes a closer look at the errors in the energy models compared to the SCF energy. The SCF 

optimization of Hartree-Fock and Kohn-Sham-DFT energies is similar; the only difference lies in 

the energy expressions to be optimized. The approximations in the energy models will thus also 

differ in HF and DFT, and while Section 1.2 described the HF and DFT theory in a generic manner, 

this section will focus on the differences, ignoring the general elements already stated in Section 

1.2. 



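To make the differences in the HF and DFT energy expressions clear, we will now study them separately:

$$E_{HF} = 2\mathrm{Tr}\, hD + \mathrm{Tr}\, D G_{HF}(D) + h_{nuc}, \qquad (1.85)$$

$$E_{DFT} = 2\mathrm{Tr}\, hD + \mathrm{Tr}\, D G_{DFT}(D) + h_{nuc} + E_{XC}(D), \qquad (1.86)$$

where

$$[G_{HF}(D)]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \qquad (1.87)$$

$$[G_{DFT}(D)]_{\mu\nu} = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}. \qquad (1.88)$$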

The second term in Eq. (1.87) and Eq. (1.88) is the contribution from exact exchange, with γ = 0 in pure DFT (LDA), and γ ≠ 0 in hybrid DFT. The exchange-correlation energy $E_{XC}(D)$ in Eq. (1.86) is a functional of the electronic density. In the local-density approximation (LDA), the exchange-correlation energy is local in the density, whereas in the generalized gradient approximation (GGA), it is also local in the squared density gradient, and may thus be expressed as

$$E_{XC}(D) = \int f(\rho(x), \zeta(x))\, dx. \qquad (1.89)$$

Here the electron density ρ(x) and its squared gradient norm ζ(x) are given by

$$\rho(x) = \chi^T(x)\, D\, \chi(x), \qquad \zeta(x) = \nabla\rho(x) \cdot \nabla\rho(x), \qquad (1.90)$$

where χ(x) is a column vector containing the AOs. Note that the exchange-correlation energy density f(ρ(x), ζ(x)) in Eq. (1.89) is a nonlinear (and non-quadratic) function of ρ(x) and ζ(x). In the following we rely on an expansion of $E_{XC}(D)$ around some reference density matrix $D_0$

$$E_{XC}(D) = E_{XC}(D_0) + (D - D_0)^T E_{XC}^{(1)} + \tfrac{1}{2} (D - D_0)^T E_{XC}^{(2)} (D - D_0) + \cdots, \qquad (1.91)$$

where the derivatives $E_{XC}^{(n)}$ have been evaluated at $D = D_0$ and where for convenience a vector-matrix notation for D, $E_{XC}^{(1)}$, and $E_{XC}^{(2)}$ is used. The precise form of $E_{XC}$ depends on the DFT functional chosen for the calculation.
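The consequence of a non-quadratic energy density can be made concrete with the LDA exchange part alone. The sketch below assumes the spin-unpolarized Dirac exchange energy density, f(ρ) = -(3/4)(3/π)^(1/3) ρ^(4/3), as a stand-in for the full functional, and treats ρ as a single number rather than a field. A second-order Taylor expansion then leaves a remainder that shrinks as the cube of the density change, which is the kind of term absent in Hartree-Fock theory.

```python
import math

# Dirac (LDA) exchange energy density, spin-unpolarized form (assumed stand-in
# for the full exchange-correlation functional).
C_X = -0.75 * (3.0 / math.pi) ** (1.0 / 3.0)

def f(rho):
    return C_X * rho ** (4.0 / 3.0)

def f1(rho):
    """First derivative of f with respect to rho."""
    return (4.0 / 3.0) * C_X * rho ** (1.0 / 3.0)

def f2(rho):
    """Second derivative of f with respect to rho."""
    return (4.0 / 9.0) * C_X * rho ** (-2.0 / 3.0)

def taylor2(rho0, delta):
    """Second-order expansion around rho0, analogous to Eq. (1.91)."""
    return f(rho0) + f1(rho0) * delta + 0.5 * f2(rho0) * delta ** 2

def remainder(rho0, delta):
    """Size of the terms beyond second order."""
    return abs(f(rho0 + delta) - taylor2(rho0, delta))

# Halving the density change should reduce the remainder by about 2**3.
remainder_ratio = remainder(1.0, 0.1) / remainder(1.0, 0.05)
print(remainder_ratio)
```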

It is often more problematic to obtain convergence for DFT than for HF, mainly for two reasons. First, the HOMO-LUMO gap $\Delta\varepsilon_{ai}$ is smaller for DFT than for HF, and a determinant with well separated occupied and virtual parts has better convergence properties than one with many close-lying states (refs. 54, 55). Second, since the exchange-correlation energy is nonlinear and non-quadratic in the density, the higher order terms in the density, which are not present in Hartree-Fock theory, introduce some extra approximations to the SCF scheme for DFT. In this section these differences and their consequences for the convergence properties will be discussed for the TRSCF algorithm. It is here assumed that if the energy models employed in TRSCF were of the same quality for HF and DFT, that is, had errors of the same order compared to the true SCF energy, then the convergence properties would also be of the same quality.

The study is mainly performed in the MO basis with a block diagonal Fock matrix as in Eq. (1.10) and the reference density matrix $D_0^{MO}$

$$D_0^{MO} = \begin{pmatrix} 2\delta_{ij} & 0 \\ 0 & 0 \end{pmatrix}. \qquad (1.92)$$

It is also exploited that any valid density matrix D may be expressed in terms of a valid reference density matrix $D_0$ as

$$D^{MO}(K) = \exp(-K)\, D_0^{MO} \exp(K), \qquad (1.93)$$

and can thus be expanded in orders of K through the BCH expansion (ref. 46)

$$D^{MO}(K) = D_0^{MO} + [D_0^{MO}, K] + \tfrac{1}{2} [[D_0^{MO}, K], K] + O(K^3). \qquad (1.94)$$

The anti-symmetric rotation matrix may be written in the form

$$K = \begin{pmatrix} 0 & -\kappa^T \\ \kappa & 0 \end{pmatrix}, \qquad (1.95)$$

where κ holds the orbital rotation parameters. The diagonal block matrices representing rotations 

among the occupied MOs and among the virtual MOs are zero since the density matrix in Eq. (1.8) 

is invariant to such rotations. 
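The invariance statement can be verified on a small example. The following sketch takes three orbitals (two occupied, one virtual) with the occupied block of the reference density set to 2δ_ij as printed in Eq. (1.92); the rotation angles are arbitrary illustrative values. A rotation among the occupied orbitals leaves the density unchanged, whereas an occupied-virtual rotation of the form in Eq. (1.95) changes it while conserving the trace.

```python
# Sketch: D(K) = exp(-K) D0 exp(K) for a 3-orbital model (assumed example).

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(a, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(result, term)]
    return result

def rotate(d0, k):
    """Apply Eq. (1.93) to the reference density d0."""
    minus_k = [[-v for v in row] for row in k]
    return matmul(matmul(expm(minus_k), d0), expm(k))

def max_abs_diff(a, b):
    return max(abs(x - y) for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

# Two occupied orbitals (indices 0, 1) and one virtual orbital (index 2).
D0 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]

# Rotation among the occupied orbitals only (diagonal block of K).
K_occ_occ = [[0.0, -0.3, 0.0], [0.3, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Occupied-virtual rotation, the off-diagonal form of Eq. (1.95).
K_occ_virt = [[0.0, 0.0, -0.1], [0.0, 0.0, -0.2], [0.1, 0.2, 0.0]]

change_oo = max_abs_diff(rotate(D0, K_occ_occ), D0)
change_ov = max_abs_diff(rotate(D0, K_occ_virt), D0)
rotated = rotate(D0, K_occ_virt)
trace_ov = sum(rotated[i][i] for i in range(3))
print(change_oo, change_ov, trace_ov)
```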

In the following subsections the RH energy model Eq. (1.22) and the DSM energy model Eq. (1.55) 

are analyzed separately with respect to differences for HF and DFT. 

1.5.1 The Quality of the TRRH Energy Model 

To compare the RH energy model to the SCF energy, both are expanded about a reference density matrix $D_0$ (neglecting the possible difference between $F_0$ and $F(D_0)$ noted in Section 1.4)

$$E^{RH}(D) = E^{RH}(D_0) + 2\mathrm{Tr}\, F(D_0)(D - D_0), \qquad (1.96)$$

$$\begin{aligned}
E_{SCF}(D) = {} & E_{SCF}(D_0) + 2\mathrm{Tr}\, F(D_0)(D - D_0) + \mathrm{Tr}\, (D - D_0)\, G(D - D_0) \\
& + E_{XC}(D) - E_{XC}(D_0) - \mathrm{Tr}\, (D - D_0)\, E_{XC}^{(1)}(D_0),
\end{aligned} \qquad (1.97)$$

where the last three terms of Eq. (1.97) are present only in DFT. These expansions have the same first-order term $2\mathrm{Tr}\, F(D_0)(D - D_0)$ and thus the same first derivative with respect to the orbital rotation parameters $\kappa_{ai}$ of Eq. (1.95)

$$[E^{(1)}_{RH}]_{ai} = \left. \frac{\partial E^{RH}(\kappa)}{\partial \kappa_{ai}} \right|_{\kappa=0} = -4F_{ai}, \qquad (1.98)$$

$$[E^{(1)}_{SCF}]_{ai} = \left. \frac{\partial E_{SCF}(\kappa)}{\partial \kappa_{ai}} \right|_{\kappa=0} = -4F_{ai}. \qquad (1.99)$$

The expressions are found by replacing D in Eqs. (1.96) and (1.97) with $D^{MO}$ in Eq. (1.94) and differentiating with respect to $\kappa_{ai}$.

All higher order terms in κ arising from $2\mathrm{Tr}\, F(D_0)(D - D_0)$ are consequently also shared by the SCF and RH energies, whereas terms of second and higher order arising from the last term(s) in Eq. (1.97) are neglected in the RH energy model. To study the differences, the second order derivatives in κ are found in the same way as the first derivatives

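$$[E^{(2)}_{RH}]_{aibj} = \left. \frac{\partial^2 E^{RH}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa=0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i), \qquad (1.100)$$

$$[E^{(2)}_{SCF}]_{aibj} = \left. \frac{\partial^2 E_{SCF}(\kappa)}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa=0} = 4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i) + W_{aibj}, \qquad (1.101)$$

where

$$W^{HF}_{aibj} = 16 g_{aibj} - 4(g_{abij} + g_{ajib}), \qquad (1.102)$$

$$W^{DFT}_{aibj} = 16 g_{aibj} - 4\gamma (g_{abij} + g_{ajib}) + [E^{(2)}_{XC}(\kappa)]_{aibj}. \qquad (1.103)$$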

$E^{(2)}_{XC}(\kappa)$ is the second derivative of the term $E_{XC}$ expanded in the orbital rotation parameters κ. The error in the RH energy model can then be said to depend partly on the size of W and partly on the size of the third and higher order contributions from the nonlinear terms in Eq. (1.97) which are not included in Eq. (1.96). This general consideration goes for DFT as well as HF, but with different impact. As seen in Eqs. (1.102) and (1.103), the definition of W differs in the two approaches and even differs depending on which DFT functional is chosen. Furthermore, since the size of the HOMO-LUMO gap $\Delta\varepsilon_{ai} = \varepsilon_a - \varepsilon_i$ is typically smaller in DFT, the term $4\delta_{ij}\delta_{ab}(\varepsilon_a - \varepsilon_i)$ will have different weights in Eq. (1.101) depending on the method. Also the size of the third and higher order contributions in Eq. (1.97) would be expected to differ for HF and DFT, since for DFT both the terms $\mathrm{Tr}(D - D_0)G(D - D_0)$ and $E_{XC}(D)$ contribute whereas HF only contains the $\mathrm{Tr}(D - D_0)G(D - D_0)$ term. In the beginning of the optimization, where large steps are taken, the size of the third and higher order contributions is the potential source of error. Near convergence this should be less of an issue, and in this region the size of the lowest Hessian eigenvalues should be the decisive error source.

HF and LDA calculations have been carried out and the part of the SCF energy change arising from the RH step, $\Delta E^{RH}_{SCF}$, has been found as well as the change in the RH energy model, $\Delta E^{RH}$, in each iteration.



Fig. 1.27 Calculations on the cadmium complex in Fig. 1.6 in the STO-3G basis set. [Plot: vertical axis from -2.0 to 4.0 versus Iteration (0 to 20); curves: HF, LDA.]

Fig. 1.28 Calculations on the zinc complex in Fig. 1.3 in the 6-31G basis set. [Plot: vertical axis from -2.0 to 3.0 versus Iteration (0 to 25); curves: HF, LDA.]

The change in the RH energy model is found as

$$\Delta E^{RH} = 2\mathrm{Tr}\, F (D_{n+1} - D_0^{idem}), \qquad (1.104)$$

where $D_0^{idem}$ is the reference density matrix, typically an averaged density $\bar{D}$ from the previous TRDSM step purified as in Eqs. (1.32)-(1.33), and $D_{n+1}$ is the new density found from diagonalization of the Fock matrix. In the C-shift scheme the criterion Eq. (1.31) ensures that the occupied and virtual orbitals do not mix, and thus the Hessian, Eq. (1.100), is positive and the RH energy decreases. The SCF energy change is found as

$$\Delta E^{RH}_{SCF} = E_{SCF}(D_{n+1}) - E_{SCF}(D_0^{idem}). \qquad (1.105)$$

The ratio between Eq. (1.104) and Eq. (1.105) contains information of the quality of the RH energy 

model. If the errors are negligible, the ratio is close to 1. If the ratio is larger than one, the RH 

energy model exaggerates the energy decrease, and if it is between 0 and 1 it underestimates the 

energy decrease. If it is negative, the SCF energy increases even though the RH energy model 

predicts an energy decrease. 
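The interpretation of the ratio can be written as a small helper. This is a sketch only: the function name, the verdict strings and the tolerance that decides when the ratio counts as "close to 1" are assumptions, and the energy changes in the usage lines are made-up numbers. It presupposes that the model energy change is negative, as is the case in the C-shift scheme.

```python
def rh_model_quality(delta_e_rh, delta_e_scf, tolerance=0.05):
    """Classify the ratio of Eq. (1.104) to Eq. (1.105).

    delta_e_rh  : change in the RH energy model (assumed negative)
    delta_e_scf : corresponding change in the SCF energy
    tolerance   : hypothetical threshold for 'close to 1'
    """
    ratio = delta_e_rh / delta_e_scf
    if ratio < 0.0:
        verdict = "SCF energy increases although the model predicts a decrease"
    elif abs(ratio - 1.0) <= tolerance:
        verdict = "errors negligible"
    elif ratio > 1.0:
        verdict = "model exaggerates the energy decrease"
    else:
        verdict = "model underestimates the energy decrease"
    return ratio, verdict

# Made-up energy changes, for illustration only.
for model_change, scf_change in [(-0.10, -0.10), (-0.30, -0.10),
                                 (-0.05, -0.10), (-0.10, 0.02)]:
    print(rh_model_quality(model_change, scf_change))
```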

For two test cases the ratio $\Delta E^{RH} / \Delta E^{RH}_{SCF}$ is displayed in Fig. 1.27 and Fig. 1.28, respectively. It is clearly seen that generally, the RH energy model is better for HF than for DFT; in particular, negative values are seen for the LDA ratios. The errors in the RH energy model for the LDA calculations get worse as convergence is approached, so it would be expected that the significant source of error is the neglected term W in the Hessian rather than the higher order terms. Since locally the lowest Hessian eigenvalue should be the one controlling the optimization, this hypothesis is tested by evaluating the lowest Hessian eigenvalue for both the RH energy model and for SCF according to Eq. (1.100) and Eq. (1.101), respectively, at convergence of the two test cases. The results are compared in Table 1-4.



Table 1-4 The lowest Hessian eigenvalues for the RH energy model and SCF energy at convergence of the calculations in Fig. 1.27 and Fig. 1.28. The deviation is found as $100\% \cdot ([E^{(2)}_{RH}]_{min} - [E^{(2)}_{SCF}]_{min}) / [E^{(2)}_{SCF}]_{min}$.

                          cadmium complex         zinc complex
                          HF        LDA           HF        LDA
$[E^{(2)}_{SCF}]_{min}$   0.557     0.017         1.000     0.290
$[E^{(2)}_{RH}]_{min}$    1.112     0.014         1.621     0.281
Deviation                 100%      -21%          62%       -2%
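The deviation defined in the caption can be recomputed from the tabulated eigenvalues. The sketch below uses only the numbers printed in Table 1-4; the dictionary layout and names are illustrative. Because the eigenvalues are given to three decimals, the recomputed percentages need not reproduce the tabulated ones exactly, in particular for the small LDA values.

```python
# Lowest Hessian eigenvalues as printed in Table 1-4.
eigenvalues = {
    ("cadmium", "HF"):  {"scf": 0.557, "rh": 1.112},
    ("cadmium", "LDA"): {"scf": 0.017, "rh": 0.014},
    ("zinc", "HF"):     {"scf": 1.000, "rh": 1.621},
    ("zinc", "LDA"):    {"scf": 0.290, "rh": 0.281},
}

def deviation_percent(rh, scf):
    """100% * (RH - SCF) / SCF, as in the table caption."""
    return 100.0 * (rh - scf) / scf

deviations = {case: deviation_percent(v["rh"], v["scf"])
              for case, v in eigenvalues.items()}

for case, value in deviations.items():
    print(case, round(value, 1))
```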

As expected, the lowest Hessian eigenvalue for the RH energy model, that is the HOMO-LUMO gap, is much smaller for LDA than for HF, but surprisingly it is seen that the Hessian prediction in the RH energy model for LDA is much better than the one for HF. Of course this is only the lowest eigenvalue, and we have not studied the corresponding eigenvector. We know for sure that the size of the orbital rotation parameters $\kappa_{ai}$ decreases during the optimization and should be very small at convergence, where only small adjustments to the density are made. It is thus difficult to imagine that terms of third and higher order in κ should be the reason for the larger errors in the RH energy model for LDA compared to HF.

This is a matter we will investigate further in the future since it is not understood at the moment. The importance of the higher order terms should be examined directly to understand how they affect the errors, and the Hessian should be studied more carefully, introducing information about the directions of the eigenvectors. However, it can still be concluded from Fig. 1.27 and Fig. 1.28 that the RH energy model is poorer for LDA than for HF optimizations.

1.5.2 The Quality of the TRDSM Energy Model 

The TRDSM energy model of Section 1.4.2.2 is formulated in a general manner and is as applicable 

to DFT theory as to HF theory. Still, the model will be poorer for DFT than for HF because of the 

general exchange-correlation term appearing in the DFT energy. 

For the DSM energy model there are in general four possible sources of errors:

1. The purified density $\tilde{D}$ still has an idempotency error.

2. The term $\tfrac{1}{2} D_\delta^T E_0^{[2]} D_\delta$ in $E(\tilde{D})$, Eq. (1.50), is neglected.

3. $E(\tilde{D})$, Eq. (1.50), is truncated after second order.

4. $E_0^{(2)} D_+$ in Eq. (1.50) is approximated by $2F_+$.



Let us take a closer look at the errors one by one. In ref. 39 a general order analysis of the purified density $\tilde{D}$ used in the parameterization of the DSM energy is given, and the results are summarized in Table 1-5.

Table 1-5. Comparison of the properties of the unpurified density $\bar{D}$ and the purified density $\tilde{D}$. c denotes the density expansion coefficients and κ the orbital rotation parameters that change $D_0$ to another density in the subspace $D_i$.

                     $\bar{D}$                                        $\tilde{D}$
Differences          $D_+ = \bar{D} - D_0 = O(c\|\kappa\|)$           $D_\delta = \tilde{D} - \bar{D} = O(c\|\kappa\|^2)$
Idempotency error    $\bar{D}S\bar{D} - \bar{D} = O(c\|\kappa\|^2)$   $\tilde{D}S\tilde{D} - \tilde{D} = O(c^2\|\kappa\|^4)$
Trace error          $\mathrm{Tr}\, \bar{D}S - N/2 = 0$               $\mathrm{Tr}\, \tilde{D}S - N/2 = O(c^2\|\kappa\|^4)$

In the $\tilde{D}$ column, the order of the idempotency correction $D_\delta$ and the idempotency error for $\tilde{D}$ are found. These are the same for DFT and HF; the idempotency error is of order $c^2\|\kappa\|^4$, and since $D_\delta$ is of the order $c\|\kappa\|^2$, the error connected to the neglect of the term of second order in $D_\delta$ will be of order $c^2\|\kappa\|^4$ as well.
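The orders quoted for the idempotency error can be illustrated with a two-orbital model. The sketch rests on two assumptions that are not confirmed by this section: the basis is taken as orthonormal (S = 1), and the purification of Eqs. (1.32)-(1.33), which are not reproduced here, is taken to be of the McWeeny type, 3DD - 2DDD. The averaged density mixes a reference projector with one rotated by an angle kappa, using weights 1 - c and c. Halving kappa should then reduce the idempotency error by about 4 before purification (order c kappa^2) and by about 16 after it (order c^2 kappa^4).

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def combine(a, b, wa, wb):
    """Return wa * a + wb * b, elementwise."""
    return [[wa * x + wb * y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def idempotency_error(d):
    """Largest element of D D - D (orthonormal basis assumed, S = 1)."""
    dd = matmul(d, d)
    return max(abs(x - y) for r1, r2 in zip(dd, d) for x, y in zip(r1, r2))

def mcweeny(d):
    """McWeeny-type purification 3 D^2 - 2 D^3 (assumed form of the purification)."""
    d2 = matmul(d, d)
    d3 = matmul(d2, d)
    return combine(d2, d3, 3.0, -2.0)

def averaged_density(c, kappa):
    """(1 - c) D0 + c D1, where D1 is D0 rotated by the angle kappa."""
    d0 = [[1.0, 0.0], [0.0, 0.0]]
    v0, v1 = math.cos(kappa), math.sin(kappa)
    d1 = [[v0 * v0, v0 * v1], [v0 * v1, v1 * v1]]
    return combine(d0, d1, 1.0 - c, c)

def errors(c, kappa):
    d_bar = averaged_density(c, kappa)
    return idempotency_error(d_bar), idempotency_error(mcweeny(d_bar))

before_1, after_1 = errors(0.3, 0.2)
before_2, after_2 = errors(0.3, 0.1)
print(before_1 / before_2, after_1 / after_2)
```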

The third possible source of errors is the truncation of the energy $E(\tilde{D})$ after second order in the density. Since the Hartree-Fock energy is quadratic in the density, this truncation leads to no errors for HF, but for DFT there will be an error of order $\|D_+\|^3$, and from the first column in Table 1-5 it is seen that it can be written as an error of order $c^3\|\kappa\|^3$, since $D_+$ is of the order $c\|\kappa\|$. Also, since the HF energy is quadratic in the density, no third derivative $E_0^{(3)}$ exists and thus the Taylor expansion used to find $E_0^{(2)} D_+ = 2F_+$ terminates for HF, but for DFT terms of order $\|D_+\|^2$ are neglected. Since $E_0^{(2)} D_+$ is multiplied by $D_+$ in the energy function Eq. (1.50), this gives an error for DFT of the order $\|D_+\|^3$ or as before $c^3\|\kappa\|^3$. The sizes of the introduced errors are summarized in Table 1-6.

Table 1-6. Comparison of the errors introduced in the DSM energy model for HF and DFT respectively.

                                                                    error in HF             error in DFT
1  Idempotency error $\tilde{D}S\tilde{D} - \tilde{D}$              $O(c^2\|\kappa\|^4)$    $O(c^2\|\kappa\|^4)$
2  Neglected term $\tfrac{1}{2} D_\delta^T E_0^{[2]} D_\delta$      $O(c^2\|\kappa\|^4)$    $O(c^2\|\kappa\|^4)$
3  Truncation of $E(\tilde{D})$                                     0                       $O(c^3\|\kappa\|^3)$
4  Approximation of $E_0^{(2)} D_+$                                 0                       $O(c^3\|\kappa\|^3)$

Depending on the sizes of c and $\|\kappa\|$ respectively, the error for DFT will be of same or lower order than the one for HF. To inspect whether or not the DSM energy is a poorer model for DFT than for HF, a number of calculations have been carried out, and the sizes of $\|D_\delta\|$ and $\|D_+\|$ for the DSM step in each iteration are examined. Since $D_\delta$ is of the order $c\|\kappa\|^2$ and $D_+$ is of the order $c\|\kappa\|$, the size of $\|D_\delta\|^2$ and $\|D_+\|^3$ will indicate whether the error in the energy model is controlled by the $O(c^2\|\kappa\|^4)$ or the $O(c^3\|\kappa\|^3)$ error. The test cases showed similar behavior, and results from HF and LDA calculations on the cadmium complex in Fig. 1.6 with a STO-3G basis and a H1-core start guess are displayed in Fig. 1.29 and Fig. 1.30.
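Which of the two error orders is the larger depends only on the relative size of c and the norm of κ, since their quotient is ‖κ‖/c. The helper below is a sketch with hypothetical names and illustrative inputs; it compares the bare orders and ignores all prefactors, which the order analysis does not provide.

```python
def dsm_error_orders(c, kappa_norm):
    """Return the two error orders of Table 1-6 (prefactors ignored)."""
    idempotency = c ** 2 * kappa_norm ** 4   # O(c^2 ||kappa||^4), HF and DFT
    truncation = c ** 3 * kappa_norm ** 3    # O(c^3 ||kappa||^3), DFT only
    return idempotency, truncation

def dominant_dft_error(c, kappa_norm):
    """Name the larger of the two orders for a DFT calculation."""
    idempotency, truncation = dsm_error_orders(c, kappa_norm)
    return "idempotency" if idempotency > truncation else "truncation"

# Illustrative values: the idempotency-type order dominates when ||kappa|| > c.
print(dominant_dft_error(0.1, 0.5))
print(dominant_dft_error(0.5, 0.1))
```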

Fig. 1.29 HF/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model. [Plot: logarithmic vertical axis from 1.E+00 to 1.E-10 versus Iteration (2 to 20); curves: $\|D_+\|_S^4$, $\|D_+\|_S^3$, $2\|D_\delta\|_S^2$, and the error in the DSM energy for HF.]

Fig. 1.30 LDA/STO-3G calculation. The size of different density norms compared to the actual error in the DSM energy model. [Plot: logarithmic vertical axis from 1.E+01 to 1.E-11 versus Iteration (2 to 23); curves: $\|D_+\|_S^4$, $\|D_+\|_S^3$, $\|D_\delta\|_S^2$, and the error in the DSM energy for LDA.]

The SCF energy at the end of a DSM step, $E^{DSM}_{SCF}$, is found by purifying the resulting $\bar{D}$ by Eqs. (1.32)-(1.33) and evaluating the SCF energy, Eq. (1.1), for this density. The DSM energy, Eq. (1.55), is also evaluated and the error of the DSM energy model is then found as the size of $E^{DSM}_{SCF} - E^{DSM}$. For the HF calculation this error is expected to be of the size $\|D_\delta\|^2$, and it is seen in Fig. 1.29 that this is actually the case; if $\|D_\delta\|^2$ is multiplied by 2, there is a remarkable fit. Also it is seen that if the error in the DSM energy for HF should be expressed in the density differences $D_+$, it would be the density differences to the third rather than the fourth order. For the DFT calculation the interesting point was to see whether or not $\|D_+\|^3$ is the controlling error. In Fig. 1.30 it is seen that even though there is not an obvious fit as for HF, $\|D_\delta\|^2$ seems to be the dominant error here as well. Still, if the error should be expressed in the density differences $D_+$, it would be the density differences to the third rather than the fourth order, as expected for DFT.

In conclusion it seems that the dominating error in the DSM energy both for HF and DFT is $\|D_\delta\|^2$, that is, the idempotency correction squared. In comparison it should be mentioned that the EDIIS model (ref. 37) by Kudin, Scuseria, and Cancès corresponds to $E(\bar{D})$ in Eq. (1.55) and thus has an error of the order $\|D_\delta\|$ compared to the SCF energy.

1.6 Convergence for Problems with Several Stationary Points 

The HF equation is a nonlinear equation and, therefore, has in principle several solutions. Several minima might exist, and even though it is typically preferred to find the global minimum, no optimization method can guarantee that. Furthermore, it cannot be tested if the minimum found is a local or the global minimum without knowledge of the whole surface. Depending on the start guess and the optimization approach, an optimization can converge to different stationary points. Further, it is necessary to decide in which subspace of orbital rotations the desired solution should be found, since a solution representing a stable stationary point in one subspace is not necessarily stable in another.

Orbital rotations can be divided into real and complex rotations and each of those can be further divided into singlet and triplet rotations. Each of those can then again be divided into rotations within the different point group symmetries. Generally, we do not consider the complex rotations, and we only optimize in the real space. Further, when optimizing a closed shell wave function, only the total-symmetric part of the singlet rotations is considered. A stationary point in the subspace of real, total-symmetric, singlet rotations can be shown through elementary arguments to be a stationary point for all types of rotations. However, a stationary point can be a maximum, a saddle point, or a minimum. One way to determine whether the stationary point is also a minimum is to evaluate the Hessian eigenvalues. This is done within the subspace in which the solution should be stable. If a negative Hessian eigenvalue is found in the subspace of singlet rotations, the stationary point is said to have a singlet instability, and if a negative Hessian eigenvalue is found in the subspace of triplet rotations, it is said to have a triplet instability (refs. 54, 56). Triplet instabilities are connected to breaking the symmetry between α and β orbitals. If a triplet instability is found, a minimum with a lower energy than the current stationary point can be found if the α and β parts are allowed to differ, typically leading to a solution which is not an eigenfunction of $\hat{S}^2$. Hence, the lower minimum could be found by an unrestricted HF (UHF) optimization. A singlet instability found in the total-symmetric subspace indicates that the current stationary point is a saddle point and a minimum with lower energy exists within the subspace. If a singlet instability is found outside the total-symmetric subspace, orbitals of different symmetries should be mixed to decrease the energy further, changing the symmetry of the resulting wave function.

The aufbau ordering rule assumes that occupying the orbitals of lowest energy also leads to the lowest Hartree-Fock energy. This cannot be proven to always apply for restricted HF as it can for UHF (ref. 57). Thus there is a risk, when the aufbau ordering is forced upon an optimization, that a lower energy with the aufbau ordering broken could exist. However, in a study by Dardenne et al. (ref. 58), in which different ordering schemes were tested, it was found in all cases that the minimum was an aufbau solution. The aufbau ordering was broken only for saddle points. In our schemes we always apply the aufbau ordering rule, but if the RH step is level shifted to the end of the optimization, it can force the convergence to a non-aufbau solution.



1.6.1 Walking Away from Unstable Stationary Points 

As concluded in the previous section, the Hessian eigenvalues should be tested to make sure the 

optimized state is stable. This is expensive, so it is only done when it is expected that the problem 

has several stationary points. Depending on the desired solution, only the relevant part of the 

Hessian is checked. So far we have only considered singlet instabilities, but currently tests for triplet 

instabilities are implemented as well. 

The check for singlet instabilities is made on the converged wave function by finding the lowest eigenvalue of the Hessian in the real, singlet subspace. If the lowest Hessian eigenvalue turns out to be positive, we are sure to have a solution which is stable with respect to singlet rotations, but if it is negative we are at a saddle point, and a minimum with a lower energy exists within the subspace. We have implemented in our SCF program the possibility to test the singlet Hessian and, in case of a negative lowest Hessian eigenvalue, follow the corresponding direction downhill and away from the saddle point. The scheme and some examples of its use will be described in the following.

1.6.1.1 Theory 

When the SCF optimization has converged, the set of optimized orbitals described by their expansion coefficients $C_{opt}$ is used to evaluate the lowest Hessian eigenvalues and the corresponding eigenvectors by an iterative subspace method. If the lowest Hessian eigenvalue $\varepsilon_{min}$ is found positive, then it is clear that the optimization has converged to a minimum. If on the other hand the eigenvalue is negative, we know for sure that a lower stationary point exists.

We would then like to take a step downhill in the direction x corresponding to the negative eigenvalue $\varepsilon_{min}$

$$E^{(2)}_{SCF}\, x = \varepsilon_{min}\, x. \qquad (1.106)$$

This can be accomplished by making a unitary transformation of the optimized expansion coefficients $C_{opt}$ with x as the orbital rotation parameters to define the direction $X_{dir}$ of the step

$$X_{dir} = \begin{bmatrix} 0 & -x_{ai}^T \\ x_{ai} & 0 \end{bmatrix}. \qquad (1.107)$$

The step length is controlled by a parameter α

$$U_\alpha = \exp(-\alpha X_{dir}), \qquad (1.108)$$

$$C'_{opt}(\alpha) = C_{opt} U_\alpha. \qquad (1.109)$$

A line search is then carried out for α > 0 to find the lowest SCF energy in the direction $X_{dir}$. This is of course expensive since every point in the line search requires an evaluation of the Fock matrix with respect to the new coefficients $C'_{opt}$. When the SCF energy minimum in the direction $X_{dir}$ is found, the corresponding coefficients are used as the initial orbitals for a new SCF optimization, hopefully now optimizing further downhill to a minimum. In problematic cases, e.g. with a very flat saddle point close to the minimum, we have found it convenient to continue the optimization with the line search scheme TRSCF-LS (the combination of TRRH-LS and TRDSM-LS described in Sections 1.4.1.4 and 1.4.2.4) to ensure a continued decrease in the energy.
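The procedure of this subsection (find the negative Hessian eigenvalue, follow its eigenvector downhill, and line search in the step length α) can be sketched on a model surface. The example below is not an SCF calculation: the two-variable function stands in for the SCF energy, the Hessian is diagonalized in closed form instead of by an iterative subspace method, and the golden-section search with an upper bound of 2 for α is an assumed choice of line search.

```python
import math

# Toy surface (assumption): saddle point at the origin, minima at x = +-1/sqrt(2).
def energy(x, y):
    return x ** 4 - x ** 2 + y ** 2

def hessian(x, y):
    return [[12.0 * x ** 2 - 2.0, 0.0], [0.0, 2.0]]

def lowest_eigenpair(h):
    """Lowest eigenvalue and normalized eigenvector of a symmetric 2x2 matrix."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    lam = 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b ** 2)
    if abs(b) > 1e-14:
        vec = [b, lam - a]
    else:
        vec = [1.0, 0.0] if a <= d else [0.0, 1.0]
    norm = math.hypot(vec[0], vec[1])
    return lam, [vec[0] / norm, vec[1] / norm]

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

start = (0.0, 0.0)                      # converged, but unstable, stationary point
eps_min, direction = lowest_eigenpair(hessian(*start))

def energy_along(alpha):
    """Energy at step length alpha along the negative-curvature direction."""
    return energy(start[0] + alpha * direction[0], start[1] + alpha * direction[1])

alpha_opt = golden_section(energy_along, 0.0, 2.0)
energy_at_min = energy_along(alpha_opt)
print(eps_min, alpha_opt, energy_at_min)
```

In the scheme described above each evaluation of the energy along the line costs one Fock matrix build, so a line search needing few function evaluations is preferable.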

1.6.1.2 Examples 

In Fig. 1.31 and Fig. 1.32 two examples of problems with several stationary points are given. 


Fig. 1.31 HF calculations on the rhodium complex. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-08 versus Iteration (0 to 60); curves: TRSCF d_orth-shift, TRSCF C-shift, Line search.]

Fig. 1.32 HF/STO-3G calculation on CrC. [Plot: logarithmic vertical axis from 1.E+02 to 1.E-08 versus Iteration (0 to 120); curves: TRSCF, Line search; plateaus marked (1), (2), (3).]

The first example is a HF optimization on the rhodium complex seen in Fig. 1.33 in the AhlrichsVDZ basis (ref. 59) combined with STO-3G on rhodium. For this example DIIS diverges, but the TRSCF scheme with C-shift converges nicely in 38 iterations. However, when the Hessian is inspected it is found that the lowest eigenvalue is negative, and a search in α is carried out in the direction corresponding to the negative eigenvalue. This is illustrated with the orange line in the figure. Since each evaluation of a step length α necessitates an evaluation of the Fock matrix, it is fair to display each line search step as an iteration on the SCF iteration scale. When a minimum is found in this direction, the corresponding orbitals are used as a start guess for a new TRSCF optimization, and it is seen that it now converges nicely to a new and lower stationary point which is found to be a minimum. When the d_orth-shift scheme is applied in the TRRH steps instead of the C-shift scheme, it turns out that convergence to the minimum is obtained with no problems, as seen from Fig. 1.31, illustrating how the stationary point found from an SCF optimization not only depends on the start guess, but also on the optimization procedure.

Fig. 1.33 Rhodium complex. [Molecular structure; atom labels: Rh, Cl.]



The second example is a HF/STO-3G optimization of CrC with a bond distance of 2.00 Å. The example is also used in Fig. 1.13 and Fig. 1.25, but without discussing the stability of the converged state. Also in this case DIIS diverges whereas TRSCF converges nicely in 12-13 iterations to a stationary point which is found to have singlet instabilities. As for the first example, a line search is carried out in the downhill direction and a new TRSCF optimization is started from the resulting orbitals. This time the second optimization has more problems than was the case for the rhodium example, but finally it converges to a minimum. Whereas in the rhodium case only one plateau corresponding to the saddle point could be seen, in this case three plateaus can be found, marked by numbers in the figure. The first is the saddle point that TRSCF converges to, at $E_{SCF}$ = -1068.77014939 and with a lowest Hessian eigenvalue of -0.624. The second and third stationary points are recognized as saddle points by TRSCF itself and it manages to move away. If a DIIS optimization is carried out with a Hückel start guess, it converges to the second stationary point, which has $E_{SCF}$ = -1069.21761813 and a lowest Hessian eigenvalue of -0.038, again demonstrating that depending on the optimization procedure and start guess, different stationary points can be found. It is thus necessary to check the Hessian of the result to know for sure that a minimum is found, and in this case the final minimum has $E_{SCF}$ = -1069.30090709 and a lowest Hessian eigenvalue of 0.043. CrC is well known for being a molecule with a complicated electronic energy surface and has been the object of several theoretical studies (ref. 60).

The scheme testing for singlet instabilities and walking away from unstable stationary points could 

be integrated more efficiently in the optimization than is done here. It can be seen from Fig. 1.31 

and Fig. 1.32 that the optimizations are completely converged before the Hessian check is made, 

spending many iterations improving the unwanted result. The check could be made at an earlier stage, saving a number of iterations. Also, the steps taken in the line search could be optimized such that fewer steps are necessary to find the minimum. In any case, it is convenient to have the possibility to continue an optimization until a minimum is found.

1.7 Scaling 

As mentioned in the introduction, it is now possible to apply ab-initio quantum chemical methods, 

in particular HF and DFT, to large molecular systems of interest for biology and nano-science. This 

is due to both the developments in integral screening and algorithms for the Fock matrix builder and 

to approaches avoiding diagonalization and exploiting sparsity in the matrices. Since the TRSCF 

scheme has properties which would be of great advantage for SCF calculations on large and 

complex molecules, it is crucial that the scheme can be formulated in a linear or near-linear scaling 

manner. We have not been concerned with the build of the Fock matrix, and any state-of-the-art, 

linear or near-linear scaling approach could be used as the Fock builder for our scheme. The steps to 


consider are thus the Roothaan-Hall step TRRH, which evaluates a new density matrix, and the 

density subspace minimization TRDSM, which improves convergence. In the following subsections 

the scaling of these steps will be discussed. 

1.7.1 Scaling of TRRH 

The TRRH scheme with C-shift described in Section 1.4.1.2 requires the diagonalization of a level 

shifted Fock matrix and knowledge of the occupied molecular orbital coefficients. The

diagonalization scales as $N^3$, as does a matrix multiplication, where $N$ is the dimension of the

problem, in this case the number of basis functions. However, a diagonalization is less efficient and

cannot be optimized nearly as well as a matrix multiplication, and thus the prefactor is much

larger for the diagonalization than for the matrix multiplication. Also, the matrix multiplication can

exploit sparsity and scale linearly in the number of non-zero elements, whereas sparsity is

not as easily exploited in diagonalizations. Furthermore, the molecular orbitals described by the

eigenvectors from the diagonalization of the Fock matrix are inherently delocalized and thus there is 

no sparsity to exploit. 

To obtain a linear scaling TRRH step it is thus necessary to avoid completely the diagonalizations 

and any reference to the MO basis. This can be done in our SCF program, a local version of

DALTON 38,49, by combining the $d_{\mathrm{orth}}$-shift scheme described in Section 1.4.1.5 with the trace

purification (TP) described in Section 1.4.1.6. 
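
To illustrate how a purification can replace the diagonalization, the following minimal sketch implements a generic trace-correcting purification of the kind introduced by Niklasson, using dense NumPy arrays in an orthonormal basis. It is not the implementation of Section 1.4.1.6: the function name `trace_correcting_purification`, the Gershgorin spectral bounds, the convergence threshold and the test matrix are all assumptions made for the example.

```python
import numpy as np

def trace_correcting_purification(F, n_occ, tol=1e-10, max_iter=200):
    """Density matrix of the n_occ lowest eigenvectors of a symmetric F
    (orthonormal basis), obtained with matrix multiplications only."""
    n = F.shape[0]
    # Gershgorin circles give cheap bounds on the spectrum of F.
    radii = np.sum(np.abs(F), axis=1) - np.abs(np.diag(F))
    e_min = np.min(np.diag(F) - radii)
    e_max = np.max(np.diag(F) + radii)
    # Map the spectrum into [0, 1] with the lowest eigenvalues closest to 1.
    X = (e_max * np.eye(n) - F) / (e_max - e_min)
    for iteration in range(1, max_iter + 1):
        X2 = X @ X
        if np.linalg.norm(X2 - X) < tol:  # X is idempotent to within tol
            return X, iteration
        # X^2 lowers the trace and 2X - X^2 raises it; choose the polynomial
        # that steers the trace towards the number of occupied orbitals.
        X = X2 if np.trace(X) > n_occ else 2.0 * X - X2
    raise RuntimeError("purification did not converge")

# Hypothetical test matrix with a clear gap between eigenvalues 4 and 5.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((12, 12)))
eigenvalues = np.concatenate([np.linspace(-2.0, -1.0, 4),
                              np.linspace(0.5, 3.0, 8)])
F = Q @ np.diag(eigenvalues) @ Q.T
D, n_iter = trace_correcting_purification(F, n_occ=4)
D_ref = Q[:, :4] @ Q[:, :4].T  # projector from the explicit eigenvectors
```

Only matrix products, additions and traces appear in the loop, which is why sparsity can be exploited once the matrices are stored in a sparse format.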

The trace purification scheme replaces the diagonalization of the level shifted Fock matrix and 

makes it possible to exploit sparsity in the matrices. A sparse blocked matrix storage scheme has 

been implemented for this purpose. In this scheme the columns and rows in the matrices are 

permuted such that close lying atoms are collected in blocks, making it possible to exploit the 

locality in the basis functions. Based on some drop tolerance for the size of matrix elements, pure 

zero blocks can be found and neglected, both saving storage and computing time. A library has been 

developed for the purpose of handling the matrix operations for this type of matrices and controlling 

the truncation error arising from the neglect of elements 49.
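
The idea of neglecting small blocks while keeping track of the resulting error can be sketched as follows. This is only an illustration with dense arrays, not the library referred to above; the function `drop_small_blocks`, the Frobenius-norm criterion and the exponentially decaying model matrix are assumptions of the sketch.

```python
import numpy as np

def drop_small_blocks(A, block_size, tol):
    """Zero every square block of A whose Frobenius norm is below tol.

    Returns the truncated copy, a boolean map of the blocks that were kept,
    and the Frobenius norm of everything that was dropped."""
    starts = range(0, A.shape[0], block_size)
    A_trunc = A.copy()
    kept = np.zeros((len(starts), len(starts)), dtype=bool)
    dropped_sq = 0.0
    for bi, i in enumerate(starts):
        for bj, j in enumerate(starts):
            norm = np.linalg.norm(A[i:i + block_size, j:j + block_size])
            if norm < tol:
                A_trunc[i:i + block_size, j:j + block_size] = 0.0
                dropped_sq += norm ** 2  # accumulate the truncation error
            else:
                kept[bi, bj] = True
    return A_trunc, kept, np.sqrt(dropped_sq)

# Model matrix whose elements decay with the distance between indices,
# mimicking basis functions ordered so that close-lying atoms are adjacent.
idx = np.arange(40)
A = np.exp(-1.5 * np.abs(idx[:, None] - idx[None, :]))
A_trunc, kept, error = drop_small_blocks(A, block_size=4, tol=1e-6)
```

In a real sparse blocked format only the kept blocks would be stored and multiplied; the returned error is the quantity that has to stay below the accuracy required of the matrix operation.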

Calculations have been carried out on glycine chains of different lengths in the 4-31G basis set on a

3.4 GHz Xeon/Nocona machine with EM64T architecture and the MKL BLAS+LAPACK library.

Timings have been made in the third iteration of the SCF optimization, measuring how much time 

(CPU) is spent in the TRRH step in the case of full matrices and diagonalizations of the level 

shifted Fock matrix (Diag./full) and in the case of sparse blocked matrices and the TP scheme 

(TP/sparse). The results are shown in Fig. 1.34. In both the full and the sparse case the $d_{\mathrm{orth}}$-shift scheme

is applied.



[Figure 1.34: plot of time in minutes (axis from 0 to 60) against the number of basis functions (axis from 400 to 3000) for the two series Diag./full and TP/sparse.]

Fig. 1.34 Timings of a TRRH step in case of diagonalizations of full matrices (Diag./full) and in case of trace purification of sparse blocked matrices (TP/sparse).

The crossover is already around 1500 basis functions, and it is clear how the diagonalization 

scheme quickly will become too time consuming if the number of basis functions is increased 

further. Of course, this is a linear molecule, as seen from Fig. 1.35, and the crossover will occur later

for more three-dimensional molecules. The TP method does not scale exactly linearly

because the transformation to the orthogonal basis gives rise to a quadratic term, but the

prefactor of the quadratic term is very small. It should be noted that the dynamic level shift

scheme typically takes 5-10 diagonalizations or trace purifications to find the optimal level shift in 

the first couple of iterations, and as the timings are from the third iteration, then not just one, but 

several diagonalizations or purifications are included in the timings in Fig. 1.34. Currently a full 

trace purification optimization (30-70 purification iterations) is carried out for each level shift tested 

to find the optimal level shift. It would be straightforward to optimize this process such that the purification

is converged less tightly for the level shifts that are tested and rejected than for the final optimal level shift.
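
The reason why several diagonalizations or purifications enter each timing can be seen from a small sketch of a level shift search. The sketch assumes an orthonormal basis, applies the shift as `F - mu * D_old`, and uses the Frobenius norm of the density change as a stand-in for the C-shift and $d_{\mathrm{orth}}$ criteria of Section 1.4.1; the function names, the bracketing strategy and the threshold `max_change` are hypothetical.

```python
import numpy as np

def density_from_fock(F, n_occ):
    """Projector onto the n_occ lowest eigenvectors of a symmetric matrix."""
    _, C = np.linalg.eigh(F)
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T

def find_level_shift(F, D_old, n_occ, max_change, rel_tol=1e-2):
    """Search for a small shift mu >= 0 with ||D(mu) - D_old||_F <= max_change,
    where D(mu) is built from the level shifted matrix F - mu * D_old."""
    n_solves = 0

    def step(mu):
        nonlocal n_solves
        n_solves += 1  # one diagonalization (or purification) per trial shift
        D = density_from_fock(F - mu * D_old, n_occ)
        return D, np.linalg.norm(D - D_old)

    D, change = step(0.0)
    if change <= max_change:
        return 0.0, D, n_solves
    mu_lo, mu_hi = 0.0, 1.0
    D_hi, change = step(mu_hi)
    while change > max_change:  # enlarge the shift until the step is accepted
        mu_lo, mu_hi = mu_hi, 2.0 * mu_hi
        D_hi, change = step(mu_hi)
    for _ in range(50):  # bisection between a rejected and an accepted shift
        if mu_hi - mu_lo <= rel_tol * mu_hi:
            break
        mu = 0.5 * (mu_lo + mu_hi)
        D, change = step(mu)
        if change <= max_change:
            mu_hi, D_hi = mu, D
        else:
            mu_lo = mu
    return mu_hi, D_hi, n_solves

# Hypothetical data: a random symmetric "Fock" matrix and a previous density.
rng = np.random.default_rng(4)
n, n_occ = 8, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)
D_old = density_from_fock(0.5 * (B + B.T), n_occ)
mu, D_new, n_solves = find_level_shift(F, D_old, n_occ, max_change=0.3)
```

Every call to `step` corresponds to one full diagonalization or one full purification, so the counter `n_solves` is the number that multiplies the cost of a single solve in timings such as those of Fig. 1.34.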

Fig. 1.35 Glycine chain. 

To conclude, the scaling of the TRRH scheme with C-shift is dominated by the diagonalization, and 

sparsity cannot be exploited. Still, with a good Fock builder it can run efficiently up to a couple of

thousand basis functions, but at some point the diagonalizations become too time consuming. For larger

systems the purification scheme with the $d_{\mathrm{orth}}$-shift scheme can be used with blocked sparse matrices,

resulting in near-linear scaling.


1.7.2 Scaling of TRDSM 

For the density subspace minimization, a set of linear equations, Eq. (1.66), is solved in each DSM

step, but only in the dimension of the subspace, which is much smaller than the number of basis

functions. Its cost is therefore insignificant compared to the matrix additions and multiplications

needed to set up the DSM gradient g and Hessian H for the linear equations. For TRDSM it is

thus only the number of matrix multiplications that determines the scaling. Nothing has to be

changed to exploit sparsity in the matrices, and linear scaling is automatically obtained from the

point where the number of non-zero elements in the matrices scales linearly. For full matrices the

scaling is formally $N^3$, where $N$ is the number of basis functions, but as mentioned in the previous

subsection this is not a problem as it is for the diagonalization, since matrix multiplications can be 

carried out with close to peak performance on computers. However, the number of matrix 

multiplications should be kept at a minimum as it affects the scaling factor. 

The number of matrix multiplications is dependent on the dimension of the subspace as the number 

of gradient and Hessian elements grows with the size of the subspace, but even though the Hessian 

is set up explicitly, the number of matrix multiplications only scales linearly with the dimension of 

the subspace. The expressions for the DSM gradient and Hessian are found in 0, and it is seen that if

only the matrices $\mathbf{F}\mathbf{D}_i$, $\mathbf{S}\mathbf{D}_i$, $\mathbf{F}\mathbf{D}_i\mathbf{S}$ and $\mathbf{D}\mathbf{S}\mathbf{D}_i$ are evaluated, then all the terms for a Hessian

element can be expressed as the trace of a product of two known matrices or their transposes. As the operation

$\mathrm{Tr}\,\mathbf{A}\mathbf{B}$ scales quadratically instead of cubically, the overall scaling of TRDSM will be $nN^3$ for full

matrices, where $n$ is the dimension of the subspace and $N$ the dimension of the problem. For sparse

matrices both the matrix multiplications and $\mathrm{Tr}\,\mathbf{A}\mathbf{B}$ scale linearly, but since $n^2$ traces are evaluated,

the overall scaling is $n^2N$. However, the trace operations have a very small prefactor.
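
The quadratic cost of the trace operation follows from the identity $\mathrm{Tr}\,\mathbf{A}\mathbf{B} = \sum_{ij} A_{ij}B_{ji}$, which never requires the product matrix itself. A minimal sketch with dense NumPy arrays, where the helper name `trace_of_product` and the random test matrices are assumptions of the example:

```python
import numpy as np

def trace_of_product(A, B):
    """Tr(AB) evaluated as sum_ij A_ij * B_ji, without forming AB."""
    return np.sum(A * B.T)

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 50))

direct = np.trace(A @ B)         # cubic: builds the full product first
cheap = trace_of_product(A, B)   # quadratic: one pass over the elements
```

For sparse matrices the same sum runs only over the elements that are non-zero in both factors, which is the origin of the linear scaling quoted above.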

In the TRSCF scheme with C-shift the diagonalizations are thus the dominating operations, but 

since both the TRRH and TRDSM step can be carried out without any reference to the MO basis 

and with matrix multiplications as the most expensive operations, the TRSCF scheme is near-linear 

scaling and has what it takes to be applied to really large molecular systems. It is still a work in 

progress to get all the parts working together, so unfortunately no large scale TRSCF calculations 

will appear in this thesis, and no benchmarks in which sparsity in the matrices is exploited for 

TRDSM can be presented, but the whole framework is in place. 

1.8 Applications 

In this section, numerical examples are given to illustrate the convergence characteristics of the 

TRSCF and ARH calculations. Comparisons are made with DIIS, the TRSCF-LS method, and the 

globally convergent trust-region minimization method (GTR) of Francisco et al. 26.



In Section 1.8.1 a set of small molecules used by Francisco et al. to illustrate the convergence

characteristics of GTR is considered. Next in Section 1.8.2 the convergence of calculations on three 

metal complexes is discussed for the DIIS, TRSCF and TRSCF-LS methods. 

1.8.1 Calculations on Small Molecules 

As an alternative to the RH diagonalization, Francisco et al. have developed an energy

minimization method (GTR), where an energy model is minimized by a trust-region minimization.

They have proven that it is a globally convergent algorithm, that is, no matter the starting point, the

iterative steps will converge towards a stationary point. The best results are obtained when they

combine GTR with DIIS and thereby let DIIS accelerate the convergence. To examine the

convergence characteristics of TRSCF and ARH compared to GTR, calculations have been carried

out in an attempt to reproduce the conditions given in the paper by Francisco et al. Thus HF

calculations have been carried out with a maximum of 10 previous density matrices for the

density subspace minimizations, and convergence is obtained when the difference between two

consecutive energies is smaller than $10^{-9}\,E_{\mathrm{h}}$. The results are given in Table 1-7; the numbers found

with our SCF program are on a white background, whereas results copied from the GTR paper are 

on a grey background. 

Table 1-7 Number of iterations in HF calculations performed by each algorithm in some test problems. The

geometry of the molecules and the results in grey are taken from the paper by Francisco et al. 26, and

GTR+DIIS is their globally convergent trust-region algorithm with DIIS acceleration.

Algorithm
                                  Our SCF program (white)                  GTR paper (grey)
Molecule    Basis    Start guess  DIIS    TRSCF     TRSCF          ARH     DIIS     GTR+DIIS
                                          C-shift   d_orth-shift
H2O         STO-3G   H1-core      7       7         7              6       5        5
            6-31G    H1-core      10      9         8              8       8        8
NH3         STO-3G   H1-core      7       8         7              6       7        7
            6-31G    H1-core      9       9         8              8       7        7
CO          STO-3G   H1-core      12      9         9              9       11       10
                     Hückel       8       8         8              -       7        7
CO(Dist) *  STO-3G   H1-core      39(a)   9         8              8       117(b)   10
                     Hückel       35      10        8              -       85       15
            6-31G    H1-core      24(a)   13        10             9       27(b)    115
                     Hückel       21(a)   10        10             -       36(b)    59
Cr2         STO-3G   H1-core      34(a)   14(a)     10(a)          12(a)   13       38
CrC         STO-3G   H1-core      29(a)   13(a)     11(a)          10(a)   (X)      29

* Distorted geometry – double bond length compared to CO 

(a) Negative Hessian eigenvalue. 

(b) Converged to a higher energy than some of the other algorithms 

(X) No convergence in 5001 iterations. 

Let us first consider the results obtained from our SCF program. Comparing the TRSCF results 

(both C-shift and $d_{\mathrm{orth}}$-shift) to the DIIS results, it is clear that the TRSCF method not only is an


improvement when DIIS cannot converge, but also for small simple examples the convergence of

TRSCF is as good as or better than that of DIIS. It is also observed that in five instances DIIS converges

to a stationary point which is not a minimum, while this happens in only two instances for TRSCF.

This suggests that the TRSCF algorithm has a lower tendency to converge to saddle points

than DIIS. Comparing the results obtained for TRSCF with the C-shift and the $d_{\mathrm{orth}}$-shift

schemes, only minor differences are seen for these small examples, but in all cases the $d_{\mathrm{orth}}$-shift

scheme converges as fast as or faster than the C-shift scheme. With the

ARH method the convergence is further improved compared to the TRSCF/$d_{\mathrm{orth}}$-shift scheme. It is

only a matter of saving a single iteration in some of the examples, but the tendency is clear. As the 

algorithm is still in the implementation phase, no numbers can currently be obtained with the 

Hückel start guess. 

Comparing now the results from our SCF program with the results from the GTR paper, the obvious 

peculiarity is the discrepancy between the DIIS results obtained by Francisco et al. and by us. A

plain DIIS optimization should be completely reproducible, but there is a difference of two out of 

seven iterations. These differences cannot be explained and make it more difficult to compare our 

results with theirs. Furthermore, it seems that they have not tested the Hessian eigenvalues at the

end; a lower energy is noted in their table only if they found one with some other start guess or optimization method,

and thus we cannot know for sure whether the given number of iterations corresponds to

convergence to a minimum. For Cr$_2$ and CrC it is very difficult to find the minimum, and several

saddle points exist where convergence can be obtained (see Section 1.6). It is thus an open question

whether the GTR+DIIS calculations for Cr$_2$ and CrC actually converge to a minimum or to a saddle

point as for the TRSCF methods. 

In the examples where GTR+DIIS gives an improvement compared to their DIIS results, TRSCF 

and ARH also give significant improvements to our DIIS results. For the distorted CO example, 

TRSCF and ARH show better convergence than GTR+DIIS even if the results could be compared 

directly. For all examples TRSCF and ARH converge in 6-14 iterations, whereas GTR+DIIS uses

between five and 115. However, as discussed in Section 1.4.1.3, DIIS does not perform well when 

the gradient and energy are not correlated as is often the case in the global region when using 

TRRH, and could very well be the case for GTR as well. TRRH should be combined with a density 

subspace minimization method in the energy (e.g. TRDSM), and the same probably applies for 

GTR. We would thus suggest an implementation of TRDSM in connection with GTR. 
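
The point that DIIS minimizes a gradient (error) norm rather than the energy can be made concrete with the textbook DIIS extrapolation step, sketched below. This is the generic Pulay-type scheme, not code from our SCF program or from the GTR paper; the function name `diis_coefficients` and the random error vectors are assumptions of the example.

```python
import numpy as np

def diis_coefficients(errors):
    """Weights c with sum(c) = 1 that minimize || sum_i c_i e_i ||."""
    m = len(errors)
    # Overlap matrix of the stored error (gradient) vectors.
    B = np.array([[np.vdot(ei, ej) for ej in errors] for ei in errors])
    # Augment with a Lagrange multiplier enforcing sum(c) = 1.
    A = np.zeros((m + 1, m + 1))
    A[:m, :m] = B
    A[:m, m] = -1.0
    A[m, :m] = -1.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    return np.linalg.solve(A, rhs)[:m]

# Hypothetical error vectors from three previous iterations.
rng = np.random.default_rng(3)
errors = [rng.standard_normal(10) for _ in range(3)]
c = diis_coefficients(errors)
combined = sum(ci * ei for ci, ei in zip(c, errors))
```

Nothing in this step refers to the energy: the weights are chosen solely to reduce the norm of the combined error vector, which is why the extrapolation can be unreliable when gradient norm and energy are poorly correlated.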

In conclusion it has been illustrated that the TRSCF and ARH methods have very nice convergence 

properties with improvements compared to DIIS in general and to GTR+DIIS as well, in case of 

more problematic examples. 



1.8.2 Calculations on Metal Complexes 

In reference 39 and throughout this part of the thesis, three molecules including transition metals 

have been used for examples, namely the molecules in Fig. 1.3, Fig. 1.6 and Fig. 1.33. In this 

section HF and LDA calculations on these metal complexes are given for DIIS, TRSCF and

TRSCF-LS. For all calculations a H1-core start guess has been employed and a maximum of 10 

matrices are used to define the subspace in the density subspace minimization. This is different 

from the examples given in ref. 39, where the subspace dimension never was larger than eight. 

Furthermore for the TRSCF calculations in ref. 39 the C-shift scheme was applied whereas in the 

calculations reported here, the $d_{\mathrm{orth}}$-shift scheme has been applied.

TRSCF-LS is the TRSCF line search method in which the TRRH-LS and TRDSM-LS steps 

described in Sections 1.4.1.4 and 1.4.2.4 are combined to set up an expensive, but highly robust 

method, in which the lowest SCF energy is identified by a line search at each step. The convergence 

results of the optimizations are shown in Fig. 1.36. For the cadmium complex the STO-3G basis set has

been applied; for the rhodium complex the AhlrichsVDZ basis set 59 has been applied, except for the

rhodium atom, which is described in the STO-3G basis; and for the zinc complex the 6-31G basis set has

been applied.

The convergence of the TRSCF and TRSCF-LS methods is comparable for all cases in Fig. 1.36, 

and in general the TRSCF calculations converge in fewer iterations than the TRSCF-LS calculations 

do. As mentioned the line search method TRSCF-LS is much more expensive than TRSCF, and the 

only reason for applying it instead of TRSCF is for very difficult examples, where convergence 

cannot be obtained in any other way. 

The convergence behavior of the DIIS method is somewhat more erratic than that of the TRSCF 

methods since it makes no use of Hessian information and therefore cannot predict reliably what 

directions will reduce the total energy. The HF calculation on the rhodium complex and the LDA 

calculation on the zinc complex both diverge for the DIIS method. In general the erratic behavior is 

in particular seen in the global region whereas in the local region, it converges as well as the 

TRSCF method. 


[Figure 1.36: six panels in two columns (HF, LDA) and three rows (A, B, C). Each panel has a logarithmic vertical axis from 1.E-08 to 1.E+02 and a horizontal axis labelled Iteration (0 to 20 for A, 0 to 40 for B and C), with the series DIIS, TRSCF and TRSCF-LS.]

Fig. 1.36 Convergence of HF and LDA calculations on (A) the cadmium complex from Fig. 1.6, 

(B) the rhodium complex from Fig. 1.33, and (C) the zinc complex from Fig. 1.3. 

For the examples presented both in this and the previous subsection, the TRSCF convergence is as 

good as or better than DIIS, and for problems where DIIS diverges, convergence is obtained with 

the TRSCF methods. It thus seems that TRSCF has the properties of a good black-box optimization 

algorithm. 



1.9 Conclusion 

In this part of the thesis the trust region SCF (TRSCF) algorithm is presented as a means to improve 

SCF convergence compared to methods typically used today, e.g. DIIS. In the TRSCF method, both

the Roothaan-Hall (RH) step and the density-subspace minimization (DSM) step are replaced by

optimizations of local energy models of the Hartree-Fock/Kohn-Sham energy $E_{\mathrm{SCF}}$. These local

models have the same gradient as the energy $E_{\mathrm{SCF}}$, but an approximate Hessian. By restricting the steps

of the TRSCF algorithm to the trust region of these local models, that is, to the region where the

local models approximate $E_{\mathrm{SCF}}$ well, smooth and fast convergence may be obtained.

The developments through the years in SCF optimization algorithms are reviewed, and it is found 

that the fundamental schemes used in TRSCF to improve convergence have been around for several 

years; DIIS is actually a subspace minimization in the gradient norm, and level shifts have been 

used to improve or force convergence since 1973. However, the level shifts have previously been

found on a trial and error basis as a constant parameter, whereas we advocate a dynamic level shift 

scheme in which the level shift is used to control the density change in the RH step. As such the 

level shift is optimized in each iteration to allow the density to change to the trust radius of the RH 

energy model, hence the name trust region Roothaan-Hall (TRRH) for our RH scheme. Also, the 

density subspace minimization has been improved compared to previous methods. An accurate 

energy model is constructed in the iterative subspace, where only minor approximations are made 

compared to the SCF energy. The trust region minimization of this energy model thus corresponds

well to a minimization of $E_{\mathrm{SCF}}$ in the iterative subspace, resulting in an energy decrease in each

trust region DSM (TRDSM) step. The TRRH and TRDSM steps in combination make up a 

successful scheme with a high convergence rate without compromising the control of the density 

changes in each step. 

Compared to refs. 38 and 39, an alternative level shift scheme ($d_{\mathrm{orth}}$-shift) for the TRRH step is

presented which does not control the density change through the overlap of the individual orbitals,

but instead controls the amount of new information added to the density subspace. Thus the $d_{\mathrm{orth}}$-shift

scheme does not contain any reference to the MO basis and can be used in connection with

alternatives to diagonalization. Also, it is found that the $d_{\mathrm{orth}}$-shift scheme leads to a faster

convergence since the former level shift scheme is too restrictive, ignoring the well known changes 

contained in the density subspace. 

For TRDSM, an improvement of the energy model is developed, in which a part of the term 

neglected in the DSM energy model compared to the SCF energy is recovered. However, the effects 

of the improvement are found to be rather small compared to the extra complexity added to the algorithm.


An energy minimization algorithm is presented as well, replacing the standard RH-diagonalization 

in the SCF optimization. The novel idea is to exploit the valuable information saved in the density 

subspace of the previous densities to construct an improved RH energy model (augmented 

Roothaan-Hall - ARH) and minimize this model instead of the RH model. This makes the TRDSM 

step redundant since a density subspace minimization now is included in the minimization of the 

RH energy model. We expect a faster convergence rate for ARH compared to TRSCF, mainly 

because the RH and DSM steps are merged to an energy model with correct gradient (not just in the 

subspace) and an approximate Hessian, which is improved in each iteration using the information 

from the previous density and Fock matrices. The preliminary results from the ARH energy

minimization seem promising, with convergence improvements compared to TRSCF, which

already had convergence rates as good as or better than those of DIIS.

The errors introduced in the TRRH and TRDSM energy models compared to the SCF energy are 

studied. Since the DFT and HF energy expressions differ, the errors in the energy models are 

potentially different for the two methods. It is found that the DSM energy model has the same error 

of the order $\|\mathbf{D}_\delta\|^2$ for both HF and DFT, where $\mathbf{D}_\delta$ is the idempotency correction we impose on the

averaged density. For the RH energy model it is found by inspecting test cases that the errors are 

larger for LDA than for HF, especially when convergence is approached. The error can be divided 

into two sources, namely the error in the RH Hessian compared to the SCF Hessian, and the size of 

the third and higher order contributions from the nonlinear terms in the SCF energy, which are not 

included in the RH energy model. By further tests it seems that the Hessian is better described in 

LDA than in HF, and since the errors are larger for LDA in particular close to convergence, it seems 

unlikely that the third and higher order terms are causing the difference. The question why larger 

errors are seen for LDA than for HF is thus still unanswered and it will be further investigated. 

The stability of stationary points is discussed and a method to test and walk away from unstable 

stationary points is described, and examples are given, where it has been applied. It is 

acknowledged that such a method is very valuable since otherwise a minimum could not have been 

found for the examples given. 

The scaling of TRSCF is also considered. An alternative to diagonalization has been implemented 

in our SCF program, where instead of diagonalizing the Fock matrix, the trace purification scheme 

by Palser and Manolopoulos 19 and later Niklasson 48 is used. The purification scheme in combination 

with the $d_{\mathrm{orth}}$-shift scheme makes the TRRH step scale near-linearly. The trace purification scheme

is linear scaling in an orthogonal basis, but since the optimization scheme is formulated in the nonorthogonal

AO basis, the transformation to an orthogonal basis has an $N^2$ scaling with a small

prefactor. Timings for the TRRH step with diagonalizations and with purifications are given, and it 



is seen that the trace purification scheme is a major improvement compared to diagonalization when 

more than a couple of thousand basis functions are needed. The TRDSM step is based on matrix

multiplications and additions, so by construction it will be linearly scaling when sparsity in the 

matrices is exploited. 

As illustrated in the examples throughout this part of the thesis and in the applications section, 

significant improvements to SCF convergence have been obtained. For both the TRSCF and ARH 

examples presented, the convergence is as good as or better than DIIS, and for problems where 

DIIS diverges, convergence is obtained with the TRSCF and ARH methods. The globally 

convergent trust region method by Francisco et al. 26 is found to be better only for the simplest

examples whereas for the rest, the TRSCF and ARH methods are found superior. The future success 

of the TRSCF method depends on a well optimized implementation of the diagonalization 

alternative combined with the dynamic level shift scheme, and sparsity being exploited in an 

efficient manner such that it can compete with the linear scaling SCF programs used today. The 

future success of the ARH method depends on finding efficient ways of solving the nonlinear 

equations corresponding to the minimization of the energy model. For this purpose different 

preconditioners will be tested. 

To conclude, there are still some adjustments that should be done to improve the algorithms, but the 

framework is in place. The SCF optimization algorithms presented in this thesis each make up a

black-box optimization scheme for HF and DFT, as each is a single scheme without any user adjustment

that leads to fast and stable convergence for both the simple and the problematic systems studied so far. We

are thus convinced that TRSCF and ARH are built to handle the optimization problems of the

future. 


Part 2 



The first part of this thesis was concerned with the optimization of the one electron density matrix 

for Hartree-Fock (HF) and density-functional theory (DFT). From such an optimized density, 

information about excited states and how the system reacts to a perturbation (e.g. an external 

electric field) may be obtained using response theory. Response theory and the derivation of 

molecular properties will be the subject of this part of the thesis. 

Response theory provides a rigorous approach for calculating molecular properties. As for the SCF 

optimization algorithms, the theory has usually been formulated in the molecular orbital (MO) basis 

which is inherently delocalized, making the matrices involved non-sparse. A reformulation in the local

atomic orbital (AO) basis is thus necessary to obtain linear scaling algorithms and permit 

calculations of properties for large systems. Such a reformulation, in which an exponential 

parameterization of the density matrix is employed, is given in a paper by Larsen et al. 61 . 

The AO formulation of the response functions has a number of advantages compared to the MO 

formulation, besides locality. The response equations and molecular property expressions are 

simpler in the AO basis as the involved matrices (e.g. the Fock and property matrices) enter the 

equations in the basis they are evaluated in originally. No transformation between bases is necessary 

in the AO formulation as it is in the MO formulation. The AO formulation is particularly convenient

for perturbation dependent basis sets. In the MO formulation a set of perturbation dependent 

orthonormal molecular orbitals must be introduced. These orbitals have no physical content and 

thus add artificial complexity to the problem. To exemplify the benefits of the AO formulation, the 

expression for the excited state geometrical gradient is derived in Section 2.4. 



In the conventional MO formulation, number operators are redundant and can be eliminated. 

However, in the AO basis the number operators are not redundant and must be included. Because of 

this, the proof of pairing in the solutions of the response equations cannot be directly taken from the 

MO basis to the AO basis. It is thus necessary to study the impact of the included number operators 

on the solver for the AO response equations. This has been done in Section 2.2, using the method of 

second quantization to formulate the AO based response equations. Implementation issues 

connected to solving the AO response equations are discussed in Section 2.3. In Section 2.5 a 

couple of simple examples are given, where the AO response solver is used to find ground and 

excited state properties. In Section 2.6 the results of this part of the thesis are summarized. 

2.2 AO Based Response Equations in Second Quantization 

In this section the linear response equations are derived for Hartree-Fock theory, but with minor 

technical changes they apply to DFT as well. The quadratic and higher response equations could 

equally well be derived in this formulation; however, this is not necessary to arrive at the basic 

conclusions. 

2.2.1 The Parameterization 

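Consider a set of atomic orbitals ($\chi_\mu$) with the real and symmetric metric $\mathbf{S}$. The creation and annihilation operators for the atomic orbitals fulfil the anticommutation relation

$$\left[a_\mu^\dagger, a_\nu\right]_+ = S_{\nu\mu} . \tag{2.1}$$

We will consider the following exponential operator

$$\hat{T} = \exp(i\hat{\kappa}) , \tag{2.2}$$

where $\hat{\kappa}$ is a Hermitian one-electron operator

$$\hat{\kappa} = \sum_{\mu\nu} \kappa_{\mu\nu}\, a_\mu^\dagger a_\nu , \tag{2.3}$$

$$\boldsymbol{\kappa}^\dagger = \boldsymbol{\kappa} . \tag{2.4}$$

To examine the action of $\exp(i\hat{\kappa})$, we consider the transformed creation operators

$$\tilde{a}_\mu^\dagger = \exp(i\hat{\kappa})\, a_\mu^\dagger \exp(-i\hat{\kappa}) . \tag{2.5}$$

It is seen that the transformed operators satisfy the same anticommutation relations as the untransformed operators

$$\begin{aligned}
\left[\tilde{a}_\mu^\dagger, \tilde{a}_\nu\right]_+ &= \left[\exp(i\hat{\kappa})\, a_\mu^\dagger \exp(-i\hat{\kappa}),\ \exp(i\hat{\kappa})\, a_\nu \exp(-i\hat{\kappa})\right]_+ \\
&= \exp(i\hat{\kappa}) \left[a_\mu^\dagger, a_\nu\right]_+ \exp(-i\hat{\kappa}) = S_{\nu\mu} .
\end{aligned} \tag{2.6}$$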


The exponential operators of Eq. (2.2) are therefore the manifold of operators that conserves the 

general metric S. In the special case where S = 1, the exponential operator reduces to the standard 

exponential operator occurring in the second quantization formalism of the molecular orbital based 

method. 46 

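Using the Baker-Campbell-Hausdorff expansion 46 and the anticommutation relation of Eq. (2.1), we get

$$\begin{aligned}
\tilde{a}_\mu^\dagger &= a_\mu^\dagger + i\left[\hat{\kappa}, a_\mu^\dagger\right] - \frac{1}{2}\left[\hat{\kappa}, \left[\hat{\kappa}, a_\mu^\dagger\right]\right] + \cdots \\
&= a_\mu^\dagger + i\sum_\nu (\boldsymbol{\kappa}\mathbf{S})_{\nu\mu}\, a_\nu^\dagger - \frac{1}{2}\sum_\nu \left[(\boldsymbol{\kappa}\mathbf{S})^2\right]_{\nu\mu} a_\nu^\dagger + \cdots \\
&= \sum_\nu \left[\exp(i\boldsymbol{\kappa}\mathbf{S})\right]_{\nu\mu} a_\nu^\dagger .
\end{aligned} \tag{2.7}$$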
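
The step from the first to the second line of Eq. (2.7) rests on a single commutator. As a worked illustration, and assuming in addition to Eq. (2.1) that the creation operators anticommute among themselves, $[a_\rho^\dagger, a_\mu^\dagger]_+ = 0$ (the usual second-quantization property, not restated in this section), the identity $[AB, C] = A[B, C]_+ - [A, C]_+B$ gives:

```latex
\begin{aligned}
\left[\hat{\kappa}, a_\mu^\dagger\right]
  &= \sum_{\rho\sigma} \kappa_{\rho\sigma}
     \left[a_\rho^\dagger a_\sigma, a_\mu^\dagger\right] \\
  &= \sum_{\rho\sigma} \kappa_{\rho\sigma}
     \left( a_\rho^\dagger \left[a_\sigma, a_\mu^\dagger\right]_+
          - \left[a_\rho^\dagger, a_\mu^\dagger\right]_+ a_\sigma \right) \\
  &= \sum_{\rho\sigma} \kappa_{\rho\sigma} S_{\sigma\mu}\, a_\rho^\dagger
   = \sum_{\rho} (\boldsymbol{\kappa}\mathbf{S})_{\rho\mu}\, a_\rho^\dagger .
\end{aligned}
```

Applying the same result to each $a_\rho^\dagger$ in the sum yields the double commutator $\sum_\nu [(\boldsymbol{\kappa}\mathbf{S})^2]_{\nu\mu} a_\nu^\dagger$, and repeating the argument order by order builds up the exponential series.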

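To further investigate the properties of the above exponential transformation, we next consider the transformation of a single determinant state $|0\rangle$ with $\exp(i\hat{\kappa})$

$$|\tilde{0}\rangle = \exp(i\hat{\kappa})\,|0\rangle . \tag{2.8}$$

The properties of $|\tilde{0}\rangle$ may be obtained by comparing the expectation values of transformed creation-annihilation operators

$$\tilde{\Delta}_{\mu\nu} = \langle\tilde{0}|\, a_\mu^\dagger a_\nu \,|\tilde{0}\rangle = \langle 0|\, \exp(-i\hat{\kappa})\, a_\mu^\dagger \exp(i\hat{\kappa}) \exp(-i\hat{\kappa})\, a_\nu \exp(i\hat{\kappa}) \,|0\rangle \tag{2.9}$$

with the expectation values of the untransformed operators

$$\Delta_{\mu\nu} = \langle 0|\, a_\mu^\dagger a_\nu \,|0\rangle . \tag{2.10}$$

To rewrite Eq. (2.9) in terms of Eq. (2.10) we use Eq. (2.7) to write the transformed creation and annihilation operators in terms of the untransformed operators

$$\begin{aligned}
\exp(-i\hat{\kappa})\, a_\mu^\dagger \exp(i\hat{\kappa}) &= \sum_\rho \left[\exp(-i\boldsymbol{\kappa}\mathbf{S})\right]_{\rho\mu} a_\rho^\dagger , \\
\exp(-i\hat{\kappa})\, a_\nu \exp(i\hat{\kappa}) &= \sum_\rho \left[\exp(i\mathbf{S}\boldsymbol{\kappa})\right]_{\nu\rho} a_\rho .
\end{aligned} \tag{2.11}$$

Substituting these expressions into Eq. (2.9) gives

$$\tilde{\boldsymbol{\Delta}} = \exp(-i\mathbf{S}\boldsymbol{\kappa}^T)\, \boldsymbol{\Delta}\, \exp(i\boldsymbol{\kappa}^T\mathbf{S}) . \tag{2.12}$$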

In Appendix B, it is shown that if $|0\rangle$ is a single determinant wave function, then $\boldsymbol{\Delta}$ fulfils Eqs.

(B-7), corresponding to the symmetry, trace, and idempotency conditions for the one-electron

density. We will now show that if $\boldsymbol{\Delta}$ fulfils these equations then so does $\tilde{\boldsymbol{\Delta}}$. The Hermiticity of $\tilde{\boldsymbol{\Delta}}$

follows from the Hermiticity of $\mathbf{S}$ and $\boldsymbol{\kappa}$ and will not be shown explicitly here. The trace relation is

shown as follows

61

Part 2 


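$$
\begin{aligned}
\mathrm{Tr}\,\tilde\Delta S^{-1} &= \mathrm{Tr}\,\Delta\exp(i\kappa^T S)\,S^{-1}\exp(-iS\kappa^T)\,S S^{-1} \\
&= \mathrm{Tr}\,\Delta\exp(i\kappa^T S)\exp(-i\kappa^T S)\,S^{-1} \\
&= \mathrm{Tr}\,\Delta S^{-1},
\end{aligned} \tag{2.13}
$$

where we have used the relation

$$
B^{-1}\exp(A)\,B = \exp(B^{-1}AB) . \tag{2.14}
$$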

The same relation may be used to show the idempotency relation 

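$$
\begin{aligned}
\tilde\Delta S^{-1}\tilde\Delta &= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S)\,S^{-1}\exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S) \\
&= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S)\exp(-i\kappa^T S)\,S^{-1}\Delta\exp(i\kappa^T S) \\
&= \exp(-iS\kappa^T)\,\Delta S^{-1}\Delta\exp(i\kappa^T S) \\
&= \exp(-iS\kappa^T)\,\Delta\exp(i\kappa^T S) = \tilde\Delta .
\end{aligned} \tag{2.15}
$$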

We can therefore conclude that $\tilde\Delta$ fulfils Eqs. (B-7) and $\exp(i\hat\kappa)|0\rangle$ is therefore a legitimate 

normalized single-determinant wave function. It can be shown that all matrices fulfilling Eqs. (B-7) 

can be obtained from an appropriate choice of κ, so the transformation of Eq. (2.8) is a complete 

parameterization. 
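The invariance shown in Eqs. (2.13) and (2.15) can also be checked numerically. The following is only a minimal sketch under stated assumptions: Eqs. (B-7) are not reproduced in this section, so the conditions are taken in the form used above (Hermiticity, conserved Tr ΔS⁻¹, and ΔS⁻¹Δ = Δ); the test matrices are random, and the helper names `expm_taylor`, `transform_density` and `example` are hypothetical.

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Matrix exponential from a truncated Taylor series (fine for small norms)."""
    result = np.eye(M.shape[0], dtype=complex)
    term = np.eye(M.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ M / n
        result = result + term
    return result

def transform_density(delta, S, kappa):
    """Eq. (2.12): exp(-i S kappa^T) Delta exp(i kappa^T S)."""
    left = expm_taylor(-1j * S @ kappa.T)
    right = expm_taylor(1j * kappa.T @ S)
    return left @ delta @ right

def example(n=5, n_occ=2, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed test data: a random, positive definite overlap matrix S
    M = rng.normal(size=(n, n))
    S = np.eye(n) + 0.1 * (M @ M.T)
    # S-orthonormal "occupied orbitals" taken from the columns of S^(-1/2)
    w, V = np.linalg.eigh(S)
    C_occ = ((V / np.sqrt(w)) @ V.T)[:, :n_occ]
    # Delta = S C C^T S satisfies Delta S^-1 Delta = Delta and Tr(Delta S^-1) = n_occ
    delta = S @ C_occ @ C_occ.T @ S
    # Random Hermitian kappa with a small norm
    K = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    kappa = 0.1 * (K + K.conj().T)
    return S, delta, transform_density(delta, S, kappa)

S, delta, delta_t = example()
S_inv = np.linalg.inv(S)
print(np.trace(delta_t @ S_inv).real)                 # trace is conserved
print(np.allclose(delta_t @ S_inv @ delta_t, delta_t))  # idempotency is conserved
```

The Taylor-series exponential is used only to keep the sketch free of dependencies beyond NumPy; it is adequate here because the norm of κS is small.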

2.2.2 The Linear Response Function 

We will now use the parameterization of Eq. (2.8) for an arbitrary single-determinant wave function 

to describe a Hartree-Fock wave function in an external, time-dependent field. The parameters in κ 

will become time-dependent and we will in the following develop equations for obtaining these 

parameters. The time-dependent Hamiltonian can be written as 

$$
H = H_0 + V^t , \tag{2.16}
$$

where $H_0$ is the Hamiltonian for the unperturbed system, and $V^t$ is a first-order perturbation. The perturbation will be turned on adiabatically, and $V^t$ can be expressed as

$$
V^t = \int_{-\infty}^{\infty} d\omega\, V^\omega \exp((-i\omega + \varepsilon)t) , \tag{2.17}
$$

where ε is a positive infinitesimal that ensures $V^t \to 0$ as $t \to -\infty$. The perturbation is required to be Hermitian, so we have the relation

$$
(V^\omega)^\dagger = V^{-\omega} . \tag{2.18}
$$

To determine the linear response function, we begin by considering the time dependence of the expectation value $\langle\tilde{0}|A|\tilde{0}\rangle$ of a one-electron operator A. We need only expand the wave function $|\tilde{0}\rangle$ of Eq. (2.8) to first order in the external perturbation to obtain the linear response:

$$
\hat\kappa = \hat\kappa^{(1)}_t + \hat\kappa^{(2)}_t + \cdots \tag{2.19}
$$

The zero-order contribution, $\hat\kappa^{(0)}_t$, vanishes as the unperturbed wave function $|0\rangle$ is assumed to be optimized for the zero-order Hamiltonian, so the Brillouin conditions in the AO basis hold

$$
\frac{\partial}{\partial\kappa_{\mu\nu}}\langle\tilde{0}|H_0|\tilde{0}\rangle = i\langle 0|[H_0, a^\dagger_\mu a_\nu]|0\rangle = 0 . \tag{2.20}
$$

Substitution of the expansion of $\hat\kappa$ into Eq. (2.8) gives to first order:

$$
\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle - i\langle 0|[\hat\kappa^{(1)}_t, A]|0\rangle . \tag{2.21}
$$

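Since the response functions are defined in the frequency rather than the time domain, we formulate the wave function corrections in the frequency space. By analogy with Eq. (2.17), we write

$$
\hat\kappa^{(1)}_t = \int_{-\infty}^{\infty} d\omega\, \hat\kappa^{(1)}_\omega \exp((-i\omega + \varepsilon)t) . \tag{2.22}
$$

Inserting Eq. (2.22) into Eq. (2.21) we obtain

$$
\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle - i\int_{-\infty}^{\infty} d\omega\, \langle 0|[\hat\kappa^{(1)}_\omega, A]|0\rangle \exp((-i\omega + \varepsilon)t) . \tag{2.23}
$$

Comparing Eq. (2.23) with the formal expansion of an expectation value in terms of a response function

$$
\langle\tilde{0}|A|\tilde{0}\rangle = \langle 0|A|0\rangle + \int_{-\infty}^{\infty} d\omega\, \langle\langle A; V^\omega\rangle\rangle_\omega \exp((-i\omega + \varepsilon)t) , \tag{2.24}
$$

we may identify the linear response function as

$$
\langle\langle A; V^\omega\rangle\rangle_\omega = -i\langle 0|[\hat\kappa^{(1)}_\omega, A]|0\rangle . \tag{2.25}
$$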

2.2.3 The Time Development of the Reference State 

Before the explicit time-dependent equations are set up for determining the time-dependent 

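parameters of κ, it is convenient to rewrite $\hat\kappa$, Eq. (2.3), as

$$
\hat\kappa = \sum_{\mu>\nu}\left(\kappa_{\mu\nu}\, a^\dagger_\mu a_\nu + \kappa^*_{\mu\nu}\, a^\dagger_\nu a_\mu\right) + \sum_\mu \kappa_{\mu\mu}\, a^\dagger_\mu a_\mu , \tag{2.26}
$$

which follows from the Hermiticity of $\hat\kappa$. The operators of $\hat\kappa$ may be collected in a vector (here in row form):

$$
\Lambda = \begin{pmatrix} Q^\dagger & D^\dagger & Q \end{pmatrix} , \tag{2.27}
$$

where the three classes of operators are defined as

$$
\begin{aligned}
Q^\dagger_m &= a^\dagger_\mu a_\nu , \quad \mu > \nu \\
D^\dagger_m &= a^\dagger_\mu a_\mu \\
Q_m &= a^\dagger_\nu a_\mu , \quad \mu > \nu .
\end{aligned} \tag{2.28}
$$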


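The parameters of κ may similarly be arranged in a vector

$$
\alpha^{(i)} = \begin{pmatrix} \kappa^{(i)}_{\mu\nu} \\ \kappa^{(i)}_{\mu\mu} \\ \kappa^{(i)*}_{\mu\nu} \end{pmatrix} \begin{matrix} \mu > \nu \\ \\ \mu > \nu \end{matrix} , \tag{2.29}
$$

such that

$$
\hat\kappa^{(i)} = \sum_m \alpha^{(i)}_m \Lambda_m . \tag{2.30}
$$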

Here the index m on Λ runs over all three classes of operators listed in Eq. (2.28). 

The single excitation operators $a^\dagger_\mu a_\nu$ have by Eqs. (2.27)-(2.28) been divided into a set of atomic orbital excitations, corresponding to µ > ν, and a set of atomic orbital deexcitations, corresponding to µ < ν. As the atomic orbital excitations and deexcitations have the same formal properties, this division does not have any physical content. However, the division will prove important when the paired structure of the response equations is investigated in Section 2.2.5. Note that it is not possible to exclude the number operators $a^\dagger_\mu a_\mu$ in the atomic orbital representation, whereas they are redundant in the standard molecular orbital formulation. 

In the presence of the time-dependent perturbation, we introduce the time transformed operator 

basis 

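$$
\tilde\Lambda^\dagger = \begin{pmatrix} \tilde{Q} \\ \tilde{D} \\ \tilde{Q}^\dagger \end{pmatrix} , \tag{2.31}
$$

where

$$
\tilde{Q}_m = \exp(i\hat\kappa)\, Q_m \exp(-i\hat\kappa) \tag{2.32}
$$

and similarly for $\tilde{Q}^\dagger_m$ and $\tilde{D}_m$.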

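The time evolution of $|\tilde{0}\rangle$ may now be determined using Ehrenfest’s theorem for the transformed operators of $\tilde\Lambda^\dagger$ in Eq. (2.31):

$$
\frac{d}{dt}\langle\tilde{0}|\tilde\Lambda^\dagger|\tilde{0}\rangle - \langle\tilde{0}|\left(\frac{\partial\tilde\Lambda^\dagger}{\partial t}\right)|\tilde{0}\rangle = -i\langle\tilde{0}|[\tilde\Lambda^\dagger, H_0 + V^t]|\tilde{0}\rangle . \tag{2.33}
$$

2.2.4 The First-order Equation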

We now expand Eq. (2.33) in orders of the external perturbation, restricting ourselves to terms that 

are linear in the amplitudes. Inserting Eq. (2.19) into Eq. (2.33) and collecting the terms linear in the 

perturbation, we obtain the first-order time-dependent equation 

$$
i\langle 0|[\Lambda^\dagger, \dot{\hat\kappa}^{(1)}_t]|0\rangle = -i\langle 0|[\Lambda^\dagger, V^t]|0\rangle + \langle 0|[\Lambda^\dagger, [H_0, \hat\kappa^{(1)}_t]]|0\rangle . \tag{2.34}
$$

To solve the time-dependent equation Eq. (2.34), we insert the frequency expansion of the wave 

function correction of Eq. (2.22) and of the external perturbation Eq. (2.17) 

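$$
\begin{aligned}
\int_{-\infty}^{\infty} d\omega\, &\exp((-i\omega + \varepsilon)t)\left(\omega\langle 0|[\Lambda^\dagger, \hat\kappa^{(1)}_\omega]|0\rangle - \langle 0|[\Lambda^\dagger, [H_0, \hat\kappa^{(1)}_\omega]]|0\rangle\right) \\
&= \int_{-\infty}^{\infty} d\omega\, \exp((-i\omega + \varepsilon)t)\left(-i\langle 0|[\Lambda^\dagger, V^\omega]|0\rangle\right) .
\end{aligned} \tag{2.35}
$$

The first-order response equation is then found as

$$
\omega\langle 0|[\Lambda^\dagger, \hat\kappa^{(1)}_\omega]|0\rangle - \langle 0|[\Lambda^\dagger, [H_0, \hat\kappa^{(1)}_\omega]]|0\rangle = -i\langle 0|[\Lambda^\dagger, V^\omega]|0\rangle . \tag{2.36}
$$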

The equation may be written in terms of the matrices

$$
E^{[2]}_{mn} = \langle 0|[\Lambda^\dagger_m, [H_0, \Lambda_n]]|0\rangle , \tag{2.37}
$$

$$
S^{[2]}_{mn} = \langle 0|[\Lambda^\dagger_m, \Lambda_n]|0\rangle , \tag{2.38}
$$

and the vector

$$
[V^{[1]}_\omega]_m = \langle 0|[\Lambda^\dagger_m, V^\omega]|0\rangle . \tag{2.39}
$$

Using Eqs. (2.37)-(2.39) and (2.29)-(2.30), we now write the first-order response equations, Eq. (2.36), in the form

$$
\left(E^{[2]} - \omega S^{[2]}\right)\alpha^{(1)} = iV^{[1]}_\omega , \tag{2.40}
$$

where $E^{[2]}$ and $S^{[2]}$ may be viewed as generalized electronic Hessian and overlap matrices. 61,62 The matrix elements $E^{[2]}_{mn}$ and $S^{[2]}_{mn}$ (Eqs. (2.37) and (2.38)) can be expressed as matrix multiplications and additions of the density, Fock and overlap matrices. 61

The linear response function is obtained by inserting the first-order correction as obtained in Eq. 

(2.40) in the expression for the linear response function Eq. (2.25). Renaming the perturbation 

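operator $V^\omega$ to B and introducing

$$
\begin{aligned}
A^{[1]}_m &= -\langle 0|[\Lambda_m, A]|0\rangle \\
B^{[1]}_m &= \langle 0|[\Lambda^\dagger_m, B]|0\rangle
\end{aligned} \tag{2.41}
$$

we obtain

$$
\langle\langle A; B\rangle\rangle_\omega = -A^{[1]}\left(E^{[2]} - \omega S^{[2]}\right)^{-1} B^{[1]} . \tag{2.42}
$$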

The linear response function may thus be calculated by solving one set of linear equations at each 

frequency. To be more explicit, denoting the solution vector to the linear response equation 

$$
N^B(\omega) = \left(E^{[2]} - \omega S^{[2]}\right)^{-1} B^{[1]} , \tag{2.43}
$$

the linear response function in Eq. (2.42) can be obtained as

$$
\langle\langle A; B\rangle\rangle_\omega = -A^{[1]} N^B(\omega) . \tag{2.44}
$$
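Eqs. (2.43) and (2.44) amount to one linear solve per frequency followed by a dot product. A minimal sketch of that step is given below; the 2×2 matrices are a hypothetical toy model chosen only so that the numbers can be checked by hand, and the function names are illustrative rather than taken from any program.

```python
import numpy as np

def response_vector(E2, S2, B1, omega):
    """Solve (E2 - omega*S2) N = B1 for the response vector, cf. Eq. (2.43)."""
    return np.linalg.solve(E2 - omega * S2, B1)

def linear_response(A1, E2, S2, B1, omega):
    """Linear response function -A1 . N^B(omega), cf. Eq. (2.44)."""
    return -A1 @ response_vector(E2, S2, B1, omega)

# Hypothetical toy model whose poles lie at omega = +2 and omega = -2
E2 = np.diag([2.0, 2.0])
S2 = np.diag([1.0, -1.0])
A1 = np.array([1.0, 1.0])
B1 = np.array([1.0, 1.0])

static = linear_response(A1, E2, S2, B1, 0.0)    # -(1/2 + 1/2) = -1
dynamic = linear_response(A1, E2, S2, B1, 1.0)   # -(1 + 1/3) = -4/3
print(static, dynamic)
```

A direct solve is only sensible for small model problems; for large systems the same equation is treated iteratively, as discussed in Section 2.3.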

2.2.5 Pairing 

The excitation energies are identified as the poles of the linear response function of Eq. (2.42) and 

are therefore solutions to the generalized eigenvalue problem 

$$
E^{[2]} X = \omega S^{[2]} X . \tag{2.45}
$$

In the MO formulation of response theory, it has been shown that the excitation energies are paired 63, so that if $\omega_i$ is an eigenvalue for Eq. (2.45) then so is $-\omega_i$. It is important to understand how 

pairing appears in the AO basis, in particular since this structural feature is exploited when the 

equations are solved iteratively as is necessary for large problems. This is further discussed in 

Section 2.3. Since the proof of the pairing given in the MO formulation cannot be directly 

transferred to the AO formulation due to the presence of the diagonal operators $D_m$, this section 

gives the proof in the AO formulation. 

The structure of $E^{[2]}$ and $S^{[2]}$ in the AO formulation is analyzed for the purpose of examining the pairing structure. Dividing Λ into the three classes of Eq. (2.28), the matrix $E^{[2]}$ may be written as

$$
E^{[2]} = \begin{pmatrix}
\langle 0|[Q, [H_0, Q^\dagger]]|0\rangle & \langle 0|[Q, [H_0, D]]|0\rangle & \langle 0|[Q, [H_0, Q]]|0\rangle \\
\langle 0|[D, [H_0, Q^\dagger]]|0\rangle & \langle 0|[D, [H_0, D]]|0\rangle & \langle 0|[D, [H_0, Q]]|0\rangle \\
\langle 0|[Q^\dagger, [H_0, Q^\dagger]]|0\rangle & \langle 0|[Q^\dagger, [H_0, D]]|0\rangle & \langle 0|[Q^\dagger, [H_0, Q]]|0\rangle
\end{pmatrix} . \tag{2.46}
$$

If we assume for simplicity that all orbitals and integrals for the unperturbed system are real, the elements of for example the block $\langle 0|[Q^\dagger, [H_0, Q]]|0\rangle$ are trivially rewritten as

$$
\begin{aligned}
\langle 0|[Q^\dagger_m, [H_0, Q_n]]|0\rangle &= \langle 0|[Q^\dagger_m, [H_0, Q_n]]|0\rangle^* \\
&= \langle 0|[Q_m, [H_0, Q^\dagger_n]]|0\rangle .
\end{aligned} \tag{2.47}
$$

The nine blocks in Eq. (2.46) can then all be written in terms of the following four matrices

$$
\begin{aligned}
A_{mn} &= \langle 0|[Q_m, [H_0, Q^\dagger_n]]|0\rangle , \\
B_{mn} &= \langle 0|[Q_m, [H_0, Q_n]]|0\rangle , \\
F_{mn} &= \langle 0|[Q_m, [H_0, D_n]]|0\rangle , \\
G_{mn} &= \langle 0|[D_m, [H_0, D_n]]|0\rangle ,
\end{aligned} \tag{2.48}
$$

and we obtain

$$
E^{[2]} = \begin{pmatrix} A & F & B \\ F^T & G & F^T \\ B & F & A \end{pmatrix} . \tag{2.49}
$$

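The matrix $S^{[2]}$ may in a similar way be written as

$$
S^{[2]} = \begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^T & 0 & -\Omega^T \\ -\Delta & -\Omega & -\Sigma \end{pmatrix} , \tag{2.50}
$$

where

$$
\begin{aligned}
\Sigma_{mn} &= \langle 0|[Q_m, Q^\dagger_n]|0\rangle , \\
\Delta_{mn} &= \langle 0|[Q_m, Q_n]|0\rangle , \\
\Omega_{mn} &= \langle 0|[Q_m, D_n]|0\rangle .
\end{aligned} \tag{2.51}
$$

Note that the block containing two diagonal operators vanishes as

$$
\langle 0|[D_m, D_n]|0\rangle = \langle 0|[a^\dagger_\mu a_\mu, a^\dagger_\nu a_\nu]|0\rangle = S_{\mu\nu}\langle 0|a^\dagger_\mu a_\nu|0\rangle - S_{\nu\mu}\langle 0|a^\dagger_\nu a_\mu|0\rangle = 0 . \tag{2.52}
$$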

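To illustrate how the pairing is obtained in the AO formulation, we assume that the vector

$$
X = \begin{pmatrix} Z \\ U \\ Y \end{pmatrix} \tag{2.53}
$$

is an eigenvector for Eq. (2.45) with eigenvalue ω

$$
\begin{pmatrix} A & F & B \\ F^T & G & F^T \\ B & F & A \end{pmatrix}\begin{pmatrix} Z \\ U \\ Y \end{pmatrix} = \omega\begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^T & 0 & -\Omega^T \\ -\Delta & -\Omega & -\Sigma \end{pmatrix}\begin{pmatrix} Z \\ U \\ Y \end{pmatrix} . \tag{2.54}
$$

Multiplying the blocks of Eq. (2.54) gives three sets of equations

$$
\begin{aligned}
AZ + FU + BY &= \omega\,(\Sigma Z + \Omega U + \Delta Y) \\
F^T Z + GU + F^T Y &= \omega\,(\Omega^T Z - \Omega^T Y) \\
BZ + FU + AY &= \omega\,(-\Delta Z - \Omega U - \Sigma Y) .
\end{aligned} \tag{2.55}
$$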

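We will now prove that the paired vector

$$
X^P = \begin{pmatrix} Y \\ U \\ Z \end{pmatrix} \tag{2.56}
$$

is an eigenvector for Eq. (2.45) with eigenvalue −ω

$$
\begin{pmatrix} A & F & B \\ F^T & G & F^T \\ B & F & A \end{pmatrix}\begin{pmatrix} Y \\ U \\ Z \end{pmatrix} = -\omega\begin{pmatrix} \Sigma & \Omega & \Delta \\ \Omega^T & 0 & -\Omega^T \\ -\Delta & -\Omega & -\Sigma \end{pmatrix}\begin{pmatrix} Y \\ U \\ Z \end{pmatrix} . \tag{2.57}
$$

Multiplying the blocks of Eq. (2.57) leads to the three sets of equations

$$
\begin{aligned}
AY + FU + BZ &= -\omega\,(\Sigma Y + \Omega U + \Delta Z) \\
F^T Y + GU + F^T Z &= -\omega\,(\Omega^T Y - \Omega^T Z) \\
BY + FU + AZ &= -\omega\,(-\Delta Y - \Omega U - \Sigma Z) ,
\end{aligned} \tag{2.58}
$$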

which are identical to Eqs. (2.55). It is thus concluded that if X is an eigenvector of Eq. (2.45) with 

eigenvalue ω, then $X^P$ is also an eigenvector with eigenvalue −ω. 
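The pairing argument rests only on the block structure of Eqs. (2.49) and (2.50): swapping the first and last blocks of a vector commutes with $E^{[2]}$ and anticommutes with $S^{[2]}$. The sketch below checks this numerically for random blocks. It is an illustration under assumptions, not the thesis implementation: the block contents are random numbers, and `build_matrices` and `pair` are hypothetical helper names.

```python
import numpy as np

def build_matrices(nq, nd, seed=0):
    """Random matrices with the block structure of Eqs. (2.49) and (2.50)."""
    rng = np.random.default_rng(seed)
    sym = lambda M: 0.5 * (M + M.T)
    A = sym(rng.normal(size=(nq, nq)))
    B = sym(rng.normal(size=(nq, nq)))
    G = sym(rng.normal(size=(nd, nd)))
    F = rng.normal(size=(nq, nd))
    Sigma = sym(rng.normal(size=(nq, nq)))
    M = rng.normal(size=(nq, nq))
    Delta = 0.5 * (M - M.T)                      # antisymmetric block
    Omega = rng.normal(size=(nq, nd))
    E = np.block([[A, F, B], [F.T, G, F.T], [B, F, A]])
    S = np.block([[Sigma, Omega, Delta],
                  [Omega.T, np.zeros((nd, nd)), -Omega.T],
                  [-Delta, -Omega, -Sigma]])
    return E, S

def pair(v, nq):
    """Map (Z, U, Y) to the paired vector (Y, U, Z), cf. Eq. (2.56)."""
    return np.concatenate([v[-nq:], v[nq:-nq], v[:nq]])

nq, nd = 3, 2
E, S = build_matrices(nq, nd)
X = np.random.default_rng(1).normal(size=2 * nq + nd)

# E X^P is the paired form of E X, and S X^P is minus the paired form of S X,
# so E X = w S X implies E X^P = -w S X^P.
print(np.allclose(E @ pair(X, nq), pair(E @ X, nq)))
print(np.allclose(S @ pair(X, nq), -pair(S @ X, nq)))
```

The check deliberately avoids a generalized eigenvalue solver, since a matrix with the block structure of Eq. (2.50) can be singular.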

2.3 Solving the Response Equations 

For large systems, the response equations 

$$
\left(E^{[2]} - \omega S^{[2]}\right) N^B(\omega) = B^{[1]} \tag{2.59}
$$

are best solved using iterative algorithms. These algorithms rely on the ability to set up linear transformations. Expressions for $E^{[2]}b$ and $S^{[2]}b$, where b is a trial vector, have previously been derived. 61

$$
\sigma = E^{[2]} b \tag{2.60}
$$

$$
\rho = S^{[2]} b . \tag{2.61}
$$

In each iteration, the response equations are set up and solved in a reduced space. For a reduced 

space consisting of k trial vectors, the equations can be written as 

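$$
\left(E^{[2]}_{RED} - \omega S^{[2]}_{RED}\right) X^{RED} = B^{[1]}_{RED} , \tag{2.62}
$$

where the reduced matrices are found as

$$
\begin{aligned}
\left[E^{[2]}_{RED}\right]_{ij} &= b_i^T E^{[2]} b_j = b_i^T \sigma_j \\
\left[S^{[2]}_{RED}\right]_{ij} &= b_i^T S^{[2]} b_j = b_i^T \rho_j \\
\left[B^{[1]}_{RED}\right]_{i} &= b_i^T B^{[1]} .
\end{aligned} \tag{2.63}
$$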

Normally when this type of iterative procedure is used, the reduced space is extended with one new 

trial vector in each iteration. However, due to the pairing described in the previous section, the 

linear transformations of $E^{[2]}$ and $S^{[2]}$ on a trial vector, here exemplified by $E^{[2]}b$,

$$
E^{[2]} b = \begin{pmatrix} A & F & B \\ F^T & G & F^T \\ B & F & A \end{pmatrix}\begin{pmatrix} Z \\ U \\ Y \end{pmatrix} = \begin{pmatrix} AZ + FU + BY \\ F^T Z + GU + F^T Y \\ BZ + FU + AY \end{pmatrix} = \sigma , \tag{2.64}
$$

may be obtained directly for the paired trial vector as well

$$
E^{[2]} b^P = \begin{pmatrix} A & F & B \\ F^T & G & F^T \\ B & F & A \end{pmatrix}\begin{pmatrix} Y \\ U \\ Z \end{pmatrix} = \begin{pmatrix} AY + FU + BZ \\ F^T Y + GU + F^T Z \\ BY + FU + AZ \end{pmatrix} = \sigma^P . \tag{2.65}
$$

The reduced space is therefore extended with both vectors without additional cost. Furthermore, 

when a trial vector and its paired counterpart are simultaneously added to the reduced space, the 

paired structure of the response equations is preserved. With this structure preserved, the 

eigenvalues in the reduced space will also be real and paired, and the lowest eigenvalue will 

monotonically decrease towards the converged value as the reduced space is increased. 64 

The solution vector in the reduced space, $X^{RED}$, can be expanded in the basis of trial vectors to express the solution vector in the full space

$$
N^B = \sum_{i=1}^{k} X^{RED}_i b_i . \tag{2.66}
$$

The residual can then be found as

$$
\begin{aligned}
R &= \left(E^{[2]} - \omega S^{[2]}\right) N^B - B^{[1]} \\
&= \sum_{i=1}^{k} X^{RED}_i\left(\sigma_i - \omega\rho_i\right) - B^{[1]} .
\end{aligned} \tag{2.67}
$$

If the norm of the residual is smaller than some specified tolerance, the iterative procedure is ended and the converged solution vector has been found

$$
N^B(\omega) = N^B . \tag{2.68}
$$

If the residual is too large, a new trial vector may be generated from the residual, preferably with a preconditioner A to speed up the convergence

$$
b_{k+1} = A^{-1} R . \tag{2.69}
$$

The reduced space is then extended with $b_{k+1}$ and $b_{k+2} = b^P_{k+1}$

and Eq. (2.62) is set up and solved 

again, establishing the iterative procedure. 
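The iterative procedure of Eqs. (2.59)-(2.69) can be sketched for a small dense model problem. The code below is only an illustration under assumptions: the matrices are passed in explicitly rather than applied through linear transformations, the preconditioner is taken to be the diagonal of $E^{[2]} - \omega S^{[2]}$ (not the MO-based preconditioner of Section 2.3.1), the trial vectors are orthonormalized, and `paired` and `solve_response` are hypothetical names.

```python
import numpy as np

def paired(v, nq):
    """Map (Z, U, Y) to (Y, U, Z); nq >= 1 is the length of the Z and Y blocks."""
    return np.concatenate([v[-nq:], v[nq:-nq], v[:nq]])

def solve_response(E, S, B, omega, nq, tol=1e-10, max_iter=100):
    """Reduced-space solution of (E - omega*S) N = B using paired trial vectors."""
    diag = np.diag(E) - omega * np.diag(S)     # assumed diagonal preconditioner
    basis = []                                 # orthonormal trial vectors b_i
    new = B / diag
    for _ in range(max_iter):
        # Add the new trial vector together with its paired counterpart
        for v in (new, paired(new, nq)):
            v = v / np.linalg.norm(v)
            for b in basis:                    # Gram-Schmidt against the basis
                v = v - (b @ v) * b
            norm = np.linalg.norm(v)
            if norm > 1e-8:                    # skip nearly dependent vectors
                basis.append(v / norm)
        b_mat = np.column_stack(basis)
        # A production code would only transform the newly added vectors
        sigma = E @ b_mat                      # Eq. (2.60)
        rho = S @ b_mat                        # Eq. (2.61)
        E_red = b_mat.T @ sigma                # Eq. (2.63)
        S_red = b_mat.T @ rho
        B_red = b_mat.T @ B
        x_red = np.linalg.solve(E_red - omega * S_red, B_red)   # Eq. (2.62)
        N = b_mat @ x_red                                        # Eq. (2.66)
        R = (sigma - omega * rho) @ x_red - B                    # Eq. (2.67)
        if np.linalg.norm(R) < tol:
            return N, len(basis)
        new = R / diag                                           # Eq. (2.69)
    raise RuntimeError("response equations did not converge")
```

For a model problem with the block structure of Eqs. (2.49) and (2.50) and a positive definite $E^{[2]}$, a call such as `N, k = solve_response(E, S, B, 0.1, nq)` returns the solution vector and the final dimension of the reduced space.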

2.3.1 Preconditioning 

As mentioned above, the residual found in each iteration should be preconditioned to obtain an 

effective solver. As a consequence of the strict AO formulation, the electronic Hessian has no 

diagonal dominance as was the case in the MO basis. This makes preconditioning a challenge. So 

far, this problem has not been solved in our SCF response solver. Instead, a transformation is made 

to the MO basis, where the preconditioning is carried out in the usual way using the orbital 

eigenvalue differences, 

$$
\left[b^{MO}_{k+1}\right]_{ai} = \frac{\left[C^T R_k C\right]_{ai}}{\varepsilon_a - \varepsilon_i} , \tag{2.70}
$$

where C contains the MO expansion coefficients and ε the orbital energies of the reference state. The index a refers to virtual orbitals and i refers to occupied orbitals. The resulting vector is then back transformed to the AO basis

$$
b_{k+1} = C\, b^{MO}_{k+1} C^T . \tag{2.71}
$$

An AO alternative to this preconditioner should of course be found, since the reference to the MO 

basis in this preconditioner introduces dense matrix intermediates. Moreover, at least one 

diagonalization should be carried out at the end of the optimization of the reference state to obtain 

the information on the MOs. 
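The preconditioning step of Eqs. (2.70) and (2.71) can be written in a few lines. This is a minimal sketch under assumptions that go beyond the text: the residual is handled as a square matrix in the AO basis, the orbitals are ordered with the occupied ones first, and all blocks other than the virtual-occupied block are simply left at zero. The function name `precondition_mo` is hypothetical.

```python
import numpy as np

def precondition_mo(R, C, eps, n_occ):
    """Scale the virtual-occupied block in the MO basis and transform back."""
    R_mo = C.T @ R @ C                               # AO -> MO, cf. Eq. (2.70)
    b_mo = np.zeros_like(R_mo)
    # Orbital energy differences eps_a - eps_i (a virtual, i occupied)
    gap = eps[n_occ:, None] - eps[None, :n_occ]
    b_mo[n_occ:, :n_occ] = R_mo[n_occ:, :n_occ] / gap
    return C @ b_mo @ C.T                            # MO -> AO, cf. Eq. (2.71)

# Hypothetical example: identity MO coefficients and four orbital energies
C = np.eye(4)
eps = np.array([-1.0, -0.5, 0.5, 1.5])
R = np.ones((4, 4))
b_new = precondition_mo(R, C, eps, n_occ=2)
print(b_new[2, 0], b_new[3, 1])
```

With identity coefficients the scaled elements can be read off directly, for instance 1/(0.5 + 1.0) for the element connecting the first occupied and first virtual orbital.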

2.3.2 Projections 

In the MO basis, the orbital rotations within the occupied and virtual spaces are redundant. The 

response equations in the MO formulation are thus simply set up in the non-redundant occupied-virtual space to avoid linear dependencies. In the AO basis no such separation exists and the 

equations are set up in the full space. To avoid redundancies in the AO formulation, projections 

onto the non-redundant space should be made. In the exponential parameterization of the density 

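matrix used in our AO formulation of the response functions, the projector 23

$$
\mathcal{P} = P \otimes Q + Q \otimes P , \qquad \mathcal{P}(X)_{\mu\nu} = \sum_{\rho\sigma} \mathcal{P}_{\mu\nu,\rho\sigma} X_{\rho\sigma} = \left(PXQ^T + QXP^T\right)_{\mu\nu} , \tag{2.72}
$$

where

$$
\begin{aligned}
P &= DS \\
Q &= 1 - DS ,
\end{aligned} \tag{2.73}
$$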

projects onto the non-redundant parameter space. It can be shown that all new trial vectors b and linear transformations σ and ρ should be projected onto the non-redundant space in the following manner

$$
\begin{aligned}
b_{k+1} &= \mathcal{P}\, b_{k+1} , \\
\sigma_{k+1} &= \mathcal{P}^T \sigma_{k+1} , \\
\rho_{k+1} &= \mathcal{P}^T \rho_{k+1} .
\end{aligned} \tag{2.74}
$$

When solving the response equations as described in the beginning of this section, the vectors 

projected as in Eq. (2.74) are used. 
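The action of the projector in Eqs. (2.72) and (2.73) on a matrix is two triple products. The sketch below applies it to random data and checks that applying it twice changes nothing, as expected of a projector. The overlap and density matrices are hypothetical test data (a random positive definite S and a density built from S-orthonormal orbitals), and `project` is an illustrative name.

```python
import numpy as np

def project(X, D, S):
    """Eq. (2.72): P X Q^T + Q X P^T with P = DS and Q = 1 - DS, Eq. (2.73)."""
    P = D @ S
    Q = np.eye(S.shape[0]) - P
    return P @ X @ Q.T + Q @ X @ P.T

# Hypothetical test data
rng = np.random.default_rng(0)
n, n_occ = 5, 2
M = rng.normal(size=(n, n))
S = np.eye(n) + 0.1 * (M @ M.T)                   # positive definite overlap
w, V = np.linalg.eigh(S)
C_occ = ((V / np.sqrt(w)) @ V.T)[:, :n_occ]       # columns of S^(-1/2)
D = C_occ @ C_occ.T                               # satisfies D S D = D
X = rng.normal(size=(n, n))

once = project(X, D, S)
twice = project(once, D, S)
print(np.allclose(once, twice))
```

The idempotency follows from PQ = QP = 0, which in turn relies on DSD = D for the reference density.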


2.4 The Excited State Gradient 

In this section the expression for the geometrical gradient of the singlet excited state is derived, to 

illustrate how expressions for properties can straightforwardly be derived in the AO response 

framework. 

As for the derivations in Section 2.2 we assume that the wave function of the ground state is optimized at the point of the potential surface, $x_0$, where the excited state gradient is evaluated. The variational condition is thus fulfilled at that point

$$
FDS - SDF = 0 , \tag{2.75}
$$

and the ground-state energy at $x_0$ is further obtained as

$$
E^0 = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,DG(D) + h_{nuc} , \tag{2.76}
$$

where h is the one-electron Hamiltonian matrix in the AO basis, $h_{nuc}$ is the nuclear-nuclear repulsion, G holds the two-electron AO integrals and the Fock matrix F is given by h + G(D).

As mentioned previously, the excitation energy corresponding to the excitation from the ground state $|0\rangle$ to the excited state $|f\rangle$ can be found from the poles of the linear response function for the optimized ground state, 62 i.e. as the eigenvalue of the linear response generalized eigenvalue equation, cf. Eq. (2.45),

$$
\left(E^{[2]} - \omega_f S^{[2]}\right) b^f = 0 , \tag{2.77}
$$

where $\omega_f$ is the electronic excitation energy

$$
\omega_f = E^f - E^0 \tag{2.78}
$$

and $b^f$ is the normalized eigenvector. 61,62 The excitation energy can then be obtained from Eq. (2.77) as

$$
\omega_f = b^{f\dagger} E^{[2]} b^f , \tag{2.79}
$$

assuming that the eigenvectors $b^f$ satisfy the normalization condition

$$
b^{f\dagger} S^{[2]} b^f = 1 . \tag{2.80}
$$

Since we are interested in the molecular gradient for the excited state, $|f\rangle$, the energy of the excited state should be defined at arbitrary points on the potential surface.

2.4.1 Construction of the Lagrangian 

The analytic expression for the excited state gradient is found using the Lagrangian technique. 65 We construct the Lagrangian for the excited state energy $E^f = E^0 + \omega_f$, using a matrix-vector notation,

$$
L^f = E^0 + b^{f\dagger} E^{[2]} b^f - \omega\left(b^{f\dagger} S^{[2]} b^f - 1\right) - X^\dagger\left(FDS - SDF\right) . \tag{2.81}
$$

The variational condition on the ground state, Eq. (2.75), and the orthonormality constraint on the eigenvectors, Eq. (2.80), are included, and they are multiplied by the Lagrange multipliers X and ω, respectively.

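We then require the Lagrangian to be variational in all parameters

$$
\frac{\partial L^f}{\partial X} = SDF - FDS = 0 \tag{2.82}
$$

$$
\frac{\partial L^f}{\partial\omega} = b^{f\dagger} S^{[2]} b^f - 1 = 0 \tag{2.83}
$$

$$
\frac{\partial L^f}{\partial b^{f\dagger}} = E^{[2]} b^f - \omega S^{[2]} b^f = 0 \tag{2.84}
$$

$$
\frac{\partial L^f}{\partial b^f} = b^{f\dagger} E^{[2]} - \omega b^{f\dagger} S^{[2]} = 0 \tag{2.85}
$$

$$
\frac{\partial L^f}{\partial X_m} = \frac{\partial E^0}{\partial X_m} + \frac{\partial\, b^{f\dagger} E^{[2]} b^f}{\partial X_m} - \omega\frac{\partial\, b^{f\dagger} S^{[2]} b^f}{\partial X_m} - \sum_n X_n \frac{\partial\left(FDS - SDF\right)_n}{\partial X_m} = 0 , \tag{2.86}
$$

where $X_m$ are the orbital rotation parameters. Due to the 2n + 1 rule, and since the gradient is a first-order property, we only need to solve the above equations through zero order. Eqs. (2.82)-(2.85) are 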

thus already taken care of, and it is seen that the multiplier ω is determined as the eigenvalue of the 

linear response equations, i.e. it corresponds to the excitation energy. It is then only necessary to 

determine the Lagrange multipliers X such that Eq. (2.86) is also fulfilled. 

2.4.2 The Lagrange Multipliers 

To evaluate the terms in Eq. (2.86), the asymmetric Baker-Campbell-Hausdorff (BCH) expansion 46 

of the exponentially parameterized density is applied 

$$
D(X) = \exp(-XS)\, D \exp(SX) = D + [D, X]_S + \cdots , \tag{2.87}
$$

where

$$
[A, B]_S = ASB - BSA . \tag{2.88}
$$

Since the derivatives are evaluated at the expansion point, only terms of first order in X are nonzero. The last term in Eq. (2.86) is found to be equal to 61

$$
E^{[2]} X = F[X, D]_S S - S[X, D]_S F + G([X, D]_S)DS - SDG([X, D]_S) . \tag{2.89}
$$

We can thus find X by solving the set of linear equations

$$
E^{[2]} X = \frac{\partial E^0}{\partial X} + \frac{\partial\, b^{f\dagger} E^{[2]} b^f}{\partial X} - \omega\frac{\partial\, b^{f\dagger} S^{[2]} b^f}{\partial X} . \tag{2.90}
$$

From the matrix expressions for $b^{f\dagger} E^{[2]} b^f$ and $b^{f\dagger} S^{[2]} b^f$ 61

$$
b^{f\dagger} E^{[2]} b^f = -\mathrm{Tr}\,F[[b^f, D]_S, b^{f\dagger}]_S - \mathrm{Tr}\,G([b^f, D]_S)[D, b^{f\dagger}]_S \tag{2.91}
$$

$$
b^{f\dagger} S^{[2]} b^f = \mathrm{Tr}\,b^{f\dagger} S[D, b^f]_S S \tag{2.92}
$$

and the relations for the two-electron integrals

$$
G^T(A) = G(A^T) \tag{2.93}
$$

$$
\mathrm{Tr}\,AG(B) = \mathrm{Tr}\,BG(A) , \tag{2.94}
$$

the terms on the right hand side of Eq. (2.90) are found as

$$
\frac{\partial E^0}{\partial X} = 0 , \tag{2.95}
$$

$$
-\omega\frac{\partial\, b^{f\dagger} S^{[2]} b^f}{\partial X} = -2\omega\left[SDS[b^f, b^{f\dagger}]_S S\right]^A , \tag{2.96}
$$

$$
\frac{\partial\, b^{f\dagger} E^{[2]} b^f}{\partial X} = ADS - SDA , \tag{2.97}
$$

where

$$
\begin{aligned}
A ={}& Sb^{f\dagger}\left(Fb^f S - Sb^f F\right) - \left(Fb^f S - Sb^f F\right)b^{f\dagger} S + G\left([b^f, [D, b^{f\dagger}]_S]_S\right) \\
& + 2\left[Sb^{f\dagger} G([b^f, D]_S) - G([b^f, D]_S)\, b^{f\dagger} S\right]^S
\end{aligned} \tag{2.98}
$$

and

$$
[M]^A = \tfrac{1}{2}M - \tfrac{1}{2}M^\dagger \tag{2.99}
$$

$$
[M]^S = \tfrac{1}{2}M + \tfrac{1}{2}M^\dagger . \tag{2.100}
$$

Eq. (2.95) is straightforward since the variational condition Eq. (2.75) is fulfilled at the expansion 

point. 

2.4.3 The Geometrical Gradient 

The excited state geometrical gradient should be expressed in terms of the first derivatives of the one- and two-electron integral matrices $h^x$, $G^x$, $S^x$ and the density, Fock and overlap matrices at the expansion point $x_0$. The notation $A^x$ denotes the geometrical first derivative of A. In ref. 66 it was found that the first derivative of the density $D^x(X)$ is given by the first derivative of the reference density matrix $D^x$ which, from the idempotency condition for D, is found to be

$$
D^x = -DS^xD . \tag{2.101}
$$

The first-order geometrical derivative is given by

$$
\frac{dE^f}{dx} = \frac{dL^f}{dx} = \frac{dE^0}{dx} + \frac{\partial\, b^{f\dagger} E^{[2]} b^f}{\partial x} - \omega\frac{\partial\, b^{f\dagger} S^{[2]} b^f}{\partial x} - X\frac{\partial\left(FDS - SDF\right)}{\partial x} . \tag{2.102}
$$

The first term is simply the geometrical gradient of the ground state. In ref. 66 this was shown to be

$$
E^{0x} = 2\,\mathrm{Tr}\,Dh^x + \mathrm{Tr}\,DG^x(D) + \mathrm{Tr}\,D^xF + h^x_{nuc} . \tag{2.103}
$$

The other terms are found as the derivative of the matrix expressions in Eqs. (2.91) and (2.92)

$$
\begin{aligned}
\frac{\partial\, b^{f\dagger} E^{[2]} b^f}{\partial x} ={}& -\mathrm{Tr}\left(F^x + G(D^x)\right)[[b^f, D]_S, b^{f\dagger}]_S - \mathrm{Tr}\,F[[b^f, D^x]_S, b^{f\dagger}]_S \\
& - \mathrm{Tr}\,F[[b^f, D]_{S^x}, b^{f\dagger}]_S - \mathrm{Tr}\,F[[b^f, D]_S, b^{f\dagger}]_{S^x} \\
& - \mathrm{Tr}\,G^x([b^f, D]_S)[D, b^{f\dagger}]_S - 2\,\mathrm{Tr}\,G([b^f, D]_S)\left([D^x, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^x}\right)
\end{aligned} \tag{2.104}
$$

$$
\begin{aligned}
-\omega\frac{\partial\, b^{f\dagger} S^{[2]} b^f}{\partial x} ={}& -\omega\,\mathrm{Tr}\,b^{f\dagger} S^x[D, b^f]_S S \\
& - \omega\,\mathrm{Tr}\,b^{f\dagger} S\left([D^x, b^f]_S S + [D, b^f]_{S^x} S + [D, b^f]_S S^x\right)
\end{aligned} \tag{2.105}
$$

$$
-X\frac{\partial\left(FDS - SDF\right)}{\partial x} = -2X\left[F^xDS + G(D^x)DS + FD^xS + FDS^x\right]^A , \tag{2.106}
$$

where $F^x = h^x + G^x(D)$. Collecting the various terms we obtain

$$
\begin{aligned}
\frac{\partial E^f}{\partial x} ={}& \mathrm{Tr}\left(2D - [[b^f, D]_S, b^{f\dagger}]_S - [D, X]_S\right)h^x - \mathrm{Tr}\,[D, b^{f\dagger}]_S\, G^x([b^f, D]_S) \\
& + \mathrm{Tr}\left(D - [[b^f, D]_S, b^{f\dagger}]_S - [D, X]_S\right)G^x(D) + h^x_{nuc} \\
& - \mathrm{Tr}\,D^x G\left([[b^f, D]_S, b^{f\dagger}]_S\right) - 2\,\mathrm{Tr}\left([D^x, b^{f\dagger}]_S + [D, b^{f\dagger}]_{S^x}\right)G([b^f, D]_S) \\
& - \mathrm{Tr}\,D^x G([D, X]_S) - \mathrm{Tr}\left([D^x, X]_S + [D, X]_{S^x}\right)F \\
& - \mathrm{Tr}\left([[b^f, D^x]_S, b^{f\dagger}]_S + [[b^f, D]_{S^x}, b^{f\dagger}]_S + [[b^f, D]_S, b^{f\dagger}]_{S^x}\right)F \\
& + \omega_f\,\mathrm{Tr}\,b^{f\dagger} S\left([b^f, D^x]_S S + [b^f, D]_{S^x} S + [b^f, D]_S S^x\right) \\
& + \omega_f\,\mathrm{Tr}\,b^{f\dagger} S^x[b^f, D]_S S ,
\end{aligned} \tag{2.107}
$$

where $G([[b^f, D]_S, b^{f\dagger}]_S)$, $G([D, X]_S)$, $G([b^f, D]_S)$ and F can be evaluated, whereas $G^x(D)$, $G^x([b^f, D]_S)$, $h^x$ and $h^x_{nuc}$ have to be evaluated for each geometrical perturbation.

S 

Note that no two-electron integrals are represented explicitly. To obtain the best performance, e.g. for linear scaling codes, no reference should be made to four-index integrals. 

2.4.4 The First-order Excited State Properties 

The expression for the first-order one-electron excited state properties for perturbation independent 

basis sets is obtained from the expression for the excited state gradient by omitting all two-electron 

derivative terms, as well as all terms involving the derivative of the overlap matrix 

$$
\langle f|h^x|f\rangle = 2\,\mathrm{Tr}\,Dh^x - \mathrm{Tr}\left([[b^f, D]_S, b^{f\dagger}]_S + [D, X]_S\right)h^x + h^x_{nuc} . \tag{2.108}
$$

The first and last terms in Eq. (2.108) correspond to the ground state first order property as seen 

from Eq. (2.103). 

2.5 Test Calculations 

To illustrate the possibilities of an AO response solver in connection with our SCF optimization 

program, test calculations have been carried out on problematic cases from the first part of the 

thesis. The lowest excitation energy and the average polarizability, both static and in a field with ω = 0.03 a.u., have been found for the zinc complex in Fig. 1.3 and the rhodium complex in Fig. 1.33. 

The levels of theory chosen are those where DIIS could not optimize the reference state, namely 

LDA/6-31G for the zinc complex and HF/AhlrichsVDZ with STO-3G on the rhodium for the 

rhodium complex. 

Table 2-1 Ground state properties obtained with our AO response solver. All numbers are in a.u.

                                       Average polarizability     Excitation
System             Level of theory     static      ω = 0.03       energy
Rhodium complex    HF/AhlrichsVDZ      170.598     173.349        0.0938
Zinc complex       LDA/6-31G           161.406     162.517        0.0713

The basis sets applied in the test calculations are not satisfactory for serious polarizability 

calculations, and the numbers only demonstrate the perspectives of the AO response solver in 

combination with the SCF optimization algorithms described in Part 1. When the solver is fully 

implemented in the AO basis, we will be able to obtain molecular properties for large complex 

molecules in a routine manner. 

The implementation of the excited state gradient is a work in progress. So far we have implemented 

calculation of first-order one-electron properties of the excited state for perturbation independent 

basis sets as described in Section 2.4.4. The excited state dipole moment of the Rhodium complex 

from above has been found as 

Rh 

Cl 

µ = 5.960a.u. 

Again it should be noted that the basis set is insufficient for this type of calculation. This is only to 

demonstrate that it can be done. 




The atomic orbital (AO) based response equations have been derived using the second quantization 

framework. In particular, the proof of pairing is considered. Since the diagonal elements in κ are not 

redundant in the AO basis, the proof given in the MO basis cannot be directly applied. However, it 

is shown that there is also pairing in the AO basis. 

An AO response solver has been implemented similar to the solver in the MO basis with a few 

exceptions. The lack of diagonal dominance in the electronic Hessian in the AO basis makes 

preconditioning a difficult task. Optimally, the AO solver should be implemented in a linear scaling 

manner with only matrix multiplications and additions, and without reference to the MO basis. 

However, currently a transformation is made to the MO basis where the preconditioning is carried 

out followed by a transformation back to the AO basis. The redundant orbital rotations, which are 

simply left out of the MO equations, are removed in the AO formulation using projection operators. 

The response equations and molecular property expressions are simpler in the AO formulation than 

in the MO formulation. To demonstrate how expressions for properties can easily be derived in the 

AO response framework, the expression for the geometrical gradient of the singlet excited state has 

been derived. 

To illustrate the possibilities of the AO optimization methods presented in Part 1, joined with the 

AO response solver presented in this part of the thesis, test calculations are given for cases where 

DIIS diverged when optimizing the reference state. The averaged polarizability and the lowest 

excitation energy are given as well as the excited state dipole for one of the examples. 

The derivation and implementation of the various molecular properties is straightforward in the AO 

formulation compared to the MO formulation as exemplified by the excited state geometrical 

gradient. Especially the derivation of higher derivatives of molecular properties is simplified, and it 

will thus be natural to expand our response program in this direction. However, before calculations 

of molecular properties of large and complex molecules can be carried out in a truly linear scaling 

framework, the problems related to preconditioning of the AO solver must be solved. 


Part 3 



To corroborate the reliability of ab initio quantum chemical predictions of molecular properties, it is 

important to investigate and describe strengths and weaknesses of the many-electron models 

through systematic benchmark studies on different kinds of molecules. 

Regarding open-shell molecules, benchmarks have been reported comparing open- and closed-shell 

molecules examining the accuracy of molecular properties computed by various many-electron 

models. In a study of the atomization energies of 11 small molecules 67 no significant difference in 

the performance for closed- and open-shell molecules was found for the CCSDT model. However, 

in another study 68 it was found that even though the CCSD(T) model performs convincingly for 

closed-shell molecules, the performance for open-shell molecules is less impressive. 

In this part of the thesis full configuration interaction (FCI) benchmarks of molecular properties for 

the small open-shell molecules CN and CCH are presented. In the FCI model, all Slater 

determinants arising from distributing the electrons in the given one-electron basis with correct 

symmetry and spin-projection are included. Errors due to truncation of the many-electron basis are 

thus eliminated in an FCI calculation and it provides important benchmarks for other many-electron 

models. For open-shell molecules, the number of FCI benchmarks is limited and the work presented 

in this part of the thesis is an attempt to improve on this situation. We thus hope our results will 

serve as valuable benchmarks for further analysis of open-shell methods. 

3.2 Computational Methods 

All calculations have been carried out with the quantum chemical program package LUCIA 69 , using 

integrals and Hartree-Fock (HF) orbitals obtained from the DALTON 70 program. The calculations 



are based on a ROHF reference wave function, but no spin-adaption is imposed in the CI and CC 

calculations. 

All FCI calculations have been carried out in Dunning's cc-pVDZ 71 basis set. Since the number of determinants in the FCI model increases exponentially with the number of basis functions and electrons, it is currently not feasible to do the FCI calculations on CN and CCH in the cc-pVTZ basis. As the cc-pVDZ basis does not provide accurate geometries and energetics, 46 we will also obtain the equilibrium geometry, harmonic frequency, and dissociation energy for CN using the cc-pVTZ 71 basis set in coupled cluster calculations, including up to quadruple excitations. In addition, FCI and CC calculations up to the quadruples level have been carried out on CN and CN⁻ in the aug-cc-pVDZ basis set without the diffuse d-functions (aug´-cc-pVDZ) to obtain the vertical electron affinity of CN.
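
The growth in the number of determinants can be illustrated with a minimal sketch. It assumes that all electrons are distributed among all orbitals, counts determinants of a fixed spin projection, and ignores the reduction from point-group symmetry. The function name and the orbital counts in the loop are hypothetical; they are not the dimensions of the calculations reported here.

```python
from math import comb


def fci_determinants(n_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Number of Slater determinants with a fixed spin projection.

    Counts every way of placing n_alpha alpha electrons and n_beta beta
    electrons in n_orbitals spatial orbitals. Point-group symmetry, which
    would reduce the count further, is ignored (an assumption of this sketch).
    """
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


if __name__ == "__main__":
    # Hypothetical illustration: 13 electrons (7 alpha, 6 beta) in a growing
    # orbital space. The orbital counts are not taken from the thesis.
    for n_orbitals in (10, 20, 30, 40):
        print(n_orbitals, fci_determinants(n_orbitals, 7, 6))
```

The count is a product of two binomial coefficients, so every added basis function multiplies the number of determinants by a factor that itself grows with the number of electrons.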

We investigate two ways of defining the excitation-level in CC. The typical approach is to let the 

excitation level identify the allowed number of orbital excitations, denoted CC(orb). If instead the 

excitation level is taken to identify the spin-orbital excitation level, selected excitations, which 

involve spin-flipping and other internal excitations, are excluded from the calculation for open-shell 

molecules. This scheme will be referred to as CC(spin-orb). The difference between the two 

definitions of the excitation level is illustrated in Fig. 3.1. The CI calculations will all be carried out 

with orbital excitations. 

Fig. 3.1 An excitation which would be included in a CCSD(orb) calculation, but not in a CCSD(spin-orb) calculation. [Diagram labels: double orbital excitation; triple spin-orbital excitation.]

In the following SD, SDT, SDTQ, SDTQ5, SDTQ56 and SDTQ567 denote excitation-spaces which 

include up to 2, 3, 4, 5, 6 and 7 excitations from the occupied spin-orbitals respectively. 


3.3 Numerical Results 

First, the convergence of the CC and CI hierarchies for the open shell molecule CN is studied. Next, 

the potential curve for CN is obtained from CCSD, CCSDT, CCSDTQ, and FCI calculations at 

various inter-nuclear distances. In Section 3.3.3, the equilibrium geometries, harmonic frequencies, 

and dissociation energies obtained for CN are presented and in Section 3.3.4 the vertical electron 

affinity for CN is found. Finally, in Section 3.3.5 a minor benchmark study is presented where the 

equilibrium geometry of the intergalactic radical CCH is determined at the FCI level. 

3.3.1 Convergence of CC and CI Hierarchies 

The convergence of the CC and CI hierarchies is studied first. For CN, calculations have been carried out at the experimental equilibrium distance 72 r_exp = 1.1718 Å at the levels CCSD through CCSDTQ56. Both the orbital excitation and spin-orbital excitation approaches are considered. In addition, calculations have been carried out at the levels CISD through CISDTQ567 and in FCI. In all calculations the cc-pVDZ basis set is used. The results are seen in Fig. 3.2.

Fig. 3.2 E_dev for CC with spin-orbital and orbital excitation levels and for CI with orbital excitation levels. E_dev = E − E_FCI. [Plot: E_dev / E_h on a logarithmic axis from 1.E-06 to 1.E-01 versus excitation level SD through SDTQ567; curves CI, CC(spinorb), CC(orb).]

The first thing to note is the similarity of the two CC curves. Clearly, the spin-orbital excitation restriction does not affect the accuracy in a significant way. The deviation energies are in all cases smaller for CC(orb), but the difference is negligible.

Comparing the CI curve with the CC curves, two trends are obvious: the smooth convergence of the CC hierarchy compared to the CI hierarchy and the faster convergence of the CC hierarchy. The CC

energy obtained using up to n-fold excitations is roughly as accurate as the CI energy using up to 

n+1-fold excitations. Both phenomena are explained by the inclusion of disconnected clusters in the 

CC wave function. At a given level of CC theory, the CC wave function includes all the CI 

configurations at the same level of CI theory plus some higher excitations arising from disconnected 

clusters. Consequently, it covers the dynamical correlation better than CI and is thus at the given 



level closer to the FCI solution. The form of the curves can be predicted by describing the convergence pattern of the CI and CC hierarchies through orders of Møller-Plesset perturbation theory (MPPT). 73 Because disconnected products of excitations are also included in the CC ansatz, the MPPT order of its error increases at every excitation level. Going from odd to even excitation levels, both methods have an increase in the order of error in energy of two orders of MPPT; thus, the graphs are parallel. Going from even to odd excitation levels, the CC error increases one order, whereas the CI error remains unchanged, giving a greater slope for the CC curve. This explains the parallel behavior going from odd to even excitation levels and the smoother convergence of the CC hierarchy compared to the CI hierarchy. The stepwise convergence predicted by MPPT, which should be significant for CI and noticeable for CC, is not apparent, though. The reason could be that CN is not strictly mono-configurational.

The convergence patterns for CI and CC are very similar to the convergence patterns previously reported for N₂. 74 Therefore, it does not seem that the open-shell nature of CN leads to slow convergence of the CI and CC hierarchies compared to closed-shell cases.

3.3.2 The Potential Curve for CN 

The potential curve for CN was determined from single-point calculations at the FCI level with basis set cc-pVDZ. Close to equilibrium the energies were converged to 10⁻⁹ E_h, making the determination of accurate spectroscopic constants possible. The result is displayed in Fig. 3.3.

Fig. 3.3 The potential curve for CN found from FCI cc-pVDZ calculations. [Plot: E_FCI / E_h from -92.50 to -92.15 versus R / Å from 0.5 to 3.5.]

Fig. 3.4 E_dev for the CC potential curves. E_dev(R) = E(R) − E_FCI(R). [Plot: E_dev / E_h from 0.00 to 0.03 versus R / Å from 0.9 to 1.8; curves CCSD, CCSDT, CCSDTQ.]

The potential curve was also computed with the methods CCSD(orb), CCSDT(orb) and CCSDTQ(orb) in the cc-pVDZ basis set. Since the weight of the reference HF determinant decreases as the inter-nuclear distance increases, we examined the HF coefficients from the FCI calculations and found that single-reference CC calculations are not meaningful beyond R = 1.8 Å, since the weight of the reference has already dropped to 0.57 at that point. Fig. 3.4 displays the differences of the CC



potential curves compared to the FCI curve. At a given inter-nuclear distance, the FCI energy has 

been subtracted from the CC energy. 

The decreasing weight of the reference ground state with increasing atomic distance is reflected in 

the quality of the CC wave functions. The correlation in the wave function compensates partially for 

the lack of a single dominant configuration; the higher the correlation level, the better the 

compensation. This is illustrated by the slopes of the curves in Fig. 3.4. Furthermore, it should be 

noticed how the deviation energy is nearly linear in R, with a slightly positive curvature around the 

equilibrium geometry. 

3.3.3 Spectroscopic Constants and Atomization Energy for CN 

The equilibrium geometry and harmonic frequency for CN were found from single-point 

calculations using quartic interpolation. The atomization energy was found at the experimental 

equilibrium distance. The results are displayed in Table 3-1. 

Table 3-1 Equilibrium geometry, harmonic frequency, and atomization energy for CN.

Model               Basis      R_eq / Å   ω_e / cm⁻¹   D_e / (kJ/mol)
CCSD(spin-orb)      cc-pVDZ    1.1855     2114         629.2
CCSD(orb)           cc-pVDZ    1.1860     2111         631.6
CCSDT(spin-orb)     cc-pVDZ    1.1944     2046         662.9
CCSDT(orb)          cc-pVDZ    1.1946     2043         663.0
CCSDTQ(spin-orb)    cc-pVDZ    1.1964     2026         666.4
CCSDTQ(orb)         cc-pVDZ    1.1964     2025         666.5
FCI                 cc-pVDZ    1.1969     2020         667.0
CCSD(spin-orb)      cc-pVTZ    1.1688     2136         674.2
CCSDT(spin-orb)     cc-pVTZ    1.1783     2067         714.4
CCSDTQ(spin-orb)    cc-pVTZ    1.1804     2045         718.5
Experimental 72                1.1718     2069         ---
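
How an equilibrium distance and a harmonic frequency can be extracted from single-point energies by quartic interpolation is sketched below. This is only an illustration under stated assumptions: five equally spaced points are used, the derivatives of the interpolating quartic are taken from standard five-point stencils, and the minimum is located by Newton iterations. The function names, the model curve and the isotopic masses (12C and 14N) are assumptions of this sketch and not the procedure or data of the calculations reported here.

```python
import math

# Physical constants (CODATA values) used for unit conversion.
HARTREE_J = 4.3597447222e-18   # J per E_h
ANGSTROM_M = 1.0e-10           # m per Angstrom
AMU_KG = 1.66053906660e-27     # kg per atomic mass unit
C_CM_PER_S = 2.99792458e10     # speed of light in cm/s


def quartic_minimum(r0, h, e):
    """Minimum and force constant of the quartic through five points.

    e holds energies (E_h) at r0-2h, r0-h, r0, r0+h, r0+2h (Angstrom).
    Returns (r_eq in Angstrom, k in E_h/Angstrom^2).
    """
    em2, em1, e0, ep1, ep2 = e
    # Derivatives of the interpolating quartic at r0 (five-point stencils).
    d1 = (em2 - 8 * em1 + 8 * ep1 - ep2) / (12 * h)
    d2 = (-em2 + 16 * em1 - 30 * e0 + 16 * ep1 - ep2) / (12 * h**2)
    d3 = (-em2 + 2 * em1 - 2 * ep1 + ep2) / (2 * h**3)
    d4 = (em2 - 4 * em1 + 6 * e0 - 4 * ep1 + ep2) / h**4

    # Newton iterations on p'(t) = 0, with t = r - r0.
    t = 0.0
    for _ in range(50):
        grad = d1 + d2 * t + d3 * t**2 / 2 + d4 * t**3 / 6
        curv = d2 + d3 * t + d4 * t**2 / 2
        step = grad / curv
        t -= step
        if abs(step) < 1e-14:
            break
    k = d2 + d3 * t + d4 * t**2 / 2   # second derivative at the minimum
    return r0 + t, k


def harmonic_wavenumber(k, mu_amu):
    """Harmonic wavenumber in cm^-1 from k (E_h/Angstrom^2) and reduced mass (amu)."""
    k_si = k * HARTREE_J / ANGSTROM_M**2
    return math.sqrt(k_si / (mu_amu * AMU_KG)) / (2 * math.pi * C_CM_PER_S)


if __name__ == "__main__":
    # Hypothetical model curve (not thesis data) with its minimum at 1.17 Angstrom.
    def model(r):
        t = r - 1.17
        return -92.5 + 1.75 * t**2 - 4.0 * t**3 + 5.0 * t**4

    r0, h = 1.18, 0.01
    energies = [model(r0 + i * h) for i in (-2, -1, 0, 1, 2)]
    r_eq, k = quartic_minimum(r0, h, energies)
    mu = 12.0 * 14.003074 / (12.0 + 14.003074)   # reduced mass of 12C14N in amu
    print(r_eq, k, harmonic_wavenumber(k, mu))
```

Because the five-point stencils are exact for polynomials up to fourth order, the sketch recovers the minimum of a quartic model curve to within rounding. For real energies, the accuracy is limited by the convergence of the single-point energies and by the fifth derivative of the potential.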

As mentioned in Section 3.2, it is not feasible to carry out FCI calculations at the cc-pVTZ level. 

Still, the convergence of the CC hierarchy can be estimated by examining the changes in the 

constants. Since the difference in accuracy between the models CC(orb) and CC(spin-orb) is 

negligible compared to the deviation from FCI, only the CC(spin-orb) results are discussed from 

now on and only the CC(spin-orb) numbers are found at the cc-pVTZ level. 

The deviation curves for the coupled cluster energies (see Fig. 3.4) are increasing functions, and 

thus the coupled cluster equilibrium bond lengths are shorter than the one found from FCI. 

Furthermore, the positive curvature of the deviation-curves around the equilibrium leads to coupled 

cluster frequencies that are higher than the FCI frequency. 



As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies, and the 

cc-pVTZ numbers are clearly more in the range of the experimental data than the cc-pVDZ 

numbers. 

CCSD displays its insufficiency for prediction of equilibrium properties by differing from the FCI values by 0.01 Å in the geometry, 90 cm⁻¹ in the frequency, and 35 kJ/mol in the atomization energy. The errors in R_eq and ω_e are reduced by a factor of four going to the CCSDT level and by a factor of five going from the CCSDT to the CCSDTQ level. The error in the atomization energy is reduced by a factor of nine going to the CCSDT level and by a factor of eight going from the CCSDT to the CCSDTQ level. However, while the equilibrium geometry at the CCSDTQ level is only 0.0005 Å from the FCI value, the harmonic frequency is still about 5 cm⁻¹ too high.
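
The deviations and reduction factors quoted above can be recomputed from Table 3-1 with a short script. The sketch below uses the cc-pVDZ CC(orb) rows, whose deviations come out close to the rounded figures in the text. The dictionary layout and function names are hypothetical.

```python
# Values copied from Table 3-1 (cc-pVDZ, orbital excitation levels):
# (R_eq / Angstrom, omega_e / cm^-1, D_e / kJ/mol).
TABLE = {
    "CCSD": (1.1860, 2111, 631.6),
    "CCSDT": (1.1946, 2043, 663.0),
    "CCSDTQ": (1.1964, 2025, 666.5),
}
FCI = (1.1969, 2020, 667.0)


def deviations(values, reference=FCI):
    """Absolute deviations |X - X_FCI| for (R_eq, omega_e, D_e)."""
    return tuple(abs(v - ref) for v, ref in zip(values, reference))


def reduction_factors(lower, higher):
    """Factor by which each error shrinks going from model `lower` to `higher`."""
    dev_lower = deviations(TABLE[lower])
    dev_higher = deviations(TABLE[higher])
    return tuple(a / b for a, b in zip(dev_lower, dev_higher))


if __name__ == "__main__":
    for model, values in TABLE.items():
        print(model, ["%.4g" % d for d in deviations(values)])
    print("CCSD -> CCSDT:", ["%.1f" % f for f in reduction_factors("CCSD", "CCSDT")])
    print("CCSDT -> CCSDTQ:", ["%.1f" % f for f in reduction_factors("CCSDT", "CCSDTQ")])
```

Since the tabulated values are rounded, the computed factors agree with the quoted ones only to about one significant figure.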

Compared with the experimental values, both the equilibrium geometry and the harmonic frequency are apparently better approximated by the CCSDT method than by CCSDTQ. This is due to a favorable cancellation of errors for CCSDT calculations in small basis sets. By extrapolation to the larger aug-cc-pVQZ basis, 67,75 we get an equilibrium distance of 1.1759 Å and a harmonic frequency of 2060 cm⁻¹ at the CCSDTQ level.

3.3.4 The Vertical Electron Affinity of CN 

Calculations on CN⁻ and CN were carried out in the aug´-cc-pVDZ basis at the experimental equilibrium geometry for CN. The FCI calculation on CN⁻ is one of the largest FCI calculations carried out so far, containing about 20 billion Slater determinants. The vertical electron affinity (EA) was found and is displayed in Table 3-2. Again, only CC(spin-orb) calculations have been carried out because of the rather small difference in performance of CC(spin-orb) and CC(orb).

Table 3-2 The vertical electron affinity of CN.

Model               Basis           EA / E_h   (EA − EA_FCI) / E_h
CCSD(spin-orb)      aug´-cc-pVDZ    0.13025    0.00063
CCSDT(spin-orb)     aug´-cc-pVDZ    0.12977    0.00014
CCSDTQ(spin-orb)    aug´-cc-pVDZ    0.12966    0.00003
FCI                 aug´-cc-pVDZ    0.12962    ---

The convergence is remarkable: already at the CCSD level the error is down to 0.5% of the FCI value, at the CCSDT level it is 0.1%, and at the CCSDTQ level 0.02%. The reason for the excellent convergence is a cancellation of errors. The deviations of the individual energies are always roughly an order of magnitude larger than the deviation of the affinity, 75 but the errors cancel when the CN and CN⁻ energies are subtracted. It is also noteworthy that the convergence is from above. This is because the CC hierarchy converges faster for CN⁻ than for CN. This seems surprising since CN⁻ contains one more electron than CN, but it could be explained by CN⁻ being more one-configurational than CN.
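
The relative errors quoted for the electron affinity follow directly from the last column of Table 3-2. As a worked check, with the subscripts naming the models of the table:

```latex
\begin{align*}
\frac{EA_{\mathrm{CCSD}} - EA_{\mathrm{FCI}}}{EA_{\mathrm{FCI}}}
  &= \frac{0.00063}{0.12962} \approx 0.49\,\%, \\
\frac{EA_{\mathrm{CCSDT}} - EA_{\mathrm{FCI}}}{EA_{\mathrm{FCI}}}
  &= \frac{0.00014}{0.12962} \approx 0.11\,\%, \\
\frac{EA_{\mathrm{CCSDTQ}} - EA_{\mathrm{FCI}}}{EA_{\mathrm{FCI}}}
  &= \frac{0.00003}{0.12962} \approx 0.02\,\%.
\end{align*}
```

All three deviations are positive, which is the convergence from above noted in the text.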

3.3.5 The Equilibrium Geometry of CCH 

The equilibrium geometry of CCH found from FCI/cc-pVDZ calculations is used in ref. 76 to 

calibrate coupled cluster calculations in larger basis sets. The FCI correction is assumed to be 

independent of basis set. 

To optimize the two variables R(CC) and R(CH), the CCH radical is assumed to be linear. The CC and CH bonds are then distorted in steps of δ = 0.01 Å from an initial geometry, making a grid of single-point calculations around the equilibrium geometry with R(CC) on one axis and R(CH) on the other. The initial geometry is taken from a CCSDT cc-pVDZ study, 76 the geometry being R_CCSDT(CC) = 1.23448 Å and R_CCSDT(CH) = 1.07924 Å. The resulting potential energy surface is seen in Fig. 3.5.

Fig. 3.5 The potential energy surface of CCH. [Surface plot: E_FCI / E_h from -76.4036 to -76.4020 versus R(C-C) / Å from 1.21448 to 1.25448 and R(C-H) / Å from 1.05924 to 1.09924.]

From finite-difference expressions with an error of order δ⁴, the gradient and Hessian are found for the initial geometry, and a Newton step is taken, giving an improved guess for the equilibrium geometry. The FCI equilibrium geometry is thus found as

\[ \mathbf{R}^{\mathrm{FCI}} = \mathbf{R}^{\mathrm{CCSDT}} - \mathbf{H}^{-1}\mathbf{G}, \tag{3.1} \]

where G is the gradient, H the Hessian, and R^CCSDT the CCSDT geometry.

The equilibrium geometry at the FCI level is found to be R_FCI(CC) = 1.2367 Å and R_FCI(CH) = 1.0802 Å.

The error in the resulting geometry is a sum of the error from the finite-difference approximations and the error from the Newton step. The gradient and Hessian carry an error of O(δ⁴); with δ = 0.01 Å, this is an error in the order of 10⁻⁸ Å. The Newton step has an error of O((H⁻¹G)²); in this case H⁻¹G is of the size 10⁻³ Å, and so the error is in the order of 10⁻⁶ Å. The total error is thus in the order of 10⁻⁶ Å.
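
The procedure can be sketched as follows, assuming the energies are available on a 5 x 5 grid of displacements around the starting geometry. The stencils are standard fourth-order central differences; they are an assumption of this sketch and may differ from the expressions actually used. The function name and the model surface in the usage example are hypothetical.

```python
def newton_step_2d(energy, x0, y0, delta):
    """One Newton step from (x0, y0) using O(delta^4) central differences.

    energy(x, y) is evaluated on the 5 x 5 grid x0 + i*delta, y0 + j*delta
    with i, j in {-2, ..., 2}. Returns the new point and the gradient at
    the starting point.
    """
    offsets = (-2, -1, 0, 1, 2)
    e = {(i, j): energy(x0 + i * delta, y0 + j * delta)
         for i in offsets for j in offsets}

    # Fourth-order stencil weights for first and second derivatives.
    w1 = {-2: 1 / 12, -1: -8 / 12, 0: 0.0, 1: 8 / 12, 2: -1 / 12}
    w2 = {-2: -1 / 12, -1: 16 / 12, 0: -30 / 12, 1: 16 / 12, 2: -1 / 12}

    gx = sum(w1[i] * e[i, 0] for i in offsets) / delta
    gy = sum(w1[j] * e[0, j] for j in offsets) / delta
    hxx = sum(w2[i] * e[i, 0] for i in offsets) / delta**2
    hyy = sum(w2[j] * e[0, j] for j in offsets) / delta**2
    # Mixed derivative: first-derivative stencil applied along both axes.
    hxy = sum(w1[i] * w1[j] * e[i, j]
              for i in offsets for j in offsets) / delta**2

    # Newton step R - H^{-1} G for the 2 x 2 case.
    det = hxx * hyy - hxy * hxy
    sx = (hyy * gx - hxy * gy) / det
    sy = (hxx * gy - hxy * gx) / det
    return (x0 - sx, y0 - sy), (gx, gy)


if __name__ == "__main__":
    # Hypothetical quadratic model surface (not thesis data), in E_h and Angstrom.
    def surface(x, y):
        dx, dy = x - 1.2367, y - 1.0802
        return -76.4 + 0.95 * dx * dx + 0.4 * dy * dy - 0.02 * dx * dy

    new_point, gradient = newton_step_2d(surface, 1.23448, 1.07924, 0.01)
    print(new_point, gradient)
```

Re-evaluating the gradient at the new point, as done for the FCI geometry in the text that follows, is a direct check that the step has landed close to the stationary point.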

The gradient for the FCI equilibrium geometry has been found as above, making single-point 

calculations at the FCI geometry and at geometries distorted in steps of 0.01Å from the FCI 

geometry. The same finite-difference expressions as before are used. The gradient is found to be 

\[ \mathbf{G}^{\mathrm{FCI}} = \left[\, -1.8593\cdot 10^{-5}\ E_h/\mathrm{\AA} \;;\; 3.0661\cdot 10^{-5}\ E_h/\mathrm{\AA} \,\right], \tag{3.2} \]

thus verifying the correctness of the FCI geometry. 

Since the geometry was determined at the CCSDT level to be R_CCSDT(CC) = 1.23448 Å and R_CCSDT(CH) = 1.07924 Å, the error due to truncation of the many-electron basis in CCSDT is in the order of 10⁻³ Å. This is similar to the results obtained for CN. This also suggests that the quadruples correction to the equilibrium geometry is in the order of 0.001-0.002 Å.


3.4 Conclusion

Full configuration interaction (FCI) and coupled cluster (CC) calculations have been carried out on CN using the cc-pVDZ and cc-pVTZ basis sets. The equilibrium bond distance, harmonic frequency, atomization energy, and vertical electron affinity have been evaluated at the various levels of theory.

As expected, the cc-pVDZ basis set does not provide accurate geometries and frequencies, and CCSD is insufficient for prediction of equilibrium properties. Apparently, the CCSDT method is a better approximation than CCSDTQ for obtaining the equilibrium geometry and the harmonic frequency. This is due to a favorable cancellation of errors for CCSDT calculations in small basis sets. The vertical electron affinities are also affected by cancellation of errors, and already at the CCSD level, the error is less than 1 mE_h compared to the FCI value.

The convergence patterns for the CI and CC hierarchies have been studied for CN and are found to be similar to the convergence patterns previously reported for N₂. 74 Thus, it does not seem that the open-shell nature of CN leads to slow convergence of the CI and CC hierarchies compared to closed-shell cases.

For a number of the CC calculations, the excitation levels have been defined by spin-orbital 

excitations instead of orbital excitations. Certain internal excitations are thereby omitted, but it is 

seen that this does not affect the accuracy in any significant way. For a given excitation level, the 

energies obtained in the orbital formalism are in all cases closer to the FCI energy than the ones 

obtained in the spin-orbital formalism. However, the difference is negligible. 

The equilibrium geometry of CCH has been found at the FCI level in the cc-pVDZ basis set to be R_FCI(CC) = 1.2367 Å and R_FCI(CH) = 1.0802 Å. The correction found to the initial CCSDT geometry is in the order of 10⁻³ Å. The FCI correction to the CCSDT equilibrium geometry of CN was of the same order.


Summary 

The developments in computer hardware and linear scaling algorithms over the last decade have 

made it possible to carry out ab-initio quantum chemical calculations on bio-molecules with 

hundreds of amino acids and on large molecules relevant for nano-science. Quantum chemical 

calculations are thus evolving to become a widespread tool for use in several scientific branches. It 

is therefore important that the algorithms work as black-boxes, such that the user outside quantum 

chemistry does not have to be concerned with the details of the calculations. In particular, Hartree-Fock (HF) and density functional theory (DFT) methods are employed for calculations on large systems, as they represent good compromises between relatively low computational costs and reasonable accuracy of the results. The HF and DFT methods have been a fundamental part of quantum chemistry for many years, and calculations on molecules of ever increasing size and complexity are made possible by increasing computer resources. The conventional algorithms used for optimization of the one-electron density in HF and DFT are therefore continually tested for stability and general performance, and occasionally they break down. In these cases, the calculation takes unacceptably long to complete, or no result can be obtained at all.

We have improved on this situation. In the first part of this thesis, algorithms are presented which 

improve the optimization in HF and DFT significantly. The optimization has become more effective 

and where the optimization broke down using conventional algorithms, it now converges without 

problems. Furthermore, the presented algorithms have no problem-specific parameters and can thus 

be used as black-boxes. 

When the one-electron density has been optimized, molecular properties such as polarizabilities and 

excitation energies can be calculated. Response theory is often used for this purpose. In the second 

part of this thesis an atomic orbital (AO) based formulation of response theory is presented which 

allows linear scaling calculations of molecular properties. Furthermore, the derivation of 

expressions for molecular properties is simpler in the AO formulation than in the molecular orbital 

formulation typically used. To illustrate the benefits, the expression for the geometrical derivative 

of the excited state is derived in the AO formulation. 

To confirm the reliability of quantum chemical predictions of molecular properties, it is important 

to investigate and describe strengths and weaknesses of the quantum chemical models employed. 

The full configuration interaction (FCI) model is exact within a certain basis set of atomic orbitals. 

It is thus of great value to be able to compare results from approximate models with FCI results. In 

the third part of this thesis FCI results are presented for two open-shell molecules, namely CN and 

CCH. The FCI results are compared with results from approximate models used today for calculations where an accuracy comparable to that of experiment is needed.


Danish Summary (Dansk Resumé)

Developments over the last decade in computer hardware and linear scaling algorithms have made it possible to carry out ab initio quantum chemical calculations on biomolecules with hundreds of amino acids and on large molecules relevant for nanotechnology. Quantum chemical calculations are therefore developing into a widely used tool in several branches of natural science. It is therefore important that the algorithms function as so-called black boxes, such that users outside quantum chemistry need not concern themselves with the details of the calculation. In particular, the Hartree-Fock (HF) and density functional theory (DFT) methods are used for calculations on large systems, as they represent a good compromise between reasonable accuracy of the results and relatively short computation time. HF and DFT are methods that have been used in quantum chemistry for many years, and as ever greater computer resources become available, they are used for calculations on ever larger and more complex molecules. The algorithms used today for optimization of the one-electron density in HF and DFT are therefore constantly tested for their stability and efficiency, and at times they break down. In these cases, the calculation either takes unacceptably long or fails to deliver a result.

We have improved this situation. In the first part of the thesis, algorithms are presented which significantly improve the optimization in HF and DFT. The optimization has become more efficient, and cases where the optimization previously broke down can now be carried out without problems. Moreover, the presented algorithms have no problem-specific parameters and can therefore be regarded as black boxes.

When the one-electron density has been optimized, molecular properties such as polarizabilities and excitation energies can be calculated. Response theory is often used for this purpose. In the second part of the thesis, an atomic orbital formulation of response theory is presented which enables linear scaling of the property calculations. Furthermore, the derivation of expressions for molecular properties has become simpler in the atomic orbital formulation compared with the molecular orbital formulation otherwise typically used. To illustrate the advantages, the expression for the geometrical gradient of the excited state is derived in the atomic orbital formulation.

To confirm the reliability of quantum chemical predictions of molecular properties, it is important to investigate and describe strengths and weaknesses of the quantum chemical models employed. Full configuration interaction (FCI) is an exact model within a given set of atomic orbital basis functions. It is therefore valuable to be able to compare results from approximate models with FCI results. In the third part of the thesis, FCI results are presented for two open-shell molecules, CN and CCH. These results are compared with results from approximate models which are used today to deliver quantum chemical calculations with an accuracy that in certain cases surpasses the experimental one.


Appendix A 

The Derivatives of the DSM Energy 

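The first and second derivatives of the DSM energy model with respect to c are found recalling that

\[ E^{\mathrm{DSM}}(\mathbf{c}) = E(\bar{\mathbf{D}}) + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta}, \tag{A-1} \]

\[ E(\bar{\mathbf{D}}) = E(\mathbf{D}_0) + 2\,\mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_{+}\mathbf{F}_{+}, \tag{A-2} \]

\[ \mathbf{D}_{+} = \sum_{i=1}^{n} c_i\,(\mathbf{D}_i - \mathbf{D}_0), \tag{A-3} \]

and

\[ \mathbf{D}_{\delta} = 3\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \bar{\mathbf{D}}. \tag{A-4} \]

The two terms in Eq. (A-1) are evaluated one by one:

\[ \frac{\partial E(\bar{\mathbf{D}})}{\partial c_x} = \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_0\bar{\mathbf{F}} + \mathrm{Tr}\,\mathbf{D}_x\bar{\mathbf{F}} + \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{F}_x - \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_x \tag{A-5} \]

and

\[ \frac{\partial\, 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta}}{\partial c_x} = 2\,\mathrm{Tr}\,\frac{\partial \bar{\mathbf{F}}}{\partial c_x}\mathbf{D}_{\delta} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial \mathbf{D}_{\delta}}{\partial c_x} = 2\,\mathrm{Tr}\,\mathbf{F}_x\mathbf{D}_{\delta} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial \mathbf{D}_{\delta}}{\partial c_x}, \tag{A-6} \]

where

\[ \frac{\partial \mathbf{D}_{\delta}}{\partial c_x} = 3\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x + 3\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\bar{\mathbf{D}} - \mathbf{D}_x. \tag{A-7} \]

The second derivative is found in the same manner:

\[ \frac{\partial^2 E(\bar{\mathbf{D}})}{\partial c_x \partial c_y} = 2\,\mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_y + \mathrm{Tr}\,\mathbf{D}_y\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_y\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_y, \tag{A-8} \]

\[ \frac{\partial^2\, 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\mathbf{D}_{\delta}}{\partial c_x \partial c_y} = 2\,\mathrm{Tr}\,\mathbf{F}_x\frac{\partial \mathbf{D}_{\delta}}{\partial c_y} + 2\,\mathrm{Tr}\,\mathbf{F}_y\frac{\partial \mathbf{D}_{\delta}}{\partial c_x} + 2\,\mathrm{Tr}\,\bar{\mathbf{F}}\frac{\partial^2 \mathbf{D}_{\delta}}{\partial c_x \partial c_y}, \tag{A-9} \]

where

\[ \begin{aligned} \frac{\partial^2 \mathbf{D}_{\delta}}{\partial c_x \partial c_y} ={}& 3\mathbf{D}_y\mathbf{S}\mathbf{D}_x + 3\mathbf{D}_x\mathbf{S}\mathbf{D}_y - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_y\mathbf{S}\mathbf{D}_x - 2\mathbf{D}_y\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x - 2\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_x\mathbf{S}\mathbf{D}_y \\ & - 2\mathbf{D}_y\mathbf{S}\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}} - 2\mathbf{D}_x\mathbf{S}\mathbf{D}_y\mathbf{S}\bar{\mathbf{D}} - 2\mathbf{D}_x\mathbf{S}\bar{\mathbf{D}}\mathbf{S}\mathbf{D}_y. \end{aligned} \tag{A-10} \]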
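
The intermediate step behind the six-term form of Eq. (A-5) can be written out as a worked derivation. It rests on two assumptions that are not stated in this appendix: that the averaged matrices are defined as \bar{D} = D_0 + D_+ and \bar{F} = F_0 + F_+, and that F_+ is built from the Fock matrices in the same way as D_+ in Eq. (A-3).

```latex
% Assumption: F_+ = \sum_i c_i (F_i - F_0), \bar{D} = D_0 + D_+, \bar{F} = F_0 + F_+.
\begin{align*}
\frac{\partial E(\bar{\mathbf{D}})}{\partial c_x}
  &= 2\,\mathrm{Tr}\,(\mathbf{D}_x - \mathbf{D}_0)\mathbf{F}_0
   + \mathrm{Tr}\,(\mathbf{D}_x - \mathbf{D}_0)\mathbf{F}_{+}
   + \mathrm{Tr}\,\mathbf{D}_{+}(\mathbf{F}_x - \mathbf{F}_0) \\
  &= 2\,\mathrm{Tr}\,(\mathbf{D}_x - \mathbf{D}_0)\mathbf{F}_0
   + \mathrm{Tr}\,(\mathbf{D}_x - \mathbf{D}_0)(\bar{\mathbf{F}} - \mathbf{F}_0)
   + \mathrm{Tr}\,(\bar{\mathbf{D}} - \mathbf{D}_0)(\mathbf{F}_x - \mathbf{F}_0) \\
  &= \mathrm{Tr}\,\mathbf{D}_x\mathbf{F}_0 + \mathrm{Tr}\,\mathbf{D}_x\bar{\mathbf{F}}
   + \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{F}_x - \mathrm{Tr}\,\mathbf{D}_0\bar{\mathbf{F}}
   - \mathrm{Tr}\,\bar{\mathbf{D}}\mathbf{F}_0 - \mathrm{Tr}\,\mathbf{D}_0\mathbf{F}_x .
\end{align*}
```

In the last step the four contributions containing Tr D_0 F_0 cancel (-2 + 1 + 1 = 0), which is why no such term survives in the first derivative although it reappears in the second derivative, Eq. (A-8).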


Appendix B 

The Density Matrix in the Atomic Orbital Basis 

In this appendix we will briefly review the density matrix in the atomic orbital basis and derive the most important relations. For convenience, consider a single-determinant wave function with n molecular orbitals occupied. The expectation value of a one-electron operator may then be written as a sum over occupied spin-orbitals

\[ \langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii}. \tag{B-1} \]

Explicitly introducing the MO-AO transformation matrix C allows us to write the expectation value as

\[ \langle 0|\hat{h}|0\rangle = \sum_{i=1}^{n} h_{ii} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\left(\sum_{i=1}^{n} C^{*}_{\mu i}C_{\nu i}\right) = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}D_{\mu\nu}, \tag{B-2} \]

where N is the number of AO basis functions and we have introduced D as

\[ D_{\mu\nu} = \sum_{i=1}^{n} C^{*}_{\mu i}C_{\nu i}. \tag{B-3} \]

It is of interest to study the relation between D and the expectation values Δ of Eq. (2.10). To accomplish this we consider the second quantization expression for ⟨0|ĥ|0⟩ in the nonorthogonal atomic orbital basis. According to ref. 46 one obtains

\[ \langle 0|\hat{h}|0\rangle = \sum_{\mu,\nu=1}^{N} (\mathbf{S}^{-1}\mathbf{h}\mathbf{S}^{-1})_{\mu\nu}\,\langle 0|a^{\dagger}_{\mu}a_{\nu}|0\rangle = \sum_{\mu,\nu=1}^{N} (\mathbf{S}^{-1}\mathbf{h}\mathbf{S}^{-1})_{\mu\nu}\,\Delta_{\mu\nu} = \sum_{\mu,\nu=1}^{N} h_{\mu\nu}\,(\mathbf{S}^{-1}\boldsymbol{\Delta}\mathbf{S}^{-1})_{\mu\nu}. \tag{B-4} \]

By comparing Eqs. (B-4) and (B-2) we have the identification

\[ \mathbf{D} = \mathbf{S}^{-1}\boldsymbol{\Delta}\mathbf{S}^{-1}. \tag{B-5} \]


Thus, the density element D_μν is only identical to the matrix element Δ_μν in an orthonormal basis. Although it could be argued that it would be appropriate to call Δ the one-electron density matrix in the AO basis, we will be consistent with the standard literature and call D the density matrix in the AO basis, and Δ the matrix of expectation values of creation-annihilation operators. From the properties of the one-electron density matrix

\[ \begin{aligned} \mathbf{D}^{\dagger} &= \mathbf{D}, \\ \mathrm{Tr}\,\mathbf{D}\mathbf{S} &= N_{\mathrm{elec.}}, \\ \mathbf{D}\mathbf{S}\mathbf{D} &= \mathbf{D}, \end{aligned} \tag{B-6} \]

one straightforwardly obtains the following relations for Δ:

\[ \begin{aligned} \boldsymbol{\Delta}^{\dagger} &= \boldsymbol{\Delta}, \\ \mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{S}^{-1} &= N_{\mathrm{elec.}}, \\ \boldsymbol{\Delta}\mathbf{S}^{-1}\boldsymbol{\Delta} &= \boldsymbol{\Delta}. \end{aligned} \tag{B-7} \]

Although Eqs. (B-6) and Eqs. (B-7) are formally equivalent, the equations for the standard AO density matrix D are somewhat simpler to use, as they contain the metric S whereas the equations for Δ involve the inverted metric S⁻¹. It should be noted that Eqs. (B-7) are necessary and sufficient conditions, so all three equations are fulfilled if and only if |0⟩ is a normalized single-determinant wave function.
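
The step from Eqs. (B-6) to Eqs. (B-7) can be written out explicitly. The worked derivation below inverts the identification of Eq. (B-5) to Δ = SDS and assumes a Hermitian metric, S† = S.

```latex
% Assumption: S is Hermitian; \Delta = S D S follows from inverting Eq. (B-5).
\begin{align*}
\boldsymbol{\Delta}^{\dagger}
  &= (\mathbf{S}\mathbf{D}\mathbf{S})^{\dagger}
   = \mathbf{S}\mathbf{D}^{\dagger}\mathbf{S}
   = \mathbf{S}\mathbf{D}\mathbf{S} = \boldsymbol{\Delta}, \\
\mathrm{Tr}\,\boldsymbol{\Delta}\mathbf{S}^{-1}
  &= \mathrm{Tr}\,\mathbf{S}\mathbf{D}\mathbf{S}\mathbf{S}^{-1}
   = \mathrm{Tr}\,\mathbf{S}\mathbf{D}
   = \mathrm{Tr}\,\mathbf{D}\mathbf{S} = N_{\mathrm{elec.}}, \\
\boldsymbol{\Delta}\mathbf{S}^{-1}\boldsymbol{\Delta}
  &= \mathbf{S}\mathbf{D}\mathbf{S}\,\mathbf{S}^{-1}\,\mathbf{S}\mathbf{D}\mathbf{S}
   = \mathbf{S}(\mathbf{D}\mathbf{S}\mathbf{D})\mathbf{S}
   = \mathbf{S}\mathbf{D}\mathbf{S} = \boldsymbol{\Delta}.
\end{align*}
```

Each line uses exactly one of the three properties of D, so the two sets of conditions map onto each other one to one.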


Acknowledgements 

A number of people have made my four years of PhD study a pleasant and interesting experience, 

and I could not have done it without them. First of all I would like to thank Jeppe Olsen and Poul 

Jørgensen for guidance and support through the years; they are a fantastic team. I am grateful to the 

whole theoretical chemistry group for nice lunch breaks and cake-meetings, and I would like to 

thank in particular Ove Christiansen for his career advice and Andreas Hesselman for sharing some 

of his latest work with me. And Stinne, how I managed to get through the days before Stinne joined 

the group is a mystery. It quickly turned out that we have much the same attitude towards life and 

we have shared many a wholehearted opinion of the life as such and our work situation in 

particular. 

I would like to thank Pawel Salek for being good company during development and debugging of 

Fortran90 code of the finest quality and for being willing to help with any problems that I might 

have. A special thanks goes to Sonia Coriani and her husband Asger Halkier who took very good 

care of me during my visits in Trieste (even though I still haven’t tasted her mum’s lasagna). 

For a number of conferences, winter schools and summer schools a group of mainly Scandinavian 

people made my trips an extra pleasant experience. They were always ready for some boozing and 

all sorts of crazy ideas. In particular should be mentioned Patzke-guy; a gentleman disguised as a 

theoretician, Pekka; the lizard king, Ulf; the sweet Swede, crazy Mikael, Ola, Tommy and all the 

others. It has been some really fine hours spent with you guys, and I hope to see you all again, 

maybe for a salmari or two – no miksi ei. 

I also had the pleasure to spend a summer school with some of the students from the Copenhagen 

group: Marianne, Anders, Jacob and Thorsten. Anders and Jacob got connected to the Aarhus group 

at some point and have always been up for a nice chat and disgusting body noises to cheer up a grey 

day at work. 

I would like to thank Birgit Schiøtt for nice colleagueship in connection with teaching and for 

coffee and talks in her office. I look forward to our collaboration on my next project. 

I am grateful to the girl-gang; Louise, Trine, Cindie, and Rikke for keeping the connection to Århus 

and for gossip, lunch dates and girl nights. 

I would also like to thank my parents for raising me as a good girl who always did her homework, 

otherwise I would never have gotten this far, and last but not least a great thanks goes to Kristoffer 

for putting up with me and being considerate and caring when needed. 


References 

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).

2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).

3 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).

4 J. Koutecky and V. Bonacic, J. Chem. Phys. 55, 2408 (1971); T. Claxton and W. Smith, Theor. Chim. Acta 22, 399 (1971); W. A. Lathan, L. A. Curtiss, W. J. Hehre et al., Progress in Physical Organic Chemistry. (Wiley, New York, 1974).

5 D. H. Sleeman, Theor. Chim. Acta 11, 135 (1968).

6 J. C. Slater, J. B. Mann, T. M. Wilson et al., Phys. Rev. 184, 672 (1969); A. D. Rabuck and G. E. Scuseria, J. Chem. Phys. 110, 695 (1999); B. I. Dunlap, Phys. Rev. A 29, 2902 (1984).

7 R. McWeeny, Proc. R. Soc. London Ser. A 235, 496 (1956).

8 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).

9 R. Fletcher and C. M. Reeves, Comput. J. 7, 149 (1964).

10 I. H. Hillier and V. R. Saunders, Proc. R. Soc. London Ser. A 320, 161 (1970). 

11 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976). 

12 R. N. Camp and H. F. King, J. Chem. Phys. 75, 268 (1981). 

13 R. E. Stanton, J. Chem. Phys. 75, 3426 (1981). 

14 W. R. Wessel, J. Chem. Phys. 47, 3253 (1967); Douady, Ellinger, Subra et al., J. Chem. Phys. 

72, 1452 (1980). 

15 G. B. Bacskay, Chem. Phys. 61, 385 (1981). 

16 R. Shepard, I. Shavitt, and J. Simons, J. Chem. Phys. 76, 543 (1982). 

17 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984); H. J. Aa. Jensen and H. 

Ågren, Chem. Phys. Lett. 110, 140 (1984). 

18 X. Li, J. M. Millam, G. E. Scuseria et al., J. Chem. Phys. 119, 7651 (2003); E. Hernández, M. 

J. Gillan, and C. M. Goringe, Phys. Rev. B 53, 7147 (1996); J. M. Millam and G. E. Scuseria, J. 

Chem. Phys. 106, 5569 (1997); M. Challacombe, J. Chem. Phys. 110, 2332 (1999). 

19 A. H. R. Palser and D. E. Manolopoulos, Phys. Rev. B 58, 12704 (1998). 

20 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997). 

21 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994); M. S. Daw, Phys. Rev. B 47, 

10895 (1993); X. P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993). 

22 G. Galli and M. Parrinello, Phys. Rev. Lett. 69, 3547 (1992); F. Mauri, G. Galli, and R. Car, 

Phys. Rev. B 47, 9973 (1993); W. Kohn, Chem. Phys. Lett. 208, 167 (1993); P. Ordejon, D. 

Drabold, M. Grunbach et al., Phys. Rev. B 48, 14646 (1993). 

23 T. Helgaker, H. Larsen, J. Olsen et al., Chem. Phys. Lett. 327, 397 (2000). 

24 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000). 

25 J. VandeVondele and J. Hutter, J. Chem. Phys. 118, 4365 (2003). 

26 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 10863 (2004). 

27 D. R. Hartree, The calculation of atomic structures. (John Wiley and Sons, Inc., New York, 

1957). 

28 E. Isaacson and H. B. Keller, Analysis of numerical methods. (Wiley, New York, 1966); C. C. J. 

Roothaan and P. S. Bagus, Methods in Computational Physics. (Academic, New York, 1963). 

29 N. W. Winter and T. H. Dunning Jr., Chem. Phys. Lett. 8, 169 (1971). 

30 W. B. Neilsen, Chem. Phys. Lett. 18, 225 (1973). 

31 M. C. Zerner and M. Hehenberger, Chem. Phys. Lett. 62, 550 (1979). 

32 G. Karlström, Chem. Phys. Lett. 67, 348 (1979). 

33 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); P. Pulay, J. Comput. Chem. 3, 556 (1982). 

34 H. Sellers, Int. J. Quant. Chem. 45, 31 (1993). 

35 I. Hyla-Krispin, J. Demuynck, A. Strich et al., J. Chem. Phys. 75, 3954 (1981). 

36 E. Cancès and C. Le Bris, Int. J. Quant. Chem. 79, 82 (2000). 

37 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002). 

38 L. Thøgersen, J. Olsen, D. Yeager et al., J. Chem. Phys. 121, 16 (2004). 

39 L. Thøgersen, J. Olsen, A. Köhn et al., J. Chem. Phys. 123, 074103 (2005). 

40 A. P. Rendell, Chem. Phys. Lett. 229, 204 (1994). 

41 H. Sellers, Chem. Phys. Lett. 180, 461 (1991); C. Kollmar, Int. J. Quant. Chem. 62, 617 (1997). 

42 V. R. Saunders and I. H. Hillier, Int. J. Quant. Chem. 7, 699 (1973). 

43 S. P. Bhattacharyya, Chem. Phys. Lett. 56, 395 (1978). 

44 R. Carbó, J. A. Hernández, and F. Sanz, Chem. Phys. Lett. 47, 581 (1977). 

45 E. Cancès and C. Le Bris, Math. Model. Num. Anal. 34, 749 (2000). 

46 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory. (Wiley, 

Chichester, 2000). 

47 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999). 

48 A. M. N. Niklasson, Phys. Rev. B 66, 155115 (2002). 

49 E. Rubensson, Masters Thesis, Royal Institute of Technology (KTH), Stockholm, 2005. 

50 G. W. Stewart, Introduction to Matrix Computations. (Academic Press, Inc., New York, 1973). 

51 J. W. Demmel, Applied Numerical Linear Algebra. (SIAM, 1997). 

52 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987). 

53 G. Chaban, M. W. Schmidt, and M. S. Gordon, Theor. Chem. Acc. 97, 88 (1997); T. H. Fischer 

and J. E. Almlöf, J. Phys. Chem. 96, 9768 (1992). 

54 R. E. Stanton, J. Chem. Phys. 75, 5416 (1981). 

55 M. A. Natiello and G. E. Scuseria, Int. J. Quant. Chem. 26, 1039 (1984). 

56 J. Čížek and J. Paldus, J. Chem. Phys. 47, 3976 (1967); H. Fukutome, Int. J. Quant. Chem. 20, 

955 (1981); D. J. Thouless, Nucl. Phys. 21, 225 (1960). 

57 V. Bach, E. H. Lieb, M. Loss et al., Phys. Rev. Lett. 72, 2981 (1994); P.-L. Lions, Comm. Math. 

Phys. 109, 33 (1987). 

58 L. E. Dardenne, N. Makiuchi, L. A. C. Malbouisson et al., Int. J. Quant. Chem. 76, 600 (2000). 

59 A. Schafer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992). 

60 A. Kalemos, T. H. Dunning Jr., and A. Mavridis, J. Chem. Phys. 123, 014302 (2005); R. G. A. 

R. Maclagan and G. E. Scuseria, J. Chem. Phys. 106, 1491 (1997); I. Shim and K. A. Gingerich, 

Int. J. Quant. Chem. S23, 409 (1989). 

61 H. Larsen, P. Jørgensen, J. Olsen et al., J. Chem. Phys. 113, 8908 (2000). 

62 J. Olsen and P. Jørgensen, in Modern Electronic Structure Theory, Part II, edited by D. R. 

Yarkony (World Scientific, Singapore, 1995). 

63 J. Olsen and P. Jørgensen, J. Chem. Phys. 82, 3235 (1985). 

64 J. Olsen, H. J. Aa. Jensen, and P. Jørgensen, J. Comp. Phys. 74, 265 (1988). 

65 T. Helgaker and P. Jørgensen, Theor. Chim. Acta 75, 111 (1989); T. Helgaker and P. Jørgensen, 

in Advances in Quantum Chemistry (Academic Press, 1988), Vol. 19; T. Helgaker and P. 

Jørgensen, in Methods in Computational Molecular Physics, edited by S. Wilson and G. H. F. 

Diercksen (Plenum Press, New York, 1992). 

66 H. Larsen, T. Helgaker, P. Jørgensen et al., J. Chem. Phys. 115, 10344 (2001). 

67 D. Feller and J. A. Sordo, J. Chem. Phys. 113, 485 (2000). 

68 E. F. C. Byrd, C. D. Sherrill, and M. Head-Gordon, J. Phys. Chem. A 105, 9736 (2001). 

69 J. Olsen, LUCIA, a quantum chemical program package. 

70 T. Helgaker, H. J. Aa. Jensen, P. Joergensen et al., DALTON, an electronic structure program 

(1997). 

71 T. H. Dunning Jr., J. Chem. Phys. 90, 1007 (1989). 

72 K. P. Huber and G. Herzberg, Molecular Spectra and Molecular Structure IV. Constants of 

Diatomic Molecules. (Van Nostrand, New York, 1979). 

73 W. Kutzelnigg, Theor. Chim. Acta 80, 349 (1991). 

74 J. W. Krogh and J. Olsen, Chem. Phys. Lett. 344, 578 (2001). 

75 L. Thøgersen and J. Olsen, Chem. Phys. Lett. 393, 36 (2004). 

76 P. G. Szalay, L. Thøgersen, J. Olsen et al., J. Phys. Chem. A 108, 3030 (2004). 

Part 1 

The Trust-region Self-consistent Field Method: 

Towards a Black Box optimization in Hartree-Fock and Kohn-Sham Theories, 

L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker, 

J. Chem. Phys. 121, 16 (2004)

JOURNAL OF CHEMICAL PHYSICS VOLUME 121, NUMBER 1 1 JULY 2004 

The trust-region self-consistent field method: Towards a black-box 

optimization in Hartree–Fock and Kohn–Sham theories 

Lea Thøgersen, Jeppe Olsen, Danny Yeager, a) and Poul Jørgensen 

Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark 

Paweł Sałek 

Laboratory of Theoretical Chemistry, The Royal Institute of Technology, 

Teknikringen 30, Stockholm SE-10044, Sweden 

Trygve Helgaker 

Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway 

Received 17 February 2004; accepted 5 April 2004 

The trust-region self-consistent field (TRSCF) method is presented for optimizing the total energy $E_\mathrm{SCF}$ of Hartree–Fock theory and Kohn–Sham density-functional theory. In the TRSCF method, both the Fock/Kohn–Sham matrix diagonalization step to obtain a new density matrix and the step to determine the optimal density matrix in the subspace of the density matrices of the preceding diagonalization steps have been improved. The improvements follow from the recognition that local models to $E_\mathrm{SCF}$ may be introduced by carrying out a Taylor expansion of the energy about the current density matrix. At the point of expansion, the local models have the same gradient as $E_\mathrm{SCF}$ but only an approximate Hessian. The local models are therefore valid only in a restricted region, the trust region, and steps can only be taken with confidence within this region. By restricting the steps of the TRSCF model to be inside the trust region, a monotonic and significant reduction of the total energy is ensured in each iteration of the TRSCF method. Examples are given where the TRSCF method converges monotonically and smoothly, but where the standard DIIS method diverges. © 2004 American Institute of Physics. [DOI: 10.1063/1.1755673]

I. INTRODUCTION 

The steady progress in computer technology and quantum-chemical methodology has widened the range of users of quantum-chemical software packages to include a vast number of practicing, experimental chemists. Routinely, such users perform Hartree–Fock (HF) calculations and Kohn–Sham (KS) density-functional theory (DFT) calculations for molecules of a size and complexity that, a decade ago, were beyond reach even for the most advanced research codes. This development calls for further advances in the automatization of the self-consistent field (SCF) procedure used to optimize the HF and DFT energies, so as to ensure that convergence may be reached in a routine manner even for very complex molecules.

In the original formulation, the SCF procedure consists of a sequence of Roothaan–Hall (RH) iterations.$^{1,2}$ At each iteration, a Fock/KS matrix is first constructed from the current approximation to the one-electron density matrix and then diagonalized to yield an improved set of orbitals and orbital energies and thus an improved density matrix. In the subsequent iteration, this improved density matrix is then used to construct a new Fock/KS matrix, thereby establishing the iteration procedure. However, such a sequence of RH iterations converges only in simple cases. To improve upon the convergence, each RH iteration may be extended to include, in addition to the diagonalization step, also a step where the best density matrix is generated in the subspace of the density matrices of the current and preceding RH iterations. In the next RH iteration, this averaged density matrix rather than the pure density matrix obtained in the last diagonalization is used to construct the new Fock/KS matrix.

a) On leave. Permanent address: Department of Chemistry, Texas A&M University, P.O. Box 30012, College Station, Texas 77842-3012.

In this paper, we make improvements both to the RH diagonalization step and to the density-subspace optimization step of the SCF scheme. Our approach follows from the recognition that, in both steps, we may construct local models to the SCF energy function $E_\mathrm{SCF}$ by a Taylor expansion of the energy about the current density matrix. However, since, at the point of expansion, these models have an exact gradient but only an approximate Hessian, they are valid only in a restricted region about the current approximation to the density matrix: the trust region. Therefore, when these local models are used in the course of the SCF optimization, it is essential that they are used only to generate steps within their trust region. Only in this manner can it be ensured that the SCF energy is systematically and sufficiently lowered at each iteration.

In the RH diagonalization part of the SCF optimization, the improvements are obtained by introducing an energy function $E_\mathrm{RH}$ that corresponds to the sum of the occupied orbital energies.$^3$ An unconstrained minimization of $E_\mathrm{RH}$ results in the same solution (i.e., density matrix) as obtained by a diagonalization of the Fock/KS matrix. However, since, at the point of expansion, the RH energy function $E_\mathrm{RH}$ has only the gradient in common with the true SCF energy $E_\mathrm{SCF}$, a global minimization of $E_\mathrm{RH}$ may lead to steps that are too long to be trusted. We therefore introduce a trust region where $E_\mathrm{RH}$ is a good approximation to $E_\mathrm{SCF}$. If a global minimization of $E_\mathrm{RH}$ leads to a step outside the trust region, then the step to the minimum on the boundary of the trust region for $E_\mathrm{RH}$ is taken instead. This step is found by a level-shifting technique, where the occupied molecular orbital energies effectively are shifted by some constant to increase the gap between the occupied and virtual molecular orbitals. Level shifting has previously been used to improve the convergence of the simple RH sequence of iterations. An essential feature of our implementation is to adjust the level shift in such a manner that the step is to the boundary of the trust region, recognizing that only in this manner does a lowering of $E_\mathrm{RH}$ result in a lowering of $E_\mathrm{SCF}$. For this reason, the resulting method is called the trust-region RH (TRRH) method.

The optimization of the density matrix in the subspace of the density matrices of the preceding RH iterations has a long history. Early on, it was recognized that a simple averaging of the density matrices of the last few RH iterations significantly improves the convergence of the RH scheme. This simple density-matrix averaging technique was later rationalized and systematized in the direct inversion in iterative subspace (DIIS) method of Pulay.$^4$ In the DIIS method, an improved density matrix is obtained as a linear combination of the previous density matrices by minimizing the norm of the corresponding linear combination of gradients. The DIIS method significantly speeds up the local convergence and convergence can often be obtained to ground states of rather complex molecules with a small gap between energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) and with a large number of close-lying electronic states.

Several attempts have been made to modify the DIIS algorithm so as to improve upon its global convergence behavior. Recently, Kudin, Scuseria, and Cancès proposed the energy DIIS (EDIIS) method, where the DIIS gradient-norm minimization is replaced by a minimization of an approximate energy function.$^5$ In EDIIS, the variational parameters, which are the linear expansion coefficients of the density matrices from the previous RH iterations, may only take on values that give densities in the convex set, that is, densities with occupation numbers between 0 and 1. As the EDIIS method is based on the minimization of an approximate energy function, it may have some advantages in the global region. However, it is worrying that a convex solution often cannot be obtained and that the observed local convergence of the EDIIS method is slower than in the standard DIIS method.

In the DIIS and EDIIS methods, an improved density matrix is obtained as a sum of the density matrices from the preceding RH diagonalization steps. Consequently, the averaged density matrix is not idempotent as required in HF and KS theories. The deviation from idempotency may be reduced using a purified density matrix such as the one suggested by McWeeny.$^6$ This has been done for the SCF energy minimization by several workers including Nunes and Vanderbilt$^7$ and Daniels and Scuseria$^8$ and for the calculation of geometrical derivatives by Ochsenfeld and co-workers.$^9$ It may also be done for the EDIIS energy function. The energy function then has the same gradient as $E_\mathrm{SCF}$, but also contains terms which cannot be obtained from the densities and Fock/KS matrices of the previous RH iterations. Neglecting these terms, we arrive at the density-subspace minimization (DSM) algorithm proposed in this paper. At the point of expansion, the DSM energy function $E_\mathrm{DSM}$ thus has the same gradient as the true energy function $E_\mathrm{SCF}$ but only an approximate Hessian. Again, a trust region may be introduced and only steps within this region are taken, ensuring that any lowering of $E_\mathrm{DSM}$ also corresponds to a lowering of $E_\mathrm{SCF}$. The resulting method is called the trust-region DSM (TRDSM) method.

In the next section, we first describe the standard optimization of the SCF energy function in a density-matrix formulation. The TRRH method is then discussed in Sec. II A and the TRDSM method in Sec. II B. In Sec. III, we give some numerical examples to demonstrate the performance of the resulting trust-region SCF (TRSCF) method. The last section contains some concluding remarks.

II. THEORY 

For a closed-shell system with $N/2$ electron pairs, the Hartree–Fock (HF) energy (excluding the nuclear–nuclear repulsion energy) is given by$^3$

$$E_\mathrm{SCF}(D) = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,DG(D), \tag{1}$$

where $D$ is the one-electron density matrix in the atomic-orbital (AO) basis, $h$ is the one-electron Hamiltonian matrix and $G(D)$ is defined as

$$G_{\mu\nu}(D) = \sum_{\rho\sigma}\left(2g_{\mu\nu\rho\sigma} - g_{\mu\sigma\rho\nu}\right)D_{\rho\sigma}, \tag{2}$$

where $g_{\mu\nu\rho\sigma}$ is a two-electron integral in the AO basis. For the energy in Eq. (1) to be a valid approximation to the true HF energy, the density matrix $D$ must satisfy the symmetry, trace, and idempotency conditions:

$$D^T = D, \tag{3}$$

$$\mathrm{Tr}\,DS = \frac{N}{2}, \tag{4}$$

$$DSD = D. \tag{5}$$

Similar conditions apply in the Kohn–Sham (KS) theory, but the energy function of Eq. (1) must then be modified by including the exchange-correlation term and by scaling or complete removal of the exchange term from Eq. (2).

The traditional approach to the optimization of the HF energy is an iterative one. From the current approximation to the density matrix $D_n$ in iteration $n$, a Fock matrix is built

$$F(D_n) = h + G(D_n) \tag{6}$$

and, following the Roothaan–Hall (RH) procedure, the Fock matrix is diagonalized

$$F(D_n)\,C_\mathrm{occ} = S\,C_\mathrm{occ}\,\varepsilon, \tag{7}$$

where $S$ is the overlap matrix in the AO basis and $\varepsilon$ is the diagonal matrix of occupied orbital energies, to give a set of occupied molecular orbitals (MOs), from which a new approximation to the density matrix is obtained as

$$D_{n+1} = C_\mathrm{occ}C_\mathrm{occ}^T. \tag{8}$$

The iteration procedure is established using $D_{n+1}$ as the current density in Eq. (6). The final solution to the minimization problem is obtained when $D_n$ and $D_{n+1}$ are the same. This self-consistent field (SCF) procedure may also be used in KS theory, the only difference being the addition of the exchange-correlation potential and the scaling of the exchange contribution in the Fock matrix to yield the KS matrix.
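A single RH iteration, Eqs. (7) and (8), can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `roothaan_hall_step` is hypothetical, the generalized eigenvalue problem is solved by symmetric (Löwdin) orthogonalization, which is one common choice and not necessarily the one used by the authors, and random symmetric matrices stand in for a real Fock matrix and overlap matrix.

```python
import numpy as np

def roothaan_hall_step(F, S, n_occ):
    """Solve F C = S C eps and return D = C_occ C_occ^T plus all orbital energies."""
    # Symmetric orthogonalization: X = S^(-1/2), so that X S X = 1.
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
    # Ordinary symmetric eigenproblem in the orthonormal basis (ascending energies).
    eps, C_prime = np.linalg.eigh(X @ F @ X)
    C = X @ C_prime                      # back-transform to the AO basis
    C_occ = C[:, :n_occ]                 # occupy the n_occ lowest orbitals
    return C_occ @ C_occ.T, eps

# Toy input (assumption): random symmetric "Fock" matrix, positive-definite "overlap".
rng = np.random.default_rng(1)
n, n_occ = 5, 2
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)
B = rng.normal(size=(n, n))
F = 0.5 * (B + B.T)

D, eps = roothaan_hall_step(F, S, n_occ)
print("Tr DS =", np.trace(D @ S))
print("max |DSD - D| =", np.abs(D @ S @ D - D).max())
```

The resulting density satisfies the symmetry, trace, and idempotency conditions of Eqs. (3) to (5) by construction, with `n_occ` playing the role of $N/2$.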

The pure RH iterations presented above often do not converge. A powerful method for handling this divergence is not to construct the Fock matrix from the density matrix $D_n$ but rather from an average of all previous density matrices:

$$\bar D_n = \sum_{i=1}^{n} c_i D_i. \tag{9}$$

The averaged density matrix $\bar D_n$ is then used in place of the pure density matrix $D_n$ in Eq. (6) to obtain the Fock matrix $F(\bar D_n)$ as

$$F(\bar D_n) = \sum_{i=1}^{n} c_i F(D_i) \tag{10}$$

and the iteration procedure is established. In the course of the TRSCF iterations, the following matrices are set up in the order indicated: $D_1$, $F(D_1)$, $D_2$, $F(D_2)$, $\bar D_2$, $F(\bar D_2)$, $D_3$, $F(D_3)$, $\bar D_3$, $F(\bar D_3)$, .... Among these, $D_1$, $F(D_1)$, $D_2$, $F(D_2)$, $D_3$, $F(D_3)$, ... are saved during the iteration procedure.

In the following, we describe improvements to the SCF diagonalization and density-subspace optimization steps. In Sec. II A, we describe how the trust-region RH (TRRH) method is used to generate new density matrices by a modification of the traditional RH method [Eqs. (7) and (8)]. Next, in Sec. II B, we introduce the trust-region density-subspace minimization (TRDSM) method for calculating the averaged density matrix of Eq. (9). In the following, we use the indices $i, j, k, l$ for occupied MOs and the indices $a, b, c, d$ for the virtual MOs.

A. The trust-region Roothaan–Hall method 

As discussed in Ref. 3, the traditional RH method may be viewed as a minimization of the sum of the orbital energies of the occupied MOs

$$E_\mathrm{RH}(D) = 2\sum_i \varepsilon_i = 2\,\mathrm{Tr}\,F(\bar D)D, \tag{11}$$

subject to orthonormality constraints on the occupied MOs $\phi_i$:

$$\langle\phi_i|\phi_j\rangle = \delta_{ij}. \tag{12}$$

Whereas $\bar D$ is the current approximation to the HF/KS density matrix, usually obtained as a linear combination of the previous densities according to Eq. (9), the density matrix $D$ to be optimized in Eq. (11) is related to the occupied MOs resulting from the diagonalization of $F(\bar D)$ as

$$D = C_\mathrm{occ}C_\mathrm{occ}^T. \tag{13}$$

To see this, consider the constrained minimization of $E_\mathrm{RH}$ in Eq. (11) expressed in terms of the Lagrangian

$$L = 2\,\mathrm{Tr}\,F(\bar D)D - 2\,\mathrm{Tr}\,\lambda\left(C_\mathrm{occ}^T S C_\mathrm{occ} - I_{N/2}\right), \tag{14}$$

where the multipliers $\lambda_{ij}$ ensure orthonormality among the occupied MOs. Minimization of this Lagrangian leads to the standard RH equations:

$$F(\bar D)\,C_\mathrm{occ} = S\,C_\mathrm{occ}\,\lambda. \tag{15}$$

However, since $E_\mathrm{RH}$ of Eq. (11) is only a crude model of the true energy $E_\mathrm{SCF}$ (the gradient is correct at $\bar D$, assuming $\bar D$ is idempotent), a global minimization of $E_\mathrm{RH}$ according to Eq. (15) may easily lead to steps that are too long to be trusted as they are outside the region where $E_\mathrm{RH}$ is a good approximation to $E_\mathrm{SCF}$. Steps outside the trust region may often not lead to a reduction of the total energy $E_\mathrm{SCF}$.

1. The level-shifted Roothaan–Hall equations 

To avoid too long steps, an additional constraint is imposed on the optimization of Eq. (11), namely, that the new density matrix $D$ in Eq. (13) does not differ too much from the old matrix $\bar D$. This condition is conveniently expressed in terms of the overlap between the density matrices in the $S$ metric norm

$$\langle D|\bar D\rangle_S = \mathrm{Tr}\,DS\bar DS = a\sqrt{\frac{N}{2}\,\mathrm{Tr}\,\bar DS\bar DS}, \tag{16}$$

where $\mathrm{Tr}\,\bar DS\bar DS \neq N/2$ in general, since $\bar D$ is not necessarily idempotent. Note that, for $D$ equal to an idempotent $\bar D$, $a$ is equal to one. For $a$ sufficiently close to one, a step will therefore be taken in the local region. In practice, we define sufficiently close to one by the parameter $a_\mathrm{min} = 0.975$.

Introducing an undetermined multiplier $\mu$ associated with this new constraint, we obtain the following Lagrangian:

$$L = 2\,\mathrm{Tr}\,F(\bar D)D - 2\mu\left(\mathrm{Tr}\,S\bar DSD - a\sqrt{\frac{N}{2}\,\mathrm{Tr}\,\bar DS\bar DS}\right) - 2\,\mathrm{Tr}\,\lambda\left(C_\mathrm{occ}^T S C_\mathrm{occ} - I_{N/2}\right). \tag{17}$$

Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to zero, we arrive at the level-shifted RH equations

$$\left[F(\bar D) - \mu S\bar DS\right]C_\mathrm{occ} = S\,C_\mathrm{occ}\,\lambda. \tag{18}$$

To interpret the level-shift term, we note that $\bar DS$ projects out the component of $C_\mathrm{occ}$ that is occupied in $\bar D$ (assuming idempotent $\bar D$), see Ref. 3. The level shift therefore works only on the occupied part of $F(\bar D)$, shifting all the occupied orbital energies and increasing the gap between the occupied and virtual MOs, in particular the HOMO-LUMO gap.
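The effect of the level shift on the step length can be illustrated numerically. The sketch below diagonalizes the shifted matrix $F(\bar D) - \mu S\bar DS$ for a few values of $\mu$ and prints the normalized overlap between the new density and $\bar D$ in the $S$ metric. All names (`lowest_orbitals_density`, `level_shifted_density`, `overlap`) are hypothetical, and random symmetric matrices are used in place of real Fock and overlap matrices, so the numbers only show the trend.

```python
import numpy as np

def lowest_orbitals_density(F_eff, S, n_occ):
    """Solve F_eff C = S C eps and return D = C_occ C_occ^T for the n_occ lowest orbitals."""
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T      # S^(-1/2)
    _, C_prime = np.linalg.eigh(X @ F_eff @ X)
    C_occ = (X @ C_prime)[:, :n_occ]
    return C_occ @ C_occ.T

def level_shifted_density(F, S, D_bar, n_occ, mu):
    """Density from the level-shifted equations [F - mu S D_bar S] C = S C eps."""
    return lowest_orbitals_density(F - mu * S @ D_bar @ S, S, n_occ)

def overlap(D, D_bar, S):
    """Normalized overlap <D|D_bar>_S / sqrt(<D|D>_S <D_bar|D_bar>_S)."""
    num = np.trace(D @ S @ D_bar @ S)
    den = np.sqrt(np.trace(D @ S @ D @ S) * np.trace(D_bar @ S @ D_bar @ S))
    return num / den

def sym(a):
    return 0.5 * (a + a.T)

# Toy data (assumption): random matrices instead of a real molecule.
rng = np.random.default_rng(2)
n, n_occ = 6, 2
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)
D_bar = lowest_orbitals_density(sym(rng.normal(size=(n, n))), S, n_occ)  # idempotent reference
F_new = sym(rng.normal(size=(n, n)))                                     # stand-in for F(D_bar)

shifts = [0.0, 0.5, 2.0, 10.0, 1.0e6]
overlaps = [overlap(level_shifted_density(F_new, S, D_bar, n_occ, mu), D_bar, S)
            for mu in shifts]
for mu, a in zip(shifts, overlaps):
    print(f"mu = {mu:10.1f}   a = {a:.6f}")
```

A larger shift gives an overlap closer to one, that is, a shorter step away from $\bar D$; for a very large shift the new density essentially reproduces $\bar D$.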



Since the SCF energy $E_\mathrm{SCF}$ is invariant with respect to an orthogonal transformation between the MOs, Eq. (18) may be transformed to the canonical basis:

$$\left[F(\bar D) - \mu S\bar DS\right]C_\mathrm{occ}(\mu) = S\,C_\mathrm{occ}(\mu)\,\varepsilon(\mu), \tag{19}$$

where the diagonal matrix $\varepsilon(\mu)$ contains the orbital energies.

2. Choice of the RH level-shift parameter

The density matrix generated from the restricted RH solution Eq. (19) depends on the level-shift parameter $\mu$:

$$D(\mu) = C_\mathrm{occ}(\mu)\,C_\mathrm{occ}^T(\mu). \tag{20}$$

To see how $\mu$ is determined, we consider the determination of $\mu$ in the fourth iteration of the rhodium-complex calculation described in Sec. III. In Fig. 1(a), we have plotted the HOMO-LUMO gap as a function of $\mu$,

$$\Delta\varepsilon_{ai}(\mu) = \varepsilon_a^\mathrm{LUMO}(\mu) - \varepsilon_i^\mathrm{HOMO}(\mu), \tag{21}$$

where $\varepsilon_i^\mathrm{HOMO}(\mu)$ and $\varepsilon_a^\mathrm{LUMO}(\mu)$ are the HOMO and LUMO orbital energies, respectively; in Fig. 1(b), we have plotted the overlap between the old and new density matrices as given by

$$a(\mu) = \frac{\langle D(\mu)|\bar D\rangle_S}{\sqrt{\langle D(\mu)|D(\mu)\rangle_S\,\langle\bar D|\bar D\rangle_S}}, \tag{22}$$

where $\langle D(\mu)|D(\mu)\rangle_S$ is equal to $N/2$. For sufficiently large $\mu$, the HOMO-LUMO gap Eq. (21) is linear in $\mu$. This linearity of $\Delta\varepsilon_{ai}(\mu)$ for large $\mu$ arises from the dependence of the orbital energies on $\mu$ in Eq. (19), where $\mu$ is effectively subtracted from the occupied orbital energies. The MOs $\bar C_\mathrm{occ}$ occupied in $\bar D$ satisfy the generalized eigenvalue equations

$$S\bar DS\,\bar C_\mathrm{occ} = S\,\bar C_\mathrm{occ}\,\eta, \tag{23}$$

and become identical to the MOs $C_\mathrm{occ}(\mu)$ obtained from Eq. (19) when $\mu$ tends to infinity. The corresponding density is denoted

$$D(\infty) = \bar C_\mathrm{occ}\bar C_\mathrm{occ}^T \tag{24}$$

and represents a purified $\bar D$. In the linear regime of $\Delta\varepsilon_{ai}(\mu)$, there is a continuous development of the occupied MOs from those occupied in $\bar D$. As $\mu$ decreases and we enter the nonlinear regime at $\mu_\mathrm{min}$, the MOs in Eq. (20) no longer correspond to those in Eq. (23). Comparing plots (a) and (b) in Fig. 1, we note that the region $a(\mu) \ge a_\mathrm{min}$ in Fig. 1(b) corresponds roughly to the region $\mu \ge \mu_\mathrm{min}$ in Fig. 1(a).

FIG. 1. For the fourth iteration of the rhodium calculation described in Sec. III we have displayed as a function of the level-shift parameter $\mu$: (a) the HOMO-LUMO gap $\Delta\varepsilon_{ai}$, where $\mu_\mathrm{min}$ is the smallest accepted level shift, (b) the overlap $a$ between the old and new density matrices, where $\mu_\mathrm{opt}$ is the optimal level shift, and (c) the change in the model energy $\Delta E_\mathrm{RH}$ and the actual energy $\Delta E^\mathrm{RH}_\mathrm{SCF}$.

As we insist on a controlled, continuous development of the MOs from those occupied in $\bar D$, the level-shift parameter $\mu$ should be restricted to the linear regime $\mu \ge \mu_\mathrm{min}$. To determine the optimal level-shift parameter $\mu_\mathrm{opt}$, we therefore begin by establishing the onset of linearity $\mu_\mathrm{min}$ by linear extrapolation by means of two Fock/KS matrix diagonalizations, giving the two $\Delta\varepsilon_{ai}$ values marked by crosses and the linearly interpolated $\mu_\mathrm{min}$ value marked with an arrow. Next, since, in the linear interval, a small $\mu$ corresponds to a large step, we investigate whether $\mu_\mathrm{min}$ is acceptable by checking if $a(\mu_\mathrm{min}) \ge a_\mathrm{min}$. If this step is too long, we backtrack by increasing $\mu$ using inexact line search until an acceptable value $\mu_\mathrm{opt}$ is found such that $a(\mu_\mathrm{opt}) \ge a_\mathrm{min}$, requiring a few additional Fock/KS matrix diagonalizations. In Fig. 1(b), the accepted $\mu_\mathrm{opt}$ is marked with an arrow.
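The backtracking part of this prescription can be sketched as follows. The sketch takes $\mu_\mathrm{min}$ as given (the extrapolation of the HOMO-LUMO gap used in the text to locate it is not reproduced), and it replaces the inexact line search by a simple doubling rule; both simplifications, the function names, and the random test matrices are assumptions made for illustration only.

```python
import numpy as np

def density_from(F_eff, S, n_occ):
    """Lowest-n_occ solution of F_eff C = S C eps, returned as D = C_occ C_occ^T."""
    s_val, s_vec = np.linalg.eigh(S)
    X = s_vec @ np.diag(s_val ** -0.5) @ s_vec.T
    _, C_prime = np.linalg.eigh(X @ F_eff @ X)
    C_occ = (X @ C_prime)[:, :n_occ]
    return C_occ @ C_occ.T

def overlap(D, D_bar, S):
    """Normalized overlap a between D and D_bar in the S metric."""
    num = np.trace(D @ S @ D_bar @ S)
    den = np.sqrt(np.trace(D @ S @ D @ S) * np.trace(D_bar @ S @ D_bar @ S))
    return num / den

def choose_level_shift(F, S, D_bar, n_occ, mu_min=0.0, a_min=0.975,
                       growth=2.0, first_step=0.1, max_steps=60):
    """Increase mu from mu_min until the step is short enough, a(mu) >= a_min."""
    mu = mu_min
    for _ in range(max_steps):
        D = density_from(F - mu * S @ D_bar @ S, S, n_occ)
        a = overlap(D, D_bar, S)
        if a >= a_min:
            return mu, D, a
        # Step too long: backtrack by enlarging the level shift.
        mu = first_step if mu == 0.0 else growth * mu
    raise RuntimeError("no acceptable level shift found")

def sym(a):
    return 0.5 * (a + a.T)

# Toy data (assumption): random symmetric matrices instead of a real calculation.
rng = np.random.default_rng(3)
n, n_occ = 6, 2
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)
D_bar = density_from(sym(rng.normal(size=(n, n))), S, n_occ)
F = sym(rng.normal(size=(n, n)))

mu_opt, D_opt, a_opt = choose_level_shift(F, S, D_bar, n_occ)
print(f"accepted mu = {mu_opt:.3f}, overlap a = {a_opt:.4f}")
```

Each trial value of `mu` costs one diagonalization, which mirrors the remark above that backtracking requires a few additional Fock/KS matrix diagonalizations.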

For a better understanding of this step, consider the Hessian of the $E_\mathrm{RH}$ energy function:

$$A^\mathrm{RH}_{ai,bj} = \delta_{ij}\delta_{ab}\left(\varepsilon_a - \varepsilon_i\right). \tag{25}$$

By restricting the level-shift parameter to $\mu \ge \mu_\mathrm{min}$ where $\varepsilon_a^\mathrm{LUMO}(\mu) - \varepsilon_i^\mathrm{HOMO}(\mu) > 0$, we ensure that the effective Hessian is positive definite and that the model energy function $E_\mathrm{RH}$ is reduced. We note that the Hessian of the true energy function $E_\mathrm{SCF}$ is given by the more complicated expression

$$A^\mathrm{SCF}_{ai,bj} = \delta_{ij}\delta_{ab}\left(\varepsilon_a - \varepsilon_i\right) + 4g_{aibj} - g_{abij} - g_{ajib}. \tag{26}$$

Often, the orbital energy difference dominates the Hessian. In such cases, we expect the above step to reduce the SCF energy $E_\mathrm{SCF}$ as well as the model function $E_\mathrm{RH}$. In any case, when a sufficiently large level shift is added in Eq. (19), the Hessian structure of Eq. (25) becomes similar to that of the true energy function $E_\mathrm{SCF}$ in Eq. (26). The steps generated from $E_\mathrm{RH}$ with such level shifts will therefore have essentially the same direction as the ones generated from $E_\mathrm{SCF}$.

By construction, the $E_\mathrm{RH}$ energy function is lowered when $\mu$ is chosen according to the above prescription

$$\Delta E_\mathrm{RH} = 2\,\mathrm{Tr}\,F(\bar D)\left(D(\mu) - \bar D\right) \le 0. \tag{27}$$

Since $E_\mathrm{RH}$ is only a local model of the true energy function $E_\mathrm{SCF}$, the associated change in the true energy

$$\Delta E^\mathrm{RH}_\mathrm{SCF} = E_\mathrm{SCF}(D(\mu)) - E_\mathrm{SCF}(\bar D) \tag{28}$$

may be either negative or positive, depending on how well $E_\mathrm{RH}$ represents $E_\mathrm{SCF}$ for the chosen step. However, for sufficiently small steps, $\Delta E^\mathrm{RH}_\mathrm{SCF} < 0$, since the model function then represents the true energy well.

Let us consider the relationship between the true lowering $\Delta E^\mathrm{RH}_\mathrm{SCF}$ and the lowering predicted by the model function $\Delta E_\mathrm{RH}$. Introducing the presumably small differential density matrix

$$\Delta = D(\mu) - \bar D \tag{29}$$

and using the identity $\mathrm{Tr}\,AG(B) = \mathrm{Tr}\,BG(A)$ valid for symmetric matrices $A$ and $B$, we find that the change in the true energy Eq. (28) may be written in the form

$$\Delta E^\mathrm{RH}_\mathrm{SCF} = 2\,\mathrm{Tr}\,h\left(D(\mu) - \bar D\right) + \mathrm{Tr}\,(\bar D + \Delta)G(\bar D + \Delta) - \mathrm{Tr}\,\bar DG(\bar D) = 2\,\mathrm{Tr}\,h\Delta + 2\,\mathrm{Tr}\,\Delta G(\bar D) + \mathrm{Tr}\,\Delta G(\Delta), \tag{30}$$

which shows that the changes in the true energy and in the model energy are related as

$$\Delta E^\mathrm{RH}_\mathrm{SCF} = \Delta E_\mathrm{RH} + \mathrm{Tr}\,\Delta G(\Delta). \tag{31}$$

If the last term (which is second order in $\Delta$) is negligible, the energy lowering predicted by the local model $\Delta E_\mathrm{RH}$ becomes equal to $\Delta E^\mathrm{RH}_\mathrm{SCF}$. However, since the correction term is positive (strictly positive in the absence of exchange), its presence in Eq. (31) shows that, for sufficiently large steps, a lowering of the model function may not lead to a lowering of the total energy. To avoid such steps, it would be useful to provide an alternative prediction of $\Delta E^\mathrm{RH}_\mathrm{SCF}$ that is less expensive than the calculation of $\mathrm{Tr}\,\Delta G(\Delta)$ itself. Section II A 3 is concerned with this problem.
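The relation between the true and the model energy change, Eq. (31), is an algebraic identity and can be checked numerically on a toy model. In the sketch below the two-electron integrals are replaced by a random tensor with the usual permutational symmetry, built as a sum of outer products of symmetric matrices; this construction, the helper names, and the random densities are assumptions chosen only so that $\mathrm{Tr}\,AG(B) = \mathrm{Tr}\,BG(A)$ holds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def sym(a):
    return 0.5 * (a + a.T)

# Toy one-electron matrix and a toy integral tensor g[m, n, r, s] that is
# symmetric under m<->n, r<->s and (mn)<->(rs).
h = sym(rng.normal(size=(n, n)))
L = np.array([sym(rng.normal(size=(n, n))) for _ in range(3)])
g = np.einsum("kmn,krs->mnrs", L, L)

def G(D):
    """G_mn(D) = sum_rs (2 g_mnrs - g_msrn) D_rs, as in Eq. (2)."""
    coulomb = np.einsum("mnrs,rs->mn", g, D)
    exchange = np.einsum("msrn,rs->mn", g, D)
    return 2.0 * coulomb - exchange

def fock(D):
    return h + G(D)

def e_scf(D):
    return 2.0 * np.trace(h @ D) + np.trace(D @ G(D))

# Any two symmetric matrices will do for checking the identity.
D_bar = sym(rng.normal(size=(n, n)))
D_new = D_bar + 0.1 * sym(rng.normal(size=(n, n)))
delta = D_new - D_bar

dE_true = e_scf(D_new) - e_scf(D_bar)            # Eq. (28)
dE_model = 2.0 * np.trace(fock(D_bar) @ delta)   # Eq. (27)
correction = np.trace(delta @ G(delta))          # second-order term in Eq. (31)

print(dE_true, dE_model + correction)
```

The two printed numbers agree to rounding error, which confirms that the model energy change misses exactly the term $\mathrm{Tr}\,\Delta G(\Delta)$.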

To demonstrate the efficiency of the chosen level shift $\mu_\mathrm{opt}$ in the global region of a SCF optimization, we have for the fourth iteration of the rhodium-complex calculation plotted in Fig. 1(c) $\Delta E^\mathrm{RH}_\mathrm{SCF}$ and $\Delta E_\mathrm{RH}$ as a function of $\mu$. The energy gain $\Delta E^\mathrm{RH}_\mathrm{SCF}$ is about optimal for the level shift $\mu_\mathrm{opt}$. Increasing $\mu$ gives a smaller energy gain while decreasing $\mu$ gives a slight increase in the energy gain and from $\mu \approx 4.5$, $\Delta E^\mathrm{RH}_\mathrm{SCF}$ is actually positive. Note also that for $\mu < \mu_\mathrm{opt}$, $\Delta E_\mathrm{RH}$ and $\Delta E^\mathrm{RH}_\mathrm{SCF}$ start to differ, indicating that the importance of $\mathrm{Tr}\,\Delta G(\Delta)$ increases. The step representing a RH iteration where $\mu = 0$ is far too long to be trusted and results in a significant increase of the total energy.

3. Prediction of the energy close to the minimum 

To develop a better prediction of $\Delta E^\mathrm{RH}_\mathrm{SCF}$ than $\Delta E_\mathrm{RH}$, we note that the only part that cannot easily be evaluated from known Fock matrices is the second-order contribution to Eq. (31) from that part of $\Delta$ that does not belong to the linear space spanned by the previous density matrices $D_i$. To see this, we decompose the current density matrix $D(\mu)$ into two parts

$$D(\mu) = D_\parallel + D_\perp, \tag{32}$$

where $D_\parallel$ belongs to the linear space spanned by the previous density matrices and $D_\perp$ belongs to its orthogonal complement. We then expand $D_\parallel$ in the following manner:

$$D_\parallel = \sum_{i=1}^{n} c_i(\mu)\,D_i, \tag{33}$$

where the expansion coefficients $c_i(\mu)$ are determined in a least-squares manner

$$c_i(\mu) = \sum_{j=1}^{n}\left[M^{-1}\right]_{ij}\mathrm{Tr}\,D_jSD(\mu)S, \qquad M_{ij} = \mathrm{Tr}\,D_iSD_jS. \tag{34}$$

The change in the SCF energy associated with the change of density matrix from $\bar D$ to $D(\mu)$ may be expressed as

$$\Delta E^\mathrm{RH}_\mathrm{SCF} = E_\mathrm{SCF}(D_\parallel) - E_\mathrm{SCF}(\bar D) + 2\,\mathrm{Tr}\,D_\perp F(D_\parallel) + \mathrm{Tr}\,D_\perp G(D_\perp). \tag{35}$$

Ignoring the small term quadratic in $D_\perp$, we may now predict the change in the SCF energy at little cost from the expression

$$\Delta E^\mathrm{P}_\mathrm{SCF} = E_\mathrm{SCF}(D_\parallel) - E_\mathrm{SCF}(\bar D) + 2\,\mathrm{Tr}\,D_\perp F(D_\parallel), \tag{36}$$

using only the density matrices and Fock/KS matrices of the previous iterations. In particular in the later parts of the iteration sequence, where the space spanned by the densities of the preceding RH iterations is large, an accurate estimate of $\Delta E^\mathrm{RH}_\mathrm{SCF}$ may be obtained from this formula. In the following, we shall see how we may use this prediction to determine the level shift when $\mu_\mathrm{min} = 0$ and $a(0) \ge a_\mathrm{min}$.
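The least-squares decomposition of Eqs. (32) to (34) amounts to solving a small linear system in the $S$ metric. The following sketch shows one way to do this with NumPy; the function name `project_onto_subspace` and the random symmetric matrices used as stand-ins for stored density matrices are assumptions for illustration.

```python
import numpy as np

def project_onto_subspace(D, D_list, S):
    """Split D into D_par (in span{D_i}) and D_perp (S-orthogonal to every D_i)."""
    # Metric matrix M_ij = Tr D_i S D_j S and right-hand side b_j = Tr D_j S D S.
    M = np.array([[np.trace(Di @ S @ Dj @ S) for Dj in D_list] for Di in D_list])
    b = np.array([np.trace(Dj @ S @ D @ S) for Dj in D_list])
    c = np.linalg.solve(M, b)                      # c = M^{-1} b, Eq. (34)
    D_par = sum(ci * Di for ci, Di in zip(c, D_list))
    return c, D_par, D - D_par

def sym(a):
    return 0.5 * (a + a.T)

# Toy data (assumption): three stored "densities" and one new matrix to decompose.
rng = np.random.default_rng(4)
m = 5
A = rng.normal(size=(m, m))
S = A @ A.T + m * np.eye(m)
D_list = [sym(rng.normal(size=(m, m))) for _ in range(3)]
D_new = sym(rng.normal(size=(m, m)))

c, D_par, D_perp = project_onto_subspace(D_new, D_list, S)
residuals = [np.trace(Di @ S @ D_perp @ S) for Di in D_list]
print("coefficients:", c)
print("overlaps of D_perp with the stored matrices:", residuals)
```

The printed overlaps vanish to rounding error, which is the defining property of the orthogonal complement in Eq. (32).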

To illustrate how $\Delta E^\mathrm{P}_\mathrm{SCF}$ is used to find the level-shift parameter, consider as an example the determination of the level-shift parameter in the ninth iteration of the rhodium-complex calculation of Sec. III. The plot of the HOMO-LUMO gap in Fig. 2(a) shows that the allowed level-shift interval is $\mu \ge 0$. In Fig. 2(b), we have plotted the overlap $a(\mu)$ as a function of $\mu$. Since $a(0) \ge a_\mathrm{min}$, we should, according to the discussion in Sec. II A 2, use $\mu_\mathrm{opt} = 0$ to determine the step. In short, considerations based on the HOMO-LUMO gap and on the overlap with the averaged density matrix indicate that the next density matrix should be determined from the standard, unshifted RH equations.

However, from the nine density matrices of the previous RH iterations, we can use $\Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$ to predict the change in $\Delta E^\mathrm{RH}_\mathrm{SCF}(\mu)$ more accurately than with $\Delta E_\mathrm{RH}(\mu)$. Indeed, from Fig. 2(c), we see that $\Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$ provides a good global representation of $\Delta E^\mathrm{RH}_\mathrm{SCF}(\mu)$, with a minimum close to the minimum of $\Delta E^\mathrm{RH}_\mathrm{SCF}(\mu)$. By contrast, the local model $\Delta E_\mathrm{RH}(\mu)$



gives a minimum at $\mu = 0$. Clearly, $\mu = 0$ should be avoided in the calculation since it would lead to an increase in the SCF energy. Instead, the value of the level-shift parameter that corresponds to the minimum of $\Delta E^\mathrm{P}_\mathrm{SCF}$ (denoted by $\mu_\mathrm{opt}$) is chosen for the calculation of the next density matrix.

This procedure may be summarized as follows. If $\mu_\mathrm{min} = 0$ and $a(0) \ge a_\mathrm{min}$, then we calculate the predicted energies $\Delta E^\mathrm{P}_\mathrm{SCF}(0)$ and $\Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$ with $\mu > 0$. If $\Delta E^\mathrm{P}_\mathrm{SCF}(0) < \Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$, then we use $D(0)$. Otherwise, we estimate the minimum $\mu_\mathrm{opt}$ of $\Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$ by an inexact line search and use the density matrix $D(\mu_\mathrm{opt})$ at this minimum.

FIG. 2. For the ninth iteration of the rhodium calculation described in Sec. III we have displayed as a function of the level-shift parameter $\mu$: (a) the HOMO-LUMO gap $\Delta\varepsilon_{ai}$, where $\mu_\mathrm{min} = 0$, (b) the overlap $a$ between the old and new density matrices, where $a_\mathrm{min}$ is the smallest accepted overlap, and (c) the change in the model energy $\Delta E_\mathrm{RH}$, the actual energy $\Delta E^\mathrm{RH}_\mathrm{SCF}$ and the predicted energy $\Delta E^\mathrm{P}_\mathrm{SCF}$. $\mu_\mathrm{opt}$ is found at the minimum of $\Delta E^\mathrm{P}_\mathrm{SCF}(\mu)$.

B. Density-subspace minimization

1. The DSM energy function

Let us assume that we have carried out $n$ RH iterations and that we have kept all previous density matrices $D_i$ and the corresponding Fock matrices $F_i$. We would now like to construct an optimal density as a linear combination of the densities from these iterations according to Eq. (9),

$$\bar D = \sum_{i=1}^{n} c_i D_i. \tag{37}$$

Ideally, this averaged density should also fulfill the conditions Eqs. (3)–(5). The symmetry condition Eq. (3) is trivially satisfied since the averaged density Eq. (37) is a linear combination of symmetric density matrices. The trace condition Eq. (4) is also easily taken care of by imposing the restriction

$$\sum_{i=1}^{n} c_i = 1 \tag{38}$$

on the expansion coefficients

$$\mathrm{Tr}\,\bar DS = \sum_{i=1}^{n} c_i\,\mathrm{Tr}\,D_iS = \frac{N}{2}. \tag{39}$$

By contrast, the idempotency condition Eq. (5) cannot be imposed on the averaged density matrix. However, the idempotency may be significantly improved if, instead of working with $\bar D$, we work with the purified density matrix$^6$

$$\tilde D = 3\bar DS\bar D - 2\bar DS\bar DS\bar D, \tag{40}$$

as proposed by Nunes and Vanderbilt.$^7$ The electronic energy may be expressed in terms of the purified average density matrix as

$$E(\tilde D) = 2\,\mathrm{Tr}\,h\tilde D + \mathrm{Tr}\,\tilde DG(\tilde D). \tag{41}$$

We note that the purified density is correct to first order in the expansion coefficients $c_i$ and that $E(\tilde{D})$ thus contains errors through second order in $c_i$. To determine the best average density matrix Eq. 37, we shall minimize Eq. 41 with respect to the expansion coefficients $c_i$ subject to the condition Eq. 38.

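One problem we encounter when minimizing Eq. 41 is that new Fock matrices $F(\tilde{D})$ need to be evaluated. To avoid this problem, we shall use an approximate form of Eq. 41. Since the purified density matrix $\tilde{D}$ is close to the original density matrix $\bar{D}$, we can write it as

$$\tilde{D} = \bar{D} + \delta, \tag{42}$$

where $\delta$ is the correction term. Inserting Eq. 42 into Eq. 41, we obtain

$$E(\tilde{D}) = 2\,\mathrm{Tr}\,h\bar{D} + \mathrm{Tr}\,\bar{D}G(\bar{D}) + 2\,\mathrm{Tr}\,h\delta + 2\,\mathrm{Tr}\,\delta G(\bar{D}) + \mathrm{Tr}\,\delta G(\delta). \tag{43}$$

Since $\delta$ is small, we may ignore the term quadratic in $\delta$ and arrive at the density-subspace minimization (DSM) energy function

$$E^{\mathrm{DSM}}(c) = 2\,\mathrm{Tr}\,h\bar{D} + \mathrm{Tr}\,\bar{D}G(\bar{D}) + 2\,\mathrm{Tr}\,h\delta + 2\,\mathrm{Tr}\,\delta G(\bar{D}) = E(\bar{D}) + 2\,\mathrm{Tr}\,F(\bar{D})(\tilde{D} - \bar{D}). \tag{44}$$

Since $\delta$ is first order in the expansion coefficients $c_i$, the DSM energy differs from the true energy to second and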



higher orders in $c_i$. The first contribution to the DSM energy function may for example be evaluated using the energy expression of the EDIIS algorithm, 5

$$E(\bar{D}) = \sum_{i} c_i E_{\mathrm{SCF}}(D_i) - \frac{1}{2}\sum_{ij} c_i c_j\,\mathrm{Tr}\,(F_i - F_j)(D_i - D_j). \tag{45}$$

Using Eq. 40, we find that the second contribution may be evaluated as

$$2\,\mathrm{Tr}\,F(\bar{D})(\tilde{D} - \bar{D}) = -2\sum_{ij} c_i c_j\,\mathrm{Tr}\,F_i D_j + 6\sum_{ijk} c_i c_j c_k\,\mathrm{Tr}\,F_i D_j S D_k - 4\sum_{ijkl} c_i c_j c_k c_l\,\mathrm{Tr}\,F_i D_j S D_k S D_l. \tag{46}$$

All contributions to the DSM energy function are therefore easily calculated from the previous density and Fock/KS matrices.

2. The trust-region DSM minimization

We minimize the DSM energy functional by the trust-region method. 12 We thus consider the second-order Taylor expansion of the DSM energy in Eq. 44 about $c_0$. Introducing the step vector

$$\Delta c = c - c_0, \tag{47}$$

we obtain

$$E^{\mathrm{DSM}}_{(2)}(c) = E_0 + \Delta c^{T} g + \frac{1}{2}\Delta c^{T} H \Delta c, \tag{48}$$

where the energy, gradient, and Hessian at the expansion point are given by

$$E_0 = E(c_0), \qquad g = \left.\frac{\partial E(c)}{\partial c}\right|_{c=c_0}, \qquad H = \left.\frac{\partial^2 E(c)}{\partial c^2}\right|_{c=c_0}. \tag{49}$$

As starting point $c_0$, we choose the density matrix with the lowest energy $E_{\mathrm{SCF}}(D_i)$, usually from the last RH iteration. The trace condition Eq. 38 implies

$$\sum_{i=1}^{n} \Delta c_i = 0. \tag{50}$$

We also introduce a trust region of radius $h$ for $E^{\mathrm{DSM}}_{(2)}(c)$ and require that steps are always taken inside or to the boundary of this region. To determine a step to the boundary, we restrict the step to have the length $h$ in the S metric norm of Eq. 34,

$$\|\Delta c\|_S^2 = \sum_{ij} \Delta c_i M_{ij} \Delta c_j = h^2. \tag{51}$$

Introducing the undetermined multipliers $\lambda$ and $\mu$ for the trace and step-size constraints, we arrive at the following Lagrangian for minimization on the boundary of the trust region:

$$L(\Delta c, \lambda, \mu) = E_0 + \Delta c^{T} g + \frac{1}{2}\Delta c^{T} H \Delta c + \lambda\,\Delta c^{T}\mathbf{1} - \frac{1}{2}\mu\left(\Delta c^{T} M \Delta c - h^2\right), \tag{52}$$

where $\mathbf{1}$ is a column vector with elements equal to 1. Differentiating this Lagrangian and setting the derivatives equal to zero, we obtain the equations

$$\frac{\partial L}{\partial \Delta c} = g + H\Delta c - \mu M \Delta c + \lambda\mathbf{1} = 0, \tag{53}$$

$$\frac{\partial L}{\partial \lambda} = \Delta c^{T}\mathbf{1} = 0, \tag{54}$$

$$\frac{\partial L}{\partial \mu} = -\frac{1}{2}\left(\Delta c^{T} M \Delta c - h^2\right) = 0. \tag{55}$$

The optimization of the Lagrangian thus corresponds to the solution of the following set of linear equations:

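$$\begin{pmatrix} H - \mu M & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix} \begin{pmatrix} \Delta c \\ \lambda \end{pmatrix} = \begin{pmatrix} -g \\ 0 \end{pmatrix}, \tag{56}$$

where the multiplier $\mu$ is iteratively adjusted until the step is to the boundary of the trust region Eq. 55. The step-length restriction may be lifted by setting $\mu = 0$, as needed for steps inside the trust region.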
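The bordered linear equations of Eq. 56 and the adjustment of the multiplier can be sketched in a few lines. The example below is a minimal illustration, not the implementation used in the paper: the Gaussian-elimination helper `solve_linear`, the bisection in `shift_for_radius`, and the two-dimensional toy Hessian, metric, and gradient are all assumptions. The bisection presumes that the unshifted step is longer than the trust radius and that the step length grows monotonically as the multiplier approaches `mu_upper` from below.

```python
from math import sqrt


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dsm_step(hess, metric, grad, mu):
    """Solve the bordered equations (Eq. 56); return (step, lambda)."""
    n = len(grad)
    a = [[hess[i][j] - mu * metric[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    sol = solve_linear(a, [-g for g in grad] + [0.0])
    return sol[:n], sol[n]


def s_norm(step, metric):
    """Step length in the metric M (Eq. 51)."""
    n = len(step)
    return sqrt(sum(step[i] * metric[i][j] * step[j]
                    for i in range(n) for j in range(n)))


def shift_for_radius(hess, metric, grad, radius, mu_upper=0.0, n_bisect=100):
    """Bisect on mu < mu_upper until the step length equals the radius."""
    lo = mu_upper - 1.0
    while s_norm(dsm_step(hess, metric, grad, lo)[0], metric) > radius:
        lo = mu_upper - 2.0 * (mu_upper - lo)  # move further down
    hi = mu_upper
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if s_norm(dsm_step(hess, metric, grad, mid)[0], metric) < radius:
            lo = mid
        else:
            hi = mid
    return lo


# Assumed toy problem with two coefficients.
H = [[2.0, 0.0], [0.0, 4.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
g = [1.0, -1.0]
mu = shift_for_radius(H, M, g, radius=0.2)
step, lam = dsm_step(H, M, g, mu)
print(mu, step, s_norm(step, M))
```

Because the last row of the bordered matrix enforces Eq. 54, every step returned by `dsm_step` sums to zero, so the trace condition is preserved for any value of the multiplier.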

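To understand the behavior of the step-length function, we consider first the generalized eigenvalue problem

$$\begin{pmatrix} H & \mathbf{1} \\ \mathbf{1}^{T} & 0 \end{pmatrix} \begin{pmatrix} v \\ \nu \end{pmatrix} = \omega \begin{pmatrix} M & \mathbf{0} \\ \mathbf{0}^{T} & \varepsilon \end{pmatrix} \begin{pmatrix} v \\ \nu \end{pmatrix}, \tag{57}$$

where $\mathbf{0}$ is a column vector with zero elements, $\varepsilon$ is a small positive constant, and the eigenvector is normalized such that

$$v^{T} v + \nu^2 = 1. \tag{58}$$

We first note that, for a finite $\omega$, $v \neq 0$. Next, carrying out block multiplications in Eq. 57, we obtain

$$Hv + \nu\mathbf{1} = \omega M v, \tag{59}$$

$$\mathbf{1}^{T} v = \varepsilon\omega\nu, \tag{60}$$

which upon elimination of $\nu$ from the first equation yields the relation

$$\varepsilon\omega Hv + (\mathbf{1}^{T} v)\mathbf{1} = \varepsilon\omega^2 M v. \tag{61}$$

Since $(\mathbf{1}^{T} v)\mathbf{1}$ is finite, we conclude that, as $\varepsilon$ tends to zero, the eigenvalue $\omega$ tends to either plus or minus infinity as $\varepsilon^{-1/2}$. Next, substituting these values of $\omega$ into Eq. 60, we find that $v$ tends to the zero vector with elements proportional to $\varepsilon^{1/2}$ and that $\nu$, because of the normalization Eq. 58, tends to 1. In short, the eigenvalue problem Eq. 57 with $\varepsilon \to 0$ has two eigenvalues $\omega = \pm\infty$, whose eigenvectors have zero elements except for the last element, which is equal to 1. Finally, invoking the Hylleraas–Undheim interlace theorem, 10,11 we conclude that the remaining $n-1$ finite eigenvalues of Eq. 57 bisect the $n$ eigenvalues $\tilde{\omega}$ of the reduced eigenvalue problem

$$Hv = \tilde{\omega} M v. \tag{62}$$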




Let us now consider the step length $\|\Delta c(\mu)\|_S$ as a function of $\mu$. In the diagonal representation of the augmented matrix in the linear equations Eq. 57, we may write these equations in the following uncoupled form:

$$(h_i - \mu m_i)\,x_i = -\phi_i, \qquad i = 1, 2, 3, \ldots, n+1. \tag{63}$$

Here, the $h_i$ and $m_i$ are the diagonal elements of the Hessian and metric matrices, respectively, of the generalized eigenvalue problem Eq. 57, whereas the $x_i$ and $\phi_i$, respectively, are the corresponding elements of the solution and gradient vectors of Eq. 56. Since the last element of the gradient vector in Eq. 56 is zero, the gradient vector has no contributions from the eigenvectors with infinite eigenvalues,

$$\phi_1 = \phi_{n+1} = 0, \qquad -\omega_1 = \omega_{n+1} = \infty, \tag{64}$$

assuming that the eigenvalues are sorted in increasing order $\omega_1 \le \omega_2 \le \cdots \le \omega_{n+1}$. In the diagonal representation, therefore, we may write the step norm in the form

$$\|\Delta c\|_S = \sqrt{\sum_{i=2}^{n} \frac{m_i \phi_i^2}{(h_i - \mu m_i)^2}}. \tag{65}$$

From this expression, we note that the step function consists of $n$ branches separated by $n-1$ asymptotes at the finite eigenvalues $\omega_i$. Moreover, it increases monotonically from zero to infinity as $\mu$ increases from minus infinity and approaches the lowest finite eigenvalue $\omega_2$. Therefore, there is always one and only one $\mu$ in the interval $-\infty < \mu < \omega_2$ that gives rise to a step of length $h$. As shown by Fletcher, 12 this value of $\mu$ corresponds to the global minimum on the boundary of the trust region.
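The monotonic behavior of the lowest branch of the step-length function can be checked directly from Eq. 65. The sketch below works in the diagonal representation and takes only the finite components as input; the numerical values of the diagonal Hessian and metric elements, the gradient components, and the trust radius are assumptions chosen for illustration, and `find_level_shift` is a plain bisection rather than the search used by the authors.

```python
from math import sqrt


def step_length(mu, h_diag, m_diag, phi):
    """Step norm of Eq. 65 for the finite components of the diagonal form."""
    return sqrt(sum(m * p * p / (h - mu * m) ** 2
                    for h, m, p in zip(h_diag, m_diag, phi)))


def find_level_shift(radius, h_diag, m_diag, phi, n_bisect=100):
    """Return the shift below the lowest asymptote with step length = radius."""
    omega_low = min(h / m for h, m in zip(h_diag, m_diag))
    lo = omega_low - 1.0
    while step_length(lo, h_diag, m_diag, phi) > radius:
        lo = omega_low - 2.0 * (omega_low - lo)  # step is too long, go lower
    hi = omega_low  # asymptote; never evaluated directly
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if step_length(mid, h_diag, m_diag, phi) < radius:
            lo = mid
        else:
            hi = mid
    return lo


# Assumed toy values: asymptotes at 1.0 and 3.0.
h_diag = [1.0, 3.0]
m_diag = [1.0, 1.0]
phi = [0.5, 0.2]

mu_star = find_level_shift(0.3, h_diag, m_diag, phi)
print(mu_star, step_length(mu_star, h_diag, m_diag, phi))
```

The bisection relies on the gradient having a nonzero component along the lowest finite eigenvector; otherwise the step length does not diverge at that asymptote and the lowest branch extends to the next one.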

In practice, we cannot easily determine the eigenvalues $\omega_i$ of the augmented eigenvalue problem Eq. 57. Instead, we determine the eigenvalues $\tilde{\omega}_i$ of the reduced problem Eq. 62 and restrict our search of $\mu$ to the smaller monotonic interval $-\infty < \mu < \tilde{\omega}_1$. Since $\tilde{\omega}_1 \le \omega_2$, it is possible that no solution exists in this reduced interval. Mostly, however, this restriction is mild since the two eigenvalues are usually close. If no solution is found, we choose instead the slightly shorter step obtained with $\mu = \tilde{\omega}_1$.

To illustrate how the level-shift parameter $\mu$ in Eq. 56 is determined, we consider the first [Fig. 3(a)] and third [Fig. 3(b)] DSM step in the eighth iteration of the rhodium-complex calculation in Sec. III. We have plotted the step-length function $\|\Delta c(\mu)\|_S$ as a function of $\mu$. The plots consist of a series of branches between asymptotes where $\mu$ makes the matrix on the left-hand side of Eq. 56 singular. The lowest eigenvalue $\tilde{\omega}_1$ is marked with a vertical dashed line in Figs. 3(a) and 3(b). For minimization, the level-shift parameter is chosen in the interval $-\infty < \mu < \min(\tilde{\omega}_1, 0)$, where $\tilde{\omega}_1$ is the lowest eigenvalue of Eq. 62. The proper value is found where the step-length function crosses the line representing the trust radius $h$, as marked with a cross in Fig. 3(a). If the step that minimizes $E^{\mathrm{DSM}}_{(2)}$ is inside the trust region, $\mu = 0$ is chosen, as marked with a cross in Fig. 3(b).

The trust region is updated during the iterative procedure.

FIG. 3. The step-length function $\|\Delta c(\mu)\|_S$ is plotted as a function of $\mu$ for the first (a) and third (b) DSM step in the eighth iteration of the rhodium calculation described in Sec. III. The trust radius $h$ is represented by a horizontal line. The proper $\mu$ value is marked with a cross.

3. Global optimization of the DSM function 

The optimization of the $E^{\mathrm{DSM}}$ energy is carried out in the usual manner, requiring several trust-region steps, each of which involves the construction of the gradient $g$ and the Hessian $H$, and the solution of the modified level-shifted Newton equations Eq. 56. After $p$ iterations, the density is calculated from the coefficients

$$c_p = c^{(0)} + \sum_{i=1}^{p} \Delta c_i. \tag{66}$$

However, since $E^{\mathrm{DSM}}$ itself is a rather crude model of the true energy function $E_{\mathrm{SCF}}$, it resembles $E_{\mathrm{SCF}}$ only in a small region about the initial point $c^{(0)}$. The DSM iterations are therefore terminated when the total step length $\|c_p - c^{(0)}\|$ exceeds some preset value $k$. If a minimum of $E^{\mathrm{DSM}}$ is found inside the trust region $\|c_p - c^{(0)}\| < k$, then the step to the minimum is taken and the iterations are terminated. This is often the case.
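The termination logic of the DSM iterations can be sketched as a short driver loop. Everything in the example is an assumption made for illustration: `propose_step` is a hypothetical callable returning one trust-region step and a flag telling whether that step reaches a minimum of the model, the total step length is measured in the plain Euclidean norm (the norm intended in the text is not specified here), and the exit labels "M" and "L" are borrowed from the exit codes listed in Table II.

```python
from math import sqrt


def run_dsm_iterations(c0, propose_step, k, max_iter=20):
    """Accumulate trust-region steps (Eq. 66) until an exit condition holds.

    Returns (coefficients, exit_label, iterations), where the label is
    "M" if a minimum was reached and "L" if the total step length
    exceeded k (or the iteration limit was hit).
    """
    c = list(c0)
    for p in range(1, max_iter + 1):
        step, at_minimum = propose_step(c)
        c = [ci + si for ci, si in zip(c, step)]
        total = sqrt(sum((ci - c0i) ** 2 for ci, c0i in zip(c, c0)))
        if at_minimum:
            return c, "M", p
        if total > k:
            return c, "L", p
    return c, "L", max_iter


def fixed_step(c):
    """Assumed stand-in for a trust-region step: constant, never a minimum."""
    return [0.1, -0.1], False


coeffs, exit_label, n_steps = run_dsm_iterations([1.0, 0.0], fixed_step, k=0.35)
print(coeffs, exit_label, n_steps)
```

Since each individual step sums to zero, the accumulated coefficients continue to satisfy the normalization of Eq. 38 regardless of where the loop stops.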

Occasionally, the iterations start where the lowest eigenvalue of the Hessian in Eq. 62 is negative. In the course of the iterations, the Hessian can become positive definite and a minimum is reached. In a few cases, however, a negative Hessian eigenvalue may persist, changing little from iteration to iteration. In our experience, a step along the eigenvector corresponding to the negative eigenvalue cannot be trusted. This direction is therefore projected out from the step and the DSM function is minimized in the orthogonal subspace. As an illustration, consider the first DSM step of the tenth SCF iteration of the rhodium-complex calculation in Sec. III. In Fig. 4, we have, for comparison, plotted the step-length functions with the negative component kept and projected out. The level shifts resulting from the two situations are marked with crosses in Fig. 4. The level shift used in the DSM optimization is, in this particular case, $\mu = 0$.

FIG. 4. The step-length function $\|\Delta c(\mu)\|_S$ is plotted as a function of $\mu$ with the direction corresponding to the negative Hessian eigenvalue kept (solid line) and projected out (dashed line), respectively. The $\mu$ values resulting from the two situations are marked with crosses.

When the trust-region minimization is terminated, a new RH iteration is initiated by constructing a new density and associated Fock matrix

$$\bar{D} = \sum_{i=1}^{n} c_i D_i, \qquad \bar{F} = \sum_{i=1}^{n} c_i F(D_i), \tag{67}$$

where we have used the fact that the Fock matrix is linear in the density. By construction, $E^{\mathrm{DSM}}(c)$ is lowered at each iteration of the trust-region minimization. The total energy lowering at the $p$th iteration is given by

$$\Delta E^{\mathrm{DSM}} = E^{\mathrm{DSM}}(c_p) - E^{\mathrm{DSM}}(c^{(0)}). \tag{68}$$

Since $E^{\mathrm{DSM}}$ is a local model to the true energy $E_{\mathrm{SCF}}$, the lowering of $E^{\mathrm{DSM}}$ will also lead to a lowering of $E_{\mathrm{SCF}}$ provided the total step is sufficiently short to be in the local region.

4. Relationship to the DIIS method 

The optimal density has previously been determined using the DIIS scheme of Pulay. 4 In the DIIS method, the improved density matrix is obtained as a linear combination of the previous density matrices where the expansion coefficients are determined by minimizing the norm of the error vector, using the gradients of the previous iterations as error vectors. To highlight the difference between TRDSM and DIIS, we give below an alternative derivation of the DIIS algorithm.

FIG. 5. The convergence of calculations on the rhodium complex using the Ahlrichs-VDZ basis (Ref. 16) combined with STO-3G for Rh. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF method as a function of the iteration number. Furthermore, results are given where DIIS is applied after nine TRSCF iterations.

In an SCF calculation, the electronic gradient with the averaged density matrix $\bar{D}$ in Eq. 37 may be expressed in the form 3

$$g(\bar{D}) = 4\left[\bar{D}SF(\bar{D}) - F(\bar{D})S\bar{D}\right]. \tag{69}$$

To determine the best linear combination of densities $D_i$, we minimize the squared norm of the gradient

$$\|g(\bar{D})\|^2 = 16\,\mathrm{Tr}\left[\bar{D}SF(\bar{D}) - F(\bar{D})S\bar{D}\right]^2. \tag{70}$$

Inserting the expansion Eq. 37, we obtain a quartic polynomial in $c_i$,

$$\|g(\bar{D})\|^2 = 16\,\mathrm{Tr}\left\{\sum_{i} c_i\,g(D_i) + \sum_{i,j} c_i c_j\left[D_i S F(D_j - D_i) - F(D_j - D_i) S D_i\right]\right\}^2. \tag{71}$$

To simplify this expression, we neglect all cubic and quartic terms:

$$\|g(\bar{D})\|^2_{\mathrm{app}} = \sum_{i,j} c_i c_j\,g(D_i)\,g(D_j). \tag{72}$$

Optimization of Eq. 72 subject to the constraint Eq. 38 gives the DIIS expression of the expansion coefficients in Eq. 37.
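The constrained minimization of Eq. 72 leads to a small bordered linear system, which the following sketch solves for a toy case. The code is an illustration under assumptions: error vectors are stored as flat lists, the overlap of two error vectors is taken as a plain dot product, and `solve_linear` and `diis_coefficients` are helper names introduced here. The factor of two from differentiating the quadratic form is absorbed into the Lagrange multiplier.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def diis_coefficients(error_vectors):
    """Minimize sum_ij c_i c_j <g_i, g_j> subject to sum_i c_i = 1."""
    n = len(error_vectors)
    b = [[sum(x * y for x, y in zip(gi, gj)) for gj in error_vectors]
         for gi in error_vectors]
    # Bordered system: B c + lambda 1 = 0 and 1^T c = 1.
    a = [b[i] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    sol = solve_linear(a, [0.0] * n + [1.0])
    return sol[:n]


# Assumed toy error vectors from two previous iterations.
errors = [[1.0, 0.0], [0.0, 2.0]]
coeffs = diis_coefficients(errors)
print(coeffs)
```

The iteration with the smaller gradient receives the larger weight, and nothing in the construction prevents negative coefficients when the error vectors are nearly parallel, which is one way of seeing that DIIS extrapolates as well as interpolates.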

III. APPLICATIONS 

In this section, we examine the convergence characteristics of the TRSCF algorithm. First, we consider a rhodium-complex optimization as an example of a difficult case; next, as a simpler case, we consider a calculation on H$_2$O with the OH bond lengths stretched to double length. For comparison, we also give the convergence characteristics of the DIIS algorithm 4 and the quadratically convergent restricted step Hartree–Fock (QRHF) method. 13,14 All calculations are carried out using a local version of the DALTON program package. 17

A. The rhodium complex calculation

In Fig. 5, we have plotted the error in the energy at each iteration of TRSCF, DIIS, and QRHF optimizations of the rhodium complex with the geometry specified in Table I using the Ahlrichs-VDZ basis 16 combined with STO-3G on Rh. The starting orbitals have been obtained from diagonalizing the one-electron Hamiltonian.

Clearly, the QRHF and DIIS methods do not work in this case. In particular, the DIIS method is unable to handle the global part of the optimization, where the initially indefinite Hessian changes its structure and becomes positive definite. Since the DIIS method relies solely on gradient information, it does not see the negative eigenvalues and produces steps that may or may not be in the right direction, leading to divergence. Moreover, in this DIIS calculation, no level shifts have been applied in the RH part of the optimization, again leading to steps in the wrong direction. In short, the DIIS method cannot be used for optimizations as complex as the rhodium calculation. However, if the DIIS method is started after the SCF local region has been reached by the TRSCF algorithm, then the DIIS algorithm converges nicely since the Hessian has the correct structure. In Fig. 5, we have also plotted the errors in a calculation where the DIIS method is started after nine TRSCF iterations. It then converges in roughly the same manner as the pure TRSCF method.

TABLE I. Geometry of the rhodium complex.

Atom   x           y          z
Cl     2.783200    0.000000   0.000000
C      0.000000    1.750000   0.000000
C      0.000000    1.750000   0.000000
C      2.510000    1.247077   0.000000
C      2.510000    1.247077   0.000000
C      3.960000    1.247077   0.000000
C      3.960005    1.247074   0.000000
C      4.685005    0.008663   0.000000
C      6.585566    1.381712   0.000000
C      7.224161    0.912908   0.000000
H      1.802335    2.074803   0.000000
H      1.965500    2.190178   0.000000
H      4.323007    2.273792   0.000000
H      4.504500    2.190178   0.000000
H      6.215281    1.889842   0.889165
H      6.215281    1.889842   0.889165
H      7.169607    1.539271   0.889165
H      7.169607    1.539271   0.889165
H      7.674455    1.397244   0.000000
H      8.164527    0.363696   0.000000
N      1.790000    0.000000   0.000000
N      6.124978    0.017359   0.000000
O      0.122018    3.144673   0.000000
O      0.122018    3.144673   0.000000
Rh     0.0000000   0.000000   0.000000

In the QRHF calculation, the total energy reduces slowly and monotonically during the iteration procedure. However, the resulting energy lowering is much too slow to be of any practical value. Thus, after 14 iterations, the energy has decreased by only 37 $E_h$, which is insignificant compared with the 237 $E_h$ needed for convergence.

To understand the difference between the QRHF and TRSCF optimizations, let us recall the main features of the two methods. Since the QRHF method is based on a local quadratic model of $E_{\mathrm{SCF}}$, the QRHF orbital rotations are correct to first order. However, no global information about $E_{\mathrm{SCF}}$ is available and only small steps can be trusted in the optimization. When QRHF steps are taken to the boundary of the trust region, level-shifted Newton equations are solved with the Hessian of Eq. 26. By contrast, in the TRSCF method, the RH optimization is based on the local energy function $E^{\mathrm{RH}}$, which has the same gradient as $E_{\mathrm{SCF}}$ but a slightly different Hessian (compare Eqs. 25 and 26). More important, $E^{\mathrm{RH}}$ shares some global features with $E_{\mathrm{SCF}}$. In the RH diagonalization step, a global optimization is carried out for $E^{\mathrm{RH}}$. When an RH step is taken to the boundary of the trust region of $E^{\mathrm{RH}}$, a level-shifted Fock eigenvalue equation is solved where the level-shift parameter effectively introduces a shift in the Hessian of $E^{\mathrm{RH}}$ (Eq. 25). The similarity of the Hessians of $E_{\mathrm{SCF}}$ and $E^{\mathrm{RH}}$ makes the directions of the steps taken by the QRHF and RH methods very similar for sufficiently large level shifts, the essential difference being the global character of the RH steps and the local character of the QRHF steps. It is this local character of the QRHF steps that prevents the QRHF method from being efficient for systems as difficult as the rhodium complex.

Let us now consider the individual TRSCF iterations as listed in Table II. The optimization begins with orbitals that diagonalize the one-electron Hamiltonian, giving a start energy of $-5466.53020896475\,E_h$. In Table II, the SCF energy lowering $\Delta E_{\mathrm{SCF}}$ is divided into two contributions, one from the RH step and one from the DSM step. Recalling from Eq. 24 that $D(\infty)_n$ is the purified $\bar{D}_n$,

$$\Delta E^{\mathrm{DSM}}_{\mathrm{SCF},\,n+1} = E_{\mathrm{SCF}}\bigl(D(\infty)_n\bigr) - E_{\mathrm{SCF}}(D_n) \tag{73}$$

becomes a realistic measure of the energy change in the DSM part of the iteration. Similarly,

$$\Delta E^{\mathrm{RH}}_{\mathrm{SCF},\,n+1} = E_{\mathrm{SCF}}(D_{n+1}) - E_{\mathrm{SCF}}\bigl(D(\infty)_n\bigr) \tag{74}$$

becomes a realistic measure of the change in the RH part. Clearly, the sum of Eqs. 73 and 74 is equal to the total change $\Delta E_{\mathrm{SCF}}$. These exact energy changes should be compared with the energy changes in the local models $\Delta E^{\mathrm{RH}}$ and $\Delta E^{\mathrm{DSM}}$ given in Eqs. 27 and 68, respectively, also listed in the table. Note that, to obtain $E_{\mathrm{SCF}}(D(\infty))$, we must carry out an additional energy calculation, which is here done only for the purpose of this analysis.

For the DSM method, we have also indicated in Table II how the trust-region optimization was terminated (Exit$_{\mathrm{DSM}}$): M indicates that a minimum was determined in the full space; PM indicates that a minimum was obtained in the reduced space with the direction corresponding to the negative Hessian eigenvalue projected out; and L indicates that the iterations were terminated because the maximum step length $k$ was reached. For the RH steps, we have also listed the level-shift parameter $\mu_{\mathrm{opt}}$ and the corresponding overlap $a(\mu_{\mathrm{opt}})$ of Eq. 22.

The TRSCF iterations converge linearly, with a reduction in the error of about a factor 2–4 at each iteration. Moreover, the energy lowerings of the local models $\Delta E^{\mathrm{RH}}$ and $\Delta E^{\mathrm{DSM}}$ are in good agreement with the actual SCF energy changes, in the local as well as in the global part of the optimization. Both the predicted and the actual energy changes are negative in all iterations. In the global region, $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ is usually significantly larger than $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$, whereas, in the local region, they have similar sizes.
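Two of the statements above can be checked with simple arithmetic on the entries of Table II. The sketch below uses the magnitudes as printed in the table for a few iterations and makes two assumptions: the ratio of successive total energy lowerings is used as a stand-in for the reduction factor of the error, and only iterations 9 to 16 are included for that purpose. The function names are introduced for this example.

```python
# (iteration, total lowering, DSM part, RH part), magnitudes from Table II
split_rows = [
    (2, 45.45858825211, 8.95768890498, 36.50089934714),
    (3, 59.81037380731, 12.93651600370, 46.87385780361),
    (4, 63.34486220663, 24.25263285599, 39.09222935064),
]

# Total energy lowering per iteration in the later part of the optimization
totals = {
    9: 0.60805181167,
    10: 0.16667264229,
    11: 0.05893002647,
    12: 0.01821537974,
    13: 0.00829012952,
    14: 0.00336772651,
    15: 0.00144190516,
    16: 0.00049317801,
}


def split_residual(total, dsm_part, rh_part):
    """Deviation of the total lowering from the sum of its two parts."""
    return abs(total - (dsm_part + rh_part))


def reduction_factors(lowerings):
    """Ratios of successive energy lowerings, ordered by iteration."""
    its = sorted(lowerings)
    return [lowerings[a] / lowerings[b] for a, b in zip(its, its[1:])]


for it, total, dsm_part, rh_part in split_rows:
    print(it, split_residual(total, dsm_part, rh_part))
print([round(f, 2) for f in reduction_factors(totals)])
```

The residuals are at the level of the last printed digit, consistent with the total change being the sum of Eqs. 73 and 74, and the ratios for these iterations fall between 2 and 4.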

Except for three iterations in the global part of the SCF 

optimization, the DSM trust-region method finds a minimum 

within the step-length limit k. In the intermediate region, we 

encounter components of the step vector that cannot be 

trusted and have been projected out as described in Sec. 

II B 3. The DSM iterations then reach a minimum in the orthogonal 

subspace. 



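TABLE II. Convergence details for the TRSCF calculation on the rhodium complex using the Ahlrichs-VDZ basis combined with STO-3G on Rh. Energies are given in atomic units.

It. $\Delta E_{\mathrm{SCF}}$ $\Delta E^{\mathrm{DSM}}_{\mathrm{SCF}}$ $\Delta E^{\mathrm{DSM}}$ $\Delta E^{\mathrm{RH}}_{\mathrm{SCF}}$ $\Delta E^{\mathrm{RH}}$ $\mu^{\mathrm{RH}}_{\mathrm{opt}}$ $a(\mu^{\mathrm{RH}}_{\mathrm{opt}})$ Exit$_{\mathrm{DSM}}$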

1 18.94647615033 0.00000000009 0.00000000000 18.94647615024 19.21320649447 17.47 0.99382 

2 45.45858825211 8.95768890498 7.10309975657 36.50089934714 38.75977508968 14.44 0.98630 M 

3 59.81037380731 12.93651600370 8.85502694483 46.87385780361 51.53623100635 11.68 0.97940 M 

4 63.34486220663 24.25263285599 21.63716388564 39.09222935064 48.71127100240 7.28 0.97288 L 

5 30.22875461345 12.81783382045 12.23686585427 17.41092079300 21.38161936631 2.63 0.97384 L 

6 11.56061105704 5.64904464510 4.74940263974 5.91156641194 7.60366893231 0.90 0.97552 L 

7 4.61334906659 1.90220393646 1.51155035145 2.71114513013 3.30373325651 0.24 0.97792 M 

8 2.16270415323 0.44637212140 0.44849600108 1.71633203184 1.49977814394 0.07 0.97876 M 

9 0.60805181167 0.29078332276 0.21298647367 0.31726848890 0.60770324492 1.30 0.99823 M 

10 0.16667264229 0.00294157325 0.00194422453 0.16373106904 0.22325882198 0.70 0.99934 PM 

11 0.05893002647 0.00782290321 0.00662821837 0.05110712327 0.03977595787 0.00 0.99955 PM 

12 0.01821537974 0.00935849099 0.00823957093 0.00885688875 0.00980424864 0.00 0.99989 PM 

13 0.00829012952 0.00417695835 0.00382848541 0.00411317118 0.00413942925 0.00 0.99995 PM 

14 0.00336772651 0.00246626574 0.00222734467 0.00090146077 0.00176102559 0.00 0.99998 PM 

15 0.00144190516 0.00106346997 0.00091468267 0.00037843519 0.00066804948 0.00 1.00000 PM 

16 0.00049317801 0.00040627140 0.00039284830 0.00008690661 0.00013209160 0.00 1.00000 PM 

17 0.00005633666 0.00003203569 0.00002863768 0.00002430097 0.00003124073 0.00 1.00000 PM 

18 0.00001495119 0.00000990523 0.00000917530 0.00000504595 0.00000926762 0.00 1.00000 PM 

19 0.00000549749 0.00000312992 0.00000277915 0.00000236757 0.00000276315 0.00 1.00000 M 

20 0.00000196603 0.00000126150 0.00000121565 0.00000070454 0.00000067573 0.00 1.00000 M 

21 0.00000038264 0.00000022841 0.00000020736 0.00000015423 0.00000016335 0.00 1.00000 M 

22 0.00000008720 0.00000004496 0.00000004404 0.00000004225 0.00000004536 0.00 1.00000 M 

23 0.00000002788 0.00000001171 0.00000001049 0.00000001617 0.00000001603 0.00 1.00000 M 

24 0.00000001286 0.00000000813 0.00000000800 0.00000000472 0.00000000514 0.00 1.00000 M 

25 0.00000000294 0.00000000131 0.00000000127 0.00000000163 0.00000000186 0.00 1.00000 M 

26 0.00000000119 0.00000000073 0.00000000072 0.00000000045 0.00000000056 0.00 1.00000 M 

27 0.00000000035 0.00000000019 0.00000000019 0.00000000016 0.00000000022 0.00 1.00000 M 

In the beginning of the SCF optimization, large level shifts are applied in the RH diagonalization to ensure a continuous development of the MOs. Thus, in the first few iterations, the overlap constant $a(\mu_{\mathrm{opt}})$ is significantly larger than the minimum accepted overlap of 0.975. However, the level-shift parameter decreases during the subsequent SCF iterations until, in the local region, no level shift is required and conventional RH iterations are carried out. To summarize, the TRSCF method gives a monotonic and significant energy lowering both in the RH and in the DSM part of the optimization.

B. The water calculation 

To demonstrate the performance of the TRSCF method in a simpler case, we consider optimizations of H$_2$O with the OH bonds stretched to twice the equilibrium value (195.10 pm). In Figs. 6(a) and 6(b), we have plotted the errors in the energy during TRSCF, DIIS, and QRHF optimizations in the cc-pVDZ basis. 15 In Fig. 6(a), the initial orbitals are the Hückel orbitals as implemented in the DALTON program. With these initial orbitals, the TRSCF and DIIS methods converge in a very similar manner to within a threshold of $10^{-10}$ in ten iterations. In this case, therefore, gradient information is sufficient for convergence. Although the QRHF method outperforms the TRSCF and DIIS methods in terms of iterations, this is of no practical value since, in each QRHF step, about the same number of new Fock matrices are needed to solve the Newton equations as is required to find the optimized Hartree–Fock wave function with the TRSCF and DIIS methods.

FIG. 6. The convergence of calculations on water with stretched bonds using the cc-pVDZ basis and (a) a Hückel start guess and (b) a one-electron Hamiltonian start guess. The error in the total energy is given for the TRSCF, the standard DIIS, and the QRHF method as a function of the iteration number.




In Fig. 6(b), we have plotted the error of the energy in H$_2$O optimizations starting with the orbitals that diagonalize the one-electron Hamiltonian. In this case, convergence to $10^{-10}$ is reached in 13 iterations with the TRSCF method and in 18 iterations with the DIIS method. The main reason for the better performance of the TRSCF algorithm is that, in the global region, it gives a significant energy lowering in each step, whereas the DIIS algorithm shows a much less systematic behavior.

IV. CONCLUSION 

A conventional SCF optimization consists of a sequence of iterations, each of which begins with a Roothaan–Hall (RH) diagonalization step, where a Fock/KS matrix is diagonalized to obtain an improved density matrix, followed by an averaging step, where the optimal density matrix is determined in the subspace of the density matrices of the previous RH diagonalization steps. In this paper, we have introduced a trust-region SCF (TRSCF) algorithm, where improvements have been made to both the diagonalization and the averaging steps. In both steps, local energy model functions are constructed which have the same gradient as the true energy function $E_{\mathrm{SCF}}$ but approximate Hessians. Recognizing the locality of these energy functions, trust regions are introduced as regions where they represent a good approximation to $E_{\mathrm{SCF}}$ and only steps inside these trust regions are allowed.

For the density-subspace minimization step, an energy function is constructed and minimized with respect to the coefficients of the linear combination of the previous density matrices. Its functional form is based on a purified averaged density matrix that is idempotent to first order. The advantage of this model compared to EDIIS is the built-in density purification, which helps to avoid problems arising from non-idempotency. In addition, information about the Hessian is extracted and used, leading to a monotonic and stable convergence.

The RH diagonalization step corresponds to a minimization of an energy function $E^{\mathrm{RH}}$ that represents the sum of the orbital energies of the occupied MOs. Since this very simple energy function is a local model function for $E_{\mathrm{SCF}}$, large steps cannot be trusted. To generate steps to the boundary of the trust region, level-shifted RH equations are solved where the level shifts are determined in a systematic and general manner, leading to a decrease in the model energy at each iteration. If sufficiently small steps are taken, a similar decrease is obtained in the SCF energy.

In the TRSCF algorithm, a few diagonalizations are required in each SCF iteration to obtain solutions for the level-shifted RH equations in order to determine the optimal density matrix. The number of diagonalizations may be reduced in the local SCF region by solving the RH equations with zero level shift, with little consequence for the convergence. In the local SCF region one may also safely use the DIIS algorithm if desired.

The advantages of the TRSCF algorithm are demonstrated by calculations on a rhodium complex and on a water molecule with stretched bonds. In the rhodium-complex optimization, the TRSCF algorithm converges monotonically and fast, with a significant decrease in the energy in both the RH part and DSM part at each iteration. By contrast, convergence is not obtained with the DIIS method for this complex. For the simpler water molecule, the TRSCF and DIIS methods behave in a more similar manner, the TRSCF method converging slightly faster than the DIIS method when the initial orbitals are obtained by diagonalizing the one-electron Hamiltonian. With the Hückel guess, the water convergence is essentially obtained in the same number of steps for the TRSCF and DIIS methods. In short, it appears that the TRSCF algorithm, and its use of local energy model functions to obtain significant reductions in $E_{\mathrm{SCF}}$ in each iteration, constitutes a significant step towards a black-box optimization of SCF wave functions.

ACKNOWLEDGMENTS 

This work has been supported by the Danish Natural Research Council (Grant No. 21-02-0467) and the Carlsbergfondet. We also acknowledge support from the Danish Center for Scientific Computing (DCSC). D.Y. acknowledges support from the Robert A. Welch Foundation, Grant No. A-770.

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).
2 G. G. Hall, Proc. R. Soc. London, Ser. A 205, 541 (1951).
3 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory (Wiley, Chichester, 2000).
4 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).
5 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).
6 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).
7 R. W. Nunes and D. Vanderbilt, Phys. Rev. B 50, 17611 (1994).
8 A. D. Daniels and G. E. Scuseria, Phys. Chem. Chem. Phys. 2, 2173 (2000).
9 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 399 (1997).
10 E. A. Hylleraas and B. Undheim, Z. Phys. 65, 759 (1930).
11 J. K. L. MacDonald, Phys. Rev. 43, 830 (1933).
12 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).
13 G. B. Bacskay, Chem. Phys. 61, 385 (1981).
14 H. J. Aa. Jensen and P. Jørgensen, J. Chem. Phys. 80, 1204 (1984).
15 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).
16 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).
17 DALTON, a molecular electronic structure program, Release 1.2 (2001), written by T. Helgaker, H. J. Aa. Jensen, P. Jørgensen et al. (http://www.kjemi.uio.no/software/dalton).


Part 1 

The Trust-region Self-consistent Field Method in Kohn-Sham Density-functional Theory, 

L. Thøgersen, J. Olsen, A. Köhn, P. Jørgensen, P. Sałek, and T. Helgaker, 

J. Chem. Phys. 123, 074103 (2005)

THE JOURNAL OF CHEMICAL PHYSICS 123, 074103 (2005)

The trust-region self-consistent field method 

in Kohn–Sham density-functional theory 

Lea Thøgersen,a) Jeppe Olsen, Andreas Köhn, and Poul Jørgensen

Department of Chemistry, University of Århus, DK-8000 Århus C, Denmark 

Paweł Sałek 

Laboratory of Theoretical Chemistry, The Royal Institute of Technology, Roslagstullbacken 15, 

Stockholm, S-10691 Sweden 

Trygve Helgaker 

Department of Chemistry, University of Oslo, P.O. Box 1033 Blindern, N-0315 Norway 

(Received 20 May 2005; accepted 7 June 2005; published online 22 August 2005)

The trust-region self-consistent field (TRSCF) method is extended to the optimization of the Kohn–Sham energy. In the TRSCF method, both the Roothaan–Hall step and the density-subspace minimization step are replaced by trust-region optimizations of local approximations to the Kohn–Sham energy, leading to a controlled, monotonic convergence towards the optimized energy. Previously the TRSCF method has been developed for optimization of the Hartree–Fock energy, which is a simple quadratic function in the density matrix. However, since the Kohn–Sham energy is a nonquadratic function of the density matrix, the local energy functions must be generalized for use with the Kohn–Sham model. Such a generalization, which contains the Hartree–Fock model as a special case, is presented here. For comparison, a rederivation of the popular direct inversion in the iterative subspace (DIIS) algorithm is performed, demonstrating that the DIIS method may be viewed as a quasi-Newton method, explaining its fast local convergence. In the global region the convergence behavior of DIIS is less predictable. The related energy DIIS technique is also discussed and shown to be inappropriate for the optimization of the Kohn–Sham energy. © 2005 American Institute of Physics. [DOI: 10.1063/1.1989311]

I. INTRODUCTION 

Computational methods rigorously based on the laws of quantum mechanics are becoming an ever more important component of scientific and technological progress in many branches of natural science, including biochemistry and materials science. Quantum-chemical codes, in particular, are today routinely used to perform calculations on molecules containing hundreds of atoms. Furthermore, with the advent of density-functional theory (DFT) methods, molecules with more complex electronic structure and larger parts of potential surfaces may be calculated than with the Hartree–Fock method. Most of these calculations are performed by nonspecialists, not trained in quantum chemistry or in numerical simulations. An important challenge is thus to develop quantum-chemical techniques that allow the user to focus on the physical and chemical interpretations of the results of the calculations by eliminating or at least minimizing the need to understand the details of the numerical algorithms.

A central numerical task of the Hartree–Fock wave-function theory and Kohn–Sham DFT is the minimization of the electronic energy function with respect to the density matrix of a single-determinant reference wave function. In its original formulation, the self-consistent field (SCF) method for optimizing Hartree–Fock and Kohn–Sham energies $E_{\mathrm{SCF}}$ consists of a sequence of Roothaan–Hall iterations (Refs. 1 and 2). At each iteration, the Fock/Kohn–Sham matrix is first constructed from the current approximate atomic-orbital (AO) density matrix; next, an improved AO density matrix is generated from the molecular orbitals (MOs) obtained by diagonalization of this Fock/Kohn–Sham matrix. Unfortunately, this simple SCF scheme converges only in simple cases. To improve upon its convergence, the optimization is modified by constructing the Fock/Kohn–Sham matrix not directly from the AO density matrix of the last diagonalization but rather from an averaged density matrix, calculated in the subspace of the density matrices of the current and previous iterations. In practice, the averaged AO density matrix is calculated by the direct inversion in the iterative subspace (DIIS) method of Pulay (Ref. 3), nowadays implemented in most electronic-structure programs. In the DIIS method, the averaged density matrix is a linear combination of density matrices, where the expansion coefficients are obtained by minimizing the norm of the corresponding linear combination of the gradients.

a) Electronic mail: lea@chem.au.dk

Over the years, several attempts have been made to improve upon the DIIS method. In particular, Kudin et al. have proposed the energy DIIS (EDIIS) method (Ref. 4), where the gradient-norm minimization is replaced by a minimization of an approximation to the true energy function $E_{\mathrm{SCF}}$, where the expansion coefficients of the averaged density matrix are used as variational parameters. For the special case of two density matrices such an approach was first developed by Karlström (Ref. 5).


074103-2 Thøgersen et al. J. Chem. Phys. 123, 074103 2005 

Recently, we introduced the trust-region self-consistent field (TRSCF) method (Ref. 6) for SCF density-matrix optimizations. In the TRSCF method, the diagonalization step, trust-region Roothaan–Hall (TRRH), and the density-optimization step, trust-region density-subspace minimization (TRDSM), are realized as minimizations of local energy model functions of $E_{\mathrm{SCF}}$. The local energy functions are expanded about the current AO density matrix and have the same gradients as the true energy $E_{\mathrm{SCF}}$ but approximate Hessians. In the course of the SCF optimization, each step is restricted to be within the trust region of the current model, that is, within the region where the model accurately represents the true energy function. In TRDSM the step length is controlled through a standard trust-region optimization (Ref. 7) and in TRRH the step length is controlled through a level shift (Ref. 8). In this manner, a reliable and systematic energy lowering of $E_{\mathrm{SCF}}$ is ensured at each iteration.

In the first implementation of the TRSCF method, the focus was on the optimization of the Hartree–Fock energy. In this paper, the focus is on the optimization of the Kohn–Sham energy. In the Kohn–Sham theory, the energy difference between the highest occupied MO and lowest unoccupied MO (the HOMO-LUMO gap) is usually much smaller than that in the Hartree–Fock theory, making the optimization more difficult. Here, we investigate the consequences of this smaller HOMO-LUMO gap for the global and local convergence characteristics of the Roothaan–Hall optimization step. In the Hartree–Fock theory, the energy function is quadratic in the density matrix, whereas, in the Kohn–Sham theory, it becomes a nonquadratic function because of the exchange-correlation contribution to the energy. In our previous implementation of the TRSCF method, the model function used to determine the averaged density matrix was specially designed for the Hartree–Fock theory, assuming that the energy depends quadratically on the density matrix. For the Kohn–Sham theory, the model function must be generalized. Such a generalization is presented here.

In this paper, the DIIS algorithm is also rederived to understand better when it can safely be applied. In particular, we find that the DIIS method may be viewed as a quasi-Newton method in the local region, explaining its fast local convergence. The convergence characteristics of the DIIS method in the global region are less predictable.

Recently, and along the same lines as our TRRH method, Francisco et al. introduced their globally convergent trust-region methods for SCF (Ref. 9), where the standard fixed-point Roothaan–Hall step is replaced by a trust-region optimization of a model energy function. Any acceleration scheme, such as DIIS, EDIIS, and the TRDSM method, can then be combined with this method.

After an introduction to the SCF problem in Sec. II, we 

examine the Roothaan–Hall scheme in Sec. III. In particular, 

we identify the model energy function that is effectively being 

optimized in the diagonalization step, demonstrating how 

convergence can be improved upon by level shifting. In Sec. 

IV, we consider the density-matrix averaging step. We establish 

the model energy function of the weights of the density 

matrices and perform an order analysis of the resulting 

scheme, demonstrating that it represents a balanced approximation; 

next, we compare our local energy function with the 

EDIIS function, showing that the latter misses a term that is 

necessary for calculating the correct gradient. After a brief 

discussion of configuration shifts in Sec. V, we present in 

Sec. VI a rederivation of the DIIS algorithm, establishing its 

equivalence with the quasi-Newton method in the local region. 

Section VII contains some convergence examples for 

the DFT calculations, using the TRSCF algorithm and some 

of its alternatives. Finally, Sec. VIII contains some concluding 

remarks. 

II. THE KOHN–SHAM ENERGY AND THE ROOTHAAN–HALL METHOD

For a closed-shell system with $N/2$ electron pairs, the Kohn–Sham energy (excluding the nuclear-nuclear repulsion contribution) is given by (Ref. 10)

\[ E_{\mathrm{KS}}(D) = 2\,\mathrm{Tr}\,hD + \mathrm{Tr}\,D\,G(D) + E_{\mathrm{XC}}(D). \tag{1} \]

Here $D$ is the scaled one-electron density matrix in the AO basis, $D = \tfrac{1}{2} D_{\mathrm{AO}}$; $h$ is the one-electron Hamiltonian matrix in this basis; and the elements of $G(D)$ are given by

\[ G_{\mu\nu}(D) = 2\sum_{\rho\sigma} g_{\mu\nu\rho\sigma} D_{\rho\sigma} - \gamma \sum_{\rho\sigma} g_{\mu\sigma\rho\nu} D_{\rho\sigma}, \tag{2} \]

where $g_{\mu\nu\rho\sigma}$ are the two-electron AO integrals. The first term in Eq. (2) represents the Coulomb contribution and the second term the contribution from exact exchange, with $\gamma = 1$ in the Hartree–Fock theory, $\gamma = 0$ in the pure DFT, and $\gamma \neq 0$ in the hybrid DFT. The exchange-correlation energy $E_{\mathrm{XC}}(D)$ in Eq. (1) is a functional of the electron density. In the local-density approximation (LDA), the exchange-correlation energy is local in the density, whereas, in the generalized gradient approximation (GGA), it is also local in the squared density gradient, that is, it may be expressed as

\[ E_{\mathrm{XC}}(D) = \int f\bigl(\rho(x), \zeta(x)\bigr)\, dx. \tag{3} \]

Here the electron density $\rho(x)$ and its squared gradient norm $\zeta(x)$ are given by

\[ \rho(x) = \chi^{T}(x)\, D\, \chi(x), \tag{4a} \]

\[ \zeta(x) = \nabla\rho(x) \cdot \nabla\rho(x), \tag{4b} \]

where $\chi(x)$ is a column vector containing the AOs. We note that the exchange-correlation energy density $f(\rho(x), \zeta(x))$ in Eq. (3) is a nonlinear and nonquadratic function of $\rho(x)$ and $\zeta(x)$. In the following, we shall therefore rely on an expansion of $E_{\mathrm{XC}}(D)$ around some reference density matrix $D_0$,

\[ E_{\mathrm{XC}}(D) = E_{\mathrm{XC}}(D_0) + (D - D_0)^{T} E_{\mathrm{XC}}^{(1)} + \tfrac{1}{2} (D - D_0)^{T} E_{\mathrm{XC}}^{(2)} (D - D_0) + \cdots, \tag{5} \]

where the derivatives $E_{\mathrm{XC}}^{(n)}$ have been evaluated at $D = D_0$ and where, for convenience, we have used a vector-matrix notation for $D$, $E_{\mathrm{XC}}^{(1)}$, and $E_{\mathrm{XC}}^{(2)}$.
The first derivative of $E_{\mathrm{KS}}(D)$ with respect to the density matrix $D$ is then given by

\[ E_{\mathrm{KS}}^{(1)}(D) = \frac{\partial E_{\mathrm{KS}}(D)}{\partial D} = 2F(D), \tag{6} \]

where we have introduced the Fock/Kohn–Sham matrix,

\[ F(D) = h + G(D) + \tfrac{1}{2} E_{\mathrm{XC}}^{(1)}(D). \tag{7} \]

We note that, for the energy in Eq. (1) to be a valid Kohn–Sham energy, the density matrix $D$ must satisfy the symmetry, trace, and idempotency conditions,

\[ D^{T} = D, \tag{8a} \]

\[ \mathrm{Tr}\, DS = \frac{N}{2}, \tag{8b} \]

\[ DSD = D, \tag{8c} \]

where $S$ is the AO overlap matrix. Therefore, we cannot carry out a free minimization of the total energy in Eq. (1), but must restrict ourselves to those changes in the density matrix that comply with these requirements.

The Kohn–Sham energy $E_{\mathrm{KS}}$ is traditionally optimized self-consistently by fixed-point iterations. From the current approximation $D_0$ to the density matrix, the Kohn–Sham matrix $F(D_0)$ is calculated from Eq. (7), followed by the solution of the Roothaan–Hall generalized eigenvalue equations (Refs. 1 and 2):

\[ F(D_0)\, C_{\mathrm{occ}} = S\, C_{\mathrm{occ}}\, \varepsilon, \tag{9} \]

where $C_{\mathrm{occ}}$ is the set of occupied MOs and $\varepsilon$ is a diagonal matrix containing the associated eigenvalues (orbital energies). An improved density matrix is next calculated from the occupied MOs as

\[ D = C_{\mathrm{occ}} C_{\mathrm{occ}}^{T}, \tag{10} \]

and the Roothaan–Hall fixed-point iteration is established by constructing the Kohn–Sham matrix $F(D)$ from this density matrix, followed by diagonalization according to Eq. (9). Note that, since

\[ C_{\mathrm{occ}} U U^{T} C_{\mathrm{occ}}^{T} = C_{\mathrm{occ}} C_{\mathrm{occ}}^{T}, \tag{11} \]

where $U$ is unitary, the Kohn–Sham density matrix in Eq. (10) and hence the energy are invariant to unitary transformations among the occupied MOs.
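As a minimal sketch of one Roothaan–Hall step, the following NumPy fragment solves the generalized eigenvalue problem of Eq. (9) by symmetric orthogonalization, builds the density matrix of Eq. (10), and checks the conditions of Eq. (8) and the invariance of Eq. (11). The matrices are random toy data, and the helper names `inv_sqrt` and `roothaan_hall_step` are illustrative assumptions rather than part of any published implementation.

```python
import numpy as np

def inv_sqrt(S):
    """Return S^(-1/2) for a symmetric positive-definite overlap matrix."""
    s, U = np.linalg.eigh(S)
    return (U / np.sqrt(s)) @ U.T

def roothaan_hall_step(F, S, n_occ):
    """Solve F C = S C eps and build D = C_occ C_occ^T from the lowest n_occ MOs."""
    X = inv_sqrt(S)
    eps, C_orth = np.linalg.eigh(X @ F @ X)   # eigenvalues in ascending order
    C = X @ C_orth                            # back-transformed MOs: C^T S C = I
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T, C_occ, eps

# Toy data (assumed, not from the paper): random symmetric F, well-conditioned S.
rng = np.random.default_rng(0)
n, n_occ = 6, 2
A = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * (B @ B.T) / n

D, C_occ, eps = roothaan_hall_step(F, S, n_occ)

# A random orthogonal rotation among the occupied MOs leaves D unchanged, Eq. (11).
Q, _ = np.linalg.qr(rng.standard_normal((n_occ, n_occ)))
D_rot = (C_occ @ Q) @ (C_occ @ Q).T

print("Tr DS      =", np.trace(D @ S))
print("idempotent =", np.allclose(D @ S @ D, D))
print("invariant  =", np.allclose(D_rot, D))
```

In a real SCF code the Kohn–Sham matrix would be rebuilt from `D` after each such step; here `F` is held fixed because only the linear-algebra part of the iteration is being illustrated.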

The naive Roothaan–Hall fixed-point iteration outlined above converges only in simple cases. To improve upon this scheme, the new Kohn–Sham matrix is usually not calculated directly from the density matrix obtained by diagonalization of the previous Kohn–Sham matrix, but rather from the density matrix obtained by diagonalizing some linear combinations of the current and $n$ previous Kohn–Sham matrices,

\[ \bar{F} = F(D_0) + \sum_{i=0}^{n} c_i F(D_i). \tag{12} \]

Typically, the coefficients $c_i$ are obtained by the DIIS method as the weights of an improved density matrix,

\[ \bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i. \tag{13} \]

Upon diagonalization of $\bar{F}$ according to Eq. (9), the new density matrix is obtained from Eq. (10), thereby establishing the iterations. In general, the averaged density matrix in Eq. (13) is not idempotent and therefore does not represent a valid density matrix; moreover, since the Kohn–Sham matrix (unlike the Fock matrix) is nonlinear in the density matrix, the averaged Kohn–Sham matrix in Eq. (12) is different from $F(\bar{D})$. For these reasons, we cannot associate the averaged Kohn–Sham matrix in Eq. (12) uniquely with a valid Kohn–Sham matrix. Usually, this does not matter much since the subsequent diagonalization of the Kohn–Sham matrix nevertheless produces a valid density matrix according to Eq. (10). In the following, we shall disregard the complications arising from the use of the averaged Kohn–Sham matrix in Eq. (12), noting that the errors introduced by this approach may easily be corrected for, if necessary.

In the remainder of this paper, we discuss the TRSCF method, which differs from the traditional SCF scheme by the consistent use of trust-region techniques for optimization control, both in the Roothaan–Hall diagonalization step in Eq. (9) and in the construction of the averaged density matrix in Eq. (13). In particular, the traditional Roothaan–Hall eigenvalue problem is replaced by a level-shifted eigenvalue problem, where the level shift is determined from trust-region considerations, resulting in the TRRH step. Similarly, the averaged density matrix is determined by a TRDSM technique rather than by the traditional DIIS method. As we shall see, the combined use of the TRRH and TRDSM schemes in the TRSCF method leads to a highly efficient and robust SCF scheme, characterized, in its most robust implementation, by a monotonic convergence towards the optimized Kohn–Sham energy.

III. TRUST-REGION ROOTHAAN–HALL OPTIMIZATION 

A. The trust-region Roothaan–Hall method 

We begin by noting that the solution of the traditional Roothaan–Hall eigenvalue problem in Eq. (9) may be regarded as the minimization of the sum of the energies of the occupied MOs (Ref. 11),

\[ E_{\mathrm{RH}}(D) = 2\sum_{i} \varepsilon_i = 2\,\mathrm{Tr}\,F_0 D, \tag{14} \]

subject to MO orthonormality constraints,

\[ C_{\mathrm{occ}}^{T} S C_{\mathrm{occ}} = I_{N/2}, \tag{15} \]

where $F_0$ is typically obtained as a weighted sum of the Kohn–Sham matrices such as $\bar{F}$ in Eq. (12). Since Eq. (14) represents a crude model of the true Kohn–Sham energy (with the same first-order term but different zero- and second-order terms, as discussed in Sec. III B), it has a rather small trust radius. A global minimization of $E_{\mathrm{RH}}(D)$, as accomplished by the solution of the Roothaan–Hall eigenvalue problem in Eq. (9), may therefore easily lead to steps that are longer than the trust radius and hence unreliable. To avoid such steps, we shall impose on the optimization of Eq. (14) the constraint that the new density matrix $D$ does not differ much from the old matrix $D_0$, that is, the $S$ norm of the density difference should be equal to a small number $\Delta$,

\[ \| D - D_0 \|_S^2 = \mathrm{Tr}\,(D - D_0) S (D - D_0) S = -2\,\mathrm{Tr}\,D_0 S D S + N = \Delta. \tag{16} \]

The optimization of Eq. (14) subject to the constraints in Eqs. (15) and (16) may be carried out by introducing the Lagrangian

\[ L = 2\,\mathrm{Tr}\,F_0 D - 2\mu \left( \mathrm{Tr}\,D S D_0 S - \tfrac{1}{2}(N - \Delta) \right) - 2\,\mathrm{Tr}\,\eta \left( C_{\mathrm{occ}}^{T} S C_{\mathrm{occ}} - I_{N/2} \right), \tag{17} \]

where $\mu$ is the undetermined multiplier associated with the constraint in Eq. (16), whereas the symmetric matrix $\eta$ contains the multipliers associated with the MO orthonormality constraints. Differentiating this Lagrangian with respect to the MO coefficients and setting the result equal to zero, we arrive at the level-shifted Roothaan–Hall equations,

\[ (F_0 - \mu S D_0 S)\, \tilde{C}_{\mathrm{occ}}(\mu) = S\, \tilde{C}_{\mathrm{occ}}(\mu)\, \eta. \tag{18} \]

Since the density matrix in Eq. (10) is invariant to unitary transformations among the occupied MOs in $\tilde{C}_{\mathrm{occ}}(\mu)$, we may transform this eigenvalue problem to the canonical basis,

\[ (F_0 - \mu S D_0 S)\, C_{\mathrm{occ}}(\mu) = S\, C_{\mathrm{occ}}(\mu)\, \varepsilon(\mu), \tag{19} \]

where the diagonal matrix $\varepsilon(\mu)$ contains the orbital energies. Note that, since $D_0 S$ projects onto the part of $C_{\mathrm{occ}}$ that is occupied in $D_0$ (see Ref. 11), the level-shift parameter $\mu$ shifts only the energies of the occupied MOs. Therefore, the role of $\mu$ is to modify the difference between the energies of the occupied and virtual MOs, in particular, the HOMO-LUMO gap.
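The effect of the level shift on the step length can be sketched numerically. The fragment below, which assumes random toy matrices and hypothetical helper names, solves the level-shifted problem of Eq. (19) for several values of $\mu$ and evaluates the squared $S$ norm of Eq. (16), using $N = 2 n_{\mathrm{occ}}$ for the closed-shell case. It is an illustration of the constraint only, not the procedure the authors use to select $\mu$.

```python
import numpy as np

def inv_sqrt(S):
    """Return S^(-1/2) for a symmetric positive-definite overlap matrix."""
    s, U = np.linalg.eigh(S)
    return (U / np.sqrt(s)) @ U.T

def aufbau_density(F_eff, S, n_occ):
    """Density matrix from the n_occ lowest eigenvectors of F_eff C = S C eps."""
    X = inv_sqrt(S)
    _, C_orth = np.linalg.eigh(X @ F_eff @ X)
    C_occ = (X @ C_orth)[:, :n_occ]
    return C_occ @ C_occ.T

def level_shifted_density(F0, D0, S, n_occ, mu):
    """Solve the level-shifted problem (F0 - mu S D0 S) C = S C eps, Eq. (19)."""
    return aufbau_density(F0 - mu * (S @ D0 @ S), S, n_occ)

def s_norm_sq(D, D0, S):
    """Squared S norm of the density change, Tr (D - D0) S (D - D0) S, Eq. (16)."""
    diff = D - D0
    return np.trace(diff @ S @ diff @ S)

rng = np.random.default_rng(1)

def random_symmetric(n):
    A = rng.standard_normal((n, n))
    return 0.5 * (A + A.T)

# Toy data (assumed): overlap S, an idempotent reference D0, and a stand-in F0.
n, n_occ = 6, 2
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * (B @ B.T) / n
D0 = aufbau_density(random_symmetric(n), S, n_occ)
F0 = random_symmetric(n)

shifts = [0.0, 0.5, 2.0, 10.0, 1.0e6]
densities = [level_shifted_density(F0, D0, S, n_occ, mu) for mu in shifts]
distances = [s_norm_sq(D, D0, S) for D in densities]
for mu, d in zip(shifts, distances):
    print(f"mu = {mu:10.1f}   ||D - D0||_S^2 = {d:.3e}")
```

Larger shifts keep the new density closer to $D_0$, and for very large $\mu$ the reference density is recovered, which is why a suitable $\mu$ can always enforce a prescribed $\Delta$ smaller than the unshifted step.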

Clearly, the success of the TRRH method will depend on our ability to make a judicious choice of the level-shift parameter $\mu$ in Eq. (19). In our standard TRRH implementation, we determine $\mu$ by requiring that $D(\mu)$ does not differ much from $D_0$ in the sense of Eq. (16), thereby ensuring a continuous and controlled development of the density matrix from the initial guess to the converged one. In the following sections we discuss how $\mu$ is determined in this standard implementation.

In view of the relative crudeness of the $E_{\mathrm{RH}}(D)$ model, a more robust approach consists of performing a line search along the path defined by $\mu$ to obtain the minimum of the Kohn–Sham energy $E_{\mathrm{KS}}(D(\mu))$. Strictly speaking, this optimization is not a line search but rather a one-parameter optimization. One-parameter optimizations have previously been used by Seeger and Pople (Ref. 12) to stabilize convergence of the RH procedure.

For $\mu \to \infty$, Eq. (19) becomes equivalent to solving the eigenvalue equation,

\[ S D_0 S\, C_{\mathrm{occ}}^{0} = S\, C_{\mathrm{occ}}^{0}\, \omega, \tag{20} \]

where $\omega$ has eigenvalue 1 for the set of orbitals that are occupied in $D_0$ and eigenvalue 0 for the set of virtual orbitals. Equation (20) thus effectively divides the molecular orbitals into a set that is occupied and a set that is unoccupied, where the density $D_0$ is obtained from the occupied set,

\[ D_0 = C_{\mathrm{occ}}^{0} \left( C_{\mathrm{occ}}^{0} \right)^{T}. \tag{21} \]

Since $F_0$ is the gradient of $E_{\mathrm{KS}}$ at $D_0$, the step from Eq. (19) for large $\mu$ is in the steepest-descent direction and will therefore give a decrease in the Kohn–Sham energy compared to the energy at $D_0$. However, this TRRH line-search (TRRH-LS) algorithm is more expensive than the standard method, requiring the repeated construction of the Kohn–Sham matrix at each SCF iteration.

B. Comparison of the Roothaan–Hall and Kohn–Sham energy functions

To understand better our strategy for determining the level-shift parameter $\mu$ in the Kohn–Sham energy optimizations, we here examine the Roothaan–Hall model energy of Eq. (14) in more detail, comparing it with the true Kohn–Sham energy of Eq. (1). Expanding the Kohn–Sham and Roothaan–Hall energies about the reference density matrix $D_0$ and neglecting the differences between $F_0$ and $F(D_0)$ noted in Sec. II, we obtain

\[ E_{\mathrm{KS}}(D) = E_{\mathrm{KS}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0) + \mathrm{Tr}\,(D - D_0)\,G(D - D_0) + E_{\mathrm{XC}}(D) - E_{\mathrm{XC}}(D_0) - \mathrm{Tr}\,(D - D_0)\,E_{\mathrm{XC}}^{(1)}(D_0), \tag{22} \]

\[ E_{\mathrm{RH}}(D) = E_{\mathrm{RH}}(D_0) + 2\,\mathrm{Tr}\,F(D_0)(D - D_0). \tag{23} \]

These expansions have the same first-order term $2\,\mathrm{Tr}\,F(D_0)(D - D_0)$ but different zero- and second-order terms. In an orthonormal MO basis, we may express any valid density matrix $D$ in terms of the reference density matrix $D_0$ as

\[ D(K) = \exp(-K)\, D_0\, \exp(K), \tag{24} \]

where the antisymmetric rotation matrix may be written in the form

\[ K = \begin{pmatrix} 0 & -\kappa^{T} \\ \kappa & 0 \end{pmatrix}. \tag{25} \]

The diagonal block matrices representing rotations among the occupied MOs and among the virtual MOs are zero since the density matrix in Eq. (10) is invariant to such rotations, see Eq. (11). In terms of $K$, the first-order Roothaan–Hall and Kohn–Sham energies may be written as

\[ 2\,\mathrm{Tr}\,F(D_0)(D - D_0) = 2\,\mathrm{Tr}\,F(D_0)\bigl[ \exp(-K)\, D_0\, \exp(K) - D_0 \bigr] \tag{26} \]

and thus share a series of higher-order terms in $K$. If these shared higher-order terms are larger than the higher-order terms that occur only in the Kohn–Sham energy in Eq. (22), then the energy changes predicted by the Roothaan–Hall function in Eq. (23) will be a good approximation to the changes in the Kohn–Sham energy, even for large rotations $K$.

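Let us now compare the derivatives of the Roothaan–Hall and Kohn–Sham energies with respect to the orbital-rotation parameters $\kappa_{ai}$ (in this paper, $i$, $j$, $k$, and $l$ denote the occupied indices and $a$, $b$, $c$, and $d$ denote the virtual indices). As already established, the two energy functions have the same gradients,

\[ \bigl[ E_{\mathrm{KS}}^{(1)} \bigr]_{ai} = \left. \frac{\partial E_{\mathrm{KS}}}{\partial \kappa_{ai}} \right|_{\kappa = 0} = -4 F_{ai}, \tag{27a} \]

\[ \bigl[ E_{\mathrm{RH}}^{(1)} \bigr]_{ai} = \left. \frac{\partial E_{\mathrm{RH}}}{\partial \kappa_{ai}} \right|_{\kappa = 0} = -4 F_{ai}. \tag{27b} \]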
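The gradient of Eq. (27b) can be checked numerically for the Roothaan–Hall model energy. The sketch below works in an orthonormal MO basis with the occupied orbitals listed first, builds $D(K)$ from Eqs. (24) and (25), and compares central finite differences of $2\,\mathrm{Tr}\,F D(K)$ with $-4F_{ai}$. The matrix `F` is random toy data, and the truncated Taylor exponential `expm_taylor` is an assumed helper that is adequate only for small rotations.

```python
import numpy as np

def expm_taylor(K, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small ||K||)."""
    result = np.eye(K.shape[0])
    term = np.eye(K.shape[0])
    for k in range(1, terms):
        term = term @ K / k
        result = result + term
    return result

def rotation_generator(kappa):
    """K = [[0, -kappa^T], [kappa, 0]] with kappa of shape (n_virt, n_occ), Eq. (25)."""
    n_virt, n_occ = kappa.shape
    K = np.zeros((n_occ + n_virt, n_occ + n_virt))
    K[n_occ:, :n_occ] = kappa
    K[:n_occ, n_occ:] = -kappa.T
    return K

def rotated_density(D0, kappa):
    """D(K) = exp(-K) D0 exp(K), Eq. (24)."""
    K = rotation_generator(kappa)
    return expm_taylor(-K) @ D0 @ expm_taylor(K)

def e_rh(F, D):
    """Roothaan-Hall model energy 2 Tr F D, Eq. (14)."""
    return 2.0 * np.trace(F @ D)

rng = np.random.default_rng(2)
n_occ, n_virt = 2, 3
n = n_occ + n_virt
A = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)                            # toy Kohn-Sham matrix in the MO basis
D0 = np.diag([1.0] * n_occ + [0.0] * n_virt)   # reference density in the MO basis

# Central finite differences with respect to each kappa_ai at kappa = 0.
step = 1.0e-5
grad_fd = np.zeros((n_virt, n_occ))
for a in range(n_virt):
    for i in range(n_occ):
        kappa = np.zeros((n_virt, n_occ))
        kappa[a, i] = step
        e_plus = e_rh(F, rotated_density(D0, kappa))
        e_minus = e_rh(F, rotated_density(D0, -kappa))
        grad_fd[a, i] = (e_plus - e_minus) / (2.0 * step)

grad_analytic = -4.0 * F[n_occ:, :n_occ]       # Eq. (27b)

# A finite rotation keeps the density idempotent with unchanged trace.
kappa_test = 0.1 * rng.standard_normal((n_virt, n_occ))
D_rot = rotated_density(D0, kappa_test)
print("max gradient deviation:", np.abs(grad_fd - grad_analytic).max())
```

Because the exponential parametrization is orthogonal, every $D(K)$ generated this way automatically satisfies the conditions of Eq. (8), so no constraints need to be imposed on $\kappa$ itself.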

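The Hessians are most conveniently expressed in a basis where the occupied-occupied and virtual-virtual blocks of the Kohn–Sham matrix are diagonal,

\[ F_{ab} = \delta_{ab}\, \varepsilon_a, \tag{28a} \]

\[ F_{ij} = \delta_{ij}\, \varepsilon_i. \tag{28b} \]

Since, at convergence where $F$ is fully diagonal, the diagonal elements $\varepsilon_a$ and $\varepsilon_i$ become the orbital energies, we shall refer to these as the pseudo-orbital energies or sometimes just the orbital energies. In this basis, the Hessians of the two energy functions become

\[ \bigl[ E_{\mathrm{KS}}^{(2)} \bigr]_{aibj} = \left. \frac{\partial^2 E_{\mathrm{KS}}}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa = 0} = 4 \delta_{ij} \delta_{ab} (\varepsilon_a - \varepsilon_i) + M_{aibj}, \tag{29a} \]

\[ \bigl[ E_{\mathrm{RH}}^{(2)} \bigr]_{aibj} = \left. \frac{\partial^2 E_{\mathrm{RH}}}{\partial \kappa_{ai}\, \partial \kappa_{bj}} \right|_{\kappa = 0} = 4 \delta_{ij} \delta_{ab} (\varepsilon_a - \varepsilon_i), \tag{29b} \]

where

\[ M_{aibj} = 16 g_{aibj} - 4\gamma \left( g_{abij} + g_{ajib} \right) + \bigl[ E_{\mathrm{XC}}^{(2)}(D) \bigr]_{aibj}. \tag{30} \]

Clearly, the Roothaan–Hall Hessian in Eq. (29b) is positive definite whenever the energies of the occupied orbitals are lower than the energies of the virtual orbitals, that is, whenever the HOMO-LUMO gap is positive. Furthermore, if the differences $\varepsilon_a - \varepsilon_i$ in the Hessians are large compared to $M_{aibj}$ in Eq. (30), then $E_{\mathrm{RH}}^{(2)}$ is a good approximation to $E_{\mathrm{KS}}^{(2)}$.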

C. Quadratically convergent trust-region optimization 

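To minimize the Roothaan–Hall energy in Eq. (14), consider the second-order expansion in the orbital-rotation parameters $\kappa$,

\[ E_{\mathrm{RH}}^{(2)}(\kappa) = E_{\mathrm{RH}} + \kappa^{T} E_{\mathrm{RH}}^{(1)} + \tfrac{1}{2} \kappa^{T} E_{\mathrm{RH}}^{(2)} \kappa, \tag{31} \]

where $E_{\mathrm{RH}}^{(2)}(\kappa)$ with an explicit argument denotes the truncated energy, while $E_{\mathrm{RH}}^{(1)}$ and $E_{\mathrm{RH}}^{(2)}$ without argument denote the gradient and Hessian. The unconstrained Newton step is obtained by setting the gradient equal to zero,

\[ \frac{\partial E_{\mathrm{RH}}^{(2)}(\kappa)}{\partial \kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)} \kappa = 0. \tag{32} \]

Solution of these equations yields the Newton step, with its fast second-order convergence in the local region. In the global region, far away from the true minimum, it is not reasonable to accept large steps since the expansion in Eq. (31) is only a valid approximation to $E_{\mathrm{RH}}(D)$ for $\|\kappa\| \le h$, where $h$ is the trust radius. Furthermore, if $E_{\mathrm{RH}}^{(2)}$ is indefinite, the Newton step in Eq. (32) may not reduce the energy. Therefore, if the Hessian is not positive definite or if the Newton step is too large, we solve instead a modified set of equations, where we minimize Eq. (31) subject to the constraint $\|\kappa\| = h$. To accomplish this, we introduce an undetermined multiplier $\mu$ and set up the Lagrangian

\[ L(\kappa, \mu) = E_{\mathrm{RH}}^{(2)}(\kappa) + \tfrac{1}{2} \mu \left( \kappa^{T} \kappa - h^2 \right), \tag{33} \]

whose stationary points are determined from the equation

\[ \frac{\partial L(\kappa, \mu)}{\partial \kappa} = E_{\mathrm{RH}}^{(1)} + E_{\mathrm{RH}}^{(2)} \kappa + \mu \kappa = 0, \tag{34} \]

leading to the level-shifted Newton step,

\[ \kappa(\mu) = -\left( E_{\mathrm{RH}}^{(2)} + \mu I \right)^{-1} E_{\mathrm{RH}}^{(1)}. \tag{35} \]

The multiplier $\mu$ is chosen such that $\|\kappa\| = h$ and such that the energy change predicted by $E_{\mathrm{RH}}^{(2)}(\kappa)$ is negative. Consider the first- and second-order changes of the Roothaan–Hall energy,

\[ E_{\mathrm{RH}}^{(1)}(\kappa) - E_{\mathrm{RH}} = \kappa^{T} E_{\mathrm{RH}}^{(1)} = -\kappa^{T} \left( E_{\mathrm{RH}}^{(2)} + \mu I \right) \kappa, \tag{36a} \]

\[ E_{\mathrm{RH}}^{(2)}(\kappa) - E_{\mathrm{RH}} = \kappa^{T} E_{\mathrm{RH}}^{(1)} + \tfrac{1}{2} \kappa^{T} E_{\mathrm{RH}}^{(2)} \kappa = -\tfrac{1}{2} \kappa^{T} \left( E_{\mathrm{RH}}^{(2)} + \mu I \right) \kappa - \tfrac{1}{2} \mu\, \kappa^{T} \kappa. \tag{36b} \]

If $E_{\mathrm{RH}}^{(2)}$ is positive definite, both corrections are negative for $\mu \ge 0$; if $E_{\mathrm{RH}}^{(2)}$ is indefinite, they are negative for $\mu > -\varepsilon_1$, where $\varepsilon_1$ is the lowest negative eigenvalue (i.e., the HOMO-LUMO gap). In general, therefore, we choose $\mu$ such that $\mu > \max(0, -\varepsilon_1)$. As discussed in Ref. 6, it is always possible to find a level-shift parameter that satisfies this requirement.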
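A minimal sketch of the level-shifted Newton step of Eq. (35) is given below. It assumes a small explicit gradient `g` and Hessian `H` (toy numbers, not Kohn–Sham quantities) and locates $\mu$ by bisection so that $\|\kappa(\mu)\| = h$ with $\mu$ above $\max(0, -\lambda_{\min})$, where $\lambda_{\min}$ is the lowest Hessian eigenvalue. The function name `level_shifted_newton` is hypothetical, and the so-called hard case, in which the gradient has no component along the lowest eigenvector, is deliberately ignored.

```python
import numpy as np

def level_shifted_newton(g, H, h):
    """Return (kappa, mu) with kappa = -(H + mu I)^(-1) g and ||kappa|| <= h."""
    lam, V = np.linalg.eigh(H)
    g_t = V.T @ g                              # gradient in the Hessian eigenbasis

    def step_norm(mu):
        return np.linalg.norm(g_t / (lam + mu))

    mu_lo = max(0.0, -lam[0])
    if lam[0] > 0.0 and step_norm(0.0) <= h:
        mu = 0.0                               # plain Newton step is acceptable
    else:
        lo, hi = mu_lo, mu_lo + 1.0
        while step_norm(hi) > h:               # expand until the step is short enough
            hi = mu_lo + 2.0 * (hi - mu_lo)
        for _ in range(200):                   # bisection on ||kappa(mu)|| = h
            mid = 0.5 * (lo + hi)
            if step_norm(mid) > h:
                lo = mid
            else:
                hi = mid
        mu = hi
    kappa = -V @ (g_t / (lam + mu))
    return kappa, mu

# Indefinite toy Hessian: the shift must exceed 0.5 to make H + mu I positive definite.
g = np.array([0.3, -0.2, 0.1])
H = np.diag([-0.5, 1.0, 2.0])
h = 0.2
kappa, mu = level_shifted_newton(g, H, h)
predicted = g @ kappa + 0.5 * kappa @ H @ kappa   # second-order change, Eq. (36b)

# Positive-definite toy Hessian with a short Newton step: no shift is needed.
g2 = np.array([0.01, 0.01])
H2 = np.diag([1.0, 2.0])
kappa2, mu2 = level_shifted_newton(g2, H2, h)

print("mu =", mu, " ||kappa|| =", np.linalg.norm(kappa), " predicted dE =", predicted)
```

Bisection is chosen here only for transparency; production trust-region codes usually solve the same one-dimensional equation with a Newton-type iteration on $1/\|\kappa(\mu)\|$.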

D. The quadratically convergent SCF method 

It is possible to optimize the Hartree–Fock and Kohn–Sham energies in Eq. (1) directly, without invoking the Roothaan–Hall energy function in Eq. (14). In the second-order trust-region Newton method, the optimization then consists of a sequence of level-shifted Newton iterations. At each iteration, the linear equation in Eq. (35) is solved, replacing $E_{\mathrm{RH}}^{(1)}$ and $E_{\mathrm{RH}}^{(2)}$ by $E_{\mathrm{KS}}^{(1)}$ and $E_{\mathrm{KS}}^{(2)}$, respectively. The resulting optimization scheme is known as the quadratically convergent SCF (QC-SCF) method (Refs. 13 and 14). The method is quadratically convergent in the local region and has a dynamic update of the trust region as discussed by Fletcher (Ref. 7).

E. The level-shift parameter in the TRRH method 

1. The global region 

A TRRH diagonalization step determined with $\mu = 0$ in Eq. (19) corresponds to the global minimum of $E_{\mathrm{RH}}(D)$. Therefore, when we impose the constraint in Eq. (16) on the difference between the old and new density matrices, then the step-size control is applied to a global optimization of $E_{\mathrm{RH}}(D)$. By contrast, in the quadratically convergent trust-region optimization of $E_{\mathrm{RH}}$ in Eq. (35), step-size control is applied to a local model of $E_{\mathrm{RH}}$, that is, to the optimization of the second-order Taylor expansion of the energy $E_{\mathrm{RH}}^{(2)}(\kappa)$ in Eq. (31) inside the trust region.

In the quadratically convergent trust-region method, we direct the step towards the minimum by choosing the level-shift parameter $\mu$ in Eq. (35) such that the lowest diagonal element of the Hessian, $\varepsilon_a^{\mathrm{LUMO}} - \varepsilon_i^{\mathrm{HOMO}} + \mu$, becomes positive. Alternatively, in the Kohn–Sham diagonalization step in Eq. (19), we may ensure positive definiteness by monitoring the dependence of the pseudo-orbital energies on the level-shift parameter $\mu$ in Eq. (19), adjusting it such that the HOMO-LUMO gap,

\[ \Delta\varepsilon_{ai}(\mu) = \varepsilon_a^{\mathrm{LUMO}}(\mu) - \varepsilon_i^{\mathrm{HOMO}}(\mu), \tag{37} \]

becomes positive. The configuration that defines the HOMO-LUMO gap is identified from the eigenvalues of Eq. (20) that are equal to one. Insisting on a smooth development of the MOs from those that are occupied in $D_0$ to those that are obtained by diagonalizing Eq. (19), we restrict $\mu$ to the interval $\mu_{\min} \le \mu < \infty$, where $\mu_{\min}$ is the smallest value for which the HOMO-LUMO gap is positive. In addition, the step must be constrained such that Eq. (16) is fulfilled. In passing, we note that the reference density matrix $D_0$ may not always be idempotent, for example, it may be $\bar{D}$ of Eq. (13), in which case its eigenvalues are not exactly 1. In such cases, the matrix

\[ \bar{D}_0^{\mathrm{idem}} = C_{\mathrm{occ}}^{0} \left( C_{\mathrm{occ}}^{0} \right)^{T}, \tag{38} \]

constructed from the eigenvectors of Eq. (20) with $D_0$ replaced by $\bar{D}$, represents a purification of $\bar{D}$.
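The purification of Eq. (38) can be sketched as follows. The fragment averages two idempotent toy densities, confirms that the average violates Eq. (8c), and then rebuilds an idempotent matrix from the $n_{\mathrm{occ}}$ eigenvectors of Eq. (20) with the largest eigenvalues. All matrices are random stand-ins, the weights 0.7 and 0.3 are arbitrary, and the helper names are assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Return S^(-1/2) for a symmetric positive-definite overlap matrix."""
    s, U = np.linalg.eigh(S)
    return (U / np.sqrt(s)) @ U.T

def aufbau_density(F, S, n_occ):
    """Idempotent density from the n_occ lowest eigenvectors of F C = S C eps."""
    X = inv_sqrt(S)
    _, C_orth = np.linalg.eigh(X @ F @ X)
    C_occ = (X @ C_orth)[:, :n_occ]
    return C_occ @ C_occ.T

def purify(D_bar, S, n_occ):
    """Solve S D_bar S C = S C omega and keep the n_occ largest occupations, Eq. (38)."""
    X = inv_sqrt(S)
    omega, C_orth = np.linalg.eigh(X @ (S @ D_bar @ S) @ X)
    C_occ = (X @ C_orth)[:, -n_occ:]           # largest eigenvalues come last
    return C_occ @ C_occ.T, omega

rng = np.random.default_rng(3)

def random_symmetric(n):
    A = rng.standard_normal((n, n))
    return 0.5 * (A + A.T)

n, n_occ = 6, 2
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * (B @ B.T) / n
D1 = aufbau_density(random_symmetric(n), S, n_occ)
D2 = aufbau_density(random_symmetric(n), S, n_occ)

D_bar = 0.7 * D1 + 0.3 * D2                    # averaged density, not idempotent
D_idem, omega_bar = purify(D_bar, S, n_occ)
_, omega_1 = purify(D1, S, n_occ)              # occupations of an idempotent density

print("occupations of D_bar:", np.round(omega_bar, 4))
print("D_bar idempotent? ", np.allclose(D_bar @ S @ D_bar, D_bar))
print("D_idem idempotent?", np.allclose(D_idem @ S @ D_idem, D_idem))
```

For an idempotent input the eigenvalues are exactly 0 and 1, whereas the averaged density gives fractional values; selecting the largest ones restores a valid density with the correct trace.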

The constraint on the change in the AO density in Eq. (16) refers to a change which may arise not only from small changes in many MOs but also from large changes in a few MOs or even in a single MO. In the TRRH algorithm, we shall require that the changes in the individual MOs are all small. Expanding the MO $\varphi_i^{\mathrm{new}}(\mu)$, obtained by diagonalization of Eq. (19), in the old MOs, we obtain

\[ \varphi_i^{\mathrm{new}}(\mu) = \sum_{j} \langle \varphi_j^{\mathrm{old}} | \varphi_i^{\mathrm{new}}(\mu) \rangle\, \varphi_j^{\mathrm{old}} + \sum_{a} \langle \varphi_a^{\mathrm{old}} | \varphi_i^{\mathrm{new}}(\mu) \rangle\, \varphi_a^{\mathrm{old}}, \tag{39} \]

where the first summation is over the occupied MOs and the second over the virtual MOs. The squared norm of the projection of $\varphi_i^{\mathrm{new}}$ onto the MO space associated with $D_0$ is therefore

\[ a_i^{\mathrm{orb}}(\mu) = \sum_{j} \langle \varphi_j^{\mathrm{old}} | \varphi_i^{\mathrm{new}}(\mu) \rangle^2. \tag{40} \]

To ensure small individual MO changes at each iteration (to within a unitary transformation of the occupied MOs), we shall therefore require

\[ a_{\min}^{\mathrm{orb}}(\mu) = \min_{i}\, a_i^{\mathrm{orb}}(\mu) \ge A_{\min}^{\mathrm{orb}}, \tag{41} \]

where $A_{\min}^{\mathrm{orb}}$ is close to 1. This constraint also ensures that the HOMO-LUMO gap in Eq. (37) stays positive.
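The overlap criterion of Eqs. (40) and (41) can be illustrated with a small scan over the level shift. In the sketch below, `orbital_overlaps` evaluates $a_i^{\mathrm{orb}}$ from AO coefficients and the overlap matrix, and the first grid value of $\mu$ that satisfies $a_{\min}^{\mathrm{orb}} \ge A_{\min}^{\mathrm{orb}}$ is selected. The grid search, the toy matrices, and the function names are assumptions for illustration; the paper does not state that $\mu$ is found in this way.

```python
import numpy as np

def inv_sqrt(S):
    """Return S^(-1/2) for a symmetric positive-definite overlap matrix."""
    s, U = np.linalg.eigh(S)
    return (U / np.sqrt(s)) @ U.T

def occupied_orbitals(F_eff, S, n_occ):
    """The n_occ lowest S-orthonormal eigenvectors of F_eff C = S C eps."""
    X = inv_sqrt(S)
    _, C_orth = np.linalg.eigh(X @ F_eff @ X)
    return (X @ C_orth)[:, :n_occ]

def orbital_overlaps(C_old_occ, C_new_occ, S):
    """a_i^orb = sum_j <phi_j^old | phi_i^new>^2 for each new occupied MO i, Eq. (40)."""
    overlap = C_old_occ.T @ S @ C_new_occ      # element (j, i)
    return np.sum(overlap ** 2, axis=0)

rng = np.random.default_rng(4)

def random_symmetric(n):
    A = rng.standard_normal((n, n))
    return 0.5 * (A + A.T)

n, n_occ = 6, 2
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * (B @ B.T) / n
C_old = occupied_orbitals(random_symmetric(n), S, n_occ)
D0 = C_old @ C_old.T
F0 = random_symmetric(n)

A_min = 0.98                                   # threshold quoted for the zinc example
grid = list(np.linspace(0.0, 20.0, 81)) + [1.0e6]
mu_chosen, a_chosen = None, None
for mu in grid:
    C_new = occupied_orbitals(F0 - mu * (S @ D0 @ S), S, n_occ)
    a_orb = orbital_overlaps(C_old, C_new, S)
    if a_orb.min() >= A_min:                   # criterion of Eq. (41)
        mu_chosen, a_chosen = mu, a_orb
        break

C_unshifted = occupied_orbitals(F0, S, n_occ)
a_unshifted = orbital_overlaps(C_old, C_unshifted, S)
print("a_orb without shift:", a_unshifted)
print("chosen mu:", mu_chosen, " a_orb:", a_chosen)
```

The sum of the $a_i^{\mathrm{orb}}$ equals $\mathrm{Tr}\,D_0 S D S$, so the individual overlaps resolve the global $S$-norm constraint of Eq. (16) orbital by orbital.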

The Hessians of $E_{\mathrm{RH}}$ and $E_{\mathrm{KS}}$ in Eq. (29) both contain the orbital-energy difference term, while the Hessian of $E_{\mathrm{KS}}$ also contains the terms $M_{aibj}$ of Eq. (30). When $\mu$ is large compared to the $M_{aibj}$ terms, the step generated by the level-shifted diagonalization in Eq. (19) is then of the same quality as that generated by a quadratically convergent trust-region optimization of $E_{\mathrm{KS}}$. However, since the step-size control in Eq. 22 is imposed on the global optimization, the quality of the step may be further improved relative to that obtained in a QC-SCF optimization of the Kohn–Sham energy. When the level shift is determined in the global region such that $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$, we often see not just this one orbital but many for which $a_i^{\mathrm{orb}} \approx A_{\min}^{\mathrm{orb}}$. In this way a large number of orbitals change significantly.

2. The local region 

To investigate the local convergence of the TRRH algorithm in Eq. (19), we first note that, in the local region near convergence, the gradient in Eq. (6) and thus the blocks $F_{ov}$ and $F_{vo}$ between the occupied and virtual orbitals in the Kohn–Sham matrix in the representation of Eq. (28),

\[ F = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix}, \tag{42} \]

are small, see Eq. (27). Writing the unitary transformation of $F$ generated by $K$ in Eq. (25) as

\[ \exp(K)\, F \exp(-K) = \begin{pmatrix} \varepsilon_o & F_{ov} \\ F_{vo} & \varepsilon_v \end{pmatrix} + \begin{pmatrix} -\kappa^{T} F_{vo} & -\kappa^{T} \varepsilon_v \\ \kappa\, \varepsilon_o & \kappa F_{ov} \end{pmatrix} + \begin{pmatrix} -F_{ov} \kappa & \varepsilon_o \kappa^{T} \\ -\varepsilon_v \kappa & F_{vo} \kappa^{T} \end{pmatrix} + O(\kappa^2), \tag{43} \]

we find that, to first order, the block diagonalization of the Kohn–Sham matrix may be accomplished by solving the following set of linear equations:

\[ F_{vo} + \kappa\, \varepsilon_o - \varepsilon_v\, \kappa = 0. \tag{44} \]

Since these equations are identical to the Newton equation in Eq. (32), we conclude that, in the local region where the higher-order terms in $\kappa$ may be neglected, the block diagonalization of the Kohn–Sham matrix is equivalent to the solution of the equation

\[ \kappa = -\left( E_{\mathrm{RH}}^{(2)} \right)^{-1} E_{\mathrm{RH}}^{(1)}. \tag{45} \]

Let these equations determine the step of iteration $n$ and expand the Kohn–Sham gradient at iteration $n+1$ about iteration point $n$,

\[ E_{\mathrm{KS},n+1}^{(1)} = E_{\mathrm{KS},n}^{(1)} + E_{\mathrm{KS},n}^{(2)} \kappa_n + O(\kappa^2) = E_{\mathrm{KS},n}^{(1)} - E_{\mathrm{KS},n}^{(2)} \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{RH},n}^{(1)} + O(\kappa^2). \tag{46} \]

Using Eqs. (27) and (29), we then obtain

\[ E_{\mathrm{KS},n+1}^{(1)} = E_{\mathrm{KS},n}^{(1)} - \left( E_{\mathrm{RH},n}^{(2)} + M_n \right) \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{KS},n}^{(1)} = -M_n \left( E_{\mathrm{RH},n}^{(2)} \right)^{-1} E_{\mathrm{KS},n}^{(1)}, \tag{47} \]

having neglected terms proportional to $O(\kappa^2)$. Therefore, if $M_n \bigl( E_{\mathrm{RH},n}^{(2)} \bigr)^{-1}$ has eigenvalues larger than 1, a simple TRRH sequence will diverge. This is particularly a problem in the Kohn–Sham theory, where the HOMO-LUMO gap (the lowest eigenvalue of $E_{\mathrm{RH}}^{(2)}$) often is small compared to the contribution from $M$. To improve upon the local convergence, we may increase the HOMO-LUMO gap by level shifting, thereby reducing the magnitude of the eigenvalues of $M_n \bigl( E_{\mathrm{RH},n}^{(2)} \bigr)^{-1}$. We note that, when the simple TRRH sequence diverges, the TRSCF algorithm may still converge as TRRH mainly serves to provide a new density and TRDSM then optimizes the combination of the various densities.

FIG. 1. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}(\mu)$, (b) the minimum overlap $a_{\min}^{\mathrm{orb}}$ of the new occupied orbitals with the previous set of occupied orbitals, and (c) the changes in the model energy $\Delta E_{\mathrm{RH}}$ (full line) and the Kohn–Sham energy $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step of the seventh iteration of the zinc complex calculation seen in Fig. 5.
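The local error propagation of Eq. (47) can be reproduced on a two-parameter quadratic model. In the sketch below the exact Hessian is taken as $E_{\mathrm{RH}}^{(2)} + M$ with a small gap in one direction, and the step uses only the model Hessian, optionally shifted by $\mu$. For a shifted step the same algebra gives the propagation matrix $(\mu I - M)(E_{\mathrm{RH}}^{(2)} + \mu I)^{-1}$; this shifted expression, the simplification of adding $\mu$ uniformly to the model Hessian, and the numbers used are assumptions of this illustration and are not taken from the paper.

```python
import numpy as np

def propagate_gradient(H_rh, M, mu, g0, n_iter):
    """Iterate g <- g + (H_rh + M) kappa with kappa = -(H_rh + mu I)^(-1) g."""
    g = g0.copy()
    shifted = H_rh + mu * np.eye(len(g0))
    for _ in range(n_iter):
        kappa = -np.linalg.solve(shifted, g)   # model (level-shifted) Newton step
        g = g + (H_rh + M) @ kappa             # exact gradient of the quadratic model
    return g

def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

# Toy numbers: a small "gap" of 0.1 in one direction and a sizable M contribution.
H_rh = np.diag([0.1, 1.0])
M = np.array([[0.5, 0.1],
              [0.1, 0.3]])
g0 = np.array([1.0e-3, 1.0e-3])

mu = 0.4
I2 = np.eye(2)
T_plain = -M @ np.linalg.inv(H_rh)                        # Eq. (47)
T_shift = (mu * I2 - M) @ np.linalg.inv(H_rh + mu * I2)   # shifted analogue (assumed)

g_plain = propagate_gradient(H_rh, M, 0.0, g0, 20)
g_shift = propagate_gradient(H_rh, M, mu, g0, 20)

print("spectral radius, no shift:", spectral_radius(T_plain))
print("spectral radius, mu = 0.4:", spectral_radius(T_shift))
print("|g| after 20 steps:", np.linalg.norm(g_plain), np.linalg.norm(g_shift))
```

With the unshifted model Hessian the gradient norm grows from step to step, while a moderate shift brings the spectral radius below 1; a very large shift would push it towards 1 again, corresponding to slow steepest-descent-like progress.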

F. Examples of the trust-region Roothaan–Hall algorithm

To illustrate how the TRRH algorithm is employed in the different parts of a Kohn–Sham energy optimization, we here consider how the level-shift parameter is determined in two iterations of the zinc complex calculation depicted in Sec. VII, Fig. 5. We first consider iteration 7, which is in the global region of the optimization, and then proceed to iteration 22, as an example of a step in the local region.

In Figs. 1(a) and 1(b), we have plotted the HOMO-LUMO gap $\Delta\varepsilon_{ai}(\mu)$ of Eq. (37) and the overlap parameter $a_{\min}^{\mathrm{orb}}$ of Eq. (41), respectively, as functions of the level-shift parameter $\mu$. The corresponding changes in the Kohn–Sham energy $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line) and in the Roothaan–Hall model energy $\Delta E_{\mathrm{RH}}$ (full line) of Eqs. (22) and (23) are plotted in Fig. 1(c). We note that the change in the Kohn–Sham energy has been calculated as

\[ \Delta E_{\mathrm{KS}}^{\mathrm{RH}} = E_{\mathrm{KS}}\bigl(D(\mu)\bigr) - E_{\mathrm{KS}}\bigl(\bar{D}_0^{\mathrm{idem}}\bigr), \tag{48} \]

where $D(\mu)$ and $\bar{D}_0^{\mathrm{idem}}$ are the density matrices calculated from the solutions to the eigenvalue problems in Eqs. (19) and (20), respectively.

FIG. 2. (a) The HOMO-LUMO gap $\Delta\varepsilon_{ai}(\mu)$, (b) the minimum overlap $a_{\min}^{\mathrm{orb}}$ of the new occupied orbitals with the previous set of occupied orbitals, and (c) the changes in the model energy $\Delta E_{\mathrm{RH}}$ (full line) and the Kohn–Sham energy $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line), all as functions of the level-shift parameter $\mu$ in the TRRH step of the 22nd iteration of the zinc complex calculation seen in Fig. 5.

In Fig. 1(a), we see that, in iteration 7, $\Delta\varepsilon_{ai}(\mu)$ is linear for $\mu > 2.2$, as the density matrix changes smoothly with decreasing $\mu$ from that of Eq. (20) to that obtained by applying the Aufbau principle to the solution of Eq. (19). For $\mu < 2.2$, the occupied and virtual orbitals defined by the previous density interchange. The value of $\mu = 5.078$ used in this iteration was chosen from the requirement $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}} = 0.98$ in Eq. (41), restricting the new orbital component to 0.02. Figure 1(c) shows that an even lower energy would have been obtained by reducing the level shift to about 2.4, but it would be very difficult to identify this optimal value of $\mu$ without constructing additional Kohn–Sham matrices, since the Roothaan–Hall model energy is not accurate for small $\mu$. In short, the identification of $\mu$ from the overlap requirement $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$ appears to be a good and secure way to control the step sizes in the optimization.

Figures 2(a)–2(c) are equivalent to Figs. 1(a)–1(c), but for iteration 22 in the local part of the optimization. Notably, the linear regime of $\Delta\varepsilon_{ai}(\mu)$ in Fig. 2(a) now extends to include $\mu = 0$, which corresponds to an unconstrained Roothaan–Hall step. Also, since $a_{\min}^{\mathrm{orb}} = 1.0000$ for $\mu = 0$, we can no longer determine the level shift from the overlap criterion $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}}$. From Fig. 2(c), we see that $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line) takes on its minimum value at $\mu = 1.3$; for smaller $\mu$, the energy increases, giving a total increase of $6.0 \times 10^{-5}\,E_h$ for $\mu = 0$.

TABLE I. Convergence details for the TRRH steps in the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u. Columns: iteration, level shift $\mu$, minimum overlap $a_{\min}^{\mathrm{orb}}$, $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$, and $\Delta E_{\mathrm{RH}}$.

The TRRH energy increase in the local part of the SCF 

optimization is particularly prominent for the DFT calculations. 

In the Hartree–Fock calculations, the TRRH model 

energy describes the SCF energy equally well in the local 

and global regions of the optimization. To avoid the increase 

in energy, we could add a constant minimum level shift, but 

this may in some cases slow down the convergence. Typically, 

the increase in the Kohn–Sham energy in the TRRH 

steps in the local region of the optimization is compensated 

by a larger energy decrease in the TRDSM step, ensuring 

an overall decrease in the Kohn–Sham energy in the iteration. 

In Table I, we have listed the values of several parameters 

characterizing the TRRH steps in the TRSCF iterations 

of the zinc complex calculation. In the first 17 iterations, the 

constraint a orb min =A orb min is active and determines the level-shift 

parameter. Note that, in the global region, E RH is a reasonable 

good approximation to E RH KS . After iteration 17, the 

local region of the Kohn–Sham energy optimization is approached 

and E RH is no longer a good approximation to 

E RH KS . In this region, the Kohn–Sham energy increases and it 

is the TRDSM algorithm that ensures the calculations convergence 

see Sec. VII, Table IV. 

IV. TRUST-REGION DENSITY-SUBSPACE 

MINIMIZATION IN DFT 

E RH 

2 22.57 0.994 −8.366 865 −8.411 913 

3 26.71 0.980 −20.122 850 −20.895 267 

4 30.54 0.980 −31.041 569 −35.286 269 

5 19.21 0.980 −27.278 985 −31.363 274 

6 10.31 0.980 −15.101 958 −18.277 717 

7 5.07 0.980 −10.675 155 −13.082 691 

8 2.96 0.980 −6.749 189 −7.197 438 

9 2.18 0.981 −3.181 254 −4.589 630 

10 4.68 0.980 0.394 694 −3.712 621 

11 1.40 0.980 −1.676 644 −2.885 580 

12 1.40 0.980 −1.743 634 −1.775 556 

13 0.93 0.980 −0.402 427 −0.843 260 

14 0.78 0.980 −0.376 675 −0.622 386 

15 0.54 0.981 −0.211 002 −0.227 722 

16 0.15 0.982 0.029 066 −0.199 268 

17 0.07 0.980 0.010 452 −0.068 243 

18 0.00 0.991 0.043 376 −0.037 071 

19 0.00 0.997 0.012 644 −0.009 493 

20 0.00 0.999 0.001 104 −0.000 931 

21 0.00 0.999 0.000 352 −0.000 249 

22 0.00 0.999 0.000 059 −0.000 049 

23 0.00 0.999 0.000 010 −0.000 006 

24 0.00 1.000 0.000 000 −0.000 000 

After a sequence of Roothaan–Hall iterations, we have determined a set of density matrices $D_i$ and a corresponding set of Kohn–Sham matrices $F_i = F(D_i)$. The question then arises as to how to make the best use of the information contained in these collected density and Kohn–Sham matrices.

A. Parametrization of the DSM density matrix 

Taking $D_0$ as the reference density matrix, we write the improved density matrix as a linear combination of the current and previous density matrices,

$$\bar{D} = D_0 + \sum_{i=0}^{n} c_i D_i, \tag{49}$$

which, ideally, should satisfy the symmetry, trace, and idempotency conditions in Eq. (8) of a valid Kohn–Sham density matrix. Whereas the symmetry condition in Eq. (8a) is trivially satisfied for any such linear combination, the trace condition in Eq. (8b) holds only for combinations that satisfy the restriction

$$\sum_{i=0}^{n} c_i = 0, \tag{50}$$

leading to a set of $n+1$ constrained parameters $c_i$ with $0 \le i \le n$. Alternatively, an unconstrained set of $n$ parameters $c_i$ with $1 \le i \le n$ can be used, with $c_0$ defined so that the trace condition is fulfilled,

$$c_0 = -\sum_{i=1}^{n} c_i. \tag{51}$$

In terms of these independent parameters, the density matrix $\bar{D}$ becomes

$$\bar{D} = D_0 + D_+, \tag{52}$$

where we have introduced the notations

$$D_+ = \sum_{i=1}^{n} c_i D_{i0}, \tag{53a}$$

$$D_{i0} = D_i - D_0. \tag{53b}$$

Unlike the symmetry and trace conditions in Eqs. (8a) and (8b), the idempotency condition in Eq. (8c) is in general not fulfilled for linear combinations of $D_i$. Still, for any averaged density matrix $\bar{D}$ in Eq. (52) that does not fulfill the idempotency condition, we may generate a purified density matrix with a smaller idempotency error by the transformation (Ref. 15)

$$\tilde{D} = 3\bar{D}S\bar{D} - 2\bar{D}S\bar{D}S\bar{D}. \tag{54}$$

The purification of the density matrix has previously been used in connection with minimization of energy functions (Refs. 16–19). Introducing the idempotency correction,

$$D_\delta = \tilde{D} - \bar{D}, \tag{55}$$

we may then write the purified averaged density matrix in the form

$$\tilde{D} = D_0 + D_+ + D_\delta. \tag{56}$$

In the following, we shall analyze the relative magnitudes of the terms $D_+$ and $D_\delta$ entering Eq. (56).
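The parametrization and the purification step can be tried out on a small example. The sketch below is illustrative only and assumes a toy three-orbital system in an orthonormal basis (S = I), with a second density matrix obtained by rotating the reference by a small angle; the function names are hypothetical.

```python
import numpy as np

def averaged_density(D_list, c):
    """Eqs. (52)-(53): D_bar = D_0 + sum_{i>=1} c_i (D_i - D_0)."""
    D0 = D_list[0]
    D_plus = sum((ci * (Di - D0) for ci, Di in zip(c, D_list[1:])),
                 np.zeros_like(D0))
    return D0 + D_plus

def purify(D_bar, S):
    """One McWeeny purification step, Eq. (54)."""
    DS = D_bar @ S
    return 3.0 * DS @ D_bar - 2.0 * DS @ DS @ D_bar

def idempotency_error(D, S):
    """Frobenius norm of D S D - D."""
    return np.linalg.norm(D @ S @ D - D)

# Toy example (assumption): D_1 is D_0 rotated by a small angle t.
S = np.eye(3)
D0 = np.diag([1.0, 0.0, 0.0])
t = 0.1
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
D1 = R @ D0 @ R.T

D_bar = averaged_density([D0, D1], c=[0.5])
D_tilde = purify(D_bar, S)
err_bar = idempotency_error(D_bar, S)      # second order in the rotation
err_tilde = idempotency_error(D_tilde, S)  # much smaller after purification
```

The averaged matrix keeps the trace of the reference exactly, whereas its idempotency error is removed only approximately by a single purification step.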

B. Order analysis of the purified averaged 

density matrix 

For simplicity, we shall work in the orthonormal MO basis that diagonalizes the reference density matrix,

$$D_0 = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}, \tag{57}$$

and consider the case with only one additional density matrix $D_1$. According to Eq. (24), an antisymmetric matrix $K$ of the form in Eq. (25) exists such that

$$D_1 = \exp(-K)\,D_0\,\exp(K) = D_0 + \begin{pmatrix} -\kappa^T\kappa & -\kappa^T \\ -\kappa & \kappa\kappa^T \end{pmatrix} + O(\kappa^3), \tag{58}$$

giving rise to the following averaged density matrix:

$$\bar{D} = D_0 + cD_{10} = D_0 + \begin{pmatrix} -c\,\kappa^T\kappa & -c\,\kappa^T \\ -c\,\kappa & c\,\kappa\kappa^T \end{pmatrix} + O(c\kappa^3). \tag{59}$$

The idempotency error of $\bar{D}$ is given by

$$\bar{D}\bar{D} - \bar{D} = (c^2 - c)\begin{pmatrix} \kappa^T\kappa & 0 \\ 0 & \kappa\kappa^T \end{pmatrix} + O(c\kappa^4), \tag{60}$$

showing that $\bar{D}$ is idempotent only to first order in $\kappa$. To reduce the idempotency error, we subject $\bar{D}$ to the purification in Eq. (54), obtaining

$$\tilde{D} = 3\bar{D}^2 - 2\bar{D}^3 = D_0 + \begin{pmatrix} -c^2\kappa^T\kappa & -c\,\kappa^T \\ -c\,\kappa & c^2\kappa\kappa^T \end{pmatrix} + O(c\kappa^3). \tag{61}$$

Finally, comparing Eqs. (59) and (61), we obtain

$$\tilde{D} = \bar{D} + O(c\kappa^2), \tag{62}$$

demonstrating that the impure and purified average density matrices differ by terms proportional to $c\kappa^2$. Since the McWeeny purification in Eq. (54) converges quadratically, we conclude that the idempotency error of Eq. (62) is proportional to $c^2\kappa^4$.

In a more general analysis, we would not assume an orthonormal basis and we would also include several density matrices $D_i = \exp(-K_i)\,D_0\,\exp(K_i)$. The essential result is then that we may write Eq. (56) as

$$\tilde{D} = D_0 + \sum_{i=1}^{n} c_i D_{i0} + O\!\left(\sum_{i=1}^{n} c_i D_{i0}^2\right), \tag{63}$$

where we have used the fact that $D_{i0}$ is proportional to $\kappa_i$. We conclude that while $D_+$ is linear in $c_i$ and $D_{i0}$, the idempotency correction $D_\delta$ to $\bar{D}$ is linear in $c_i$ but quadratic in $D_{i0}$. The conclusions to be derived from this analysis are summarized in Table II.

TABLE II. Comparison of the properties of the unpurified density $\bar{D}$ and the purified density $\tilde{D}$.

                     $\bar{D}$                                          $\tilde{D}$
Differences          $\bar{D} - D_0 = O(c\kappa)$                       $\tilde{D} - \bar{D} = O(c\kappa^2)$
Idempotency error    $\bar{D}S\bar{D} - \bar{D} = O(c\kappa^2)$         $\tilde{D}S\tilde{D} - \tilde{D} = O(c^2\kappa^4)$
Trace error          $\mathrm{Tr}\,\bar{D}S - N/2 = 0$                  $\mathrm{Tr}\,\tilde{D}S - N/2 = O(c^2\kappa^4)$

C. Construction of the DSM energy function

Having established a useful parametrization of the averaged density matrix in Eq. (52) and having considered its purification in Eq. (54), let us now consider how to determine the best set of coefficients $c_i$. Expanding the energy for the purified averaged density matrix in Eq. (56) around the reference density matrix $D_0$, we obtain to second order

$$E(\tilde{D}) = E(D_0) + (D_+ + D_\delta)^T E_0^{(1)} + \tfrac{1}{2}(D_+ + D_\delta)^T E_0^{(2)} (D_+ + D_\delta). \tag{64}$$

To evaluate the terms containing $E_0^{(1)}$ and $E_0^{(2)}$, we make the identifications

$$E_0^{(1)} = 2F_0, \tag{65}$$

$$E_0^{(2)} D_+ = 2F_+ + O(D_+^2), \tag{66}$$

which follow from Eq. (6) and from the second-order Taylor expansion of $E_0^{(1)}$ about $D_0$, and where we have generalized the notation in Eq. (53a) to the Kohn–Sham matrix, $F_+ = \sum_{i=1}^{n} c_i F_{i0}$. Ignoring the terms quadratic in $D_\delta$ in Eq. (64) and quadratic in $D_+$ in Eq. (66), we then obtain for the DSM energy

$$E^{\mathrm{DSM}}(c) = E(D_0) + 2\,\mathrm{Tr}\,D_+F_0 + \mathrm{Tr}\,D_+F_+ + 2\,\mathrm{Tr}\,D_\delta F_0 + 2\,\mathrm{Tr}\,D_\delta F_+. \tag{67}$$

Finally, for a more compact notation, we introduce the weighted Kohn–Sham matrix,

$$\bar{F} = F_0 + F_+ = F_0 + \sum_{i=1}^{n} c_i F_{i0}, \tag{68}$$

and find that the DSM energy may be written in the form

$$E^{\mathrm{DSM}}(c) = E(\bar{D}) + 2\,\mathrm{Tr}\,D_\delta\bar{F}, \tag{69}$$

where the first term is quadratic in the expansion coefficients $c_i$,

$$E(\bar{D}) = E(D_0) + 2\,\mathrm{Tr}\,D_+F_0 + \mathrm{Tr}\,D_+F_+, \tag{70}$$

and the second, idempotency-correction term is quartic in these coefficients:

$$2\,\mathrm{Tr}\,D_\delta\bar{F} = \mathrm{Tr}\,(6\bar{D}S\bar{D} - 4\bar{D}S\bar{D}S\bar{D} - 2\bar{D})\bar{F}. \tag{71}$$

The derivatives of $E^{\mathrm{DSM}}(c)$ are straightforwardly obtained by inserting the expansions of $\bar{F}$ and $\bar{D}$, using the independent parameter representation.
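The DSM energy of Eqs. (69)–(71) is inexpensive to evaluate because it needs only traces of products of stored matrices. The sketch below evaluates it for given coefficients; it is a hedged illustration rather than the production code, and the two-orbital matrices, the reference energy, and the function name `dsm_energy` are hypothetical.

```python
import numpy as np

def dsm_energy(c, E0, D_list, F_list, S):
    """E_DSM(c) = E(D_bar) + 2 Tr(D_delta F_bar), following Eqs. (69)-(71)."""
    D0, F0 = D_list[0], F_list[0]
    D_plus = sum((ci * (Di - D0) for ci, Di in zip(c, D_list[1:])),
                 np.zeros_like(D0))
    F_plus = sum((ci * (Fi - F0) for ci, Fi in zip(c, F_list[1:])),
                 np.zeros_like(F0))
    D_bar, F_bar = D0 + D_plus, F0 + F_plus
    # Eq. (70): quadratic model for the unpurified averaged density
    E_bar = E0 + 2.0 * np.trace(D_plus @ F0) + np.trace(D_plus @ F_plus)
    # Eqs. (54)-(55): idempotency correction D_delta = D_tilde - D_bar
    DS = D_bar @ S
    D_delta = 3.0 * DS @ D_bar - 2.0 * DS @ DS @ D_bar - D_bar
    return E_bar + 2.0 * np.trace(D_delta @ F_bar)

# Toy input (assumption): orthonormal basis, two idempotent density matrices,
# and arbitrary symmetric matrices standing in for the Kohn-Sham matrices.
S = np.eye(2)
t = 0.2
v = np.array([np.cos(t), np.sin(t)])
D0 = np.diag([1.0, 0.0])
D1 = np.outer(v, v)
F0 = np.array([[-1.0, 0.1], [0.1, 0.5]])
F1 = np.array([[-0.9, 0.2], [0.2, 0.4]])
E0 = -1.0

E_at_zero = dsm_energy([0.0], E0, [D0, D1], [F0, F1], S)
E_at_one = dsm_energy([1.0], E0, [D0, D1], [F0, F1], S)
```

At c = 0 the model returns the reference energy, and at c = 1 the averaged density coincides with the idempotent matrix D_1, so the idempotency correction vanishes and only the quadratic term of Eq. (70) contributes.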



TABLE III. Convergence details for the TRDSM steps in the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.

Iteration   $\|D_\delta\|_S^2$   $\|D_+\|_S^2$   $\Delta E_{\mathrm{KS}}^{\mathrm{DSM}}$   $\Delta E^{\mathrm{DSM}}$

3 1.612 753 6.129 310 −48.255 717 −49.742 656 

4 1.488 082 12.140 844 −105.996 850 −111.554 301 

5 0.206 716 1.594 214 −43.136 482 −41.110 879 

6 1.504 099 3.162 679 −26.390 457 −26.511 025 

7 0.096 714 1.468 925 −14.755 377 −14.499 582 

8 0.110 282 1.525 848 −7.711 220 −7.278 600 

9 0.086 759 1.569 113 −5.289 340 −5.165 696 

10 0.423 825 1.614 867 −2.684 359 −3.500 173 

11 0.196 628 1.002 744 −1.053 899 −1.126 867 

12 0.111 409 0.867 238 −1.054 903 −0.936 180 

13 0.093 520 0.729 574 −0.658 907 −0.621 180 

14 0.054 596 0.324 338 −0.293 889 −0.238 992 

15 0.045 721 0.201 434 −0.213 251 −0.170 060 

16 0.026 474 0.242 928 −0.104 012 −0.096 482 

17 0.011 746 0.071 203 −0.100 694 −0.093 602 

18 0.001 512 0.022 758 −0.043 180 −0.042 748 

19 0.000 687 0.040 675 −0.057 441 −0.056 819 

20 0.000 122 0.011 897 −0.016 501 −0.016 416 

21 0.000 025 0.001 164 −0.001 471 −0.001 453 

22 0.000 001 0.000 308 −0.000 428 −0.000 427 

23 0.000 000 0.000 050 −0.000 076 −0.000 076 

24 0.000 000 0.000 009 −0.000 012 −0.000 012 

25 0.000 000 0.000 000 −0.000 000 −0.000 000 

D. Optimization of the DSM energy 

The energy function $E^{\mathrm{DSM}}(c)$ in Eq. (69) provides an excellent approximation to the exact Kohn–Sham energy $E_{\mathrm{KS}}(c)$ about $D_0$, with an error cubic in $D_+$. It can be optimized by the trust-region method, as described in Ref. 6, yielding an improved density matrix $\tilde{D}$, from which the Kohn–Sham matrix of the next TRRH iteration is constructed. However, to avoid the expensive calculation of the Kohn–Sham matrix from $\tilde{D}$, we use instead in our TRDSM implementation the averaged Kohn–Sham matrix in Eq. (68).

As in the TRRH step in Sec. III A, the averaged density matrix $\bar{D}$ may also be determined by a line search. Here, the line search is made in the direction defined by the first step of the TRDSM algorithm, that is, the step at the expansion point $D_0$. As in the TRRH step, such a line search is guaranteed to reduce the Kohn–Sham energy. We denote this line search algorithm TRDSM-LS.

In the DSM scheme, we assume that the idempotency correction $D_\delta = \tilde{D} - \bar{D}$ is small relative to $D_+ = \bar{D} - D_0$, both when discarding the terms quadratic in $D_\delta$ in Eq. (64) and when constructing the Kohn–Sham matrix from $\bar{D}$ rather than from $\tilde{D}$ in the subsequent Roothaan–Hall iteration. As is seen from Eq. (63), this assumption holds if the old density matrices $D_i$ are similar to $D_0$. Formally, therefore, we should include in the TRDSM only density matrices that are similar to $D_0$. In particular, if the orbital occupations change in the course of the Roothaan–Hall iterations, we should discard all density matrices that represent the old occupations.

To demonstrate the validity of the assumption that $D_\delta$ is small compared to $D_+$, we have in Table III listed $\|D_\delta\|_S^2 = \mathrm{Tr}\,D_\delta SD_\delta S$ and $\|D_+\|_S^2 = \mathrm{Tr}\,D_+SD_+S$ at each iteration of the zinc complex calculation of Sec. VII. From Fig. 3, where the ratio $\|D_\delta\|_S^2/\|D_+\|_S^2$ is plotted, we see that, apart from iteration 6, this ratio is always smaller than 0.3 and that it rapidly converges to zero in the local region. The neglect of the terms that are quadratic in $D_\delta$ in the TRDSM method is thus well justified. In Table III, we have also listed the model energy change $\Delta E^{\mathrm{DSM}}$ and the actual energy change $\Delta E_{\mathrm{KS}}^{\mathrm{DSM}}$, obtained as the difference between the Kohn–Sham energies calculated from the idempotent $\bar{D}$ obtained as in Eq. (38) and from $D_0$: $\Delta E_{\mathrm{KS}}^{\mathrm{DSM}} = E_{\mathrm{KS}}(\bar{D}_0^{\mathrm{idem}}) - E_{\mathrm{KS}}(D_0)$. Clearly, $E^{\mathrm{DSM}}(c)$ is an extremely good representation of $E_{\mathrm{KS}}(c)$ for the step sizes taken by the TRDSM algorithm, as expected since $E^{\mathrm{DSM}}(c)$ and $E_{\mathrm{KS}}(c)$ differ in terms that are cubic in $D_+$.

FIG. 3. The ratio between the norms of the idempotency correction to the density, $\|D_\delta\|_S^2 = \|\tilde{D} - \bar{D}\|_S^2$, and the density change, $\|D_+\|_S^2 = \|\bar{D} - D_0\|_S^2$, in the TRDSM steps of the zinc complex calculation seen in Fig. 5.

E. Comparison of the DSM and EDIIS energies 

Neglecting the idempotency correction in the DSM energy in Eq. (69), we are left with $E(\bar{D})$. In the Hartree–Fock theory, this remaining term may be expressed in several equivalent ways. First, it may be written as the energy of the weighted density matrix,

$$E^{\mathrm{HF}}(\bar{D}) = 2\,\mathrm{Tr}\,h\bar{D} + \mathrm{Tr}\,\bar{D}G(\bar{D}), \tag{72}$$

where the weighted density matrix is defined as (note the difference from Eq. (49))

$$\bar{D} = \sum_{i=0}^{n} d_i D_i, \qquad \sum_{i=0}^{n} d_i = 1. \tag{73}$$

In their development of the EDIIS method, Kudin et al. (Ref. 4) suggested the alternative form

$$E^{\mathrm{EDIIS}}(\bar{D}) = \sum_{i=0}^{n} d_i E^{\mathrm{SCF}}(D_i) - \frac{1}{2}\sum_{i,j=0}^{n} d_i d_j\,\mathrm{Tr}\,F_{ij}D_{ij}, \tag{74}$$

where $E^{\mathrm{SCF}}(D)$ may be the Hartree–Fock energy or the Kohn–Sham energy. In the Hartree–Fock theory, Eqs. (70), (72), and (74) are equivalent since the Fock matrix is linear in the density matrix. By contrast, in the DFT, where the Kohn–Sham matrix contains terms that are nonlinear in the density matrix, these expressions are not equivalent. Below, we discuss some of the consequences of their nonequivalence in the DFT.



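Eliminating $d_0 = 1 - \sum_{i=1}^{n} d_i$ from Eq. (74), we may express the EDIIS energy in the independent representation of Eqs. (52) and (53),

$$E^{\mathrm{EDIIS}}(\bar{D}) = E^{\mathrm{SCF}}(D_0) + \sum_{i=1}^{n} d_i\left[E^{\mathrm{SCF}}(D_i) - E^{\mathrm{SCF}}(D_0)\right] - \sum_{i=1}^{n} d_i\,\mathrm{Tr}\,F_{i0}D_{i0} + \sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\,F_{j0}D_{j0} - \frac{1}{2}\sum_{i,j=1}^{n} d_i d_j\,\mathrm{Tr}\,F_{ij}D_{ij}. \tag{75}$$

Comparing this expression with $E(\bar{D})$ of Eq. (70), we find that they have the same values at the expansion point $D_0$ but that their first derivatives at this point differ since

$$\frac{\partial E(\bar{D})}{\partial c_k} = 2\,\mathrm{Tr}\,F_0D_{k0}, \tag{76a}$$

$$\frac{\partial E^{\mathrm{EDIIS}}(\bar{D})}{\partial d_k} = E^{\mathrm{SCF}}(D_k) - E^{\mathrm{SCF}}(D_0) - \mathrm{Tr}\,F_{k0}D_{k0}. \tag{76b}$$

In the Hartree–Fock theory, it is easy to see that Eqs. (76a) and (76b) are identical.

The DSM gradient is

$$\frac{\partial E^{\mathrm{DSM}}(c)}{\partial c_k} = \frac{\partial E(\bar{D})}{\partial c_k} + 2\,\frac{\partial\,\mathrm{Tr}\,D_\delta\bar{F}}{\partial c_k}. \tag{77}$$

Since $E^{\mathrm{DSM}}$ is equal to $E_{\mathrm{KS}}$ to first order, we have that

$$\frac{\partial E^{\mathrm{DSM}}(c)}{\partial c_k} = \frac{\partial E_{\mathrm{KS}}}{\partial c_k}. \tag{78}$$

The EDIIS gradient at the expansion point is thus not equal to the KS gradient, as the last nonzero term in Eq. (77) (the term resulting from the idempotency correction) is missing. Further, the correct gradient in the DSM can only be obtained in the DFT if Eq. (76a) and not Eq. (76b) is used. It is thus incorrect to use Eq. (76b) in the DFT even though Eqs. (76a) and (76b) are equivalent in Hartree–Fock.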
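The Hartree–Fock equivalence of the two gradient expressions can be checked numerically. The sketch below is only an illustration under stated assumptions: the one-electron matrix is random, and the two-electron term is replaced by a toy operator `G` that is linear in the density and satisfies Tr[A G(B)] = Tr[B G(A)], which is the property the equivalence relies on. None of these quantities come from an actual integral code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def sym(A):
    return 0.5 * (A + A.T)

h = sym(rng.normal(size=(n, n)))                      # toy one-electron matrix
M = [sym(rng.normal(size=(n, n))) for _ in range(3)]  # builds a toy linear G

def G(D):
    """Toy two-electron term: linear in D, with Tr[A G(B)] = Tr[B G(A)]."""
    return sum((Mk * np.sum(Mk * D) for Mk in M), np.zeros((n, n)))

def fock(D):
    return h + G(D)

def energy(D):
    """Eq. (72): E(D) = 2 Tr hD + Tr D G(D)."""
    return 2.0 * np.trace(h @ D) + np.trace(D @ G(D))

def random_density(n_occ):
    """Idempotent density matrix from random orthonormal orbitals."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    C = Q[:, :n_occ]
    return C @ C.T

D0, Dk = random_density(2), random_density(2)
F0, Fk = fock(D0), fock(Dk)

grad_76a = 2.0 * np.trace(F0 @ (Dk - D0))
grad_76b = energy(Dk) - energy(D0) - np.trace((Fk - F0) @ (Dk - D0))
```

With a Kohn–Sham matrix that is nonlinear in the density, the same comparison would in general give two different numbers, which is the point made in the text.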

V. CONFIGURATION SHIFT 

IN THE TRSCF ALGORITHM 

Since the TRSCF method has been designed for a smooth and controlled convergence of the density matrix, it does not allow for the abrupt changes in the orbitals associated with configuration shifts. Nevertheless, it may sometimes be advantageous to allow such shifts, as illustrated in Fig. 4, where we compare two cadmium complex calculations (see Sec. VII for details). The “no-shift” optimization proceeds carefully, allowing only small changes in the density matrix at each iteration, whereas the “do-shift” optimization is more daring, accepting abrupt configuration shifts that reduce the total energy.

In Fig. 4(a), we have plotted the error in the energy at each iteration of the two optimizations. The first 13 iterations are identical; the optimizations are in the global region and the level shift is determined from the requirement $a_{\min}^{\mathrm{orb}} = A_{\min}^{\mathrm{orb}} = 0.98$. In iteration 14, the two optimizations differ. To understand the reasons for these differences, we have in Fig. 4(b) plotted $a_{\min}^{\mathrm{orb}}$ and in Fig. 4(c) $\Delta E^{\mathrm{RH}}$ (full line) and $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line) as functions of $\mu$. For $\mu = 0.25$, there is an abrupt shift in $a_{\min}^{\mathrm{orb}}$ from 0.99 to 0.00, representing a configuration shift where the LUMO for $\mu > 0.25$ becomes the HOMO for $\mu < 0.25$. From Fig. 4(c), we see that this shift lowers the Kohn–Sham total energy. Because of the abrupt change in $a_{\min}^{\mathrm{orb}}$ at $\mu = 0.25$, we are unable to identify a level shift that gives $a_{\min}^{\mathrm{orb}} = 0.98$. In the no-shift calculation, $\mu$ is chosen larger than 0.25, whereas, in the do-shift calculation, the undamped Roothaan–Hall step is taken with $\mu = 0$.

As the DSM energy model assumes small changes in the density matrix, the density matrices of all previous iterations are discarded in iteration 14 of the do-shift calculation, and a rapid convergence to the optimized state is seen from that point. In the no-shift calculation, an $a_{\min}^{\mathrm{orb}}$ profile similar to that of iteration 14 is obtained in the next few iterations. In these iterations, the lowest Hessian eigenvalue is −0.95 a.u. and the optimization proceeds towards a stationary point. Finally, in iteration 22, the TRSCF algorithm identifies this stationary point as a saddle point, moves out of this region, and converges rapidly to the same minimum as the do-shift optimization.

FIG. 4. The TRSCF cadmium complex calculation described in Sec. VII. (a) The convergence without and with an abrupt configuration shift. (b) and (c) contain details of the TRRH step in iteration 14: (b) the minimum overlap $a_{\min}^{\mathrm{orb}}$ for the new occupied orbitals with the previous set of occupied orbitals and (c) the changes in the model energy $\Delta E^{\mathrm{RH}}$ (full line) and the actual energy $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$ (dashed line), all as functions of the level-shift parameter $\mu$.

As this example illustrates, it is important to recognize and accept a favorable configuration shift. A configuration shift may be recognized when an $a_{\min}^{\mathrm{orb}}(\mu)$ profile has an abrupt change where on the right-hand side $a_{\min}^{\mathrm{orb}}$ is close to 1 and on the left-hand side $a_{\min}^{\mathrm{orb}}$ is close to 0. To maintain the high degree of control characteristic of the TRSCF method, the energy of the new configuration is checked before the shift is accepted, at the cost of an additional Kohn–Sham matrix build. As seen from Fig. 4(a), this check is well worth the effort, saving more than ten iterations, and thus it is made

an integrated part of our TRSCF implementation. 
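The recognition criterion described above can be phrased as a simple scan of the overlap profile. The following sketch is purely illustrative: the thresholds `low` and `high`, the sampled profile, and both function names are hypothetical and are not taken from the TRSCF implementation.

```python
def find_configuration_shift(mu_values, a_min_values, low=0.1, high=0.9):
    """Locate an abrupt jump in the a_min(mu) profile.

    mu_values must be sorted in increasing order. Returns the index k such that
    a_min is below `low` at mu_values[k] and above `high` at mu_values[k + 1],
    or None when the profile has no such jump.
    """
    for k in range(len(mu_values) - 1):
        if a_min_values[k] < low and a_min_values[k + 1] > high:
            return k
    return None

def accept_shift(energy_new, energy_current):
    """Accept the new configuration only if it lowers the energy."""
    return energy_new < energy_current

# Hypothetical profile resembling the one described for iteration 14.
mu_grid = [0.00, 0.10, 0.20, 0.30, 0.40]
a_min_profile = [0.00, 0.00, 0.00, 0.99, 0.99]
jump = find_configuration_shift(mu_grid, a_min_profile)
```

The energy comparison in `accept_shift` corresponds to the extra Kohn–Sham matrix build mentioned in the text: the shift is taken only when the energy of the new configuration is lower.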

VI. THE DIIS METHOD VIEWED 

AS A QUASI-NEWTON METHOD 

Since its introduction by Pulay in 1980, the DIIS method 

has been extensively and successfully used to accelerate the 

convergence of SCF optimizations. We here present a rederivation 

of the DIIS method to demonstrate that, in the iterative 

subspace of density matrices, it is equivalent to a quasi- 

Newton method. From this observation, we conclude that, in 

the local region of the SCF optimization, the DIIS steps can 

be used safely and will lead to fast convergence. The convergence 

of the DIIS algorithm in the global region is also 

discussed and is much more unpredictable. 

We assume that, in the course of the SCF optimization, we have determined a set of $n+1$ AO density matrices $D_0, D_1, D_2, \ldots, D_n$ and the associated Kohn–Sham or Fock matrices $F(D_0), F(D_1), F(D_2), \ldots, F(D_n)$. Since the electronic gradient $g(D)$ is given by (Ref. 11)

$$g(D) = 4\left[SDF(D) - F(D)DS\right], \tag{79}$$

we also have available the corresponding gradients $g(D_0), g(D_1), g(D_2), \ldots, g(D_n)$. We now wish to determine a corrected density matrix,

$$\bar{D} = D_0 + \sum_{i=1}^{n} c_i D_{i0}, \qquad D_{i0} = D_i - D_0, \tag{80}$$

that minimizes the norm of the gradient $\|g(\bar{D})\|$. For this purpose, we parameterize the density matrix in terms of an antisymmetric matrix $X = -X^T$ and the current density matrix $D_0$ as (Ref. 11)

$$D(X) = \exp(-XS)\,D_0\,\exp(SX). \tag{81}$$

With each old density matrix $D_i$, we now associate an antisymmetric matrix $X_i$ such that

$$D_i = \exp(-X_iS)\,D_0\,\exp(SX_i) = D_0 + [D_0, X_i]_S + O(X_i^2). \tag{82}$$

82 

Introducing the averaged antisymmetric matrix,

$$\bar{X} = \sum_{i=1}^{n} c_i X_i, \tag{83}$$

we obtain

$$D(\bar{X}) = D_0 + \sum_{i=1}^{n} c_i [D_0, X_i]_S + O(\bar{X}^2), \tag{84}$$

where we have used the S-commutator expansion of $D(\bar{X})$ analogous to Eq. (82). Our task is hence to determine $\bar{X}$ in Eq. (83) such that $D(\bar{X})$ minimizes the gradient norm $\|g(D(\bar{X}))\|$. In passing, we note that, whereas $\bar{D}$ is not in general idempotent and therefore not a valid density matrix, $D(\bar{X})$ is a valid, idempotent density matrix for all choices of $c_i$.

Expanding the gradient in Eq. (79) about the current density matrix $D_0$, we obtain

$$g(D(\bar{X})) = g(D_0) + H(D_0)\bar{X} + O(\bar{X}^2), \tag{85}$$

where $H(D)$ is the Jacobian matrix. Neglecting the higher-order terms, our task is therefore to minimize the norm of the gradient,

$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i H(D_0)X_i, \tag{86}$$

with respect to the elements of $c$. For an estimate of $H(D_0)X_i$, we truncate the expansion,

$$g(D_i) = g(D_0) + H(D_0)X_i + O(X_i^2), \tag{87}$$

and obtain the quasi-Newton condition,

$$g(D_i) - g(D_0) = H(D_0)X_i. \tag{88}$$

Inserting this condition into Eq. (86), we obtain

$$g(c) = g(D_0) + \sum_{i=1}^{n} c_i\left[g(D_i) - g(D_0)\right] = \sum_{i=0}^{n} c_i\,g(D_i), \tag{89}$$

where we have introduced the parameter $c_0 = 1 - \sum_{i=1}^{n} c_i$. The minimization of the norm $\|g(c)\|$ may therefore be carried out as a least-squares minimization of $g(c)$ in Eq. (89) subject to the constraint

$$\sum_{i=0}^{n} c_i = 1. \tag{90}$$

If we consider $g(D_i)$ as an error vector for the density matrix $D_i$, this procedure becomes identical to the DIIS method. From Eq. (86) we also see that DIIS may be viewed as a minimization of the residual for the Newton equation in the subspace of the density matrix differences $D_i - D_0$, $i = 1, \ldots, n$, where the quasi-Newton condition is used to set up the subspace

equations. Since the quasi-Newton steps are reliable 

only in the local region of the optimization, we conclude that 

the DIIS method can be used safely only in this region, when 

the electronic Hessian is positive definite. 
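The constrained least-squares problem of Eqs. (89) and (90) is commonly solved with a Lagrange multiplier, which leads to a small bordered linear system. The sketch below shows this standard construction; the function name `diis_coefficients` and the two toy gradients are hypothetical, and the sketch makes no attempt to handle an ill-conditioned subspace.

```python
import numpy as np

def diis_coefficients(gradients):
    """Minimise ||sum_i c_i g_i||^2 subject to sum_i c_i = 1, Eqs. (89)-(90)."""
    g = [np.ravel(gi) for gi in gradients]
    m = len(g)
    A = np.zeros((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            A[i, j] = g[i] @ g[j]        # B_ij = <g_i, g_j>
    A[:m, m] = A[m, :m] = 1.0             # border enforcing the constraint
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    return np.linalg.solve(A, rhs)[:m]    # drop the Lagrange multiplier

# Toy gradients (assumption): antisymmetric matrices pointing in opposite directions.
g0 = np.array([[0.0, 1.0], [-1.0, 0.0]])
g1 = np.array([[0.0, -3.0], [3.0, 0.0]])
c = diis_coefficients([g0, g1])
g_combined = c[0] * g0 + c[1] * g1
```

In this example the two gradients are antiparallel, so a combination with vanishing gradient exists and the coefficients come out as 0.75 and 0.25. The averaged density matrix of Eq. (80) would then be formed with the same coefficients.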

The optimal combination of the density matrices is obtained in the DIIS method by carrying out a least-squares minimization of the gradient norm subject to the constraint in Eq. (90). However, since a small gradient norm in the global region does not necessarily imply a low Kohn–Sham energy, the DIIS convergence may be unpredictable. Furthermore, we may encounter regions where the gradient norms are similar but the energies different. The DIIS method may then diverge, not being able to identify the density matrix of lowest energy, as illustrated in Sec. VII.

FIG. 5. The convergence of different algorithms in an LDA/6-31G computation with core Hamiltonian start guess for the zinc complex depicted in the lower left corner. The algorithms are QC-SCF, DIIS, TRSCF, and TRSCF-LS.

VII. APPLICATIONS

In this section, we give numerical examples to illustrate the convergence characteristics of the Kohn–Sham TRSCF calculations, comparing with the DIIS and QC-SCF calculations. Comparisons are also made with the TRSCF-LS technique, where the TRRH-LS and TRDSM-LS line-search methods of Secs. III A and IV D are combined to set up an expensive but highly robust method, in which the lowest Kohn–Sham energy is identified by a line search at each step. In Sec. VII A, we discuss the calculations on the zinc complex in Fig. 5, where Zn$^{2+}$ is complexed with ethylenediamine-N,N′-disuccinic acid (EDDS), an isomer of ethylenediaminetetraacetic acid (EDTA). Next, in Sec. VII B, we consider the calculations on five different systems. All calculations have been carried out with a local version of the DALTON program package (Ref. 20). Unless otherwise indicated, the starting orbitals have been obtained by diagonalization of the one-electron Hamiltonian.

A. Calculations on the zinc complex 

In Fig. 5, we have plotted the error in the Kohn–Sham energy at each iteration of LDA/6-31G calculations on the zinc complex. The standard TRSCF method performs almost as well as the very smooth but much more expensive TRSCF-LS method, giving a somewhat higher energy between iterations 13 and 22. By contrast, the DIIS method shows no sign of converging; after 100 iterations, the Kohn–Sham gradient norm is still about 20. Whereas the smooth TRSCF convergence arises because Hessian information is used to ensure downhill TRRH and TRDSM steps at each iteration, no such information is employed in the DIIS method. Finally, the QC-SCF method converges, but exceedingly slowly: even after 90 iterations it has not reached the quadratically convergent local region! The difficulties experienced with the QC-SCF method illustrate clearly that the use of Hessian information by itself is no guarantee of fast convergence.

More details about the TRSCF zinc complex calculation are given in Tables I–V and in Figs. 1–3 and 6, partly discussed in Secs. III F and IV D. In Table IV, we have listed the changes in the Kohn–Sham energy generated separately in the TRRH ($\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$) and TRDSM ($\Delta E_{\mathrm{KS}}^{\mathrm{DSM}}$) steps at each SCF iteration, and likewise the norms of the changes in the density matrix in the TRRH ($\|D_{n+1} - \bar{D}_n\|_S^2$) and TRDSM ($\|\bar{D}_n - D_n\|_S^2$) steps. Remarkably, the TRDSM step consistently reduces the energy more than the TRRH step. Indeed, after iteration 15, each TRRH step increases rather than decreases the energy. Apparently, in the local region, the role of the TRRH step is reduced to that of improving the variational space of the subsequent TRDSM step. From the table, we also see that the largest changes in the density matrix are generated by the TRDSM step rather than by the TRRH step.

TABLE IV. Convergence details for the TRSCF calculation on the zinc complex in Fig. 5. Energies given in a.u.

Iteration   $\Delta E_{\mathrm{KS}}$   $\Delta E_{\mathrm{KS}}^{\mathrm{DSM}}$   $\Delta E_{\mathrm{KS}}^{\mathrm{RH}}$   $\|\bar{D}_n - D_n\|_S^2$ (DSM)   $\|D_{n+1} - \bar{D}_n\|_S^2$ (RH)
2           −8.366 865       0.000 000        −8.366 865      0.000 000     0.197 607
3           −68.378 567      −48.255 717      −20.122 850     6.129 310     1.141 536
4           −137.038 420     −105.996 850     −31.041 569     12.140 844    1.265 250
5           −70.415 468      −43.136 482      −27.278 985     1.594 214     1.031 844
6           −41.492 416      −26.390 457      −15.101 958     3.162 679     1.467 802
7           −25.430 533      −14.755 377      −10.675 155     1.468 925     1.364 944
8           −14.460 409      −7.711 220       −6.749 189      1.525 848     1.249 827
9           −8.470 594       −5.289 340       −3.181 254      1.569 113     1.040 337
10          −2.289 664       −2.684 359       0.394 694       1.614 867     0.817 844
11          −2.730 543       −1.053 899       −1.676 644      1.002 744     1.060 298
12          −2.798 537       −1.054 903       −1.743 634      0.867 238     0.632 009
13          −1.061 335       −0.658 907       −0.402 427      0.729 574     0.410 434
14          −0.670 565       −0.293 889       −0.376 675      0.324 338     0.351 715
15          −0.424 253       −0.213 251       −0.211 002      0.201 434     0.203 170
16          −0.074 945       −0.104 012       0.029 066       0.242 928     0.302 723
17          −0.090 241       −0.100 694       0.010 452       0.071 203     0.175 917
18          0.000 195        −0.043 180       0.043 376       0.022 758     0.126 709
19          −0.044 797       −0.057 441       0.012 644       0.047 885     0.032 787
20          −0.015 396       −0.016 501       0.001 104       0.011 897     0.002 976
21          −0.001 118       −0.001 471       0.000 352       0.001 164     0.000 668
22          −0.000 368       −0.000 428       0.000 059       0.000 308     0.000 111
23          −0.000 066       −0.000 076       0.000 010       0.000 050     0.000 019
24          −0.000 011       −0.000 012       0.000 000       0.000 009     0.000 001
25          −0.000 000       −0.000 000       0.000 000       0.000 000     0.000 000

TABLE V. The density of each iteration compared to the optimized one.

Iteration   $\|D_{\mathrm{conv}} - D_n\|_S^2$   $a_{\min}^{\mathrm{orb}}(\mathrm{conv}, n)$
2           66.952 673     0.0965
3           65.174 713     0.0955
4           56.502 973     0.0927
5           51.210 143     0.1017
6           48.482 773     0.1411
7           42.682 641     0.1394
8           35.617 332     0.1992
9           26.551 913     0.3183
10          18.298 431     0.4094
11          14.152 342     0.4983
12          9.767 169      0.6927
13          6.184 621      0.6859
14          3.844 299      0.9187
15          2.240 436      0.9194
16          1.018 810      0.9771
17          0.200 374      0.9952
18          0.064 181      0.9984
19          0.043 906      0.9967
20          0.011 531      0.9996
21          0.001 092      0.9999
22          0.000 309      0.9999
23          0.000 053      0.9999
24          0.000 009      0.9999
25          0.000 000      0.9999

For the TRRH and TRDSM steps, we have at each iteration determined the overlap $a_i^{\mathrm{orb}}$ in Eq. (40) of each generated occupied orbital $\phi_i^{\mathrm{new}}$ with the previous orbitals $\phi_j^{\mathrm{old}}$. In Fig. 6, the number of orbitals at each iteration with $a_i^{\mathrm{orb}} < 0.98$ (i.e., with large rotations) is illustrated in a bar chart. As we require $a_i^{\mathrm{orb}} \ge 0.98$ in the Roothaan–Hall steps, the TRRH bars simply represent the number of orbitals with $a_i^{\mathrm{orb}} \approx 0.98$. In the TRDSM step, however, no such restrictions are imposed and a large number of orbitals with $a_i^{\mathrm{orb}} < 0.98$ are observed. Indeed, in the first few DSM steps, overlaps as small as 0.76 occur, leading to far larger changes than those accepted in the Roothaan–Hall step, emphasizing the important role played by the TRDSM step in achieving orbital reorganizations in a controlled manner.

FIG. 6. The number of occupied orbitals in the TRRH and TRDSM steps with an overlap less than 0.98 to the previous set of occupied orbitals for each step in the SCF iteration.

In Table V, we have listed the norm of the difference between the current density matrix $D_n$ at each iteration and the final converged density matrix $D_{\mathrm{conv}}$; also, we have listed $a_{\min}^{\mathrm{orb}}(\mathrm{conv}, n)$, which is the smallest overlap (in the sense of Eq. (41)) of the current occupied orbitals with the converged ones. Clearly, very large changes occur in the density matrix and the orbitals in the course of the optimization, in particular, during the first 17 iterations; in the remaining iterations, only small adjustments are made. In spite of the large overall changes made to the orbitals, they have been accomplished in a controlled and reliable manner.

In Fig. 7, we have plotted the errors for the same LDA/6-31G optimization as in Fig. 5, but with the starting orbitals obtained from a Hückel calculation rather than from the diagonalization of the one-electron Hamiltonian. Convergence is now faster, with the TRSCF-LS and TRSCF methods behaving in the same smooth manner as before. More importantly, with this improved starting guess, the DIIS method converges in almost the same number of iterations as the TRSCF method, although less smoothly.

Finally, in Fig. 8, we have the same plot as in Fig. 7, but in the STO-3G rather than 6-31G basis (still with a Hückel guess). Somewhat surprisingly, convergence is more difficult in this smaller basis. Indeed, after 100 iterations, the DIIS method has not yet converged, with a Kohn–Sham gradient norm as large as 10. The standard TRSCF method still converges, but now in a less smooth manner than the TRSCF-LS method. As mentioned in Sec. III E 2, when the HOMO-LUMO gap is particularly small, it may sometimes be necessary to enforce a minimum TRRH level shift to achieve convergence. Indeed, in the TRSCF optimization in Fig. 8, we require $\mu \ge 0.1$ throughout the calculation.

B. Calculations on a variety of molecules 

In Fig. 9, we have plotted the errors in the energy at each SCF iteration for a variety of molecules at the LDA level of theory: the zinc complex from Fig. 5 in the 6-31G basis set; the rhodium complex from Ref. 6 in the Ahlrichs-VDZ basis (Ref. 21) with STO-3G on the rhodium atom; cadmium complexed with an imidazole ring in the STO-3G basis; the CH$_3$CHO molecule in the cc-pVTZ basis (Ref. 22); and the H$_2$O molecule in the cc-pVTZ basis.

For the TRSCF-LS method, convergence is smooth for all systems, as expected. Likewise, in the TRSCF calculations with no restrictions enforced on the TRRH level-shift parameter, convergence is still good although not as smooth as in the TRSCF-LS calculations. The behavior of the DIIS method is somewhat more erratic, in particular, in the global region; in the local region, it converges as well as the TRSCF method. These observations are in agreement with our discussion in Sec. VI. The DIIS zinc complex calculation does not converge, as discussed above.

FIG. 7. The convergence of different algorithms in an LDA/6-31G computation with Hückel start guess for the zinc complex in Fig. 5. The algorithms are DIIS, TRSCF, and TRSCF-LS.

In Fig. 9, we have also included the results from the 

DIIS-TRRH optimizations. These calculations differ from 

the DIIS calculations in that we have used a level-shift parameter 

in the Roothaan–Hall diagonalization step; alternatively, 

DIIS-TRRH may be viewed as different from TRSCF 

in that we have replaced the TRDSM steps by DIIS steps. 

Somewhat surprisingly, only the water calculation converges 

with the DIIS-TRRH method. To understand this behavior, 

we note that, in the global region, the TRRH method typically 

produces gradients that do not change much, even 

though large changes may occur in the energy. In such cases, 

the DIIS method may stall, not being able to identify a good 

combination of density matrices. 
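
The stalling mechanism described above can be illustrated with a minimal sketch of the DIIS coefficient problem. It assumes that the error vectors are the gradients of the stored iterations and that the coefficients minimize the norm of the combined error vector subject to summing to one; the function names and the two-component example vectors are hypothetical and are not taken from the calculations reported here.

```python
import numpy as np

def diis_coefficients(error_vectors):
    """Return c minimizing ||sum_i c_i e_i|| subject to sum_i c_i = 1."""
    e = np.asarray([np.ravel(v) for v in error_vectors], dtype=float)
    n = e.shape[0]
    # B_ij = <e_i, e_j>, bordered by the Lagrange-multiplier row and column.
    a = np.zeros((n + 1, n + 1))
    a[:n, :n] = e @ e.T
    a[:n, n] = 1.0
    a[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    # lstsq tolerates the near-singular B that arises when gradients barely change.
    solution = np.linalg.lstsq(a, rhs, rcond=None)[0]
    return solution[:n]

def overlap_condition_number(error_vectors):
    """Condition number of the error-vector overlap matrix B."""
    e = np.asarray([np.ravel(v) for v in error_vectors], dtype=float)
    return np.linalg.cond(e @ e.T)

# Hypothetical gradients: clearly different versus almost identical.
distinct = [[1.0, 0.0], [0.0, 1.0]]
stalled = [[7.8, 0.0], [7.8, 1e-6]]
print(diis_coefficients(distinct))
print(overlap_condition_number(distinct), overlap_condition_number(stalled))
```

When the stored gradients are nearly parallel and of similar length, the overlap matrix is close to singular, so the gradient norm alone carries almost no information about which combination of density matrices is best.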

This behavior is illustrated in Table VI, where we have 

listed the gradient norm and Kohn–Sham energy of the first 

six iterations of the cadmium complex calculation in Fig. 9. 

The TRSCF and DIIS-TRRH gradients stay almost the same 

during these iterations, stalling the DIIS-TRRH optimization 

but not the TRSCF optimization, whose energy decreases in 

each iteration. In the pure DIIS optimization, by contrast, the 

gradient changes significantly from iteration to iteration; at 

the same time, the energy decreases at each iteration except 

the fifth, where also the gradient norm increases. Eventually, 

DIIS enters the local region with its rapid rate of convergence 

although we note, in the DIIS panel in Fig. 9, a sudden, 

large increase in the energy for the cadmium complex 

FIG. 9. Convergence in LDA calculations for a variety of molecules

using the TRSCF-LS, TRSCF, DIIS, and DIIS-TRRH approaches, respectively.

The molecules are a zinc complex, a rhodium complex,

a cadmium complex, CH₃CHO, and H₂O.

calculation in iterations 10 and 11. However, these

changes are accompanied by large increases in the gradient

norm, allowing DIIS to recover safely.

VIII. CONCLUSIONS 

FIG. 8. Convergence of the DIIS, TRSCF, and TRSCF-LS algorithms in an LDA/STO-3G

calculation with a Hückel start guess for the zinc complex in Fig. 5.

In this paper, the trust-region SCF (TRSCF) algorithm

introduced in Ref. 6 has been further developed to make it

applicable to the optimization of the Kohn–Sham energy. In

the TRSCF method, both the Roothaan–Hall step and the

density-subspace minimization (DSM) step are replaced by

optimizations of local energy models of the Hartree–Fock/Kohn–Sham

energy E_SCF. These local models have the same

gradient as the true energy E_SCF but an approximate Hessian.

By restricting the steps of the TRSCF algorithm to the trust

region of these local models, that is, to the region where the

local models approximate E_SCF well, smooth and fast convergence

to the optimized energy may be obtained.



TABLE VI. The Kohn–Sham energy E_KS and the gradient norm ‖g‖ = ‖4(SDF − FDS)‖ in the first six iterations of the cadmium complex calculations seen in Fig. 9.

Iteration         DIIS              DIIS-TRRH             TRSCF
               E_KS     ‖g‖       E_KS     ‖g‖       E_KS     ‖g‖
1            −5597.0    7.8     −5597.0    7.8     −5597.0    7.8
2            −5502.3   14.9     −5598.4    7.2     −5598.3    7.1
3            −5602.1    9.7     −5600.3    8.5     −5603.7    9.3
4            −5628.5    2.1     −5599.9    7.7     −5611.1    9.1
5            −5627.4    3.5     −5599.9    7.8     −5616.8    7.7
6            −5628.8    0.8     −5600.2    8.1     −5622.7    7.5
             conv               no conv            conv

In the previous implementation of the TRSCF algorithm, 

the focus was on the optimization of the Hartree–Fock energy. 

As the Kohn–Sham energy is nonquadratic in the density 

matrix, the local DSM energy model has been generalized 

and is now expanded about the current density matrix

D_0 in the subspace of the density matrices D_i of the previous

iterations. To satisfy the idempotency condition, the energy

model function is parametrized in terms of a purified averaged

density matrix. The local energy function is correct to

second order in D_i − D_0 and can be set up solely in terms of

the density matrices and Kohn–Sham matrices of the previous 

iterations. In the Hartree–Fock theory, the new local energy 

model is identical to the one previously used in TRSCF 

optimizations. 
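
The role of purification can be illustrated with a short sketch. It assumes that the purification meant here is the McWeeny form (Ref. 15), 3DSD − 2DSDSD, with the density matrix normalized so that an idempotent matrix satisfies DSD = D; the function names and the 2×2 example matrices are hypothetical. An average of two idempotent density matrices is in general not idempotent, and one purification step reduces the idempotency error.

```python
import numpy as np

def mcweeny_purify(d, s):
    """One McWeeny purification step: 3 DSD - 2 DSDSD."""
    dsd = d @ s @ d
    return 3.0 * dsd - 2.0 * dsd @ s @ d

def idempotency_error(d, s):
    """Frobenius norm of DSD - D, which vanishes for an idempotent D."""
    return np.linalg.norm(d @ s @ d - d)

# Hypothetical example in an orthonormal basis (S = 1).
s = np.eye(2)
d1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # projector onto the first basis function
d2 = np.array([[0.5, 0.5], [0.5, 0.5]])   # projector onto (1, 1)/sqrt(2)
d_avg = 0.5 * (d1 + d2)                   # averaged density: not idempotent

print(idempotency_error(d_avg, s))
print(idempotency_error(mcweeny_purify(d_avg, s), s))
```

An already idempotent matrix is left unchanged by the purification, which is why the purified parametrization leaves a converged density matrix untouched.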

The EDIIS function is discussed in the context of the 

proposed model. In the Hartree–Fock theory, the EDIIS function 

is obtained from our proposed energy function by neglecting 

terms that result from the purification of the density 

matrix; the EDIIS function therefore does not reproduce the 

Hartree–Fock gradient at the expansion point. In the DFT, 

the EDIIS function is inappropriate for other reasons as well. 

A rederivation of the original DIIS algorithm is also performed 

to understand when it can safely be applied. In particular, 

it is shown that the DIIS method may be viewed as a 

quasi-Newton method, thus explaining its fast local convergence. 

In the global region, its behavior is less predictable, 

although we note that its gradient-norm minimization mechanism 

usually allows it to recover safely from sudden, large 

increases in the total energy brought on by the Roothaan– 

Hall iterations. 

The TRSCF scheme is tested both in a computationally

demanding, robust line-search implementation (TRSCF-LS),

and in our standard implementation, where only the Fock/ 

Kohn–Sham matrices of previous iterations are used. Our 

test calculations indicate not only that the TRSCF-LS 

method is a highly stable and robust method, but also that the 

standard TRSCF implementation converges rapidly in most 

cases, with little degradation relative to the TRSCF-LS 

scheme. 

Relative to these schemes, the DIIS method is somewhat 

more erratic since it makes no use of Hessian information 

and therefore cannot predict reliably what directions will reduce 

the total energy. For example, in situations where the 

energy changes in the course of the iterations but the gradient 

does not, the DIIS algorithm is unable to identify the density 

matrix with the lowest energy and may diverge. Nevertheless, 

the DIIS method handles most optimizations amazingly

well, which is particularly impressive in view of its

simplicity; never have so few lines of code done so much

good for so many calculations. In general, however, it is

outperformed by the TRSCF method, which introduces Hessian 

information at little extra cost, and is well founded in 

the global as well as local regions of the optimization. 

The current formulation of TRSCF requires a few diagonalizations 

in each TRRH step, and to obtain linear scaling 

these diagonalizations should be avoided. An even more efficient 

algorithm may be obtained if the Roothaan–Hall and 

DSM steps are integrated in such a manner that the information

from the previous density matrices is directly used in

the Roothaan–Hall optimization step. Work along these lines

is in progress. 

ACKNOWLEDGMENTS 

We thank Peter Taylor, Ditte Jørgensen, and Stephan 

Sauer for providing some of the test examples. This work has 

been supported by the Danish Natural Research Council. We 

also acknowledge support from the Danish Center for Scientific

Computing (DCSC).

1 C. C. J. Roothaan, Rev. Mod. Phys. 23, 69 (1951).

2 G. G. Hall, Proc. R. Soc. London A205, 541 (1951).

3 P. Pulay, Chem. Phys. Lett. 73, 393 (1980); J. Comput. Chem. 3, 556 (1982).

4 K. N. Kudin, G. E. Scuseria, and E. Cancès, J. Chem. Phys. 116, 8255 (2002).

5 G. Karlström, Chem. Phys. Lett. 67, 348 (1979).

6 L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Sałek, and T. Helgaker, J. Chem. Phys. 121, 16 (2004).

7 R. Fletcher, Practical Methods of Optimization, 2nd ed. (Wiley, New York, 1987).

8 V. R. Saunders and I. H. Hillier, Int. J. Quantum Chem. 7, 699 (1973).

9 J. B. Francisco, J. M. Martínez, and L. Martínez, J. Chem. Phys. 121, 22 (2004).

10 W. Koch and M. C. Holthausen, A Chemist’s Guide to Density Functional Theory (Wiley-VCH, Weinheim, 2000).

11 T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory (Wiley & Sons, Ltd., Chichester, 2000).

12 R. Seeger and J. A. Pople, J. Chem. Phys. 65, 265 (1976).

13 G. B. Bacskay, Chem. Phys. 61, 385 (1981); J. Phys. (France) 35, 639 (1982).

14 P. Jørgensen, P. Swanstrøm, and D. Yeager, J. Chem. Phys. 78, 347 (1983).

15 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960).

16 X. P. Li, W. Nunes, and D. Vanderbilt, Phys. Rev. B 47, 10891 (1993).

17 J. M. Millam and G. E. Scuseria, J. Chem. Phys. 106, 5569 (1997).

18 C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett. 270, 339 (1997).

19 X. Li, J. M. Millam, G. E. Scuseria, M. J. Frisch, and H. B. Schlegel, J. Chem. Phys. 119, 7651 (2003).

20 T. Helgaker, H. J. Jensen, P. Jørgensen et al., DALTON, a molecular electronic structure program, Release 2.0, 2004; http://www.kjemi.uio.no/software/dalton

21 A. Schäfer, H. Horn, and R. Ahlrichs, J. Chem. Phys. 97, 2571 (1992).

22 T. H. Dunning, J. Chem. Phys. 90, 1007 (1989).


Part 3 

A Coupled Cluster and Full Configuration Interaction Study of CN and CN⁻,

L. Thøgersen and J. Olsen, 

Chem. Phys. Lett. 393, 36 (2004)

Chemical Physics Letters 393 (2004) 36–43 

www.elsevier.com/locate/cplett 

A coupled cluster and full configuration interaction

study of CN and CN⁻

Lea Thøgersen, Jeppe Olsen * 

Department of Chemistry, Theoretical Chemistry, University of Aarhus, DK-8000 Aarhus, Denmark 

Received 30 April 2004; in final form 27 May 2004 

Abstract 

Full configuration interaction (FCI) and coupled cluster (CC) calculations are carried out for the CN radical and CN⁻ using the

cc-pVDZ and an augmented cc-pVDZ basis set. In addition, CC calculations including up to quadruple excitations are carried out

using the cc-pVTZ basis. At the FCI level, the equilibrium distance is 1.1969 Å, the harmonic frequency is 2020.1 cm⁻¹, the

electronic contribution to the atomization energy is 667 kJ/mol and the vertical electron affinity is 0.12962 E_h. The contributions

from quadruple and quintuple excitations to the harmonic frequency are found to be 20 and 5 cm⁻¹, respectively. The quadruple

excitations give a contribution of 4 kJ/mol to the atomization energy and 0.00013 E_h to the vertical electron affinity. None of the

calculations indicate that the convergence of the CC hierarchy is slower for open-shell than for closed-shell systems.

© 2004 Elsevier B.V. All rights reserved.

1. Introduction 

* Corresponding author. Fax: +45-861-961-99. 

E-mail address: jeppe@chem.au.dk (J. Olsen). 

The last decade has witnessed significant improvements 

in the reliability of ab initio quantum chemical 

predictions of spectroscopical and thermochemical data. 

For closed shell molecules, equilibrium geometries [1], 

harmonic frequencies [2] and reaction enthalpies [3,4] 

may often be calculated with an accuracy that is equal to 

or better than the experimental accuracy. Of central

importance for this development has been the development

of hierarchies of basis sets [5] and CC methods

[6–8]. The coupled cluster (CC) method mostly used for 

accurate calculations is the CCSD(T) method [9] which 

augments the CC method including single and double 

excitations (CCSD) [10] with a perturbative estimate of 

triples contributions. For closed shell molecules, the 

CCSD(T) method often exaggerates the contributions 

from triple excitations [11]. As the signs of the triple and 

quadruple corrections usually are identical, CCSD(T) 

often gives results that are better than the CC method 

including all single, double, and triple excitations 

(CCSDT). The CCSD(T) method therefore often provides 

results in surprisingly good agreement with the 

much more expensive CC method including up to quadruple 

excitations (CCSDTQ) [12]. Using triple-ζ basis

sets, the CCSD(T) method is especially accurate for 

properties like internuclear distances and frequencies, as 

the remaining basis-set errors and correlation errors 

here usually are of opposite signs [1]. 

For open-shell molecules, CC methods with and 

without spin-adaptation have been developed [7,13], and 

the accuracy of CC calculations often matches the accuracy 

obtained for closed shell molecules. In a study of 

the atomization energies of 11 small molecules [2], Feller 

and Sordo did not observe any systematic difference

between the accuracies obtained for closed- and open-shell

molecules when the CCSDT method is used. The

performance of methods including perturbative estimates

of triple excitations, such as the CCSD(T) method, is

less convincing for open-shell molecules. In a systematic

study of the performance of the CCSD(T) method for

the calculation of spectroscopical constants for 33 small

radicals [14], it was observed that the CCSD(T) method

did not provide constants that were significantly more

accurate than those obtained with the CCSD method.

Several workers have suggested other methods combining 

CCSD with the perturbative treatment of triple 

doi:10.1016/j.cplett.2004.06.001

excitations, but these alternative corrections do not 

systematically perform better than the CCSD(T) method 

[15]. 

The Schrödinger equation within the Born–Oppenheimer

approximation may be solved in a given one-electron

basis set using full configuration interaction

(FCI) calculations. In an FCI calculation, the wave 

function includes all Slater determinants with correct 

spin, symmetry and number of electrons. For a given

basis set, FCI calculations eliminate the error due to

truncation of the many-electron basis, and therefore

provide important benchmarks for approximate orbital-based

methods. As the number of determinants in

the FCI expansion increases exponentially with the

number of basis functions and electrons, FCI calculations

may only be carried out for small molecules using

basis sets of double- or triple-ζ quality. For small closed

shell molecules, a number of FCI calculations have been 

published [16,17], and these have given additional insight 

into the accuracy of standard correlation methods. 
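
The growth of the FCI expansion can be made concrete with a small counting sketch. It ignores spatial symmetry, so it gives an upper bound, and the orbital count used for CN is an assumption that is not stated in the text: 28 spherical cc-pVDZ functions (14 per atom) minus two frozen core orbitals, with nine correlated electrons distributed as five alpha and four beta.

```python
from math import comb

def n_determinants(n_orbitals, n_alpha, n_beta):
    """Number of Slater determinants with fixed numbers of alpha and beta
    electrons in n_orbitals spatial orbitals (no spatial symmetry used)."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

# Assumed active space for CN/cc-pVDZ: 26 orbitals, 5 alpha and 4 beta electrons.
print(n_determinants(26, 5, 4))

# Growth with the number of orbitals at a fixed number of electrons.
for n_orb in (10, 20, 30, 40):
    print(n_orb, n_determinants(n_orb, 5, 4))
```

Exploiting point-group symmetry reduces these counts by roughly the order of the group, which does not change the steep growth with basis-set size.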

For open-shell molecules, the number of FCI calculations 

is more limited. Except for a recent FCI investigation 

of the geometry of the CCH radical [18], no FCI 

calculations have been published for open-shell molecules 

with eight or more valence electrons using a correlation-consistent 

basis-set [5]. The present study fills

this gap by providing an FCI benchmark for the open-shell

molecule CN using the cc-pVDZ basis [5]. This

molecule is sufficiently small to allow FCI calculations 

at numerous geometries, allowing the determination of 

the FCI results for the equilibrium bond length, harmonic 

frequency, and dissociation energy, as well as the 

complete potential curve. We will furthermore study the 

convergence of the CC energy as a function of the excitation-level 

to see if an open-shell molecule exhibits the 

same convergence pattern as previously determined for 

closed-shell molecules [19–23]. The vertical electron affinity 

will also be examined using CC and FCI calculations. 

As the cc-pVDZ basis does not provide accurate 

geometries or energetics [8], we will obtain the equilibrium 

geometry, harmonic frequency and dissociation 

energy using the cc-pVTZ basis set [5] and CC calculations 

including up to quadruple excitations. We hope 

that the data obtained here will assist in the analysis of 

the accuracy of various open-shell perturbation and CC 

methods, and especially the methods supplementing 

CCSD with perturbative estimates of triple excitations. 

2. Computational methods 

The FCI and CC calculations were carried out using 

the LUCIA 

program [24]. The algorithms for performing 

configuration interaction calculations are based on extensive 

modifications of the algorithms originally published 

in [25]. The CC code allows arbitrary excitation 

levels out from a single closed shell or high-spin open 

shell determinant. In contrast to the initial general CC 

codes [19], the present codes [26] exhibit the same scaling 

as the standard spin–orbital codes using explicitly coded 

contractions. Another set of general CC codes with the 

right scaling has been developed by Kallay and coworkers 

[20,21], and a less efficient general CC code has 

been developed by Hirata and Bartlett [22]. 

All calculations kept the lowest two sigma-orbitals, 

corresponding to 1s(C) and 1s(N), doubly occupied. The 

open-shell configuration interaction and CC calculations 

used orbitals from restricted Hartree–Fock calculations. 

No spin-adaptation was done in the open-shell 

CC calculations. The integrals and HF-orbitals were 

obtained using the DALTON 

program [27]. 

In the following, the different spaces of determinants

or excitations are denoted SD, SDT, SDTQ, SDTQ5,

SDTQ56, and SDTQ567 for the spaces including up to

2-, 3-, 4-, 5-, 6-, and 7-fold excitations from the occupied spin–orbitals.

For open-shell molecules, an alternative way of classifying 

excitations is to consider changes in orbital-occupations 

instead of spin–orbital occupations [28]. All CI 

calculations in the following are based on changes of 

orbital-occupations, whereas we will discuss CC calculations 

based on both divisions of excitations. Excitation 

spaces based on changes of spin–orbital occupations will 

be denoted (spin–orb), whereas the spaces based on 

changes of orbital occupations will be denoted (orb). 

Thus, the CCSD(spin–orb) excitation space contains all 

single and double spin–orbital excitations. 

FCI, CI, and CC calculations were carried out using the

cc-pVDZ basis. To examine the contributions

from quadruple excitations in a larger basis, CCSD,

CCSDT, and CCSDTQ calculations were performed

with the cc-pVTZ basis. For calculations of the electron

affinity, the aug-cc-pVDZ [29] basis set without diffuse

d-functions was used for CN and CN⁻. The latter basis

is in the following called the aug′-cc-pVDZ basis.

3. Results 

3.1. Convergence of CC and CI at the experimental 

equilibrium geometry 

At the experimental equilibrium distance (1.1718 Å)

[30], the FCI wave function and energy were obtained

with an energy convergence threshold of 10⁻⁹ E_h. The

FCI energy was obtained as −92.493262415 E_h. At the

same internuclear distance, single reference CI and CC 

energies were obtained with excitation levels from 2 to 7. 

In Table 1, we give the deviations of the CI, CC(orb) 

and CC(spin–orb) energies from the FCI energy. Fig. 1 

is a single-logarithmic plot of these deviations. 

The coupled-cluster energies using orbital-occupations 

to define the excitation level are slightly below the

Table 1
Deviations of single-reference CI and CC energies (E_h) from the FCI energy for CN

Largest exc. level   E_CI − E_FCI   E_CC(orb) − E_FCI   E_CC(spin–orb) − E_FCI
2                    0.038240       0.015534            0.016517
3                    0.022604       0.001563            0.001637
4                    0.002391       0.000207            0.000230
5                    0.000583       0.000019            0.000021
6                    0.000031       0.000001            0.000002
7                    0.000002       –                   –

[Figure 1: logarithmic axis, deviation from FCI energy; horizontal axis, excitation level; curves for coupled cluster (spin–orb), coupled cluster (orb), and configuration interaction.]

Fig. 1. The deviations (E_h) of CI and CC energies from the FCI energy as a function of excitation level for CN using the cc-pVDZ basis set.

energies using the smaller spaces based on spin–orbital 

occupations. However, the differences between the two 

choices are not significant compared to the deviations 

from the FCI energy. Up to CCSDTQ5, the differences 

between the two forms constitute at most 10% of the 

deviation from the FCI energy. For the CCSDTQ56 

expansions, the large difference between the two deviations 

in Table 1 is caused by roundoff errors. Including 

an additional digit, the CCSDTQ56 deviations are 

0.0000015 and 0.0000013 E h for the spin–orbital and 

orbital based divisions, respectively. 

The CI-curves exhibit the behavior predicted by 

perturbation theory [31]: the even-order excitations give 

significantly larger reductions in the deviations than the 

odd-order excitations. For CC expansions, perturbation

theory also predicts that adding even-order excitations

gives larger reductions in the deviations than adding odd-order

excitations [8,31]. This is not observed in Fig. 1, as

the deviations of the CC energies nearly form straight 

lines. Comparing the convergence of the CI and CC 

hierarchies, it is observed that the CCSDT deviation is 

slightly smaller than the CISDTQ error, and that the CC 

energy obtained using up to n-fold excitations is as accurate

as the CI energy using up to (n + 1)-fold excitations,

but less accurate than the CI energy using up to

(n + 2)-fold excitations. To obtain an accuracy of 1 mE_h

or less, one must include up to quadruple excitations for 

the CC expansion, and up to quintuple excitations for 

the CI expansion. 
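
The convergence statements above can be checked directly against the entries of Table 1. The following sketch uses only the tabulated CI and CC(orb) deviations; the helper names are illustrative.

```python
# Deviations from the FCI energy (E_h), copied from Table 1.
ci = [0.038240, 0.022604, 0.002391, 0.000583, 0.000031, 0.000002]  # levels 2-7
cc_orb = [0.015534, 0.001563, 0.000207, 0.000019, 0.000001]        # levels 2-6

def reduction_factors(devs):
    """Factor by which the deviation shrinks when one more level is added."""
    return [devs[i] / devs[i + 1] for i in range(len(devs) - 1)]

def lowest_level_within(devs, threshold, first_level=2):
    """Lowest excitation level whose deviation is at or below the threshold."""
    for offset, dev in enumerate(devs):
        if dev <= threshold:
            return first_level + offset
    return None

# Entry i of each list is the gain from adding excitation level 3 + i.
print("CI reduction factors:", [round(f, 1) for f in reduction_factors(ci)])
print("CC reduction factors:", [round(f, 1) for f in reduction_factors(cc_orb)])
print("CI level for 1 mE_h:", lowest_level_within(ci, 1.0e-3))
print("CC level for 1 mE_h:", lowest_level_within(cc_orb, 1.0e-3))
```

For CI, the factors gained by adding quadruples and sextuples exceed those gained by adding triples and quintuples, whereas the CC factors are of similar size at every level. The last CC factor is limited by the six-decimal rounding of the table.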

The convergence patterns for CI and CC discussed

above are very similar to the convergence patterns previously

reported for N₂ [23]. The similarity between the

convergences of CN and N₂ is more than qualitative. If

one combines the deviation curve for N₂ [23] with the

present deviation curve for CN in a single figure, the two

deviation curves are virtually identical. The deviations

of the CCSDT energies are thus 0.00156 E_h and 0.00163

E_h for CN and N₂, respectively, and for a given excitation

level the deviations for CN and N₂ differ by at

most 10%.


From the above comparisons, it may be concluded

that the open-shell nature of CN does not lead to slower

convergence of the CC hierarchy than previously observed

for N₂. However, it should be noted that the

convergence of the CC hierarchy for N₂ is rather slow

compared to the convergence observed for, e.g., H₂O [19]

and F₂.

3.2. The potential curve for CN 

FCI calculations were carried out at a number of

internuclear distances. To obtain accurate spectroscopic

constants, the CC energies were converged to 10⁻⁹ E_h

for internuclear distances close to the experimental value.

For the remaining geometries the energy was converged

to 10⁻⁶ E_h. The obtained FCI energies are listed

in Table 2. The graph of the FCI potential curve is given

in Fig. 2.

Table 2
FCI energies (E_h) as a function of internuclear distance (Å) for CN using the cc-pVDZ basis

R        E                 R         E
0.9      −92.169732        1.2118    −92.494065103
1.0      −92.384032        1.2169    −92.493765608
1.0918   −92.469313943     1.2369    −92.491837833
1.1318   −92.485677652     1.2518    −92.489691918
1.1518   −92.490432414     1.30      −92.479361
1.1569   −92.491327096     1.40      −92.447147
1.1718   −92.493262415     1.50      −92.408657
1.1769   −92.493704946     1.60      −92.370048
1.1869   −92.494267979     1.7577    −92.316388
1.1918   −92.494402963     2.05065   −92.255688
1.1919   −92.494404785     2.3436    −92.241450
1.1969   −92.494449358     2.9295    −92.240346
1.2019   −92.494404774     3.5154    −92.239697
1.2069   −92.494274026

To associate the various internuclear distances with a 

degree of bond-breaking it is useful to examine the coefficient 

of the Hartree–Fock determinant in the FCI 

wave-function. Around the equilibrium geometry, the 

weight of the HF-determinant is about 0.92. Increasing

the internuclear distance leads to a steady lowering of

this weight and at 1.3 and 1.8 Å, the weights are 0.79

and 0.57, respectively. From 1.8 to 2.0 Å the weight

drops sharply so the weight at 2.0 Å is 0.25 and at 2.5 Å

less than 0.04. We may therefore say that the bond is

half broken at 1.8 Å and broken at 2.5 Å.

In addition to FCI calculations, CCSD(orb), 

CCSDT(orb) and CCSDTQ(orb) calculations were 

performed at the various internuclear distances up to 1.8

Å. Although it is possible to converge the CC equations

for larger distances, we find this of less interest, due to 

the breakdown of the single-reference approximation. In 

Fig. 3, we plot the deviations of the CCSDT and 

CCSDTQ energies from the FCI energy, and in Table 3, 

we list the non-parallelity error (NPE), i.e., the difference 

between the largest and smallest deviation from the 

FCI energy. 
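
The definition of the non-parallelity error translates into a few lines of code. In the sketch below the FCI energies are taken from Table 2, while the approximate-method energies are hypothetical numbers chosen only to exercise the function; the CC energies at the individual distances are not tabulated here.

```python
def non_parallelity_error(method_energies, fci_energies):
    """Largest minus smallest deviation from FCI over a set of geometries."""
    deviations = [e - f for e, f in zip(method_energies, fci_energies)]
    return max(deviations) - min(deviations)

# FCI energies (E_h) at R = 1.1718, 1.30 and 1.50 angstrom, from Table 2.
fci = [-92.493262415, -92.479361, -92.408657]

# Hypothetical energies of an approximate method at the same distances.
approx = [fci[0] + 0.0016, fci[1] + 0.0025, fci[2] + 0.0040]

print(round(non_parallelity_error(approx, fci), 6))
```

A method whose deviation from FCI is constant along the curve has an NPE of zero, even if the deviation itself is large.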

At the equilibrium distance, both deviation curves in 

Fig. 3 have a positive curvature. For internuclear distances 

larger than the equilibrium distance, both the 

CCSDT and CCSDTQ deviation curves are nearly 

Fig. 2. The FCI potential curve for CN using the cc-pVDZ basis. The energies are in Hartrees and the internuclear distances are in Å.

Fig. 3. The difference between the CCSDT and CCSDTQ energies and the FCI energy for CN using the cc-pVDZ basis. The energies are in Hartrees and the internuclear distances are in Å.

Table 3
Non-parallelity error (NPE) (E_h) for CCSD, CCSDT, and CCSDTQ

Method     NPE
CCSD       0.042326
CCSDT      0.006355
CCSDTQ     0.001742

linear functions of the internuclear distance. Actually, 

the slope of the CCSDT deviation is smaller for larger 

internuclear distances than for the equilibrium distance. 

The analogous CCSDT- and CCSDTQ-curves for the

nitrogen molecule exhibit maxima for an internuclear

distance around 1.5 Å (3 au) [23].

3.3. Spectroscopical constants for CN 

Equilibrium geometries and harmonic frequencies 

were obtained for the CCSD, CCSDT, CCSDTQ and 

FCI methods using quartic interpolation of the energies. 

The harmonic frequency for a given method was evaluated 

at the equilibrium geometry of this method. In 

Table 4 we list the obtained equilibrium distances and 

frequencies. In addition, the table contains the CCSD, 

CCSDT and CCSDTQ results for the cc-pVTZ basis. 
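
As an illustration of how such constants follow from tabulated energies, the sketch below passes a quartic polynomial through five equally spaced FCI/cc-pVDZ energies from Table 2 and evaluates the harmonic frequency at the resulting minimum. The choice of these five points, the use of the ¹²C and ¹⁴N isotopic masses, and the values of the physical constants are assumptions of this sketch rather than details given in the text, so only approximate agreement with the quoted FCI values (about 1.1969 Å and 2020 cm⁻¹) should be expected.

```python
import numpy as np

# FCI/cc-pVDZ energies (E_h) from Table 2 at five equally spaced distances (angstrom).
r = np.array([1.1869, 1.1919, 1.1969, 1.2019, 1.2069])
e = np.array([-92.494267979, -92.494404785, -92.494449358,
              -92.494404774, -92.494274026])

HARTREE_J = 4.3597447222e-18   # joule per hartree
ANGSTROM_M = 1.0e-10           # metre per angstrom
AMU_KG = 1.66053906660e-27     # kilogram per atomic mass unit
C_CM_S = 2.99792458e10         # speed of light in cm/s
M_C, M_N = 12.0, 14.0030740    # assumed isotopes: 12C and 14N (amu)

def quartic_constants(r, e):
    """Return (R_eq in angstrom, omega_e in cm^-1) from a quartic through the points."""
    r0, h = r[len(r) // 2], r[1] - r[0]
    t = (r - r0) / h                          # dimensionless grid, here about -2..2
    poly = np.poly1d(np.polyfit(t, e - e.min(), 4))
    roots = poly.deriv().roots
    real = roots[np.abs(roots.imag) < 1e-8].real
    t_min = real[np.argmin(np.abs(real))]     # stationary point nearest the grid centre
    k = poly.deriv(2)(t_min) / h**2           # force constant in E_h / angstrom^2
    k_si = k * HARTREE_J / ANGSTROM_M**2      # force constant in N/m
    mu = M_C * M_N / (M_C + M_N) * AMU_KG     # reduced mass in kg
    omega = np.sqrt(k_si / mu) / (2.0 * np.pi * C_CM_S)
    return r0 + t_min * h, omega

r_e, omega_e = quartic_constants(r, e)
print(f"R_eq = {r_e:.4f} angstrom, omega_e = {omega_e:.1f} cm^-1")
```

Evaluating the second derivative at the minimum of the fitted curve, rather than at a fixed reference distance, mirrors the statement that each frequency is obtained at the equilibrium geometry of its own method.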

We will first discuss the results obtained using the cc-pVDZ

basis. The CC calculations using orbital-based

excitation spaces are slightly more accurate than those 

using spin–orbital-based excitation spaces, but the differences 

are small compared to the size of the deviations. 

We will therefore discuss only the spin–orbital based

Table 4
Equilibrium distance R_eq (Å) and harmonic frequency ω_e (cm⁻¹) for CN

Method               Basis      R_eq     ω_e
CCSD(orb)            cc-pVDZ    1.1860   2111
CCSDT(orb)           cc-pVDZ    1.1946   2043
CCSDTQ(orb)          cc-pVDZ    1.1964   2025
CCSD(spin–orb)       cc-pVDZ    1.1855   2114
CCSDT(spin–orb)      cc-pVDZ    1.1944   2046
CCSDTQ(spin–orb)     cc-pVDZ    1.1964   2026
FCI                  cc-pVDZ    1.1969   2020.1
CCSD(spin–orb)       cc-pVTZ    1.1688   2136
CCSDT(spin–orb)      cc-pVTZ    1.1783   2067
CCSDTQ(spin–orb)     cc-pVTZ    1.1804   2045
Expt.                           1.1718   2069

excitation spaces. Since the deviation curves for the CC 

energies are increasing functions, the CC equilibrium 

distances are necessarily shorter than the FCI equilibrium 

distance. The causes of the errors of the harmonic 

frequencies will be discussed in detail below. At the

CCSD level, the distance is 0.01 Å shorter than the FCI

value and the harmonic frequency is about 90 cm⁻¹

larger than the FCI value, stressing the inaccuracy of

this method for predicting equilibrium properties. The

errors are significantly reduced by the CCSDT method

with errors of 0.0025 Å and 26 cm⁻¹ for the equilibrium

distance and frequency, respectively. The errors are

further reduced by about a factor of five by using the

CCSDTQ instead of the CCSDT method. At the

CCSDTQ level, the equilibrium distance is only 0.0005

Å shorter than the FCI value, but the frequency is 6

cm⁻¹ too large. The deviations of the various CC

methods obtained here for CN are very similar to the

previously obtained deviations for N₂. Thus, it has

previously been reported that the contribution from

connected quadruple excitations to the harmonic frequency

for this molecule [23,32] is 20 cm⁻¹.

It is currently not feasible to obtain FCI energies for 

CN in the cc-pVTZ basis with an accuracy that is 

sufficient to obtain the frequency with an accuracy of

1 cm⁻¹ or less. One can instead estimate the convergence

by examining the changes in the constants through the

CC hierarchy. It is seen from Table 4 that the changes

between the CCSDT and CCSDTQ results are very

similar in the cc-pVDZ and cc-pVTZ basis sets. In both

basis sets, the quadruple excitations increase the distance

by 0.0020 Å and reduce the harmonic frequency

by about 20 cm⁻¹. This suggests that it may be feasible

to obtain the quadruple corrections to these constants in 

rather small basis sets. It should be noted that although 

the quadruple corrections to the properties are rather 

constant, the quadruples corrections to the raw energies 

are very different in the two basis sets. 

The errors of the harmonic frequencies arise from 

two sources. First of all, the positive curvatures of the 

CC deviation curves around the equilibrium geometries 

lead to CC frequencies that are larger than the FCI 

frequency. Furthermore, as the third derivative of the 

energies with respect to the distance in general is large 

and negative, the somewhat shorter internuclear distances 

obtained with the CC methods than with FCI 

lead also to frequencies that are too large. These two 

sources of errors may be analyzed in the cc-pVDZ basis 

by evaluating the CC frequencies at the FCI equilibrium 

geometry. For the orbital-based methods one then obtains

the frequencies 2035, 2027 and 2022 cm⁻¹ for the

CCSD, CCSDT and CCSDTQ methods, respectively. Whereas the

CCSDT frequency evaluated at the optimized CCSDT

distance deviates from the FCI frequency by 23 cm⁻¹,

the CCSDT frequency evaluated at the FCI geometry

thus deviates by only 7 cm⁻¹. Although the errors connected

with the positive curvatures of the deviation 

curves are not vanishing, the major errors of the frequencies 

seem to arise from the errors of the equilibrium 

distances. 

The experimental values for the equilibrium distance

and the harmonic frequency are 1.1718 Å and 2069

cm⁻¹, respectively [30]. Comparing the results obtained

using the cc-pVTZ basis to the experimental values, it is

observed that the CCSDT results are in better agreement

with experiment than the CCSDTQ results. A

better estimate of the importance of the quadruples

corrections may be obtained using CCSDT results for

large basis sets. Feller and Sordo [2] have calculated the

CCSDT spectroscopic constants for CN using the aug-cc-pVQZ

basis and obtained an equilibrium distance of

1.1739 Å and a harmonic frequency of 2082 cm⁻¹.

Adding our quadruples correction to these CCSDT results

gives an equilibrium distance of 1.1759 Å and a

harmonic frequency of 2060 cm⁻¹. To obtain spectroscopic

constants that are significantly more accurate

than the CCSDT results, other corrections, most importantly

core-correlation contributions, must be included

together with the quadruple excitations.
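
The additive estimate can be reproduced from the entries of Table 4 and the CCSDT/aug-cc-pVQZ values quoted above. The sketch below evaluates the quadruples correction separately in the two basis sets; the dictionary layout and function names are illustrative only.

```python
# Spin-orbital CC results from Table 4: (R_eq in angstrom, omega_e in cm^-1).
table4 = {
    ("CCSDT", "cc-pVDZ"): (1.1944, 2046),
    ("CCSDTQ", "cc-pVDZ"): (1.1964, 2026),
    ("CCSDT", "cc-pVTZ"): (1.1783, 2067),
    ("CCSDTQ", "cc-pVTZ"): (1.1804, 2045),
}
ccsdt_large_basis = (1.1739, 2082)  # CCSDT/aug-cc-pVQZ values of Feller and Sordo [2]

def quadruples_correction(basis):
    """CCSDTQ minus CCSDT for the distance and the frequency in one basis."""
    r_t, w_t = table4[("CCSDT", basis)]
    r_q, w_q = table4[("CCSDTQ", basis)]
    return round(r_q - r_t, 4), w_q - w_t

def corrected(basis):
    """Large-basis CCSDT constants plus the quadruples correction from `basis`."""
    dr, dw = quadruples_correction(basis)
    return round(ccsdt_large_basis[0] + dr, 4), ccsdt_large_basis[1] + dw

for basis in ("cc-pVDZ", "cc-pVTZ"):
    print(basis, quadruples_correction(basis), corrected(basis))
```

The two basis sets give corrections that agree to within 0.0001 Å and 2 cm⁻¹, and the estimates quoted in the text lie within the spread of the two corrected results.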

3.4. Atomization energy 

It has previously been reported that quadruple and 

even quintuple excitations may be important to obtain 

atomization energies with high accuracy [3,4,12]. In

Table 5, we list the atomization energies using the 

CCSD, CCSDT, CCSDTQ, and FCI approaches with 

the cc-pVDZ basis and the CCSD, CCSDT, and 

CCSDTQ approaches with the cc-pVTZ basis set. All 

molecular calculations were carried out at the experimental 

equilibrium distance. 

It is again noticed that there is no significant difference

between the results obtained using the CC(orb) 

and CC(spin–orb) approaches. The two approaches 

differ by only 0.1 kJ/mol at the CCSDT and CCSDTQ 

levels. 

The quadruple excitations change the atomization 

energy by 4 kJ/mol with both the cc-pVDZ and the cc-pVTZ

basis sets. These results are in agreement with

previous calculations of the contributions from connected 

quadruple excitations [4]. From the difference 

between CCSDTQ and the FCI atomization energy, it is 

seen that the quintuple excitations contribute 0.5 kJ/mol 

to the atomization energy. The above contributions from

quadruple and quintuple excitations are very similar to

the results previously reported for N₂ [3]. The contribution

from higher excitations to the atomization energy 

of CN has previously been studied by Feller and Sordo 

[2]. They obtained a significantly smaller contribution 

from quadruple excitations, 0.3 kcal/mol or 1.2 kJ/mol. 

There are several experimental measurements of the

atomization energies, and Feller and Sordo [2] quote

values in the range 745–762 kJ/mol for the experimental

electronic contribution. Adding our estimate of the quadruples

correction to the estimated CCSDT limit of 748 kJ/mol

of Feller and Sordo gives a value of 752 kJ/mol

for the electronic atomization energy of CN.

Table 5

The electronic contribution to the dissociation energy (kJ/mol) for CN

Method              Basis      D_e
CCSD(orb)           cc-pVDZ    631.6
CCSDT(orb)          cc-pVDZ    663.0
CCSDTQ(orb)         cc-pVDZ    666.5
CCSD(spin–orb)      cc-pVDZ    629.2
CCSDT(spin–orb)     cc-pVDZ    662.9
CCSDTQ(spin–orb)    cc-pVDZ    666.4
FCI                 cc-pVDZ    667.0
CCSD(spin–orb)      cc-pVTZ    674.2
CCSDT(spin–orb)     cc-pVTZ    714.4
CCSDTQ(spin–orb)    cc-pVTZ    718.5
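The excitation-level increments quoted in this section follow directly from the entries of Table 5. The short Python sketch below redoes that arithmetic; the dictionary layout and the helper name `increment` are illustrative choices and not part of the original work, and the value of 748 kJ/mol is the CCSDT limit of Feller and Sordo quoted in the text.

```python
# Electronic atomization energies of CN (kJ/mol), copied from Table 5.
# Keys are (method, excitation restriction, basis); this layout is illustrative.
TABLE_5 = {
    ("CCSDT", "orb", "cc-pVDZ"): 663.0,
    ("CCSDTQ", "orb", "cc-pVDZ"): 666.5,
    ("CCSDT", "spin-orb", "cc-pVDZ"): 662.9,
    ("CCSDTQ", "spin-orb", "cc-pVDZ"): 666.4,
    ("FCI", "none", "cc-pVDZ"): 667.0,
    ("CCSDT", "spin-orb", "cc-pVTZ"): 714.4,
    ("CCSDTQ", "spin-orb", "cc-pVTZ"): 718.5,
}


def increment(upper, lower):
    """Difference between two Table 5 entries, rounded to the table precision."""
    return round(TABLE_5[upper] - TABLE_5[lower], 1)


# Quadruples contribution in the two basis sets (spin-orbital variants).
quad_dz = increment(("CCSDTQ", "spin-orb", "cc-pVDZ"), ("CCSDT", "spin-orb", "cc-pVDZ"))
quad_tz = increment(("CCSDTQ", "spin-orb", "cc-pVTZ"), ("CCSDT", "spin-orb", "cc-pVTZ"))

# Contribution beyond quadruples, taken as FCI minus CCSDTQ(orb) in cc-pVDZ.
quint_dz = increment(("FCI", "none", "cc-pVDZ"), ("CCSDTQ", "orb", "cc-pVDZ"))

# CCSDT basis-set limit quoted from Feller and Sordo, plus the cc-pVTZ quadruples term.
CCSDT_LIMIT = 748.0
estimate = round(CCSDT_LIMIT + quad_tz, 1)

print(f"quadruples: {quad_dz} (cc-pVDZ), {quad_tz} (cc-pVTZ) kJ/mol")
print(f"beyond quadruples (cc-pVDZ): {quint_dz} kJ/mol")
print(f"estimated electronic atomization energy: {estimate} kJ/mol")
```

The near equality of the two quadruples increments is what justifies transferring a small-basis correction to the large-basis CCSDT limit.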

3.5. The vertical electron affinity 

An FCI calculation for the CN anion using the aug′-cc-pVDZ

basis was carried out at the experimental

equilibrium geometry of the radical. The FCI calculation 

contains about 20 billion Slater determinants and 

sparsity of the CI vectors was used only to reduce disc storage,

not computation time. This FCI calculation 

represents one of the largest FCI calculations we hitherto 

have carried out. The FCI energy for the anion was 

obtained as −92.627391(2) E_h. Combining this energy

with the FCI energy of −92.497766 E_h for the radical in

the same basis set leads to an FCI value of 0.12962 E_h

for the vertical electron affinity. CC expansions using 

spin–orbital occupations for restrictions of excitations 

were also carried out for the radical and the anion in the 

aug′-cc-pVDZ basis and the resulting electron affinities

are given in Table 6. 

As the differences between the CC calculations using 

orbital and spin–orbital restrictions already have been 

shown to be small, no orbital-restricted calculations 

were carried out. Already at the CCSD level, the calculated 

electron affinity differs from the FCI affinity by 

less than 1 mE_h, and at the CCSDT level the calculated

electron affinity differs from the FCI result by only about

0.1 mE_h. The deviations of the CC energies from the

FCI energies for the radical and the anion are also listed 

in Table 6. It is seen that the high accuracy of the CC 

affinities is caused by cancellation of the errors of the 

radical and anion – the deviation of the affinity is 

roughly an order of magnitude smaller than the deviation 

of the individual energies. It is also interesting to see 

that the electron affinity converges from above – the CC 

affinities are larger than the FCI affinity. As seen from 

the other columns of Table 6, the CC expansion converges 

slightly faster for the anion than for the radical. 

The faster convergence of the anion may seem surprising 

as the anion contains one more electron than the radical 

but is probably caused by CN being slightly more 

multiconfigurational than the anion. The electron affinity 

of CN calculated using CC calculations in large 

basis sets has been the subject of several recent studies 

[33,34]. These studies also found small contributions to 

the electron affinity from triple excitations. 
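The numbers in this section can be cross-checked with a few lines of arithmetic. The sketch below recomputes the FCI vertical electron affinity from the two total energies quoted above and verifies that the error of each CC affinity equals the difference between the radical and anion deviations listed in Table 6. The variable names are illustrative, and the hartree-to-eV factor is a rounded CODATA value added here for convenience.

```python
HARTREE_TO_EV = 27.211386  # rounded CODATA conversion factor (assumption of this sketch)

# FCI total energies (E_h) in the aug'-cc-pVDZ basis, as quoted in the text.
E_FCI_RADICAL = -92.497766
E_FCI_ANION = -92.627391

# Vertical electron affinity: energy of the radical minus energy of the anion.
ea_fci = E_FCI_RADICAL - E_FCI_ANION

# Table 6 (E_h): deviation from FCI for the radical, for the anion,
# and the listed error of the affinity, EA - EA(FCI).
TABLE_6 = {
    "CCSD": (0.01529, 0.01466, 0.00063),
    "CCSDT": (0.00154, 0.00140, 0.00014),
    "CCSDTQ": (0.00020, 0.00016, 0.00003),
}


def ea_error(dev_radical, dev_anion):
    """Error of the CC affinity implied by the two total-energy deviations."""
    return dev_radical - dev_anion


print(f"EA(FCI) = {ea_fci:.6f} E_h = {ea_fci * HARTREE_TO_EV:.3f} eV")
for method, (dev_radical, dev_anion, listed) in TABLE_6.items():
    implied = ea_error(dev_radical, dev_anion)
    print(f"{method:7s} implied {implied:.5f}  listed {listed:.5f}  "
          f"cancellation factor {dev_radical / implied:.0f}")
```

The cancellation factor printed in the last column is the ratio of the radical deviation to the affinity error; it makes the order-of-magnitude error cancellation described in the text explicit.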

4. Conclusion 

Full configuration interaction calculations using the 

cc-pVDZ basis and CC calculations using the cc-pVDZ 

and cc-pVTZ basis sets have been carried out for the CN 

radical at various geometries. Single reference configuration 

interaction calculations were also carried out 

using the cc-pVDZ basis at the experimental internuclear 

distance. At the CCSDT level, the energies differ 

from the FCI energy by 1.5 mE_h, and at the CCSDTQ

level, the energies are 0.2 mE_h from the FCI energy. The

CC energies converge toward the FCI energy in an approximately

linear fashion with a decrease in the deviation

by about a factor of 10 for each added excitation

level. This is in contrast to an analysis based on perturbation

theory, which predicts that adding even orders

gives larger decreases in the deviations than adding odd

orders. The observed convergence for CN in the cc-pVDZ

basis is very similar to the convergence previously

reported for N₂, indicating that the open-shell nature of

CN does not affect the convergence. A comparison of

the FCI and CC energies at various internuclear distances

reveals that the deviations of the CC approaches

do not occur suddenly for large internuclear distances. 

The deviations are instead nearly linear functions of the 

internuclear distance. 

At the FCI level, the equilibrium geometry and harmonic 

frequency are obtained as 1.1969 Å and 2020.1

cm⁻¹, respectively. The CCSDT and CCSDTQ frequencies

are 25 and 5 cm⁻¹ above the FCI value, respectively.

The quadruple corrections to both the 

equilibrium distance and the harmonic frequency were 

found to be nearly identical in the cc-pVDZ and cc-pVTZ

basis sets. The major errors of the CC frequencies 

come from the errors of the distances where these are 

evaluated. 

For the electronic contribution to the atomization 

energy, a value of 667.0 kJ/mol is obtained at the FCI 

level using the cc-pVDZ basis set. The CCSDT and 

CCSDTQ atomization energies are 4 and 0.5 kJ/mol 

below the FCI atomization energy, respectively. The 

quadruple contributions in the cc-pVDZ and cc-pVTZ

basis sets are determined as 3.5 and 4.1 kJ/mol, respectively,

indicating that a reliable estimate of quadruple contributions

may be obtained using rather small basis sets.

Table 6

The vertical electron affinity (E_h) of CN calculated in the aug′-cc-pVDZ basis

Method              EA        EA − EA(FCI)   E(CN) − E_FCI(CN)   E(CN⁻) − E_FCI(CN⁻)
CCSD(spin–orb)      0.13025   0.00063        0.01529             0.01466
CCSDT(spin–orb)     0.12977   0.00014        0.00154             0.00140
CCSDTQ(spin–orb)    0.12966   0.00003        0.00020             0.00016
FCI                 0.12962

The FCI vertical electron affinity is obtained in the 

aug′-cc-pVDZ basis as 0.12962 E_h. Due to extensive

cancellations of errors, the FCI affinity is accurately

reproduced at the CCSD and CCSDT levels with a contribution

from quadruple and higher excitations of

0.00014 E_h. The CC hierarchy approaches the FCI affinity

from above, as the deviations for the anion are 

slightly smaller than for the radical. 

Acknowledgements 

The work has been supported by the Danish Research 

Council (Grant No. 9901973). The calculations 

were carried out at the centre for supercomputing at 

University of Aarhus (CSCAA). The support from the 

Danish Centre for Supercomputing (DCSC) is gratefully 

acknowledged. 

References 

[1] F. Pawlowski, P. Jørgensen, J. Olsen, F. Hegelund, T. Helgaker, 

J. Gauss, K.L. Bak, J.F. Stanton, J. Chem. Phys. 116 (2002) 6482. 

[2] D. Feller, J.A. Sordo, J. Chem. Phys. 113 (2000) 485. 

[3] T. Helgaker, W. Klopper, A. Halkier, K.L. Bak, P. Jørgensen, 

J. Olsen, in: J. Cioslowski, (Ed.), Understanding Chemical 

Reactivity, vol. 22, Kluwer, Dordrecht, p. 1, 2001. 

[4] A.D. Boese, M. Oren, O. Atasoylu, J.M.L. Martin, M. Kallay, 

J. Gauss, J. Chem. Phys. 120 (2004) 4129. 

[5] T.H. Dunning Jr., J. Chem. Phys. 90 (1989) 1007. 

[6] R.J. Bartlett, in: D.R. Yarkony (Ed.), Modern Electronic Structure 

Theory, Part I, 1047, World Scientific, Singapore, 1995. 

[7] J. Paldus, X. Li, Adv. Chem. Phys. 110 (1999) 1. 

[8] T. Helgaker, P. Jørgensen, J. Olsen, Molecular Electronic-Structure 

Theory, Wiley, 2000. 

[9] K. Raghavachari, G.W. Trucks, J.A. Pople, M. Head-Gordon, 

Chem. Phys. Lett. 157 (1989) 479. 

[10] G.D. Purvis, R.J. Bartlett, J. Chem. Phys. 76 (1982) 1910. 

[11] K.L. Bak, P. Jørgensen, J. Olsen, T. Helgaker, W. Klopper,

J. Chem. Phys. 112 (2000) 9229. 

[12] T.A. Ruden, T.U. Helgaker, P. Jørgensen, J. Olsen, Chem. Phys. 

Lett. 371 (2003) 62. 

[13] P.G. Szalay, J. Gauss, J. Chem. Phys. 107 (1997) 9028. 

[14] E.F.C. Byrd, D. Sherrill, M. Head-Gordon, J. Phys. Chem. A. 105 

(2001) 9736. 

[15] S.R. Gwaltney, M. Head-Gordon, J. Chem. Phys. 115 (2001) 

2014. 

[16] J. Olsen, P. Jørgensen, H. Koch, A. Balkova, R.J. Bartlett, 

J. Chem. Phys. 104 (1996) 8007. 

[17] H. Larsen, J. Olsen, P. Jørgensen, O. Christiansen, J. Chem. Phys. 

113 (2000) 6677. 

[18] P.G. Szalay, L.S. Thøgersen, J. Olsen, M. Kallay, J. Gauss,

J. Phys. Chem. A 108 (2004) 3030.

[19] J. Olsen, J. Chem. Phys. 113 (2000) 7140. 

[20] M. Kallay, P.R. Surjan, J. Chem. Phys. 113 (2000) 1359. 

[21] M. Kallay, P.R. Surjan, J. Chem. Phys. 115 (2001) 2945. 

[22] S. Hirata, R.J. Bartlett, Chem. Phys. Lett. 321 (2000) 216. 

[23] J.W. Krogh, J. Olsen, Chem. Phys. Lett. 344 (2001) 578. 

[24] LUCIA, a general CI and CC code written by J. Olsen, University 

of Aarhus with contributions from H. Larsen, M. Fülscher.

[25] J. Olsen, B.O. Roos, P. Jørgensen, H.J.Aa. Jensen, J. Chem. Phys. 

89 (1988) 2185. 

[26] J. Olsen, unpublished. 

[27] T. Helgaker et al., DALTON, an ab initio electronic structure

program, Release 1.2. see http://www.kjemi.uio.no/software/dalton/dalton.html, 

2001. 

[28] X. Li, J. Paldus, J. Chem. Phys. 101 (1994) 8812. 

[29] R.A. Kendall, T.H. Dunning, R.J. Harrison, J. Chem. Phys. 96 

(1992) 6796. 

[30] K.P. Huber, G. Herzberg, Molecular Spectra and Molecular 

Structure V. Constants of Diatomic Molecules, Van Nostrand 

Reinhold, New York, 1979. 

[31] W. Kutzelnigg, Theoret. Chim. Acta. 80 (1991) 349. 

[32] S.A. Kucharski, J.D. Watts, R.J. Bartlett, Chem. Phys. Lett. 302 

(1999) 295. 

[33] P. Neogrady, M. Medved, I. Cernusak, M. Urban, Mol. Phys. 100 

(2002) 541. 

[34] J.A. Sordo, J. Chem. Phys. 114 (2001) 1974.

Part 3 

Equilibrium Geometry of the Ethynyl (CCH) Radical, 

P. G. Szalay, L. Thøgersen, J. Olsen, M. Kállay and J. Gauss, 

J. Phys. Chem. A 108, 3030 (2004).

3030 J. Phys. Chem. A 2004, 108, 3030-3034 

Equilibrium Geometry of the Ethynyl (CCH) Radical † 

Péter G. Szalay, ‡ Lea S. Thøgersen, § Jeppe Olsen, § Mihály Kállay, | and Jürgen Gauss*,|

Department of Theoretical Chemistry, Eötvös Loránd University, H-1518 Budapest, P.O. Box 32, Hungary,

Department of Chemistry, Aarhus University, DK-8000 Aarhus C, Denmark, and Institut für Physikalische

Chemie, Universität Mainz, D-55099 Mainz, Germany

Received: September 27, 2003; In Final Form: January 15, 2004

The equilibrium geometry of the ethynyl (CCH) radical has been obtained using the results of high-level 

quantum chemical calculations and the available experimental data. In a purely quantum chemical approach, 

the best theoretical estimates (1.208 Å for r_CC and 1.061-1.063 Å for r_CH) have been obtained from CCSD(T),

CCSDT, MR-AQCC, and full CI calculations with basis sets up to core-polarized pentuple-zeta quality.

In a mixed theoretical-experimental approach, empirical equilibrium geometrical parameters (1.207 Å for

r_CC and 1.069 Å for r_CH) have been obtained from a least-squares fit to the experimental rotational constants

of four isotopomers of CCH which have been corrected for vibrational effects using computed vibration-rotation

interaction constants. These geometrical parameters lead to a consistent picture with remaining discrepancies

between theory and experiment of 0.001 Å for the CC and 0.006-0.008 Å for the CH distances, respectively.

The corresponding r_s and r_0 geometries are shown not to be representative of the true equilibrium structure

of CCH. 

I. Introduction 

Considerable effort has been devoted to the determination 

of the structure of the ethynyl (CCH) radical in its ²Σ⁺ electronic

ground state from the experimental 1 and the theoretical side. 2-7

Presently, experimental values for ground-state rotational constants

(B_0) for four isotopomers of CCH have been determined.

For CCH, a value of 43 674.528 94(115) MHz has been reported 

by Müller et al. 8 in agreement with earlier measured values. 9-11 

For ¹³CCH and C¹³CH, values of 42 077.462(1) and 42 631.382(1)

MHz have been obtained by McCarthy et al. 12 in excellent

agreement with a previous report of Bogey et al. 1,13 Finally, 

for the deuterated form CCD, a value of 36 068.0310(96) MHz 

has been reported by Bogey et al. 14 

On the basis of the available experimental rotational constants, 

Bogey et al. 1 determined a so-called substitution (r_s) structure.

However, the obtained bond distances are not in satisfactory 

agreement with corresponding calculated equilibrium values; 2-7 

in particular, the CH distance was unusually short (1.046 Å vs 

calculated values of 1.062-1.070 Å). As has been already 

pointed out by Bogey et al., 14 the observed discrepancy is 

probably due to the large amplitude bending motion in CCH 

which is not adequately accounted for in the substitution 

approach 15 that provides the r s structure. Thus, determination 

of the true equilibrium geometry is necessary to get a reliable 

picture of the structure of the ethynyl radical. 

Although the available rotational constants form a solid basis 

for the experimental determination of the r_0 and r_s geometry,

respectively, there is not enough experimental information 

available to determine the equilibrium geometry. In particular, 

the vibrational contributions to the rotational constants, which 

in principle can be determined via the complete set of vibration-rotation

interaction constants, 16 cannot be obtained from the

available experimental data. 

† Part of the special issue “Fritz Schaefer Festschrift”. 

‡ Eötvös Loránd University. 

§ Aarhus University. 

| Universität Mainz. 

As has been suggested long ago by Pulay et al. 17 and more 

recently by others, 18,19 quantum chemical calculations can be 

used to provide the lacking information. With computed 

vibration-rotation interaction constants (α_r), it is possible to

correct experimental rotation constants for vibrational effects

and to obtain the corresponding equilibrium values

$$B_e = B_0 + \frac{1}{2} \sum_r \alpha_r \qquad (1)$$

with the sum running over all vibrational degrees of freedom. 

The accuracy of such a mixed experimental-theoretical (or 

empirical) procedure for the determination of equilibrium 

geometries has recently been investigated by Pawlowski et al. 20 

for a set of 18 closed-shell molecules. It was concluded in this 

study that errors in the determined empirical bond lengths are 

below 0.001 Å, if the vibrational corrections to the rotational 

constants are calculated at a sufficiently high level such as the 

coupled-cluster singles and doubles (CCSD) level 21 augmented 

by a perturbative treatment of triple excitations (CCSD(T)) 22 

together with the cc-pVQZ set from Dunning’s correlation-consistent

basis-set hierarchy. 23 Although it is not clear whether

the same accuracy can be achieved for open-shell systems, this 

combined experimental-theoretical procedure opens an interesting 

possibility for the determination of a reliable equilibrium 

geometry for CCH. 
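As a worked illustration of eq 1, the following Python sketch adds computed vibrational corrections to the experimental ground-state rotational constants quoted above. The corrections are taken from the ROHF-CCSD(T)/cc-pCVQZ column of Table 4; selecting that column is an assumption made here for illustration only, and the function and variable names are hypothetical.

```python
# Experimental ground-state rotational constants B_0 (MHz) quoted in the Introduction.
B0_MHZ = {
    "CCH": 43674.52894,
    "13CCH": 42077.462,
    "C13CH": 42631.382,
    "CCD": 36068.0310,
}

# Vibrational corrections Delta B = B_e - B_0 (MHz) from Table 4,
# ROHF-CCSD(T)/cc-pCVQZ column (illustrative choice of level).
DELTA_B_MHZ = {
    "CCH": 495.37,
    "13CCH": 478.15,
    "C13CH": 491.72,
    "CCD": 230.11,
}


def equilibrium_constant(b0, alphas):
    """Eq 1: B_e = B_0 + (1/2) * sum of vibration-rotation interaction constants."""
    return b0 + 0.5 * sum(alphas)


def equilibrium_constants(b0_by_species, delta_b_by_species):
    """Apply a precomputed correction Delta B = (1/2) sum(alpha_r) to each isotopomer."""
    return {name: b0 + delta_b_by_species[name] for name, b0 in b0_by_species.items()}


B_E_MHZ = equilibrium_constants(B0_MHZ, DELTA_B_MHZ)
for name, value in B_E_MHZ.items():
    print(f"{name:6s} B_e = {value:.2f} MHz")
```

The resulting B_e values are the quantities to which the equilibrium bond distances are fitted in the empirical approach.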

Alternatively, accurate equilibrium geometries can be obtained 

via a purely theoretical approach. Such an approach can and 

should take advantage of existing hierarchies of methods for 

the treatment of electron correlation and establish basis-set 

convergence by using basis-set sequences such as, for example, 

the correlation-consistent sets developed by Dunning and coworkers. 

23,24 As has been shown by Helgaker et al. 25 and more 

recently also by Bak et al. 26 such a procedure can lead to an 

accuracy of 0.002-0.003 Å in bond distances if CCSD(T) 

calculations together with sufficiently large basis sets are carried 

out. Again, this conclusion is mainly valid for closed-shell 

10.1021/jp036885t CCC: $27.50 © 2004 American Chemical Society 

Published on Web 02/17/2004

Equilibrium Geometry of Ethynyl Radical J. Phys. Chem. A, Vol. 108, No. 15, 2004 3031 

molecules and needs to be checked for open-shell systems, for 

which some further complications are expected. 27,28 Concerning 

the use of multireference methods, a recent study on more than 

60 electronic (closed- and open-shell) states of various diatomic 

molecules found that approaches such as, for example, the 

multireference-averaged quadratic coupled-cluster (MR-AQCC) 

method, 29,30 provide bond distances with an accuracy close to 

0.001 Å. As multireference methods together with a careful 

selection of the reference space offer a well-balanced treatment 

for both open- and closed-shell molecules, such calculations 

should be considered useful complements to single-reference-based

CC calculations.

The aim of the present paper is to provide an accurate 

equilibrium geometry for the electronic ground state of the 

ethynyl radical by using both procedures outlined above. The 

accuracy and reliability of the theoretically determined values 

will be carefully investigated via benchmark calculations up to 

the full configuration interaction (FCI) level. Calculated vibrational 

corrections to the rotational constants are used to derive 

equilibrium geometrical parameters from the available experimental 

rotational constants. The accuracy achieved is judged 

by a comparison of the results obtained with the two procedures. 

II. Computational Methods 

Theoretical determinations of the equilibrium geometry of 

CCH have been carried out using various coupled-cluster (CC) 

approaches and, to investigate possible multireference effects, 

the multireference configuration interaction (MR-CI) and multireference-averaged 

quadratic coupled-cluster (MR-AQCC) 

methods. 

Using the CC ansatz, calculations have been performed at 

two levels beyond the coupled-cluster singles and doubles 

(CCSD) 21 approximation, namely, at the CCSD(T) level which 

includes connected triple excitations perturbatively on top of a 

CCSD calculation 22,31 and at the CCSDT level 32-34 which 

includes a full treatment of triple excitations. Both unrestricted 

Hartree-Fock (UHF) and restricted open-shell Hartree-Fock 

(ROHF) reference functions have been used in the CC calculations. 

The MR-AQCC method can be considered an approximately 

“extensive” version of the MR-CISD (multireference configuration 

interaction with single and double excitations) method. 

MR-AQCC and MR-CISD calculations have been carried out 

with different reference (active) spaces. The n_e factor in the

MR-AQCC calculations was chosen to be 9, that is, the core 

electrons are not considered in the size-extensivity correction 

(for details, see ref 30). 

The hierarchy of correlation-consistent basis sets cc-pVXZ 23 

and cc-pCVXZ 24 has been used with X = D, T, Q, and 5.

Since the size of CCH renders FCI calculations with small 

basis sets possible, FCI calculations (with a restricted open-shell

HF reference) have been carried out for the geometry of

CCH employing the cc-pVDZ basis set. These benchmark

results are used to calibrate the corresponding CC and MR- 

AQCC results. 

Geometry optimizations have been carried out with analytically 

evaluated gradients in the case of the CCSD(T) 31,35-37 and 

MR-AQCC calculations, 38,39 while in all other cases the 

equilibrium geometry has been determined using purely numerical 

methods. 

The vibration-rotation interaction constants which are needed 

to subtract the vibrational contribution from the experimental 

rotational constants have been obtained at the UHF-CCSD(T) 

and ROHF-CCSD(T) levels using cc-pVTZ, cc-pCVTZ, cc-pVQZ,

and cc-pCVQZ basis sets 23,24 at the geometry optimized 

at the same level. The required quantities (for the relevant 

computational expressions, see, for example, ref 16) have been 

determined using analytic derivative techniques, that is, the 

harmonic force field was determined using either analytic 

gradients (ROHF-CCSD(T)) 31 or analytic second derivatives 

(UHF-CCSD(T)), 40,41 and the cubic force field has been 

subsequently determined via numerical differentiation as described 

in refs 19 and 42. In addition, to check the reliability 

of the obtained force fields, UHF-CCSDT calculations of the 

vibration-rotation interaction constants (within the frozen-core 

approximation) have been carried out employing our recently 

implemented general CC analytic second derivatives. 43 

CC calculations have been performed with the Austin-Mainz 

version of the ACES II program system. 44 The COLUMBUS 

suite of programs 39,45 was used for the MR-AQCC and the 

LUCIA code 46 for the FCI calculations. The CCSDT force field 

calculations have been carried out using the generalized CI/CC code

developed by one of us 47-49 which has been interfaced to the 

ACES II program. 

III. Results and Discussions 

III.A. Choice of Reference Space in the Multireference 

Treatments. The ²Σ⁺ ground state of CCH has a dominant

configuration of (1σ)²(2σ)²(3σ)²(4σ)²(1π)⁴5σ. An appropriate

reference space for the description of this electronic state within 

a MR-AQCC treatment has to be selected in a careful manner. 

In the present work, four different reference spaces have been 

tested with respect to their performance for the equilibrium 

geometry of CCH. In particular, the convergence of the 

calculated geometrical parameters with increase of the reference 

space is investigated. 

The smallest reference space is of complete active space 

(CAS) type and denoted by “5 × 5”, indicating that five 

electrons are distributed within five orbitals, namely the open-shell

5σ orbital and the pairs of π and π* orbitals (1π and 2π). The

next CAS reference space, denoted by “5 × 6”, considers in 

addition the virtual 6σ orbital, while the largest CAS space (“5 

× 8”) includes three virtual orbitals (6σ, 7σ, and 8σ). Finally, 

to investigate the effect of including further “active” electrons, 

the “5 × 6” space has been augmented by single and double 

excitations involving the 3σ and/or 4σ orbital (in the following 

denoted by “5 × 6 + 2d”). Note that in all considered cases, 

the orbitals have been taken from MCSCF calculations using 

the same space. All single and double excitations out of the 

reference configurations have been included in the correlation 

treatment within the MR-CISD and MR-AQCC calculations. 

As the focus of these initial calculations is just the convergence 

of the results with respect to the chosen reference space, the 

calculations have been performed at the cc-pVDZ and cc-pVTZ 

basis-set levels, respectively. 

TABLE 1: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH with Respect to the Chosen Reference Space in the MR-CISD and MR-AQCC Treatments a

                          5 × 5    5 × 6    5 × 8    5 × 6 + 2d
r_CC
MR-AQCC/cc-pVDZ (fc)      1.2369   1.2376   1.2379   1.2371
MR-CISD/cc-pVTZ (ae)      1.2093   1.2102   1.2102   1.2123
MR-AQCC/cc-pVTZ (ae)      1.2121   1.2129   1.2131   1.2126
r_CH
MR-AQCC/cc-pVDZ (fc)      1.0794   1.0797   1.0807   1.0799
MR-CISD/cc-pVTZ (ae)      1.0546   1.0548   1.0552   1.0558
MR-AQCC/cc-pVTZ (ae)      1.0573   1.0575   1.0580   1.0580

a fc = frozen-core calculations, ae = all-electron calculations.


TABLE 2: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH as Obtained at the CCSD(T), CCSDT, and MR-AQCC Levels Using Different Basis Sets a

Column order: basis set; r_CC at UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT, ROHF-CCSDT, MR-AQCC b; r_CH at UHF-CCSD(T), ROHF-CCSD(T), UHF-CCSDT, ROHF-CCSDT, MR-AQCC b

cc-pVDZ (fc) 1.2318 1.2353 1.2352 1.2354 1.2376 1.0797 1.0801 1.0801 1.0802 1.0797 

cc-pVTZ (fc) 1.2120 1.2153 1.2150 1.2151 1.2173 1.0643 1.0646 1.0645 1.0645 1.0638 

cc-pVQZ (fc) 1.2081 1.2113 1.2110 1.2110 1.2133 1.0642 1.0645 1.0644 1.0644 1.0635 

cc-pV5Z (fc) 1.2072 1.2104 1.2098 1.2123 1.0639 1.0642 1.0642 1.0632 

cc-pCVTZ (ae) 1.2087 1.2119 1.2132 1.0642 1.0645 1.0627 

cc-pCVQZ (ae) 1.2052 1.2083 1.2096 1.0630 1.0632 1.0613 

cc-pCV5Z (ae) 1.2043 1.2074 1.0626 1.0629 

a fc = frozen-core calculations, ae = all-electron calculations. b 5 × 6 reference space.

The corresponding results are compiled in Table 1. The most 

significant observation is that there is a faster convergence of 

the bond distance with increase of the reference space in the 

MR-AQCC than in the MR-CISD calculations, as the MR- 

AQCC results seem to be much less sensitive to the choice of 

reference space. While the optimized bond distances obtained 

with the two methods are very close when the largest reference 

space (5 × 6 + 2d) is used, there are noticeable differences for 

the smaller reference spaces. For these, the MR-AQCC results 

are much closer to the “5 × 6 + 2d” values than the 

corresponding MR-CISD results. In particular, the inclusion of 

additional electrons in the reference space seems to be less 

important when using the MR-AQCC ansatz. The results in 

Table 1 thus indicate that the use of a “5 × 6” active space 

seems to be a safe and economical choice for large-scale MR- 

AQCC calculations on the 2 Σ + state of CCH. The remaining 

error due to higher excitations is estimated to be about 0.001- 

0.002 Å. 

III.B. Comparison of MR-AQCC and CC Results. In Table 

2 the CC and CH bond lengths obtained at CCSD(T), CCSDT, 

and MR-AQCC levels using different basis sets are compared. 

Focusing first on the coupled-cluster results, it is observed 

that, independent of the chosen basis set, the CC distances 

obtained at the UHF-CCSD(T) level are about 0.003 Å shorter 

than the corresponding CCSDT values, while the corresponding 

ROHF-CCSD(T) bond lengths are essentially identical to both 

the UHF- and ROHF-CCSDT values. This unexpected difference 

between the UHF and ROHF results is investigated in a 

forthcoming article 28 where the failure of UHF-CCSD(T) is 

traced back to a rapid change of the underlying UHF wave 

function at certain bond distances. It will be shown in ref 28 

that this breakdown of the UHF-CCSD(T) approach occurs for 

the ethynyl radical at distances close to the equilibrium 

geometry, and thus, the UHF-CCSD(T) results must be considered 

unreliable. Interestingly, the full CCSDT approach seems 

to be able to recover from these deficiencies of the underlying 

UHF reference functions and provides results which are essentially 

independent of the chosen reference functions. 

For the CC distances the differences between ROHF-CCSD(T)

and CCSDT are essentially negligible. When considering

in addition the MR-AQCC calculations (obtained with the “5 

× 6” reference), we note that the MR-AQCC value for the CC 

distance is even longer than the corresponding CCSDT value 

(by about 0.002 Å). It is essentially impossible at this point to 

decide whether the CCSDT or the MR-AQCC results should 

be considered more accurate. 50 Good agreement of the ROHF- 

CCSD(T) and CCSDT also suggests that ROHF-CCSD(T) can 

be safely used with the larger basis sets where CCSDT is not 

practical. 

For the CH distance, all considered approaches yield essentially 

the same result. 

TABLE 3: Comparison of Geometrical Parameters (in Å) 

for the ²Σ⁺ State of CCH at the CCSD(T), CCSDT, and

MR-AQCC Levels with Corresponding FCI Calculations a 

III.C. Comparison with Full Configuration Interaction 

Results. To judge the accuracy of MR-AQCC and CCSDT, 

benchmark calculations at the FCI level using the cc-pVDZ basis 

have been performed. The corresponding results are summarized 

in Table 3. As these results show, the CH bond distances 

obtained by any approach are in excellent agreement (differences 

are less than 0.0005 Å), while for the CC bond distance the 

FCI result falls between the corresponding CCSDT and MR- 

AQCC values. This means that in comparison with FCI the 

CCSDT value is about 0.001 Å too short, while MR-AQCC is 

about 0.001 Å too long. Both methods thus exhibit errors which 

are acceptable for our purpose. 

III.D. Basis-Set Convergence. After discussing the issue of 

electron correlation, we will now turn our interest to the basis-set

effects. Results obtained with both the cc-pVXZ and cc-pCVXZ

sequence of basis sets have been given in Table 2. In

the cc-pVXZ calculations, when employing the frozen-core 

approximation, smooth convergence of the geometrical parameters 

is observed. When going from cc-pVDZ to cc-pV5Z, both 

bond distances are reduced, the CC distance by about 0.025 Å 

and the CH distance by about 0.016 Å. The differences between 

the cc-pVQZ and cc-pV5Z results are, at 0.001 and 0.0003

Å, already rather small so that the cc-pV5Z results can be

considered as nearly converged. However, the cc-pVXZ calculations 

do not incorporate core-correlation effects. To consider 

these properly, all-electron calculations using the core-valence 

correlating cc-pCVXZ sets have been carried out. As for the 

cc-pVXZ sequence, monotonic convergence is observed for the 

geometrical parameters within this basis-set sequence and the 

differences between quadruple- and pentuple-zeta results are 

again small. From the results, it is further seen that core 

correlation together with the additional consideration of core 

polarization functions reduces the CC bond distance by about 

0.003-0.004 Å, while the CH distance, as one might expect, is 

less affected and shortened by only 0.001-0.002 Å. 
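The basis-set increments discussed in this paragraph can be reproduced from the ROHF-CCSD(T) columns of Table 2. The sketch below is a minimal illustration; the dictionary layout and function names are hypothetical, and only values printed in Table 2 are used.

```python
# ROHF-CCSD(T) bond distances (r_CC, r_CH) in Å from Table 2.
# Keys are the cardinal labels X of cc-pVXZ (frozen core) and cc-pCVXZ (all electron).
FROZEN_CORE = {
    "D": (1.2353, 1.0801),
    "T": (1.2153, 1.0646),
    "Q": (1.2113, 1.0645),
    "5": (1.2104, 1.0642),
}
ALL_ELECTRON = {
    "T": (1.2119, 1.0645),
    "Q": (1.2083, 1.0632),
    "5": (1.2074, 1.0629),
}


def change(series, start, end):
    """Change in (r_CC, r_CH) when going from basis `start` to basis `end`."""
    return tuple(round(b - a, 4) for a, b in zip(series[start], series[end]))


def core_effect(cardinal):
    """All-electron cc-pCVXZ result minus frozen-core cc-pVXZ result."""
    return tuple(
        round(ae - fc, 4)
        for fc, ae in zip(FROZEN_CORE[cardinal], ALL_ELECTRON[cardinal])
    )


print("cc-pVDZ -> cc-pV5Z:", change(FROZEN_CORE, "D", "5"))
print("cc-pVQZ -> cc-pV5Z:", change(FROZEN_CORE, "Q", "5"))
print("cc-pCVQZ -> cc-pCV5Z:", change(ALL_ELECTRON, "Q", "5"))
for cardinal in ("T", "Q", "5"):
    print(f"core effect, X = {cardinal}:", core_effect(cardinal))
```

Negative entries denote bond shortening, so the output can be compared directly with the reductions quoted in the text.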

Unfortunately, because of program limitations, it was not 

possible to perform MR-AQCC calculations using the largest 

cc-pCV5Z basis. However, the rather systematic difference 

between the CCSD(T) and MR-AQCC results enables a 

                   r_CC     r_CH
ROHF-CCSD(T)       1.2353   1.0801
UHF-CCSD(T)        1.2318   1.0797
UHF-CCSDT          1.2352   1.0801
ROHF-CCSDT         1.2354   1.0802
MR-AQCC b          1.2376   1.0797
FCI                1.2367   1.0802

a All calculations with cc-pVDZ and core orbitals frozen in the electron-correlation treatment. b 5 × 6 reference space.


TABLE 4: Calculated Vibrational Corrections ΔB = B_e − B_0 (in MHz) to the Rotational Constants of Different Isotopomers of CCH from UHF- and ROHF-based CC Calculations

            CCSD(T)    CCSD(T)    CCSD(T)    CCSD(T)    CCSDT(fc)
            cc-pVTZ    cc-pVQZ    cc-pCVTZ   cc-pCVQZ   cc-pVTZ a
UHF Reference Function
CCH         368.27     334.70     379.67     355.74     583.64
¹³CCH       355.08     322.65     366.09     342.76     564.26
C¹³CH       366.25     333.16     377.31     353.24     580.54
CCD         168.07     151.12     175.33     167.52     258.47
ROHF Reference Function
CCH         531.16     479.58     568.24     495.37
¹³CCH       513.21     463.24     549.12     478.15
C¹³CH       528.13     476.98     564.57     491.72
CCD         237.85     214.59     257.20     230.11

a fc = frozen-core calculation.

prediction of the corresponding value based on MR-AQCC/cc-pCVQZ

and ROHF-CCSD(T)/cc-pCV5Z calculations. As the

use of the pentuple- instead of the quadruple-ζ set decreases 

CC and CH bond distances by about 0.0009 and 0.0004 Å, 

respectively, the estimated MR-AQCC/cc-pCV5Z values are 

about 1.2087 and 1.0609 Å. 

The influence of diffuse functions has been investigated at 

the UHF-CCSD(T) level. It was found that the changes amount

to less than 0.0003 Å when going from cc-pCVQZ to aug-cc-pCVQZ.

III.E. Best Theoretical Estimates. On the basis of the 

previous sections, we are now able to give a best theoretical 

estimate for the equilibrium geometry of CCH. There are two 

(almost) independent procedures: one uses the MR-AQCC data

while the other uses the CC data. At the MR-AQCC level,

the best directly calculated geometry has been

obtained with the cc-pCVQZ basis set (r_e(CC) = 1.2096 Å and

r_e(CH) = 1.0613 Å). This geometry should be “improved” by

the FCI correction obtained at the cc-pVDZ level, that is, by 

-0.0009 and 0.0005 Å as well as corrected for the remaining 

basis-set effect, that is, by -0.0009 Å and -0.0004 Å, for CC 

and CH, respectively (see above). Assuming additivity of these 

corrections, this leads to final values of 1.2078 and 1.0614 Å 

for the CC and CH bond distance, respectively. A similar 

extrapolation procedure starting from the ROHF-CCSD(T)/cc-pCV5Z

results (1.2074 and 1.0628 Å) and employing corrections 

due to full CCSDT (-0.0003 Å and -0.0001 Å) and FCI 

(0.0013 and 0.0000 Å) leads to a final estimate of 1.2084 and 

1.0627 Å for the two distances. The discrepancy of 0.001 to 

0.002 Å between the values obtained with these two extrapolation 

schemes indicates the accuracy of our theoretical

results. 
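The additivity assumption used for both estimates can be checked with a few lines of arithmetic. The following is a minimal sketch that simply sums the corrections quoted in the text; the function name `apply_corrections` is introduced here for illustration and is not part of the original work.

```python
def apply_corrections(base, corrections):
    """Add (delta_CC, delta_CH) corrections, in Angstrom, to a base geometry."""
    r_cc, r_ch = base
    for d_cc, d_ch in corrections:
        r_cc += d_cc
        r_ch += d_ch
    # Round to the four decimals quoted in the text.
    return round(r_cc, 4), round(r_ch, 4)

# MR-AQCC/cc-pCVQZ geometry + FCI correction (cc-pVDZ) + remaining basis-set effect.
mr_aqcc_estimate = apply_corrections(
    (1.2096, 1.0613),
    [(-0.0009, 0.0005), (-0.0009, -0.0004)],
)

# ROHF-CCSD(T)/cc-pCV5Z geometry + full CCSDT correction + FCI correction.
cc_estimate = apply_corrections(
    (1.2074, 1.0628),
    [(-0.0003, -0.0001), (0.0013, 0.0000)],
)

print(mr_aqcc_estimate)  # (1.2078, 1.0614)
print(cc_estimate)       # (1.2084, 1.0627)
```

Both routes reproduce the final values stated above, so the quoted estimates are internally consistent with the individual corrections.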

It is noteworthy that our best theoretical estimates

are in excellent agreement with recent recommendations for the

equilibrium geometry of CCH by Peterson and Dunning (ref 7) based

on CCSD(T) calculations. The corresponding values are 1.2076 

and 1.0619 Å. 

III.F. Analysis of Experimental Rotational Constants. 

After establishing a theoretical estimate for the equilibrium 

geometry of CCH, we now focus on the analysis of the 

experimental rotational constants using computed vibrational

corrections. These corrections to B, that is, ΔB = B_e - B_0, have

been obtained at the UHF- and ROHF-CCSD(T) level using

the cc-pVXZ and cc-pCVXZ sets with X = T and Q. The

calculated ∆B values are compiled in Table 4 and amount to 

about 150-590 MHz, that is, about 0.5 to 1.5% of the values 

of the corresponding rotational constants for the considered 

isotopomer and thus are non-negligible. However, large discrepancies 

are seen between the vibrational corrections computed 

with UHF and ROHF reference functions. We thus 

decided to check the reliability of the CCSD(T) force fields 

via corresponding CCSDT calculations using the cc-pVTZ basis 

set. As is seen from Table 4, the CCSDT calculations suggest 

that the UHF-CCSD(T) force fields (as the corresponding 

geometries) should be considered unreliable and that only the 

ROHF-CCSD(T) approach yields vibrational corrections in good 

agreement with the CCSDT approach. On the basis of these 

calculations, we refrain from discussing the UHF-CCSD(T) 

results any further and solely discuss the corresponding

ROHF-CCSD(T) results in the following.
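The comparison drawn from Table 4 can be made explicit. The sketch below takes the cc-pVTZ columns of Table 4 and computes the percentage deviation of each CCSD(T) correction from the UHF-CCSDT(fc) value; the dictionary and function names are illustrative. Note that, as in the text, the reference column is a frozen-core calculation, so this is only a rough consistency check.

```python
# Vibrational corrections Delta B (MHz), cc-pVTZ columns of Table 4.
UHF_CCSD_T = {"CCH": 368.27, "13CCH": 355.08, "C13CH": 366.25, "CCD": 168.07}
ROHF_CCSD_T = {"CCH": 531.16, "13CCH": 513.21, "C13CH": 528.13, "CCD": 237.85}
UHF_CCSDT_FC = {"CCH": 583.64, "13CCH": 564.26, "C13CH": 580.54, "CCD": 258.47}

def percent_deviation(values, reference):
    """Signed deviation of each correction from the reference, in percent."""
    return {
        name: 100.0 * (values[name] - reference[name]) / reference[name]
        for name in reference
    }

uhf_dev = percent_deviation(UHF_CCSD_T, UHF_CCSDT_FC)
rohf_dev = percent_deviation(ROHF_CCSD_T, UHF_CCSDT_FC)

for name in UHF_CCSDT_FC:
    print(f"{name:6s} UHF-CCSD(T): {uhf_dev[name]:6.1f}%   "
          f"ROHF-CCSD(T): {rohf_dev[name]:6.1f}%")
```

For every isotopomer the ROHF-based correction lies several times closer to the CCSDT value than the UHF-based one, which is the observation the text relies on.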

For the least-squares fit of the geometrical parameters to the 

rotational constants, the most recent B_0 values from refs 8, 12,

and 14, as given in the Introduction, have been used together 

with the vibrational corrections compiled in Table 4. The 

resulting empirical equilibrium geometries are summarized in 

Table 5. According to the values reported there, an “empirical” 

equilibrium geometry of r_CC = 1.207 Å and r_CH = 1.069 Å can

be given with 0.002 Å as a conservative error estimate (ref 51) based

on the convergence of the results. 
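To illustrate how two bond lengths of a linear molecule can be fitted to the rotational constants of several isotopomers, the following is a minimal sketch, not the authors' fitting program. It uses a plain Gauss-Newton iteration with a numerical Jacobian. The isotope masses and the conversion factor are rounded standard values (assumptions), and the input B_e values are synthetic: they are generated from the empirical geometry quoted above rather than taken from experiment, so the example only demonstrates that the fit recovers a known geometry.

```python
# B[MHz] = CONV / I[amu * Angstrom^2]; approximate value of h / (8 pi^2).
CONV = 505379.0

# Rounded isotope masses in amu (assumed values).
MASS = {"12C": 12.0, "13C": 13.003355, "H": 1.007825, "D": 2.014102}

# Atom order along the molecular axis: C, C, H.
ISOTOPOMERS = {
    "CCH": ("12C", "12C", "H"),
    "13CCH": ("13C", "12C", "H"),
    "C13CH": ("12C", "13C", "H"),
    "CCD": ("12C", "12C", "D"),
}

def rotational_constant(r_cc, r_ch, atoms):
    """Rotational constant (MHz) of a linear C-C-H arrangement."""
    masses = [MASS[a] for a in atoms]
    z = [0.0, r_cc, r_cc + r_ch]
    total = sum(masses)
    first = sum(m * zi for m, zi in zip(masses, z))
    second = sum(m * zi * zi for m, zi in zip(masses, z))
    inertia = second - first * first / total  # about the center of mass
    return CONV / inertia

def fit_geometry(b_e, guess=(1.20, 1.06), iterations=50, step=1e-6):
    """Least-squares fit of (r_CC, r_CH) to equilibrium rotational constants."""
    r_cc, r_ch = guess
    for _ in range(iterations):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for name, b_obs in b_e.items():
            atoms = ISOTOPOMERS[name]
            res = b_obs - rotational_constant(r_cc, r_ch, atoms)
            # Central-difference derivatives of B with respect to each distance.
            d1 = (rotational_constant(r_cc + step, r_ch, atoms)
                  - rotational_constant(r_cc - step, r_ch, atoms)) / (2 * step)
            d2 = (rotational_constant(r_cc, r_ch + step, atoms)
                  - rotational_constant(r_cc, r_ch - step, atoms)) / (2 * step)
            a11 += d1 * d1
            a12 += d1 * d2
            a22 += d2 * d2
            g1 += d1 * res
            g2 += d2 * res
        # Solve the 2x2 normal equations (J^T J) delta = J^T residual.
        det = a11 * a22 - a12 * a12
        delta_cc = (g1 * a22 - g2 * a12) / det
        delta_ch = (a11 * g2 - a12 * g1) / det
        r_cc += delta_cc
        r_ch += delta_ch
        if abs(delta_cc) + abs(delta_ch) < 1e-12:
            break
    return r_cc, r_ch

# Synthetic "observed" constants generated from a known geometry.
target = (1.207, 1.069)
synthetic_b_e = {
    name: rotational_constant(target[0], target[1], atoms)
    for name, atoms in ISOTOPOMERS.items()
}
fitted = fit_geometry(synthetic_b_e)
print(f"r_CC = {fitted[0]:.4f} A, r_CH = {fitted[1]:.4f} A")
```

In an actual analysis, each input would be an experimental B_0 plus the computed ΔB from Table 4 for the same isotopomer, and the residuals of the fit would be inspected as in ref 51.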

A comparison of the empirical equilibrium geometry with 

our best theoretical estimates shows that the remaining discrepancies 

are in the range of 0.001 to 0.002 Å for the CC and 

0.006 to 0.008 Å for the CH distances. It appears that the 

empirical value for the CC distance is slightly shorter and the 

CH distance is longer than the corresponding theoretical values. 

While these discrepancies can possibly be traced back to 

remaining deficiencies in the theoretical treatment, another, and 

maybe more likely, possibility is that these differences point to 

so far unexplored limitations in the perturbational treatment of 

the vibrational corrections (note that there is a low-lying Π state 

which interacts with the electronic ground state through the 

bending motion). 

Nevertheless, the current study leads to a satisfactory agreement 

between theory and experiment and thus provides a 

consistent picture with respect to the equilibrium geometry. 

Concerning previous efforts to determine the geometry of 

CCH, we note that the r_s (as well as the r_0) structures are rather

TABLE 5: Comparison of Geometrical Parameters (in Å) for the ²Σ⁺ State of CCH Obtained from Theory and Experiment

structure   r_CC      r_CH      method                                              ref
r_e         1.2064    1.0678    from exptl B_0 with ΔB(ROHF-CCSD(T)/cc-pVTZ)        this work
r_e         1.2076    1.0657    from exptl B_0 with ΔB(ROHF-CCSD(T)/cc-pVQZ)        this work
r_e         1.2056    1.0689    from exptl B_0 with ΔB(ROHF-CCSD(T)/cc-pCVTZ)       this work
r_e         1.2075    1.0651    from exptl B_0 with ΔB(ROHF-CCSD(T)/cc-pCVQZ)       this work
r_e         1.2050    1.0703    from exptl B_0 with ΔB(UHF-CCSDT(fc)/cc-pVTZ)       this work
r_e         1.2078    1.0614    est from MR-AQCC                                    this work
r_e         1.2084    1.0627    est from CCSDT                                      this work
r_0         1.2193    1.0457    from exptl B_0                                      this work
r_s         1.21652   1.04653   from exptl B_0                                      1
r_e         1.2076    1.0619    est from CCSD(T)                                    7


different (compare Table 5). Both of them deviate by about 

0.005 Å in the CC and by about 0.015 Å in the CH distance 

from the equilibrium geometries obtained in this work. Apparently,

contrary to what is often claimed, the substitution approach leading

to the r_s structure is not able to eliminate vibrational effects in

the case of CCH, and thus, the r_s and r_0 structures turn out to be

very similar. Our observation supports the speculation in ref 1

that the significantly too short CH distance is due to insufficient

account of vibrational effects, and in particular of the low-frequency

bending motion, a well-known artifact of the substitution

approach to molecular structures.

IV. Conclusions 

Equilibrium geometrical parameters for the ²Σ⁺ state of the

ethynyl radical have been obtained using two approaches. The

first purely theoretical procedure based on extensive CC, MR-AQCC,

and FCI calculations yields values of 1.208 Å for the

CC distance and 1.061-1.063 Å for the CH distance, while 

the second approach based on the analysis of experimental 

rotational constants using computed vibrational corrections 

provides values of 1.207 and 1.069 Å. The observed differences 

between the two approaches of 0.001-0.002 Å for CC and 

0.006-0.008 Å for CH are somewhat larger than expected. 

Among possible causes for this discrepancy, we consider 

limitations in the perturbational treatment of the vibrational 

corrections to the rotational constants. The r_s and r_0 geometries

for CCH are, because of a missing or insufficient treatment of 

these corrections, far away from the true equilibrium geometry. 

Acknowledgment. The authors acknowledge fruitful discussions 

with Professor J. F. Stanton (University of Texas, 

Austin). This work has been supported by the Hungarian 

Scientific Research Foundation (OTKA, Grants T032980 and 

M042110), the Deutsche Forschungsgemeinschaft, the Fonds 

der Chemischen Industrie, and the Danish Centre for Supercomputing 

(DCSC). This research is part of an effort by a task 

group of the International Union of Pure and Applied Chemistry 

to determine structures, vibrational frequencies, and thermodynamic 

functions of free radicals of importance in atmospheric 

chemistry. 

References and Notes 

(1) Bogey, M.; Demuynck, C.; Destombes, J. L. Mol. Phys. 1989, 66, 

955. 

(2) Hillier, I. H.; Kendrick, J.; Guest, M. F. Mol. Phys. 1975, 30, 1133. 

(3) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1977, 

64, 167. 

(4) Shih, S.; Peyerimhoff, S. D.; Buenker, R. J. J. Mol. Spectrosc. 1979, 

74, 124. 

(5) Fogarasi, G.; Boggs, J. E.; Pulay, P. Mol. Phys. 1983, 50, 139. 

(6) Kraemer, W. P.; Roos, B. O.; Bunker, P. R.; Jensen, P. J. Mol. 

Spectrosc. 1986, 120, 236. 

(7) Peterson, K. A.; Dunning, T. H. J. Chem. Phys. 1997, 106, 4119. 

(8) Müller, H.; Klaus, T.; Winnewisser, G. Astron. Astrophys. 2000, 

357, L65. 

(9) Sastry, K. V. L. N.; Helminger, P.; Charo, A.; Herbst, E.; Delucia, 

F. C. Astrophys. J. 1981, 251, L119. 

(10) Gottlieb, C. A.; Gottlieb, E. W.; Thaddeus, P. Astrophys. J. 1983, 

264, 740. 

(11) Saykally, R. J.; Veseth, L.; Evenson, K. M. J. Chem. Phys. 1984,

80, 2247. 

(12) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P. J. Mol. Spectrosc. 

1995, 173, 303. 

(13) Note that there is a trivial misprint in ref 1 for the former value. 

Bogey, M. Université Lille, France. Private communication, 1999. 

(14) Bogey, M.; Demuynck, C.; Destombes, J. L. Astron. Astrophys. 

1985, 144, L15. 

(15) Costain, C. C. J. Chem. Phys. 1958, 82, 5053. 

(16) See, for example: Mills, I. M. In Molecular Spectroscopy: Modern 

Research; Rao, K. N., Matthews, C. W., Eds.; Academic: New York, 1972; 

p 115.

(17) Pulay, P.; Meyer, W.; Boggs, J. E. J. Chem. Phys. 1978, 68, 5077. 

(18) McCarthy, M. C.; Gottlieb, C. A.; Thaddeus, P.; Horn, M.; 

Botschwina, P. J. Chem. Phys. 1995, 103, 7820. 

(19) Stanton, J. F.; Lopreore, C. L.; Gauss, J. J. Chem. Phys. 1998, 108, 

7190. 

(20) Pawlowski, F.; Jørgensen, P.; Olsen, J.; Hegelund, F.; Helgaker, 

T.; Gauss, J.; Bak, K. L.; Stanton, J. F. J. Chem. Phys. 2002, 116, 6482. 

(21) Purvis, G. D.; Bartlett, R. J. J. Chem. Phys. 1982, 76, 1910. 

(22) Raghavachari, K.; Trucks, G. W.; Head-Gordon, M.; Pople, J. A. 

Chem. Phys. Lett. 1989, 157, 479. 

(23) Dunning, T. H. J. Chem. Phys. 1989, 90, 1007. 

(24) Woon, D. E.; Dunning, T. H. J. Chem. Phys. 1993, 99, 1914. 

(25) Helgaker, T.; Gauss, J.; Jørgensen, P.; Olsen, J. J. Chem. Phys. 

1997, 106, 6430. 

(26) Bak, K. L.; Gauss, J.; Jørgensen, P.; Olsen, J.; Helgaker, T.; Stanton, 

J. F. J. Chem. Phys. 2001, 114, 6548. 

(27) See, for example: Byrd, E. F. C.; Sherrill, C. D.; Head-Gordon, 

M. J. Phys. Chem. A 2001, 105, 9736. 

(28) Szalay, P. G.; Vazquez, J.; Stanton, J. F. Material to be submitted 

for publication. 

(29) Szalay, P. G.; Bartlett, R. J. Chem. Phys. Lett. 1993, 214, 481. 

(30) Szalay, P. G.; Bartlett, R. J. J. Chem. Phys. 1995, 103, 3600. 

(31) Watts, J. D.; Gauss, J.; Bartlett, R. J. J. Chem. Phys. 1993, 98, 

8718. 

(32) Noga, J.; Bartlett, R. J. J. Chem. Phys. 1987, 86, 7041. 

(33) Scuseria, G. E.; Schaefer, H. F. Chem. Phys. Lett. 1988, 152, 382. 

(34) Watts, J. D.; Bartlett, R. J. J. Chem. Phys. 1990, 93, 6104.

(35) Gauss, J.; Stanton, J. F.; Bartlett, R. J. J. Chem. Phys. 1991, 95, 

2623. 

(36) Watts, J. D.; Gauss, J.; Bartlett, R. J. Chem. Phys. Lett. 1992, 200, 

1. 

(37) Gauss, J.; Lauderdale, W. J.; Stanton, J. F.; Watts, J. D.; Bartlett, 

R. J. Chem. Phys. Lett. 1991, 182, 207. 

(38) Shepard, R.; Lischka, H.; Szalay, P. G.; Kovar, T.; Ernzerhof, M. 

J. Chem. Phys. 1992, 96, 2085. 

(39) Lischka, H.; Shepard, R.; Pitzer, R. M.; Shavitt, I.; Dallos, M.; 

Müller, T.; Szalay, P. G.; Seth, M.; Kedziora, G.; Yabushita, S.; Zhang,

Z. Phys. Chem. Chem. Phys. 2001, 3, 664. 

(40) Gauss, J.; Stanton, J. F. Chem. Phys. Lett. 1997, 276, 70. 

(41) Szalay, P. G.; Gauss, J.; Stanton, J. F. Theor. Chem. Acc. 1998, 

100, 5. 

(42) Stanton, J. F.; Gauss, J. Int. Rev. Phys. Chem. 2000, 19, 61.

(43) Kállay, M.; Gauss, J. J. Chem. Phys., in press. 

(44) Stanton, J. F.; Gauss, J.; Watts, J. D.; Lauderdale, W. J.; Bartlett, 

R. J. Int. J. Quantum Chem. Symp. 1992, 26, 879. 

(45) Lischka, H.; Shepard, R.; Shavitt, I.; Brown, F. B.; Pitzer, R. M.; 

Ahlrichs, R.; Böhm, H.-J.; Chang, A. H. H.; Comeau, D. C.; Gdanitz, R.; 

Dachsel, H.; Dallos, M.; Erhard, C.; Ernzerhof, M.; Gawboy, G.; Höchtl, 

P.; Irle, S.; Kedziora, G.; Kovar, T.; Müller, T.; Parasuk, V.; Pepper, M.; 

Scharf, P.; Schiffer, H.; Schindler, M.; Schüler, M.; Stahlberg, E.; Szalay, 

P. G.; Zhao, J.-G. COLUMBUS, An ab Initio Electronic Structure Program, 

release 5.8, 2001. 

(46) Olsen, J. LUCIA, a Full CI, Restricted Active Space Program;

Aarhus University: Denmark, with contributions from H. Larsen. 

(47) Kállay, M.; Surján, P. R. J. Chem. Phys. 2000, 113, 1359.

(48) Kállay, M.; Surján, P. R. J. Chem. Phys. 2001, 115, 2945.

(49) Kállay, M.; Gauss, J.; Szalay, P. G. J. Chem. Phys. 2003, 119, 2991.

(50) It should be mentioned here that our MR-AQCC results are in 

excellent agreement with previous MR-CI calculations by Peterson and 

Dunning (ref 7). Their best values at the MR-CI level (augmented by a Davidson

correction), obtained with a full valence active space and a pV5Z basis for carbon

and a pVQZ basis for hydrogen, of 1.2116 and 1.0643 Å are of comparable

quality to our MR-AQCC/pV5Z(fc) values of 1.2123 and 1.0632 Å.

(51) Note that the residuals in the least-squares fit were in all cases 

smaller than 1.5 MHz.
